Guillaume Gravier Home Page

Research topics

From a general point of view, my research activities focus on the analysis of multimedia documents, with a constant concern for proposing (stochastic) models that combine all the available sources of knowledge. This general philosophy translates into two main areas:

Spoken document analysis: detecting and tracking audio events in videos; speaker segmentation and tracking; speech recognition; topic segmentation; spoken document indexing. I am currently interested in the following topics:

combining ASR and NLP for robust spoken document analysis
integrating knowledge (e.g. phonetic landmarks) in HMM-based ASR
motif/word discovery in audio streams

Multimedia stream modeling: joint models of multimedia streams for video analysis. The aim of this research is to devise models that can integrate audio, visual and possibly textual information and represent their relations (temporal synchronisation model, correlations, etc.) for the analysis and structuring of videos and for audiovisual ASR. Current activities include:

learning the dependencies in Bayesian networks for event detection in videos
multimodal topic segmentation
speech-driven structuring of TV streams

Recent participation in projects (my contribution in parentheses)

I am currently involved in the following projects:

OpenSEM: an EIT ICT Labs open portal for semantic access to videos (video and spoken content analysis, navigation portal, program committee of MediaEval 2011)
Rev-TV: using virtual reality for television program edition (speech recognition, lip sync)
Attelage de Systèmes Hétérogènes (ASH): harnessing heterogeneous speech recognition systems for collaborative speech recognition (speech recognition, knowledge integration)
Évaluations en Traitement Automatique de la Parole (ETAPE): evaluation campaign on TV stream transcription for the French language (on behalf of the AFCP)
Quaero: multimodal search engines (audio event detection, multimodal integration, video structure analysis)

Over the last few years, I have participated in the following projects:

Rapsodis: improving speech recognition with syntax and semantics
Demi-Ton: multimedia stream structuring (multimedia integration, video structuring, speech transcription, transcribed text analysis)
Pelops: Soccer video analysis and repurposing (sound class detection, word spotting)
ESTER: French spoken document rich transcription evaluation campaign (campaign organization; BN rich transcription system development)
Domus Videum: video abstracting and navigation (sound class detection, multimedia integration)

Participation in the activities of the MUSCLE European Network of Excellence.

Ph. D. students

Ongoing Ph. D. theses I am supervising:

Ludivine Kuznik. Browsing news archives (in collaboration with INA - funding pending)
Cédric Penet. Multimodal content based analysis for video on demand (in collaboration with Technicolor)
Stefan Ziegler. Landmark driven speech recognition
Julien Fayolle. Information retrieval in TV streams
Camille Guinaudeau. Speech-based video structuring

Past Ph. D. students:

Armando Muscariello. Variability tolerant discovery of arbitrary repeating patterns in audio data with template matching. Ph. D. thesis, Université de Rennes 1, January 2011.
Gwénolé Lecorvé. Unsupervised topic adaptation for robust speech recognition. Ph. D. thesis, Université de Rennes 1, November 2010 (in French).
Siwar Baghdadi. Sparse events detection in videos with Bayesian networks. Ph. D. thesis, Université de Rennes 1, February 2010 (in French).
Wen Xuan Teng. Rapid speaker adaptation using a variable subspace of reference models. Ph. D. thesis, Université de Rennes 1, December 2008.
Stéphane Huet. Morpho-syntactic knowledge and topic adaptation to improve speech recognition. Ph. D. thesis, Université de Rennes 1, December 2007 (in French).
Manolis Delakis. Multimodal tennis video structure analysis with segment models. Ph. D. thesis, Université de Rennes 1, October 2006.
Ewa Kijak. Multimodal sport video structuring with stochastic models. Ph. D. thesis, Université de Rennes 1, 2003 (in French).

Other Ph. D. theses in which I have been or am involved (without supervising in any way):

Romain Tavenard. Indexation de séquences de descripteurs pour exploiter audio et vidéo.
Xavier Naturel. Automatic structuring of TV streams. Ph. D. thesis, Université de Rennes 1, 2007 (in French).
Mathieu Ben. Robust approaches for automatic speaker verification using normalization and hierarchical adaptation. Ph. D. thesis, Université de Rennes 1, 2004 (in French).

Software development

I am actively participating in the development of the following free software toolkits:

SPro, a speech signal processing toolkit
AudioSeg, generic tools for audio segmentation
Sirocco, a large vocabulary decoder for speech recognition

These toolkits are the basis (with a little help from HTK) of the IRENE broadcast news indexing platform, originally developed for the French ESTER rich transcription evaluation campaign in collaboration with François Yvon. Also check out my free ESTER resources page.

Within the ASR/NLP working group that I co-lead, we have developed several pieces of code related to spoken document analysis. Among others, the following are worth mentioning:

IRISA News Topic Segmenter: wrapper to topic-segmenter for the segmentation of broadcast news
kiwi: keyword extraction from transcripts
fishnet: fishes for texts on the Internet related to a topic characterized by a few keywords (as given by kiwi)
match-maker: corpus-based acquisition of semantic relations (and a bunch of relations from a large newspaper corpus)

These tools are not freely distributed open-source software, but we are nevertheless willing to share them. Feel free to contact me should you be interested in any of them.

Selected recent publications

Gwénolé Lecorvé, Guillaume Gravier, and Pascale Sébillot. Automatically finding semantically consistent N-grams to add new words in LVCSR systems. In IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2011.
Camille Guinaudeau, Guillaume Gravier, and Pascale Sébillot. Improving ASR-based topic segmentation of TV programs with confidence measures and semantic relations. In Proc. Annual Conf. of the Intl. Speech Communication Association (Interspeech), 2010.
Stéphane Huet, Guillaume Gravier, and Pascale Sébillot. Improvement of automatic speech recognition systems with morpho-syntax applied to French. Computer Speech and Language, (24):663-684, 2010.
Armando Muscariello, Guillaume Gravier, and Frédéric Bimbot. Audio keyword extraction by unsupervised word discovery. In Conf. of the Intl. Speech Communication Association (Interspeech), pages 2843-2846, 2009.
Sylvain Galliano, Guillaume Gravier, and Laura Chaubard. The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts. In Proc. Annual Intl. Speech Communication Association Conference (Interspeech), pages 2583-2586, 2009.
Manolis Delakis, Guillaume Gravier, and Patrick Gros. Audiovisual Integration with Segment Models for Tennis Video Parsing. Computer Vision and Image Understanding, 111(2):142-154, August 2008.
Siwar Baghdadi, Guillaume Gravier, Claire-Hélène Demarty, and Patrick Gros. Structure learning in Bayesian network based video indexing. In IEEE Intl. Conf. on Multimedia and Exhibition, pages 667-680, 2008.
Wen Xuan Teng, Guillaume Gravier, Frédéric Bimbot, and Frédéric Soufflet. Speaker adaptation by variable reference model subspace and application to large vocabulary speech recognition. In IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, pages 4381-4384, April 2009.
Gwénolé Lecorvé, Guillaume Gravier, and Pascale Sébillot. An unsupervised Web-based topic language model adaptation method. In IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, pages 5081-5084, April 2008.

Check out my complete list of publications.

Short bio

I obtained a master's degree in Applied Mathematics at the Institut National des Sciences Appliquées (INSA Rouen) in 1995 and worked on speech synthesis at ELAN Informatique from 1996 to 1997. I received a Ph. D. in Signal and Image Processing (Toward speech modeling with Markov random fields) at the École Nationale Supérieure des Télécommunications (ENST Paris) in 2000. After a one-year post-doctoral stay at Irisa, I joined the Audio Visual Speech Technology group at the IBM T. J. Watson Research Center from 2001 to 2002. Since 2002, I have been a research fellow at the Centre National de la Recherche Scientifique (CNRS), working at the Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA). I received the Habilitation à Diriger des Recherches (HDR) in Computer Science from Université de Rennes 1 in 2009.

Guillaume Gravier, Irisa, Campus de Beaulieu, 35042 Rennes Cedex, France.
Tel : +33 2 99 84 72 39 / Fax : +33 2 99 84 71 71
firstname.secondname@irisa.fr