Broadly speaking, my research focuses on the
analysis of multimedia documents, with a constant concern for
proposing (stochastic) models that combine all available sources of
knowledge. This general philosophy translates into
two main areas:
- Spoken document
analysis: detecting and tracking audio events in videos; speaker
segmentation and tracking;
speech recognition; topic segmentation; spoken document indexing. I am
interested in the following topics:
modeling: joint models of multimedia streams for video analysis.
The aim of this research is to devise models that can integrate the
audio, visual and possibly textual information and represent how these
streams relate to each other (synchronisation model, correlations, etc.) for the analysis and
structuring of videos and for audiovisual ASR. Current
- combining ASR and NLP for robust spoken document
- integrating knowledge (e.g.
phonetic landmarks) in
- motif/word discovery in audio streams
- learning the dependencies in Bayesian networks for
event detection in videos
- multimodal topic segmentation
- speech-driven structuring of TV streams
Recent participation in projects (my contribution in parentheses)
I am currently involved in the following projects:
Over the last few years, I have participated in the following projects:
- OpenSEM: an EIT ICT Labs
open portal for semantic access to videos (video
and spoken content analysis, navigation portal, program committee of MediaEval 2011)
- Rev-TV: using virtual reality for television program editing (speech
recognition, lip sync)
- Attelage de Systèmes
Hétérogènes (ASH): harnessing heterogeneous speech
recognition systems for collaborative speech recognition (speech
recognition, knowledge integration)
- Évaluations en Traitement Automatique de la
Parole (ETAPE): evaluation campaign on TV stream transcription for the
French language (on behalf of the AFCP)
multimodal search engines (audio event
detection, multimodal integration, video structure
Participation in the activities of the MUSCLE European Network of Excellence
- Rapsodis: improving speech recognition with syntax
multimedia stream structuring (multimedia
integration, video structuring, speech transcription, transcribed text
- Pelops: soccer video analysis and repurposing (sound
class detection, word spotting)
French spoken document rich transcription evaluation campaign (campaign
organization; BN rich transcription system development)
- Domus Videum: video abstracting and navigation (sound
class detection, multimedia integration)
Ph. D. students
Ongoing Ph. D. theses I am supervising:
- Ludivine Kuznik. Browsing news archives (in collaboration with
INA - funding pending)
- Cédric Penet. Multimodal content-based analysis for video
on demand (in collaboration with Technicolor)
- Stefan Ziegler. Landmark driven speech recognition
- Julien Fayolle. Information retrieval in TV streams
- Camille Guinaudeau. Speech-based video structuring
Past Ph. D. students:
Other Ph. D. theses in which I have been or am involved (without
a supervising role):
- Armando Muscariello. Variability tolerant discovery of arbitrary
repeating patterns in audio data with template matching. Ph. D. thesis,
Université de Rennes 1, January 2011.
- Gwénolé Lecorvé. Unsupervised
topic adaptation for robust speech recognition. Ph. D. thesis,
Université de Rennes 1, November 2010 (in French).
- Siwar Baghdadi. Sparse events detection in videos
with Bayesian networks. Ph. D. thesis, Université de Rennes 1,
February 2010 (in French).
- Wen Xuan Teng. Rapid speaker
adaptation using a variable subspace of reference models. Ph.
D. thesis, Université de Rennes 1, December 2008.
- Stephane Huet. Morpho-syntactic knowledge and topic
adaptation to improve speech recognition. Ph. D. thesis, Université
de Rennes 1, December 2007 (in French).
- Manolis Delakis. Multimodal
tennis video structure analysis with segment models. Ph. D.
thesis, Université de Rennes 1, October 2006.
- Ewa Kijak. Multimodal
sport video structuring with stochastic models. Ph. D. thesis,
Université de Rennes 1, 2003 (in French).
- Romain Tavenard. Indexing descriptor sequences to exploit
audio and video.
- Xavier Naturel. Automatic structuring of TV streams.
Ph. D. thesis, Université de Rennes 1, 2007 (in French).
- Mathieu Ben. Robust approaches for automatic speaker verification
using normalization and hierarchical adaptation. Ph. D. thesis,
Université de Rennes 1, 2004 (in French).
I am actively participating in the development of the following free toolkits:
a speech signal processing toolkit
generic tools for audio segmentation
a large vocabulary decoder for speech recognition
These toolkits form the basis (with a little help from HTK) of the
IRENE broadcast news indexing platform,
originally developed for the French
ESTER rich transcription evaluation campaign in collaboration with François Yvon. Also
check out my free ESTER
Within the ASR/NLP working group I co-lead, we have several
pieces of code related to spoken document analysis. Among others, the following are worth mentioning:
- IRISA News Topic Segmenter: wrapper to
topic-segmenter for the segmentation of broadcast news
- kiwi: keyword extraction from transcripts
- fishnet: fish texts on the Internet related to a
topic characterized by a few keywords (as given by kiwi)
- match-maker: corpus based acquisition of semantic
relations (and a bunch of relations from a large newspaper corpus)
These tools are not distributed as free, open-source software, but
we are nevertheless willing to share them. Feel free to get in touch
should you be interested in any of them.
Selected recent publications
Check out my complete
list of publications.
- Gwénolé Lecorvé,
Guillaume Gravier, and Pascale Sébillot. Automatically finding
semantically consistent N-grams to add new words in LVCSR systems. In
IEEE Intl. Conf. on Acoustics, Speech and Signal Processing
- Camille Guinaudeau,
Guillaume Gravier, and Pascale Sébillot. Improving
ASR-based topic segmentation of TV programs with confidence measures
and semantic relations.
In Proc. Annual Conf. of the Intl. Speech Communication Association
Gravier, and Pascale Sébillot. Improvement of automatic
speech recognition systems with morpho-syntax applied to French. Computer
Speech and Language, (24):663-684,
Gravier, and Frédéric Bimbot. Audio
extraction by unsupervised word discovery. In Conf.
of the Intl. Speech Communication Association (Interspeech), pages
- Sylvain Galliano,
Guillaume Gravier, and Laura Chaubard. The ESTER
2 evaluation campaign for the rich transcription of French radio
In Proc. Annual Intl. Speech Communication Association
Gravier, and Patrick Gros. Audiovisual Integration with
Segment Models for Tennis Video Parsing. Computer Vision
and Image Understanding, 111(2):142-154, August 2008.
Gravier, Claire-Hélène Demarty, and Patrick Gros. Structure
Bayesian network based video indexing. In IEEE Intl.
Conf. on Multimedia and Expo, pages 667-680, 2008.
- Wen Xuan
Teng, Guillaume Gravier,
Frédéric Bimbot, and Frédéric Soufflet. Speaker
variable reference model subspace and application to large vocabulary
speech recognition. In IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing, pages 4381-4384, April 2009.
Lecorvé, Guillaume Gravier, and Pascale Sébillot. An
Web-based topic language model adaptation method. In IEEE
Intl. Conf. on Acoustics, Speech and Signal Processing, pages
5081-5084, April 2008.
I obtained a master's degree in Applied Mathematics from the Institut National des Sciences
Appliquées (INSA Rouen) in 1995 and worked on speech synthesis at ELAN Informatique from 1996 to 1997. I
received a Ph. D. in Signal and Image Processing (Toward speech
modeling with Markov random fields) from the École
Nationale Supérieure des Télécommunications (ENST Paris) in 2000.
After a one-year post-doctoral stay at IRISA, I joined the Audio Visual Speech Technology
group at the IBM T. J. Watson Research Center from 2001 to 2002. Since
2002, I have been a research fellow at the Centre
National de la Recherche Scientifique (CNRS), working at the Institut de Recherche en Informatique et
Systèmes Aléatoires (IRISA). I received the
Habilitation à Diriger des Recherches (HDR) from
the Université de Rennes 1, in computer science.
Gravier, Irisa, Campus de Beaulieu, 35042 Rennes Cedex, France.
Tel : +33 2 99 84 72 39 / Fax : +33 2 99 84 71 71