
Research topics
Broadly speaking, my research activities focus on the
analysis of multimedia documents, with a constant concern for
proposing (stochastic) models that combine all the available sources
of knowledge. This general philosophy translates into two main areas:
- Spoken document analysis: detecting and tracking audio events in
  videos; speaker segmentation and tracking; speech recognition; topic
  segmentation; spoken document indexing. I am currently interested in
  the following topics:
  - combining ASR and NLP for robust spoken document analysis
  - integrating knowledge (e.g. phonetic landmarks) in HMM-based ASR
  - motif/word discovery in audio streams
- Multimedia stream modeling: joint models of multimedia streams for
  video analysis. The aim of this research is to devise models that can
  integrate the audio, visual and possibly textual information and
  represent their relations (temporal synchronisation model,
  correlations, etc.) for the analysis and structuring of (sport)
  videos and for audiovisual ASR. Current activities include:
  - learning the dependencies in Bayesian networks for event detection
    in videos
  - multimodal topic segmentation
  - speech-driven structuring of TV streams
Recent participation in projects (my contribution in parentheses)
I am currently involved in the following projects:
- Attelage de Systèmes
Hétérogènes (ASH): harnessing heterogeneous speech
recognition systems for collaborative speech recognition (speech
recognition, knowledge integration)
- Évaluations en Traitement Automatique de la
Parole (ETAPE): evaluation campaign on TV stream transcription for the
French language (on behalf of the AFCP)
- Quaero:
multimodal search engines (audio event
detection, multimodal integration, video structure
analysis)
Over the last few years, I have participated in the following projects:
- Rapsodis: improving speech recognition with syntax
and semantics
- Demi-Ton:
multimedia stream structuring (multimedia
integration, video structuring, speech transcription, transcribed text
analysis)
- Pelops: Soccer video analysis and repurposing (sound
class detection, word spotting)
- ESTER:
French spoken document rich transcription evaluation campaign (campaign
organization; BN rich transcription system development)
- Domus Videum: video abstracting and navigation (sound
class detection, multimedia integration)
Participation in the activities of the MUSCLE European Network of
Excellence.
Ph. D. students
Ongoing Ph. D. theses I am supervising:
- Julien Fayolle. Information retrieval in TV streams
- Camille Guinaudeau. Speech-based video structuring
- Armando Muscariello. Audio motif discovery
- Gwénolé Lecorvé. Unsupervised
topic adaptation for robust speech recognition
- Siwar Baghdadi. Sparse events detection in videos
with Bayesian networks (in collaboration with Thomson Corporate
Research)
Past Ph. D. students:
Software development
I am actively participating in the development of the following free
software toolkits:
- SPro,
a speech signal processing toolkit
- AudioSeg,
generic tools for audio segmentation
- Sirocco,
a large vocabulary decoder for speech recognition
- topic-segmenter,
a transcript-based topic segmentation program for spoken documents
These toolkits form the basis (with a little help from HTK) of the
IRENE broadcast news indexing platform developed in collaboration with François Yvon,
originally for the French
ESTER rich transcription evaluation campaign. Also check out my free ESTER resources page.
In the framework of the ASR/NLP work group, we have developed several
pieces of code related to spoken document analysis. Among others, worth
mentioning are:
- IRISA News Topic Segmenter: wrapper to
topic-segmenter for the segmentation of broadcast news
- kiwi: keyword extraction from transcripts
- fishnet: fish texts on the Internet related to a
topic characterized by a few keywords (as given by kiwi)
- match-maker: corpus based acquisition of semantic
relations (and a bunch of relations from a large newspaper corpus)
These toolkits are not freely distributed, but feel free to contact me
should you be interested in any of them.
Selected publications
- Armando Muscariello, Guillaume
Gravier, and Frédéric Bimbot. Audio
keyword
extraction by unsupervised word discovery. In Conf.
of the Intl. Speech Communication Association (Interspeech), pages
2843-2846, 2009.
- Stéphane Huet, Guillaume
Gravier, and Pascale Sébillot. Improvement of automatic
speech recognition systems with morpho-syntax applied to French.
To appear in Computer Speech and Language.
- Manolis Delakis, Guillaume
Gravier, and Patrick Gros. Audiovisual Integration with
Segment Models for Tennis Video Parsing. Computer Vision
and Image Understanding, 111(2):142-154, August 2008.
- Siwar Baghdadi, Guillaume
Gravier, Claire-Hélène Demarty, and Patrick Gros. Structure
learning in
Bayesian network based video indexing. In IEEE Intl.
Conf. on Multimedia and Expo, pages 667-680, 2008.
- Wen Xuan Teng, Guillaume Gravier,
Frédéric Bimbot, and Frédéric Soufflet. Speaker
adaptation by
variable reference model subspace and application to large vocabulary
speech recognition. In IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing, pages 4381-4384, April 2009.
- Gwénolé
Lecorvé, Guillaume Gravier, and Pascale Sébillot. An
unsupervised
Web-based topic language model adaptation method. In IEEE
Intl. Conf. on Acoustics, Speech and Signal Processing, pages
5081-5084, April 2008.
- X. Naturel, G. Gravier, and P. Gros. Fast
Structuring of Large
Television Streams using Program Guides.
In Intl. Workshop on Adaptive Multimedia Retrieval,
2006.
- M. Ben, F. Bimbot, and G. Gravier. A
model space framework
for efficient speaker
detection.
In European Conference on Speech Communication and Technology,
2005.
- M. Ben,
M. Betser,
F. Bimbot, and G. Gravier. Speaker
Diarization using bottom-up clustering
based on a Parameter-derived Distance between adapted GMMs.
In Intl. Conf. on Speech and Language Processing,
2004.
- M. Betser and G. Gravier. Multiple
events tracking in
sound tracks.
In Intl. Conf. on Multimedia and Expo,
2004.
- Gerasimos
Potamianos,
Chalapathy Neti, Guillaume Gravier, Ashutosh Garg, Andrew W. Senior. Recent
advances in the automatic recognition of
audio-visual
speech. IEEE Proceedings, 91(9):1306-1326, 2003.
Check out my complete list of publications.
Short bio
I obtained a master's degree in Applied Mathematics at the Institut National des Sciences
Appliquées (INSA Rouen) in 1995 and worked on speech synthesis at ELAN Informatique from 1996 to 1997. I
received a Ph. D. in Signal and Image Processing (Toward speech
modeling with Markov random fields) at the École
Nationale Supérieure des Télécommunications (ENST Paris) in 2000.
After a one-year post-doctoral stay at Irisa, I worked in the Audio Visual Speech Technology
group at the IBM T. J. Watson Research Center from 2001 to 2002. Since
2002, I have been a research fellow at the Centre
National de la Recherche Scientifique (CNRS), working at the Institut de Recherche en Informatique et
Systèmes Aléatoires (Irisa, INRIA Rennes).