

Research topics

From a general point of view, my research activities focus on the analysis of multimedia documents, with a constant concern for proposing (stochastic) models that combine all available sources of knowledge. This general philosophy translates into two main areas:

  1. Spoken document analysis: detecting and tracking audio events in videos; speaker segmentation and tracking; speech recognition; topic segmentation; spoken document indexing. I am currently interested in the following topics:
    • combining ASR and NLP for robust spoken document analysis
    • integrating knowledge (e.g. phonetic landmarks) in HMM-based ASR
    • motif/word discovery in audio streams

  2. Multimedia stream modeling: joint models of multimedia streams for video analysis. The aim of this research is to devise models that can integrate audio, visual and possibly textual information and represent their relations (temporal synchronisation model, correlations, etc.) for the analysis and structuring of (sports) videos and for audiovisual ASR. Current activities include:
    • learning the dependencies in Bayesian networks for event detection in videos
    • multimodal topic segmentation
    • speech-driven structuring of TV streams

Recent participation in projects (contribution to the project)

I am currently involved in the following projects:
  • Attelage de Systèmes Hétérogènes (ASH): harnessing heterogeneous speech recognition systems for collaborative speech recognition (speech recognition, knowledge integration)
  • Évaluations en Traitement Automatique de la Parole (ETAPE): evaluation campaign on TV stream transcription for the French language (on behalf of the AFCP)
  • Quaero: multimodal search engines (audio event detection, multimodal integration, video structure analysis)
Over the last few years, I have participated in the following projects:
  • Rapsodis: improving speech recognition with syntax and semantics
  • Demi-Ton: multimedia stream structuring (multimedia integration, video structuring, speech transcription, transcribed text analysis)
  • Pelops: Soccer video analysis and repurposing (sound class detection, word spotting)
  • ESTER: French spoken document rich transcription evaluation campaign (campaign organization; BN rich transcription system development)
  • Domus Videum: video abstracting and navigation (sound class detection, multimedia integration)
Participation in the activities of the MUSCLE European Network of Excellence.

Ph. D. students

Ongoing Ph. D. theses I am supervising:
  • Julien Fayolle. Information retrieval in TV streams
  • Camille Guinaudeau. Speech-based video structuring
  • Armando Muscariello. Audio motif discovery
  • Gwénolé Lecorvé. Unsupervised topic adaptation for robust speech recognition
  • Siwar Baghdadi. Sparse events detection in videos with Bayesian networks (in collaboration with Thomson Corporate Research)
Past Ph. D. students:

Software development

I am actively participating in the development of the following free software toolkits:
  • SPro, a speech signal processing toolkit
  • AudioSeg, generic tools for audio segmentation
  • Sirocco, a large vocabulary decoder for speech recognition
  • topic-segmenter, a transcript-based topic segmentation program for spoken documents
These toolkits form the basis (with a little help from HTK) of the IRENE broadcast news indexing platform developed in collaboration with François Yvon, originally for the French ESTER rich transcription evaluation campaign. Also check out my free ESTER resources page.

In the framework of the ASR/NLP work group, we have developed several pieces of code related to spoken document analysis. Among others, worth mentioning are:
  • IRISA News Topic Segmenter: wrapper to topic-segmenter for the segmentation of broadcast news
  • kiwi: keyword extraction from transcripts
  • fishnet: fishes the Internet for texts related to a topic characterized by a few keywords (as given by kiwi)
  • match-maker: corpus-based acquisition of semantic relations (and a bunch of relations from a large newspaper corpus)
These toolkits are not freely distributed, but feel free to contact me should you be interested in any of them.

Selected publications

Check out my complete list of publications.

Short bio

I obtained a master's degree in Applied Mathematics at the Institut National des Sciences Appliquées (INSA Rouen) in 1995 and worked on speech synthesis at ELAN Informatique from 1996 to 1997. I received a Ph. D. in Signal and Image Processing (Toward speech modeling with Markov random fields) at the École Nationale Supérieure des Télécommunications (ENST Paris) in 2000. After a one-year post-doctoral stay at Irisa, I worked in the Audio Visual Speech Technology group at the IBM T. J. Watson Research Center from 2001 to 2002. Since 2002, I have been a research fellow at the Centre National de la Recherche Scientifique (CNRS), working at the Institut de Recherche en Informatique et Systèmes Aléatoires (Irisa, INRIA Rennes).

Guillaume Gravier, Irisa, Campus de Beaulieu, 35042 Rennes Cedex, France.
Tel : +33 2 99 84 72 39 / Fax : +33 2 99 84 71 71
firstname.secondname@irisa.fr
[Last update: October 2006]