I am a 2nd-year PhD student in the INRIA / TEXMEX research team at the University of Rennes 1, working on "Information retrieval in TV streams" under the supervision of Fabienne Moreau, Christian Raymond, Guillaume Gravier and Patrick Gros.
I obtained a master's degree in Signal, Image, Embedded Systems and Automatics (SISEA) in 2009 and graduated from the ENSSAT engineering school in Electronics, Industrial Informatics and Multimedia Data Processing in 2008.
See my resume.
I am generally interested in:
The main focus of our research is to design a new generation of information retrieval (IR) systems for TV streams, on the premise that recognizing the speech is the best, or at least the easiest, way to extract the semantic information to be indexed. Directly indexing automatic speech recognition (ASR) transcripts nevertheless remains a difficult task. These transcripts are unstructured (they have no sentence boundaries, punctuation or capitalization) and noisy (they contain erroneous words that do not convey the meaning of the words actually uttered). There are two main consequences: first, classical natural language processing techniques, which are often used to extract relevant information from structured and clean text, are not suited to this kind of data; second, depending on the word error rate, the transcripts miss a varying amount of relevant information, such as named entities (e.g. names, places, organizations) and out-of-vocabulary words (e.g. specialised terms, neologisms, unknown named entities).
This work first aims at finding a more precise and less noisy representation of speech than classical ASR textual transcripts for high-level, robust spoken content analysis tasks such as summarization, machine translation, topic threading or document expansion. Word-level confidence measures ([Fayolle et al. IS 2010] and [Fayolle et al. AND 2010]), which indicate the reliability of the recognized words, can be used to distinguish the reliable areas of the transcripts, which are worth keeping, from the unreliable areas, which are better removed or replaced by a lower level of representation (e.g. sub-words, audio features) that preserves part of the information to be retrieved. Named entities can also be recognized by robust methods when they are present in the transcripts ([Raymond and Fayolle TALN 2010]). To better represent spoken content, we propose to combine the following reliable, multi-level information:
Second, we would like to use this multi-level representation for speech-based multimedia content retrieval. This requires adapting classical IR techniques to multi-level indexes, for which we will have to define the structure, the relations between levels, their combination and the different search strategies. We also have to take into account that the notion of a document is not clearly defined in the context of TV streams.
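To make the idea of a confidence-driven, multi-level representation more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the system developed in this work: the `RecognizedWord` and `Unit` types, the `multi_level_units` function, the 0.5 threshold, the choice of phonemes as the lower level, and the toy transcript are all hypothetical. Words whose confidence reaches the threshold are kept as words; the others are replaced by their phoneme sequence so that part of the information stays retrievable.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecognizedWord:
    """One ASR output word (hypothetical structure for this sketch)."""
    word: str
    confidence: float            # assumed to lie in [0, 1]
    phonemes: Tuple[str, ...]    # assumed lower-level representation


@dataclass
class Unit:
    """One indexable unit, tagged with its level of representation."""
    level: str   # "word" or "phoneme"
    value: str


def multi_level_units(words: List[RecognizedWord],
                      threshold: float = 0.5) -> List[Unit]:
    """Keep reliable words; fall back to phonemes for unreliable ones.

    The threshold value is an arbitrary assumption for illustration.
    """
    units: List[Unit] = []
    for w in words:
        if w.confidence >= threshold:
            # Reliable area: index the word itself.
            units.append(Unit("word", w.word))
        else:
            # Unreliable area: index the lower-level units instead.
            units.extend(Unit("phoneme", p) for p in w.phonemes)
    return units


# Toy transcript with made-up confidence scores and phoneme strings.
transcript = [
    RecognizedWord("president", 0.93, ("p", "r", "e", "z", "i", "d", "a")),
    RecognizedWord("sarcasm", 0.21, ("s", "a", "r", "k", "o", "z", "i")),
    RecognizedWord("visits", 0.88, ("v", "i", "z", "i", "t")),
]

units = multi_level_units(transcript, threshold=0.5)
for u in units:
    print(u.level, u.value)
```

In this toy example the middle word has a low confidence score, so it is indexed as seven phoneme units while its two neighbours are indexed as words. A retrieval system built on such units would still need the index structure and search strategies discussed above.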
International Conferences
Load the BibTeX file.
Address : Julien Fayolle, INRIA, Campus de Beaulieu, 35042 RENNES Cedex, France
Tel : +33 2 99 84 74 26 / Fax : +33 2 99 84 71 71
E-mail : julien.fayolle [at] inria.fr
Secretary: Loïc Lesage, +33 2 99 84 74 37, Loic.Lesage [at] inria.fr