Publications
2016
Vedran Vukotic, Christian Raymond & Guillaume Gravier
A step beyond local observations with a dialog aware bidirectional GRU network for Spoken Language Understanding
Interspeech  
September, San Francisco, United States
Abstract: Recurrent Neural Network (RNN) architectures have recently become a very popular choice for Spoken Language Understanding (SLU) problems; however, they represent a large family of different architectures that can furthermore be combined to form more complex neural networks. In this work, we compare different recurrent networks, such as simple Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU) and their bidirectional versions, on the popular ATIS dataset and on MEDIA, a more complex French dataset. Additionally, we propose a novel method where information about the presence of relevant word classes in the dialog history is combined with a bidirectional GRU, and we show that incorporating relevant word classes from the dialog history improves performance over recurrent networks that solely analyze the current sentence.
BibTeX:

@inproceedings{Vukotic.etal_2016,
  author = {Vukotic, Vedran and Raymond, Christian and Gravier, Guillaume},
  title = {{A step beyond local observations with a dialog aware bidirectional GRU network for Spoken Language Understanding}},
  booktitle = {{Interspeech}},
  year = {2016},
  month = {September},
  address = {San Francisco, United States},
  url = {https://hal.inria.fr/hal-01351733}
}
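
For readers who want a concrete picture of the architecture sketched in the abstract, one plausible way to feed dialog-history word-class flags into a bidirectional GRU tagger is shown below. This is not the authors' implementation: the PyTorch model, layer sizes and feature layout are illustrative assumptions only.

Sketch (Python, illustrative):

import torch
import torch.nn as nn

class DialogAwareBiGRUTagger(nn.Module):
    """Bidirectional GRU slot tagger whose word embeddings are concatenated with a
    fixed-size vector flagging which relevant word classes occurred in the dialog
    history (all dimensions are illustrative)."""

    def __init__(self, vocab_size, num_labels, num_word_classes,
                 emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.bigru = nn.GRU(emb_dim + num_word_classes, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, word_ids, history_classes):
        # word_ids: (batch, seq_len); history_classes: (batch, num_word_classes)
        emb = self.embedding(word_ids)                          # (B, T, E)
        hist = history_classes.unsqueeze(1).expand(-1, emb.size(1), -1)
        features = torch.cat([emb, hist], dim=-1)               # (B, T, E + C)
        out, _ = self.bigru(features)                           # (B, T, 2H)
        return self.classifier(out)                             # (B, T, num_labels)

# Toy usage with random data
model = DialogAwareBiGRUTagger(vocab_size=1000, num_labels=20, num_word_classes=8)
words = torch.randint(0, 1000, (2, 12))
history = torch.bernoulli(torch.full((2, 8), 0.3))
print(model(words, history).shape)   # torch.Size([2, 12, 20])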

Vedran Vukotic, Christian Raymond & Guillaume Gravier
Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications
ICMR  
June, New York, United States
Abstract: Common approaches to problems involving multiple modalities (classification, retrieval, hyperlinking, etc.) are early fusion of the initial modalities and crossmodal translation from one modality to the other. Recently, deep neural networks, especially deep autoencoders, have proven promising both for crossmodal translation and for early fusion via multimodal embedding. In this work, we propose a flexible crossmodal deep neural network architecture for multimodal and crossmodal representation. By tying the weights of two deep neural networks, symmetry is enforced in central hidden layers, thus yielding a multimodal representation space common to the two original representation spaces. The proposed architecture is evaluated in multimodal query expansion and multimodal retrieval tasks within the context of video hyperlinking. Our method demonstrates improved crossmodal translation capabilities and produces a multimodal embedding that significantly outperforms multimodal embeddings obtained by deep autoencoders, resulting in an absolute increase of 14.14 in precision at 10 on a video hyperlinking task.
BibTeX:

@inproceedings{Vukotic.etal_2016a,
  author = {Vukotic, Vedran and Raymond, Christian and Gravier, Guillaume},
  title = {{Bidirectional Joint Representation Learning with Symmetrical Deep Neural Networks for Multimodal and Crossmodal Applications}},
  booktitle = {{ICMR}},
  year = {2016},
  month = {June},
  address = {New York, United States},
  url = {https://hal.inria.fr/hal-01314302}
}
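
The weight-tying idea from the abstract (two symmetric crossmodal translation networks whose central layers share one weight matrix, used once as W and once as its transpose) can be sketched as follows. This is a minimal illustration under assumed input sizes, not the published BiDNN code.

Sketch (Python, illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedCrossmodalNet(nn.Module):
    """Two crossmodal translation paths (X -> Y and Y -> X) whose central layers
    share one weight matrix, forcing both paths through a common multimodal
    representation space. Input/joint sizes are illustrative."""

    def __init__(self, dim_x=2048, dim_y=300, dim_joint=512):
        super().__init__()
        self.enc_x = nn.Linear(dim_x, dim_joint)     # X -> pre-joint layer
        self.enc_y = nn.Linear(dim_y, dim_joint)     # Y -> pre-joint layer
        # Shared central weight matrix, used as W in one direction and W^T in the other
        self.W = nn.Parameter(0.01 * torch.randn(dim_joint, dim_joint))
        self.dec_y = nn.Linear(dim_joint, dim_y)     # joint -> Y reconstruction
        self.dec_x = nn.Linear(dim_joint, dim_x)     # joint -> X reconstruction

    def forward(self, x, y):
        hx = torch.tanh(self.enc_x(x))
        hy = torch.tanh(self.enc_y(y))
        zx = torch.tanh(F.linear(hx, self.W))        # multimodal embedding from X
        zy = torch.tanh(F.linear(hy, self.W.t()))    # multimodal embedding from Y
        return self.dec_y(zx), self.dec_x(zy), zx, zy

# Toy usage: translate each modality into the other and read off the joint embeddings
model = TiedCrossmodalNet()
x = torch.randn(4, 2048)                 # e.g. a visual descriptor
y = torch.randn(4, 300)                  # e.g. an averaged word embedding
y_from_x, x_from_y, zx, zy = model(x, y)
loss = F.mse_loss(y_from_x, y) + F.mse_loss(x_from_y, x)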

2015
Vedran Vukotic, Christian Raymond & Guillaume Gravier
Is it time to switch to Word Embedding and Recurrent Neural Networks for Spoken Language Understanding?
InterSpeech  
September, Dresden, Germany
Abstract: Recently, word embedding representations have been investigated for slot filling in Spoken Language Understanding, along with the use of Neural Networks as classifiers. Neural Networks, especially Recurrent Neural Networks, which are specifically adapted to sequence labeling problems, have been applied successfully on the popular ATIS database. In this work, we compare these kinds of models with the previous state-of-the-art Conditional Random Fields (CRF) classifier on a more challenging SLU database. We show that, despite the efficient word representations used within these Neural Networks, their ability to process sequences is still significantly lower than that of CRF, while also having the drawback of higher computational cost, and that the ability of CRF to model output label dependencies is crucial for SLU.
BibTeX:

@inproceedings{Vukotic.etal_2015,
  author = {Vedran Vukotic and Christian Raymond and Guillaume Gravier},
  title = {Is it time to switch to Word Embedding and Recurrent Neural Networks for Spoken Language Understanding?},
  booktitle = {InterSpeech},
  year = {2015},
  month = {September},
  address = {Dresden, Germany}
}

Vedran Vukotic, Vincent Claveau & Christian Raymond
IRISA at DeFT 2015: Supervised and Unsupervised Methods in Sentiment Analysis
DEFT  
June, Caen, France
Abstract: In this work, we present the participation of the IRISA Linkmedia team at DeFT 2015. The team participated in two tasks: i) valence classification of tweets and ii) fine-grained classification of tweets (which includes two sub-tasks: detection of the generic class of the information expressed in a tweet and detection of the specific class of the opinion/sentiment/emotion). For all three problems, we adopt a standard machine learning framework. More precisely, three main methods are proposed and their feasibility for the tasks is analyzed: i) decision trees with boosting (bonzaiboost), ii) Naive Bayes with Okapi and iii) Convolutional Neural Networks (CNNs). Our approaches are voluntarily knowledge-free and text-based only; we do not exploit external resources (lexicons, corpora) or tweet metadata. This allows us to evaluate the interest of each method and of traditional bag-of-words representations vs. word embeddings.
BibTeX:

@inproceedings{Vukotic.etal_2015a,
  author = {Vedran Vukotic and Vincent Claveau and Christian Raymond},
  title = {IRISA at DeFT 2015: Supervised and Unsupervised Methods in Sentiment Analysis},
  booktitle = {DEFT},
  year = {2015},
  month = {June},
  address = {Caen, France}
}
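
The Okapi weighting mentioned in the abstract refers to the standard Okapi BM25 scheme. As a reminder of how that weighting works, here is a minimal, generic BM25 scorer; it is not the system submitted to DeFT, and the parameters k1 = 1.2 and b = 0.75 are simply the usual defaults.

Sketch (Python, illustrative):

import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one document (doc_terms) for a bag-of-words query.
    doc_freq maps a term to the number of documents containing it."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1.0 - b + b * len(doc_terms) / avg_doc_len)
        score += idf * tf[term] * (k1 + 1.0) / norm
    return score

# Toy usage on a two-document "collection"
docs = [["great", "movie", "loved", "it"], ["bad", "movie", "hated", "it"]]
doc_freq = Counter(term for d in docs for term in set(d))
avg_len = sum(len(d) for d in docs) / len(docs)
for d in docs:
    print(bm25_score(["great", "movie"], d, doc_freq, len(docs), avg_len))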

Davy Weissenbacher & Christian Raymond
Tree-Structured Named Entities Extraction from Competing Speech Transcriptions
NLDB  
Passau, Germany
Abstract: When real applications are working with automatic speech transcription, the first source of error does not originate from the incoherence in the analysis of the application but from the noise in the automatic transcriptions. This study presents a simple but effective method to generate a new transcription of better quality by combining utterances from competing transcriptions. We have extended a structured Named Entity (NE) recognizer submitted during the ETAPE Challenge. Working on French TV and Radio programs, our system revises the transcriptions provided by making use of the NEs it has detected. Our results suggest that combining the transcribed utterances which optimize the F-measures, rather than minimizing the WER scores, allows the generation of a better transcription for NE extraction. The results show a small but significant improvement of 0.9% SER against the baseline system on the ROVER transcription. These are the best performances reported to date on this corpus.
BibTeX:

@inproceedings{Weissenbacher.Raymond_2015,
  author = {Davy Weissenbacher and Christian Raymond},
  title = {Tree-Structured Named Entities Extraction from Competing Speech Transcriptions},
  booktitle = {NLDB},
  year = {2015},
  address = {Passau, Germany}
}

2014
Antoine Laurent, Nathalie Camelin & Christian Raymond
Boosting de bonzaïs pour la combinaison efficace de descripteurs : application à l’identification du rôle du locuteur
Journées d'Études sur la Parole  
June, Le Mans, France
Abstract: In this work, we address the problem of speaker role detection in broadcast news shows. In the literature, the proposed solutions combine various features coming from the acoustics, the transcript and/or its analysis, using machine learning methods. Many studies single out boosting over simple decision rules as one of the most effective algorithms for combining these descriptors. We propose here a modification of this state-of-the-art algorithm, replacing the simple decision rules with small decision trees that we call bonsais. Comparative experiments on the EPAC corpus show that this modification largely improves system performance while substantially reducing training time.
BibTeX:

@inproceedings{Laurent.etal_2014,
  author = {Antoine Laurent and Nathalie Camelin and Christian Raymond},
  title = {Boosting de bonzaïs pour la combinaison efficace de descripteurs : application à l’identification du rôle du locuteur},
  booktitle = {Journées d'Études sur la Parole},
  year = {2014},
  month = {June},
  address = {Le Mans, France}
}

Antoine Laurent, Nathalie Camelin & Christian Raymond
Boosting bonsai trees for efficient features combination : application to speaker role identification
InterSpeech  
September, Singapore
Abstract: In this article, we tackle the problem of speaker role detection from broadcast news shows. In the literature, many proposed solutions are based on the combination of various features coming from acoustic, lexical and semantic information with a machine learning algorithm. Many previous studies mention the use of boosting over decision stumps to efficiently combine these features. In this work, we propose a modification of this state-of-the-art machine learning algorithm, replacing the weak learner (decision stumps) with small decision trees, denoted bonsai trees. Experiments show that using bonsai trees as weak learners for the boosting algorithm largely improves both system error rate and learning time.
BibTeX:

@inproceedings{Laurent.etal_2014a,
  author = {Antoine Laurent and Nathalie Camelin and Christian Raymond},
  title = {Boosting bonsai trees for efficient features combination : application to speaker role identification},
  booktitle = {InterSpeech},
  year = {2014},
  month = {September},
  address = {Singapore}
}
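
The two bonsai-tree papers (above and below) replace boosting's usual decision stumps with small decision trees. The authors use their own bonzaiboost tool; a generic analogue of the idea with scikit-learn's AdaBoost, on toy data standing in for the speaker-role descriptors, would look like this.

Sketch (Python, illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the speaker-role descriptors
X, y = make_classification(n_samples=2000, n_features=40, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Classical boosting over decision stumps (depth-1 trees)
stumps = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                            n_estimators=300, random_state=0)

# Boosting over small "bonsai" trees (here depth 3) as stronger weak learners
bonsai = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                            n_estimators=300, random_state=0)

print("stumps:", cross_val_score(stumps, X, y, cv=3).mean())
print("bonsai:", cross_val_score(bonsai, X, y, cv=3).mean())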

Yann Ricquebourg, Christian Raymond, Baptiste Poirriez, Aurélie Lemaitre & Bertrand Coüasnon
Boosting bonsai trees for handwritten/printed text discrimination
Document Recognition and Retrieval (DRR)  
February, San Francisco, California, USA
Abstract: Boosting over decision stumps has proved its efficiency in Natural Language Processing, essentially with symbolic features, and its good properties (fast, few and non-critical parameters, not sensitive to overfitting) could be of great interest in the numeric world of pixel images. In this article we investigated the use of boosting over small decision trees, in image classification, for the discrimination of handwritten/printed text. We then conducted experiments to compare it to the usual SVM-based classification, revealing convincing results with very close performance, but with faster predictions and behaving far less like a black box. These promising results encourage the use of this classifier in more complex recognition tasks such as multiclass problems.
BibTeX:

@inproceedings{Ricquebourg.etal_2013,
  author = {Yann Ricquebourg and Christian Raymond and Baptiste Poirriez and Aurélie Lemaitre and Bertrand Coüasnon},
  title = {Boosting bonsai trees for handwritten/printed text discrimination},
  booktitle = {Document Recognition and Retrieval (DRR)},
  year = {2014},
  month = {February},
  address = {San Francisco, California, USA}
}

2013
Christian Raymond
Robust tree-structured named entities recognition from speech
Proceedings of the International Conference on Acoustic Speech and Signal Processing  
May, Vancouver, Canada
Abstract: Named Entity Recognition (NER) is a well-known Natural Language Processing (NLP) task, used as a preliminary processing step to provide a semantic level to more complex tasks. Recently a new set of named entities has been defined; this set has a multilevel tree structure, where base entities are combined to define more complex ones. In this paper, I describe an effective and original NER system, robust to noisy speech inputs, that ranked first at the 2012 ETAPE NER evaluation campaign with results far better than those of the other participating systems.
BibTeX:

@inproceedings{Raymond_2013,
  author = {Christian Raymond},
  title = {Robust tree-structured named entities recognition from speech},
  booktitle = {{P}roceedings of the {I}nternational {C}onference on {A}coustic {S}peech and {S}ignal {P}rocessing},
  year = {2013},
  month = {May},
  address = {Vancouver, Canada}
}

2012
Vincent Claveau & Christian Raymond
Participation de l'IRISA à DeFT2012 : recherche d'information et apprentissage pour la génération de mots-clés
Défi Fouille de Texte (DEFT2012)  
June, Grenoble
Abstract: This paper describes the IRISA participation in the DeFT 2012 text-mining challenge, which consisted in the automatic attribution or generation of keywords for scientific journal articles. Two tasks were proposed, which led us to test two different strategies. For the first task, a list of keywords was provided. Our first strategy is therefore to consider this as an Information Retrieval problem in which the keywords are the queries, which are attributed to the best-ranked documents. This approach yielded very good results. For the second task, only the articles were known; for this task, our approach is chiefly based on a term extraction system whose results are reordered by machine learning.
BibTeX:

@inproceedings{Claveau.Raymond_2012,
  author = {Vincent Claveau and Christian Raymond},
  title = {Participation de l'IRISA à DeFT2012 : recherche d'information et apprentissage pour la génération de mots-clés},
  booktitle = {Défi Fouille de Texte (DEFT2012)},
  year = {2012},
  month = {June},
  address = {Grenoble}
}

Julien Fayolle, Fabienne Moreau, Christian Raymond & Guillaume Gravier
Automates lexico-phonétiques pour l'indexation et la recherche de segments de parole
Journées d'Études sur la Parole  
vol. 1, pages 49-56, June, Grenoble
Abstract: This paper presents a new indexing method for spoken utterances combining lexical and phonetic hypotheses in a hybrid index built from automata. Retrieval is performed by a lexical-phonetic, semi-imperfect matching that tolerates imperfections to improve recall. A feature vector (edit scores, confidence measures, durations) weighting each transition helps to filter the candidate utterances for better precision. The experiments show the complementarity of lexical and phonetic representations and their usefulness for retrieving named entity queries.
BibTeX:

@inproceedings{Fayolle.Moreau.ea_2012,
  author = {Julien Fayolle and Fabienne Moreau and Christian Raymond and Guillaume Gravier},
  title = {Automates lexico-phonétiques pour l'indexation et la recherche de segments de parole},
  booktitle = {Journées d'Études sur la Parole},
  year = {2012},
  volume = {1},
  pages = {49-56},
  month = {June},
  address = {Grenoble}
}

Julien Fayolle, Murat Saraclar, Fabienne Moreau, Christian Raymond & Guillaume Gravier
Lexical-phonetic automata for spoken utterance indexing and retrieval
InterSpeech  
September, Portland, Oregon, USA
Abstract: This paper presents a method for indexing spoken utterances which combines lexical and phonetic hypotheses in a hybrid index built from automata. The retrieval is realised by a lexical-phonetic and semi-imperfect matching whose aim is to improve the recall. A feature vector, containing edit distance scores and a confidence measure, weights each transition to help the filtering of the candidate utterance list for a more precise search. Experiment results show that the lexical and phonetic representations are complementary and we compare the hybrid search with the state-of-the-art cascaded search to retrieve named entity queries.
BibTeX:

@inproceedings{Fayolle.Saraclar.ea_2012,
  author = {Julien Fayolle and Murat Saraclar and Fabienne Moreau and Christian Raymond and Guillaume Gravier},
  title = {Lexical-phonetic automata for spoken utterance indexing and retrieval},
  booktitle = {InterSpeech},
  year = {2012},
  month = {September},
  address = {Portland, Oregon, USA}
}

Christian Raymond & Vincent Claveau
Apprentissage supervisé et paresseux pour la fouille de textes
Expérimentations et évaluations en fouille de textes  
Chapter 11, November, Hermes - Lavoisier
BibTeX:

@incollection{Raymond.Claveau_2012,
  author = {Raymond, Christian and Claveau, Vincent},
  title = {{Apprentissage supervisé et paresseux pour la fouille de textes}},
  booktitle = {{Expérimentations et évaluations en fouille de textes}},
  publisher = {Hermes - Lavoisier},
  year = {2012},
  chapter = {11},
  month = {November},
  url = {http://www.lavoisier.fr/livre/notice.asp?ouvrage=2667584}
}

2011
Stefan Hahn, Marco Dinarelli, Christian Raymond, Fabrice Lefèvre, Patrick Lehnen, Renato De Mori, Alessandro Moschitti, Hermann Ney & Giuseppe Riccardi
Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages
IEEE Transactions on Audio, Speech and Language Processing
vol. 19 (6), pages 1569--1583, August
Abstract: One of the first steps in building a spoken language understanding (SLU) module for dialogue systems is the extraction of flat concepts out of a given word sequence, usually provided by an automatic speech recognition (ASR) system. In this paper, six different modeling approaches are investigated to tackle the task of concept tagging. These methods include classical, well-known generative and discriminative methods like Finite State Transducers (FST), Statistical Machine Translation (SMT), Maximum Entropy Markov Models (MEMM), or Support Vector Machines (SVM) as well as techniques recently applied to natural language processing such as Conditional Random Fields (CRF) or Dynamic Bayesian Networks (DBN). Following a detailed description of the models, experimental and comparative results are presented on three corpora in different languages and with different complexity. The French MEDIA corpus has already been exploited during an evaluation campaign, so a direct comparison with existing benchmarks is possible. Recently collected Italian and Polish corpora are used to test the robustness and portability of the modeling approaches. For all tasks, manual transcriptions as well as ASR inputs are considered. In addition to single systems, methods for system combination are investigated. The best performing model on all tasks is based on conditional random fields. On the MEDIA evaluation corpus, a concept error rate of 12.6% could be achieved. Here, in addition to attribute names, attribute values have been extracted using a combination of a rule-based and a statistical approach. Applying system combination using weighted ROVER with all six systems, the CER drops to 12.0%.
BibTeX:

@article{Hahn.Dinarelli.ea_2010,
  author = {Stefan Hahn and Marco Dinarelli and Christian Raymond and Fabrice Lefèvre and Patrick Lehnen and De Mori, Renato and Alessandro Moschitti and Hermann Ney and Giuseppe Riccardi},
  title = {Comparing Stochastic Approaches to Spoken Language Understanding in Multiple Languages},
  journal = {{IEEE} {T}ransactions on {A}udio, {S}peech and {L}anguage {P}rocessing},
  publisher = {IEEE Press},
  year = {2011},
  volume = {19},
  number = {6},
  pages = {1569--1583},
  month = {August},
  address = {Piscataway, NJ, USA},
  doi = {10.1109/TASL.2010.2093520}

}

Christian Raymond & Vincent Claveau
Participation de l'IRISA à DEFT 2011: expériences avec des approches d'apprentissage supervisé et non-supervisé
Défi Fouille de Texte (DEFT2011)  
July, Montpellier
Abstract: This article presents the participation of the IRISA TexMex team at DEFT 2011. We participated in the two proposed tasks and all tracks, exploring different approaches. We employed specific learning techniques based on boosting over decision trees and lazy learning, together with weights from the information retrieval field. These different approaches enabled us to obtain good results: we ranked first on the dating task, and we obtained accuracies of 99% and 99.5% on the pairing task.
BibTeX:

@inproceedings{Raymond.Claveau_2011,
  author = {Christian Raymond and Vincent Claveau},
  title = {Participation de l'IRISA à DEFT 2011: expériences avec des approches d'apprentissage supervisé et non-supervisé},
  booktitle = {Défi Fouille de Texte (DEFT2011)},
  year = {2011},
  month = {July},
  address = {Montpellier}
}

2010
Frédéric Béchet, Christian Raymond, Frédéric Duvert & Renato De Mori
Frame Based Interpretation Of Conversational Speech
Spoken Language Technologies Workshop  
December, Berkeley, California, U.S.A
Abstract: Two approaches to Spoken Language Understanding based on frames describing chunked knowledge are described. They are applied to the MEDIA corpus annotated in terms of concepts expressing chunks of spoken sentences. General rules of knowledge composition and inference appear to be adequate for effectively applying the application ontology to obtain frame-based representations of dialogue turns. The main difficulty appears to be the characterization of the syntactic knowledge expressing semantic links between knowledge chunks. This knowledge can be hand-crafted or automatically learned from examples. It is shown that the latter approach outperforms the former when applied to error-prone ASR transcriptions.
BibTeX:

@inproceedings{Bechet.Raymond.ea_2010,
  author = {Frédéric Béchet and Christian Raymond and Frédéric Duvert and De Mori, Renato},
  title = {Frame Based Interpretation Of Conversational Speech},
  booktitle = {Spoken Language Technologies Workshop},
  year = {2010},
  month = {December},
  address = {Berkeley, California, U.S.A}
}

Julien Fayolle, Fabienne Moreau, Christian Raymond, Guillaume Gravier & Patrick Gros
CRF-based Combination of Contextual Features to Improve A Posteriori Word-level Confidence Measures
InterSpeech  
September, Makuhari, Japan
Abstract: This paper addresses the issue of the reliability of confidence measures provided by automatic speech recognition systems for use in various spoken language processing applications. In this context, we propose a conditional random field (CRF)-based combination of contextual features to improve word-level confidence measures. More precisely, the method consists in combining phonetic, lexical, linguistic and semantic features to enhance confidence measures, explicitly exploiting context information. The combination is performed using CRF, whose selected patterns enable a precise diagnosis of the usefulness of individual and contextual features. Experiments, conducted on the large French broadcast news corpus ESTER, demonstrate the added value of the proposed CRF-based combination of contextual features, with a significant improvement of the normalized cross entropy and of the equal error rate.
BibTeX:

@inproceedings{Fayolle.Moreau.ea_2010,
  author = {Julien Fayolle and Fabienne Moreau and Christian Raymond and Guillaume Gravier and Patrick Gros},
  title = {CRF-based Combination of Contextual Features to Improve A Posteriori Word-level Confidence Measures},
  booktitle = {InterSpeech},
  year = {2010},
  month = {September},
  address = {Makuhari, Japan}
}
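
The CRF-based combination of contextual features described in the abstract can be illustrated with sklearn-crfsuite: each transcribed word becomes a feature dictionary that also includes features of its neighbours, and the CRF labels it as correct or erroneous. The feature names, toy utterances and binary C/E labels below are illustrative assumptions, not the paper's feature set.

Sketch (Python, illustrative):

import sklearn_crfsuite

def word_features(sent, i):
    """Features of word i plus a little left/right context (names are illustrative)."""
    word, asr_conf, pos = sent[i]
    feats = {"word": word.lower(), "asr_conf": asr_conf, "pos": pos}
    if i > 0:
        feats.update({"-1:word": sent[i - 1][0].lower(), "-1:asr_conf": sent[i - 1][1]})
    if i < len(sent) - 1:
        feats.update({"+1:word": sent[i + 1][0].lower(), "+1:asr_conf": sent[i + 1][1]})
    return feats

# Two toy utterances: (word, ASR confidence, POS tag), labeled correct (C) or error (E)
train = [[("the", 0.98, "DET"), ("whether", 0.41, "NOUN"), ("is", 0.95, "VERB")],
         [("paris", 0.90, "PROPN"), ("in", 0.97, "ADP"), ("june", 0.88, "NOUN")]]
labels = [["C", "E", "C"], ["C", "C", "C"]]

X = [[word_features(s, i) for i in range(len(s))] for s in train]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))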

Julien Fayolle, Fabienne Moreau, Christian Raymond & Guillaume Gravier
Reshaping Automatic Speech Transcripts for Robust High-level Spoken Document Analysis
Analytics for Noisy Unstructured Text Data  
October, Toronto, Canada
Best Student Paper Award
Abstract: High-level spoken document analysis is required in many applications seeking access to the semantic content of audio data, such as information retrieval, machine translation or automatic summarization. It is nevertheless a difficult task that is generally based on transcripts provided by an automatic speech recognition system. Unlike standard texts, transcripts belong to the category of highly noisy data because of word recognition errors that affect, in particular, very significant words such as named entities (e.g. person names, locations, organizations). Transcripts also contain specificities of spoken language that make their processing ineffective with natural language processing tools designed for written text. To overcome these issues, this paper proposes a method to reshape automatic speech transcripts for robust high-level spoken document analysis. The method consists in conceiving a new word-level confidence measure that may efficiently ensure the reliability of transcribed words, focusing on words that are relevant for high-level spoken document analysis such as named entities. The approach consists in combining different features collected from various sources of knowledge thanks to a machine learning method based on conditional random fields. In addition to standard features (morphosyntactic, linguistic and phonetic), we introduce new semantic features based on the decisions of three robust named entity recognition systems to better estimate the reliability of named entities. Experiments, conducted on the French broadcast news corpus ESTER, demonstrate the added value of the proposed word-level confidence measure for error detection and named entity recognition, with respect to the basic confidence measure provided by an automatic speech recognition system.
BibTeX:

@inproceedings{Fayolle.Moreau.ea_2010a,
  author = {Julien Fayolle and Fabienne Moreau and Christian Raymond and Guillaume Gravier},
  title = {Reshaping Automatic Speech Transcripts for Robust High-level Spoken Document Analysis},
  booktitle = {Analytics for Noisy Unstructured Text Data},
  year = {2010},
  month = {October},
  address = {Toronto, Canada},
  note = {Best Student Paper Award}
}

Christian Raymond & Julien Fayolle
Reconnaissance robuste d'entités nommées sur de la parole transcrite automatiquement
Traitement Automatique des Langues Naturelles  
July, Montréal, Canada
Abstract: Automatic speech transcripts are an important, but noisy, resource for indexing spoken multimedia documents (e.g. broadcast news). In order to improve both indexing and information retrieval, extracting semantic information from these erroneous transcripts is an interesting challenge. Among these meaningful contents are named entities (e.g. names of persons), which are the subject of this work. Traditional named entity taggers are based on manual, formal grammars. They obtain correct performance on text or clean manual speech transcripts, but they lack robustness when applied to automatic transcripts. In this work, we introduce three methods for named entity recognition based on machine learning algorithms, namely conditional random fields, support vector machines, and finite state transducers. We also introduce a method to make the training data consistent when they are annotated with slightly different conventions. We show that our tagger systems are among the most robust when applied to the evaluation data of the French ESTER 2 campaign in the most difficult conditions, where transcripts are particularly noisy.
BibTeX:

@inproceedings{Raymond.Fayolle_2010,
  author = {Christian Raymond and Julien Fayolle},
  title = {Reconnaissance robuste d'entités nommées sur de la parole transcrite automatiquement},
  booktitle = {Traitement Automatique des Langues Naturelles},
  year = {2010},
  month = {July},
  address = {Montréal, Canada}
}

Christophe Servan, Nathalie Camelin, Christian Raymond, Frédéric Béchet & Renato De Mori
On The Use Of Machine Translation For Spoken Language Understanding Portability
Proceedings of the International Conference on Acoustic Speech and Signal Processing  
pages 5330 -5333, Dallas, Texas, USA
Abstract: Across-language portability of a spoken language understanding (SLU) system deals with the possibility of reusing, with moderate effort, knowledge and data acquired for another language in a new language.
The approach proposed in this paper is motivated by the availability of the fairly large MEDIA corpus, carefully transcribed in French and semantically annotated in terms of constituents. A method is proposed for manually translating a portion of the training set in order to train an automatic machine translation (MT) system, which is then used to translate the remaining data. As the source language is annotated in terms of concept tags, a solution is presented for automatically transferring these tags to the translated corpus. Experimental results are presented on the accuracy of the translation, expressed with the BLEU score, as a function of the size of the training corpus. It is shown that the process leads to comparable concept error rates in the two languages, making the proposed approach suitable for SLU portability across languages.
BibTeX:

@inproceedings{Servan.Camelin.ea_2010,
  author = {Christophe Servan and Nathalie Camelin and Christian Raymond and Frédéric Béchet and De Mori, Renato},
  title = {On The Use Of Machine Translation For Spoken Language Understanding Portability},
  booktitle = {{P}roceedings of the {I}nternational {C}onference on {A}coustic {S}peech and {S}ignal {P}rocessing},
  year = {2010},
  pages = {5330 -5333},
  address = {Dallas, Texas, USA},
  doi = {10.1109/ICASSP.2010.5494960}

}

2008
Stefan Hahn, Patrick Lehnen, Christian Raymond & Hermann Ney
A Comparison of Various Methods for Concept Tagging for Spoken Language Understanding
Proceedings of the Language Resources and Evaluation Conference  
May, Marrakech, Morocco
Abstract: The extraction of flat concepts out of a given word sequence is usually one of the first steps in building a spoken language understanding (SLU) or dialogue system. This paper explores five different modelling approaches for this task and presents results on a state-of-the-art corpus. Additionally, two log-linear modelling approaches could be further improved by adding morphologic knowledge. This paper goes beyond what has been reported in the literature, e.g. in (Raymond & Riccardi 07). We applied the models on the same training and testing data and used the NIST scoring toolkit to evaluate the experimental results to ensure identical conditions for each of the experiments and the comparability of the results.
BibTeX:

@inproceedings{Hahn.Lehnen.ea_2008,
  author = {Stefan Hahn and Patrick Lehnen and Christian Raymond and Hermann Ney},
  title = {A Comparison of Various Methods for Concept Tagging for Spoken Language Understanding},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference },
  year = {2008},
  month = {May},
  address = {Marrakech, Morocco}
}

Christian Raymond & Giuseppe Riccardi
Learning with Noisy Supervision for Spoken Language Understanding
Proceedings of the International Conference on Acoustic Speech and Signal Processing  
pages 4989-4992, Las Vegas, USA
Abstract: Data-driven Spoken Language Understanding (SLU) systems need semantically annotated data, which are expensive, time-consuming and prone to human errors. Active learning has been successfully applied to automatic speech recognition and utterance classification. In general, corpus annotation for SLU involves tasks such as sentence segmentation, chunking or frame labeling and predicate-argument annotation. In such cases human annotations are subject to errors that increase with the annotation complexity. We investigate two alternative noise-robust active learning strategies that are either data-intensive or supervision-intensive. The strategies detect likely erroneous examples and significantly improve SLU performance for a given labeling cost. We apply uncertainty-based active learning with conditional random fields on the concept segmentation task for SLU. We perform annotation experiments on two databases, namely ATIS (English) and MEDIA (French). We show that our noise-robust algorithm can improve accuracy by up to 6% (absolute), depending on the noise level and the labeling cost.
BibTeX:

@inproceedings{Raymond.Riccardi_2008,
  author = {Christian Raymond and Giuseppe Riccardi},
  title = {Learning with Noisy Supervision for Spoken Language Understanding},
  booktitle = {{P}roceedings of the {I}nternational {C}onference on {A}coustic {S}peech and {S}ignal {P}rocessing},
  year = {2008},
  pages = {4989-4992},
  address = {Las Vegas, USA},
  doi = {10.1109/ICASSP.2008.4518778}

}
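
The uncertainty-based active learning used in the paper above (and in the LUNA annotation papers below) can be sketched generically: score each unlabeled sentence by the confidence of its current best labeling and hand the least confident ones to the annotator. The helpers below assume a fitted sklearn-crfsuite model and approximate sentence confidence from per-token marginals; they illustrate the generic loop only, not the paper's noise-robust selection criteria.

Sketch (Python, illustrative):

import math
# Assumes `crf` is a fitted sklearn_crfsuite.CRF and each sentence is a list of
# feature dicts, as in the usual sklearn-crfsuite setup.

def sentence_confidence(crf, x_sent):
    """Approximate confidence of the predicted labeling of one sentence
    (length-normalized product of per-token marginals of the predicted labels)."""
    pred = crf.predict_single(x_sent)                  # best label sequence
    marginals = crf.predict_marginals_single(x_sent)   # per-token label posteriors
    log_conf = sum(math.log(max(m[label], 1e-12)) for m, label in zip(marginals, pred))
    return math.exp(log_conf / max(len(x_sent), 1))

def select_for_annotation(crf, unlabeled_sentences, budget=10):
    """Return the `budget` least confident sentences to send to the human annotator."""
    ranked = sorted(unlabeled_sentences, key=lambda s: sentence_confidence(crf, s))
    return ranked[:budget]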

Christian Raymond & Kepa Joseba Rodriguez
Annotation dynamique dans le corpus italien de dialogues spontanés LUNA
Journées d'Études sur la Parole  
June, Avignon, France
Abstract: In the context of the LUNA project, this paper presents the semantic annotation procedure we are following on an Italian corpus. This corpus consists of human-human spontaneous dialogues recorded in the call center of the help desk facility of the Consortium for Information Systems of Piedmont. The aim of our semantic annotation procedure is to speed up the manual annotation of the corpus and make it more reliable. The procedure uses a statistical learner to automatically annotate transcribed files at the semantic level and to generate automatically annotated files in the input format of the annotation tool: human annotators only have to check and correct these annotations instead of starting from scratch. In order to converge as fast as possible to reliable automatic annotations, and thus minimize the human effort, this procedure follows the active learning paradigm. The active learning procedure is coupled with annotation error detection to ensure more reliable annotations.
BibTeX:

@inproceedings{Raymond.Rodriguez_2008,
  author = {Christian Raymond and Kepa Joseba Rodriguez},
  title = {Annotation dynamique dans le corpus italien de dialogues spontanés LUNA},
  booktitle = {Journées d'Études sur la Parole},
  year = {2008},
  month = {June},
  address = {Avignon, France}
}

Christian Raymond, Kepa Joseba Rodriguez & Giuseppe Riccardi
Active Annotation in the LUNA Italian Corpus of Spontaneous Dialogues
Proceedings of the Language Resources and Evaluation Conference  
May, Marrakech, Morocco
Abstract: In this paper we present an active approach to annotating an Italian corpus of conversational human-human and Wizard-of-Oz dialogues with lexical and semantic labels. The procedure consists in the use of a machine learner to assist human annotators in the labeling task. The computer-assisted process engages human annotators to check and correct the automatic annotation rather than starting the annotation from un-annotated data. The active learning procedure is combined with annotation error detection to control the reliability of the annotation. With the goal of converging as fast as possible to reliable automatic annotations while minimizing the human effort, we follow the active learning paradigm, which selects for annotation the most informative training examples required to achieve a better level of performance. We show that this procedure allows us to quickly converge on correct annotations and thus minimize the cost of human supervision.
BibTeX:

@inproceedings{Raymond.Rodriguez.ea_2008,
  author = {Christian Raymond and Kepa Joseba Rodriguez and Giuseppe Riccardi},
  title = {Active Annotation in the LUNA Italian Corpus of Spontaneous Dialogues},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference },
  year = {2008},
  month = {May},
  address = {Marrakech, Morocco}
}

2007
Alessandro Moschitti, Giuseppe Riccardi & Christian Raymond
Spoken Language Understanding With Kernels For Syntactic/Semantic Structures
Proceedings IEEE Workshop on Automatic Speech Recognition and Understanding  
Kyoto, Japan
Abstract: Automatic concept segmentation and labeling are the fundamental problems of Spoken Language Understanding in dialog systems. Such tasks are usually approached using generative or discriminative models based on n-grams. As the uncertainty or ambiguity of the spoken input to the dialog system increases, we expect to need dependencies beyond n-gram statistics. In this paper, a general-purpose statistical syntactic parser is used to detect syntactic/semantic dependencies between concepts in order to increase the accuracy of sentence segmentation and concept labeling. The main novelty of the approach is the use of new tree kernel functions which encode syntactic/semantic structures in discriminative learning models. We experimented with Support Vector Machines and the above kernels on the standard ATIS dataset. The proposed algorithm automatically parses natural language text with an off-the-shelf statistical parser and labels the syntactic (sub)trees with concept labels. The results show that the proposed model is very accurate and competitive with respect to state-of-the-art models when combined with n-gram based models.
BibTeX:

@inproceedings{Moschitti.Riccardi.ea_2007,
  author = {Alessandro Moschitti and Giuseppe Riccardi and Christian Raymond},
  title = {Spoken Language Understanding With Kernels For Syntactic/Semantic Structures},
  booktitle = {{P}roceedings {IEEE} {W}orkshop on {A}utomatic {S}peech {R}ecognition and {U}nderstanding},
  year = {2007},
  address = {Kyoto, Japan},
  doi = {10.1109/ASRU.2007.4430106}

}

Christian Raymond, Frédéric Béchet, Nathalie Camelin, Renato De Mori & Géraldine Damnati
Sequential decision strategies for machine interpretation of speech
IEEE Transactions on Audio, Speech and Language Processing
vol. 15, pages 162-171
Abstract: Recognition errors made by Automatic Speech Recognition (ASR) systems may not prevent the development of useful dialogue applications if the interpretation strategy has an introspection capability for evaluating the reliability of the results. This paper proposes an interpretation strategy which is particularly effective when applications are developed with a training corpus of moderate size. From the lattice of word hypotheses generated by an ASR system, a short list of conceptual structures is obtained with a set of Finite State Machines (FSM). Interpretation or a rejection decision is then performed by a tree-based strategy. The nodes of the tree correspond to elaboration decision units containing a redundant set of classifiers. A decision-tree-based classifier and two large-margin classifiers are trained with a development set to become interpretation knowledge sources. Discriminative training of the classifiers selects linguistic and confidence-based features that contribute to a cooperative assessment of the reliability of an interpretation. Such an assessment leads to the definition of a limited number of reliability states. The probability that a proposed interpretation is correct is provided by its reliability state and transmitted to the dialogue manager. Experimental results are presented for a telephone service application.
BibTeX:

@article{Raymond.Bechet.ea_2007,
  author = {Christian Raymond and Frédéric Béchet and Nathalie Camelin and De Mori,Renato and Géraldine Damnati},
  title = {Sequential decision strategies for machine interpretation of speech},
  journal = {{IEEE} {T}ransactions on {A}udio, {S}peech and {L}anguage {P}rocessing},
  year = {2007},
  volume = {15},
  pages = {162-171},
  doi = {10.1109/TASL.2006.876862}

}

Christian Raymond & Giuseppe Riccardi
Generative and Discriminative Algorithms for Spoken Language Understanding
InterSpeech  
pages 1605-1608, August, Antwerp, Belgium
Abstract: Spoken Language Understanding (SLU) for conversational systems (SDS) aims at extracting concepts and their relations from spontaneous speech. Previous approaches to SLU have modeled concept relations as stochastic semantic networks, ranging from generative to discriminative approaches. As spoken dialog system complexity increases, SLU needs to perform understanding based on a richer set of features ranging from a-priori knowledge, long dependencies, dialog history, system belief, etc. This paper studies generative and discriminative approaches to modeling sentence segmentation and concept labeling. We evaluate algorithms based on Finite State Transducers (FST) as well as discriminative algorithms based on a Support Vector Machine sequence classifier and Conditional Random Fields (CRF). We compare them in terms of concept accuracy, generalization and robustness to annotation ambiguities. We also show how non-local, non-lexical features (e.g. a-priori knowledge) can be modeled with CRF, which is the best performing algorithm across tasks. The evaluation is carried out on two SLU tasks of different complexity, namely the ATIS and MEDIA corpora.
BibTeX:

@inproceedings{Raymond.Riccardi_2007,
  author = {Christian Raymond and Giuseppe Riccardi},
  title = {Generative and Discriminative Algorithms for Spoken Language Understanding},
  booktitle = {InterSpeech},
  year = {2007},
  pages = {1605-1608},
  month = {August},
  address = {Antwerp, Belgium}
}

Christian Raymond, Giuseppe Riccardi, Kepa Joseba Rodriguez & Joanna Wiśniewska
LUNA Corpus: an Annotation Scheme for a Multi-domain Multi-lingual Dialogue Corpus
Workshop on the Semantics and Pragmatics of Dialogue, DECALOG'2007  
May, Rovereto, Italy
Abstract: The LUNA corpus is a multi-domain multilingual dialogue corpus currently under development. The corpus will be annotated at multiple levels to include annotations of syntactic, semantic and discourse information and used to develop a robust natural spoken language understanding toolkit for multilingual dialogue services.
BibTeX:

@inproceedings{Raymond.Riccardi.ea_2007,
  author = {Christian Raymond and Giuseppe Riccardi and Kepa Joseba Rodriguez and Joanna Wi\'{s}niewska},
  title = {LUNA Corpus: an Annotation Scheme for a Multi-domain Multi-lingual Dialogue Corpus},
  booktitle = {Workshop on the Semantics and Pragmatics of Dialogue, DECALOG'2007},
  year = {2007},
  month = {May},
  address = {Rovereto, Italy}
}

Kepa Joseba Rodriguez, Stefanie Dipper, Michael Götze, Massimo Poesio, Giuseppe Riccardi, Christian Raymond & Joanna Rabiega-Wiśniewska
Standoff Coordination for Multi-Tool Annotation in a Dialogue Corpus
Linguistic Annotation Workshop  
Prague
Abstract: The LUNA corpus is a multi-lingual, multidomain spoken dialogue corpus currently under development that will be used to develop a robust natural spoken language understanding toolkit for multilingual dialogue services. The LUNA corpus will be annotated at multiple levels to include annotations of syntactic, semantic, and discourse information; specialized annotation tools will be used for the annotation at each of these levels. In order to synchronize these multiple layers of annotation, the PAULA standoff exchange format will be used. In this paper, we present the corpus and its PAULA-based architecture.
BibTeX:

@inproceedings{Rodriguez.Dipper.ea_2007,
  author = {Kepa Joseba Rodriguez and Stefanie Dipper and Michael Götze and Massimo Poesio and Giuseppe Riccardi and Christian Raymond and Joanna Rabiega-Wi\'{s}niewska},
  title = {Standoff Coordination for Multi-Tool Annotation in a Dialogue Corpus},
  booktitle = {Linguistic Annotation Workshop},
  year = {2007},
  address = {Prague}
}

2006
Christian Raymond, Frédéric Béchet, Renato De Mori & Géraldine Damnati
On the use of finite state transducers for semantic interpretation
Speech Communication
vol. 48 (3-4), pages 288-304, March-April
Abstract: A Spoken Language Understanding (SLU) system is described. It generates hypotheses of conceptual constituents with a translation process. This process is performed by Finite State Transducers (FST) which accept word patterns from a lattice of word hypotheses generated by an Automatic Speech Recognition (ASR) system. FSTs operate in parallel and may share word hypotheses at their input. Semantic hypotheses are obtained by composition of compatible translations under the control of composition rules. Interpretation hypotheses are scored by the sum of the posterior probabilities of paths in the lattice of word hypotheses supporting the interpretation. A compact structured n-best list of interpretations is obtained and used by the SLU interpretation strategy.
BibTeX:

@article{Raymond.Bechet.ea_2006,
  author = {Christian Raymond and Frédéric Béchet and De Mori, Renato and Géraldine Damnati},
  title = {On the use of finite state transducers for semantic interpretation},
  journal = {Speech Communication},
  year = {2006},
  volume = {48},
  number = {3-4},
  pages = {288-304},
  month = {March-April},
  doi = {10.1016/j.specom.2005.06.012}

}

Christophe Servan, Christian Raymond, Frédéric Béchet & Pascal Nocéra
Conceptual decoding from word lattices: application to the spoken dialogue corpus MEDIA
Proceedings of the International Conference on Spoken Language Processing  
Pittsburgh, USA
Abstract: Within the framework of the French evaluation program MEDIA on spoken dialogue systems, this paper presents the methods proposed at the LIA for the robust extraction of basic conceptual constituents (or concepts) from an audio message. The conceptual decoding model proposed follows a stochastic paradigm and is directly integrated into the Automatic Speech Recognition (ASR) process. This approach allows us to keep the probabilistic search space on sequences of words produced by the ASR module and to project it to a probabilistic search space of sequences of concepts. This paper presents the first ASR results on the French spoken dialogue corpus MEDIA, available through ELDA. The experiments made on this corpus show that the performance reached by our approach is better than the traditional sequential approach that looks first for the best sequence of words before looking for the best sequence of concepts.
BibTeX:

@inproceedings{Servan.Raymond.ea_2006,
  author = {Christophe Servan and Christian Raymond and Frédéric Béchet and Pascal Nocéra},
  title = {Conceptual decoding from word lattices: application to the spoken dialogue corpus MEDIA},
  booktitle = {{P}roceedings of the {I}nternational {C}onference on {S}poken {L}anguage {P}rocessing},
  year = {2006},
  address = {Pittsburgh, USA}
}

Christophe Servan, Christian Raymond, Frédéric Béchet & Pascal Nocéra
Décodage conceptuel à partir de graphes de mots sur le corpus de dialogue Homme-Machine MEDIA
Journées d'Études sur la Parole (JEP)  
Juin, Dinard, France
Abstract: Within the framework of the French evaluation program MEDIA on spoken dialogue systems, this paper presents the methods proposed at the LIA for the robust extraction of basic conceptual constituents (or concepts) from an audio message. The conceptual decoding model proposed follows a stochastic paradigm and is directly integrated into the Automatic Speech Recognition (ASR) process. This approach allows us to keep the probabilistic search space on sequences of words produced by the ASR module and to project it to a probabilistic search space of sequences of concepts. The experiments carried out on the MEDIA corpus show that the performance reached by our approach is better than the traditional sequential approach that looks first for the best sequence of words before looking for the best sequence of concepts.
BibTeX:

@inproceedings{Servan.Raymond.ea_2006a,
  author = {Christophe Servan and Christian Raymond and Frédéric Béchet and Pascal Nocéra},
  title = {Décodage conceptuel à partir de graphes de mots sur le corpus de dialogue Homme-Machine MEDIA},
  booktitle = {Journées d'Études sur la Parole (JEP)},
  year = {2006},
  month = {Juin},
  address = {Dinard, France}
}

2005
Christian Raymond
Décodage conceptuel: co-articulation des processus de transcription et compréhension dans les systèmes de dialogue
School: Université d'Avignon et des Pays de Vaucluse
December, Avignon
Abstract: In spoken dialog systems, the process of understanding consists in building a semantic representation from elementary semantic units called concepts. We propose in this document an SLU (Spoken Language Understanding) module. First, we introduce a conceptual language model for the detection and extraction of basic semantic concepts from a speech signal. A decoding process is described with a simple example. This decoding process extracts, from a word lattice generated by an Automatic Speech Recognition (ASR) module, a structured n-best list of interpretations (sets of concepts). This list contains all the interpretations that can be found in the word lattice, with their posterior probabilities, and the n-best values for each interpretation.
Then we introduce some confidence measures used to estimate the quality of the result of the previous decoding process. Finally, we describe the integration of the proposed SLU module in a dialogue application, involving a decision strategy based on the confidence measures introduced before.
BibTeX:

@phdthesis{Raymond_2005,
  author = {Christian Raymond},
  title = {Décodage conceptuel: co-articulation des processus de transcription et compréhension dans les systèmes de dialogue},
  school = {Université d'Avignon et des Pays de Vaucluse},
  year = {2005},
  month = {December},
  address = {Avignon}
}

Christian Raymond, Frédéric Béchet, Nathalie Camelin, Renato De Mori & Géraldine Damnati
Semantic Interpretation With Error Correction
Proceedings of the International Conference on Acoustic Speech and Signal Processing  
vol. 1, pages 29-32, Philadelphia, USA
Abstract: This paper presents a semantic interpretation strategy for Spoken Dialogue Systems including an error correction process. Semantic interpretations output by the Spoken Language Understanding module may be incorrect, but some semantic components may be correct. A set of situations will be introduced, describing semantic confidence based on the agreement of semantic interpretations proposed by different classification methods. The interpretation strategy considers, with the highest priority, the validation of the interpretation arising from the most likely sequence of words. If the probability, given by our confidence score model, that this interpretation is not correct is high, then possible corrections of it are considered using the other sequences in the N-best lists of possible interpretations. This strategy is evaluated on a dialogue corpus provided by France Telecom R&D and collected for a tourism telephone service. Significant reductions in understanding error rate are obtained, as well as powerful new confidence measures.
BibTeX:

@inproceedings{Raymond.Bechet.ea_2005,
  author = {Christian Raymond and Frédéric Béchet and Nathalie Camelin and De Mori, Renato and Géraldine Damnati},
  title = {Semantic Interpretation With Error Correction},
  booktitle = {{P}roceedings of the {I}nternational {C}onference on {A}coustic {S}peech and {S}ignal {P}rocessing},
  year = {2005},
  volume = {1},
  pages = {29-32},
  address = {Philadelphia, USA},
  doi = {10.1109/ICASSP.2005.1415042}

}

2004
Christian Raymond, Fréderic Béchet, Renato De Mori & Géraldine Damnati
Stratégie de décodage conceptuel pour les applications de dialogue oral
XXVième Journées d'Études sur la parole (JEP)  
April 19-22, Fès, Morocco
Abstract: The approach proposed in this paper is an alternative to the traditional sequential architecture of Spoken Dialogue Systems where transcribing and understanding a speech signal are two separate processes. By representing all the conceptual structures handled by the Dialogue Manager as Finite State Machines and by building a conceptual model that contains all the possible interpretations at a given dialogue state, we propose a decoding architecture that searches first for the best conceptual interpretations before looking for the best strings of words. The output of this process is a structured n-best list of hypotheses, at the concept and word levels. Several confidence measures are then used in order to rescore and select a candidate from this list. This paper reports a significant understanding error rate reduction on a tourist inquiry application developed by France Telecom R&D.
BibTeX:

@inproceedings{Raymond.Bechet.ea_2004,
  author = {Christian Raymond and Fréderic Béchet and De Mori, Renato and Géraldine Damnati},
  title = {Stratégie de décodage conceptuel pour les applications de dialogue oral},
  booktitle = {XXVième Journées d'Études sur la parole (JEP)},
  year = {2004},
  month = {April 19-22},
  address = {Fès, Morocco}
}

Christian Raymond, Frédéric Béchet, Renato De Mori & Géraldine Damnati
On the Use of Confidence for Statistical Decision in Dialogue Strategies
Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue  
pages 102--107, April 30 - May 1, Cambridge, Massachusetts, USA
Abstract: This paper describes an interpretation and decision strategy that minimizes interpretation errors and performs dialogue actions which may depend not only on the hypothesized concepts, but also on the confidence of what has been recognized. The concepts introduced here are applied in a system which integrates language and interpretation models into Stochastic Finite State Transducers (SFST). Furthermore, acoustic, linguistic and semantic confidence measures on the hypothesized word sequences are made available to the dialogue strategy. By evaluating predicates related to these confidence measures, a decision tree automatically learns a decision strategy for rescoring an n-best list of candidates representing a user's utterance. The different actions that can then be performed are chosen according to the confidence scores given by the tree.
BibTeX:

@inproceedings{Raymond.Bechet.ea_2004a,
  author = {Christian Raymond and Frédéric Béchet and De Mori, Renato and Géraldine Damnati},
  title = {On the Use of Confidence for Statistical Decision in Dialogue Strategies},
  booktitle = {{P}roceedings of the 5th SIGdial Workshop on Discourse and Dialogue},
  publisher = {Association for Computational Linguistics},
  year = {2004},
  pages = {102--107},
  month = {April 30 - May 1},
  address = {Cambridge, Massachusetts, USA}
}

Christian Raymond, Fréderic Béchet, Renato De Mori, Géraldine Damnati & Yannick Estève
Automatic learning of interpretation strategies for spoken dialogue systems
Proceedings of the International Conference on Acoustic Speech and Signal Processing  
vol.1, pages 425-428, Montreal, Canada
Abstract: This paper proposes a new application of automatically trained decision trees to derive the interpretation of a spoken sentence. A new strategy for building structured cohorts of candidates is also described. By evaluating predicates related to the acoustic confidence of the words expressing a concept, the linguistic and semantic consistency of candidates in the cohort and the rank of a candidate within a cohort, the decision tree automatically learns a decision strategy for rescoring or rejecting an n-best list of candidates representing a user's utterance. A relative reduction of 18.6% in the Understanding Error Rate is obtained by our rescoring strategy with no utterance rejection, and a relative reduction of 43.1% of the same error rate is achieved with a rejection rate of only 8% of the utterances.
BibTeX:

@inproceedings{Raymond.Bechet.ea_2004b,
  author = {Christian Raymond and Fréderic Béchet and De Mori, Renato and Géraldine Damnati and Yannick Estève},
  title = {Automatic learning of interpretation strategies for spoken dialogue systems},
  booktitle = {{P}roceedings of the {I}nternational {C}onference on {A}coustic {S}peech and {S}ignal {P}rocessing},
  year = {2004},
  volume = {1},
  pages = {425-428},
  address = {Montreal, Canada},
  doi = {10.1109/ICASSP.2004.1326013}

}

2003
Yannick Estève, Christian Raymond, Frédéric Béchet & Renato De Mori
Conceptual Decoding for Spoken Dialog Systems
Proceedings of European Conference on Speech Communication and Technology  
pages 3033-3036, Geneva, Switzerland
Abstract: A search methodology is proposed for performing the conceptual decoding process. Such a process provides the best sequence of word hypotheses according to a set of conceptual interpretations. The resulting models are combined into a network of Stochastic Finite State Transducers. This approach is a framework that tries to bridge the gap between speech recognition and speech understanding processes. Indeed, conceptual interpretations are generated according to both a semantic representation of the task and a system belief which evolves according to the dialogue states. Preliminary experiments on the detection of semantic entities (mainly named entities) in a dialog application have shown that interesting results can be obtained even if the Word Error Rate is quite high.
BibTeX:

@inproceedings{Esteve.Raymond.ea_2003,
  author = {Yannick Estève and Christian Raymond and Frédéric Béchet and De Mori, Renato},
  title = {Conceptual Decoding for Spoken Dialog Systems},
  booktitle = {{P}roceedings of {E}uropean {C}onference on {S}peech {C}ommunication and {T}echnology},
  year = {2003},
  pages = {3033-3036},
  address = {Geneva, Switzerland}
}

Yannick Estève, Christian Raymond, Renato De Mori & David Janiszek
On the use of linguistic consistency in systems for human-computer dialogs
IEEE Transactions on Speech and Audio Processing
vol. 11 (6), pages 746-756, November
Abstract: This paper introduces new recognition strategies based on reasoning about results obtained with different Language Models (LMs). Strategies are built following the conjecture that the consensus among the results obtained with different models gives rise to different situations in which hypothesized sentences have different word error rates (WER) and may be further processed with other LMs. New LMs are built by data augmentation using ideas from latent semantic analysis and trigram analogy. Situations are defined by expressing the consensus among the recognition results produced with different LMs and by the amount of unobserved trigrams in the hypothesized sentence. The diagnostic power of the use of observed trigrams or their corresponding class trigrams is compared with that of situations based on values of sentence posterior probabilities. In order to avoid or correct errors due to syntactic inconsistency of the recognized sentence, automata obtained by explanation-based learning are introduced and used in certain conditions. Semantic Classification Trees are introduced to provide sentence patterns expressing constraints of long-distance syntactic coherence. Results on a dialogue corpus provided by France Telecom R&D have shown that, starting with a WER of 21.87% on a test set of 1422 sentences, it is possible to subdivide the sentences into three sets characterized by automatically recognized situations. The first one has a coverage of 68% with a WER of 7.44%. The second one has various types of sentences with a WER around 20%. The third one contains 13% of the sentences that should be rejected, with a WER around 49%. The second set characterizes sentences that should be processed with particular care by the dialogue interpreter, with the possibility of asking a confirmation from the user. (A toy sketch of this situation-based triage follows the BibTeX entry below.)
BibTeX:

@article{Esteve.Raymond.ea_2003a,
  author = {Yannick Estève and Christian Raymond and De Mori, Renato and David Janiszek},
  title = {On the use of linguistic consistency in systems for human-computer dialogs},
  journal = {{IEEE} {T}ransactions on {S}peech and {A}udio {P}rocessing},
  year = {2003},
  volume = {11},
  number = {6},
  pages = {746-756},
  month = {November},
  doi = {10.1109/TSA.2003.818318}

}
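
As a toy illustration of the situation-based triage described in the abstract above (not the paper's implementation), the sketch below sorts a recognition hypothesis into a rough situation from two cues: whether different language models agree on the output, and how many of its trigrams were never observed in training. The thresholds and situation labels are assumptions made purely for illustration.

# Illustrative sketch only: classify a hypothesis into a rough "situation"
# from LM consensus and the number of unobserved trigrams. Thresholds and
# labels are assumptions, not those learned in the paper.

def unseen_trigram_count(words, observed_trigrams):
    """Count trigrams of the hypothesis that are absent from the training set."""
    trigrams = zip(words, words[1:], words[2:])
    return sum(1 for t in trigrams if t not in observed_trigrams)

def situation(hyps_by_lm, observed_trigrams):
    """hyps_by_lm: {lm_name: list_of_words} -- one hypothesis per language model."""
    distinct = {tuple(words) for words in hyps_by_lm.values()}
    consensus = len(distinct) == 1                      # all LMs agree
    best = next(iter(hyps_by_lm.values()))
    unseen = unseen_trigram_count(best, observed_trigrams)

    if consensus and unseen == 0:
        return "reliable"          # pass directly to the interpreter
    if not consensus and unseen > 2:
        return "reject"            # likely too errorful, ask the user again
    return "uncertain"             # process with care / ask for confirmation

# Toy usage
observed = {("i", "want", "a"), ("want", "a", "ticket")}
hyps = {"trigram_lm": ["i", "want", "a", "ticket"],
        "augmented_lm": ["i", "want", "a", "ticket"]}
print(situation(hyps, observed))   # -> "reliable"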

Christian Raymond
Mesures de confiance pour la reconnaissance de la parole dans des applications de dialogue homme-machine
Majecstic  
October, Marseille, France
Abstract: In human-computer dialogue applications, the semantic interpretation of the sentence uttered by a user is performed on the transcription generated by the speech recognition engine. This recognition is not optimal, and the quality of the semantic interpretation of the sentence is of course highly dependent on the quality of the recognition. This paper introduces confidence measures on speech recognition hypotheses in order to predict or estimate the quality of the recognition and to inform the dialogue management module, which can then choose the type of strategy to apply: if the sentence has a good confidence score, it can be passed to the interpretation module; if it has a very low confidence score, the dialogue module can choose to reject it and ask the user to repeat; in the other cases, specific methods can possibly be applied in an attempt to correct the hypotheses.
BibTeX:

@inproceedings{Raymond_2003,
  author = {Christian Raymond},
  title = {Mesures de confiance pour la reconnaissance de la parole dans des applications de dialogue homme-machine},
  booktitle = {Majecstic},
  year = {2003},
  month = {October},
  address = {Marseille, France}
}

Christian Raymond, Yannick Estève, Frédéric Béchet, Renato De Mori & Géraldine Damnati
Belief confirmation in Spoken Dialogue Systems using confidence measures
Proceedings IEEE Workshop on Automatic Speech Recognition and Understanding  
St. Thomas, US-Virgin Islands
Abstract: The approach proposed is an alternative to the traditional architecture of Spoken Dialogue Systems, where the system belief is either not taken into account during the Automatic Speech Recognition process or included in the decoding process but never challenged. By representing all the conceptual structures handled by the Dialogue Manager as Finite State Machines and by building a conceptual model that contains all the possible interpretations of a given word graph, we propose a decoding architecture that first searches for the best conceptual interpretation before looking for the best string of words. Once both N-best sets (at the concept level and at the word level) are generated, a verification process is performed on each N-best set using acoustic and linguistic confidence measures. A first selection strategy that does not yet include the Dialogue context is proposed, and significant error reductions on the understanding measures are obtained.
BibTeX:

@inproceedings{Raymond.Esteve.ea_2003,
  author = {Christian Raymond and Yannick Estève and Frédéric Béchet and De Mori, Renato and Géraldine Damnati},
  title = {Belief confirmation in Spoken Dialogue Systems using confidence measures},
  booktitle = {{P}roceedings {IEEE} {W}orkshop on {A}utomatic {S}peech {R}ecognition and {U}nderstanding},
  year = {2003},
  address = {St. Thomas, US-Virgin Islands},
  doi = {10.1109/ASRU.2003.1318420}

}

2002
Renato De Mori, Yannick Estève & Christian Raymond
On the use of structures in language models for dialogue
Proceedings of the International Conference on Spoken Language Processing  
pages 929-932, Denver, Colorado, USA
Abstract: The paper describes the combined use of three new language modelling paradigms. They are: generation of plausible trigrams by analogy, explanation-based generation of error-correcting automata, and disambiguation using Semantic Classification Trees. Tangible word error rate reduction is observed by the combined use of these paradigms.
BibTeX:

@inproceedings{DeMori.Esteve.ea_2002,
  author = {De Mori, Renato and Yannick Estève and Christian Raymond},
  title = {On the use of structures in language models for dialogue},
  booktitle = {{P}roceedings of the {I}nternational {C}onference on {S}poken {L}anguage {P}rocessing},
  year = {2002},
  pages = {929-932},
  address = {Denver, Colorado, USA}
}

Yannick Estève, Christian Raymond & Renato De Mori
On the use of structure in language models for dialogue : specific solutions for specific problems
ISCA Tutorial and Research Workshop on Multi-Modal Dialogue in Mobile Environments  
June, Kloster Irsee, Germany
Abstract: Availability of large corpora for training language models to develop dialogue systems is rare. Fortunately, for specific dialogue applications, many sentences follow a limited number of typical patterns. In a language like French, frequent errors are due to homophones. Three paradigms are proposed in this paper to rescore a trellis of hypothesized words. They are based on sentence patterns detected in the most likely sentence hypothesized in a first recognition phase.
BibTeX:

@inproceedings{Esteve.Raymond.ea_2002,
  author = {Yannick Estève and Christian Raymond and De Mori, Renato},
  title = {On the use of structure in language models for dialogue : specific solutions for specific problems},
  booktitle = {{ISCA} {T}utorial and {R}esearch {W}orkshop on {M}ulti-{M}odal {D}ialogue in {M}obile {E}nvironments},
  year = {2002},
  month = {June},
  address = {Kloster Irsee, Germany}
}

Christian Raymond, Patrice Bellot & Marc El-Bèze
Enrichissement de requêtes pour la recherche documentaire selon une classification non-supervisée
13ème Congrès Francophone AFRIF-AFIA de Reconnaissance des Formes et d'Intelligence Artificielle (RFIA'2002)  
pages 625-632, January, Angers, France
Abstract: Natural language query formulation is a crucial task in the information retrieval (IR) process. Automatic expansion and refinement of queries can be realized in different ways: extracting some words from top retrieved documents (retrieval feedback) or from thesauri, computing new query term weights according to top retrieved documents... In this paper, the information retrieval system SIAC is employed to obtain an initial set of documents from a query. Then, a classification method employing unsupervised decision trees (UDTs) is used to classify the sentences of the retrieved documents according to some words extracted automatically from these documents (some sentences contain the chosen words, some do not). A boolean expression composed of these selected words is directly associated with each decision tree node. This paper shows that expanding queries with the words connected with the best nodes significantly improves retrieval precision. (A simplified sketch of the query-expansion step follows the BibTeX entry below.)
BibTeX:

@inproceedings{Raymond.Bellot.ea_2002,
  author = {Christian Raymond and Patrice Bellot and Marc El-Bèze},
  title = {Enrichissement de requêtes pour la recherche documentaire selon une classification non-supervisée},
  booktitle = {13ème Congrès Francophone AFRIF-AFIA de Reconnaissance des Formes et d'Intelligence Artificielle (RFIA'2002)},
  year = {2002},
  pages = {625--632},
  month = {January},
  address = {Angers, France}
}
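
For illustration only: the paper builds unsupervised decision trees over the sentences of the retrieved documents and expands the query with the words attached to the best nodes. The much simpler pseudo-relevance-feedback routine below stands in for that machinery to show the general expansion step; the stopword list, toy corpus and top-k heuristic are assumptions, not the SIAC system.

# Illustrative stand-in (not the paper's UDT method): expand a query with the
# most frequent non-query, non-stopword terms of the top retrieved documents.
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "from"}

def expand_query(query, top_documents, n_terms=3):
    """Return the query extended with frequent terms from the top documents."""
    counts = Counter()
    for doc in top_documents:
        for word in doc.lower().split():
            if word not in STOPWORDS and word not in query:
                counts[word] += 1
    expansion = [w for w, _ in counts.most_common(n_terms)]
    return query + expansion

# Toy usage
query = ["information", "retrieval"]
docs = ["Query expansion improves retrieval precision",
        "Relevance feedback selects expansion terms from top documents"]
print(expand_query(query, docs))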

2001
Christian Raymond
Réécriture de requêtes pour la recherche documentaire selon une methode de classification à base d'arbres de décision non-supervisé
School: University Of Avignon  
June, Marseille, Luminy
Abstract: A major difficulty in using an information retrieval system is the choice of the vocabulary to employ when expressing a query. Query enrichment can take several forms: adding words automatically extracted from the retrieved documents, re-estimating the weights assigned to each word of the initial query, etc. The information retrieval system SIAC is used to extract a first set of documents from a query. An unsupervised classification method, based on decision trees, is then exploited to classify the sentences of the retrieved documents, and the documents themselves, according to whether or not they contain certain words automatically extracted from the set of retrieved documents. A boolean expression involving the words selected during the classification can be associated with each node of the tree. Using the data of the second Amaryllis evaluation campaign, we show that rewriting the query according to the boolean expressions corresponding to the best leaves improves the precision of the retrieval. In light of the structure of the decision trees, rewriting the query leads to reconsidering the treatment of negation in retrieval systems as well as a reformulation of the weighting criteria usually employed.
BibTeX:

@mastersthesis{Raymond_2001,
  author = {Christian Raymond},
  title = {Réécriture de requêtes pour la recherche documentaire selon une methode de classification à base d'arbres de décision non-supervisé},
  school = {University Of Avignon},
  year = {2001},
  month = {June},
  address = {Marseille, Luminy}
}