François Coste, Research scientist (CR), Inria

Inria, Univ Rennes, CNRS, IRISA

Campus de Beaulieu
F-35042 Rennes Cedex
Tel: +33 (0) 2 99 84 74 91
Fax: +33 (0) 2 99 84 71 71


News and highlights


Learning grammars and application to linguistic modelling of biological sequences

Keywords: Grammatical Inference, Machine Learning, Protein Structures and Functions, DNA...


Protomata Learner infers automata to model (even heterogeneous) families of protein sequences. The current version, Protomata 2.0, can be used through a web interface on the Genouest Bioinformatics platform server. We are working on version 3.0...

PhD Student

Won the accessit thesis prize from AFIA (2nd ex aequo, read interview) in 2012. Congrats Matthias!


I am currently involved in the following projects:

  • IDEALG Seaweed for the future, ANR Investissements d'avenir, Biotechnology and Bioresource
  • Characterization of desaturases with Pleiade team, IPL Algae in silico

Previous projects:

  • Grammatical inference methods in classification of amyloidogenic proteins with Politechnika Wroclawska, Poland, funded by Polish National Science Center
  • "Omics"-Line of the Chilean CIRIC-Inria Center
  • PEPS project: Characterisation and identification of viral sequences in marine metagenomes
  • ANR Biotempo: Languages, time representations and hybrid models for the analysis of incomplete models in molecular biology
  • ANR LepidOLF: Microgenomics of the pheromone sensillum of a lepidopteran: an innovative approach to understanding olfactory mechanisms and their modulation
  • ANR Pelican: Competing for light in the ocean: An integrative genomic approach of the ecology, diversity and evolution of cyanobacterial pigment types in the marine environment
  • Collaboration MINCyT (ex SECyT) - INRIA with the "Grupo de Procesamiento de Lenguaje Natural" of Gabriel Infante-Lopez: Linguistic modelling of genomic sequences by grammar learning
  • ANR Proteus: Fold recognition and inverse folding: towards large-scale prediction of protein structures
  • ANR Modulome: Deciphering and modelling the structural organization of genomes

Grammatical Inference Benchmarks and Competitions

  • I am building a grammatical inference benchmark repository (GIB): don't hesitate to contribute your own data sets, especially real-world ones!
  • I maintain the Gowachin server, a continuation of the Abbadingo One DFA learning competition, which lets you generate parametrized problems. I also co-organized Omphalos, the competition on learning context-free languages, which is now over, but the data sets are still available... If you are interested in grammatical inference competitions, you should also have a look at Zulu, Stamina (2010), PAutomaC (2012), and SPiCe (2016).


Older lectures:

Selected publications

(more complete list here)

Primers and reviews

Looking at long distance correlations

Residues coevolution
Protein sequences and structures
Learning context-free grammars

Learning automata

  • CyanoLyase: a database of phycobilin lyase sequences, motifs and functions, Anthony Bretaudeau, François Coste, Florian Humily, Laurence Garczarek, Gildas Le Corguillé, Christophe Six, Morgane Ratin, Olivier Collin, Wendy M Schluchter, Frédéric Partensky. Nucleic Acids Research, Oxford University Press, 2012
  • Learning Automata on Protein Sequences, François Coste and Goulven Kerbellec, JOBIM 2006 (abstract, paper, slides, dataset).
  • A Similar Fragments Merging Approach to Learn Automata on Proteins, François Coste and Goulven Kerbellec, ECML 2005. (abstract, paper, extended version, data sets).
    Some slides presenting this work and more at a grammatical inference workshop: slides, 4 per page for printing
  • Introducing Domain and Typing Bias in Automata Inference, François Coste, Daniel Fredouille, Christopher Kermorvant and Colin de la Higuera. ICGI 2004. paper (.pdf), slides (.ppt, 2.2MB)
  • Mutually compatible and incompatible merges for the search of the smallest consistent DFA, John Abela, François Coste and Sandro Spina. ICGI 2004. paper (.pdf), slides (.ppt)
  • Unambiguous automata inference by means of state-merging methods. François Coste, Daniel Fredouille, ECML'03. paper (.ps.gz, .pdf) complementary experiments (.ps.gz, .pdf), benchmarks (.tar.gz), slides (.ppt).
    Parsing ambiguity!
  • What is the Search Space for the Inference of Non Deterministic, Unambiguous and Deterministic Automata? François Coste, Daniel Fredouille, Technical Report, RR-4907, 2003
  • Efficient ambiguity detection in C-NFA, a step toward inference of non deterministic automata, François Coste, Daniel Fredouille, ICGI 2000, Grammatical inference: algorithms and applications, Lisbon, pp. 25-38, September 2000. paper (.ps.gz, .pdf) benchmark (.tar.gz).
    Classification ambiguity!
  • State merging inference of finite state classifiers, François Coste, INRIA/IRISA, May 1999, report (.ps.gz, .pdf)
  • Regular Inference as a graph coloring problem, François Coste, Jacques Nicolas, ICML97, Grammatical Inference Workshop, Nashville TN, USA, 1997 (.ps, .pdf)

Ph.D. Thesis

Apprentissage d'automates classifieurs en inférence grammaticale, IRISA/Université de Rennes 1, 27 January 2000.
Advisor: Jacques Nicolas.


This page is updated on an irregular basis... browse HAL for new publications