François Coste, Research scientist (CR1), Inria

Dyliss team, Irisa / Inria Rennes - Bretagne Atlantique

Campus de Beaulieu, 35042 Rennes Cedex, France.
Tel: +33 (0) 2 99 84 74 91
Fax: +33 (0) 2 99 84 71 71



  • The chapter Learning the Language of Biological Sequences is now available, along with other nice chapters, in the book Topics in Grammatical Inference edited by Jeffrey Heinz and José Sempere. Following a talk given for the 10th anniversary of ICGI (ICGI'10), it reviews advances in modeling biological sequences, from Pattern/Motif Discovery to Grammatical Inference, and tries to help intuition with practical examples (somewhat in the style of an O'Reilly book). Don't hesitate to send me feedback or corrections!


Learning grammars and application to linguistic modelling of biological sequences

Keywords: Grammatical Inference, Machine Learning, Protein Structures and Functions, DNA...


Protomata Learner infers automata to model (even heterogeneous) families of protein sequences. The new version, Protomata 2.0, is available: you can use it through a web interface on the Genouest Bioinformatics platform server.

Grammatical Inference Benchmarks and Competitions

  • I am building a grammatical inference benchmark repository (GIB): don't hesitate to contribute your own data sets, especially real-world ones!
  • I maintain the Gowachin server, a continuation of the Abbadingo One DFA learning competition, which lets you generate parametrized problems. I also co-organized Omphalos, the competition on learning context-free languages, which is now over, but the data sets are still available. If you are interested in grammatical inference competitions, you should also have a look at Zulu, Stamina (2010), PAutomaC (2012) and SPiCe (2016).

PhD Students

Won the accessit thesis prize from AFIA (2nd, ex aequo; read interview) in 2012. Congrats, Matthias!


I am currently involved in the following projects:

  • IDEALG Seaweed for the future, ANR Investissements d'avenir, Biotechnology and Bioressource
  • "Omics"-Line of the Chilean CIRIC-Inria Center

Previous projects:

  • PEPS project: Characterisation and identification of viral sequences in marine metagenomes
  • ANR Biotempo: Languages, time representations and hybrid models for the analysis of incomplete models in molecular biology
  • ANR LepidOLF: Microgenomics of the pheromone sensillum of a lepidopteran: an innovative approach to understanding olfactory mechanisms and their modulation
  • ANR Pelican: Competing for light in the ocean: An integrative genomic approach of the ecology, diversity and evolution of cyanobacterial pigment types in the marine environment
  • Collaboration MINCyT (ex SECyT) - INRIA with the "Grupo de Procesamiento de Lenguaje Natural" of Gabriel Infante-Lopez: Linguistic modelling of genomic sequences by grammar learning
  • ANR Proteus: Fold recognition and inverse folding: towards large-scale prediction of protein structures
  • ANR Modulome: Deciphering and modelling the structural organization of genomes


Selected publications

(more complete list here)

Primers and reviews

Looking at long distance correlations

Protein structures

Learning context-free grammars

Learning automata

  • CyanoLyase: a database of phycobilin lyase sequences, motifs and functions, Anthony Bretaudeau, François Coste, Florian Humily, Laurence Garczarek, Gildas Le Corguillé, Christophe Six, Morgane Ratin, Olivier Collin, Wendy M Schluchter, Frédéric Partensky. Nucleic Acids Research, Oxford University Press, 2012
  • Learning Automata on Protein Sequences, François Coste and Goulven Kerbellec, JOBIM 2006 (abstract, paper, slides, dataset).
  • A Similar Fragments Merging Approach to Learn Automata on Proteins, François Coste and Goulven Kerbellec, ECML 2005. (abstract, paper, extended version, data sets).
    Some slides presenting this work and more at a grammatical inference workshop: slides, 4 per page for printing
  • Introducing Domain and Typing Bias in Automata Inference, François Coste, Daniel Fredouille, Christopher Kermorvant and Colin de la Higuera. ICGI 2004. paper (.pdf), slides (.ppt, 2.2MB)
  • Mutually compatible and incompatible merges for the search of the smallest consistent DFA, John Abela, François Coste and Sandro Spina. ICGI 2004. paper (.pdf), slides (.ppt)
  • Unambiguous automata inference by means of state-merging methods. François Coste, Daniel Fredouille, ECML'03. paper (.ps.gz, .pdf) complementary experiments (.ps.gz, .pdf), benchmarks (.tar.gz), slides (.ppt).
    Parsing ambiguity!
  • What is the Search Space for the Inference of Non Deterministic, Unambiguous and Deterministic Automata? François Coste, Daniel Fredouille, Technical Report RR-4907, 2003
  • Efficient ambiguity detection in C-NFA, a step toward inference of non deterministic automata, François Coste, Daniel Fredouille, ICGI 2000, Grammatical inference: algorithms and applications, Lisbon, pp. 25-38, September 2000. paper (.ps.gz, .pdf) benchmark (.tar.gz).
    Classification ambiguity!
  • State merging inference of finite state classifiers, François Coste, INRIA/IRISA, May 1999, report (.ps.gz, .pdf)
  • Regular Inference as a graph coloring problem, François Coste, Jacques Nicolas, ICML97, Grammatical Inference Workshop, Nashville TN, USA, 1997 (.ps, .pdf)

Ph.D. Thesis

Apprentissage d'automates classifieurs en inférence grammaticale (Learning classifier automata in grammatical inference), IRISA/Université de Rennes 1, January 27, 2000.
Advisor: Jacques Nicolas.


This page is updated on an irregular basis... browse HAL for new publications