
Jacques NICOLAS

Mail Jacques.Nicolas[at]iris

Address Symbiose - Room A 109
  INRIA/Irisa - Campus de Beaulieu
  35042 RENNES Cedex - France
Tel +33 2 99 84 73 12
Fax +33 2 99 84 71 71
Current Position Senior Research Scientist at INRIA


Research interests

  • Bioinformatics
    • Algorithmics on sequences
    • Syntactical Analysis of Biological Sequences
    • Pattern Discovery
  • Machine Learning
    • Grammatical Inference
    • Version spaces, Decision trees, Inductive Logic Programming
  • Logic Programming
    • Prolog
    • Answer Set Programming

My background is in Computer Science and Machine Learning.

I started my research work in 1984 in a team interested in Knowledge Representation (KR), mainly in various logical frameworks. My PhD thesis focused on the issue of generalization within the framework of Version spaces, with a representation language that was a decidable subset of first-order predicate logic (the so-called Bernays-Schönfinkel class). The aim was to produce deductively a formula that was a consequence of a given set of formulae (positive and negative instances of a concept to be learned). Generalization was achieved using a set of elementary operators and a dedicated theorem prover. I continued this work in the field of Artificial Intelligence for several years.

I discovered issues of knowledge representation and classification in Biology through a decisive encounter with J. Lebbe and R. Vignes in 1989, and Molecular Biology and Bioinformatics through a summer school in Paris in 1998, thanks to people like A. Danchin, A. Henault and J.-L. Risler. This was quite a revelation, and I have been trying ever since to share and pass on this enthusiasm. Bioinformatics is not only an opportunity to meet people in many scientific fields and to be introduced to the richness of the various mechanisms of life: it is also a source of challenging problems in computer science.

Helping with modelling is a key role of the bioinformatician. My basic line of research follows the idea that, unlike many chemical or physical processes, biological mechanisms are largely governed by a logic of discrete behaviours. This follows from the compact, hierarchical architecture of cells and the importance of relations between components that are characteristic of living organisms. In such a context, I am convinced that symbolic techniques have a major part to play in the study of life, whether in combinatorial data analysis, in machine learning or in automated reasoning. I am mainly interested in macromolecular sequences and in studying explicit models relating sequences to structures or functions. I try to develop the point of view of the theory of languages in the analysis of sequences, with the double aim of formalizing meaningful classes and giving biologists access to the power of expressive languages.


I am in charge of the research axis "Analysis of sequences with formal languages" in Symbiose. I am interested in syntactical modelling of both nucleic and protein sequences. This axis is made up of two parts.

The first one studies the formal and practical consequences of considering protein sequences in Grammatical Inference. The aim is to learn relevant characteristic models from sets of sequences that are known either to belong to a target family or, on the contrary, not to belong to it. I have supervised several theses on this topic, addressing difficult questions such as "how to infer non-deterministic automata, since they seem better suited to the expression of biological models than deterministic ones?" (D. Fredouille), "how to take into account a partially ordered structure on the subsets of the alphabet during inference, each subset reflecting some physico-chemical property of amino acids?" (A. Leroux) or "how to learn non-regular patterns such as the contextual structures found in disulfide bonds in proteins?" (I. Jacquemin).
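As a rough illustration of the learning setting described above, the sketch below builds a prefix tree acceptor from positive sequences and checks that it rejects the negative ones. This is only the textbook starting point of automaton inference (state-merging methods then generalize such a tree), not the specific algorithms developed in the theses mentioned. The function names and the toy sequences are assumptions made for the example.

```python
def build_pta(positives):
    """Build a prefix tree acceptor: a tree-shaped DFA accepting exactly the positives."""
    transitions = [{}]   # transitions[state][symbol] -> next state; state 0 is the root
    accepting = set()
    for word in positives:
        state = 0
        for symbol in word:
            if symbol not in transitions[state]:
                transitions.append({})
                transitions[state][symbol] = len(transitions) - 1
            state = transitions[state][symbol]
        accepting.add(state)
    return transitions, accepting


def accepts(automaton, word):
    """Return True if the automaton accepts the word."""
    transitions, accepting = automaton
    state = 0
    for symbol in word:
        state = transitions[state].get(symbol)
        if state is None:
            return False
    return state in accepting


def is_consistent(automaton, positives, negatives):
    """A candidate model must accept every positive and reject every negative."""
    return (all(accepts(automaton, w) for w in positives)
            and not any(accepts(automaton, w) for w in negatives))


# Toy data (hypothetical): short words standing in for family members and non-members.
positives = ["CAC", "CGC"]
negatives = ["CA", "AC"]
pta = build_pta(positives)
consistent = is_consistent(pta, positives, negatives)
```

An inference algorithm would then try to merge states of this tree, keeping a merge only if `is_consistent` still holds for the resulting automaton.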

The second part considers that the construction of the model is the responsibility of the biologist; the challenge is then to offer him/her a language of maximal expressivity while allowing whole genome analysis (billions of letters). Our approach is to compile data into efficient data structures such as suffix arrays and to develop parsers on top of an abstract machine running on this data structure. Concerning expressivity, we carry out research on a logical string variable language, which makes it possible to handle a string and its transformations in an abstract way. We have already validated such a framework on several biological issues: discovery of dog olfactory receptors, discovery of human beta-defensins and discovery of transposons in A. thaliana.
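To give an idea of why an index such as a suffix array helps, here is a minimal sketch of exact pattern lookup by binary search over sorted suffixes. It is not the abstract machine or the parser described above: the construction is deliberately naive (real genome-scale tools use linear-time or compressed constructions), and the function names and the toy sequence are assumptions made for the example.

```python
def build_suffix_array(text):
    """Naive construction: sort all suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of pattern in text, using binary search."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


# Toy nucleotide sequence (hypothetical data).
genome = "ACGTACGTTACG"
sa = build_suffix_array(genome)
hits = find_occurrences(genome, sa, "ACG")
```

Once the index is built, each lookup costs a number of comparisons logarithmic in the length of the text, which is what makes repeated queries over a whole genome affordable.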


Selected Publications

Researches in Computational biology in Europe

Quick overview of the research axes of Symbiose
Linguistic Analysis of biological sequences
The successor of Stan, Logol, is now available on the Genouest web site and is the best available tool to date with this level of expressivity for parsing whole genomes.
We are also interested in the lexical level and the detection of genomic repeats (modules) within structures such as transposable elements or CRISPRs.

An exploratory tool for repeats mining in genomes and its application to the detection of genomic transfers between viruses and bacteria.

Segmentation of a family of genomic sequences into meaningful domains applied to the analysis of mobile genetic elements


Habilitation Document (HDR, in French)
Papers submitted
  • Local and Maximal Repeats J. Nicolas; C. Rousseau; A. Siegel; P. Peterlongo; F. Coste; P. Durand; S. Tempel; A.-S. Valin; F. Mahé.

Pattern discovery in biological sequences A review of the state of the art
  • Disulfide bonds prediction using inductive logic programming I Jacquemin and J Nicolas In: Workshop on Constraint Based Methods for Bioinformatics, WCB, Sitges, Spain, pages 56-65 (2005).
  • Cooperative metaheuristics for exploring proteomic data. R Gras, D Hernandez, P Hernandez, N Zangger, Y Mescam, J Frey, O Martin, J Nicolas, and R Appel. Artificial Intelligence Review, 20(1):95-120, 2004.
  • Genome wide distribution and potential regulatory functions of AtATE, a novel miniature inverted-repeat transposable element that is present in the promoter region of one of the Arginine Decarboxylase genes in Arabidopsis thaliana, A. Elamrani, L. Marie, A. Aïnouche, J. Nicolas, I. Couée. Molecular Genetics and Genomics, 267, 2001, p. 459-471.
  • A symbolic-numeric approach to find patterns in genomes: Application to the translation initiation sites of E. coli. C. Delamarche, P. Guerdoux-Jamet, R. Gras and J. Nicolas, Biochimie, 81, Elsevier, 1999.

Machine learning applied to

Gene Discovery
More than 1000 olfactory receptor genes discovered in a non assembled version (36 M sequences) of dog genome.
    TrackProt: Looking for new Human beta-defensins in whole genomes, with a syntactical approach J. Nicolas, F. Bourgeon, Y. Bastide, G. Ranchy , C. Alland, F. Aubry, Y. Mescam, B. Jegou and C. Pineau.
More than 30 new Human beta-defensins (anti-microbial peptides) have been discovered and validated.

Theorem proving

Grammatical Inference
A study on grammatical inference in the framework of logic programming
  • How considering incompatible state mergings may reduce the DFA induction search tree, F. Coste, J. Nicolas, Fourth International Colloquium on Grammatical Inference (ICGI'98), Ames, Iowa, USA, 1998.

  • Sequence classification of water channels and related proteins in view of functional predictions. Basavanneppa Tallur, Jacques Nicolas, A. Froger, D. Thomas and C. Delamarche, Theoretical Chemistry Accounts, 1998.
  • A method for classifying unaligned biological sequences. B. Tallur and J. Nicolas, in IFCS-96: Data Science, Classification and Related Methods, Springer Verlag, Tokyo, 1997.
  • Twelve numerical, symbolic and hybrid supervised classification methods. O. Gascuel, B. Bouchon-Meunier, G. Caraux, P. Gallinari, A. Guénoche, Y. Guermeur, Y. Lechevallier, C. Marsala, L. Miclet, J. Nicolas, R. Nock, M. Ramdani, M. Sebag, Basavanneppa Tallur, G. Venturini and P. Vitte, Int. J. of Pattern Recognition and Artificial Intelligence, 12, no. 5, 1998, pages 517-572.

See all publications.




Former PhD students

  • Catherine Belleannée Vers un démonstrateur de théorèmes adaptatif, jan. 1991
  • Raoul Vorc'h Généralisation et abstraction en démonstration automatique feb. 1992
  • Francis Courtot CARLA : acquisition et induction sur le matériau compositionnel jan. 1992
  • Jean-Yves Giordano Inférence de grammaires algébriques jan. 1995

  • Robin Gras Un outil interactif de recherche de motifs dans les grandes séquences génétiques fondé sur l’arbre des suffixes. dec. 1997

  • François Coste Apprentissage d'automates classifieurs en inférence grammaticale, jan. 2000.
  • Daniel Fredouille Inférence d'automates finis non déterministes par gestion de l'ambiguïté, en vue d'applications en bioinformatique. oct. 2003
  • Aurélien Leroux Inférence grammaticale sur des alphabets ordonnés : application à la découverte de motifs dans des familles de protéines june 2005
  • Ingrid Jacquemin Inférence grammaticale et Programmation logique inductive pour la prédiction de ponts disulfures dans les protéines dec. 2005
  • André Floeter (cotutored by T. Schaub, Potsdam University) "Analysing microbiological expression data based on decision tree induction" jan. 2006
  • Yoann Mescam (cotutored by R. Gras, SIB Genève) "Localization of covarying patterns in biological sequences"
  • Sébastien Tempel Etude d’une famille d’éléments transposables à l’échelle d’un génome entier, Arabidopsis thaliana, par analyse de signatures. (cotutored by A. El Amrani, Ecobio, Université de Rennes) june 2007

Recent Projects

  • Modulome: "Deciphering and modelling the structural organization of genomes."

      • Project granted by ANR (French National Agency for Research) from 2006 to 2008. Coordinator: Jacques Nicolas.
      • Other involved teams: Laboratoire d'Etude des Parasites Génétiques (LEPG, Tours), Laboratoire de Microbiologie des Environnements Extrêmes (LM2E, Brest) and Laboratoire Dynamique du Génome et Evolution, Institut Jacques Monod (LDGE, Paris).
      • Aim: providing methods for the identification, visualization and formal modelling of the structure of genomes in terms of an assembly of nucleotide “modules” that are repeated along a genome or between several genomes. Combined, these methods will provide an appropriate methodology for a fruitful production of hypotheses concerning genome organization. The challenge is to allow the biologist to represent and reason about large genomic sequences in an abstract way, by segmenting them into modules and revealing the organization of such modules. The project includes three biological laboratories involved in the study of mobile genetic elements in archaea, bacteria and eukaryotes, which provide the biological context of this study.
      • Contribution of the team: all bioinformatics aspects. Extraction of modules through the specification of a new formalisation of repeats (flexible maximal repeats) and a segmentation algorithm; development of special-purpose architectures for the treatment of such indexes based on reconfigurable devices (FPGA); design of a browser for the visualization of modules, to help the interpretation of structures emerging from the previous step; analysis of the organization of modules with a grammatical approach.

  • ACGT: "Advancing Clinico-Genomic Clinical Trials on Cancer."

      • European Integrated Project, FP6, Information Society Technology from 2006 to 2010. Coordinators: Rémi Ronchaud, ERCIM and Manolis Tsiknakis, ICS-FORTH.
      • Other involved teams: 25 mostly European laboratories, including FORTH, University of Amsterdam, Institut Jules Bordet, Swiss Institute of Bioinformatics, Universidad Politécnica de Madrid, HealthGrid, University of Oxford and Hokkaido University.
      • Aim: to deliver to the cancer research community an integrated Clinico-Genomic ICT environment enabled by a powerful GRID infrastructure. It involves GRID aspects (delivery of a European Biomedical GRID infrastructure offering seamless mediation services for sharing data and data-processing methods), Knowledge representation aspects (ontology-based integration of clinical and genomic/proteomic data) and Machine learning/Data Mining aspects (to support and improve knowledge discovery processes from shared data).
      • Contribution of the team: parallelism (tumor growth simulation and GRID node), clustering and visualization to help mining of genomic data.

Basic Lab.: Biocellular Assistant on a Silicium Intelligent Chip

  • This is a very preliminary project aiming at integrating bioinformatics into labs-on-chips. The challenge is to control an experimental micro- or nano-scale device with automatic reasoning capacities. See the superb project of R. King et al. on the Robot Scientist Adam. We are interested in trying similar approaches: probably more to come here next year...


  • Master Bioinformatique Université de Rennes : Algorithms on sequences
  • Bioinformatics and algorithmics on words : University of Potsdam, Germany



  • Since 2002 Team Leader of Inria Project Symbiose (Bioinformatics, 27 people)
  • Since 2002 Head of Bioinformatics for Ouest Genopole (a consortium of more than 50 public laboratories -mostly biological labs- for large scale analysis in genomics and post-genomics).
  • 1998-2001 Team Leader of Inria Project Aïda (Artificial Intelligence, Machine Learning and Diagnosis, 34 people)
  • 1988-1997 Member of Inria Project Repco (Knowledge representation, Team Leader Philippe Besnard)


  • Member of the Scientific and Research Council of Ouest Genopole since jan. 2002
  • Member of the Scientific and Research Council of department MIA INRA since oct. 2002
  • Member of the Scientific and Research Council of "Animal Bioinformatics" INRA since jan. 2006
  • Member of the program committee of JOBIM and ICGI


  • 1987 : PhD thesis in Computer Science, University of Rennes




  • Biogenouest
  Bioinformatics Platform of Ouest Genopole
  Master de Modélisation des systèmes biologiques - Rennes
  Interstices : dossier sur la bioinformatique
  • JOBIM, the French conference on bioinformatics
  Do not miss the next edition in RENNES!
  • ISMB, the international conference on bioinformatics
  • ICGI, the international conference on grammatical inference


Symbiose Project Team - INRIA/Irisa © 2007 - 2008