  Projet Symbiose  

Jacques Nicolas

Research scientist in bioinformatics

Jacques.Nicolas At irisa.fr
Team Leader of Symbiose
Research fellow at INRIA

IRISA, room A109
Projet Symbiose
Campus de Beaulieu
F-35042 Rennes Cedex France
+33 (0)2 99 84 73 12

Research interests

  • Bioinformatics
    • Algorithmics on sequences
    • Syntactical Analysis of Biological Sequences
    • Pattern Discovery
  • Machine Learning
    • Grammatical Inference
    • Version spaces, Decision trees, Inductive Logic Programming
  • Logic Programming


My background is in Computer Science and Machine Learning.

I started my research work in 1984 in a team interested in Knowledge Representation (KR), mainly in various logical frameworks. My PhD thesis focused on the issue of generalization within the framework of Version spaces, with a representation language that was a decidable subset of first-order predicate logic (the so-called Bernays-Schönfinkel class). The aim was to produce deductively a formula that was a consequence of a given set of formulae (positive and negative instances of a concept to be learned). Generalization was achieved using a set of elementary operators and a dedicated theorem prover. I continued this work in the field of Artificial Intelligence for several years.

I discovered issues of knowledge representation and classification in Biology through a decisive encounter with J. Lebbe and R. Vignes in 1989, and Molecular Biology and Bioinformatics through a summer school in Paris in 1998, thanks to people like A. Danchin, A. Henault and J.-L. Risler. This was quite a revelation, and I have been trying ever since to share and pass on this enthusiasm. Bioinformatics is not only an opportunity to meet people in many scientific fields and to be introduced to the richness of the various mechanisms of life: it is also a source of challenging problems in computer science.
Helping with modelling is a key role of the bioinformatician. My basic line of research follows the idea that, unlike many chemical or physical processes, biological mechanisms are largely governed by a logic of discrete behaviours. This follows from the compact, hierarchical architecture of cells and the importance of relations between components that are characteristic of living organisms. In such a context, I am convinced that symbolic techniques have a major part to play in the study of life, whether in combinatorial data analysis, in machine learning or in automated reasoning.
I am mainly interested in macromolecular sequences and in studying explicit models relating sequences to structures or functions. I try to develop the point of view of the theory of languages in the analysis of sequences, with the twofold aim of formalizing meaningful classes and giving biologists access to the power of expressive languages.

I am in charge of the research axis "Analysis of sequences with formal languages" in Symbiose. I am interested in syntactical modelling of either nucleic or protein sequences. This axis is made up of two parts.
The first one studies the formal and practical consequences of considering protein sequences in Grammatical Inference. The aim is to learn relevant characteristic models from sets of sequences that are known either to belong to a target family or, on the contrary, not to belong to it. I have supervised several theses on this topic, addressing difficult questions like "how to infer non-deterministic automata, since they seem better suited to the expression of biological models than deterministic ones?" (D. Fredouille), "how to take into account a partially ordered structure on the subsets of the alphabet during inference, each subset reflecting some physico-chemical property of amino acids?" (A. Leroux) or "how to learn non-regular patterns such as the contextual structures met in disulfide bonds in proteins?" (I. Jacquemin).
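As an illustration only (this sketch is not taken from the theses cited above), the usual starting point of state-merging approaches in Grammatical Inference is a prefix tree acceptor: a deterministic automaton that accepts exactly the positive sequences, against which the negative sequences can then be checked. The class name `PrefixTreeAcceptor`, the helper `is_consistent` and the toy sequences are assumptions made for this example.

```python
from typing import Dict, Iterable, Set, Tuple


class PrefixTreeAcceptor:
    """Deterministic automaton accepting exactly the given positive sequences."""

    def __init__(self, positives: Iterable[str]) -> None:
        self.transitions: Dict[Tuple[int, str], int] = {}
        self.accepting: Set[int] = set()
        self.n_states = 1  # state 0 is the root (empty prefix)
        for seq in positives:
            state = 0
            for symbol in seq:
                key = (state, symbol)
                if key not in self.transitions:
                    # Create a new state for a prefix not seen so far.
                    self.transitions[key] = self.n_states
                    self.n_states += 1
                state = self.transitions[key]
            self.accepting.add(state)

    def accepts(self, seq: str) -> bool:
        state = 0
        for symbol in seq:
            nxt = self.transitions.get((state, symbol))
            if nxt is None:
                return False
            state = nxt
        return state in self.accepting


def is_consistent(pta: PrefixTreeAcceptor, negatives: Iterable[str]) -> bool:
    """True if the automaton rejects every negative sequence."""
    return not any(pta.accepts(seq) for seq in negatives)


if __name__ == "__main__":
    pta = PrefixTreeAcceptor(["CAC", "CAGC", "CC"])  # toy positive sample
    print(pta.n_states, is_consistent(pta, ["CA", "AC"]))
```

An inference algorithm would then generalize this automaton by merging states, keeping only the merges for which `is_consistent` still holds on the negative sample.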
The second part considers that the construction of the model is the responsibility of the biologist, and the challenge is then to offer him/her a language of maximal expressivity while allowing whole genome analysis (billions of letters). Our approach is to compile data into efficient data structures like generalized suffix trees and to develop parsers on top of an abstract machine running on this data structure. Our mid-term goal is to develop such a machine at the hardware level, building on the results of the Remix project. Concerning expressivity, we are conducting research on a logical string variable language, which makes it possible to handle a string and its transformations in an abstract way.
We have already validated such a framework on several biological issues: discovery of dog olfactory receptors, discovery of human beta-defensins and discovery of transposons in A. thaliana.
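To give a rough idea of what indexing several sequences at once means, here is a minimal sketch that uses a naive, uncompressed suffix trie as a stand-in for a generalized suffix tree. It is not the data structure or the abstract machine used in Symbiose: a real suffix tree compresses paths and is built in linear time, whereas this toy version takes quadratic space and is only usable on short strings. The names `Node`, `build_index` and `find` are assumptions made for this example.

```python
from typing import Dict, List, Tuple


class Node:
    """Trie node storing every (sequence id, start) whose suffix passes through it."""

    __slots__ = ("children", "occurrences")

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.occurrences: List[Tuple[int, int]] = []


def build_index(sequences: List[str]) -> Node:
    """Insert every suffix of every sequence into a single shared trie."""
    root = Node()
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for symbol in seq[start:]:
                node = node.children.setdefault(symbol, Node())
                node.occurrences.append((seq_id, start))
    return root


def find(root: Node, pattern: str) -> List[Tuple[int, int]]:
    """Return the sorted (sequence id, start) pairs of all occurrences of a non-empty pattern."""
    node = root
    for symbol in pattern:
        child = node.children.get(symbol)
        if child is None:
            return []
        node = child
    return sorted(node.occurrences)


if __name__ == "__main__":
    index = build_index(["GATTACA", "TTAC"])
    print(find(index, "TA"))  # [(0, 3), (1, 1)]
```

Once the index is built, the cost of a query depends on the length of the pattern and the number of occurrences rather than on the total size of the indexed sequences, which is the property that makes this family of structures attractive for whole genome analysis.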

Selected Publications 


Quick overview of the research axes of Symbiose
Linguistic Analysis of biological sequences
The best available tool to date with this level of expressivity for parsing whole genomes

An exploratory tool for repeats mining in genomes and its application to the detection of genomic transfers between viruses and bacteria

Segmentation of genomic sequences into meaningful domains and application on the analysis of mobile genetic elements


Pattern discovery in biological sequences

A review of the state of the art
  •  Disulfide bonds prediction using inductive logic programming I Jacquemin and J Nicolas In: Workshop on Constraint Based Methods for Bioinformatics, WCB, Sitges, Spain, pages 56-65 (2005).
  • Cooperative metaheuristics for exploring proteomic data. R Gras, D Hernandez, P Hernandez, N Zangger, Y Mescam, J Frey, O Martin, J Nicolas, and R Appel. Artificial Intelligence Review, 20(1):95-120, 2004.
  • Genome wide distribution and potential regulatory functions of AtATE, a novel miniature inverted-repeat transposable element that is present in the promoter region of one of the Arginine Decarboxylase genes in Arabidopsis thaliana. A. Elamrani, L. Marie, A. Aïnouche, J. Nicolas, I. Couée. Molecular Genetics and Genomics, 267, 2001, p. 459-471.
  • A symbolic-numeric approach to find patterns in genomes: Application to the translation initiation sites of E. coli. C. Delamarche, P. Guerdoux-Jamet, R. Gras and J. Nicolas, Biochimie, 81, Elsevier, 1999.

Machine learning  applied to

Gene Discovery
More than 1000 olfactory receptor genes discovered in an unassembled version (36 M sequences) of the dog genome.

  • TrackProt: Looking for new Human beta-defensins in whole genomes, with a syntactical approach. J. Nicolas, F. Bourgeon, Y. Bastide, G. Ranchy, C. Alland, F. Aubry, Y. Mescam, B. Jegou and C. Pineau. Submitted to Nature Biotechnology, 2006.
More than 30 new Human beta-defensins (anti-microbial peptides) have been discovered and validated.
Metabolomics
Theorem proving


Grammatical Inference
A study on grammatical inference in the framework of logic programming
  • How considering incompatible state mergings may reduce the DFA induction search tree, F. Coste, J. Nicolas, Fourth International Colloquium on Grammatical Inference (ICGI'98), Ames, Iowa, USA, 1998.

Clustering
  • Sequence classification of water channels and related proteins in view of functional predictions. Basavanneppa Tallur, Jacques Nicolas, A. Froger, D. Thomas and C. Delamarche, Theoretical Chemistry Accounts, 1998.
  • A method for classifying unaligned biological sequences. B. Tallur and J. Nicolas, in IFCS-96: Data Science, Classification and Related Methods, Springer Verlag, Tokyo, 1997.
  • Twelve numerical, symbolic and hybrid supervised classification methods. O. Gascuel, B. Bouchon-Meunier, G. Caraux, P. Gallinari, A. Guénoche, Y. Guermeur, Y. Lechevallier, C. Marsala, L. Miclet, J. Nicolas, R. Nock, M. Ramdani, M. Sebag, Basavanneppa Tallur, G. Venturini and P. Vitte, Int. J. of Pattern Recognition and Artificial Intelligence, 12, n° 5, 1998, pages 517-572.

See all publications.

PhD students


  • Thibaut Hénin "Conception d’un système hypothético-déductif de planification d’expériences
    pour un laboratoire sur puce" (cotutored by Torsten Schaub, Potsdam University, Germany)

Former PhD students

  • Catherine Belleannée   Vers un démonstrateur de théorèmes adaptatif, jan. 1991
  • Raoul Vorc'h Généralisation et abstraction en démonstration automatique feb. 1992
  • Francis Courtot  CARLA : acquisition et induction sur le matériau compositionnel jan. 1992
  • Jean-Yves Giordano  Inférence de grammaires algébriques jan. 1995

  • Robin Gras  Un outil interactif de recherche de motifs dans les grandes séquences génétiques fondé sur l’arbre des suffixes. dec. 1997

  • François Coste     Apprentissage d'automates classifieurs en inférence grammaticale,  jan. 2000.
  • Daniel Fredouille Inférence d'automates finis non déterministes par gestion de l'ambiguïté, en vue d'applications en bioinformatique. oct. 2003
  • Aurélien Leroux  "Inférence grammaticale sur des alphabets ordonnés : application à la découverte de motifs dans des familles de protéines"  june 2005
  • Ingrid Jacquemin "Inférence grammaticale et Programmation logique inductive pour la prédiction de ponts disulfures dans les protéines"  dec. 2005
  • André Floeter  (cotutored by T. Schaub, Potsdam University) "Analysing microbiological expression data based on decision tree induction"  jan. 2006
  • Yoann Mescam (cotutored by R. Gras, SIB Genève) "Localization of covarying patterns in biological sequences"
  • Sébastien Tempel “Etude d’une famille d’éléments transposables à l’échelle d’un génome entier, Arabidopsis thaliana, par analyse de signatures.” (cotutored by A. El Amrani, Ecobio, Université de Rennes)  June 2007

Recent Projects

    Modulome: "Deciphering and modelling the structural organization of genomes." 

      • Project granted by ANR (French National Agency for Research) from 2006 to 2008. Coordinator: Jacques Nicolas.
      • Other involved teams: Laboratoire d'Etude des Parasites Génétiques (LEPG, Tours), Laboratoire de Microbiologie des Environnements Extrêmes (LM2E, Brest) and Laboratoire Dynamique du Génome et Evolution, Institut Jacques Monod (LDGE, Paris).
      • Aim: providing methods for the identification, visualization and formal modelling of the structure of genomes in terms of an assembly of nucleotide “modules” that are repeated along a genome or between several genomes. Combined together, these methods will provide an appropriate methodology for a fruitful production of hypotheses concerning genome organizations. The challenge is to allow the biologist to represent and reason about large genomic sequences in an abstract way, by segmenting them into modules and revealing the organization of such modules. The project includes three biological laboratories involved in the study of mobile genetic elements in archaea, bacteria and eukaryotes, which provide the biological context of this study.
      • Contribution of the team: All bioinformatics aspects. Extraction of modules through the specification of a new formalisation of repeats (flexible maximal repeats) and a segmentation algorithm; Development of special purpose architectures for the treatment of such indexes based on reconfigurable devices (FPGA); Design of a browser for the visualization of modules to help the interpretation of structures emerging from the previous step; Analysis of the organization of modules with a grammatical approach.

    ACGT: "Advancing Clinico-Genomic Clinical Trials on Cancer."

      • European Integrated Project, FP6, Information Society Technology from 2006 to 2010. Coordinators: Rémi Ronchaud, ERCIM and Manolis Tsiknakis, ICS-FORTH.
      • Other involved teams: 25 mostly European laboratories, including FORTH, University of Amsterdam, Institut Jules Bordet, Swiss Institute of Bioinformatics, Universidad Politécnica de Madrid, HealthGrid, University of Oxford and Hokkaido University.
      • Aim: to deliver to the cancer research community an integrated Clinico-Genomic ICT environment enabled by a powerful GRID infrastructure. It involves GRID aspects (delivery of a European Biomedical GRID infrastructure offering seamless mediation services for sharing data and data-processing methods), Knowledge representation aspects (ontology based integration of clinical and genomic/proteomic data) and Machine learning/Data Mining aspects (to support and improve knowledge discovery processes from shared data).
      • Contribution of the team: parallelism (tumor growth simulation and GRID node), clustering and visualization to help mining of genomic data.

    Ichnovirus: "Origins of the capsid proteins and the dsDNA genome of ensymbiogenic insect viruses: the Ichnoviruses."

      • PICS CNRS project (International cooperation project with the United States) from 2006 to 2008. Coordinators: Yves Bigot and Brian Federici.
      • Other involved teams: Laboratoire d’Etude des Parasites Génétiques (LEPG, Tours), Department of Entomology & Graduate Programs in Genetics & Microbiology, University of California, Riverside.
      • Aim: analysis of the impact of viruses with a large double-stranded DNA genome on the evolution of eukaryotic genomes. The main originality is to analyse a continuum in eukaryotes that consists of some viruses, the Ascovirus and the Ichnovirus, and some transposons related to the Helitron and the Tlr elements. In particular, two target goals will be to define the origin of the capsid proteins of the Ichnovirus virions and to analyze the involvement of the Helitron elements in the evolution of the Ichnovirus genomes.
      • Contribution of the team: Production of characteristic signatures of virus capsid genes, Search for mobile elements in genomes (helitrons), Construction of dedicated similarity matrices.

    IBN: "Integrated Biological Networks."

      • ARC INRIA (Concerted Research Action) from 2005 to 2006. Coordinator: Marie-France Sagot.
      • Other involved teams: HELIX and MISTIS, Inria Rhône-Alpes, Grenoble; Systems Biology Unit, Pasteur Institute, Paris; BIA and SSB, INRA, Toulouse.
      • Aim: Modelling biochemical and evolutionary networks, and analysing the relation between the two.
      • Contribution of the team: Pattern discovery in promoter regions.

Basic Lab.: Biocellular Assistant on a Silicium Intelligent Chip

  • This is a very preliminary project aiming at integrating bioinformatics into labs-on-chips. The challenge is to control an experimental micro- or nano-scaled device with automated reasoning capacities. See the superb project of R. King et al. on the Robot Scientist Adam. We are interested in trying similar approaches: probably more to come here next year...

Teaching

  • Master Bioinformatique, Université de Rennes: Algorithms on sequences
  • University of Potsdam, Germany: Bioinformatics and algorithmics on words

Career

Position

  • Since 2002 Team Leader of Inria Project Symbiose (Bioinformatics, 27 people)
  • Since 2002 Head of Bioinformatics for Ouest Genopole (a consortium of more than 50 public laboratories, mostly biological labs, for large scale analysis in genomics and post-genomics).
  • 1998-2001 Team Leader of Inria Project Aïda (Artificial Intelligence, Machine Learning and Diagnosis, 34 people)
  • 1988-1997 Member of Inria Project Repco (Knowledge representation, Team Leader Philippe Besnard)
  • file for my application to an Inria director position

Panels

Member of the Scientific and Research Council of Ouest Genopole since Jan. 2002
Member of the Scientific and Research Council of department MIA INRA since Oct. 2002
Member of the Scientific and Research Council of "Animal Bioinformatics" INRA since Jan. 2006

Member of the program committee of JOBIM and  ICGI

Education

  • 1987 : PhD thesis in Computer Science, University of Rennes

Links


  • OUEST-Genopole: Bioinformatics Platform of Ouest Genopole
  • Master de Bioinformatique - Rennes
  • Interstices: special report on bioinformatics
  • JOBIM, the French conference on bioinformatics
  • ISMB, the international conference on bioinformatics
  • ICGI, the international conference on grammatical inference

Created by eretout
Last modified 21.03.2008 02:12 PM