Jacques Nicolas
Jacques.Nicolas At irisa.fr
Team Leader of Symbiose, Research fellow at INRIA
IRISA, room A109, Projet Symbiose, Campus de Beaulieu, F-35042 Rennes Cedex, France
+33 (0)2 99 84 73 12
Research interests
- Bioinformatics
- Algorithmics on sequences
- Syntactical Analysis of Biological Sequences
- Pattern Discovery
- Machine Learning
- Grammatical Inference
- Version spaces, Decision trees, Inductive Logic Programming
- Logic Programming
My background is in Computer Science and Machine Learning.
I started my research in 1984 in a team working on Knowledge Representation (KR), mainly on various logical frameworks. My PhD thesis focused on the issue of generalization within the framework of Version spaces, with a representation language that was a decidable subset of first-order predicate logic (the so-called Bernays-Schönfinkel class). The aim was to deduce a formula that was a consequence of a given set of formulae (positive and negative instances of a concept to be learned). Generalization was achieved using a set of elementary operators and a dedicated theorem prover. I continued this work in the field of Artificial Intelligence for several years.
I discovered issues of knowledge representation and classification in biology through a decisive encounter with J. Lebbe and R. Vignes in 1989, and molecular biology and bioinformatics through a summer school in Paris in 1998, thanks to people like A. Danchin, A. Henault and J.-L. Risler. This was quite a revelation, and since then I have been trying to share and pass on this enthusiasm. Bioinformatics is not only an opportunity to meet people from many scientific fields and to be introduced to the richness of the various mechanisms of life: it is also a source of challenging problems in computer science.
Helping with modelling is a key role of the bioinformatician. My basic line of research follows the idea that, unlike many chemical or physical processes, biological mechanisms are largely governed by a logic of discrete behaviours. This follows from the compact, hierarchical architecture of cells and from the importance of relations between components, both of which are characteristic of living organisms. In such a context, I am convinced that symbolic techniques have a major part to play in the study of life, whether in combinatorial data analysis, in machine learning or in automated reasoning.
I am mainly interested in macromolecular sequences and in studying explicit models relating sequences to structures or functions. I try to develop the point of view of formal language theory in the analysis of sequences, with the double aim of formalizing meaningful classes and of giving biologists access to the power of expressive languages.
I am in charge of the research axis "Analysis of sequences with formal languages" in Symbiose. I am interested in syntactical modelling of either nucleic or protein sequences. This axis is made up of two parts.
The first part studies the formal and practical consequences of considering protein sequences in Grammatical Inference. The aim is to learn relevant characteristic models from sets of sequences that are known either to belong to a target family or, on the contrary, not to belong to it. I have supervised several theses on this topic, addressing difficult questions such as "how to infer non-deterministic automata, since they seem better suited than deterministic ones to expressing biological models?" (D. Fredouille), "how to take into account a partial order on the subsets of the alphabet during inference, each subset reflecting some physico-chemical property of amino acids?" (A. Leroux) or "how to learn non-regular patterns such as the contextual structures found in disulfide bonds in proteins?" (I. Jacquemin).
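To make the learning setting concrete, here is a minimal sketch of the classical state-merging idea for inferring a deterministic automaton from positive and negative sequences: build a prefix tree acceptor from the positives, merge two states while folding their successors to keep the automaton deterministic, and keep the merge only if no negative sequence becomes accepted. This is a generic textbook illustration in the spirit of RPNI-style algorithms, not the methods developed in the theses above; the function names (`build_pta`, `merge`, `accepts`) and the toy sequences are hypothetical.

```python
def build_pta(positives):
    """Prefix tree acceptor: one state per distinct prefix of the positive samples."""
    trans = {}           # (state, symbol) -> next state
    accepting = set()
    n_states = 1         # state 0 is the root (empty prefix)
    for word in positives:
        state = 0
        for symbol in word:
            if (state, symbol) not in trans:
                trans[(state, symbol)] = n_states
                n_states += 1
            state = trans[(state, symbol)]
        accepting.add(state)
    return trans, accepting, n_states


def merge(trans, accepting, n_states, p, q):
    """Merge states p and q, then fold conflicting successors until deterministic.
    Returns the quotient automaton (transitions, accepting states, start state)."""
    parent = list(range(n_states))   # union-find over states

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    parent[find(q)] = find(p)
    changed = True
    while changed:
        changed = False
        seen = {}
        for (s, a), t in trans.items():
            key = (find(s), a)
            if key in seen and find(seen[key]) != find(t):
                # same block, same symbol, different target blocks: merge the targets
                parent[find(t)] = find(seen[key])
                changed = True
            else:
                seen[key] = t
    q_trans = {(find(s), a): find(t) for (s, a), t in trans.items()}
    q_accepting = {find(s) for s in accepting}
    return q_trans, q_accepting, find(0)


def accepts(trans, accepting, start, word):
    state = start
    for symbol in word:
        if (state, symbol) not in trans:
            return False
        state = trans[(state, symbol)]
    return state in accepting


# Toy data (hypothetical): two positive and three negative sequences.
positives = ["ab", "abab"]
negatives = ["a", "aba", "b"]
trans, accepting, n = build_pta(positives)
# Try merging the root with state 2, the state reached after reading "ab".
q_trans, q_acc, start = merge(trans, accepting, n, 0, 2)
consistent = not any(accepts(q_trans, q_acc, start, w) for w in negatives)
print(consistent)                                  # prints True: the merge is kept
print(accepts(q_trans, q_acc, start, "ababab"))    # prints True: generalizes beyond the samples
```

Here the single accepted merge turns the tree into an automaton for (ab)*; a full inference procedure would try candidate merges in some order and keep only the consistent ones.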
The second part considers that building the model is the biologist's task, and the challenge is then to offer him/her a language of maximal expressivity while still allowing whole-genome analysis (billions of letters). Our approach is to compile data into efficient data structures such as generalized suffix trees and to develop parsers on top of an abstract machine running on this data structure. Our mid-term goal is to implement such a machine in hardware, taking advantage of the results of the Remix project. Concerning expressivity, we are developing a logical string-variable language that allows a string and its transformations to be handled abstractly.
We have already validated this framework on several biological problems: discovery of dog olfactory receptors, discovery of human beta-defensins and discovery of transposons in A. thaliana.
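The indexing step described above, compiling a set of sequences into one structure so that a pattern can be located in all of them at once, can be illustrated at toy scale. The sketch below substitutes a generalized suffix array (all suffixes of all sequences, sorted) for the generalized suffix tree because it fits in a few lines; it only answers exact-match queries and is not the STAN implementation or the abstract machine mentioned above. The function names and the DNA fragments are hypothetical, and construction is naive (it sorts explicit suffix copies), which is fine for toy inputs but not for genomes.

```python
def build_gsa(sequences):
    """Generalized suffix array: all suffixes of all sequences, sorted
    lexicographically and stored as (sequence index, start position)."""
    entries = [(i, j) for i, s in enumerate(sequences) for j in range(len(s))]
    entries.sort(key=lambda e: sequences[e[0]][e[1]:])
    return entries


def find_occurrences(sequences, gsa, pattern):
    """Return every (sequence index, position) where pattern occurs.
    Suffixes starting with pattern form one contiguous block of the
    sorted array, located here with two binary searches."""
    m = len(pattern)

    def prefix(k):
        i, j = gsa[k]
        return sequences[i][j:j + m]

    lo, hi = 0, len(gsa)
    while lo < hi:                      # first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(gsa)
    while lo < hi:                      # first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(gsa[first:lo])


# Toy DNA fragments (hypothetical data).
seqs = ["ACGTACG", "TTACGA"]
gsa = build_gsa(seqs)
print(find_occurrences(seqs, gsa, "ACG"))   # [(0, 0), (0, 4), (1, 2)]
print(find_occurrences(seqs, gsa, "GA"))    # [(1, 4)]
```

A suffix tree answers the same query by walking down from the root, and additionally exposes repeated substrings as internal nodes, which is what makes it attractive for repeat and module analysis.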
Selected Publications
- Applying Complex Models on Genomic Data. P. Durand, D. Lavenier, M. Leborgne, A. Siegel, P. Veber and J. Nicolas, ERCIM News (60), 2005.
- Suffix-Tree ANalyser (STAN): looking for nucleotidic and peptidic patterns in chromosomes, J. Nicolas, P. Durand, G. Ranchy, S. Tempel and A.-S. Valin, Bioinformatics 21(24):4408-4410, 2005.
- Browsing repeats in genomes: Pygram and an application to non-coding region analysis, P. Durand, F. Mahé, A.-S. Valin and J. Nicolas, BMC Bioinformatics, vol. 7, 477, 2006.
- Domain organization within repeated DNA sequences: application to the study of a family of transposable elements, S. Tempel, M. Giraud, D. Lavenier, I.-C. Lerman, A.-S. Valin, I. Couée, A. E. Amrani and J. Nicolas, Bioinformatics, 2006, vol. 22, no 16, p. 1948-1954.
- Model-based Identification of Helitrons Results in a New Classification of Their Families in Arabidopsis thaliana, S. Tempel, J. Nicolas, A. El Amrani and I. Couée, Gene, 403(1-2):1299-1305, 2007.
Pattern discovery in biological sequences
- Motif discovery on promoter sequences, M. Haeussler and J. Nicolas, Inria Research Report n° 5714, 2005.
- Disulfide bonds prediction using inductive logic programming, I. Jacquemin and J. Nicolas, in: Workshop on Constraint Based Methods for Bioinformatics (WCB), Sitges, Spain, pages 56-65, 2005.
- Cooperative metaheuristics for exploring proteomic data, R. Gras, D. Hernandez, P. Hernandez, N. Zangger, Y. Mescam, J. Frey, O. Martin, J. Nicolas and R. Appel, Artificial Intelligence Review, 20(1):95-120, 2004.
- Genome wide distribution and potential regulatory functions of AtATE, a novel miniature inverted-repeat transposable element that is present in the promoter region of one of the Arginine Decarboxylase genes in Arabidopsis thaliana, A. Elamrani, L. Marie, A. Aïnouche, J. Nicolas and I. Couée, Molecular Genetics and Genomics, 267, 2001, p. 459-471.
- A symbolic-numeric approach to find patterns in genomes: application to the translation initiation sites of E. coli, C. Delamarche, P. Guerdoux-Jamet, R. Gras and J. Nicolas, Biochimie, 81, Elsevier, 1999.
Machine learning applied to Gene Discovery
- The dog and rat olfactory receptor repertoires, P. Quignon, M. Giraud, M. Rimbault, P. Lavigne, S. Tacher, E. Morin, E. Retout, A.-S. Valin, K. Lindblad-Toh, J. Nicolas and F. Galibert, Genome Biology 6(10):R83, 2005.
- TrackProt: looking for new human beta-defensins in whole genomes, with a syntactical approach, J. Nicolas, F. Bourgeon, Y. Bastide, G. Ranchy, C. Alland, F. Aubry, Y. Mescam, B. Jegou and C. Pineau. Submitted to Nature Biotechnology, 2006.
Metabolomics
- Threshold extraction in metabolite concentration data, A. Floeter, J. Nicolas, T. Schaub and J. Selbig, Bioinformatics 20:1491-1494, 2004.
Theorem proving
- La preuve à la lumière de l'intelligence artificielle. Vers un démonstrateur adaptatif [Proof in the light of artificial intelligence: towards an adaptive theorem prover], C. Belleannée, J. Nicolas and R. Vorc'h, Presses Universitaires de France, 1999.
Grammatical Inference
- Grammatical inference as unification, J. Nicolas, INRIA, n° 3632, jul. 1999.
- How considering incompatible state mergings may reduce the DFA induction search tree, F. Coste and J. Nicolas, Fourth International Colloquium on Grammatical Inference (ICGI'98), Ames, Iowa, USA, 1998.
- Inference of finite automata: reducing the search space with an ordering of pairs of states, F. Coste and J. Nicolas, 10th European Conference on Machine Learning (ECML'98), Chemnitz, Germany, 1998.
- Regular Inference as a graph coloring problem, F. Coste and J. Nicolas, ICML97, Grammatical Inference Workshop, Nashville TN, USA, 1997.
Clustering
- Sequence classification of water channels and related proteins in view of functional predictions, Basavanneppa Tallur, Jacques Nicolas, A. Froger, D. Thomas and C. Delamarche, Theoretical Chemistry Accounts, 1998.
- A method for classifying unaligned biological sequences, B. Tallur and J. Nicolas, in IFCS-96: Data Science, Classification and Related Methods, Springer Verlag, Tokyo, 1997.
- Twelve numerical, symbolic and hybrid supervised classification methods, O. Gascuel, B. Bouchon-Meunier, G. Caraux, P. Gallinari, A. Guénoche, Y. Guermeur, Y. Lechevallier, C. Marsala, L. Miclet, J. Nicolas, R. Nock, M. Ramdani, M. Sebag, Basavanneppa Tallur, G. Venturini and P. Vitte, Int. J. of Pattern Recognition and Artificial Intelligence, 12, n° 5, 1998, pages 517-572.
See all publications.
PhD students
- Thibaut Hénin "Conception d’un système hypothético-déductif de planification d’expériences pour un laboratoire sur puce" [Design of a hypothetico-deductive experiment planning system for a lab-on-a-chip] (cotutored by Torsten Schaub, Potsdam University, Germany)
Former PhD students
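- Catherine Belleannée, Vers un démonstrateur de théorèmes adaptatif [Towards an adaptive theorem prover], jan. 1991
- Raoul Vorc'h, Généralisation et abstraction en démonstration automatique [Generalization and abstraction in automated theorem proving], feb. 1992
- Francis Courtot, CARLA : acquisition et induction sur le matériau compositionnel [CARLA: acquisition and induction on compositional material], jan. 1992
- Jean-Yves Giordano, Inférence de grammaires algébriques [Inference of context-free grammars], jan. 1995
- Robin Gras, Un outil interactif de recherche de motifs dans les grandes séquences génétiques fondé sur l’arbre des suffixes [An interactive tool for pattern search in large genetic sequences based on the suffix tree], dec. 1997
- François Coste, Apprentissage d'automates classifieurs en inférence grammaticale [Learning classifier automata in grammatical inference], jan. 2000
- Daniel Fredouille, Inférence d'automates finis non déterministes par gestion de l'ambiguïté, en vue d'applications en bioinformatique [Inference of non-deterministic finite automata by managing ambiguity, with a view to applications in bioinformatics], oct. 2003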
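- Aurélien Leroux "Inférence grammaticale sur des alphabets ordonnés : application à la découverte de motifs dans des familles de protéines" [Grammatical inference over ordered alphabets: application to motif discovery in protein families], june 2005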
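- Ingrid Jacquemin "Inférence grammaticale et Programmation logique inductive pour la prédiction de ponts disulfures dans les protéines" [Grammatical inference and inductive logic programming for the prediction of disulfide bonds in proteins], dec. 2005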
- André Floeter (cotutored by T. Schaub, Potsdam University) "Analysing microbiological expression data based on decision tree induction" jan. 2006
- Yoann Mescam (cotutored by R. Gras, SIB Genève) "Localization of covarying patterns in biological sequences"
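- Sébastien Tempel “Etude d’une famille d’éléments transposables à l’échelle d’un génome entier, Arabidopsis thaliana, par analyse de signatures” [Study of a family of transposable elements at the scale of a whole genome, Arabidopsis thaliana, by signature analysis] (cotutored by A. El Amrani, Ecobio, Université de Rennes) june 2007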
Recent Projects
Modulome: "Deciphering and modelling the structural organization of genomes."
- Project granted by ANR (French National Agency for Research) from 2006 to 2008. Coordinator: Jacques Nicolas.
- Other involved teams: Laboratoire d'Etude des Parasites Génétiques (LEPG, Tours), Laboratoire de Microbiologie des Environnements Extrêmes (LM2E, Brest) and Laboratoire Dynamique du Génome et Evolution, Institut Jacques Monod (LDGE, Paris).
- Aim: providing methods for the identification, visualization and formal modelling of the structure of genomes in terms of an assembly of nucleotide “modules” that are repeated along a genome or between several genomes. Combined, these methods will provide an appropriate methodology for a fruitful production of hypotheses concerning genome organization. The challenge is to allow the biologist to represent and reason about large genomic sequences in an abstract way, by segmenting them into modules and revealing the organization of such modules. The project includes three biological laboratories involved in the study of mobile genetic elements in archaea, bacteria and eukaryotes, which provide the biological context of this study.
- Contribution of the team: all bioinformatics aspects. Extraction of modules through the specification of a new formalisation of repeats, the flexible maximal repeats, and a segmentation algorithm; development of special-purpose architectures for the treatment of such indexes based on reconfigurable devices (FPGA); design of a browser for the visualization of modules to help interpret the structures emerging from the previous step; analysis of the organization of modules with a grammatical approach.
ACGT: "Advancing Clinico-Genomic Clinical Trials on Cancer."
- European Integrated Project, FP6, Information Society Technology, from 2006 to 2010. Coordinators: Rémi Ronchaud, ERCIM, and Manolis Tsiknakis, ICS-FORTH.
- Other involved teams: 25 mostly European laboratories, including FORTH, University of Amsterdam, Institut Jules Bordet, Swiss Institute of Bioinformatics, Universidad Politécnica de Madrid, HealthGrid, University of Oxford and Hokkaido University.
- Aim: to deliver to the cancer research community an integrated Clinico-Genomic ICT environment enabled by a powerful GRID infrastructure. It involves GRID aspects (delivery of a European Biomedical GRID infrastructure offering seamless mediation services for sharing data and data-processing methods), knowledge representation aspects (ontology-based integration of clinical and genomic/proteomic data) and machine learning/data mining aspects (to support and improve knowledge discovery processes from shared data).
- Contribution of the team: parallelism (tumor growth simulation and GRID node), clustering and visualization to help mining of genomic data.
Ichnovirus: Origins of the capsid proteins and the dsDNA genome of ensymbiogenic insect viruses: the Ichnoviruses
- PICS CNRS project (International cooperation project with the United States) from 2006 to 2008. Coordinators: Yves Bigot and Brian Federici.
- Other involved teams: Laboratoire d’Etude des Parasites Génétiques (LEPG, Tours); Department of Entomology & Graduate Programs in Genetics & Microbiology, University of California, Riverside.
- Aim: analysis of the impact of viruses with large double-stranded DNA genomes on the evolution of eukaryotic genomes. The main originality is to analyse a continuum in eukaryotes consisting of some viruses, the Ascoviruses and the Ichnoviruses, and some transposons related to the Helitron and Tlr elements. In particular, two target goals are to determine the origin of the capsid proteins of Ichnovirus virions and to analyse the involvement of Helitron elements in the evolution of Ichnovirus genomes.
- Contribution of the team: production of characteristic signatures of virus capsid genes, search for mobile elements (helitrons) in genomes, construction of dedicated similarity matrices.
IBN: "Integrated Biological Networks."
- ARC INRIA (Concerted Research Action) from 2005 to 2006. Coordinator: Marie-France Sagot.
- Other involved teams: HELIX and MISTIS, Inria Rhône-Alpes, Grenoble; Systems Biology Unit, Pasteur Institute, Paris; BIA and SSB, INRA, Toulouse.
- Aim: modelling biochemical and evolutionary networks, and analysing the relation between the two.
- Contribution of the team: pattern discovery in promoter regions.
Basic Lab.: Biocellular Assistant on a Silicium Intelligent Chip
- This is a very preliminary project aiming at integrating bioinformatics into labs on chips. The challenge is to control an experimental micro- or nano-scale device with automated reasoning capabilities. See the superb work of R. King et al. on the Robot Scientist Adam. We are interested in trying similar approaches; probably more to come here next year.
Teaching
- Master Bioinformatique, Université de Rennes: Algorithms on sequences
- Bioinformatics and algorithmics on words: University of Potsdam, Germany
Cursus
Position
- Since 2002 Team Leader of Inria Project Symbiose (Bioinformatics, 27 people)
- Since 2002 Head of Bioinformatics for Ouest Genopole (a consortium of more than 50 public laboratories, mostly biological, for large-scale analysis in genomics and post-genomics).
- 1998-2001 Team Leader of Inria Project Aïda (Artificial Intelligence, Machine Learning and Diagnosis, 34 people)
- 1988-1997 Member of Inria Project Repco (Knowledge representation, Team Leader Philippe Besnard)
- file for my application to an Inria director position
Panels
- Member of the Scientific and Research Council of Ouest Genopole since jan. 2002
- Member of the Scientific and Research Council of the INRA MIA department since oct. 2002
- Member of the Scientific and Research Council of “Animal Bioinformatics”, INRA, since jan. 2006
- Member of the program committees of JOBIM and ICGI
Education
- 1987 : PhD thesis in Computer Science, University of Rennes
Links
- Bioinformatics Platform of Ouest Genopole
- Master de Bioinformatique - Rennes
- Interstices: feature on bioinformatics
- JOBIM, the French conference on bioinformatics
- ISMB, the international conference on bioinformatics
- ICGI, the international conference on grammatical inference