Learning grammars on genomic sequences
Written by François COSTE   


The position has been filled.

Internship subject:

Using a linguistic approach to model genomic sequences has long been advocated by David Searls [1]. Such models are sometimes designed by hand by experts. In the team, we study how to design them automatically by machine learning, and we have proposed a successful approach for learning automata on protein sequences [2,3]. The subject of this internship is to study how this approach can be extended to learn more expressive grammars [4,5,6], which can model long-distance correlations more easily. The proposed algorithm will be implemented and tested on real genomic datasets.
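To make "long-distance correlations" concrete, the sketch below uses a toy context-free grammar (a hand-written illustrative assumption, not a model learned by the team's approach) in which the first nucleotide of an RNA stem must pair with the last one, the second with the second-to-last, and so on. The rule S → x S x̄ captures this nesting for stems of any length, whereas no finite automaton can check it for unbounded lengths. The names `COMPLEMENT`, `derives_stem` and `generate_stem` are hypothetical, and the grammar deliberately ignores hairpin loops and G-U wobble pairs.

```python
# Toy grammar (illustrative assumption only):
#   S -> a S u | u S a | g S c | c S g | ε
# It generates strings w + reverse_complement(w): perfectly paired RNA stems,
# where each base is correlated with a base arbitrarily far away.

COMPLEMENT = {"a": "u", "u": "a", "g": "c", "c": "g"}  # Watson-Crick pairs only


def derives_stem(seq: str) -> bool:
    """Return True if the toy grammar's start symbol S derives seq."""
    while seq:
        if len(seq) < 2 or COMPLEMENT.get(seq[0]) != seq[-1]:
            return False          # no rule S -> x S y matches the outermost pair
        seq = seq[1:-1]           # apply S -> x S x̄ and continue on the inside
    return True                   # S -> ε


def generate_stem(half: str) -> str:
    """Derive w + reverse complement of w by applying S -> x S x̄ once per base of w."""
    return half + "".join(COMPLEMENT[c] for c in reversed(half))
```

For example, `generate_stem("gca")` yields `"gcaugc"`, which `derives_stem` accepts, while `derives_stem("gcaagc")` is rejected because the inner `a`/`a` pair is not complementary.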

Keywords: Machine learning, Bioinformatics, Formal Grammars

Duration: 6 months

Prerequisites: Master's-level studies in computer science or equivalent (this is a research topic: applicants should be able to continue with a PhD thesis after the internship)

Application: Students eligible for INRIA internships must apply through this program, but do not hesitate to contact François Coste by e-mail.

François Coste


Bibliography:

[1] The language of genes, David Searls, Nature, 2002.

[2] Learning automata on protein sequences, François Coste and Goulven Kerbellec, JOBIM, 2006.

[3] Apprentissage d'automates modélisant des familles de séquences protéiques (Learning automata modeling families of protein sequences), Goulven Kerbellec, PhD thesis in Computer Science, Université de Rennes 1, June 2008.

[4] Polynomial identification in the limit of substitutable context-free languages, Alexander Clark and Rémi Eyraud, Journal of Machine Learning Research, August 2007.

[5] Comparing two unsupervised grammar induction systems: Alignment-Based Learning vs. EMILE, Menno van Zaanen and Pieter Adriaans, Technical Report TR2001.05.

[6] Unsupervised learning of natural languages, Zach Solan, David Horn, Eytan Ruppin and Shimon Edelman, Proc. Natl. Acad. Sci., August 2005.
 

Symbiose Project Team - INRIA/Irisa © 2007 - 2008