Symbiose Project Team - INRIA/Irisa © 2007 - 2008

Mardi 15 Février, 14h - Soutenance de Thèse de Matthias Gallé |

Written by Pierre PETERLONGO | |

## Searching for Compact Hierarchical Structures in DNA by means of the Smallest Grammar Problem14h Amphithéatre, INRIA RennesMotivated by the goal of discovering hierarchical structures inside DNA sequences, we address the Smallest Grammar Problem, the problem of finding a smallest context-free grammar that generates exactly one sequence. This NP-Hard problem has been widely studied for applications like Data Compression, Structure Discovery and Algorithmic Information Theory. We propose a formalisation of the Smallest Grammar Problem based on two complementary optimisation problems: the choice of constituents of the final grammar and the choice of how to parse the sequence with these constituents. We name this last problem the ``Minimal Grammar Parsing'' problem and give a polynomial solution for it. This decomposition allows us to define a new complete and correct search space for the Smallest Grammar Problem. Based on this search space, we propose new algorithms able to return grammars 10% smaller than the state of the art on complete genomes. Special care has been taken about the efficiency of the algorithms by considering different equivalence classes of repeats and by introducing an efficient in-place schema to update the suffix array data structure used to compute these words. Regarding the mentioned applications, we consider the impact of the non-uniqueness of smallest grammars on Structure Discovery. We prove that the number of smallest grammars can be exponential in the size of the sequence and then analyse the stability of the discovered structures between minimal grammars for real-life examples. With respect to Data Compression, we extend our algorithms to use rigid patterns as words and achieve compression rate up to 25\ better compared to the previous best DNA grammar-based coder. Jury: Alberto Apostolico (rapporteur) Alexander Clark (rapporteur) François Coste (examinateur) Gabriel Infante-Lopez (directeur de thèse) Jacques Nicolas (directeur de thèse) Eric Rivals (examinateur) |

http://www.irisa.fr/symbiose |
Powered by Joomla! |
Generated:Wed, 23 May 2018 12:47:12 +0200 |