ARC FLASH     

Seed Optimisation and Indexing of Genomic Databases      

ARC INRIA

Comparing 700,000 bacterial proteins against the human genome

The Inserm U694 laboratory works on mitochondrial diseases. Its strategy is to carry out an in-silico study that locates potential mitochondrial proteins on the human genome. Since mitochondria may originate from ancestral bacteria, this requires a systematic comparison with the proteomes of all available bacteria.

From a computational point of view, this amounts to running tblastn for 700,000 proteins against the human genome. The computation time was estimated at about one year on the Inserm U694 server.
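To put the one-year estimate in perspective, dividing it evenly over the queries gives the average cost per protein. This is only an illustrative average derived from the two figures above; the actual per-protein cost varies with protein length.

```latex
\frac{1\ \text{year}}{700\,000\ \text{proteins}}
  = \frac{365 \times 86\,400\ \text{s}}{700\,000}
  = \frac{31\,536\,000\ \text{s}}{700\,000}
  \approx 45\ \text{s per protein}
```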

Results
A tblastn-like program has been implemented. Its indexing scheme, based on BLAST-like seeds, serves as the reference. The index occupies about 90 GB, roughly 40 times the size of the raw human genome data.
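The sketch below illustrates, under stated assumptions, what a seed index for a tblastn-like search can look like: the genome is translated in all six reading frames, and every amino-acid word of length `w` is stored with its (strand, frame, position) occurrences. The function names and the exact-word seeds are hypothetical simplifications; real BLAST seeds also use scored neighbourhood words, and this is not the project's implementation. Because every nucleotide contributes to roughly two indexed words (six frames, three nucleotides per codon), each stored with a position record, this kind of index is naturally many times larger than the raw genome.

```python
from collections import defaultdict

# Standard genetic code; codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # assumes upper-case DNA


def translate(dna):
    """Translate DNA codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Yield (strand, frame, protein) for the three forward and three reverse frames."""
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    for strand, seq in (("+", dna), ("-", rev)):
        for frame in range(3):
            yield strand, frame, translate(seq[frame:])


def build_seed_index(dna, w=3):
    """Hypothetical index: amino-acid word of length w -> list of (strand, frame, aa_pos)."""
    index = defaultdict(list)
    for strand, frame, prot in six_frames(dna):
        for pos in range(len(prot) - w + 1):
            word = prot[pos:pos + w]
            if "*" not in word and "X" not in word:  # skip stop codons and unknowns
                index[word].append((strand, frame, pos))
    return index


def seed_hits(protein, index, w=3):
    """Return (query_pos, strand, frame, target_pos) for every exact seed match."""
    hits = []
    for q in range(len(protein) - w + 1):
        for strand, frame, t in index.get(protein[q:q + w], ()):
            hits.append((q, strand, frame, t))
    return hits
```

For example, `build_seed_index("ATGGCCATTGTAATG")` indexes the forward-frame translation `MAIVM`, so `seed_hits("MAIV", ...)` reports matches at query positions 0 and 1.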

A reconfigurable operator implementing the time-consuming part of the tblastn process has been designed. It houses 160 small dedicated processors working in parallel. With a single ReMIX board, the 700,000 bacterial proteins were compared against the complete human genome in 10 days.
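The text does not say which step the operator accelerates. The sketch below assumes it is the scoring of seed neighbourhoods, a short, regular computation that suits small dedicated processors: each processor receives one pair of equal-length windows around a seed hit and computes the best ungapped local score. The identity scoring (`match=5`, `mismatch=-4`), the threshold, and the function names are hypothetical, and the round-by-round loop is only a software stand-in for the 160-processor array.

```python
N_PE = 160  # number of processors quoted in the text


def ungapped_score(a, b, match=5, mismatch=-4):
    """Best ungapped local score between two equal-length windows (Kadane-style scan)."""
    best = cur = 0
    for x, y in zip(a, b):
        cur = max(0, cur + (match if x == y else mismatch))
        best = max(best, cur)
    return best


def run_array(pairs, threshold, n_pe=N_PE):
    """Score window pairs in rounds of n_pe, mimicking one pair per processor per round.

    Returns the (pair_index, score) entries that reach the threshold and the
    number of rounds needed.
    """
    kept, rounds = [], 0
    for start in range(0, len(pairs), n_pe):
        batch = pairs[start:start + n_pe]  # one round: each processor gets one pair
        scores = [ungapped_score(q, t) for q, t in batch]
        kept.extend((start + k, s) for k, s in enumerate(scores) if s >= threshold)
        rounds += 1
    return kept, rounds
```

With 400 window pairs, for instance, the array needs three rounds (160 + 160 + 80). For scale, 10 days against the one-year server estimate corresponds to a speed-up of roughly 365 / 10 ≈ 36.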

Thanks to the algorithmic improvements brought by the new seed designs, we now expect a 25% reduction in both the computation time and the occupancy of the ReMIX FLASH memory. These results can also be generalised to standard computers, especially multicore architectures.
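On a standard multicore computer, the simplest way to exploit the cores for this workload is to split the protein queries into one batch per core and search the batches independently. The sketch below assumes that approach; `search_batch` is a hypothetical stand-in that only counts exact three-letter seed hits in a translated target, not the project's search.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def split_queries(proteins, n_chunks):
    """Split the query list into n_chunks contiguous batches of near-equal size."""
    k, r = divmod(len(proteins), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + k + (1 if i < r else 0)
        chunks.append(proteins[start:end])
        start = end
    return chunks


def search_batch(batch, target, w=3):
    """Hypothetical per-core work: count exact w-letter seed hits of each query in target."""
    words = {}
    for i in range(len(target) - w + 1):  # word index rebuilt per batch for simplicity
        words.setdefault(target[i:i + w], []).append(i)
    return [(name, sum(len(words.get(seq[q:q + w], ())) for q in range(len(seq) - w + 1)))
            for name, seq in batch]


def parallel_search(proteins, target, workers=None, executor_cls=ProcessPoolExecutor):
    """Search (name, sequence) queries on several cores; results keep the input order."""
    workers = workers or os.cpu_count() or 1
    with executor_cls(max_workers=workers) as ex:
        results = ex.map(partial(search_batch, target=target),
                         split_queries(proteins, workers))
        return [hit for batch in results for hit in batch]
```

With `ProcessPoolExecutor`, call `parallel_search` from under an `if __name__ == "__main__":` guard on platforms that start workers with the spawn method. In a real deployment the index would more likely be built once and shared between workers rather than rebuilt per batch.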

Publications