GENSCALE: Scalable, Optimized and Parallel Algorithms for Genomics
The GenScale team works closely with biologist colleagues to design and implement algorithms for processing the large volumes of genomic data generated by DNA sequencing technologies. These data are error-prone, scattered, and massive (terabytes of sequences generated within a few days). In this context, GenScale members focus on three main axes:
Analyzing complex features
The team proposes novel approaches to detect genomic variants and to precisely assemble genome or chromosome sequences. The ultimate goal is to obtain one sequence per sequenced chromosome or species, together with its associated variations. Techniques are based on string algorithms, graph analysis, data representation, linear programming, and ASP (Answer Set Programming) solvers.
Exploring and Querying
To scale up the amount of data that can be processed, the team proposes new methodological solutions based on advanced data structures to index and screen large genomic databanks. These solutions enable the detection of specific markers associated with diseases, the genomic analysis of thousands of full genomes, and the analysis of gut microbiomes. Techniques are based on data indexing, data correction, and, again, algorithms on strings and graphs.
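As a rough illustration of what indexing and screening a databank can mean, the sketch below builds a plain k-mer index and reports, for a query marker, the fraction of its k-mers found in each genome. It is a minimal sketch and not the team's method: the function names (`kmers`, `build_index`, `screen`), the toy sequences, and the choice of an in-memory dictionary are all assumptions made for the example. Indexes built for real databanks rely on far more compact data structures.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every substring of length k in seq (hypothetical helper)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_index(genomes, k):
    """Map each k-mer to the set of genome names that contain it."""
    index = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index

def screen(index, marker, k):
    """Return, per genome, the fraction of the marker's distinct k-mers it contains."""
    query = set(kmers(marker, k))
    hits = defaultdict(int)
    for kmer in query:
        for name in index.get(kmer, ()):
            hits[name] += 1
    return {name: count / len(query) for name, count in hits.items()}

# Toy data, invented for the example.
genomes = {"g1": "ACGTACGTGA", "g2": "TTTTGGGGCC"}
index = build_index(genomes, k=4)
result = screen(index, "CGTACG", k=4)
```

With these toy inputs, only `g1` shares k-mers with the marker. Genomes that share no k-mer with the query are simply absent from the result.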
The team also explores the problem of archiving large volumes of data on DNA molecules. This involves issues such as the development of a dedicated DNA file system, error-correcting codes, information security, DNA synthesis, DNA sequencing, and genomic data processing.
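To make the idea of storing data on DNA concrete, the sketch below maps each pair of bits to one nucleotide and back. It is a deliberately naive illustration, not the team's encoding: the `BASES` ordering and the `encode` and `decode` names are assumptions. A practical scheme would add error-correcting codes and avoid sequences that are hard to synthesize or sequence, such as long runs of a single base.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T

def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, two bits per nucleotide, high bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def decode(strand: str) -> bytes:
    """Decode a DNA string produced by encode() back into bytes."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)  # raises ValueError on a non-ACGT symbol
        out.append(byte)
    return bytes(out)

strand = encode(b"GenScale")
restored = decode(strand)
```

Each byte becomes four nucleotides, so the strand is four times as long as the input. The round trip recovers the original bytes only if no base is altered, which is why error-correcting codes matter in practice.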