Algorithms based on k-mers for ancient oral metagenomics: Tools for contamination removal and assessment in palaeometagenomics

IRISA Rennes
Camila Duitama-Gonzalez (Institut Pasteur)

Palaeometagenomics is the study of ancient genetic material through metagenomic sequencing, which characterizes the DNA of all the organisms in a sample. Ancient genetic material here means DNA that comes from a non-living source and shows signs of molecular degradation. Dental calculus has proven to be an exceptionally rich source of ancient DNA (aDNA) and has been used to investigate the evolution of the oral microbiome, as well as human oral health and diet. Despite rigorous laboratory protocols for contamination control, aDNA samples remain highly susceptible to contamination from environmental sources, which can drastically alter the observed microbial composition and lead to erroneous conclusions in downstream analyses. This dissertation proposes two algorithms that rely on k-mers (DNA subsequences of fixed length k) to address two challenges in palaeometagenomics: contamination assessment via Microbial Source Tracking, and contamination removal at the read level. The former resulted in a first-author publication and an open-source tool called decOM; the latter was also published as a first-author paper, accompanied by an open-source tool called aKmerBroom. Both methods were tested on ancient oral metagenomic data, yet their utility can extend to samples that do not originate from ancient oral sources. Overall, this thesis shows that k-mer-based algorithms have great potential for contamination removal and contamination assessment in metagenomes, as they leverage the wealth of metagenomic data that has been sequenced and made publicly available over the years.
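
To make the notion of a k-mer concrete, here is a minimal Python sketch that extracts the k-mers of a read and measures how many of them appear in a reference set. It is only an illustration of the general idea and is not the algorithm implemented in decOM or aKmerBroom; the function names (kmers, fraction_in_set), the toy read and the toy reference set are all hypothetical.

```python
def kmers(seq, k):
    """Return every length-k substring (k-mer) of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fraction_in_set(read, reference_kmers, k):
    """Fraction of the read's k-mers found in reference_kmers.

    Illustrative only: a real tool would also handle reverse
    complements, ambiguous bases and far larger k-mer sets.
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0  # read shorter than k
    hits = sum(1 for km in read_kmers if km in reference_kmers)
    return hits / len(read_kmers)


# Toy example (hypothetical data): a 6-base read and a tiny reference set.
read = "ACGTAC"
reference = {"ACG", "TAC"}
print(kmers(read, 3))
print(fraction_in_set(read, reference, 3))
```

In a sketch like this, a read could be kept or discarded depending on whether the fraction exceeds a chosen threshold; the choice of k and of the threshold is an assumption left to the reader.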

Symbiose seminars :…