AI for Computational Protein Design: bridging symbolic and numerical AI

Seminar

Start date

Thu 12 Nov 2020 - 10:30

End date

Thu 12 Nov 2020 - 10:30

Location

Webinar

Room

IRISA Rennes (Aurigny, Sicile, Sardaigne), IRISA Lannion, and webinar

Speaker

Thomas Schiex (INRAE, Toulouse)

Main department

D7 - Data and Knowledge Management

As someone working on automated reasoning, logic and constraints in Artificial Intelligence, and also as a computational biologist at INRAE, I quickly realized that logic alone was not ideal for modeling and solving the computational problems I faced in biology. For this reason, our team worked on the AI side to extend “propositional automated reasoning” techniques, which reason on absolutely certain statements, to more flexible formulations using “weighted” information. This led to several dedicated fundamental algorithms and to their implementation in our solver “toulbar2”. Toulbar2 is now one of the most efficient solvers in its area. It can rigorously mix logical and numerically weighted information, solving complex puzzles that combine pure logical knowledge with vaguer information, including probabilistic statements.

I will show how, over the last 8 years, we have put toulbar2 to the task of designing new proteins targeting a predefined biochemical/biological function, or more precisely a predefined 3-dimensional structure. In doing so we stand on the shoulders of the giants who designed biophysical force-fields and rotamer libraries, which capture slowly acquired and continually improved biophysical and statistical knowledge on proteins. This protein design problem, which uses a rigid backbone target, a catalog of discrete conformations for amino acid side-chains and a pairwise decomposable force-field such as the AMBER, CHARMM or Rosetta score functions, is known to be NP-hard.
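To make the formulation concrete, here is a minimal sketch of a pairwise decomposable energy, E(x) = sum_i E_i(x_i) + sum_{i<j} E_ij(x_i, x_j), where x_i is the discrete conformation chosen at position i. The three positions, the energy values and the function names are hypothetical, chosen only for illustration. The search is a plain exhaustive enumeration, which is not how toulbar2 works and which stops being feasible beyond a handful of positions.

```python
from itertools import product

# Hypothetical toy instance: 3 positions on a fixed backbone, each with a
# small catalog of discrete side-chain conformations (rotamers).
# unary[i][r] is the energy of picking rotamer r at position i.
unary = [
    [1.0, 0.5],
    [0.2, 0.9, 0.4],
    [0.7, 0.1],
]

# pairwise[(i, j)][r][s] is the interaction energy between rotamer r at
# position i and rotamer s at position j (illustrative values only).
pairwise = {
    (0, 1): [[0.0, 2.0, 0.3], [1.5, 0.0, 0.8]],
    (1, 2): [[0.6, 0.0], [0.0, 1.2], [0.9, 0.5]],
    (0, 2): [[0.1, 0.4], [0.3, 0.0]],
}


def total_energy(assignment):
    """Sum of unary and pairwise terms for one rotamer choice per position."""
    energy = sum(unary[i][r] for i, r in enumerate(assignment))
    energy += sum(
        table[assignment[i]][assignment[j]]
        for (i, j), table in pairwise.items()
    )
    return energy


def exhaustive_minimum():
    """Enumerate every assignment and return one with the lowest energy.

    The number of assignments is the product of the domain sizes, so this
    grows exponentially with the number of positions.
    """
    domains = [range(len(energies)) for energies in unary]
    return min(product(*domains), key=total_energy)


best = exhaustive_minimum()
print(best, total_energy(best))
```

Because every assignment is visited, the result is optimal for these toy energies by construction. Real design instances have far too many combinations for this, which is why guaranteed solvers are of interest.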

For reasons we do not fully understand, toulbar2 shines on these problems. Contrary to the usual Monte Carlo based methods, it is able to find a solution and prove that it is optimal for the force-field used, on problems of nontrivial size. It also makes it possible to rigorously satisfy nontrivial designer constraints. It outperforms the other guaranteed optimization tools we have tried and was even able to show the limits of an optimized Monte Carlo method in Rosetta (IPD, Univ. Washington). Recent comparisons of toulbar2 with D-Wave quantum annealing hardware (by Rosetta team members) also show that it performs well. Thanks to this, we have shown it is also capable of dealing with backbone flexibility, at least when the aim is to design a protein sequence that should fit several “states”. We have put all these techniques to work in practice with structural biologist colleagues, designing new self-assembling proteins, antibodies or enzymes.

As I will show, for a few years now we have augmented this approach, based purely on force-fields and design target constraints, with machine-learned information extracted from multiple sequence alignments. This allows us to refine the force-field for a given suitable design structure, bridging the gap between data, machine-learned information, thermodynamic information and logical information, which perfectly fits the usual “designer” situation. Going back to AI, we have shown that this approach is also able to learn how to play Sudoku without knowing the rules, just from images of solved grids, better than “Deep Neural Net”-friendly approaches, while providing understandable and customizable learned rules.