Platforms and toolboxes

A goal of the team is to facilitate interplay between tools for biological data analysis and integration. Our tools aim to guide the user in progressively reducing the space of models (families of gene or protein sequences, families of key actors involved in a system response, dynamical models) that are compatible with both existing knowledge and experimental observations.

Most of our tools are developed in collaboration with the GenOuest resource and data center hosted in the IRISA laboratory, including their computer facilities.

Data integration and query

AskOmics - Integration and interrogation software for linked biological data based on semantic web technologies. 

AskOmics aims to bridge the gap between end-user data and the Linked (Open) Data cloud. It allows heterogeneous bioinformatics data (formatted as tabular files or directly in RDF) to be loaded into a triple store through a user-friendly web interface. AskOmics also provides an intuitive graph-based user interface for building complex queries that currently require hours of manual searching across tens of spreadsheet files. The elements of interest selected in the graph are then automatically converted into a SPARQL query that is executed on the user's data.
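The conversion from a graph selection to a SPARQL query can be pictured with the following minimal Python sketch. It is an illustration only and not AskOmics code: the function `build_sparql`, the `http://example.org/` prefix, and the `Gene`, `Protein` and `encodes` names are all hypothetical.

```python
def build_sparql(nodes, links, prefix="http://example.org/"):
    """Turn a selected graph path into a SPARQL SELECT query string.

    nodes: dict mapping a variable name to an entity type (hypothetical names)
    links: list of (subject_var, relation, object_var) triples
    """
    lines = [
        f"PREFIX : <{prefix}>",
        "SELECT " + " ".join(f"?{var}" for var in nodes),
        "WHERE {",
    ]
    # One typing triple per selected node.
    for var, entity_type in nodes.items():
        lines.append(f"  ?{var} a :{entity_type} .")
    # One triple per selected link between two nodes.
    for subj, relation, obj in links:
        lines.append(f"  ?{subj} :{relation} ?{obj} .")
    lines.append("}")
    return "\n".join(lines)


query = build_sparql(
    nodes={"gene": "Gene", "protein": "Protein"},
    links=[("gene", "encodes", "protein")],
)
print(query)
```

Each node selected in the graph becomes a typed variable and each link becomes a triple pattern, so the user never has to write the query text by hand.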

Integrative biology: building static maps of biological networks with logical paradigms

AuReMe - Traceable reconstruction of metabolic networks

The AuReMe workspace supports the Automatic Reconstruction of Metabolic networks based on the combination of multiple heterogeneous data and knowledge sources. It is available as a Docker image. AuReMe is composed of the following modules:

  • The Model-management PADmet module allows all metabolic data to be manipulated and traced via a local database.
  • The Meneco Python package fills the gaps of a metabolic network using a topological approach that relies on logic programming to solve a combinatorial problem.
  • The Menetools Python package performs topological analyses on metabolic networks. It can compute reachable compounds, producible targets, candidate co-factors for target production, and paths from a set of available compounds to targets.
  • The Shogen Python package aligns a genome and a metabolic network in order to identify genome units that contain a high density of genes coding for enzymes; it also relies on logic programming.
  • The manual curation assistance PADmet module allows the reported metabolic networks and their metadata to be curated.
  • The Wiki-export PADmet module exports the metabolic network and its functional genomic units as a local wiki platform for user-friendly investigation.
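The notion of reachable compounds used in such topological analyses can be illustrated with a small Python sketch. It assumes the usual expansion rule (a reaction fires only once all of its reactants are available) and is written in plain Python for readability; it is not the logic-programming implementation used by Menetools or Meneco, and the toy network and function name are hypothetical.

```python
def reachable_compounds(reactions, seeds):
    """Return every compound producible from `seeds`.

    reactions: dict mapping reaction id -> (set of reactants, set of products)
    seeds: iterable of initially available compounds
    """
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions.values():
            # A reaction fires only when all of its reactants are available.
            if reactants <= scope and not products <= scope:
                scope |= products
                changed = True
    return scope


# Hypothetical toy network: r2 is blocked unless "atp" is available.
toy_network = {
    "r1": ({"glucose"}, {"g6p"}),
    "r2": ({"g6p", "atp"}, {"f16bp"}),
    "r3": ({"f16bp"}, {"pyruvate"}),
}
scope = reachable_compounds(toy_network, seeds={"glucose"})
targets = {"g6p", "pyruvate"}
unproducible = targets - scope
print(sorted(scope), sorted(unproducible))
```

Targets that remain unproducible, as `pyruvate` does here, are the kind of gap that a gap-filling step tries to close by proposing additional reactions.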

FINGOC-tools - Filtering interaction networks with graph-based optimization criteria

In this series of tools, our goal is to offer methods for the reconstruction and large-scale analysis of interaction graphs in order to elucidate the main regulators of an observed phenotype. Each tool makes it possible to filter a graph according to different criteria (explaining all the mutual expression shared by genes across several experiments, making a graph consistent with expression data, favoring specific patterns). Interactions may be either built by the sequence analysis tools or deduced from the literature. The tools are delivered in different forms (from Python packages to virtual machines) because they require dedicated and specific accompanying tools.

  • The lombarde package filters transcription-factor/binding-site regulatory networks using the mutual information revealed by the response to environmental perturbations. The large number of false-positive interactions is filtered out according to graph-based criteria. Knowledge about regulatory modules, such as operons or the output of the shogen package, can be taken into account.
  • The KeyRegulatorFinder package searches for key regulators of lists of molecules (such as metabolites, enzymes or genes) by taking advantage of knowledge databases on cell metabolism and signaling. The complete information is transcribed into a large-scale interaction graph, which is filtered to report the most significant upstream regulators of the considered list of molecules.
  • The powerGrasp Python package provides an implementation of graph compression methods oriented toward visualization and based on power graph analysis.
  • The iggy package repairs an interaction graph with respect to expression data. It proposes a range of operations for altering the experimental data and/or the biological network in order to re-establish their mutual consistency, an indispensable prerequisite for automated prediction.
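As an illustration of what consistency between a signed interaction graph and expression data can mean, the Python sketch below applies a simplified local rule: the observed change of a node must be explained by at least one of its predecessors. This rule, the function name and the toy data are assumptions made for the example; the tools above reason globally over the whole graph and are not reproduced here.

```python
def inconsistent_nodes(edges, observations):
    """List observed nodes whose change no predecessor can explain.

    edges: list of (source, target, sign), sign +1 (activation) or -1 (inhibition)
    observations: dict mapping node -> +1 (increase) or -1 (decrease)
    """
    predecessors = {}
    for source, target, sign in edges:
        predecessors.setdefault(target, []).append((source, sign))

    unexplained = []
    for node, change in observations.items():
        incoming = predecessors.get(node)
        if not incoming:
            continue  # input nodes need no explanation
        explained = any(
            # An unobserved predecessor is treated as a possible explanation.
            source not in observations or observations[source] * sign == change
            for source, sign in incoming
        )
        if not explained:
            unexplained.append(node)
    return unexplained


# Hypothetical toy graph and expression data.
edges = [("tf1", "geneA", +1), ("tf2", "geneA", -1), ("tf1", "geneB", -1)]
observations = {"tf1": +1, "tf2": +1, "geneA": +1, "geneB": +1}
conflicts = inconsistent_nodes(edges, observations)
print(conflicts)
```

Here `geneB` increases although its only regulator, an inhibitor, also increases. Repairing would mean altering either an observation or an edge so that every observed change is explained again.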

Systems biology: modeling the dynamical response of families of models

Caspo - Studying synchronous boolean networks

Caspo (Cell ASP Optimizer) is a pipeline for automated reasoning on logical signaling networks. The main underlying issue is that, because of inherent experimental noise, many different logical networks can be compatible with a set of experimental observations. It is available as a Docker container. Caspo is composed of five modules:

  • The Caspo-learn module performs an automated inference of logical networks from experimental data. It identifies admissible large-scale logic models, saving considerable effort and avoiding any a priori bias.
  • The Caspo-classify, predict and visualize modules classify a family of Boolean networks with respect to their input-output predictions.
  • The Caspo-design module designs experimental perturbations that would allow an optimal discrimination of rival models in a family of Boolean networks.
  • The Caspo-control module identifies the key players of a family of networks: it computes robust intervention strategies that force a set of target species or compounds into a desired steady state.
  • The Caspo-timeseries module takes time-series observation datasets into account in the learning procedure.
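Classifying a family of Boolean networks by input-output behavior can be pictured with the following Python sketch. It is a deliberately naive enumeration over all input combinations, with each network reduced to a plain Boolean function; the network names and the `egf` and `tnf` inputs are hypothetical, and Caspo itself relies on Answer Set Programming rather than on this kind of enumeration.

```python
from itertools import product


def io_signature(network, inputs):
    """Tuple of outputs for every combination of Boolean inputs."""
    return tuple(
        network(**dict(zip(inputs, values)))
        for values in product([False, True], repeat=len(inputs))
    )


def classify(family, inputs):
    """Group network names that share the same input-output signature."""
    classes = {}
    for name, network in family.items():
        classes.setdefault(io_signature(network, inputs), []).append(name)
    return list(classes.values())


# Hypothetical family of three candidate networks with one output each.
family = {
    "n1": lambda egf, tnf: egf or tnf,
    "n2": lambda egf, tnf: tnf or egf,  # same behavior as n1
    "n3": lambda egf, tnf: egf and not tnf,
}
groups = classify(family, inputs=("egf", "tnf"))
print(groups)
```

Networks in the same group cannot be told apart by any input-output experiment, whereas an input combination on which two groups disagree (here `egf` and `tnf` both on) is a natural candidate for a discriminating experiment.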

Cadbiom - Building and analyzing the asynchronous dynamics of enriched logical networks

Based on guarded transition semantics, the Cadbiom software provides a formal framework to support the modeling of biological systems such as cell signaling networks. It allows synchronization events to be investigated in biological networks. It is available as a Docker image. Cadbiom is composed of three modules:

  • The Cadbiom graphical interface is useful for building and studying moderate-size models. It provides exploration, simulation and checking. For large-scale models, Cadbiom also makes it possible to focus on specific nodes of interest.
  • The Cadbiom API allows a model to be loaded, static analyses to be performed, and temporal properties to be checked on a finite horizon in the future or in the past.
  • Exploring large-scale knowledge repositories. The large-scale PID repository (about 10,000 curated interactions) has been translated into the Cadbiom formalism.

Sequence analysis: modeling sequences with formal grammars

Logol - Complex pattern modelling and matching

The Logol toolbox is a Swiss Army knife for pattern matching on DNA/RNA/protein sequences, using a high-level grammatical formalism that gives patterns a large expressivity. A Logol pattern can consist of a complex combination of motifs (such as degenerate strings) and structures (such as imperfect stem-loops or repeats). Logol's key features are the ability to divide a pattern description into several sub-patterns, to model long-range dependencies, to use ambiguous models, and to include negative conditions in a pattern definition.

  • The Graphical designer allows a user to iteratively build a complex pattern based on basic graphical patterns. The associated grammar file is an export of the graphical designer.
  • The LogolMatch parser takes as input a biological (nucleic or amino acid) sequence and a grammar file (i.e. a pattern). It combines a grammar analyzer, a sequence analyzer and a Prolog library. It returns a file containing all the occurrences of the pattern in the sequence with their parsing details.
  • Full-genome analysis and connection to biological databases have recently been made available.
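To give a concrete idea of what a structure such as an imperfect stem-loop involves, the following Python sketch searches a DNA sequence for two arms that are reverse complements of each other, up to a bounded number of mismatches, separated by a loop of variable length. It is a plain-Python illustration and not Logol grammar syntax; the function names, parameters and example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq, stem_len, loop_range, max_mismatches=0):
    """Return (start, loop_len, mismatches) for each stem-loop found.

    loop_range: (min_loop_len, max_loop_len), both inclusive
    """
    hits = []
    for start in range(len(seq) - 2 * stem_len - loop_range[0] + 1):
        left = seq[start:start + stem_len]
        for loop_len in range(loop_range[0], loop_range[1] + 1):
            right_start = start + stem_len + loop_len
            right = seq[right_start:right_start + stem_len]
            if len(right) < stem_len:
                break  # the second arm would run past the end of the sequence
            # Count positions where the arms fail to pair.
            mismatches = sum(
                a != b for a, b in zip(left, reverse_complement(right))
            )
            if mismatches <= max_mismatches:
                hits.append((start, loop_len, mismatches))
    return hits


# Hypothetical sequence: arms GGCAC / GTGCC around a 4-base loop.
sequence = "TTGGCACAAAAGTGCCTT"
exact_hits = find_stem_loops(sequence, stem_len=5, loop_range=(3, 5))
print(exact_hits)
```

The dependency between the two arms is exactly the kind of long-range constraint that fixed-string or regular-expression patterns struggle to express, which is why a grammatical formalism is used. Raising `max_mismatches` turns the search into an imperfect stem-loop search.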

Protomata - Expressive pattern discovery

Protomata is a machine learning suite for the inference of automata characterizing (functional) families of proteins from available sequences. Based on a new kind of alignment, called partial and local, precise characterizations of the families (beyond the scope of classical sequence patterns such as PSSM, Profile HMM, or Prosite Patterns) can be learnt and used to predict new family members with high specificity.

The three main modules integrated in the protomata-learner workflow are also available as stand-alone programs:

  • Paloma builds partial local multiple alignments;
  • Protobuild infers automata from these alignments;
  • Protomatch and Protoalign score and align new sequences with the automata.