Detecting and visualizing anomalies in heterogeneous network events: Modeling events as graph structures and detecting communities and novelties with machine learning

Defense type
Thesis
Start date
End date
Location
IRISA Rennes
Room
INRIA - Salle Michel Métivier (C024)
Speaker
Laetitia Leichtnam (CIDRE)
Subject

The defense will be held entirely by videoconference: https://imt-atlantique.webex.com/imt-atlantique/j.php?MTID=md9cf52286af114f1500094bed49bec2d

-----------

According to the National Institute of Standards and Technology (NIST), ensuring the security of an information system requires identifying threats, protecting the system preventively, detecting security incidents, responding to attacks, and recovering from them. Operationally, Security Operations Centers (SOCs) are the teams dedicated to detection, response, and recovery. Their security analysts rely on intrusion detection and analysis tools.

In this thesis, we aim to help security analysts in their tasks with a new approach to detecting and displaying network anomalies. The goal is twofold: to detect security breaches in real time and to support post-mortem analysis of the techniques used by the attackers.

A first difficulty lies in building a model to represent the various kinds of information the analysts have to handle. In particular, security data should be represented in a way that is both machine-readable, for automatic processing, and human-readable, for analysis by a human expert. To meet these objectives, we propose a data representation model based on a graph structure. To handle the very heterogeneous data types involved, we rely on knowledge graphs, which allow diverse pieces of information to be linked semantically.
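As an illustration of the idea, a knowledge graph can be sketched as a set of (subject, predicate, object) triples that both a program and an analyst can read. The sketch below is only a minimal example under stated assumptions: the event identifiers, predicate names, and helper functions (`build_index`, `related`, `describe`) are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

# Hypothetical security events expressed as (subject, predicate, object) triples.
# All identifiers and predicate names are illustrative assumptions.
TRIPLES = [
    ("flow:42", "type", "NetworkFlow"),
    ("flow:42", "hasSourceIP", "ip:10.0.0.5"),
    ("flow:42", "hasDestinationIP", "ip:192.0.2.7"),
    ("dns:7", "type", "DnsQuery"),
    ("dns:7", "hasSourceIP", "ip:10.0.0.5"),
    ("dns:7", "queriesDomain", "domain:example.org"),
    ("alert:1", "type", "Alert"),
    ("alert:1", "raisedOn", "flow:42"),
]


def build_index(triples):
    """Index each node by its outgoing and incoming (inverse) links."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[subject].add((predicate, obj))
        # "^" marks the inverse direction of a predicate.
        index[obj].add(("^" + predicate, subject))
    return index


def related(index, node):
    """Return every node directly linked to `node`, in either direction."""
    return {other for _, other in index[node]}


def describe(triples, node):
    """Human-readable view: one short sentence per triple about `node`."""
    return [f"{s} {p} {o}" for s, p, o in triples if s == node]


INDEX = build_index(TRIPLES)

# Heterogeneous events (a flow and a DNS query) are linked through a shared IP.
print(sorted(related(INDEX, "ip:10.0.0.5")))
print(describe(TRIPLES, "alert:1"))
```

The same triples serve both purposes named above: `build_index` gives a machine-friendly lookup structure, while `describe` renders them as short statements an analyst can read directly.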

With the model in hand, we propose two automatic treatments. The first focuses on the relations between pieces of information, represented by the links of the knowledge graph. Using community detection, we select sub-graphs representing events that are strongly related to an alert or an indicator of compromise (IoC) and are thus relevant for forensic analysis. This gives the analyst context to explain the alert or the IoC.

The second automatic treatment applies novelty detection to the graph in order to build an anomaly-based intrusion detection system. While traditional anomaly detection approaches need a large volume of both normal and anomalous data to build a good learning model, novelty detection techniques need little or no anomalous data. The difficulty here is to feed the novelty detection algorithm with a graph structure: we rely on an autoencoder, an unsupervised machine learning technique that takes a vector, not a graph, as input. We thus propose a transformation of the graph into a vector that encodes both the information contained in the nodes and the information related to the structure of the graph (links between nodes). Evaluations on the CICIDS 2017 and 2018 datasets show that a graph-structured representation of security data processed by an autoencoder gives better results than common anomaly detection methods, even those based on supervised learning. The results are good with respect to both the detection rate (no or almost no false negatives) and the false alert rate (very few false positives).
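The community detection step can be sketched on a toy graph. The abstract does not say which community detection algorithm the thesis uses, so the example below swaps in a single Girvan-Newman split (repeatedly removing the edge with the highest betweenness until the graph separates) purely as an assumption. The graph, node names, and function names are all hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = defaultdict(float)
    for source in adj:
        # Breadth-first search counting shortest paths from `source`.
        dist = {source: 0}
        sigma = defaultdict(float)
        sigma[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back to source.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    return score


def components(adj):
    """Return the connected components as a list of node sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        result.append(comp)
    return result


def split_communities(adj):
    """One Girvan-Newman step: cut high-betweenness edges until the graph splits."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    while len(components(adj)) == 1:
        scores = edge_betweenness(adj)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


def community_of(adj, node):
    """Return the community (sub-graph node set) that contains `node`."""
    return next(c for c in split_communities(adj) if node in c)


# Hypothetical toy graph: two dense groups of events joined by a single link.
EDGES = [
    ("alert:1", "flow:42"), ("alert:1", "ip:10.0.0.5"), ("alert:1", "host:ws-12"),
    ("flow:42", "ip:10.0.0.5"), ("flow:42", "host:ws-12"), ("ip:10.0.0.5", "host:ws-12"),
    ("flow:77", "ip:10.0.0.9"), ("flow:77", "host:srv-3"), ("flow:77", "dns:7"),
    ("ip:10.0.0.9", "host:srv-3"), ("ip:10.0.0.9", "dns:7"), ("host:srv-3", "dns:7"),
    ("host:ws-12", "flow:77"),  # the only link between the two groups
]
ADJ = build_adjacency(EDGES)

print(sorted(community_of(ADJ, "alert:1")))
```

On this toy graph, the sub-graph returned for `alert:1` contains only the events densely connected to the alert, which is the kind of context an analyst would inspect first during forensic analysis.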

Even when the number of false positives is minimized, the cost of alert interpretation by analysts still needs to be reduced. The goal here is to provide the analyst with a representation of security-relevant data that reduces the time and effort required to analyze alerts. In response, we propose an immersive 3D visualization of the graph representation. The visualization highlights the relations between security elements and malicious events or IoCs. It gives analysts a good starting point to explore the data and reconstruct a global attack scenario.

To sum up, the general objective of this thesis is to evaluate the value of graph structures in the field of security data analysis. We propose an end-to-end approach consisting of a unified view of network data in the form of graphs, a community discovery system, an unsupervised anomaly detection system, and a visualization of the data in the form of graphs.

 

Jury members
Hervé Debar, Professor at Télécom SudParis - reviewer
Davide Balzarotti, Full Professor at Eurecom Graduate - reviewer
François Lesueur, Lecturer and Researcher at INSA - examiner
Christine Morin, Senior Scientist at Inria - chair
Anaël Beaugnon, Machine Learning Scientist for Computer Security at ANSSI - examiner
Olivier Bettan, Head of Cyber Security R&D Lab at Thales - invited member