Reliability and Security of Large Foundation Models

Published on Wed 10 Jan 2024 - 17:42

Team

TARAN

Team website

https://team.inria.fr/taran/

Thesis start date (if known)

09/2024

Location

INRIA Rennes

Research unit

IRISA - UMR 6074

Thesis topic description

Large Foundation Models (LFMs) are a cutting-edge technology for natural language processing, object detection and segmentation, and audio and multimodal processing, often outperforming other available machine learning techniques. LFMs such as OpenAI GPT-4, Google ViT, and Meta LLaMA have gained public attention for their unprecedented accuracy. Given this superior performance, LFMs are being deployed in safety-critical and mission-critical applications, including space exploration [1] and self-driving cars [2]. Improving the security and reliability of LFMs is therefore crucial to enable dependable real-time safety-critical systems. Large, complex accelerators such as Graphics Processing Units (GPUs) are well suited to deploying LFMs in these applications. However, GPUs integrated into safety-critical systems must meet specific constraints, including real-time execution and high classification/detection accuracy, even in harsh environments [3, 4]. It is imperative to evaluate whether these critical requirements still hold when undesirable events, such as radiation-induced faults and electromagnetic hardware attacks, disrupt correct hardware execution and alter the expected outputs of the LFMs. This Ph.D. aims to identify hardware and software vulnerabilities in LFM-based systems and to propose error mitigation techniques.
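The fault model described above can be made concrete with a small software sketch. Assuming a single-bit upset in a weight stored as an IEEE 754 float32 (a common simplification in reliability studies, not necessarily the exact fault model used in this thesis), the hypothetical helper `flip_bit_float32` below inverts one bit of the encoding, and a toy three-input multiply-accumulate `dot` shows how the corruption reaches the output. All names and values are illustrative assumptions.

```python
import struct

# Hypothetical illustration; not the project's fault-injection tooling.

def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value`, encoded as IEEE 754 float32, with bit `bit` inverted.

    Bit 0 is the least significant mantissa bit, bits 23-30 are the exponent,
    and bit 31 is the sign; this mimics a single-bit upset in a stored weight.
    """
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in 0..31")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped

def dot(weights, inputs):
    """Toy stand-in for one neuron's multiply-accumulate."""
    return sum(w * x for w, x in zip(weights, inputs))

weights = [0.5, -0.25, 0.125]
inputs = [1.0, 2.0, 4.0]
golden = dot(weights, inputs)

# Flip one bit of the first weight and compare the output with the fault-free one.
for bit in (0, 22, 30, 31):
    faulty = [flip_bit_float32(weights[0], bit)] + weights[1:]
    print(f"bit {bit:2d}: output {dot(faulty, inputs):.6g} (golden {golden})")
```

With these values, flipping the lowest mantissa bit (bit 0) only moves the weight from 0.5 to 0.5 + 2^-24, whereas flipping the most significant exponent bit (bit 30) turns it into 2^127, which then dominates the output. This asymmetry is one reason a fault propagation analysis is useful: not every bit is equally critical.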

The Ph.D. student will characterize how radiation-induced faults and electromagnetic hardware attacks affect the reliability and security of vision, language-processing, and multimodal LFMs running on GPUs. These experimental results will be combined with software-level simulation data to identify effective hardening solutions. Because standard fault tolerance techniques may introduce unacceptable overhead, the Ph.D. student will develop new fault tolerance approaches tailored to LFMs. To this end, we will conduct a comprehensive fault propagation analysis to propose efficient and effective hardening methods.
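Fault propagation analysis and low-overhead hardening can both be illustrated at toy scale. The sketch below runs an exhaustive single-bit-flip campaign over the weights of one ReLU neuron and compares the worst-case output error with and without range restriction, a hardening idea from the neural-network reliability literature that clamps values to bounds profiled on fault-free runs. It is only an illustration under stated assumptions (float32 weights, one fault per run, hypothetical helper names and data); the thesis is not stated to adopt this particular technique.

```python
import math
import struct

# Hypothetical illustration: a one-neuron fault-injection campaign with and
# without range restriction. Not the project's method or tooling.

def flip_bit_float32(value: float, bit: int) -> float:
    """Invert bit `bit` (0..31) of the float32 encoding of `value`."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (out,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return out

def neuron(weights, inputs, bounds=None):
    """Multiply-accumulate followed by ReLU.

    If `bounds` is given, the pre-activation is clamped to [lo, hi]
    (range restriction); a NaN is first replaced by 0.0 because it
    would otherwise slip through the min/max comparisons.
    """
    acc = sum(w * x for w, x in zip(weights, inputs))
    if bounds is not None:
        lo, hi = bounds
        if math.isnan(acc):
            acc = 0.0
        acc = min(max(acc, lo), hi)
    return max(acc, 0.0)

def profile_bounds(weights, input_set):
    """Record the min/max pre-activation over fault-free runs."""
    accs = [sum(w * x for w, x in zip(weights, xs)) for xs in input_set]
    return min(accs), max(accs)

def worst_case_error(weights, inputs, bounds=None):
    """Flip every bit of every weight, one fault per run, and return the
    largest absolute output error (a NaN error counts as infinite)."""
    golden = neuron(weights, inputs, bounds)
    worst = 0.0
    for i, w in enumerate(weights):
        for bit in range(32):
            faulty = list(weights)
            faulty[i] = flip_bit_float32(w, bit)
            err = abs(neuron(faulty, inputs, bounds) - golden)
            worst = max(worst, math.inf if math.isnan(err) else err)
    return worst

weights = [0.5, -0.25, 0.125]
profiling_inputs = [[1.0, 2.0, 4.0], [2.0, 1.0, 0.0], [0.0, 0.0, 8.0]]
bounds = profile_bounds(weights, profiling_inputs)
test_input = [1.0, 2.0, 4.0]

worst_unbounded = worst_case_error(weights, test_input)
worst_bounded = worst_case_error(weights, test_input, bounds)
print(f"bounds={bounds} worst unprotected={worst_unbounded:.3g} "
      f"worst protected={worst_bounded:.3g}")
```

The design trade-off is visible even here: range restriction costs only a couple of comparisons per protected value, whereas duplication-based schemes roughly double the computation, but it bounds the error rather than detecting it. The protected output can still be wrong by up to the width of the profiled interval, and bounds profiled on a few inputs may not cover every legitimate input.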

Bibliography

[1] J. Jakubik et al., "Foundation Models for Generalist Geospatial Artificial Intelligence," preprint, 2023.

[2] Y. Fang et al., "EVA-02: A Visual Representation for Neon Genesis," preprint, 2023.

[3] J. Perez-Cerrolaza et al., "GPU Devices for Safety-Critical Systems: A Survey," ACM Computing Surveys, 2023.

[4] F. F. d. Santos et al., "Analyzing and Increasing the Reliability of Convolutional Neural Networks on GPUs," IEEE Transactions on Reliability, 2019.

Thesis supervisors

Kritikakou, Angeliki

Supervision role

Thesis director

Research unit

INRIA

Department

D3 - Architecture

Team

TARAN

Fernandes dos Santos, Fernando

Supervision role

Co-supervisor

Research unit

INRIA

Department

D3 - Architecture

Team

TARAN

Contact(s)

Name

Fernandes dos Santos, Fernando

fernando.fernandes-dos-santos@inria.fr

Keywords

large foundation models, reliability, GPUs, machine learning, Transformers, radiation-induced faults