HELENOS - Resisting the Massive proliferation of new Android malware threats

Published on Tue 30/01/2024 - 16:38

Team

WIDE

Team website

https://team.inria.fr/wide/

Thesis start date (if known)

October

Location

Rennes

Research unit

IRISA - UMR 6074

Description of the thesis topic

Context. Android is now the most widely used operating system, with 863.3 million applications and
more than 50,000 new submissions a month. Estimates indicate that more than 75 billion applications
were downloaded on the platform in 2016. Owing to this widespread popularity, the Android
platform has become a lucrative target for hackers
and one of the platforms of choice for propagating malware. The infection
rate on Android devices is constantly increasing, driven by a dramatic proliferation of malware.
There is currently no satisfactory solution to stop the proliferation of malware on Android devices.
This malware constitutes a severe threat to any business. It may interrupt and disable applications, retrieve
and spoof personal information and identities, access sensitive information, take control of all applications
running on a user’s device, and even overcharge users for functionality that is widely available.

Previous work. Over the last decades, a large number of classification models based on either machine learning or deep learning have been proposed to increase the accuracy of malware detection [1, 2, 3].
In previous work, different test beds have emerged to evaluate the true robustness of anti-malware scanners,
through the design of benchmarks and key indicators and the use of innovative measurement
techniques [4, 5]. In particular, to automate the robustness evaluation of anti-malware scanners against
the steady increase of malware variants, toolchains that generate Android adversarial examples (AAEs)
have demonstrated that existing scanners are largely unable to detect sophisticated
malware that has mutated. In other words, current classification models have difficulty detecting
AAEs.

Objective. Our objective is first to evaluate the true robustness of anti-malware scanners. Since we have
at our disposal adequate benchmarks and an evaluation methodology, the objective of this thesis is to
study the implementation of new defense techniques and to evaluate them.
Transformers, a category of attention-based deep learning models, have gained substantial
recognition in recent years and have been shown to be very effective in various domains such as language
modeling, machine translation, audio classification, computer vision, object detection, semantic
segmentation, and image classification with Vision Transformer (ViT) models [6, 7].
Accordingly, the purpose of this project is to study how to apply Transformers to the specific case
of Android malware detection. Current anti-malware scanners are built around traditional machine
learning (ML) algorithms [8, 9]. They usually require intermediate feature extraction phases to
characterize the malware, for instance via static code analysis. In particular, deep expertise in
both software engineering and Android code development is required to extract adequate features
to feed to the machine learning models.
Transformers have proven to be particularly effective at automatic feature learning
from structured and unstructured data in various domains, potentially paving the way for a
substantial improvement in the robustness of scanners. There are different paths to explore: should
we apply a Transformer model to the source code of an application, to its bytecode assembly,
or directly to its binary [10, 11, 12]? Another avenue could be to project the code of an
application onto an image and then analyze this image [13].
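
As a rough illustration of the image-projection idea, the sketch below lays out the raw bytes of an application file as a grayscale grid, one byte per pixel. It is only one possible projection under stated assumptions, not the method of [13] or of this project: the function name, the row width and the zero padding are illustrative choices, and the sample bytes are a stand-in for a real file.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Lay out raw bytes row by row as a 2-D grid of 8-bit pixel intensities.

    `width` is an arbitrary illustrative choice; the last row is zero-padded.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padding = (-len(data)) % width  # bytes missing to complete the last row
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Hypothetical stand-in for the bytes of a real application file.
sample = b"dex\n035\x00" + bytes(range(10))
image = bytes_to_grayscale(sample, width=8)

for row in image:
    print(row)
```

Such a grid could then be resized and split into fixed-size patches before being given to a Vision Transformer; that step is left out here.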

Bibliography

[1] T. Huang and H. Kao. “R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based
AndroiD Malware Detections”. In: 2018 IEEE International Conference on Big Data (Big
Data). Los Alamitos, CA, USA: IEEE Computer Society, Dec. 2018, pp. 2633-2642.
doi: 10.1109/BigData.2018.8622324.
url: https://doi.ieeecomputersociety.org/10.1109/BigData.2018.8622324.
[2] Lok Kwong Yan and Heng Yin. “DroidScope: Seamlessly Reconstructing the OS and Dalvik
Semantic Views for Dynamic Android Malware Analysis”. In: 21st USENIX Security Symposium
(USENIX Security 12). Bellevue, WA: USENIX Association, Aug. 2012, pp. 569-584.
isbn: 978-931971-95-9.
url: https://www.usenix.org/conference/usenixsecurity12/technical-sessions/presentation/yan.
[3] Feargus Pendlebury et al. “TESSERACT: Eliminating Experimental Bias in Malware
Classification across Space and Time”. In: 28th USENIX Security Symposium (USENIX Security 19).
Santa Clara, CA: USENIX Association, Aug. 2019, pp. 729-746. isbn: 978-1-939133-06-9. url:
https://www.usenix.org/conference/usenixsecurity19/presentation/pendleb….
[4] Yizheng Chen, Zhoujie Ding and David Wagner. “Continuous Learning for Android Malware
Detection”. In: 32nd USENIX Security Symposium (USENIX Security 23). Anaheim, CA:
USENIX Association, Aug. 2023, pp. 1127-1144. isbn: 978-1-939133-37-3.
url: https://www.usenix.org/conference/usenixsecurity23/presentation/chen-yizheng.
[5] Yun Shen, Pierre-Antoine Vervier and Gianluca Stringhini. “A Large-scale Temporal
Measurement of Android Malicious Apps: Persistence, Migration, and Lessons Learned”. In: 31st
USENIX Security Symposium (USENIX Security 22). Boston, MA: USENIX Association, Aug.
2022, pp. 1167-1184. isbn: 978-1-939133-31-1.
url: https://www.usenix.org/conference/usenixsecurity22/presentation/shen-yun.
[6] Gyeong-In Yu et al. “Orca: A Distributed Serving System for Transformer-Based Generative
Models”. In: 16th USENIX Symposium on Operating Systems Design and Implementation,
OSDI 2022, Carlsbad, CA, USA, July 11-13, 2022. Edited by Marcos K. Aguilera and
Hakim Weatherspoon. USENIX Association, 2022, pp. 521-538.
url: https://www.usenix.org/conference/osdi22/presentation/yu.
[7] Ashish Vaswani et al. “Attention is All you Need”. In: Advances in Neural Information
Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December
4-9, 2017, Long Beach, CA, USA. Edited by Isabelle Guyon et al. 2017, pp. 5998-6008. url:
https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1…-Abstract.html.
[8] Niall McLaughlin et al. “Deep Android Malware Detection”. In: Proceedings of the Seventh
ACM Conference on Data and Application Security and Privacy, CODASPY 2017, Scottsdale,
AZ, USA, March 22-24, 2017. Edited by Gail-Joon Ahn, Alexander Pretschner and
Gabriel Ghinita. ACM, 2017, pp. 301-308. doi: 10.1145/3029806.3029823.
url: https://doi.org/10.1145/3029806.3029823.
[9] Junyang Qiu et al. “A Survey of Android Malware Detection with Deep Neural Models”. In:
vol. 53, no. 6. New York, NY, USA: Association for Computing Machinery, Dec. 2020.
doi: 10.1145/3417978. url: https://doi.org/10.1145/3417978.
[10] Akshara Ravi, Vivek Chaturvedi and Muhammad Shafique. “ViT4Mal: Lightweight Vision
Transformer for Malware Detection on Edge Devices”. In: vol. 22, no. 5s. New York, NY, USA:
Association for Computing Machinery, Sept. 2023. doi: 10.1145/3609112.
url: https://doi.org/10.1145/3609112.
[11] Kexin Pei et al. “XDA: Accurate, Robust Disassembly with Transfer Learning”. In: 28th
Annual Network and Distributed System Security Symposium, NDSS 2021, virtually, February
21-25, 2021. The Internet Society, 2021.
url: https://www.ndss-symposium.org/ndss-paper/xda-accurate-robust-disassembly-with-transfer-learning/.
[12] Sheng Yu et al. “DeepDi: Learning a Relational Graph Convolutional Network Model on
Instructions for Fast and Accurate Disassembly”. In: 31st USENIX Security Symposium (USENIX
Security 22). Boston, MA: USENIX Association, Aug. 2022, pp. 2709-2725. isbn: 978-1-939133-31-1.
url: https://www.usenix.org/conference/usenixsecurity22/presentation/yu-sheng.
[13] A. Cortesi. Binvis. http://Binvis.io/. 2021.