Towards Efficient Foundation Models for VHR Satellite Images

Thesis start date (if known)
Septembre/Octobre 2024
Location
UBS (Université Bretagne Sud) / IRISA (UMR 6074), Vannes campus, 56000 Vannes, France
Research unit
IRISA - UMR 6074
Thesis subject description

Context:
Remote sensing imagery for Earth observation (EO) has emerged as a dynamic research area, enabling precise identification, characterization, and interpretation of objects and materials on the Earth’s surface. The ongoing progress in satellite technology has led to the availability of numerous very-high-resolution (VHR) optical satellites, facilitating daily acquisitions. This enables the creation of highly detailed maps with sub-meter spatial resolution, benefiting various essential EO applications such as urban planning, swift disaster mapping, natural resource management, and wildlife monitoring.

In recent years, deep learning (DL) has found success in various machine learning and computer vision domains, including remote sensing (RS). Despite this, applying DL to real-world scenarios using VHR satellite images for operational purposes faces numerous challenges. The foremost challenge involves the difficulty of annotating domain-specific data, particularly in EO applications that demand expert knowledge. Generating precise and comprehensive labeled datasets for training deep models is a time-consuming and expensive endeavor. For instance, in rapid disaster mapping, acquiring accurate labels is nearly infeasible due to the infrequent and unique nature of catastrophic events.
Another challenge comes from the significant domain shifts inherent in RS data, arising from diverse sensor characteristics (e.g., spatial resolution and spectral bands) and varied acquisition conditions. Lastly, the exploding size of deep models, with millions (or even billions) of parameters, raises concerns: not only do they demand substantial computational and storage resources, but they also have a negative environmental impact.

Therefore, designing efficient models that maintain high accuracy is essential in every EO application, both to reduce energy costs and, more importantly, to minimize environmental impact. Such models should be reusable or transferable with few resources. Recent studies have shown that self-supervised pretraining on unlabeled RS images outperforms popular ImageNet-pretrained models on EO downstream tasks, especially when labels are scarce [1, 2]. By leveraging self-supervised learning (SSL) on abundant multi-source unlabeled data, foundation models (FMs) have emerged that deliver high performance across a wide range of downstream tasks. However, current FM efforts in EO focus mainly on building large vision FMs trained on massive multi-source imagery (RingMo [3], billion-scale ViTs [4]) or large vision-language FMs (RemoteCLIP [5]). These models demand significant computational resources for training and deployment. Developing resource-efficient foundation models for EO is therefore imperative to mitigate environmental concerns.
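To make the self-supervised pretraining idea concrete, here is a minimal NumPy sketch of the contrastive InfoNCE (NT-Xent) objective used by SimCLR-style pretraining, where two augmented views of the same image form a positive pair and all other images in the batch act as negatives. The function name and implementation are illustrative assumptions, not code from the cited works.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.5):
    """InfoNCE (NT-Xent) loss between two batches of embeddings.

    z1, z2: arrays of shape (N, d) where z1[i] and z2[i] embed two
    augmented views of the same image (a positive pair).
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)                 # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # L2-normalise
    sim = z @ z.T / temperature                          # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                       # exclude self-similarity
    # The positive for row i is row i+N (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

In practice the embeddings would come from a deep encoder applied to augmented VHR image patches; the loss pulls the two views of each patch together while pushing all other patches in the batch apart, which is what lets pretraining proceed without any labels.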

Subject:
This PhD topic aims to develop efficient foundation models for EO applications using VHR satellite imagery. The main objectives are threefold (see the details in the attached file or at the link below).
http://www-obelix.irisa.fr/files/2024/02/2024_PhD_IRISA_CNES_Temo.pdf
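Since knowledge distillation is one of the keywords of this topic and a standard route to resource-efficient models, the following sketch shows the classic soft-label distillation loss of Hinton et al. (2015), in which a compact student mimics the temperature-softened output distribution of a large teacher. The NumPy formulation and function names are illustrative assumptions, not part of the attached proposal.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label knowledge distillation loss: KL divergence between
    temperature-softened teacher and student output distributions,
    scaled by T^2 so gradient magnitudes stay comparable across T."""
    p = softmax(teacher_logits / T)        # teacher soft targets
    q = softmax(student_logits / T)        # student soft predictions
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1)
    return (T ** 2) * kl.mean()
```

Training the student on this loss (usually mixed with the ordinary cross-entropy on ground-truth labels when some are available) is one way a large EO foundation model could be compressed into a model deployable with low compute.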

Bibliographie

[1] Berg, P., Pham, M. T., & Courty, N. (2022). Self-supervised learning for scene classification in remote sensing: Current state of the art and perspectives. Remote Sensing, 14(16), 3995.

[2] Wang, Y., Albrecht, C. M., Ait Ali Braham, N., Mou, L., & Zhu, X. X. (2022). Self-supervised Learning in Remote Sensing: A Review. IEEE Geoscience and Remote Sensing Magazine (GRSM).

[3] Sun, X., Wang, P., Lu, W., Zhu, Z., Lu, X., He, Q., ... & Fu, K. (2022). RingMo: A remote sensing foundation model with masked image modeling. IEEE Transactions on Geoscience and Remote Sensing.

[4] Cha, K., Seo, J., & Lee, T. (2023). A billion-scale foundation model for remote sensing images. arXiv preprint arXiv:2304.05215.

[5] Liu, F., Chen, D., Guan, Z., Zhou, X., Zhu, J., & Zhou, J. (2023). RemoteCLIP: A Vision Language Foundation Model for Remote Sensing. arXiv preprint arXiv:2306.11029.

[6] Mai, G., Huang, W., Sun, J., Song, S., Mishra, D., Liu, N., ... & Lao, N. (2023). On the opportunities and challenges of foundation models for geospatial artificial intelligence. arXiv preprint arXiv:2304.06798.

[7] Xiong, Z., Wang, Y., Zhang, F., & Zhu, X. X. (2024). One for All: Toward Unified Foundation Models for Earth Vision. arXiv preprint arXiv:2401.07527.

Thesis supervisors

Name, First name
Sébastien Lefèvre
Supervision role
Thesis director
Research unit
IRISA

Name, First name
Minh-Tan Pham
Supervision role
Co-supervisor
Research unit
IRISA
Contact(s)
Keywords
Foundation models, self-supervised learning, knowledge distillation, VHR images