Extreme compression of image/video databases using GAN-based synthesis

Published on
Team
Thesis start date (if known)
October 1, 2021
Location
Rennes
Research unit
IRISA - UMR 6074
Description of the thesis topic

Every day, 2.5 quintillion bytes of data are produced. Every minute, 500 hours of video are uploaded to YouTube and 147,000 pictures to Facebook. There is clearly an urgent need for coding algorithms with drastic compression ratios. Unfortunately, current compression formats are limited by the fact that they still aim at minimizing a distortion criterion. In other words, they try to represent images and videos with the minimum number of bits while ensuring that the decoded version is as close as possible to the input. A second, promising path is now being explored with the advent of powerful learning methods. In such approaches, an image is instead described abstractly and then partly or fully synthesized at the decoder. For example, the coder proposed in [1] reaches one tenth of the rate of classical methods while attaining the optimal perception trade-off derived in [2]. This approach has, however, been developed only in the context of single-image compression.
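
To make the contrast between the two criteria concrete, they can be written side by side. This is a sketch in standard notation, not taken from the proposal itself: X is the source image, \hat{X} its decoded version, Y the degraded or compressed representation available to the decoder, \Delta a distortion measure, and d a divergence between distributions. All of these symbols are assumptions introduced here. The first line is the classical rate-distortion function; the second is the perception-distortion function in the form studied in [2].

```latex
\begin{align}
R(D) &= \min_{p_{\hat{X}|X}} I(X;\hat{X})
        \quad \text{s.t.} \quad \mathbb{E}\big[\Delta(X,\hat{X})\big] \le D, \\
P(D) &= \min_{p_{\hat{X}|Y}} d\big(p_X, p_{\hat{X}}\big)
        \quad \text{s.t.} \quad \mathbb{E}\big[\Delta(X,\hat{X})\big] \le D.
\end{align}
```

A codec driven only by the first criterion has no term that keeps the distribution of decoded images close to that of natural images, which is the gap that synthesis at the decoder is meant to fill.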

In this thesis, we propose to study how such approaches could be extended to the compression of large image and video databases. This would enable even more drastic compression gains by exploiting the inter-item correlation in the semantic domain (i.e., the feature space). Some items may even be removed [4] or completely resynthesized.

This raises several important research problems. First, the method of [1] should be tailored to videos and to multiple input items. Second, efficient algorithms should be studied for the compaction of the latent space (i.e., removing inter-item correlation). This latent space compression may, however, introduce errors in the latent representation of the database. The latent space should thus be shaped to take this error into account so that synthesis quality remains high [3]. Finally, the impact of the database compression on the perceived information loss should be properly estimated. In other words, the joint information carried by multiple images/videos should be modeled at the semantic level [4].
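
As a toy illustration of the second problem, the sketch below compacts a set of latent vectors by projecting them onto a shared low-rank basis and quantizing the coefficients, then measures the resulting error in the latent domain. It is only a minimal sketch under stated assumptions: the latent vectors are synthetic, the SVD projection and uniform quantizer are stand-ins chosen for simplicity rather than the algorithms the thesis would develop, and the names `compact_latents`, `reconstruct_latents`, `rank`, and `step` are hypothetical.

```python
import numpy as np


def compact_latents(latents, rank, step):
    """Project latents onto a shared low-rank basis and quantize the coefficients."""
    mean = latents.mean(axis=0)
    centered = latents - mean
    # The right singular vectors form an orthonormal basis that captures
    # the correlation between items of the database.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]                               # shape (rank, dim)
    coeffs = centered @ basis.T                     # shape (n_items, rank)
    quantized = np.round(coeffs / step).astype(np.int64)  # uniform scalar quantizer
    return mean, basis, quantized


def reconstruct_latents(mean, basis, quantized, step):
    """Invert the compaction; the result differs from the input latents."""
    return mean + (quantized * step) @ basis


def latent_mse(latents, rank, step):
    """Mean squared error introduced in the latent domain by the compaction."""
    mean, basis, quantized = compact_latents(latents, rank, step)
    recon = reconstruct_latents(mean, basis, quantized, step)
    return float(np.mean((latents - recon) ** 2))


# Synthetic database (assumption): 200 items whose 64-dimensional latents
# lie close to a 4-dimensional subspace, i.e. they are strongly correlated.
rng = np.random.default_rng(0)
n_items, dim, true_rank = 200, 64, 4
factors = rng.normal(size=(n_items, true_rank))
mixing = rng.normal(size=(true_rank, dim))
latents = factors @ mixing + 0.01 * rng.normal(size=(n_items, dim))

mean, basis, quantized = compact_latents(latents, rank=4, step=0.05)
recon = reconstruct_latents(mean, basis, quantized, step=0.05)

mse_rank4 = latent_mse(latents, rank=4, step=0.05)
mse_rank1 = latent_mse(latents, rank=1, step=0.05)
print(f"latent MSE, rank 4: {mse_rank4:.6f}")
print(f"latent MSE, rank 1: {mse_rank1:.6f}")
```

Each item is then stored as `rank` integers instead of `dim` floating-point values, at the cost of an error in its latent representation. That error is what a decoder would have to tolerate when synthesizing, which is why the text above argues for shaping the latent space with it in mind.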

Bibliography

[1] Agustsson, E., Tschannen, M., Mentzer, F., Timofte, R., & Van Gool, L. (2019). Generative adversarial networks for extreme learned image compression. In Proceedings of the IEEE International Conference on Computer Vision (pp. 221-231).

[2] Blau, Y., & Michaeli, T. (2018). The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6228-6237).

[3] Makhzani, A., et al. (2015). Adversarial autoencoders.

[4] Maugey, T., & Toni, L. (2020). Large database compression based on perceived information. IEEE Signal Processing Letters, vol. 27, pp. 1735-1739.

Thesis supervisors

Last name, First name
Thomas MAUGEY
Type of supervision
Thesis director
Research unit
UMR 6074
Contact(s)
Keywords
Image and video processing, compression, GAN