Image compression with Stochastic Winner-Take-All Auto-Encoder

"Image compression with Stochastic Winner-Take-All Auto-Encoder", ICASSP 2017. [pdf]
contact: T. Dumas, A. Roumy, C. Guillemot

Abstract

This paper addresses the problem of image compression using sparse representations. We propose a variant of auto-encoder called Stochastic Winner-Take-All Auto-Encoder (SWTA AE). "Winner-Take-All" means that image patches compete with one another when computing their sparse representation, and "Stochastic" indicates that a stochastic hyperparameter rules this competition during training. Unlike standard auto-encoders, SWTA AE performs variable-rate compression of images of any size after a single training, which is fundamental for compression. For comparison, we also propose a variant of Orthogonal Matching Pursuit (OMP) called Winner-Take-All Orthogonal Matching Pursuit (WTA OMP). In terms of rate-distortion trade-off, SWTA AE outperforms standard auto-encoders but falls short of WTA OMP. Moreover, SWTA AE is competitive with JPEG in terms of rate-distortion.

Stochastic Winner-Take-All Auto-Encoder (SWTA AE)

Stochastic Winner-Take-All Auto-Encoder (SWTA AE) is a variant of auto-encoder. An auto-encoder is a neural network that takes an input and provides a reconstruction of this input. SWTA AE is fully convolutional so that it can compress images of any size. The central bottleneck of SWTA AE, denoted Z in the figure below, is further processed to produce the bitstream. Z contains 64 sparse feature maps, displayed in orange in the figure below, and one non-sparse feature map, displayed in red. The 64 sparse feature maps are computed via a Winner-Take-All (WTA) competition, and a WTA parameter controls the coding cost of Z. During training, the WTA parameter is varied stochastically so that, once trained, SWTA AE can compress at any rate.

SWTA AE architecture.
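
As a rough illustration of the WTA competition, the sketch below keeps only a global fraction of the largest-magnitude coefficients across the sparse feature maps, so that all spatial locations compete with one another. The function name, the tensor shapes and the sampling range of the WTA parameter are illustrative assumptions, not the exact specification from the paper.

```python
import numpy as np

def wta_sparsify(z, alpha):
    """Keep the fraction `alpha` of largest-magnitude coefficients in the
    sparse feature maps `z` (shape 64 x h x w) and zero out the rest.
    The threshold is global, so all spatial locations (i.e. image patches)
    compete with one another."""
    flat = np.abs(z).ravel()
    k = max(1, int(round(alpha * flat.size)))
    threshold = np.partition(flat, -k)[-k]   # k-th largest magnitude
    return z * (np.abs(z) >= threshold)

# During training, the WTA parameter would be drawn at random for each
# mini-batch, so that a single trained model covers a whole range of rates.
rng = np.random.default_rng(0)
alpha = rng.uniform(0.02, 0.2)               # hypothetical sampling range
z_sparse = wta_sparsify(rng.standard_normal((64, 64, 64)), alpha)
```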

Winner-Take-All Orthogonal Matching Pursuit (WTA OMP)

The WTA competition is adapted to Orthogonal Matching Pursuit (OMP), giving rise to Winner-Take-All Orthogonal Matching Pursuit (WTA OMP).
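
A minimal sketch of the idea, assuming the competition takes the form of a shared atom budget: instead of allotting a fixed number of atoms to each patch, all patches compete at every greedy step, and the (atom, patch) pair with the largest residual correlation wins. The function name and the exact selection rule are assumptions; the paper gives the precise formulation of WTA OMP.

```python
import numpy as np

def wta_omp(Y, D, n_total):
    """Sketch of WTA OMP. Y (d x P) holds P vectorized patches, D (d x K)
    is a dictionary with unit-norm atoms and `n_total` is a global budget
    of atom selections shared by all the patches."""
    K, P = D.shape[1], Y.shape[1]
    supports = [[] for _ in range(P)]
    codes = np.zeros((K, P))
    residuals = Y.copy()
    for _ in range(n_total):
        corr = np.abs(D.T @ residuals)              # K x P correlations
        atom, patch = np.unravel_index(corr.argmax(), corr.shape)
        supports[patch].append(atom)                # the winning patch gains an atom
        S = supports[patch]
        # Orthogonal projection of the winning patch onto its support.
        coef, *_ = np.linalg.lstsq(D[:, S], Y[:, patch], rcond=None)
        codes[:, patch] = 0.0
        codes[S, patch] = coef
        residuals[:, patch] = Y[:, patch] - D[:, S] @ coef
    return codes
```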

Training

The training set of luminance images is built from the ILSVRC2012 ImageNet dataset. SWTA AE learns its parameters on this training set. A dictionary for OMP and a dictionary for WTA OMP are learned using K-SVD and our proposed dictionary learning algorithm for WTA OMP, respectively.
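
As a hedged sketch of what the alternation could look like, the snippet below couples the `wta_omp` sketch above with a method-of-optimal-directions (MOD) style least-squares dictionary update; the update rule is an illustrative stand-in, not necessarily the one used in the paper.

```python
import numpy as np

def learn_dictionary_wta_omp(Y, K, n_total, n_iter=50, seed=0):
    """Alternate between sparse coding with WTA OMP and a closed-form
    least-squares refit of the dictionary."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], K))
    D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
    for _ in range(n_iter):
        X = wta_omp(Y, D, n_total)                  # sparse coding step
        D = Y @ np.linalg.pinv(X)                   # MOD-style update
        D /= np.linalg.norm(D, axis=0) + 1e-12      # re-normalize atoms
    return D
```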

Visualization of the bottleneck feature maps of SWTA AE after training.

Non-sparse feature map in the bottleneck of SWTA AE when SWTA AE is fed with LENA luminance 512x512.


Ten sparse feature maps in the bottleneck of SWTA AE when SWTA AE is fed with LENA luminance 512x512.

LENA is flipped before being fed to SWTA AE, which is why the feature maps of Z appear flipped. SWTA AE learns to store a subsampled version of its input image in the non-sparse feature map of Z. The non-zero coefficients in the sparse feature maps of Z appear to cluster around the spatial locations of the most complex textures in the input image.
The above feature maps are displayed 4 times larger than their real size.

Visualization of the learned dictionaries.

Dictionary of 1024 8x8 atoms (K-SVD).
Dictionary of 1024 8x8 atoms (dictionary learning algorithm for WTA OMP).

The two dictionaries look different: the dictionary learned with the proposed algorithm for WTA OMP contains more diverse high-frequency atoms. This supports the usefulness of the proposed dictionary learning algorithm.

Image compression experiment

Comparison of rate-distortion curves.

We compare the rate-distortion curves of OMP, WTA OMP, SWTA AE, the non-sparse Auto-Encoder (AE), JPEG and JPEG2000 on four standard luminance images. Note that each point on the rate-distortion curve of AE requires re-training AE with a different number of feature maps in its bottleneck, whereas SWTA AE covers all rates after a single training. A sketch of the distortion and rate measures is given after the four curves below.

LENA luminance 512x512
BARBARA luminance 480x384
MANDRILL luminance 512x512
PEPPERS luminance 512x512
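
For reference, the distortion and rate measures reported below are the standard ones; a minimal sketch, assuming 8-bit luminance images and a bitstream whose size in bits is known:

```python
import numpy as np

def psnr(original, reconstruction):
    """PSNR in dB between two 8-bit luminance images."""
    mse = np.mean((original.astype(np.float64) - reconstruction) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def rate_bpp(n_bits, height, width):
    """Rate in bits per pixel: bitstream size over the number of pixels."""
    return n_bits / (height * width)
```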

Visualization of image reconstruction.

LENA luminance 512x512

JPEG2000 (PSNR = 39.00 dB, rate = 0.69 bpp)
WTA OMP (PSNR = 37.99 dB, rate = 0.66 bpp)

SWTA AE (PSNR = 37.04 dB, rate = 0.64 bpp)
JPEG (PSNR = 37.31 dB, rate = 0.65 bpp)

LENA luminance 1st crop
JPEG2000
WTA OMP
SWTA AE
JPEG

LENA luminance 2nd crop
JPEG2000
WTA OMP
SWTA AE
JPEG

LENA luminance 3rd crop
JPEG2000
WTA OMP
SWTA AE
JPEG

MANDRILL luminance 512x512

JPEG2000 (PSNR = 27.91 dB, rate = 0.70 bpp)
WTA OMP (PSNR = 26.55 dB, rate = 0.70 bpp)

SWTA AE (PSNR = 25.35 dB, rate = 0.63 bpp)
JPEG (PSNR = 26.17 dB, rate = 0.62 bpp)

MANDRILL 1st crop
JPEG2000
WTA OMP
SWTA AE
JPEG

MANDRILL 2nd crop
JPEG2000
WTA OMP
SWTA AE
JPEG

MANDRILL 3rd crop
JPEG2000
WTA OMP
SWTA AE
JPEG
The above crops are displayed 3 times larger than their real size.

Complexity analysis

The complexity analysis is in the following note. [pdf]