contact: J. Aghaei Mazaheri, C. Guillemot, C. Labit
A sparse representation approximates a signal as a linear combination of only a few columns, known as atoms, from a dictionary. The goal here is to learn a dictionary adapted to satellite images. Moreover, this dictionary is structured so that it can be used for image compression, offering a good trade-off between approximation error and coding cost. Several dictionary structures are tested and compared to a "flat" dictionary learned with K-SVD.
These structured dictionaries globally contain more atoms than a "flat" dictionary, but thanks to their structure composed of small dictionaries, they offer efficient coding of the indices of the selected atoms and a limited atom-search complexity. Moreover, they are scalable in sparsity, i.e. adapted to several sparsity values. The dictionary structures are learned with a top-down approach: each dictionary is learned on a subset of the residuals of the previous level [3], using the K-SVD algorithm [4] (with the sparsity set to 1 atom). After learning, the dictionaries can be used for sparse representation: one atom is selected per level with a matching pursuit algorithm (MP, OMP, ...).
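The per-level decomposition described above can be sketched as a greedy matching pursuit that picks one atom per level and descends on the residual. This is a minimal illustration, not the authors' implementation: the function name `decompose_per_level` is hypothetical, the levels are modeled as a plain list of unit-norm dictionaries, and the tree/adaptive branching logic is omitted.

```python
import numpy as np

def decompose_per_level(signal, level_dicts):
    """Greedy matching-pursuit sketch: select one atom per level.

    level_dicts: list of (d, K) arrays, one dictionary per level,
                 each column being a unit-norm atom.
    Returns the selected atom index per level, the coefficients,
    and the final residual.
    """
    residual = np.asarray(signal, dtype=float).copy()
    indices, coeffs = [], []
    for D in level_dicts:
        corr = D.T @ residual                 # correlation with every atom
        k = int(np.argmax(np.abs(corr)))      # best-matching atom at this level
        c = corr[k]                           # projection coefficient (unit-norm atoms)
        residual = residual - c * D[:, k]     # descend on the residual
        indices.append(k)
        coeffs.append(c)
    return indices, np.array(coeffs), residual
```

Each level can only reduce the residual energy, which is why adding levels trades extra index bits for a lower approximation error.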
Fig. 1 - Tree Structure: "Tree K-SVD". |
Fig. 2 - Adaptive Structure. |
Different dictionary structures are compared, using the same algorithm, K-SVD [4], to learn each dictionary in the structures. The Adaptive Structure is compared to the Tree Structure "Tree K-SVD", to a branch structure "Branch K-SVD" composed of only one dictionary per level, to a "flat" dictionary learned with K-SVD, and to the (over)complete DCT dictionary. The dictionary structures are learned on 13 diverse satellite images. The tests are performed on the image "New York" (Fig. 3) (2400x2400). In the context of compression, we do not intend to compare the methods with the same total number of atoms, but with an equivalent coding cost for each atom index, and also with a comparable decomposition complexity. So for the "flat" dictionaries, K is the total number of atoms, whereas for the structured dictionaries, K is the number of atoms per dictionary in the structure.
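The "equivalent coding cost per atom index" criterion can be made concrete with a short calculation. This is an illustrative sketch under the fixed-length index coding mentioned below (function names are hypothetical): a flat dictionary of K total atoms costs ceil(log2(K)) bits per selected atom, while a structured dictionary with K atoms per small dictionary costs the same per level, so comparing both at the same K equalizes the index cost.

```python
import math

def index_bits_flat(k_total, n_atoms):
    """Fixed-length index cost for n_atoms chosen from a flat dictionary."""
    return n_atoms * math.ceil(math.log2(k_total))

def index_bits_structured(k_per_dict, n_levels):
    """One atom per level, each chosen among k_per_dict atoms."""
    return n_levels * math.ceil(math.log2(k_per_dict))

# Example: 4 atoms from a flat K=256 dictionary vs 4 levels with K=256 each.
print(index_bits_flat(256, 4))        # 32 bits
print(index_bits_structured(256, 4))  # 32 bits
```

With matched K, the structured dictionaries spend the same index budget while drawing from far more atoms in total.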
In Fig. 3 and Fig. 4, the test picture is reconstructed for several sparsity values using the learned dictionaries, for K=64 and K=256.
For the coding experiments (Fig. 5-6), the learned Adaptive Structure is used with a rate-distortion criterion applied to select a variable number of atoms per block [5]. The DC values of the blocks are coded with DPCM and a learned Huffman table. The coefficients are quantized with a uniform scalar quantizer with a dead zone and are coded with learned Huffman tables. The indices are coded with a fixed-length code. The CCSDS coder (Fig. 6) is a standard for coding spatial images. It is close to JPEG 2000, also using a 9/7 wavelet transform, but with a trade-off between performance and complexity.
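A dead-zone uniform scalar quantizer like the one used for the coefficients can be sketched as follows. This is a generic textbook form, not the exact quantizer of the codec: the step size and the mid-point reconstruction rule are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def deadzone_quantize(coeffs, step):
    """Uniform scalar quantizer with a dead zone of width 2*step around zero."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.floor(np.abs(coeffs) / step)

def deadzone_dequantize(levels, step):
    """Mid-point reconstruction; the dead-zone level 0 maps back to 0."""
    levels = np.asarray(levels, dtype=float)
    return np.sign(levels) * (np.abs(levels) + 0.5) * step * (levels != 0)
```

The wide zero bin kills small coefficients cheaply, which is where most of the rate saving comes from; the surviving levels are then entropy-coded with the learned Huffman tables.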
A visual comparison (Fig. 7-9) shows that the Adaptive Structure codec, efficient at representing structured textures, reconstructs an image without blur, though with some blocking artifacts visible in homogeneous areas.
Fig. 3 - Sparse representations for K=64. |
Fig. 4 - Sparse representations for K=256. |
Fig. 5 - Rate distortion results for K=256 (1). |
Fig. 6 - Rate distortion results for K=256 (2). |
Fig. 7 - Part of "New York" (801x801).
Fig. 8 - CCSDS: 0.7 bpp, 30.53 dB. |
Fig. 9 - Adaptive Structure: 0.7 bpp, 30.76 dB. |