Structured dictionary learning for sparse representations and image coding

contact: J. Aghaei Mazaheri, C. Guillemot, C. Labit

Context and Goal

Sparse representations consist in representing a signal as a linear combination of only a few columns, known as atoms, of a dictionary. The goal here is to learn a dictionary adapted to satellite images. Moreover, this dictionary is structured so that it can be used for image compression, offering a good trade-off between approximation error and coding cost. Several dictionary structures are tested and compared to a "flat" dictionary learned with K-SVD.


These structured dictionaries globally contain more atoms than a "flat" dictionary, but thanks to their structure composed of small dictionaries, they offer efficient coding of the indices of the selected atoms and limited complexity for the atom search. Moreover, they are scalable in sparsity, i.e. adapted to several sparsity values. The dictionary structures are learned with a top-down approach: each dictionary is learned on a subset of the residuals of the previous level [3], with the K-SVD algorithm [4] (with the sparsity set to 1 atom). After learning, the dictionaries can be used for sparse representation: one atom is selected per level with a matching pursuit algorithm (MP, OMP...).
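The per-level decomposition can be sketched as follows. This is a minimal illustration using plain MP; the nested-dict layout of the tree is a hypothetical data structure for the sketch, not the authors' implementation:

```python
import numpy as np

def tree_mp(x, root, depth):
    """One matching-pursuit step per level of a structured dictionary.

    Each node is a dict {"D": n x K array of unit-norm atoms,
    "children": list of K child nodes (empty at the last level)}.
    This node layout is hypothetical, for illustration only.
    """
    residual = np.asarray(x, dtype=float).copy()
    node = root
    path, coeffs = [], []
    for _ in range(depth):
        D = node["D"]
        corr = D.T @ residual                 # correlations with the residual
        k = int(np.argmax(np.abs(corr)))      # best atom at this level
        path.append(k)
        coeffs.append(float(corr[k]))
        residual = residual - corr[k] * D[:, k]
        if node["children"]:                  # descend into the child dictionary
            node = node["children"][k]
    return path, coeffs, residual
```

Only the sequence of atom indices (`path`) and the corresponding coefficients need to be coded: with K atoms per dictionary, each index costs log2(K) bits whatever the level.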

  • The Tree Structure [1] (Fig. 1) consists of one dictionary of K atoms at the first level. Each atom leads to a dictionary at the second level, which is thus composed of K dictionaries, and so on.
  • The Adaptive Structure [2] (Fig. 2) can be seen as a tree structure whose branches are progressively merged into a single general branch (the "merged" branch composed of the Dm_i dictionaries). This structure thus improves on the tree structure, which is composed of many incomplete dictionaries and is prone to overfitting after a few levels.

    Fig. 1 - Tree Structure: "Tree K-SVD".

    Fig. 2 - Adaptive Structure.
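The atom-count growth of the full tree, which the merged branch is designed to avoid, can be made concrete with a quick count. The function below is purely illustrative:

```python
import math

def tree_stats(K, levels):
    """Number of dictionaries, total atom count, and per-level index cost
    for a full tree of K-atom dictionaries (illustrative count)."""
    dicts = sum(K ** (l - 1) for l in range(1, levels + 1))  # 1 + K + K^2 + ...
    atoms = K * dicts                                        # K atoms per dictionary
    bits_per_level = math.log2(K)                            # fixed-length index code
    return dicts, atoms, bits_per_level
```

For K=64 and 3 levels this already gives 4161 dictionaries (266304 atoms), yet each selected atom index still costs only 6 bits: the tree multiplies the atoms without increasing the index coding cost, at the price of many incomplete, overfitted dictionaries at deep levels.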

    Experimental Results

    Different dictionary structures are compared, using the same learning algorithm, K-SVD [4], for each dictionary in the structures. The Adaptive Structure is compared to the Tree Structure "Tree K-SVD", to a branch structure "Branch K-SVD" composed of only one dictionary per level, to a "flat" dictionary learned with K-SVD, and to the (over)complete DCT dictionary. The dictionary structures are learned on 13 different satellite images. The tests are performed on the picture "New York" (Fig. 3) (2400x2400). In the context of compression, we do not compare the methods with the same total number of atoms, but with an equivalent coding cost for each atom index and a comparable decomposition complexity. So for the "flat" dictionaries, K is the total number of atoms, whereas for the structured dictionaries, K is the number of atoms per dictionary in the structure.
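For the flat baselines, the decompositions at a given sparsity are typically computed with OMP. A minimal sketch, not the exact implementation used in the experiments:

```python
import numpy as np

def omp(x, D, sparsity):
    """Orthogonal Matching Pursuit on a flat dictionary D (unit-norm columns)."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    for _ in range(sparsity):
        k = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if k not in support:
            support.append(k)
        # Re-fit all selected atoms jointly (the "orthogonal" step).
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    return support, coeffs, residual
```

With a flat dictionary of K atoms, each of the `sparsity` indices costs log2(K) bits, which is why a flat dictionary with K atoms in total is compared to structures with K atoms per dictionary.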

    In Fig. 3 and Fig. 4, the test picture is reconstructed for several sparsity values using the learned dictionaries, for K=64 and K=256.

    For the coding experiments (Fig. 5-6), the learned Adaptive Structure is used with a rate-distortion criterion to select a variable number of atoms per block [5]. The DC values of the blocks are coded with DPCM and a learned Huffman table. The coefficients are quantized with a uniform scalar quantizer with a dead zone and coded with learned Huffman tables. The indices are coded with a fixed-length code. The CCSDS coder (Fig. 6) is a standard for coding spatial images. It is close to JPEG 2000, also using a 9/7 wavelet transform, but trades off performance against complexity.
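The dead-zone quantizer can be sketched as follows; the step size and reconstruction rule are illustrative choices, not the exact parameters of the codec:

```python
import numpy as np

def deadzone_quantize(c, step):
    """Uniform scalar quantizer with a dead zone: the floor-based index rule
    makes the zero bin (-step, step), twice as wide as the other bins."""
    c = np.asarray(c, dtype=float)
    return (np.sign(c) * np.floor(np.abs(c) / step)).astype(int)

def dequantize(q, step):
    """Reconstruct each nonzero coefficient at the middle of its bin
    (zero stays zero, since sign(0) = 0)."""
    q = np.asarray(q, dtype=float)
    return np.sign(q) * (np.abs(q) + 0.5) * step
```

The widened zero bin sets small coefficients to zero, which concentrates the symbol distribution around zero and makes the subsequent Huffman coding of the quantized coefficients more efficient.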

    A visual comparison (Fig. 7-9) shows that the Adaptive Structure codec, which is efficient at representing structured textures, reconstructs an image without blur, but with some blocking artifacts visible in homogeneous areas.


    Fig. 3 - Sparse representations for K=64.

    Fig. 4 - Sparse representations for K=256.

    Fig. 5 - Rate distortion results for K=256 (1).

    Fig. 6 - Rate distortion results for K=256 (2).

    Fig. 7 - Part of "New York" (801x801).


    Fig. 8 - CCSDS: 0.7 bpp, 30.53 dB.

    Fig. 9 - Adaptive Structure: 0.7 bpp, 30.76 dB.