Clustering-based methods for fast epitome generation

contact: M. Alain, C. Guillemot, D. Thoreau, P. Guillotel

Context and Goal

The concept of epitome was first introduced by Jojic et al. as the condensed representation (meaning its size is only a fraction of the original size) of an image containing the essence of its the textural properties. This original epitomic model is based on a patch-based probabilistic approach, and has different applications in segmentation, denoising, recognition, indexing or texture synthesis.
Several epitomic models have been since proposed, such as the factorized representation of Wang et al. dedicated to texture mapping, or its extension designed for image coding purposes by Chérigui et al. The epitome is in this case the union of epitome charts which are pieces of repeatable textures found in the image. The search for self-similar or repeatable texture patterns, based on the KLT or a block matching (BM) algorithm, is known to be memory and time consuming.
In this work, we propose a clustering-based technique to reduce the self-similarities search complexity.


Epitome generation

We summarize here the epitome generation process as proposed by Chérigui et al, in which the input image \(I\) is factored in an epitome \(E\) and an assignation map \(\phi\). The input image is divided into a regular grid of non-overlapping blocks \(B_i\) (block-grid) and each block will be reconstructed from an epitome patch. The epitome itself is composed of disjoint texture pieces called "epitome chart" (see Fig. 1). The assignation map links the patches from the epitome to the input image blocks. For more detailed explanations, see the papers listed at the end of this page.

Epitome Reconstructed image
34 % of input image PSNR = 35.53 dB
Foreman epitome Foreman reconstructed image

Fig. 1 - Epitome of a Foreman frame (left) and the corresponding reconstruction (right).

Exhaustive self-similarities search

The method first proceeds by finding the self-similarities among the image. The goal of this step is to find for each non-overlapping block in the input image a list of matching patches such that the mean square error (MSE) between a block and its matching patches is below a matching threshold \(\varepsilon_M\). This parameter eventually determines the size of the epitome. The matching patches are found through exhaustive search in the pixel-grid (see Fig. 2), and the assignation map \(\phi\) is composed of translation parameters. $$ \forall B_i \in I, ML(B_i) = \{M_{i,0},M_{i,1},\dots\}, s.t. \forall j, MSE(B_i,M_{i,j}) \leq \varepsilon_M $$

Exhaustive self-similarities search. Fig. 2 - Exhaustive self-similarities search.

Epitome generation

The epitome charts are generated based on an iterative process. candidate patches used to initialize or grow the epitome charts are found by reversing the lists of matching patches obtained at the previous step. The selection of the best candidate is then performed based on a rate-distortion optimization (RDO) criterion, which minimizes the error between the input image \(I\) and the image \(I'\) as well as the size of the epitome. This criterion is summarized in the equation below, where \(R(E)\) represents the size of the epitome estimated as the number of pixels. $$ (E, \phi) = argmin(MSE(I,I') + \lambda R(E))$$

Epitome chart extension with inferred blocks. Fig. 3 - Iterative epitome chart extension.

Clustering-based methods for fast self-similarities search

We propose two methods to reduce the complexity of the self-similarities search compared to the exhaustive search. The main idea is to group together similar non-overlapping blocks. The lists of matchnig patches are then computed for the group of blocks instead of each block separately. The similarity for grouping non-overlapping blocks is characterized by an assignation threshold \(\varepsilon_A\), which is smaller than \(\varepsilon_M\). Thus we define \(\varepsilon_A\) via a coefficient \(\alpha_A\) such that \(\varepsilon_A = \alpha_A * \varepsilon_M, 0 \leq \alpha_A < 1\).

List-based method

This method was designed to obtain a simple grouping of similar non-overlapping blocks and follows the steps below:

Finally for each actual list \(AL(B_i)\) we determine a match list \(ML(B_i) = \{M_{i,0}, M_{i,1}, \dots\}\) obtained through an exhaustive search in the pixel-grid. The list is computed with respect to \(B_i\) but is then used by all blocks in \(AL(B_i)\) for the epitome generation step. To satisfy the constraint that all the blocks have a reconstruction error inferior to \(\varepsilon_M\), the blocks different from \(B_i\) in the list \(AL(B_i)\) only use a subset of \(ML(B_i)\) defined as: $$ \{M \in ML(B_i) | D(B_i,M) \leq \varepsilon_M - \varepsilon_A\}$$

Threshold-based clustering

This method proceeds as follows:

For each cluster \(C_i\) a match list \(ML(B_i)\) is then computed through an exhaustive search in the pixel-grid, with respect to the block \(B_i\) closest to the centroid. Except for the block \(B_i\), all blocks in \(C_i\) will use approximate matches. As for the previous grouping method, we want to ensure that all blocks have a reconstruction error inferior to \(\varepsilon_M\). The blocks different from \(B_i\) thus only use a subset of \(ML(B_i)\), as defined in in the equation above.

Two steps clusterring-based self-similarities search. Fig. 4 - Two steps clusterring-based self-similarities search.

Experimental Results

Experiments were conducted on a set of 4 images: a frame extracted from the Foreman sequence (CIF), Lena (\(512 \times 512\)), City (\(1280 \times 720\)) and Calendar (\(1280 \times 720\)). The size of the blocks is set to \(8 \times 8\).
First tests were carried out with \(\varepsilon_M = \{3.0, 5.0, 10.0, 15.0\}\). The parameter \(\alpha_A\) was here set to \(0.5\). Results are displayed in Table 1. Best results between list-based or cluster-based methods are displayed in bold font. The complexity reduction is assessed by the percentage of the optimized memory occupation over the original one, and the optimized method processing time over the original one. Two processing times are displayed: the self-similarities search time, which is the algorithm step actually optimized, and the complete epitome generation time.
The memory occupation is prohibitive for images City and Calendar when \(\varepsilon_M = 15.00\) with the original method. This shows a very important limitation of the full search method for high resolution images. Because of this, comparative results can not be displayed, but the epitome can be still generated when using optimized methods.

Table 1 - Memory occupation and computation time % with respect to original method, depending on \(\varepsilon_M\), with \(\alpha_A = 0.5\)
Image\(\varepsilon_M\)List-based methodCluster-based method
Memory load (%)Search time (%)Total time (%)Memory load (%)Search time (%)Total time (%)
Foreman 3.0 50.1868.6273.9347.8567.3170.15
5.0 49.5460.7260.1245.2255.8061.88
Lena 3.0 98.1875.5478.9296.8169.0570.13
5.0 63.6161.0065.2448.2856.1266.19
City 3.0 60.5466.7565.2755.8568.6269.25
5.0 60.7164.16564.16056.4762.4862.80
Calendar3.0 84.6668.3666.4480.6970.3668.02
5.0 48.4354.7658.8342.0853.2559.94

The quality of the epitome produced is assessed by the reconstructed image quality. The graphs representing the reconstructed image PSNR as a function of the epitome size are displayed in Fig. 5. For all images, whatever the epitome generation method, the curves are almost identical. The two proposed methods can thus reduce complexity while keeping the same epitome quality.

source image compressed image
kmeans clustering oracle clustering

Fig. 5 - Reconstruction performances depending on \(\varepsilon_M\).

Complexity results for different values of \(\alpha_A\) are displayed in Table 2, with \(\varepsilon_M = 10.00\). These results illustrates the behavior of the proposed methods, which can reduce drastically the complexity when allowing important approximations. However, when evaluating average reconstruction performances (see Fig. 6), we can see that it has a negative impact, as it can increase the epitome size while decreasing the reconstruction PSNR.

Table 2 - Memory occupation and computation time % with respect to original method, depending on \(\alpha_A\), with \(\varepsilon_M = 10.0\)
Image\(\alpha_A\)List-based methodCluster-based method
Memory load (%)Search time (%)Total time (%)Memory load (%)Search time (%)Total time (%)
Foreman 0.2570.6653.1761.5167.8651.4258.41
0.75 9.5821.9424.74 7.3219.9646.74
Lena 0.2579.9960.9372.7370.1356.4165.69
0.75 9.0323.2726.5715.9125.4853.73
City 0.2586.3064.0667.5484.0963.3066.99

Alternative two step schemes with first pass blind de-noising Fig. 6 - Reconstruction performances depending on \(\alpha_A\).


The concept of epitome presented here was used in different applications such as still image coding and scalable image coding.