This paper studies the problem of low rank approximation of light fields for compression. A homography-based approximation method is proposed which jointly searches for the homographies that align the different views of the light field and for the low rank approximation matrices. We first consider a global homography per view and show that, depending on how much the disparity varies across views, a single global homography is not always sufficient to align the images over their whole extent. In a second step, we thus consider multiple homographies, one per region, the regions being extracted using depth information. We show the benefit of jointly optimizing the homographies together with the low rank approximation. The resulting compact representation is then compressed using HEVC, and the results are compared with those obtained by directly applying HEVC to the light field views re-structured as a video sequence.
In this work, we consider both synthetic light fields from the HCI synthetic datasets ("Buddha", "Butterfly" and "Still-Life": 9 × 9 views of 768 × 768 pixels) and real light fields captured by a first generation Lytro camera ("TotoroWaterfall" and "Beers": 11 × 11 × 379 × 379), a second generation Lytro Illum ("Fruits": 15 × 15 × 625 × 434, of which we consider only the 11 × 11 central views) and a Raytrix camera ("Watch": 9 × 9 × 992 × 628).
Homography-based low rank approximation
Our low rank approximation methods exploit data geometry for dimensionality reduction of light fields. An approximation method is proposed in which homographies and the rank approximation model are jointly optimized. A global homography per view is first considered to align each view on the central one. The homographies are searched in order to align linearly correlated sub-aperture images in such a way that the batch of views can be approximated by a low rank model. The rank constraint is expressed in a factored form where one matrix B contains k basis vectors and where the other one C contains weighting coefficients. The optimization hence proceeds by iteratively searching for the homographies and the factored model of the input set of sub-aperture images (views), which will minimize the approximation error.
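The factored form can be illustrated with a minimal sketch. It only covers the low rank step for views that are assumed to be already aligned, and it uses a truncated SVD as a stand-in solver; the homography search is not shown, and the function name and the toy data are hypothetical.

```python
import numpy as np

def low_rank_factorization(views, k):
    """Factor a stack of aligned views into B (basis) and C (coefficients).

    views: array of shape (n, h, w), one grayscale view per entry.
    Returns B of shape (h*w, k) and C of shape (k, n) with M ~= B @ C,
    where each column of M is one vectorized view.
    """
    n = views.shape[0]
    M = views.reshape(n, -1).T.astype(np.float64)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    B = U[:, :k] * s[:k]   # k basis vectors, scaled by the singular values
    C = Vt[:k, :]          # weighting coefficients, one column per view
    return B, C

# Toy data (assumption): 9 views that are linear mixes of 3 images,
# so a rank-3 model reproduces them almost exactly.
rng = np.random.default_rng(0)
basis = rng.random((3, 16, 16))
weights = rng.random((9, 3))
views = np.tensordot(weights, basis, axes=1)

B, C = low_rank_factorization(views, k=3)
approx = (B @ C).T.reshape(views.shape)
rel_err = np.linalg.norm(approx - views) / np.linalg.norm(views)
```

In the joint scheme described above, a step of this kind would alternate with an update of the homographies, so that the warped views become better approximated by the rank k model.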
Depth-based multiple homographies
When the disparity varies from one depth plane to another, performance can be improved by using multiple homographies (one homography per depth plane per view). The depth map is first normalized between 0 and 1. Depth planes are then obtained by uniformly quantizing the depth map with a series of quantization thresholds. Instead of blending the pixel values, we blend the homography matrices, which produces a nicer visual result in practice.
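The depth plane extraction can be sketched as follows. This is an illustration under assumptions: the depth map is a 2D array, the thresholds are equally spaced in [0, 1], and the function name is hypothetical. The blending of homography matrices is not shown.

```python
import numpy as np

def depth_planes(depth, q):
    """Normalize a depth map to [0, 1] and split it into q depth planes.

    Returns the normalized map and an integer label map with values
    in 0..q-1 (0 is the plane with the smallest depth values).
    """
    d = depth.astype(np.float64)
    span = d.max() - d.min()
    norm = (d - d.min()) / span if span > 0 else np.zeros_like(d)
    # q - 1 interior thresholds, uniformly spaced between 0 and 1.
    thresholds = np.linspace(0.0, 1.0, q + 1)[1:-1]
    labels = np.digitize(norm, thresholds)
    return norm, labels

# Toy depth map (assumption) with values between 0 and 10.
depth = np.array([[0.0, 2.0], [5.0, 10.0]])
norm, labels2 = depth_planes(depth, q=2)
_, labels4 = depth_planes(depth, q=4)
```

Each label would then select which of the q homographies of a view applies to a pixel.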
Still-Life: separation into 2 depth planes
Still-Life: separation into 3 depth planes
Buddha: separation into 2 depth planes
Buddha: separation into 3 depth planes
The columns of the matrix B are separately encoded using HEVC Intra coding. The following video shows the columns of B when HLRMA is applied to the light field "Buddha" with rank k = 15 and 2 homographies per image.
The matrix B must be compressed before it is transmitted to the receiver side. To reduce the impact of the compression (i.e. quantization) errors on the light field reconstruction, the matrix C is recalculated to account for these errors. The coefficients of C are then encoded using a scalar quantization on 16 bits and Huffman coding. The dimension of the matrix C is such that its coding rate is quite negligible. The following figures show the PSNR gain obtained when C is adapted to the compression artifacts of the matrix B.
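One natural way to recalculate C is a least-squares fit against the decoded basis. The sketch below assumes that approach; the text above does not state which solver is used. Coarse rounding stands in for HEVC compression artifacts, and all names are hypothetical.

```python
import numpy as np

def refit_coefficients(B_hat, M):
    """Least-squares coefficients C that minimize ||M - B_hat @ C||."""
    C, *_ = np.linalg.lstsq(B_hat, M, rcond=None)
    return C

# Toy data (assumption): an exactly rank-5 matrix of 9 vectorized views.
rng = np.random.default_rng(0)
B = rng.random((200, 5))
C = rng.random((5, 9))
M = B @ C

# Stand-in for the decoded basis: coarse rounding instead of HEVC.
B_hat = np.round(B * 8) / 8

err_old = np.linalg.norm(M - B_hat @ C)       # original C with degraded B
C_new = refit_coefficients(B_hat, M)
err_new = np.linalg.norm(M - B_hat @ C_new)   # C adapted to degraded B
```

By construction the refitted coefficients cannot give a larger reconstruction error than the original ones for the same decoded basis.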
The 8 × n coefficients (8 × n × q in the case of multiple homographies, with q the number of depth planes) of the homography matrix h are also encoded using a scalar quantization on 16 bits and Huffman coding. This cost is negligible.
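To give a sense of why this cost is small, the sketch below applies a uniform 16-bit scalar quantizer to 8 × n × q coefficients and counts the raw bits before entropy coding. The uniform min/max quantizer, the random coefficient values and the function names are assumptions; Huffman coding is not shown.

```python
import numpy as np

def quantize16(x):
    """Uniform scalar quantization of x onto 2**16 levels."""
    lo, hi = float(x.min()), float(x.max())
    step = (hi - lo) / 65535 if hi > lo else 1.0
    idx = np.round((x - lo) / step).astype(np.uint16)
    return idx, lo, step

def dequantize16(idx, lo, step):
    """Map 16-bit indices back to real values."""
    return lo + idx.astype(np.float64) * step

# Assumed setting: n = 81 views (9 x 9), q = 2 depth planes,
# 8 coefficients per homography, 768 x 768 pixels per view.
n, q = 81, 2
rng = np.random.default_rng(0)
h = rng.normal(size=(n, q, 8))

idx, lo, step = quantize16(h)
h_rec = dequantize16(idx, lo, step)
max_err = np.abs(h - h_rec).max()

raw_bits = 16 * idx.size                # upper bound before Huffman coding
bpp = raw_bits / (n * 768 * 768)        # bits per light field pixel
```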
In the case where multiple homographies are applied per view, the depth map is encoded using HEVC intra coding with QP = 32.
Single homography per view
The compression performance is assessed against that obtained by applying HEVC-based inter-coding to the sequence of images formed by extracting the sub-aperture images following a lozenge scan order starting at the central view.
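The exact scan is not specified here. As an illustration only, the sketch below orders the views by diamond-shaped rings around the central view (increasing Manhattan distance, then by angle within a ring). The tie-breaking rule and the function name are assumptions, so the order may differ from the one actually used.

```python
import math

def lozenge_scan(rows, cols):
    """Order view indices (row, col) by diamond rings around the center."""
    cr, cc = rows // 2, cols // 2

    def key(rc):
        dr, dc = rc[0] - cr, rc[1] - cc
        # Ring index first (Manhattan distance), then angle within the ring.
        return (abs(dr) + abs(dc), math.atan2(dr, dc))

    views = [(r, c) for r in range(rows) for c in range(cols)]
    return sorted(views, key=key)

order = lozenge_scan(9, 9)   # 9 x 9 views, central view at (4, 4)
```

The resulting list gives the order in which the sub-aperture images would be placed in the pseudo video sequence.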
Multiple homographies per view
Comparison original vs. compressed light fields
Left: the original light field. Right: the compressed light field with k = 30, q = 4 (number of depth planes), and HEVC-QP=2. The average PSNR over all views is 34 dB and the average bit rate is 0.64 bpp.
Left: the original light field. Middle: the compressed light field with k = 15, q = 2 and HEVC-QP=2. The average PSNR over all views is 43.3 dB and the average bit rate is 0.14 bpp. Right: the compressed light field with k = 15, q = 4 and HEVC-QP=2. With 4 depth planes, the average PSNR over all views is 44.1 dB and the average bit rate is 0.13 bpp.