DEPTH MAPS EXTRACTION FROM MULTI-VIEW VIDEOS

Extraction of depth information from multi-view content

G. Sourimant

contact: G. Sourimant, C. Guillemot

Context and Goal

In future 3DTV broadcasting schemes, many display types will be available, and the acquired videos will not be usable as is to provide a sensation of relief. We thus generally seek to add 3D information to the videos by associating depth maps with the content.
We present here results of our depth map extraction method, which is based on Werlberger's optical flow algorithm (see references). These results can be compared to those given by the DERS (Depth Estimation Reference Software) provided by the MPEG 3DV standardization group.

Approach

The procedure to compute depth maps from multi-view content using Werlberger's optical flow algorithm is fairly simple. Unlike the DERS, our method computes the disparity and/or depth maps for all views in a single pass. Its core uses the CUDA-based library (see http://www.gpu4vision.org) developed alongside Werlberger's work. The framework of our method is depicted in Figure 1. On the one hand, disparities are computed from the left views to the right views (Figure 1, second row). On the other hand, they are estimated from right to left (Figure 1, third row). The interest of this two-sided computation is the ability to locate occlusion zones where the motion field would be incorrect (a pixel visible in one view is not visible in the other). A cross check is then performed to detect outlier pixels in each computed disparity map, and the two maps are finally combined (Figure 1, fourth row) by taking the minimal value of each disparity pair, which avoids the foreground fattening effect exhibited by window-based algorithms.
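The sketch below illustrates this two-pass scheme on a rectified left/right pair: disparities are estimated in both directions, a cross check flags inconsistent (occluded) pixels, and the pair is combined by keeping the minimal disparity magnitude. It is only a minimal sketch: Werlberger's GPU optical flow library is replaced by OpenCV's Farneback flow as a stand-in, and the tolerance `tol` and the zero-filling of occluded pixels are illustrative choices, not part of the original method.

    # Minimal sketch of the two-pass disparity estimation with cross check.
    # Inputs are rectified 8-bit grayscale views of identical size.
    import cv2
    import numpy as np

    def horizontal_flow(src_gray, dst_gray):
        """Horizontal component of the dense optical flow from src to dst
        (stand-in for Werlberger's GPU-based flow)."""
        flow = cv2.calcOpticalFlowFarneback(src_gray, dst_gray, None,
                                            0.5, 5, 21, 3, 7, 1.5, 0)
        return flow[..., 0]

    def disparity_with_cross_check(left_gray, right_gray, tol=1.0):
        h, w = left_gray.shape
        d_lr = horizontal_flow(left_gray, right_gray)   # left -> right displacement
        d_rl = horizontal_flow(right_gray, left_gray)   # right -> left displacement

        # Follow each left pixel into the right view and read back d_rl there.
        xs = np.arange(w, dtype=np.float32)[None, :].repeat(h, axis=0)
        ys = np.arange(h, dtype=np.float32)[:, None].repeat(w, axis=1)
        map_x = np.clip(xs + d_lr, 0, w - 1).astype(np.float32)
        d_rl_warped = cv2.remap(d_rl.astype(np.float32), map_x, ys, cv2.INTER_LINEAR)

        # Cross check: a consistent pixel satisfies d_rl(x + d_lr(x)) ~ -d_lr(x).
        occluded = np.abs(d_lr + d_rl_warped) > tol

        # Combine the two estimates by keeping the minimal disparity magnitude,
        # which limits the foreground fattening effect.
        disparity = np.minimum(np.abs(d_lr), np.abs(d_rl_warped))
        disparity[occluded] = 0.0   # occlusions left for later in-filling
        return disparity, occluded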

Fig. 1 - Principle of our multi-view depth map extraction method
(Video Balloons, property of Nagoya University).

Experimental Results

Visual comparison with DERS results

We present in Figure 2 disparity maps extracted with our method for the sequence Newspaper, together with the original images and the disparity maps computed with the DERS and provided to the MPEG community. Each row in the figure corresponds to one of the original views: the original images are in the first column, the disparity maps computed with our method in the second, and the DERS-based maps in the third. Overall, the disparities estimated with our approach appear perceptually more consistent with the scene. More such results are available in the associated technical report (see references at the bottom of the page).

Fig. 2 - Visual comparison with DERS results
(Video Newspaper, property of Gwangju Institute of Science and Technology).

View Synthesis Results

In Figures 3, 4 and 5, we present an evaluation of our disparity maps in terms of virtual view synthesis quality. The evaluation protocol is the one used by the MPEG community: disparity maps are computed for views N-1 and N+1 and used as input to the VSRS (View Synthesis Reference Software) in order to synthesize view N, which is then compared to the original view N in terms of PSNR, spatial PSPNR and temporal PSPNR, with the PSPNR tool provided to the MPEG community. For each sequence and each of these three measures, we plot the quality measure against the video frame number. The red curve corresponds to disparity maps generated by the DERS. The blue and black curves correspond to our method without (Mv2mvd F2) and with (Mv2mvd F3) the symmetry constraint (see Werlberger's paper), respectively. The dashed horizontal lines correspond to the mean values over the whole sequence.
We also present below a video of synthesized views generated using the VSRS.
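As a minimal sketch of the objective part of this protocol, the snippet below computes the per-frame PSNR curve between the captured view N and the view synthesized by the VSRS, together with its mean (the dashed line in the plots). The view synthesis itself and the spatial/temporal PSPNR values come from the MPEG tools and are not reproduced here; the function names are illustrative.

    # Hedged sketch of the PSNR evaluation between original and synthesized views.
    import numpy as np

    def psnr(reference, synthesized, peak=255.0):
        """PSNR in dB between two 8-bit frames of identical size."""
        diff = reference.astype(np.float64) - synthesized.astype(np.float64)
        mse = np.mean(diff ** 2)
        return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

    def evaluate_sequence(reference_frames, synthesized_frames):
        """Per-frame PSNR curve and its mean over the sequence."""
        curve = [psnr(ref, syn) for ref, syn in zip(reference_frames, synthesized_frames)]
        return curve, sum(curve) / len(curve)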

[Plots for sequence Book Arrival: PSNR | spatial PSPNR | temporal PSPNR]

Fig. 3 - Sequence Book Arrival (property of HHI, Fraunhofer Institut für Nachrichtentechnik)

[Plots for sequence Newspaper: PSNR | spatial PSPNR | temporal PSPNR]

Fig. 4 - Sequence Newspaper (property of Gwangju Institute of Science and Technology)

[Plots for sequence Lovebird1: PSNR | spatial PSPNR | temporal PSPNR]

Fig. 5 - Sequence Lovebird1 (property of Electronics and Telecommunications Research Institute)

Comparison with Depth Camera Acquisition

In Figure 6, we show visual differences between our optical-flow-based disparities and disparities deduced from a depth camera (z-cam) acquisition of the scene. These results are presented for the central view of the five input views of the Cafe sequence. Keeping in mind that these z-cam-based disparities are not raw and have been interpolated to fit the full video resolution, it is worth noting that our method competes very well with, and sometimes outperforms, the depth sensor. For instance, depth contours appear sharper with our method (all sub-images), and we are even able to retrieve small depth details with much less noise (bottom right sub-image, on the chair part). However, for uniform zones with depth gradients, disparities are better estimated with the z-cam (see the furthest table for instance): our method discretizes the resulting signal too heavily there, while (again) retrieving depth contours better.

Fig. 6 - Comparison with depth camera acquisition
(Video Cafe, property of Gwangju Institute of Science and Technology).

References
