# Source coding

## Scientific foundations: Rate-distortion theory

Key words: OPTA limit (Optimum Performance Theoretically Attainable), Rate allocation, Rate-Distortion optimization, lossy coding, joint source-channel coding multiple description coding, channel modelization, oversampled frame expansions, error correcting codes.

Coding and joint source channel coding rely on fundamental concepts of information theory, such as notions of entropy, memoryless or correlated sources, of channel capacity, or on rate-distortion performance bounds. Compression algorithms are defined to be as close as possible to the optimal rate-distortion bound, R(D), for a given signal.

The source coding theorem establishes performance bounds for lossless and lossy coding. In lossless coding, the lower rate bound is given by the entropy of the source. In lossy coding, the bound is given by the rate-distortion function R(D). This function gives the minimum quantity of information needed to represent a given signal under the constraint of a given distortion. The rate-distortion bound is usually called OPTA (Optimum Performance Theoretically Attainable). It is usually difficult to find close-form expressions for the function R(D), except for specific cases such as Gaussian sources. For real signals, this function is defined as the convex-hull of all feasible (rate, distortion) points. The problem of finding the rate-distortion function on this convex hull then becomes a rate-distortion minimization problem which, by using a Lagrangian formulation, can be expressed as

dJ/dQ =0 where J=R+lD with l > 0

The Lagrangian cost function J is derivated with respect to the different optimisation parameters, e.g. with respect to coding parameters such as quantization factors. The parameter is then tuned in order to find the targeted rate-distortion point.

When the problem is to optimise the end-to-end Quality of Service (QoS) of a communication system, the rate-distortion metrics must in addition take into account channel properties and channel coding. Joint source-channel coding optimisation allows to improve the tradeoff between compression efficiency and robustness to channel noise.

## Application domains

The field of video compression has known, during the last decade, a significant evolution leading to the emergence of a large number of international standards (MPEG-4, H.264). Notwithstanding this already large number of solutions, compression remains a widely-sought capability especially for audiovisual communications over wired or wireless IP networks, often characterized by limited bandwidth, and is a natural application framework of many TEMICS developments. The advent of these delivery infrastructures has given momentum to extensive work aiming at optimized end-to-end QoS (Quality of Service). This encompasses low rate compression capability but also capability for adapting the compressed streams to varying network conditions. In particular, fine grain scalable (FGS) coding solutions making use of mesh-representations and/or spatio-temporal frame expansions are developed in order to allow for rate adaptation to varying network bandwidth in streaming scenarios with pre-encoded streams.

### Multimedia communication

The emergence of networks such as 2.5G, 3G networks and ADSL but also of new terminal devices, e.g. handhelds, advanced mobile phones should create a propitious, yet challenging, ground for the development of advanced multimedia services. Networked multimedia is indeed expected to play a key role in the development of 3G and beyond 3G (i.e. all IP-based) networks, by leveraging higher available bandwidth, all-IP based ubiquitous and seamless service provisioning across heterogeneous infrastructures, and new capabilities of rich-featured terminal devices.

However, all-IP based ubiquitous and seamless multimedia service provisioning across heterogeneous infrastructures, presenting a number of challenges beyond existing networking and source coding capabilities, is only a vision so far. In particular, networked multimedia will have to face problems of transmission of large quantities of information with delay constraints on heterogeneous, time-varying communication environments with a non-guaranteed quality of service (QoS). End-to-end QoS provisioning, all the most challenging in a global mobility context is of the utmost importance for consumer acceptance and adoption. It is now a common understanding that QoS provisioning for multimedia applications such as video or audio does require a loosening and a re-thinking of the end-to-end and layer separation principle. These trends are exemplified within 3GPP and the IETF by the UDP-lite and ROHC initiatives. In that context, the joint source-channel coding paradigm sets the foundations for the design of efficient practical solutions to the above application challenges, that we address via different industrial (with Thomson Multimedia, France Telecom), national (RNRT-VIP, RNRT-COSOCATI) and European (IST-OZONE) partnerships.

## New results

### Oriented wavelet transforms for efficient contour representation in scalable image and video coding

Contributed by: Vivien Chappelier, Christine Guillemot, Slavica Marinkovic.

Wavelets are well-known mathematical tools for representing 1-D signals with a finite number of discontinuities with a small number of coefficients. However, for images modelled as homogeneous regions delimitated by contours, curve discontinuities are not fully captured by separable wavelets. In image compression applications, high energy coefficients cluster around the edges and most of the bitrate is spent to code the contours. Thus, new transforms (e.g., curvelets and contourlets) have been designed to better take into account - and capture - geometrical patterns present in images.

Curvelets and contourlets are implemented with filter banks with directional selectivity in the high frequencies, so that the resulting coefficients represent oriented portions of edges instead of points. Their main advantage is that they do not require a geometric model of the image. The counterpart is that discrete implementations of curvelet transforms are currently highly redundant, which limits their interest for compression applications.

The bandlets follow a different approach as they use a geometric model to describe the discontinuities of the image (parametrized curves or regularity flow) and wrap wavelets along these discontinuities. Though theoretically more efficient than curvelets, the main problem is to optimise the bitrate allocation to the geometric description and to the wavelet coefficients.

A hybrid transform based on both contourlets and wavelets has first been designed. A projection technique (based on the POCS - projection on convex sets - technique) has also been coupled with the quantization applied in the redundant transform domain in order to minimize the distortion introduced by quantization. This modified contourlet transform gives better non-linear approximation results compared to wavelets when used for compressing images with directional features. This transform being redundant, we have then designed a new critically sampled transform based on wavelet lifting locally oriented according to multiresolution image geometry information. The orientation is restricted to a binary information per wavelet coefficient (horizontal/vertical for even levels, diagonal/antidiagonal for odd levels) so as to minimize the orientation map coding cost. In a first approach Markov random fields have been used to regularize the map and further reduce its coding cost (Fig.1). However, this model is very dense, hence its coding cost remains too high. A quad-tree structure has then been used to describe the geometry of the image leading to an efficient representation and a simpler rate-distortion optimization (see Fig.2). This transform exposes comparable energy compaction properties as bandlets for a significantly reduced complexity. For example, using stationnary entropy of the subbands as a measure of the bit rate, the oriented wavelet transform achieves a 1.25dB gain in PSNR at 0.3bpp for the reconstruction of the Barbara image compared to wavelets (Fig.3). In order to assess the efficiency of this new transform in a real compression system, we are currently adapting a context-based arithmetic encoder with contexts adapted to the transform properties as well as considering other entropy coding techniques.

### Fine grain scalable video coding based on motion-compensated temporal filtering

Contributed by: Vincent Bottreau, Christine Guillemot, Thomas Guionnet.

In 2002 and 2003, TEMICS has developed a scalable video coder/decoder, called WAVIX, based on motion-compensated spatio-temporal wavelet transforms. The algorithm and the software have been the starting point in the preparation of a joint Thomson/INRIA proposal to ISO in reply to the call for proposal initiating the specification of a scalable video coding standard (in MPEG-21/SVC). The objectives of the call were very challenging since the coder was supposed to generate a unique bitstream embedding resolutions and rates going from QCIF at 7.5 Hz and 48 Kbits/s to standard definition at 30Hz and a few Mbits/s, with a number of intermediate operation points in terms of spatial and temporal resolutions and in terms of bit rates. Spatial wavelet transforms used in the proposal as well as in competing solutions, designed for compression purposes only, were leading to aliasing artifacts in the embedded lower resolution signals. We have worked on the design and validation of a spatial transform based on three lifting steps. This transform has been the object of a joint proposal INRIA/UIC/Thomson to ISO. Another critical aspect in scalable video compression, especially at low rate and low spatial resolutions, is the bit rate needed to encode the motion fields which can potentially be high. TEMICS has worked on a scalable motion representation procedure.

### Distributed source coding

Contributed by: Christine Guillemot, Khaled Lajnef.

Distributed source coding (DSC) is a general framework which applies to highly correlated signals that are coded separately and decoded jointly. This framework applies to sensor networks but also to video compression. In the latter application, the motivation is to reduce the complexity of the encoder, at the expense of an increase of complexity of the decoder. From a theoretical point of view, DSC finds its foundations in the Slepian-Wolf theorem established in 1973. The Slepian-Wolf theorem states that for dependent binary sources Y and Z, the error decoding probability is close to zero for rates such that RY> H(Y|Z), RZ > H(Z|Y), RY+RZ > H(Y,Z) .
This theorem has been extended to continuous-valued Gaussian sources by Wyner and Ziv in 1976. They have shown that for two correlated Gaussian sources Y and Z, if Z is available at the decoder, the rate-distortion performance obtained for the encoding of Y is the same whether the encoder knows the realization of Z or not. The results of Slepian and Wolf on one hand and of Wyner and Ziv on the other hand provide asymptotic bounds to lossless and lossy distributed coding of two correlated sources. They do not induce practical solutions for coder/decoder design.

Most practical DSC solutions are derived from channel coding. The statistical dependence between the two sources is modelled as a virtual correlation channel analogous to binary symmetric channels or additive white Gaussian noise (AWGN) channels. The source Y (called the side information) is thus regarded as a noisy version of X (called the main signal). Practical solutions based on channel codes like block and convolutional codes, turbo codes and LDPC have been designed mostly for the encoding of two sources.

Practical video compression schemes applying the DSC paradigm in a pixel or transform-domain (making use of classical de-correlating transform such as the DCT) have been reported. Key frames (in general every two frames) are coded using intraframe coding. The remaining frames (the Wyner-Ziv coded frames) are then coded separately but decoded conditionally to the side information which can be generated by interpolation of previously decoded frames. In other words, the Wyner-Ziv coded frames are intraframe coded and interframe decoded. One can then understand easily that the encoding is orders of magnitude less complex than the motion-compensated hybrid predictive coding. The first results show rate-distortion performance superior to that of intraframe coding, but there is still a gap with respect to conventional motion-compensated interframe coding.

In an attempt to reduce the performance gap with respect to conventional motion-compensated interframe coding, we have worked on the derivation of performance bounds and on the design of a DSC system with three correlated sources. The structure of dependencies considered and the resulting coding system can be regarded as the DSC counterpart of bidirectional or multiple reference predictive coding which brought a significant performance gain in classical video compression systems. We have extended the rate performances bounds for both binary and Gaussian correlation models. We have designed a coding/decoding system based on punctured turbo codes. The performances with theoretical sources evidence the benefits in terms of compression efficiency. The next step is the validation in a real distributed video coding system.

## Software

### WAVIX video codec

Contributed by: Christine Guillemot, Laurent Guillo, Thomas Guionnet, Cecile Marc.

The software WAVIX (Wavelet based Video Coder with Scalability) is a low rate fine grain scalable video codec based on a motion compensated 2D+t wavelet analysis. In order to code the spatio-temporal subbands, the first release used the EBCOT algorithm provided by the Verification Model JPEG2000. That release has been registered at the Agency for the Protection of Programmes (APP) under the number IDDN.FR.001.160015.000.S.P.2003.000.20100 and then used by Thomson as part of a partnership.

Wavix now takes advantage of three main improvements. First, the JPEG2000 library is replaced by JasPer, which performs better. Then, the movement and texture information have been embedded in a single bitstream. That makes transmission easier. Finally, header protections and soft decoding techniques for quasi-arithmetic codes are now included. These techniques improve the quality of the decoded video when the video is transmitted over a noisy channel. Moreover, the decoding time can be reduced if WAVIX knows what parts of the bitstream are corrupted or not. This last information can be provided by a specific transport layer such as UDP-Lite.

### MOVIQS: Video communication platform with QoS support

Contributed by: Laurent Guillo, Cecile Marc.

With the support of several contracts (RNRT-VISI, IST-SONG), TEMICS has developed a video communication platform. This latter provides a test bed allowing to study and assess, in a realistic way, algorithms implementing joint source channel coding, video modelling or video coding. The software MOVIQS (module pour de la vidéo sur Internet avec qualité de service) is one of the platform component. It is a dynamic link library used by a video streaming server and the related clients. They can take advantage of three of its main mechanisms: video transport in both unicast and multicast mode, congestion control and rate regulation, and loss control. The release 1.0 of the software MOVIQS has been registered at the Agency for the Protection of Programmes (APP) under the number IDDN.FR.001.030031.000.S.P.2003.000.10200. The platform is still being developed. Its next main evolutions will rely on the integration of the fine grain scalable video coder WAVIX and of developments based on IPv6 and the ROHC framework resulting from the collaboration with the ARMOR project-team.

### WULL: A Windows UDP-Lite library

Contributed by: Laurent Guillo, Cecile Marc.

WULL is Windows library that implements the UDP-Lite protocol according to the very new RFC 3828. UDP-Lite is a new transport protocol that is used by application, which would rather receive damaged data than having corrupted data discarded by the network. UDP-Lite is similar to UDP. However, UDP-Lite allows applications to specify the length of checksum coverage. Checksum coverage is the number of bytes, counting from the first byte of the UDP-Lite header, that are covered by the checksum. So, a packet is divided into an error sensitive part (covered by the checksum) and an error insensitive part (not covered by the checksum). Only errors in the sensitive part cause the packet to be discarded by the transport layer. UDP-Lite is especially useful for application that are error tolerant, such video codec and WAVIX in particular. Using UDP-Lite as transport protocol in our video communication platform allows the transport layer to signal to WAVIX that it is going to process corrupted data. Taking into account this information can significantly improve the decoding speed. Wull provides developers with specific UDP-Lite sockets they can use in the same way they use classic UDP socket. It can be used with both IPv4 and IPv6. To get a more mature software, the ARMOR and TEMICS project-team run interoperability tests with an other UDP-Lite implementation carried out by the team headed by Paul Christ of the University of Stuttgart. As a partner in the DANAE project, France Telecom also uses WULL. WULL 1.0 has been registered at the Agency for the Protection of Programmes (APP) under the number IDDN.FR.001.270018.000.S.P.2004.000.10000