Scientific foundations: Watermarking as a communication problem with side information

Key words: watermarking, side information, information theory, capacity, discrimination.

Digital watermarking aims at hiding discrete messages in multimedia content. The watermark must not spoil the regular use of the content, i.e., it should be imperceptible. Hence, embedding is usually done in a transformed domain where a human perception model is exploited to assess the imperceptibility criterion. The watermarking problem can be regarded as the creation of a communication channel within the content. This channel must be secure and robust to usual content manipulations such as lossy compression, filtering and, for images and video, geometrical transformations.

When designing a watermarking system, the first issue to address is the choice of the transform domain, i.e., the choice of the signal components that will host the watermark data. Let E(.) be the extraction function mapping the content space C to the component space, isomorphic to $\mathbb{R}^N$:

$E(\cdot): \mathcal{C} \rightarrow \mathbb{R}^N, \quad C \mapsto V = E(C)$

The embedding process transforms a host vector $V$ into a watermarked vector $V_w$. The perceptual impact of the embedding in this domain must be quantified and kept below a certain level. The perceptual distortion is usually measured by a cost function $d(V_w - V)$ on $\mathbb{R}^N$, constrained to remain below a given distortion bound: $d(V_w - V) \le d_w$.

Attack noise is then added to the watermarked vector. To evaluate the robustness of the watermarking system and to design counter-attack strategies, the noise induced by the different types of attack (e.g., compression, filtering, geometrical transformations) must be modelled. The distortion induced by the attack must also remain below a bound, $d(V_a - V) \le d_a$; beyond this bound, the content is considered unusable. Watermark detection and extraction techniques then exploit knowledge of the statistical distribution of the host vectors $V$.

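Given the above mathematical model, also sketched in Fig. 1, one then has to design a suitable communication scheme. Direct sequence spread spectrum techniques are often used. For a given embedding distortion, the chip rate sets the trade-off between robustness and capacity: spreading each bit over more components makes it more robust, but fewer bits fit in the host. This can be seen as a labelling process $S(\cdot)$ mapping a discrete message $m \in \mathcal{M}$ onto a signal in $\mathbb{R}^N$:

$S(\cdot): \mathcal{M} \rightarrow \mathbb{R}^N, \quad m \mapsto S(m), \qquad V_w = V + S(m)$
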
The decoding function $S^{-1}(\cdot)$ is then applied to the received signal $V_a = V + S(m) + A$, in which the watermark interferes with two sources of noise: the original host signal $V$ and the attack $A$. The problem is then to find the pair of functions $\{S(\cdot), S^{-1}(\cdot)\}$ that optimises the communication channel under the distortion constraints $\{d_w, d_a\}$. This amounts to maximizing the probability of correctly decoding the hidden message:

$\max_{\{S(\cdot),\, S^{-1}(\cdot)\}} \Pr\left[ S^{-1}(V + S(m) + A) = m \right] \quad \text{subject to} \quad d(V_w - V) \le d_w, \; d(V_a - V) \le d_a$

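A minimal sketch of direct sequence spread spectrum labelling and its correlation decoder follows. Everything here is an illustrative assumption, not the scheme used in TEMICS software: bits are mapped to -1/+1, each bit is spread over `chip_rate` host components with a key-seeded pseudo-random ±1 carrier, and the watermark is added as $V_w = V + S(m)$ with $S(m)$ of amplitude `gamma`. The names `carrier`, `embed` and `decode` are hypothetical. Raising `chip_rate` lengthens each correlation sum, so host and attack noise average out better, but fewer bits fit in a host of fixed length.

```python
import random

def carrier(key, n):
    """Pseudo-random +/-1 chips generated from the secret key (hypothetical keying)."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(host, bits, key, chip_rate, gamma):
    """Labelling S(m): spread each bit over chip_rate components, V_w = V + S(m)."""
    n = len(bits) * chip_rate
    if n > len(host):
        raise ValueError("host too short for this message at this chip rate")
    c = carrier(key, n)
    marked = list(host)
    for k, bit in enumerate(bits):
        symbol = 1.0 if bit else -1.0              # antipodal mapping 0 -> -1, 1 -> +1
        for j in range(k * chip_rate, (k + 1) * chip_rate):
            marked[j] += gamma * symbol * c[j]     # S(m) has amplitude gamma on each chip
    return marked

def decode(received, n_bits, key, chip_rate):
    """Decoder S^{-1}: correlate each chip block with the carrier and keep the sign."""
    c = carrier(key, n_bits * chip_rate)
    bits = []
    for k in range(n_bits):
        block = range(k * chip_rate, (k + 1) * chip_rate)
        corr = sum(received[j] * c[j] for j in block)
        bits.append(1 if corr > 0 else 0)
    return bits
```

For instance, with 16 000 host components of standard deviation 5, a chip rate of 1000 and `gamma = 1`, the 16 embedded bits are expected to survive an additive Gaussian attack of standard deviation 1, and the embedding distortion measured as a mean squared error is `gamma ** 2`.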
A new paradigm has recently emerged, as sketched in Fig. 2: the original host signal $V$ is considered as a channel state known only at the embedding side, rather than as a source of noise. The watermark signal thus depends on the channel state: $S = S(m, V)$. This paradigm, known as communication with side information, lays the theoretical foundations for the design of new communication schemes with increased capacity.

Figure 1. Classical watermarking scheme
Figure 2. Watermarking as a problem of communication with side information.
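To illustrate why knowing the host at the embedder helps, here is a sketch of scalar quantization index modulation (QIM, also called dither modulation), a well-known side-informed scheme. It is a stand-in for illustration, not the dirty paper codes used in TEMICS work. Each host component is moved to the nearest point of one of two interleaved lattices of step `step`, selected by the bit, so the watermark $S = V_w - V$ depends on both $m$ and $V$. The decoder picks the nearest lattice; decoding is correct whenever the attack noise stays below `step / 4` in magnitude, whatever the host amplitude, i.e., the host no longer acts as interference. Function names and parameters are hypothetical.

```python
def quantize(x, step, offset):
    """Nearest point of the shifted lattice step * Z + offset."""
    return step * round((x - offset) / step) + offset

def qim_embed(host, bits, step):
    """Side-informed embedding: bit b selects the lattice shifted by b * step / 2.

    The resulting watermark V_w - V depends on the host value itself.
    """
    return [quantize(v, step, b * step / 2) for v, b in zip(host, bits)]

def qim_decode(received, step):
    """Minimum-distance decoding: choose the lattice closest to each received value."""
    bits = []
    for y in received:
        d0 = abs(y - quantize(y, step, 0.0))
        d1 = abs(y - quantize(y, step, step / 2))
        bits.append(0 if d0 <= d1 else 1)
    return bits
```

The embedding error per component never exceeds `step / 2`, so the distortion is controlled by `step` alone, even for a host of very large amplitude.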

Application domains

The problem of data hiding has gained considerable attention in recent years as a potential solution for a wide range of applications, including copy protection, copyright enforcement, content enhancement by meta-data embedding, authentication, and steganography. Through its collaborations and contracts, TEMICS focuses on the first three applications.

Copy protection

The history of copy protection dates back to the analogue age, but in the digital age the issue is even more crucial. The largest effort to build a digital rights management system is the attempt of the copy protection technical meeting group for the DVD video format. The goal of copy protection systems is not to forbid copying but rather to enforce usage rights (e.g., view now, view only for X days, copy once, copy locally).

Conditional access to content and user rights management are usually provided by cryptographic functions. However, a dishonest user might record content in decrypted form (at least from the analogue signals). The watermark then acts as a flag warning compliant devices that this clear content is copyrighted and was originally protected; in other words, it distinguishes copy-free content from pirated clear content. The mark should therefore be imperceptible and very robust to attacks, whereas its capacity need not be large. The main issue is the security of the watermarking primitive. TEMICS will address this application domain in the ACI FABRIANO.

Copyright enforcement

The availability of multimedia content in digital form has brought a number of security issues to the forefront. The "digital revolution" has made digital data very vulnerable to unauthorized use. Watermarking primitives offer technical solutions to these security problems by providing means to trace copies along the distribution chain (from the artist to the consumers), to spot illegal uses of copyrighted content, and ultimately to prove ownership in case of copyright disputes. For this type of application, the watermark capacity need not be large, but the watermark must be imperceptible and very robust to attacks. The RNRT Diphonet project addresses this application of watermarking. Since security is of utmost importance in this context, as usurpers may try to hack the copyright protection system, a methodology for analyzing the security level of the watermarking system must be defined.

Content enhancement

Watermarking provides a way to embed meta-data into multimedia content for enhanced services. The content becomes self-contained, since the meta-data transmission channel thus created travels with the content itself. Compared with traditional solutions where the data is transported alongside the content, e.g., in a label (field, file header, tags), data-hiding based systems allow for seamless meta-data transport. When placed in separate channels, the data can be unintentionally removed by transformations such as D/A and A/D conversion or transcoding within heterogeneous networks. The data-hiding based solution should prevent the meta-data from being lost: the embedded data is inside the content, so no special steps need to be taken in storage media or transmission networks to keep meta-data and content together. The embedded data must be imperceptible and possibly robust to some content processing (e.g., compression). This application requires a high embedding capacity and possibly fast embedding and real-time decoding. The IST BUSMAN project addresses this application.

New results

Side-informed watermarking and game theory

Contributed by: Gaetan Le Guelvouit, Stephane Pateux.

In 2002, we developed a watermarking technique making use of both communication tools (wide spread spectrum modulation, error correcting codes) and game theory. A patent has been filed and the corresponding software has been registered at the APP. In 2003, we derived models of attacks for different types of degradations, e.g., scaling, additive white Gaussian noise and de-synchronizations, and introduced the notion of informed attacks. These models lead to closed-form expressions for the parameters of the embedding and extraction techniques to be used in a practical watermarking system, as well as to performance bounds for the system. Error correcting codes based on punctured convolutional codes have also been introduced, leading to a side-informed watermarking approach with good performance in terms of capacity, as shown in Fig. 3. The approach has been validated on large professional databases.
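The sketch below shows what puncturing a convolutional code means, under assumed parameters not taken from this report: a rate-1/2 code with constraint length 3 and generators 7 and 5 (octal), terminated with two zero tail bits, then punctured to rate 2/3 by dropping the second generator's output at every other time step. The function names and the puncturing pattern are hypothetical; decoding (e.g., Viterbi decoding with erasures at the punctured positions) is not shown.

```python
def conv_encode(bits, gens=(0b111, 0b101), k=3):
    """Rate-1/2 convolutional encoder (assumed generators 7, 5 octal, constraint length 3).

    Two zero tail bits terminate the trellis in the all-zero state.
    """
    state = 0                                    # previous k-1 input bits, most recent in the MSB
    out = []
    for b in list(bits) + [0] * (k - 1):
        reg = (b << (k - 1)) | state             # current bit followed by the previous ones
        for g in gens:
            out.append(bin(reg & g).count("1") % 2)  # modulo-2 sum of the tapped bits
        state = reg >> 1
    return out

def puncture(coded, pattern=((1, 1), (1, 0))):
    """Keep coded bit (generator i, time t) when pattern[i][t % period] is 1.

    The assumed pattern drops the second generator's output at odd time steps,
    turning rate 1/2 into rate 2/3.
    """
    n_gen, period = len(pattern), len(pattern[0])
    kept = []
    for idx, c in enumerate(coded):
        i, t = idx % n_gen, (idx // n_gen) % period
        if pattern[i][t]:
            kept.append(c)
    return kept
```

Encoding a single 1 yields the impulse response 11 10 11, and puncturing it keeps 11 1 11; 10 information bits (plus the tail) give 18 transmitted bits instead of 24.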

Figure 3. Impact of desynchronisation on the data-hiding capacity for the Lenna image. Left curves: theoretical capacity. Right curves: capacities obtainable for a probability of error $P_e \le 10^{-5}$.

Interaction between compression, watermarking and indexation

Contributed by: Sophie Le Delliou, Teddy Furon, Stephane Pateux, Francois Tonnin.

In collaboration with the TexMex project, we are studying the interactions between indexing and watermarking and their mutual impact. In the framework of copyright protection of digital images, a system crawls the Internet and analyses the images found on suspicious websites. The system must recognize copyrighted pictures belonging to its database; when it does, an alarm is sent to the copyright holder, who checks whether the website has paid the corresponding copyright fees. This relies on a collaboration between indexing and watermarking. First, the indexing process finds the picture in the large database that is nearest to the suspicious image. If the distance score is small, this constitutes a first element of proof. The indexing process also sends side information to the watermark decoder: the secret key and hidden message used when embedding the original picture, and an estimate of the geometric distortion (rotation angle, stretch scale factor, ...) between the suspicious image and the would-be original (i.e., the nearest image found in the database). Detecting this message with this secret key constitutes a second element of proof. This collaboration decreases the probability of false alarm while increasing the robustness of the watermark detection test.
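The two-stage decision described above can be sketched as follows. The database schema (`Entry`), the Euclidean distance on global descriptors, the spread spectrum watermark regenerated from the key and message, and both thresholds are hypothetical choices for illustration. The geometric distortion estimate, which would be used to resynchronize the suspicious image before detection, is omitted. An alarm is raised only when the nearest-neighbour distance is small and the normalized correlation also exceeds its threshold, which is why combining the two tests lowers the false alarm probability.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Entry:
    """One database record (hypothetical schema)."""
    name: str
    features: list  # global image descriptor used by the indexing engine
    key: int        # secret key used when the original was watermarked
    message: list   # bits hidden in the original

def nearest(db, query):
    """Indexing step: nearest database entry and its Euclidean distance."""
    best = min(db, key=lambda e: math.dist(e.features, query))
    return best, math.dist(best.features, query)

def expected_watermark(key, message, chip_rate):
    """Spread spectrum watermark regenerated from key and message (hypothetical keying)."""
    rng = random.Random(key)
    w = []
    for bit in message:
        symbol = 1.0 if bit else -1.0
        w.extend(symbol * rng.choice((-1.0, 1.0)) for _ in range(chip_rate))
    return w

def detect(signal, key, message, chip_rate, z_threshold=5.0):
    """Watermark step: normalized correlation, close to N(0, 1) when the mark is absent."""
    w = expected_watermark(key, message, chip_rate)
    corr = sum(x * c for x, c in zip(signal, w))
    energy = sum(x * x for x in signal[:len(w)])
    return corr / math.sqrt(energy) > z_threshold

def check_suspicious(db, query_features, signal, dist_threshold, chip_rate):
    """Return the matching entry name only when both elements of proof agree."""
    entry, dist = nearest(db, query_features)
    if dist > dist_threshold:
        return None                      # no close match in the database
    if detect(signal, entry.key, entry.message, chip_rate):
        return entry.name                # alarm for this copyright holder
    return None
```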

Security of watermarking schemes

Contributed by: Francois Cayre, Teddy Furon.

In the framework of the French research network 'ACI securite', we have developed a cryptographic approach to the security of watermarking schemes (also called steganalysis). This analysis is based on Kerckhoffs' principle, Shannon's study of cryptosystems, and Fisher information.

Basically, we estimate the amount of information about the secret key that leaks from the observations made by the opponent. Although this is very classical in cryptanalysis, it had never been done in watermarking. One difficulty is that cryptography deals with discrete variables, whereas watermarking usually handles real-valued signals. Shannon's equivocation, i.e., the uncertainty of discrete random variables, has no physical meaning for real signals: it cannot be interpreted as a measure of information when applied to them. A different tool is therefore needed; we chose Fisher information.

The approach aims at assessing the number of watermarked contents that allows an accurate estimation of the secret signal. It is well known in estimation theory that the inverse of the Fisher Information Matrix yields a lower bound (the Cramér-Rao bound) on the mean square error of any unbiased estimator the opponent may use. This bound is a decreasing function of the number of observations. Such an approach is directly related to watermarking security, since an opponent who estimates the secret signal may access the watermarking communication channel hidden in the host content. The disclosure of the secret allows the opponent to erase, modify, or embed watermarks.
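A worked example of the Fisher information argument, under a simple model assumed here for illustration (not necessarily the one used in this study): in a Known Message Attack on additive spread spectrum, the opponent observes $N_o$ contents $y_i = x_i + \gamma b_i u$, with host $x_i \sim \mathcal{N}(0, \sigma^2 I_N)$, known bits $b_i \in \{-1, +1\}$ and secret carrier $u \in \mathbb{R}^N$. The Fisher Information Matrix is then $(N_o \gamma^2 / \sigma^2) I_N$, so any unbiased estimator satisfies $E\|\hat u - u\|^2 \ge N \sigma^2 / (N_o \gamma^2)$, a bound decreasing as $1/N_o$. The sketch compares this bound with the Monte Carlo error of the estimator $\hat u = (\gamma N_o)^{-1} \sum_i b_i y_i$, which reaches it under this model. Function names are hypothetical.

```python
import random

def crb_total_mse(n_dims, sigma, gamma, n_obs):
    """Cramer-Rao bound on E||u_hat - u||^2 for the assumed KMA model.

    FIM = (n_obs * gamma^2 / sigma^2) * I, so the bound is trace(FIM^-1).
    """
    fisher_per_component = n_obs * gamma ** 2 / sigma ** 2
    return n_dims / fisher_per_component

def kma_estimate(observations, bits, gamma):
    """Unbiased estimator u_hat = (1 / (gamma * N_o)) * sum_i b_i * y_i."""
    n_obs = len(observations)
    u_hat = [0.0] * len(observations[0])
    for y, b in zip(observations, bits):
        for j, yj in enumerate(y):
            u_hat[j] += b * yj
    return [v / (gamma * n_obs) for v in u_hat]

def empirical_mse(n_dims, sigma, gamma, n_obs, trials, seed=0):
    """Monte Carlo estimate of E||u_hat - u||^2 under the assumed model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]         # secret carrier
        bits = [rng.choice((-1.0, 1.0)) for _ in range(n_obs)]   # messages known to the opponent
        obs = [[rng.gauss(0.0, sigma) + gamma * b * uj for uj in u] for b in bits]
        u_hat = kma_estimate(obs, bits, gamma)
        total += sum((a - c) ** 2 for a, c in zip(u_hat, u))
    return total / trials
```

With $N = 64$, $\sigma = 10$, $\gamma = 1$ and $N_o = 20$, the bound is $64 \cdot 100 / 20 = 320$; observing ten times more contents divides it by ten.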

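Our work is also inspired by Diffie and Hellman's classification of cryptanalytic attacks, since the observations available to the opponent may include more than watermarked content alone. Depending on the application framework, we distinguish several attacks:

- Watermarked Only Attack (WOA): the opponent observes watermarked contents only;
- Known Message Attack (KMA): the opponent observes watermarked contents together with the associated hidden messages;
- Known Original Attack (KOA): the opponent observes watermarked contents together with their original versions.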

Notice that in the Known Original Attack, the opponent does not need to attack the observed watermarked contents themselves, since he also holds their original versions. He first deduces some knowledge about the secret key from these pairs of contents. Later on, he uses this knowledge to forge pirated content whose original versions are not available.

Theoretical security levels have been estimated and assessed with experiments on a large database of watermarked images. The results include, for every kind of attack (KMA, KOA, WOA), the number of contents that must be observed to gain an order of magnitude in the accuracy of the secret signal estimation. Such a study had never been carried out in watermarking, although it is very common in classical cryptanalysis. Our theoretical results show that the vast majority of watermarking schemes (i.e., techniques based on spread spectrum modulation) are actually not secure: a relatively low number of contents available to an opponent may easily lead to disclosure of the secret signals.

Based on this analysis, we have implemented a security attack against an actual robust watermarking scheme for still images. Assuming that 1000 images are available to the opponent, we have shown that even in the worst case for the opponent (Watermarked Only Attack), he may gain sufficient knowledge of the secret signals to remove the watermark at low distortion: the attacked images look almost perfect. Moreover, watermark removal is not the only possibility: since the opponent gains full access to the secret channel, he can also try to read or write the watermark (given additional high-level knowledge, for example the structure of the watermark content), which was not possible with previous attacks.
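The Watermarked Only Attack can be illustrated on a toy model. The attack described above relies on independent component analysis; the sketch below swaps in a simpler technique, principal component analysis, which suffices under the assumptions made here: a single secret unit-norm carrier $u$, white Gaussian hosts, and unknown equiprobable bits, so that $y_i = x_i + \gamma b_i u$ has second-moment matrix $\sigma^2 I + \gamma^2 u u^T$, whose dominant eigenvector is $\pm u$. The opponent estimates this eigenvector by power iteration and removes the watermark by subtracting the projection of each content onto it. All names and parameters are hypothetical.

```python
import math
import random

def second_moment(observations):
    """Sample matrix (1 / N_o) * sum_i y_i y_i^T (hosts assumed zero-mean)."""
    n = len(observations[0])
    m = [[0.0] * n for _ in range(n)]
    for y in observations:
        for a in range(n):
            ya, row = y[a], m[a]
            for b in range(n):
                row[b] += ya * y[b]
    n_obs = len(observations)
    return [[v / n_obs for v in row] for row in m]

def principal_direction(matrix, iters=200, seed=0):
    """Power iteration: unit eigenvector of the largest eigenvalue, up to sign."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in matrix]
    for _ in range(iters):
        w = [sum(r * x for r, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def remove_watermark(y, u_hat):
    """Subtract the projection of y onto the estimated secret direction."""
    p = sum(a * b for a, b in zip(y, u_hat))
    return [a - p * b for a, b in zip(y, u_hat)]
```

The removal only alters one direction of each content, which is why the resulting distortion stays low once the carrier has been estimated.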

The goal of this work is to warn the watermarking community that security is a crucial issue that has so far been underestimated. Research has mostly focused on the robustness of the watermark, where huge improvements have been achieved in the last few years. However, a robust watermarking technique is not necessarily a secure primitive. This matters especially when the watermarking technique is deployed on a large collection of contents.

Figure 4. Comparison between two strategies for attacking the Le Guelvouit and Pateux robust watermarking technique for still images. In the second strategy, the pirate has observed 300 watermarked pictures with different hidden messages to estimate the secret key. (a) Attack by compression. Best quality for a successful attack: PSNR=21.8 dB. (b) Steganalysis attack. Best quality for a successful hack: PSNR=35.8 dB.


CHI-MARK2: Robust image watermarking tool

Contributed by: Sophie Le Delliou, Teddy Furon, Gaetan Le Guelvouit, Stephane Pateux.

This software implements several data-hiding techniques (embedding and extraction) for images and video. The algorithms are based both on state-of-the-art techniques (embedding and extraction based on wide spread spectrum) and on TEMICS theoretical results (a game-theoretic strategy taking into account the optimal attack and potential de-synchronizations, and a side-informed technique based on dirty paper codes). The software has been registered at the Agency for the Protection of Programmes (APP) under the number IDDN.FR.001.480027.001.S.A.2002.000.41100. Bug corrections and improvements in 2004 were mainly driven by the requirements of Canon, prime contractor of the DIPHONET national project. They concern the creation of usage profiles, procedures for zero-knowledge watermark decoding, and the removal of black frames. These modifications were grouped in an update (v3.1, January 2004) of the CHI-MARK2 software.

FastICA in C++

Contributed by: Francois Cayre, Teddy Furon.

In the framework of watermark security analysis, we have developed simulation tools to hack and disclose the secret keys of spread spectrum based watermarking techniques. The core process is an independent component analysis of the watermarked signals. As we found no satisfactory implementation of this functionality, we developed our own C++ software based on the FastICA algorithm of A. Hyvarinen. This module has been submitted to the IT++ SourceForge project, a well-known C++ library of mathematical, signal processing, speech processing and communications classes and functions, and has been included in its latest release.
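For reference, here is a pure Python sketch of the one-unit FastICA fixed-point iteration with the tanh nonlinearity, $w \leftarrow E[z \tanh(w^T z)] - E[1 - \tanh^2(w^T z)]\, w$ followed by normalization, in two dimensions. It is not the C++ module mentioned above, and the function name is hypothetical. Whitening, a mandatory preprocessing step of FastICA, is skipped because the data is assumed already white: two independent unit-variance sources mixed by a rotation. The recovered $w$ should then align, up to sign, with one column of the mixing rotation.

```python
import math
import random

def fastica_one_unit(z, max_iter=200, tol=1e-10, seed=0):
    """One-unit FastICA (tanh nonlinearity) on 2-D data assumed already whitened."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    norm = math.hypot(w[0], w[1])
    w = [w[0] / norm, w[1] / norm]
    n = len(z)
    for _ in range(max_iter):
        acc0 = acc1 = mean_gprime = 0.0
        for x0, x1 in z:
            t = math.tanh(w[0] * x0 + w[1] * x1)  # g(w^T z)
            acc0 += x0 * t
            acc1 += x1 * t
            mean_gprime += 1.0 - t * t            # g'(w^T z)
        mean_gprime /= n
        # Fixed-point update E[z g(w^T z)] - E[g'(w^T z)] w, then renormalize.
        new = [acc0 / n - mean_gprime * w[0], acc1 / n - mean_gprime * w[1]]
        norm = math.hypot(new[0], new[1])
        new = [new[0] / norm, new[1] / norm]
        converged = abs(abs(new[0] * w[0] + new[1] * w[1]) - 1.0) < tol  # up to sign
        w = new
        if converged:
            break
    return w
```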

Last modified: 2006-02-20