Scientific foundations: Watermarking as a communication problem with side information

Key words: watermarking, side information, information theory, capacity, discrimination.

Digital watermarking aims at hiding discrete messages in multimedia content. The watermark must not spoil the regular use of the content, i.e., the watermark should be imperceptible. Hence, the embedding is usually done in a transform domain where a human perception model is exploited to assess the imperceptibility criterion. The watermarking problem can be regarded as the problem of creating a communication channel within the content. This channel must be secure and robust to usual content manipulations such as lossy compression, filtering, and geometrical transformations for images and video.

When designing a watermarking system, the first issue to be addressed is the choice of the transform domain, i.e., the choice of the signal components that will host the watermark data. Let E(.) be the extraction function going from the content space C to the components space, isomorphic to R^N, so that a content c in C yields the host vector V = E(c).


The embedding process transforms a host vector V into a watermarked vector Vw. The perceptual impact of the watermark embedding in this domain must be quantified and constrained to remain below a certain level. The measure of perceptual distortion is usually defined as a cost function d(Vw-V) on R^N, constrained to be lower than a given distortion bound dw.

Attack noise will be added to the watermarked vector. In order to evaluate the robustness of the watermarking system and design counter-attack strategies, the noise induced by the different types of attack (e.g., compression, filtering, geometrical transformations) must be modelled. The distortion induced by the attack must also remain below a distortion bound: d(Va-V) < da, where Va denotes the attacked vector. Beyond this distortion bound, the content is considered no longer usable. Watermark detection and extraction techniques will then exploit the knowledge of the statistical distribution of the vectors V.

Given the above mathematical model, also sketched in Fig. 1, one then has to design a suitable communication scheme. Direct sequence spread spectrum techniques are often used. The chip rate sets the trade-off between robustness and capacity for a given embedding distortion. This can be seen as a labelling process S(.) mapping a discrete message m in M onto a signal S(m) in R^N.


The decoding function S^-1(.) is then applied to the received signal Va, in which the watermark interferes with two sources of noise: the original host signal (V) and the attack (A). The problem is then to find the pair of functions {S(.), S^-1(.)} that optimises the communication channel under the distortion constraints {dw, da}. This amounts to maximizing the probability of correctly decoding the hidden message, i.e., the probability that S^-1(Va) = m.
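The classical scheme can be illustrated with a minimal direct sequence spread spectrum sketch. It is not the TEMICS implementation: the function names, the NumPy generator used as a keyed pseudo-noise source, the Gaussian host and the use of the mean square error as distortion measure d are all assumptions made for illustration. Each bit is spread over `chip_rate` host samples, so a higher chip rate buys robustness at the price of fewer embedded bits.

```python
import numpy as np

def pn_sequence(key, length):
    """Key-dependent pseudo-random +/-1 chips (hypothetical helper)."""
    return np.random.default_rng(key).choice([-1.0, 1.0], size=length)

def embed(v, bits, key, d_w):
    """Return Vw = V + S(m); mean square distortion is d_w on the used samples."""
    bits = np.asarray(bits)
    chip_rate = v.size // bits.size              # host samples (chips) per bit
    used = chip_rate * bits.size
    symbols = np.repeat(2.0 * bits - 1.0, chip_rate)   # {0,1} -> {-1,+1}
    w = np.zeros(v.shape, dtype=float)
    w[:used] = np.sqrt(d_w) * symbols * pn_sequence(key, used)
    return v + w

def decode(v_a, n_bits, key):
    """Correlation decoder S^-1: despread, sum the chips of each bit, take the sign."""
    chip_rate = v_a.size // n_bits
    used = chip_rate * n_bits
    despread = v_a[:used] * pn_sequence(key, used)
    scores = despread.reshape(n_bits, chip_rate).sum(axis=1)
    return (scores > 0).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = 10.0 * rng.standard_normal(65536)            # host signal, acts as noise
    message = [1, 0, 1, 1, 0, 0, 1, 0]
    v_w = embed(v, message, key=42, d_w=1.0)
    v_a = v_w + 2.0 * rng.standard_normal(v.size)    # additive attack noise
    print(decode(v_a, len(message), key=42))
```

In this sketch the host signal is far stronger than the watermark, which is why thousands of chips per bit are needed for the correlation to emerge from the host interference.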


A new paradigm appeared recently, stating that the original host signal V shall be considered as a channel state known only at the embedding side rather than as a source of noise, as sketched in Fig. 2. The watermark signal thus depends on the channel state: S = S(m, V). This new paradigm, known as communication with side information, sets the theoretical foundations for the design of new communication schemes with increased capacity.
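One standard way to make the watermark depend on the channel state is quantization index modulation applied to a key-dependent projection of the host. The sketch below uses that textbook technique as a stand-in; it is not the scheme developed by the project, and the names `carrier`, `embed_informed`, `decode_informed` and the step `delta` are hypothetical. The embedder reads the host projection and moves it to the nearest point of the lattice coset labelled by the bit, so the host no longer acts as noise at the decoder.

```python
import numpy as np

def carrier(key, n):
    """Unit-norm secret direction derived from the key (hypothetical helper)."""
    u = np.random.default_rng(key).standard_normal(n)
    return u / np.linalg.norm(u)

def nearest_in_coset(p, bit, delta):
    """Nearest point to p in the lattice delta*Z shifted by bit*delta/2."""
    offset = bit * delta / 2.0
    return delta * np.round((p - offset) / delta) + offset

def embed_informed(v, bit, key, delta):
    """S(m, V): the watermark depends on the message and on the host V."""
    u = carrier(key, v.size)
    p = v @ u                                   # channel state seen by the embedder
    return v + (nearest_in_coset(p, bit, delta) - p) * u

def decode_informed(v_a, key, delta):
    """Pick the bit whose coset lies closest to the received projection."""
    p = v_a @ carrier(key, v_a.size)
    dist = [abs(p - nearest_in_coset(p, b, delta)) for b in (0, 1)]
    return int(np.argmin(dist))

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    v = 50.0 * rng.standard_normal(1024)        # strong host signal
    v_w = embed_informed(v, 1, key=7, delta=8.0)
    print(decode_informed(v_w + 0.2 * rng.standard_normal(v.size), key=7, delta=8.0))
```

The embedding distortion is at most (delta/2)^2 in squared norm whatever the host power, and decoding succeeds as long as the attack moves the projection by less than delta/4.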

Figure 1. Classical watermarking scheme
Figure 2. Watermarking as a problem of communication with side information.

Application domains

The problem of data hiding has gained considerable attention in recent years as a potential solution for a wide range of applications encompassing copy protection, copyright enforcement, content enhancement by meta-data embedding, authentication, and steganography. TEMICS focuses, via its collaborations and contracts, on the first three applications.

Copy protection

The history of copy protection dates back to the analogue age. Yet, in the digital age, this issue is even more crucial. The biggest effort to build a digital rights management system is the attempt of the Copy Protection Technical Working Group (CPTWG) for the DVD video format. The goal of copy protection systems is not to forbid copying but rather to enforce some usage rights (e.g., view now, view only for X days, copy once, copy locally).

Usually, conditional access to content as well as user rights management are offered via cryptographic functions. However, a dishonest user might record content in decrypted form (at least from the analogue signals). The watermark is then just a flag warning the devices that pirated clear content is copyrighted and that it was protected. Basically, the watermark is used to distinguish copy-free content from clear pirated content. Therefore, the mark should be imperceptible and very robust to attacks. In this case, the capacity need not be large. The main issue is the security of the watermark primitive. TEMICS will address this application domain in the ACI FABRIANO.

Copyright enforcement

The availability of multimedia content in digital form has brought a number of security issues to the forefront. The "digital revolution" has made digital data very vulnerable to unauthorized use. Watermarking primitives offer technical solutions to these security problems by providing means to trace copies along the distribution chain (from the artist to the consumers), to spot illegal uses of copyrighted content and ultimately to prove ownership in case of copyright disputes. For this type of application, the watermark capacity need not be large, but the watermark must be imperceptible and very robust to attacks. The RNRT Diphonet project addresses this application of watermarking. Security is of utmost importance in this context, as usurpers may try to hack the copyright protection system; it is therefore necessary to define a methodology for analyzing the security level of the watermarking system.

Content enhancement

Watermarking provides a way to embed meta-data into the multimedia content for enhanced services. The content becomes self-contained, the created meta-data transmission channel travelling with the content itself. In contrast to traditional solutions where the data is transported beside the content, e.g., in a label (field, file header, tags), data-hiding based systems allow for seamless meta-data transport. When placed in separate channels, the data can be unintentionally removed when submitted to transformations such as D/A and A/D conversion or transcoding within heterogeneous networks. The data-hiding based solution should prevent the meta-data from being lost. The embedded data is inside the content, and no special steps need to be taken in storage media or transmission networks to keep the meta-data and content together. The embedded data must be imperceptible, and possibly robust to some content processing (e.g., compression). This application requires high embedding capacity and possibly fast embedding and real-time decoding solutions. The IST BUSMAN project addresses this application.

New results

Side Informed watermarking and game theory

Contributed by: Gaetan Le Guelvouit, Stephane Pateux.

In 2002, we developed a watermarking technique making use of both communication tools (wide spread spectrum modulation, error correcting codes) and game theory. A patent has been filed and the corresponding software has been registered at the APP. In 2003, we derived models of attacks for different types of degradations, e.g., scaling, additive white Gaussian noise and de-synchronizations, and introduced the notion of informed attacks. The models lead to closed-form expressions for the different parameters of the embedding and extraction techniques to be used in a practical watermarking system, as well as to the performance bounds of the system. Error correcting codes, based on punctured convolutional codes, have also been introduced in the approach, leading to a side-informed watermarking approach with good performance in terms of capacity, as shown in Fig. 3. The approach has been validated with large professional databases.

Figure 3. Impact of desynchronisation on the capacity of data-hiding for the Lenna image. Left curves show theoretical capacity. Right curves show obtainable capacities considering a probability of error Pe < 10^-5.

Interaction between compression, watermarking and indexing

Contributed by: Sophie Le Delliou, Teddy Furon, Stephane Pateux, Francois Tonnin.

In collaboration with the TexMex project, we are studying the interactions between indexing and watermarking and their mutual impact. In the framework of the copyright protection of digital images, a system crawls the Internet and analyses the images found on suspicious websites. The system must recognize copyrighted pictures belonging to its database. When it does, an alarm signal is sent to the copyright holder, who checks whether the website has paid the corresponding copyright fees. This is done through a collaboration between indexing and watermarking, as follows. The indexing process first finds the picture in the large database that is nearest to the suspicious image. If the distance score is low, this constitutes a first element of proof. The indexing process also sends side information to the watermark decoder: the secret key and hidden message used at the embedding of the original picture, and an estimated geometric distortion (angle of rotation, scale factor of a stretch, etc.) between the suspicious image and the would-be original image (i.e., the nearest image found in the database). The detection of this message with this secret key is a second element of proof. This collaboration decreases the probability of false alarm while increasing the robustness of the watermark detection test.

Security of watermarking schemes

Contributed by: Francois Cayre, Teddy Furon.

In the framework of the French research network 'ACI securite', we have developed a cryptographic approach to the security of watermarking schemes (also called steganalysis). This analysis is based on Kerckhoffs' principle, Shannon's study of cryptosystems, and Fisher's information measure.

Basically, we estimate the amount of information about the secret key leaking from the observations made by the opponent. Although this is very classical in cryptanalysis, it has never been done in watermarking. One difficulty is that cryptography deals with discrete variables, whereas watermarking usually plays with real signals. This is a typical problem because Shannon's equivocation, or uncertainty of discrete random variables, has no physical meaning for real signals: it cannot be interpreted as an information measure when applied to them. A different tool has to be used. We chose Fisher's information measure.

The approach aims at assessing the number of watermarked contents that allows an accurate estimation of the secret signal. It is well known in estimation theory that the Fisher Information Matrix yields a lower bound on the mean square error whatever the estimator used by the opponent. This bound is a decreasing function of the number of observations. Such an approach is truly related to watermarking security since an opponent may actually access the watermarking communication channel hidden in host content. The disclosure of the secret allows the opponent to erase, modify, or embed watermarks.
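The dependence of the bound on the number of observations can be written out under a simplifying assumption that is ours rather than the report's: the N_o observed contents are independent and identically distributed given the secret signal \theta, the single-observation Fisher Information Matrix \mathbf{I}_1(\theta) is invertible, and the opponent's estimator \hat{\theta} is unbiased. Fisher information is then additive over the observations, and the Cramer-Rao bound reads:

```latex
\mathbf{I}_{N_o}(\theta) = N_o \, \mathbf{I}_1(\theta),
\qquad
\mathbb{E}\!\left[ \lVert \hat{\theta} - \theta \rVert^2 \right]
\;\ge\; \operatorname{tr}\!\left( \mathbf{I}_{N_o}(\theta)^{-1} \right)
= \frac{1}{N_o} \operatorname{tr}\!\left( \mathbf{I}_1(\theta)^{-1} \right).
```

Under these assumptions the bound decreases as 1/N_o, so dividing it by ten requires ten times as many observed contents.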

Our work is also inspired by the Diffie and Hellman classification of cryptanalysis attacks, as the observations made by the opponent might not be only watermarked content. Depending on the application framework, we distinguish several attacks: the Known Message Attack (KMA), the Known Original Attack (KOA), and the Watermarked Only Attack (WOA).

Notice that in the Known Original Attack, the opponent does not need to hack the observed watermarked contents, as he also has the original versions in hand. He first deduces some knowledge about the secret key from these pairs of contents. Later on, he uses this knowledge to forge pirated content whose original versions are not available.

Theoretical security levels have been estimated and assessed with experiments on a huge database of watermarked images. The results obtained include the number of contents that have to be taken into account to gain an order of magnitude in the estimation of the secret signals, for every kind of attack (KMA, KOA, WOA). Such work has never been carried out in watermarking, although it is very common in classical cryptanalysis. Our theoretical results show that the vast majority of watermarking schemes (i.e., techniques based on spread spectrum modulation) are actually not secure: a relatively low number of contents available to an opponent may easily lead to disclosure of the secret signals.

Based on this analysis, we have implemented a security attack on an actual robust watermarking scheme for still images. Under the assumption of 1000 images available to the opponent, we have shown that in the worst case (Watermarked Only Attack), an opponent may gain sufficient knowledge of the secret signals to perform a watermark removal attack at low distortion: attacked images look almost perfect. Moreover, watermark removal is not the only achievable attack: since the opponent gains full access to the secret channel, he can also try to read or write the watermark (given additional high-level knowledge, for example the watermark content structure), which was not possible with previous attacks.
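Why disclosure of the secret signals permits removal at low distortion can be seen on a toy model. The sketch below assumes a plain spread spectrum scheme with a correlation decoder and assumes the opponent already holds good estimates of the secret carriers; it is not the attack implemented against the actual scheme, and the function names are hypothetical. The attack only cancels the few components of the content that lie along the carriers, leaving everything else untouched.

```python
import numpy as np

def remove_watermark(y, carriers_hat):
    """Project y onto the orthogonal complement of the estimated carriers."""
    q, _ = np.linalg.qr(carriers_hat)       # orthonormal basis of their span (N x k)
    return y - q @ (q.T @ y)

def distortion_db(reference, attacked):
    """Signal-to-attack-noise ratio in dB; higher means a less visible attack."""
    noise = np.sum((attacked - reference) ** 2)
    return 10.0 * np.log10(np.sum(reference ** 2) / noise)
```

After the projection, every correlation between the attacked content and the carriers is zero, so a correlation decoder receives no information, while the distortion is confined to a subspace whose dimension equals the number of carriers.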

The goal of this work is to warn the watermarking community that security is a crucial issue, so far underestimated. People are usually very concerned with the robustness of the watermark. Huge improvements have been achieved in this domain in the last few years. However, a robust watermarking technique may not be a secure primitive. This matter is extremely important when the watermarking technique is deployed on a large-scale bank of contents.

Figure 4. Comparison between two strategies for attacking the Le Guelvouit and Pateux robust watermarking technique for still images. In the second strategy, the pirate has observed 300 watermarked pictures with different hidden messages to estimate the secret key. (a) Attack by compression. Best quality for a successful attack: PSNR=21.8 dB. (b) Steganalysis attack. Best quality for a successful hack: PSNR=35.8 dB.


CHI-MARK2: Robust image watermarking tool

Contributed by: Sophie Le Delliou, Teddy Furon, Gaetan Le Guelvouit, Stephane Pateux.

This software implements several data-hiding techniques (embedding and extraction) for images and video. The algorithms implemented are based both on state-of-the-art techniques (embedding and extraction based on wide spread spectrum) and on TEMICS theoretical results (a strategy based on game theory, taking into account optimal attacks and potential de-synchronizations, and a side-informed technique based on dirty paper codes). The software has been registered at the Agency for the Protection of Programmes (APP) under the number IDDN.FR.001.480027.001.S.A.2002.000.41100. Bug fixes and improvements in 2004 were mainly driven by the requirements of Canon, prime contractor of the DIPHONET national project. They concern the creation of usage profiles, procedures for zero-knowledge watermark decoding, and the removal of black frames. These modifications were grouped in one updated version (v3.1, January 2004) of the CHI-MARK2 software.

FastICA in C++

Contributed by: Francois Cayre, Teddy Furon.

In the framework of watermark security analysis, we develop simulation tools to hack and disclose the secret keys of spread spectrum based watermarking techniques. The core process is an independent component analysis of the watermarked signals. As we did not find any satisfactory implementation of such a functionality, we developed our own C++ software based on the FastICA algorithm of A. Hyvarinen. This module has been submitted to the IT++ SourceForge project, a well-known C++ library of mathematical, signal processing, speech processing and communications classes and functions. It has been included in the latest release of the library.
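The role of independent component analysis in disclosing spread spectrum carriers can be sketched in a few lines of NumPy. This is an illustration only, not the C++ module mentioned above: it is a small FastICA-style fixed-point iteration with a cubic (kurtosis) nonlinearity and deflation, and it assumes a white Gaussian host, orthonormal carriers, +/-1 message symbols and a known number of carriers. The function name `estimate_carriers` is hypothetical. A principal component analysis first isolates the subspace where the watermark adds energy; the ICA step then resolves the rotation inside that subspace, which second-order statistics alone cannot identify.

```python
import numpy as np

def estimate_carriers(y, n_carriers, n_iter=200, seed=0):
    """Estimate carrier directions (up to sign and order) from watermarked rows of y."""
    rng = np.random.default_rng(seed)
    yc = y - y.mean(axis=0)
    cov = yc.T @ yc / yc.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)          # eigenvalues in ascending order
    e = eigvec[:, -n_carriers:]                   # watermark subspace (N x k)
    d = eigval[-n_carriers:]
    z = (yc @ e) / np.sqrt(d)                     # whitened projections (n_obs x k)
    w_all = np.zeros((n_carriers, n_carriers))
    for i in range(n_carriers):
        w = rng.standard_normal(n_carriers)
        w -= w_all[:i].T @ (w_all[:i] @ w)        # deflation: stay orthogonal
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            proj = z @ w
            w_new = (z * proj[:, None] ** 3).mean(axis=0) - 3.0 * w
            w_new -= w_all[:i].T @ (w_all[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(w_new @ w) > 1.0 - 1e-10
            w = w_new
            if converged:
                break
        w_all[i] = w
    a = (e * np.sqrt(d)) @ w_all.T                # estimated mixing matrix (N x k)
    return a / np.linalg.norm(a, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, n_obs, k, gamma = 64, 2000, 2, 3.0
    u, _ = np.linalg.qr(rng.standard_normal((n, k)))     # secret carriers
    symbols = rng.choice([-1.0, 1.0], size=(n_obs, k))   # unknown messages
    y = rng.standard_normal((n_obs, n)) + gamma * symbols @ u.T
    print(np.abs(estimate_carriers(y, k).T @ u).round(2))
```

The printed matrix holds the absolute correlations between estimated and true carriers. Only watermarked contents are used, which corresponds to the setting where the opponent observes nothing but watermarked signals.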

Last time modified: 2006-02-20