Pirated Images Spotted Faster
Thomson Corporate Research, the research arm of the French electronics and multimedia company Thomson, has developed a technology that automatically detects illegal copies of copyrighted films distributed over peer-to-peer file-sharing networks. Until recently, its video-descriptor system worked fine, but… slowly.
The TexMex research team at Irisa specializes in handling multimedia documents, with an emphasis on the problems raised by managing very large volumes of data. TexMex and researchers from the University of Reykjavík have formed an associate team, Eff2, which has come up with a retrieval system for huge image databases. This system returns high-quality results and, in addition, does so extremely fast, bringing not just effectiveness but also efficiency to real workflows.
In September 2006, TexMex's Laurent Amsaleg presented the results of the Eff2 research at the first meeting of Diwall, a security-focused scientific network bringing together several research centers in Rennes. The audience included people from the Thomson Corporate Research security lab, who immediately realized that this fast retrieval method could help accelerate their own pirate-spotting technology. “They had the effectiveness. We could offer efficiency as a plus. Precisely what they were looking for,” explains TexMex founder Patrick Gros.
Cooperation was swiftly put on track, as Thomson is due to show its prototype to the Motion Picture Association of America (1) in late December. “Obviously, Thomson’s goal is to turn this into a marketable product.” The ensuing technology transfer contract could be threefold: transfer of the software, some rewriting to meet Thomson’s requirements and, probably, consulting as well. Further ahead looms the prospect of a longer-term partnership within Quaero, a mostly Franco-German project to develop new products and technologies for managing, searching and exploring large collections of multimedia documents.
Searching for images in very large databases is “not an easy task,” says Patrick Gros. “Most people search in an imprecise, comparison-based mode, and the sheer mass of information to browse makes it a hard undertaking.” TexMex does not follow this pattern. It has “opted for another approach: working on copy detection.” One reason is that copy detection “can be automatically evaluated.”
The problem is one of database size. “Typically, a photo agency (2) holds anywhere from 2 to 30 million pictures in stock. An Internet photo bank like Flickr deals with 250 million images.” And that is just still photography. “In the video world, you are talking billions of images.”
Instead of having images described with text and comparing those descriptions, as most people would, “we prefer to work on a digital description, in order to capture the image content in a tamper-resistant way.” Of course, “if you want to go into copy detection, you had better know what you want to be resistant to.” It might be, for instance, a cropped picture, or an image from which the watermark has been removed. “These things can be spotted.”
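The article does not detail how this digital description works. A common way to obtain this kind of robustness, assumed here purely for illustration, is to describe each image by many local descriptors (short vectors of numbers) and let each descriptor of a suspect image vote for the database image holding its nearest descriptor: a cropped copy loses some descriptors, but the surviving ones still vote for the original. The minimal Python sketch below uses random vectors in place of real descriptors and brute-force search in place of a real index; the function `detect_copy` and the image names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def squared_distance(a: Descriptor, b: Descriptor) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def detect_copy(query: List[Descriptor], database: Dict[str, List[Descriptor]]) -> str:
    """Return the id of the database image receiving the most votes.

    Each query descriptor votes for the image that owns its nearest
    database descriptor (brute-force search, for clarity only)."""
    votes = Counter()
    for q in query:
        best_image, best_dist = None, math.inf
        for image_id, descriptors in database.items():
            for d in descriptors:
                dist = squared_distance(q, d)
                if dist < best_dist:
                    best_image, best_dist = image_id, dist
        votes[best_image] += 1
    return votes.most_common(1)[0][0]


# Toy database: three hypothetical images, 50 random 16-dimensional descriptors each.
rng = random.Random(42)
dim = 16
database = {
    f"image_{n}": [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
    for n in range(3)
}

# Simulated "cropped and slightly altered" copy of image_1:
# keep only 20 of its 50 descriptors and add a little noise to each.
cropped = [[x + rng.gauss(0, 0.01) for x in d] for d in database["image_1"][:20]]
print(detect_copy(cropped, database))  # image_1
```

Real descriptors and real distortions are far less clean than this toy, so a practical system would also need a vote threshold to decide whether the best-scoring image is a copy at all.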
Now, take movie bootlegging in theaters. A pirate video shot with a camcorder typically features “a switch from 25 to 30 frames per second, a trapezoid image and a significant change in colors. These changes from the original can be described and spotted. We at TexMex haven’t worked on these video aspects, but it is a research field at Thomson.”
The Eff2 method features “a constant search complexity, thus allowing extremely fast return of results on very large databases.” In other words, retrieval time does not grow with the size of the database. “We keep retrieval time constant, whatever the size of the base,” Patrick Gros explains. “This property has been verified as far as we have tested. We tested up to 208 million descriptors, that is, some 300,000 images,” or roughly 700 descriptors per image.
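The article gives no implementation details for Eff2. As a purely illustrative toy, assuming a tree that partitions descriptors at the median of their projection onto random lines (not the actual Eff2 code), the Python sketch below shows one way per-query work can stay bounded: a query descends to a single leaf and compares itself only with the at most `leaf_capacity` descriptors stored there, however many descriptors the base holds. The class `ProjectionTree` and the parameter `leaf_capacity` are hypothetical. In this toy the descent itself still grows logarithmically with database size, and a perturbed copy can land in the wrong leaf; real systems typically mitigate that with overlapping partitions or several trees.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_direction(dim: int, rng: random.Random) -> List[float]:
    """Unit vector pointing in a random direction."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class ProjectionTree:
    """Toy index: recursively split the points at the median of their
    projection on a random line until each leaf holds <= leaf_capacity points."""

    def __init__(self, points: List[Vector], leaf_capacity: int = 32, seed: int = 0):
        self.points = points
        self.leaf_capacity = leaf_capacity
        self.rng = random.Random(seed)
        self.root = self._build(list(range(len(points))))

    def _build(self, ids: List[int]):
        if len(ids) <= self.leaf_capacity:
            return ("leaf", ids)
        direction = random_direction(len(self.points[0]), self.rng)
        ranked = sorted((dot(self.points[i], direction), i) for i in ids)
        mid = len(ranked) // 2
        threshold = ranked[mid][0]
        left = [i for _, i in ranked[:mid]]
        right = [i for _, i in ranked[mid:]]
        return ("node", direction, threshold, self._build(left), self._build(right))

    def query(self, q: Vector) -> Tuple[int, int]:
        """Return (nearest point id within the reached leaf, number of points scanned)."""
        node = self.root
        while node[0] == "node":
            _, direction, threshold, left, right = node
            node = left if dot(q, direction) < threshold else right
        leaf_ids = node[1]
        best = min(leaf_ids, key=lambda i: squared_distance(q, self.points[i]))
        return best, len(leaf_ids)


rng = random.Random(7)
for n in (1_000, 8_000):
    data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(n)]
    tree = ProjectionTree(data, leaf_capacity=32)
    best, scanned = tree.query(data[123])
    # best is 123 (exact copy of a stored point); scanned never exceeds 32
    print(n, best, scanned)
```

Multiplying the database size by eight leaves the number of descriptors compared per query capped at 32 in this toy; only the handful of cheap projection tests on the way down grows.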
Footnotes
(1) The association represents major Hollywood studios such as Disney, Paramount, Warner and 20th Century Fox.
(2) Together with the Canon research center, TexMex previously worked on Diphonet, a project designed to help photo agencies protect their copyright on the Internet using content-based image retrieval and analysis.