
Direction des Relations Européennes et Internationales (DREI)

INRIA "Equipes Associées" (Associate Teams) Programme

 

I. DEFINITION

ASSOCIATE TEAM: Eff2
Selection: 2005

INRIA project: TexMex
Foreign partner organization: Reykjavík University
INRIA research unit: Rennes
INRIA theme: symbolic systems
Country: Iceland
 
 
French coordinator
  Name: Laurent Amsaleg
  Grade/status: CR1
  Home organization (department and/or laboratory): CNRS
  Postal address: IRISA, Campus de Beaulieu, 35042 Rennes cedex, FRANCE
  URL: http://www.irisa.fr/texmex/Laurent.Amsaleg/
  Telephone: +33-299847444
  Fax: +33-299847171
  Email: Laurent.Amsaleg@irisa.fr

Foreign coordinator
  Name: Björn Þór Jónsson
  Grade/status: Associate Professor
  Home organization (department and/or laboratory): Reykjavík University
  Postal address: Reykjavík University, Department of Computer Science, Kringlan 1, IS-103 Reykjavik, ICELAND
  URL: http://www.ru.is/Default.aspx?PageID=1049&ID=bjorn
  Telephone: +354-5106240
  Fax: +354-5106201
  Email: bjorn@ru.is

The proposal in brief

Title of the collaboration topic:
     Eff2: efficiency and effectiveness in content-based image retrieval systems.
     French title: "Eff2 : efficacité et efficience dans les systèmes de recherches d'images par le contenu."

Description: Image databases, and content-based image retrieval systems in particular, have become increasingly important in many application areas. Moreover, new applications exploiting the fine detail of images are now emerging fast thanks to recent image processing techniques. While extremely effective (they return high-quality results), these image processing techniques are very inefficient (they answer very slowly) because of their complexity and because of the inadequacy of the traditional lower layers of software. This is particularly true at large scale, when dealing with image collections of realistic sizes. The goal of this project is to research and develop new database support that integrates efficiency and effectiveness for modern large-scale computer-vision-related applications and problems.
Together, we came up with the PvS-framework, which provides efficient and scalable support for recognition applications based on local descriptions. While this work is still very active, we have initiated another thread of research by investigating the browsing of personal image collections. Today, everyone can witness the tremendous increase in the capability to create, share and store digital images. As a result, personal image collections are growing at an astounding rate, and it is clear that in the future individuals will need to access tens of thousands, or even hundreds of thousands, of digital images. It is therefore imperative to start studying ways to access these images in a useful and interesting manner. Addressing this topic is a new development in our cooperation.

 

Presentation of the Associate Team

(about 2 pages)

1. Presentation of the foreign coordinator

Dr. Björn Þór Jónsson is an Associate Professor, as well as the Director of Graduate Studies, in the School of Computer Science at Reykjavík University, Iceland. His research focuses primarily on the performance of content-based multimedia retrieval, as well as on the performance and tuning of relational database systems. He has also worked on semantic caching, a client architecture for caching query results, and on the performance of information retrieval systems. He has taught classes on database design and implementation, on the architecture and performance of database systems, and on advanced database systems, such as multimedia systems, stream query processing and database client caching.
Björn completed his Ph.D. degree in Computer Science at the University of Maryland, College Park, in 1999. After working in industry, he joined Reykjavík University in fall 2000. He has served on the program committees of some of the major database conferences in the world, and was co-creator and co-chair of the first and second editions of the international "Computer Vision meets Databases" workshop.


2. History of the collaboration

3. Impact:

4. Miscellaneous: Our collaboration has been internationally recognized as being scientifically rich. We were able to convince ACM to co-locate with SIGMOD a workshop we created in 2004. This workshop (created together with Vincent Oria from the New Jersey Institute of Technology, USA) was entitled "First Computer Vision Meets Databases (CVDB) Workshop". It was held in Paris, France, on June 13, 2004, was co-located with the 2004 ACM SIGMOD/PODS conferences (the most prestigious international forum on databases) and was attended by forty-two participants from all over the world. We created this workshop because we wanted to understand why only a few works in the computer vision community have adopted any of the indexing schemes designed by database researchers. We discovered many valid scientific reasons, but also that there was a great gap between the computer vision and the database communities. Our goal was therefore to bridge that gap and to provide database researchers with a snapshot of what computer vision people are dealing with, and vice versa, with the aim of defining some research directions that can benefit both communities. The workshop was successful. Eight papers were selected for presentation and publication. Additionally, we hand-picked two tutorialists to present their views of the research directions and contributions of the computer vision and database communities, respectively. Finally, we assembled a panel to focus on the applications of image databases in the near and distant future. Based on the observed need for a forum for exchanging ideas and results at the intersection of the computer vision and database research areas, we held a second edition of CVDB, co-located with SIGMOD/PODS, in Baltimore, Maryland, in June 2005. Nine papers were presented, two keynotes were given and a panel focused on "Multimedia applications: Beyond similarity searches".
It is important to note that the CVDB series of workshops is in many ways the keystone of our Eff2 project. We decided to create CVDB because we were gaining a deeper understanding of the Eff2 problems. We are happy to see that other scientists share our vision, as shown, among other things, by their participation in CVDB. Instead of pushing for a third edition co-located with SIGMOD, we are in discussions with other major vision conferences and workshops (such as CBMI, MIR, WIAMIS, CIVR...) about a more global merging that would increase the overall audience and reduce costs; all scientific venues would keep their identity, and only a date shift would be enforced.



II. 2006 REVIEW

Any remarks and/or changes that occurred (indicate here, where applicable, the elements from previous years that you consider important):


 

Only for teams at the end of their 3rd year: summary review of the last 3 years (about 1 page)

 

Scientific report for 2006

This scientific report, describing the work achieved in 2006, is divided into two parts. First, we list five key milestones for 2006. Then, we present two new scientific issues that have appeared during the course of our studies and that will grow in importance in the future.

5 milestones for 2006. Five events that took place during 2006 are key because they represent jumps forward in our cooperation. First, we signed an agreement with the main Icelandic newspaper and subsequently gained access to 300,000 images, allowing us to create one of the largest collections of local descriptions ever built. Second, a publication describing our joint work has been accepted to ACM Multimedia 2006, the premier annual multimedia conference. Third, a prototype of our system implementing these ideas is going to be demonstrated at that same conference. Fourth, we are in the process of patenting our invention. Fifth, our work has triggered the publication of several articles aimed at a larger and less specialized audience. The following gives a short summary of each.

  1. In December 2005, we signed a formal agreement of cooperation between IRISA, Reykjavík University and Morgunblaðið, the main newspaper in Iceland. Reykjavík University, in cooperation with the TexMex team at IRISA, is developing and researching software to efficiently find images by their visual content. This software, referred to as PvS, may, among other uses, become part of an image copyright protection system designed to track violations of image copyright. This agreement defines the terms under which Reykjavík University and the TexMex team can obtain access to the image collection of Morgunblaðið, while Morgunblaðið will have use of the PvS software. This collection consists of about 300,000 high-resolution images. The images were delivered to us after being thumbnailed to 512x512 pixels, which is sufficient for performing extensive recognition-based performance measurements. We can keep the images for two years, after which we have to destroy them. Their descriptions, however, can be kept as long as needed for research and development purposes, since this format does not allow for any presentation or reconstruction of the images.
    It is extremely difficult to get access to real image collections, and signing this agreement gave us a real push, since we were able to conduct a series of experiments at a scale never reached before. That work resulted in a publication at ACM Multimedia, the premier annual multimedia conference, as well as a prototype demonstration at that same conference. This is described next.
  2. ACM Multimedia 2006: paper "Scalability of Local Image Descriptors: A Comparative Study". The bulk of the work achieved in 2006 focused on refining our fast and scalable multidimensional indexing scheme called PvS. In short, the extensive performance experiments measuring response times convinced us that we had come up with efficient and scalable database support. We therefore started to study the scalability of the local image descriptors that are used in key applications including face recognition, shape recognition and image copyright protection. With these schemes, each image yields many descriptors (several hundred for high-quality images), where each descriptor describes a small "local" area of the image. Two images are typically considered similar when many of their descriptors are found to be similar.
    All of these approaches, however, have only been studied and compared at a small scale (typically fewer than a few hundred images). Overall, all existing studies fail to predict how local description schemes will perform with collections of tens of thousands of images or more. In [14], we demonstrated that our PvS-framework achieves efficient query processing for large collections of local descriptors. We therefore decided to compare three major local descriptor schemes (SIFT, PCA-SIFT and RDTQ) to study their recognition power at large scale. This comparison included a fourth scheme that we designed and called eff2. Using a collection of almost thirty thousand images, we showed that our new descriptor scheme gives the best results in almost all cases. We then gave two stop rules to reduce query processing time and showed that in many cases only a few query descriptors must be processed to find matching images. Finally, we tested our descriptors on a collection of over three hundred thousand images (the Morgunblaðið images), resulting in over 200 million local descriptors, and showed that even at such a large scale the results are still of high quality, with no change in query processing time.
  3. ACM Multimedia 2006: demo "Blazingly Fast Image Copyright Enforcement". Many photo agencies use the web to sell access to their image collections. Despite significant security measures, images may be stolen and distributed, making it necessary to detect copyright violations. Our demonstration paper describes a content-based system for large-scale automatic copyright enforcement. It briefly describes the image description, indexing and retrieval algorithms that lie at the heart of the system.
    It also describes our proposed demonstration, which is a realistic scenario of copyright violations against a large image collection. The image collection used in the demonstration consists of 287,268 high-quality news images, resulting in 169,159,548 descriptors of 72 dimensions. During the conference, we will take many "news" photos, so that we are constantly updating our image collection. We will demonstrate, by "stealing" and modifying new and old images, that the system practically always finds a match to copyright violations. Overall, this demonstration will show that our system offers robust and effective descriptions, dynamic storage and blazingly fast retrieval. More importantly, it will show that these desirable properties hold even at a very large scale.
  4. Based on the experience gained with the PvS-framework, we have designed a more sophisticated and general index which is also based on ranking, projections and partitions. This index is called the NV-tree (Nearest Vector tree) and we are in the process of patenting it. With respect to the PvS-framework, the NV-tree yields better performance and space utilization, is better able to capture the real distribution of data by self-tuning the projection and partitioning strategies, copes with on-the-fly updates of the descriptor collections, can be used stand-alone or by aggregating the results from two or more indices, and lends itself effectively to distributed processing to further reduce response times. All in all, the NV-tree yields efficient query processing and good result quality with extremely large descriptor collections.
  5. In March 2006, we published an article describing our work in issue 53 of the INRIA newsletter. This article, entitled "Un logiciel pour identifier rapidement les images piratées / A Software for Fast Identification of Pirated Images", was for some time on the INRIA front page, as a typical example of good international cooperation. At roughly the same time, we also published an article in the French popular science magazine "Science & Vie Junior", entitled "Des images piratées débusquées en 1 seconde" (unfortunately, a subscription is needed to read the article). We also had an article in the economic newspaper Les Échos on April 12, entitled "A la recherche d'images piratées", and another article in the scientific journal of the CNRS, published in issue 197 of June 2006, entitled "Un logiciel contre le vol d'images". Finally, we were asked by the publisher "Techniques pour l'ingénieur" to write an extended article describing not only our techniques but also the context within which enforcing copyright protection is key. This article, entitled "Contrer le piratage d'images : un logiciel précis et rapide", will be published in February 2007.
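
The matching principle described in item 2 above (each image yields many local descriptors, two images are considered similar when many of their descriptors match, and a stop rule ends processing once a few query descriptors suffice) can be illustrated with a small voting sketch. This is not the PvS or eff2 implementation: the brute-force nearest-neighbour scan, the `min_votes` and `lead` thresholds and all function names are assumptions made purely for illustration.

```python
import math
from collections import Counter


def nearest_image(descriptor, database):
    """Return the id of the image owning the database descriptor closest to `descriptor`.

    `database` is a list of (image_id, vector) pairs. A real system would use
    an index here; a linear scan keeps the sketch self-contained.
    """
    best_id, best_dist = None, math.inf
    for image_id, vector in database:
        dist = math.dist(descriptor, vector)
        if dist < best_dist:
            best_id, best_dist = image_id, dist
    return best_id


def match_image(query_descriptors, database, min_votes=3, lead=2):
    """Vote for database images, one query descriptor at a time.

    Hypothetical stop rule: stop as soon as the leading image has at least
    `min_votes` votes and is at least `lead` votes ahead of the runner-up.
    Returns (best_image_id, number_of_query_descriptors_processed).
    """
    votes = Counter()
    processed = 0
    for descriptor in query_descriptors:
        processed += 1
        votes[nearest_image(descriptor, database)] += 1
        ranked = votes.most_common(2)
        top_votes = ranked[0][1]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if top_votes >= min_votes and top_votes - runner_up >= lead:
            break
    best = votes.most_common(1)[0][0] if votes else None
    return best, processed
```

With such a rule, a query whose first few descriptors all vote for the same image is answered without processing its remaining descriptors, which is the kind of saving a stop rule aims for.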
Two new scientific issues: describing sequences and digital personal collections. We have obtained very good results with our work, which has focused on still images. It is now natural to turn our attention to videos and to try to understand whether we can provide useful low-level support. In addition, we have so far investigated the searching dimension of still images and left the browsing dimension unexplored. We are also starting some work in this direction. These two new axes of research are likely to form the mainstream of our activities in the future, since both are going to be explored by Ph.D. students. Romain Tavenard, who did his Master with Laurent Amsaleg, will work on sequences. His work will be carried out in France, but it is nevertheless likely to open new doors in our joint cooperation. Kari Hardarson will work on personal image collections and will be co-advised by Laurent and Björn. He will do his thesis mainly in Iceland but will spend about 4 months each year in France. Having two students working on issues related to our associate team is a real asset.
  1. Sequences. As described in this document, we can today exploit rather large databases of still images quite well, and we know how to efficiently query them by content. The next step is to turn our focus to more complex documents, typically video and audio. There are today several description techniques for audio and video, but only very few techniques to efficiently perform query-by-content on video or audio databases. Being able to use such techniques is particularly crucial for professional multimedia archivers.
    People working in such organizations typically want to annotate incoming video or audio streams before archiving. Those annotations are then used by any subsequent search, since they are at the root of document matching. It is key to note that document annotation is an entirely manual process and that this process cannot scale with the constantly increasing number of streams to annotate. Therefore, one salient application is the automated segmentation of multimedia streams into separate units, followed by the automatic annotation of each unit, before archiving the documents. It is thus necessary to perform searches in streams to detect, for example, jingles, trailers or the periodic broadcast of elements. Those searches are more complex than simply searching for the repetition of identical patterns, since it is necessary to find correlations despite distortions, duration variations, superimposition of noise, text or additional music, inclusion of multiple side-streams, etc.
    The state of the art makes such searches possible, but only at a very small scale, i.e., on a very small amount of data. Today, no search technique is efficient enough to allow any practical usage of a real-scale audio or video archive. In addition, it has been observed that it is not possible to simply extend existing multidimensional indexing techniques, since they were designed for description schemes in which the concept of sequences is lacking. One of the most prevalent difficulties comes from the temporal aspect of video and/or audio descriptions. Describing video and audio means creating sequences of descriptions in which the notion of order between descriptions is central. That notion of order is ignored by all traditional search techniques, which only search for independent elements that are, at most, very loosely coupled. We are therefore trying to understand how multidimensional indexing techniques can integrate the notion of sequences of descriptions into their principles. This needs to be done to make searches by content possible in very large archives of video and/or audio documents.
    We have started to work on this topic with a student in France named Romain Tavenard. Romain is starting a PhD with Laurent Amsaleg. Romain has implemented a few techniques from the state of the art (exhaustive search, dynamic time warping, mixtures of Gaussian models and SVM-based modelling) and ran performance evaluations on audio recognition. Using a collection of real audio samples, he checked the ability of each technique to handle recognition despite time shifts, time distortions and some other signal distortions. It turns out that SVM-based models perform quite well but are very inefficient in terms of response time. This leaves room for improvement.
    In parallel with this study, processing huge corpora of audiovisual content requires an adapted infrastructure. This infrastructure has to cope with three main constraints: first, data management and storage are crucial; second, video and sound analysis tools consume a large part of the computer processing power; third, the infrastructure must be easily accessible, independently of the operating system used by the client. An engineer at IRISA (Arnaud Dupuis) designed a client/server solution built to facilitate the processing and indexing of video (and, shortly, audio). In this context, we had one intern (Lian Liu) who worked on completing the ground truth for our three-week video dataset. The work consisted mainly of manually checking the consistency of the existing ground truth and creating the ground truth for the last week of the corpus. Creating this ground truth meant parsing the video stream in order to find the boundaries of the television programs and assign them a title. This was done using the infrastructure.
  2. Personal Image Collections. In recent years, the world has seen a tremendous increase in the capability to create, share and store digital images. As a result, personal image collections are growing at an astounding rate, and it is clear that in the future individuals will need to access tens of thousands, or even hundreds of thousands, of digital images. It is therefore imperative to start studying ways to access these images in a useful and interesting manner. What is needed is software that will allow users to seamlessly organize, search and browse their images. Kari Hardarson will carry out the research needed to progress on this topic. Kari has held a full-time teaching position at Reykjavik University for several years. He has an M.Sc. degree in computer science from the University of North Carolina, Chapel Hill, and has already started working towards a Ph.D. degree at Reykjavik. We have sent a grant application to Rannis; the grant would be used to reduce Kari's teaching load by about 50%, giving him time to conduct his research. So far, Kari has investigated the state of the art in this area in detail, including installing and testing several research prototypes. Next, he will start working towards a flexible prototype for use in our research.
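
Among the baselines listed in the Sequences item above, dynamic time warping is the one that most directly tolerates time shifts and time distortions. The following is a minimal sketch of the textbook algorithm on one-dimensional numeric sequences. It is not the code used in the evaluation; the function name and the absolute-difference cost are assumptions.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

A time-stretched copy of a sequence is at distance zero from the original under this measure, which is what makes it attractive for recognition despite duration variations. The quadratic table also hints at why such comparisons become expensive on long streams.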

Laurent Amsaleg has been involved in three committees evaluating the work of students completing their master's degrees at Reykjavík University:
  1. Hafþór Guðnason. Median Rank in Face Recognition, M.Sc. thesis, Reykjavík University, June 1st, 2006.
  2. Friðrik Heiðar Ásmundsson. The NV-Network: A Distributed Architecture for High Throughput Image Retrieval, M.Sc. thesis, Reykjavík University, August 21st, 2006.
  3. Herwig Lejsek. The PvS-Index, M.Sc. thesis, Reykjavík University, June 20, 2005. (This 2005 committee was omitted from last year's report.)
In September 2006, the work we all achieved together received the EUROPRIX Top Talent Award Quality Seal.

 

Financial report 2006

1. EA expenses (charged to the associate team's credits)

                EA budget allocated     Amount spent
Hosting
Missions
Total           (a) 15,000              (b) 17,412.79

Utilization rate of allocated EA credits (b/a %): 1.16 (i.e., 116%)

 

2. External expenses (supported by non-EA funding)

                Budget allocated     Amount spent
Organization 1 (*): The Icelandic Research Fund for Student Work
  Hosting
  Missions
  Total         56,000 euros         56,000
Organization 2 (*): Reykjavík University
  Hosting
  Missions
  Total         6,000 euros          6,000

Total external funding

allocated: (c) 86,650

spent: 62,000

In addition, 9,560 euros were provided by Egide and have not yet been spent; this amount is not listed in the table above.

Total EA and external funding

allocated: (d) 86,650

spent: 79,412


Co-funding rate (c/d %)

0.91

Review of exchanges carried out in 2006


1. Seniors

Name      Status (1)    From      To              Purpose (2)      Duration (weeks)   Cost (EA)   Cost (external)
JONSSON   Asso. Prof.   Iceland   Rennes          work             1                  127.1
AMSALEG   CR CNRS       France    Paris           PC               0.7                210.97
JEGOU     CR INRIA      France    Reykjavik       work             1                  1448.6
AMSALEG   CR CNRS       France    Reykjavik       work             1                  2360.83
AMSALEG   CR CNRS       France    Reykjavik       work + defense   0.5                1508.32
AMSALEG   CR CNRS       France    Santa Barbara   ACM Multimedia   1                  3101.47
ORIA      Ass. Prof     USA       Rennes          seminar          0.7                128.75
AMSALEG   CR CNRS       France    Lyon            defense          0.7                232.66

Total duration in weeks: 5
(1) DR / CR / professor
(2) conference, thesis, internship, visit, ...


2. Juniors

Name        Status (1)        From      To         Purpose (2)   Duration (months)   Cost (EA)   Cost (external)
ASMUNDSON   Student, MS       Iceland   Rennes     work          0.25                266.80
OLAFSSON    Student, MS       Iceland   Rennes     work          0.25                266.80
LEJSEK      Student, MS       Iceland   Rennes     work          0.25                444.70
DUPUIS      Expert Engineer   France    Fribourg   conference    0.25                1036.24
TAVENARD    Ph. D.            Rennes    Lyon       defense       0.1                 338.18

Note: 4594.76 euros were additionally used to support the work of Romain Tavenard during his Master. 1346.85 euros were additionally used to support the work of Lian Liu.

Total duration in months: 1.1
(1) post-doc / Ph.D. student / intern
(2) conference, thesis, internship, visit, ...

 



III. 2007 FORECAST

Work programme


In 2007, the bulk of the work will focus on investigating the issues related to the browsing of personal digital image collections, with Kári Harðarson. In many households, organizing a home photo collection has long been a neglected task. This is still true even with the latest digital photo browsers, which typically just dump pictures into folders, an electronic version of the good old shoe-boxes our parents used for paper prints. They offer no support for browsing and searching by image content, and are therefore inadequate for handling such large collections. Despite numerous features (effective packing of thumbnails on screen, identification of representative images, zoomable user interfaces, ...), all current photo browsers share limitations, such as offering either a time-line view or a folder view at any one time and thereby failing to use the two dimensions of the screen. Most have clumsy annotation capabilities and, more than anything else, completely separate the search and browsing functions. This key flaw is not unique to image browsers: on the Web, browsing means clicking hyperlinks, while searching goes through Google or other engines, which typically return a flat list of results from which browsing can start. Overall, presentation is typically linear, and the contents of the images are not used to guide the search and presentation.
Each image may be described by a number of attributes, based on image contents and image meta-data (such as camera and time information, stored in so-called EXIF headers). Some of these attributes may be linear or spatial, such as the time and location at which the image was taken, while others may be textual, hierarchical or categorical. These attributes may be considered dimensions in an image hyperspace, which we must be able to traverse dynamically to fully enjoy our digital images. In on-line analytical processing (OLAP), multi-dimensional data is dealt with by considering a few dimensions at a time and pivoting between dimensions when necessary. In advanced computer games such as EVE Online, large three-dimensional worlds are explored by simulating space travel. Both approaches have been very successful in keeping their users occupied and focused on their task for a long time. We propose that a browsing interface for images should merge these features into a multi-dimensional interface that allows flexible, space-travel-like exploration of the image hyperspace. In order to begin exploring the possibilities of such a browsing interface, we have implemented a prototype, based on the PartiView browser, which allows us to browse images in a three-dimensional space. The dimensions may be based on image contents and image meta-data, and different dimensions may be combined in an arbitrary manner. Our conclusion is that, while the prototype has shortcomings, this is a very promising research direction that merits further exploration. What is novel in this work is that we want to integrate into an image browser OLAP browsing concepts, such as pivoting and filtering, that have typically been designed to facilitate the browsing of huge financial data collections.
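
The pivoting and filtering operations borrowed from OLAP can be sketched on a toy set of image records. The attribute names (`year`, `place`, `camera`), the sample records and the two helper functions are hypothetical; the sketch says nothing about how the PartiView-based prototype is implemented.

```python
from collections import defaultdict


def filter_images(images, **conditions):
    """Keep the images whose attributes equal all the given values."""
    return [img for img in images
            if all(img.get(key) == value for key, value in conditions.items())]


def pivot(images, row_dim, col_dim):
    """Group image names into cells indexed by two chosen dimensions."""
    cells = defaultdict(list)
    for img in images:
        cells[(img[row_dim], img[col_dim])].append(img["name"])
    return dict(cells)


# Hypothetical records; in practice such attributes could come from EXIF headers.
images = [
    {"name": "img1.jpg", "year": 2005, "place": "Rennes", "camera": "A"},
    {"name": "img2.jpg", "year": 2005, "place": "Reykjavik", "camera": "B"},
    {"name": "img3.jpg", "year": 2006, "place": "Reykjavik", "camera": "A"},
    {"name": "img4.jpg", "year": 2006, "place": "Reykjavik", "camera": "B"},
]

# View the collection along (year, place), then pivot to (camera, year)
# after filtering on one value of the place dimension.
by_year_place = pivot(images, "year", "place")
by_camera_year = pivot(filter_images(images, place="Reykjavik"), "camera", "year")
```

Changing the two dimension names passed to `pivot` is the analogue of pivoting between axes in the browser, while `filter_images` restricts the view to one slice of the hyperspace.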

In 2006, Romain Tavenard will spend a year in Amsterdam and will therefore not start his Ph.D. right away. His stay in Amsterdam is part of a cooperation we have with Eric Pauwels in the context of the MUSCLE European Project. Romain will improve his knowledge of signal processing, databases and multimedia. He will then switch to his Ph.D., with support from "Ecole Normale".

Provisional budget 2007

1. Co-funding

- Does this cooperation already benefit from financial support from INRIA, the foreign partner organization or a third-party organization (European project, NSF, ...)?
- If your proposal is selected, does it seem likely to you that symmetric financial support can be obtained from the foreign partner organization?

PROSPECTIVE ESTIMATE OF CO-FUNDING
Organization                                     Amount
The Icelandic Research Fund for Student Work     60,000 (uncertain: funding is hard to obtain in Iceland these days)
Total
 

2. Exchanges

Description of the planned exchanges in both directions: hosting of researchers from your partner and INRIA missions to your partner.
Justify the usefulness and specific interest of the exchanges and the complementarity of the teams.
Specify whether these are experienced researchers or juniors (interns, Ph.D. students, post-docs). Specify whether these exchanges take place as part of scientific work, the organization of joint events, seminars, tutorials or schools, or training through research: indicate the students involved in the collaboration, give an estimate of their number on each side, and specify whether theses (possibly jointly supervised) are planned (for each exchange, specify the duration and the provisional schedule).

ESTIMATE OF EXPENSES
(form columns: Number, Hosting, Missions, Total; amounts are listed as entered)

Experienced researchers: 3 (Bjorn Laurent, Patrick Gros, Herve Jegou); 4 x 1 week; 6000
Post-docs:
Ph.D. students: Romain, Kari; 1 week + 4 months; 9000
Interns:
Other (specify):
Total:

- total co-funding:

"Associate Team" funding requested: 15,000

Remarks or observations:

The bulk of the money for next year will probably support Kari in conducting his PhD with us in Rennes. The duration of his stay(s) is not decided yet.

 

 

© INRIA - updated on 02/08/2006