Information Retrieval, Multimedia, Fuzzy Logic
Information Retrieval (IR) in textual documents, as performed by web search engines (e.g. Google, Yahoo, Exalead...), is a two-step process. The first step consists in representing (indexing) documents with descriptors (e.g. words, often transformed and weighted). The second step consists in matching the user query (also indexed) against the document descriptors and assigning a score to each document. The answer of the system is (the head of) a list of documents, ordered from the highest to the lowest score.
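The two steps above can be sketched as follows. This is only an illustration: the function names (index, search), the whitespace tokenization, the weighting by normalized term frequency and the dot-product score are all assumptions made for the example, not the scheme used by any particular engine or by our system.

```python
from collections import Counter


def index(text: str) -> dict[str, float]:
    """Step 1 (assumed scheme): map each word to a weight in [0, 1].

    The weight is the word's frequency divided by the frequency of the
    most frequent word in the text.
    """
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    return {word: count / top for word, count in counts.items()}


def search(query: str, docs: dict[str, str], k: int = 10) -> list[tuple[str, float]]:
    """Step 2 (assumed scheme): score every document and return the head of the list."""
    q = index(query)  # the query is indexed like a document
    scored = []
    for name, text in docs.items():
        d = index(text)
        # Hypothetical matching function: sum of products of matching weights.
        score = sum(weight * d.get(term, 0.0) for term, weight in q.items())
        scored.append((name, score))
    # Order from the highest to the lowest score and keep the first k entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Calling search("fuzzy logic", docs, k=1) on a small collection returns the single best-scored document together with its score.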
Roughly speaking, a document is considered relevant if it contains the words from the query, or in other words, if the query "implies" the document. When the descriptors of the documents and of the query are weighted, this implication operator is no longer Boolean. Hence the idea of using fuzzy logic, an extension of classical logic in which truth values lie in the unit interval [0,1]. The mathematical foundation of fuzzy logic then gives IR systems a clear theoretical framework.
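As an illustration of a graded implication, the sketch below scores a document by the degree to which each weighted query term "implies" the corresponding document weight. The choice of the Lukasiewicz implication, min(1, 1 - a + b), and of the minimum as aggregation are assumptions made for the example; the text above does not fix a particular operator, and the names implies, score and rank are hypothetical.

```python
def implies(a: float, b: float) -> float:
    """Lukasiewicz fuzzy implication on [0, 1] (one possible choice among many).

    On Boolean inputs (0 or 1) it coincides with the classical implication.
    """
    return min(1.0, 1.0 - a + b)


def score(query: dict[str, float], doc: dict[str, float]) -> float:
    """Degree to which the query 'implies' the document.

    Each query term t with weight q(t) is compared to the document weight
    d(t) (0 if the term is absent); the per-term degrees are aggregated
    with min, i.e. a fuzzy conjunction over all query terms.
    """
    return min((implies(w, doc.get(t, 0.0)) for t, w in query.items()), default=1.0)


def rank(query: dict[str, float], docs: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (document name, score) pairs from the highest to the lowest score."""
    scored = [(name, score(query, d)) for name, d in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With Boolean weights this reduces to the usual "the document contains every query word" test; with weights in [0,1] a document that only partially covers an important query term receives an intermediate score instead of being rejected outright.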
Theoretical work on fuzzy logic in the framework of databases has been carried out for several years in the Pilgrim team (IRISA Lannion). More recently, preliminary work with the TexMex team has shown, from both theoretical and practical points of view, that this work could be extended to IR. The first experimental results show that this new approach is promising. It seems that fuzzy logic i) yields good experimental results, ii) provides a strong theoretical framework, usually absent from IR systems, and iii) allows for better interaction with the user, thanks to the large range of fuzzy logic operators one can use in the queries.
First, the student will work in the context of textual IR. S/he will study some proposed extensions of our system, and will be encouraged to propose new ones. For instance, the links between our IR model based on fuzzy logic and existing IR models (Vector Space Models, Language Models, Probabilistic Models...) could be studied, as well as the use of fuzzy operators in the queries (fuzzy AND/OR to represent complex queries, anti-division operators for negative queries...).
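To make the idea of fuzzy AND/OR in queries concrete, here is a minimal sketch with two standard operator families: min/max (Zadeh) and product/probabilistic sum. The example query "(fuzzy AND logic) OR retrieval", the document weights and the function names are hypothetical; the anti-division operator mentioned above is not covered.

```python
from typing import Callable

Op = Callable[[float, float], float]


# Zadeh operators: min as fuzzy AND (t-norm), max as fuzzy OR (t-conorm).
def zadeh_and(a: float, b: float) -> float:
    return min(a, b)


def zadeh_or(a: float, b: float) -> float:
    return max(a, b)


# Product t-norm and its dual t-conorm, the probabilistic sum.
def product_and(a: float, b: float) -> float:
    return a * b


def probabilistic_or(a: float, b: float) -> float:
    return a + b - a * b


def evaluate(doc: dict[str, float], and_op: Op, or_op: Op) -> float:
    """Evaluate the hypothetical query '(fuzzy AND logic) OR retrieval' on one document.

    Terms missing from the document get degree 0.
    """
    fuzzy = doc.get("fuzzy", 0.0)
    logic = doc.get("logic", 0.0)
    retrieval = doc.get("retrieval", 0.0)
    return or_op(and_op(fuzzy, logic), retrieval)
```

Both families agree with Boolean AND/OR on weights 0 and 1 and differ only on intermediate degrees, which is what gives the user a choice of query semantics.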
Then, this work will be extended to multimedia information retrieval. Long-established methods for representing and aggregating complex data will allow for the accurate handling of multimedia data (e.g. video) as a whole, and not medium by medium. The expected IR system should be theoretically grounded, show excellent practical results, and allow for natural interaction with the user.
No prior knowledge is required for this subject. However, the PhD candidate needs both theoretical skills (to define, explain, and justify the mechanisms to apply) and practical skills (to implement, experiment with, and evaluate the models against real data).
The PhD will take place in the TexMex team at IRISA Rennes, with regular meetings with the Pilgrim team in Lannion.