Real-Time Efficient Recommenders: A System Perspective

Team and supervisors
Department / Team: 
Team Web Site: 
https://team.inria.fr/wide/
PhD Director
François Taiani
Co-director(s), co-supervisor(s)
Davide Frey
Contact(s)
Name | Email address | Phone number
Davide Frey | davide.frey@inria.fr | 02 99 84 75 65
François Taiani | francois.taiani@irisa.fr | 02 99 84 75 04
PhD subject
Abstract

Recommendation systems have become a fundamental component of most online applications. Initially employed by a few specialized e-commerce websites, they have now found application in a variety of settings, including news recommendation, social networks, and even software engineering and smart homes. Most existing recommender systems are still operated by large companies that can afford the computing power required to gather, store, secure, and process the data associated with large user bases. Yet an increasing number of SMEs have entered the world of recommendation by offering innovative services to media, online retailers, and a variety of other markets.

Interest in recommender systems increased dramatically when Netflix announced its competition and released the Netflix Prize dataset. The goal of the prize was to improve the accuracy of Netflix’s algorithm by 10%. Since then, researchers have proposed a variety of algorithms to achieve ever better performance. After announcing the winning algorithm in 2009, Netflix declared that it would not deploy the winning solution because the required engineering effort would outweigh the benefits. In spite of this, most researchers have continued to focus on recommendation accuracy, ignoring aspects such as dynamics (in the sets of items, the sets of users, and the associated interests), operational cost, and privacy.

In one of the few papers that tackle recommendation from the perspective of data dynamics [3], Twitter engineers explain how the real-time demands of their query-expansion engine forced them to completely revise their big-data architecture. But the case of Twitter is not isolated. A growing number of companies offering recommendation services are faced with new data being generated every second. This requires not only the ability to train recommendation models quickly, but also the ability to cope with the so-called cold-start problem. New users or new items that have not yet interacted with the system cannot benefit from collaborative filtering and therefore require a combination of algorithms, including content-based techniques [4], to achieve reasonable performance. The need to process data at a high rate also increases the costs associated with running a recommender system. Dynamic sets of users and items force continuous updates to the recommendation model, which translates into high storage and computational costs. Finally, the increasing diffusion of recommender systems also presents privacy risks. For example, the recommendations received by one user may reveal information about the interests of others [1].
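To make the cold-start problem concrete, here is a minimal Python sketch of one possible hybrid strategy: score items from the histories of overlapping users when the target user has a history, and fall back to content features otherwise. It is an illustration only, not the approach of any of the cited systems. The function name `recommend`, the sign-up `profile_tags`, the tag-overlap scoring, and the sample data are all assumptions introduced for this example.

```python
from collections import Counter


def recommend(user, interactions, item_tags, profile_tags, k=2):
    """Return up to k item ids for `user` (hypothetical hybrid scheme).

    interactions: dict user -> set of item ids the user has consumed
    item_tags:    dict item -> set of descriptive tags (content features)
    profile_tags: dict user -> set of tags declared at sign-up (assumed input)
    """
    seen = interactions.get(user, set())
    scores = Counter()
    if seen:
        # Collaborative step: score items consumed by users who share
        # at least one item with the target user.
        for other, items in interactions.items():
            if other == user:
                continue
            overlap = len(seen & items)
            if overlap:
                for item in items - seen:
                    scores[item] += overlap
    if not scores:
        # Cold-start fallback: no usable history, so rank unseen items
        # by how many tags they share with the declared profile.
        wanted = profile_tags.get(user, set())
        for item, tags in item_tags.items():
            if item not in seen:
                match = len(wanted & tags)
                if match:
                    scores[item] = match
    # Sort by score (descending), then by id for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Invented sample data: carol is a new user, a4 is a new item.
interactions = {"alice": {"a1", "a2"}, "bob": {"a1", "a3"}, "carol": set()}
item_tags = {"a1": {"sports"}, "a2": {"politics"}, "a3": {"sports"}, "a4": {"tech"}}
profile_tags = {"carol": {"tech"}}

print(recommend("alice", interactions, item_tags, profile_tags))
print(recommend("carol", interactions, item_tags, profile_tags))
```

In this toy data the collaborative step alone can never surface `a4`, since nobody has interacted with it, and it returns nothing for `carol`, who has no history. Only the content-based fallback covers both cases.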

In this PhD program, we plan to tackle the above issues by taking into account the entire recommendation workflow. We will study recommendation systems as a whole: not just from the point of view of algorithms but also in terms of real-time performance, scalability, and privacy. First, we plan to target the issues associated with real-time settings such as news recommendation [2, 5] or Twitter-like microblogging platforms [6]. This involves, on the one hand, training models as fast as possible while retaining good accuracy and, on the other, devising techniques to mitigate the cold-start phenomenon associated with the presence of new users and items. Second, we plan to explore how to optimize the design of recommendation systems to make the best use of the available hardware and software architectures, thereby minimizing operational costs. We will consider questions such as where to store data, how to represent it, and how to organize computations. Finally, we will address the problem of privacy in recommender systems by devising privacy-preserving training techniques and evaluating the vulnerabilities of the proposed solutions.

To carry out this research, we will have the opportunity to collaborate with Mediego (https://www.mediego.com), a startup company founded by Anne-Marie Kermarrec, the former leader of the ASAP team (WIDE’s precursor). Mediego focuses on media recommendation and offers two main services to its customers: online recommendations and personalized newsletters. Both services require Mediego to manage continuously changing sets of items and to offer recommendations to users with very dynamic interests. For online recommendations, they must be able to compute new recommendations in less than 100 ms to maximize user experience. For the newsletter, they need to compute 200,000 personalized newsletters in less than one hour, starting from the moment when the articles become available. The collaboration with Mediego will allow us to test the results of our research in a real setting, with a direct impact in terms of innovation and technology transfer.
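The newsletter constraint can be turned into a rough capacity estimate. The short Python sketch below works out the sustained throughput implied by 200,000 newsletters per hour and the per-newsletter time budget of a single sequential worker. The worker count at the end rests on an assumption that is not stated in the text: that one newsletter costs about as much as one online recommendation (100 ms). The constant and function names are illustrative.

```python
NEWSLETTERS = 200_000        # from the text
DEADLINE_MS = 60 * 60 * 1000  # one hour, in milliseconds
ASSUMED_COST_MS = 100         # assumption: per-newsletter cost equals the online bound


def throughput_per_second(count, deadline_ms):
    """Newsletters per second needed to finish `count` within the deadline."""
    return count / (deadline_ms / 1000)


def budget_ms_single_worker(count, deadline_ms):
    """Time available per newsletter if one worker processes them in sequence."""
    return deadline_ms / count


def workers_needed(count, cost_ms, deadline_ms):
    """Smallest number of parallel workers, using integer ceiling division."""
    total_ms = count * cost_ms
    return -(-total_ms // deadline_ms)


print(round(throughput_per_second(NEWSLETTERS, DEADLINE_MS), 1))  # about 55.6 per second
print(budget_ms_single_worker(NEWSLETTERS, DEADLINE_MS))          # 18.0 ms each
print(workers_needed(NEWSLETTERS, ASSUMED_COST_MS, DEADLINE_MS))  # 6 under the assumed cost
```

Under these assumptions, a single sequential worker would have only 18 ms per newsletter, well below the 100 ms online bound, so the batch either has to run in parallel or reuse work across users.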

Bibliography
[1] Antoine Boutet, Davide Frey, Rachid Guerraoui, Anne-Marie Kermarrec, Antoine Rault, François Taïani, and Jingjing Wang. “Hide & Share: Landmark-based Similarity for Private KNN Computation”. In: 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). Rio de Janeiro, Brazil, June 2015, pp. 263–274. doi: 10.1109/DSN.2015.60. url: https://hal.archives-ouvertes.fr/hal-01171492.

[2] Antoine Boutet, Davide Frey, Rachid Guerraoui, Arnaud Jégou, and Anne-Marie Kermarrec. “WhatsUp Decentralized Instant News Recommender”. In: IPDPS 2013. Boston, United States, May 2013. url: https://hal.inria.fr/hal-00769291.

[3] Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, and Jimmy Lin. “Fast Data in the Era of Big Data: Twitter’s Real-time Related Query Suggestion Architecture”. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. SIGMOD ’13. New York, New York, USA: ACM, 2013, pp. 1147–1158. isbn: 978-1-4503-2037-5. doi: 10.1145/2463676.2465290. url: http://doi.acm.org/10.1145/2463676.2465290.

[4] Royi Ronen, Noam Koenigstein, Elad Ziklik, and Nir Nice. “Selecting Content-based Features for Collaborative Filtering Recommenders”. In: Proceedings of the 7th ACM Conference on Recommender Systems. RecSys ’13. Hong Kong, China: ACM, 2013, pp. 407–410. isbn: 978-1-4503-2409-0. doi: 10.1145/2507157.2507203. url: http://doi.acm.org/10.1145/2507157.2507203.

[5] A. S. Das, M. Datar, A. Garg, and S. Rajaram. “Google news personalization: scalable online collaborative filtering”. In: WWW. 2007.

[6] https://joinmastodon.org/.

Work start date: 
October 2019
Keywords: 
recommender systems, scalability, fast data
Place: 
IRISA - Campus universitaire de Beaulieu, Rennes