Direction des Relations Internationales (DRI)

INRIA "Associate Teams" Programme

 

I. DEFINITION

Associate Team: DataCloud@work

Selection: 2010


INRIA Project-Team: KerData

Partner Institution: Politehnica University of Bucharest (PUB)

INRIA Research Center: Rennes - Bretagne Atlantique

INRIA Theme: Networks, systems and services, distributed computing; Distributed computing and very high performance applications

Country: Romania


French Coordinator
- Last name, First name: ANTONIU Gabriel
- Position: Research Scientist (Chargé de recherche)
- Home Institution (department and/or laboratory): INRIA, Centre Rennes - Bretagne Atlantique, KerData Team
- Postal address: Campus de Beaulieu, 35042 Rennes cedex
- Website: http://www.irisa.fr/kerdata/people/Gabriel.Antoniu/
- Telephone: +33 2 99 84 72 44
- Fax: +33 2 99 84 71 71
- Email: gabriel.antoniu@inria.fr

Partner Coordinator
- Last name, First name: CRISTEA Valentin
- Position: Professor
- Home Institution (department and/or laboratory): National Center for Information Technology (NCIT), Politehnica University of Bucharest (PUB)
- Postal address: 313, Splaiul Independentei, 0600042, Bucuresti, Romania
- Website: http://csite.cs.pub.ro/index.php/en/component/comprofiler/?task=userProfile&user=73/
- Telephone: +40 214 029 332
- Fax: +40 214 029 333
- Email: valentin.cristea@cs.pub.ro

Other French Partner
- Last name, First name: MORIN Christine
- Position: Senior Research Scientist (Directrice de recherche)
- Home Institution (department and/or laboratory): INRIA, Centre Rennes - Bretagne Atlantique, PARIS Project-Team
- Postal address: Campus de Beaulieu, 35042 Rennes cedex
- Website: http://www.irisa.fr/paris/web/component/option,com_uhp/task,view/Itemid,110/id,40/
- Telephone: +33 2 99 84 72 90
- Fax: +33 2 99 84 71 71
- Email: christine.morin@inria.fr

NOTE: In the case of multiple INRIA project-teams and/or multiple foreign partners, applicants may:
- either add another column on the right,
- or duplicate the above table as many times as needed, replacing "French Coordinator" / "Partner Coordinator" with "Other French or partner participant".


The proposal in brief

Title of the collaboration theme (in French and in English):

Stockage Autonome pour les Services sur Clouds / Autonomic Storage for Cloud Services

Description (approximately 10 lines):

While the cloud computing paradigm is progressively being adopted by companies wishing to deliver large-scale distributed services, such as Amazon, IBM, Google or Yahoo!, other research efforts in the area of large-scale distributed computing are exploring the concept of a grid operating system. Both kinds of systems aim at providing seamless access to a powerful distributed processing infrastructure, while hiding as much as possible all aspects related to the management of the underlying physical resources. In both contexts, data management is a key issue: it significantly impacts the quality of service delivered by such distributed infrastructures. In this project, we aim at investigating ways to provide advanced, autonomic storage mechanisms for cloud services. More specifically, the goal is to explore how to build an efficient, secure and reliable storage service for data-intensive distributed applications running in cloud environments by enabling autonomic behavior. In addition, we will leverage the grid operating system approach as a cloud technology (e.g., by relying on its OS support for virtual organizations). For validation purposes, experimental prototypes will be implemented based on the BlobSeer data-sharing platform (designed by the KerData Team), on the MonALISA monitoring framework (using the expertise of the PUB Team), and on the XtreemOS grid operating system (designed under the leadership of the PARIS Team). The work will also include interactions with the Nimbus team from Argonne National Lab, led by Kate Keahey: experiments will be carried out using the Nimbus cloud software. The validation phase will include intensive, large-scale experiments on the ALADDIN-Grid'5000 grid testbed.

Detailed presentation of the Associate Team

1. Scientific goals of the proposal

The emerging cloud computing model [1,2,3] is gaining serious interest from both industry and academia in the area of large-scale distributed computing. It provides a new paradigm for managing computing resources: instead of buying and managing hardware, users rent virtual machines and storage space. Various cloud software stacks have been proposed by leading industry companies, like Google, Amazon or Yahoo!. They aim at providing fully configurable virtual machines or virtual storage (IaaS: Infrastructure-as-a-Service [4,5,6]), higher-level services including programming environments such as Map-Reduce [7] (PaaS: Platform-as-a-Service [8,9]) or community-specific applications (SaaS: Software-as-a-Service [10,11]). On the academic side, one of the most visible projects in this area is Nimbus [5,12], from the Argonne National Lab (USA), which aims at providing a reference implementation for an IaaS. In parallel to these trends, other research efforts have focused on the concept of a grid operating system: a distributed operating system for large-scale, wide-area dynamic infrastructures spanning multiple administrative domains. XtreemOS [13, 14] is such a grid operating system, which provides native support for virtual organizations. Since both the cloud approach and the grid operating system approach deal with resource management on large-scale distributed infrastructures, the relative positioning of these two approaches is currently under investigation within the PARIS Project-Team (http://www.irisa.fr/paris/web/) at INRIA Rennes - Bretagne Atlantique: a preliminary discussion is available in [15].

Both in the context of emerging cloud infrastructures and in that of grid operating systems, some of the most critical open issues relate to data management. The KerData research team (http://www.irisa.fr/kerdata/) of INRIA Rennes - Bretagne Atlantique has recently been created with the goal of exploring ways to address the main challenges raised by data storage and management on cloud infrastructures. The team is designing and implementing BlobSeer [16, 17], a generic data-sharing platform which aims at providing support for storing massive data with fine-grained access control under heavy concurrency on large-scale distributed infrastructures. In addition, it will support versioning and decentralized metadata management. Providing users with the possibility to store and process data on externalized, virtual resources from the cloud requires simultaneously investigating important aspects related to security, efficiency and quality of service. To this end, it becomes necessary to create mechanisms able to provide feedback about the state of the storage system along with the underlying physical infrastructure. The information thus monitored can then be fed back into the storage system and used by self-managing engines in order to enable autonomic behavior, possibly with several goals such as self-configuration, self-optimization, or self-healing. To start moving towards this goal, the KerData Team has started to work with the Distributed Systems and Grids team from NCIT (PUB, Romania) on the design of preliminary introspection mechanisms for BlobSeer. This work relies on MonALISA [18,19], a general-purpose monitoring framework whose main contributors belong to the PUB Team. This preliminary work is detailed in [20].

In this project, we aim at investigating several open issues related to autonomic storage in the context of cloud services. The goal is to explore how to build an efficient, secure and reliable storage IaaS for data-intensive distributed applications running in cloud environments by enabling autonomic behavior, while leveraging the advantages of the grid operating system approach (such as OS support for virtual organizations). For validation purposes, experimental prototypes will be implemented based on the BlobSeer data-sharing platform (designed by the KerData Team), on the XtreemOS grid operating system (designed under the leadership of the PARIS Team) and on the MonALISA monitoring framework (using the expertise of the PUB Team). This work will also involve the Nimbus team from Argonne National Lab, led by Kate Keahey. Experiments will be carried out with the Nimbus cloud software infrastructure. The validation phase will include intensive, large-scale experiments on the Grid'5000 [21,22] grid testbed. We have divided the work into three main directions (each of which corresponds to one of the three years of the project), as described below.

Direction 1: Using BlobSeer for sharing application data in an IaaS

Scenario: Infrastructure as a Service (IaaS) is the delivery of computer infrastructure (typically a platform virtualization environment) as a service. The client typically runs a distributed application using virtual machines (VMs) rented from a service provider. The client applications are executed by the service provider as a set of virtual machines in a secure environment that enforces several restrictions, according to some pre-established contract. In such a context, access to local storage space on the physical machine where the application is running (owned by the service provider) is typically denied. Clients are instead provided with a specialized storage service they can access directly, through a specific API (e.g., Amazon S3 [23]).

Role of BlobSeer: In this context, the BlobSeer storage system will serve to enable the IaaS provider to offer advanced data sharing facilities to collaborating clients running within distinct VMs on the IaaS. BlobSeer's API will be directly made available as a distributed file system (e.g., within a given virtual organization). BlobSeer exposes a multiversioning interface which can be used in two ways: (1) to enable application data checkpointing (as part of checkpointing the application itself) and (2) to expose a multiversioning interface directly at application level through a specific access API. In the second phase, we will also enable the IaaS provider to allow client applications to share application data through a standard POSIX file system API. File system calls are transparently mapped to specific, secured data accesses to the internal storage service implementing data sharing for multiple VMs that form a given distributed application.

Role of MonALISA: First, the MonALISA monitoring framework includes automated management functions performed by higher-level, agent-based services. We will use these facilities to define a self-adaptive, autonomic behavior of BlobSeer through optimized, dynamic control for large-scale data transfers on dedicated circuits, data-transfer scheduling, distributed data scheduling, automated management and performance prediction of remote storage services (e.g., BlobSeer's Data Providers). Second, MonALISA will serve to introduce client monitoring, in order to ensure that the contract established with the provider is being respected. Regarding security in this context, the storage service has to be aware of the different types of clients and of their access rights. Based on configurable policies that can be implemented using MonALISA, BlobSeer will support different access patterns and enforce adaptive security rules. Moreover, the MonALISA monitoring framework can be used to monitor and to detect malicious behavior. In case of such events, MonALISA will alert the administrators or automatically apply pre-defined policies (e.g., blacklisting users and banning access for specific periods of time). Finally, the same mechanism can also be used to build a consistent reputation system that can further be used by the IaaS provider when scheduling storage resources to users.
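
As an illustration of the kind of pre-defined policy mentioned above (blacklisting a client and banning access for a given period), here is a minimal Python sketch. It is not based on the MonALISA or BlobSeer APIs: the class name BlacklistPolicy, its parameters and the notion of a "violation" event are assumptions made for this example only.

```python
from collections import defaultdict, deque


class BlacklistPolicy:
    """Hypothetical policy: ban a client that misbehaves too often.

    A client is banned for `ban_s` seconds once it accumulates
    `max_violations` violations within a sliding window of `window_s` seconds.
    """

    def __init__(self, max_violations, window_s, ban_s):
        self.max_violations = max_violations
        self.window_s = window_s
        self.ban_s = ban_s
        self._events = defaultdict(deque)   # client -> recent violation times
        self._banned_until = {}             # client -> end of the ban

    def report_violation(self, client, now):
        """Record one violation detected by the monitoring layer at time `now`."""
        events = self._events[client]
        events.append(now)
        # Forget violations that fell out of the sliding window.
        while events and now - events[0] > self.window_s:
            events.popleft()
        if len(events) >= self.max_violations:
            self._banned_until[client] = now + self.ban_s
            events.clear()

    def is_allowed(self, client, now):
        """Return True if the client may access the storage service at `now`."""
        return now >= self._banned_until.get(client, float("-inf"))
```

Timestamps are passed in explicitly so that the policy can be driven by monitored events and replayed deterministically; a real deployment would also have to decide what counts as a violation, which this sketch leaves open.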

Role of XtreemOS: Here, XtreemOS will be used as an internal cloud technology: the IaaS is XtreemOS and the secure environment where the application is running is the virtual machine itself. In its current version, XtreemOS internally relies on XtreemFS [24,25] for distributed data sharing. Our goal is to explore the possibility of using BlobSeer as an advanced, version-enabled, concurrency-optimized storage back-end, used by operating systems running inside VMs.

Direction 2: Using BlobSeer as a cost-effective storage service built on top of multiple IaaSes

Scenario: We consider an Infrastructure as a Service (IaaS) provider which exposes a specialized storage service to the client applications that run on the rented virtual machines, as presented in the scenario above. The storage service is a large-scale distributed application itself, which has to be able to efficiently handle massive data and heavy concurrent accesses. Therefore it needs to run on a large number of physical storage nodes, which can be rented from other, possibly multiple second-level IaaS providers. Each such second-level IaaS provider has its own pricing policies, charging clients for the number of hours they use the resources, for the amount of stored data or for the amount of network traffic generated, while offering different QoS levels. In this context, it is important to design a cost-effective scheduling policy for our first-level IaaS (in terms of money spent), for deciding how to provision storage space from the various second-level IaaS providers.

Role of BlobSeer: We will design for BlobSeer a cost-based scheduler for its storage manager in order to select one or more IaaSes that will provide the external, virtualized storage resources needed by BlobSeer. The goal is to minimize the costs of storing and transferring the data and to preserve agreed QoS levels. Clients benefit transparently from minimized storage costs, whereas BlobSeer seamlessly handles the dynamic migration of its virtualized storage hosts from one IaaS provider to another.
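
To make the idea of cost-based provider selection more concrete, the following Python sketch picks second-level providers greedily by unit cost while respecting a QoS bound. It is only an illustration under simplifying assumptions: the Provider fields, the linear pricing model (a price per stored GB plus a price per transferred GB), the latency bound as the only QoS criterion and the function name plan_storage are all hypothetical and are not part of BlobSeer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical description of a second-level IaaS provider."""
    name: str
    storage_price: float    # cost per GB stored
    transfer_price: float   # cost per GB transferred
    latency_ms: float       # observed latency, used here as the QoS metric
    capacity_gb: float      # storage that can be rented from this provider


def plan_storage(providers, demand_gb, transfer_per_stored_gb, max_latency_ms):
    """Return (plan, total_cost), where plan maps provider name -> GB rented.

    Providers violating the latency bound are discarded; the rest are filled
    in order of increasing unit cost until the demand is covered.
    """
    def unit_cost(p):
        return p.storage_price + transfer_per_stored_gb * p.transfer_price

    eligible = sorted(
        (p for p in providers if p.latency_ms <= max_latency_ms),
        key=unit_cost,
    )
    plan, total_cost, remaining = {}, 0.0, demand_gb
    for p in eligible:
        if remaining <= 0:
            break
        share = min(remaining, p.capacity_gb)
        plan[p.name] = share
        total_cost += share * unit_cost(p)
        remaining -= share
    if remaining > 0:
        raise ValueError("eligible providers cannot cover the requested storage")
    return plan, total_cost
```

With linear prices and divisible storage, filling the cheapest eligible providers first minimizes the total cost; richer pricing schemes (hourly charges, tiered prices, migration costs) would require a different formulation.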

Role of MonALISA: In this case, the MonALISA framework is an essential building block, necessary for building an efficient cost-aware BlobSeer provider manager. Its contribution is twofold: MonALISA will monitor both the data providers for QoS evaluations and the corresponding IaaS sites for pricing information. We rely on MonALISA's ability to collect and store data from a large number of nodes in near real time and quickly retrieve it on demand. MonALISA provides an abstract data API, thus enabling the user to define which is the relevant information that has to be collected, i.e., the data needed for selecting the best IaaS providers, both in terms of cost and of node capabilities (network latency, memory size, storage space). As the process of monitoring numerous nodes or services yields a large volume of raw data which have to be stored and interpreted (millions of published parameters with high update frequency rates), we will rely on MonALISA's advanced mechanisms such as dynamically-loadable filters to select and aggregate relevant information.
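
The selection and aggregation step described above can be pictured with a small Python sketch. It does not use MonALISA's filter API; the sample format (node, parameter, value) and the function name aggregate_samples are assumptions chosen for illustration.

```python
from collections import defaultdict


def aggregate_samples(samples, wanted_params):
    """Keep only the wanted parameters and average them per (node, parameter).

    `samples` is an iterable of (node, parameter, value) tuples, standing in
    for the raw stream of published monitoring parameters.
    """
    sums = defaultdict(lambda: [0.0, 0])   # (node, param) -> [total, count]
    for node, param, value in samples:
        if param not in wanted_params:
            continue                        # discard irrelevant raw data early
        acc = sums[(node, param)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Filtering before aggregating keeps the amount of state proportional to the number of relevant (node, parameter) pairs rather than to the volume of raw samples.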

Role of XtreemOS: The future direction for XtreemOS is to explore how its technology can help federate clouds. In this context, XtreemOS aims to offer a unified cloud image, while being set up on top of several cloud infrastructures. An important aspect here is data sharing at virtual organization level, a task currently fulfilled by XtreemFS. However, as XtreemFS was designed in the context of grid computing, it does not include cost-effective resource management for the case where the resources are provisioned from clouds. We will investigate how BlobSeer can implement cost-effective storage on top of multiple, external IaaS providers.

Direction 3: Using BlobSeer for VM management to build a highly-available IaaS

Scenario: IaaS providers rely on virtualization techniques to offer resources to clients. Clients are typically allowed to upload a virtual-machine image to the system, so that they can use an environment compatible with their applications. This image is then executed on each computing element rented to the client. In such a context, BlobSeer can help provide a highly available service, as it can serve as a storage system for checkpointing images of the virtual machines. The idea is simple: instead of checkpointing virtual-machine instances locally on the computing elements, XtreemOS stores them as binary large objects (BLOBs) within BlobSeer.

Role of BlobSeer: The scenario described above is a perfect application for BlobSeer, which natively provides versioning support for all objects it stores. A new (incremental) version of a BLOB is created each time a write operation is performed on it: this feature can efficiently be used for incremental checkpointing of virtual machines. Moreover, since BlobSeer data (and thus the virtual machines) are globally accessible to the system, various management operations such as migration can be easily implemented on top of it.
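
The principle that each write produces a new version while earlier versions stay readable can be sketched in a few lines of Python. This is a toy, in-memory model written for this document, not BlobSeer's interface or its metadata scheme: the class VersionedBlob, its chunk-granularity writes and its method names are all hypothetical.

```python
class VersionedBlob:
    """Toy versioned BLOB: every write creates a new immutable snapshot.

    A snapshot is a tuple of chunk references, so chunks that were not
    rewritten are shared between versions instead of being copied.
    """

    def __init__(self):
        self._versions = [()]   # version 0 is the empty BLOB

    def write_chunk(self, index, data):
        """Overwrite or append one chunk; return the new version number."""
        chunks = list(self._versions[-1])
        if index > len(chunks):
            raise IndexError("chunks must be written contiguously")
        if index == len(chunks):
            chunks.append(data)
        else:
            chunks[index] = data
        self._versions.append(tuple(chunks))
        return len(self._versions) - 1

    def read(self, version):
        """Return the full content of the BLOB as of `version`."""
        return b"".join(self._versions[version])

    def shared_chunks(self, v1, v2):
        """Count chunk positions at which both versions reference the same object."""
        return sum(a is b for a, b in zip(self._versions[v1], self._versions[v2]))
```

In the checkpointing scenario, a VM image would correspond to one such object and each checkpoint to a new version, so that rolling back amounts to reading an older version.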

Role of MonALISA: For this scenario, we will rely on MonALISA's extensible monitoring modules, which make it possible to monitor parameters specific to BlobSeer (e.g., BLOB IDs, BLOB sizes, data provider characteristics), to the IaaS or to the VMs. Another valuable property in this context is MonALISA's capability to monitor a large number of heterogeneous nodes with different response times, and at the same time to handle monitored units which are down or not responding, without affecting the other measurements.
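
The property of tolerating slow or unresponsive nodes without disturbing the other measurements can be illustrated with the following Python sketch, which polls all nodes in parallel under a common deadline. It is a generic illustration using only the Python standard library, not MonALISA code; the function poll_nodes and the three example probes are hypothetical.

```python
import concurrent.futures as cf
import time


def poll_nodes(probes, timeout_s):
    """Run every probe in parallel and keep whatever answers in time.

    `probes` maps a node name to a zero-argument callable returning a
    measurement. Nodes that fail or miss the deadline are reported as None.
    """
    results = {node: None for node in probes}
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(probes)))
    futures = {pool.submit(probe): node for node, probe in probes.items()}
    done, _late = cf.wait(futures, timeout=timeout_s)
    for future in done:
        if future.exception() is None:
            results[futures[future]] = future.result()
    # Do not block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False)
    return results


def fast_probe():
    return 1.0


def slow_probe():
    time.sleep(1.0)   # simulates a node that answers after the deadline
    return 2.0


def down_probe():
    raise ConnectionError("node unreachable")


readings = poll_nodes(
    {"fast": fast_probe, "slow": slow_probe, "down": down_probe},
    timeout_s=0.3,
)
print(readings)
```

Because each probe runs in its own worker, a node that is down or slow only affects its own entry in the result.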

Role of XtreemOS: XtreemOS aims to be available as an IaaS cloud, offering clients the possibility to rent virtualized resources. In order to reach this goal, virtual-machine management and checkpointing are crucial aspects that need to be further investigated. In the current version of XtreemOS, support for managing virtual machines is not yet integrated. Integrating BlobSeer as a highly available storage back-end for virtual machines will help XtreemOS make progress in this direction.

2. Partners presentation

This associate team is built by leveraging the strong, specific, and complementary expertise brought by each of the three partners:

Romanian partner: the National Center for Information Technology (NCIT) from the Politehnica University of Bucharest (PUB)

Politehnica University of Bucharest (PUB) is the largest technical university in Romania (26,000 students, among which 1,500 in the Computer Science Department). The National Center for Information Technology (NCIT, http://csite.cs.pub.ro/ncit) is part of the Computer Science Department of PUB. The Center is dedicated to advanced and inter-disciplinary research. It includes several research and teaching laboratories in the fields of High-Performance Computing, Distributed Systems and Applications, E-Business and E-Government, Artificial Intelligence, and Computer Networks. The Distributed Systems and Grids team (http://csite.cs.pub.ro/ds_team) is part of the NCIT and its research is directed towards large-scale distributed systems middleware and applications. The focus is on distributed system monitoring and control, evaluation of distributed systems using modeling and simulation, resource management, meta-scheduling, web service-based and workflow-based scheduling. The team is actively involved in multiple international, European and national collaborative projects in the area of distributed computing. Below is a brief description of a few selected projects of PUB relevant for this proposal. They illustrate the high expertise acquired by the PUB Team in the specific area of large-scale distributed monitoring, a significant success factor for the proposed associate team.

First French partner: the KerData Team from INRIA Rennes - Bretagne Atlantique

The KerData Team (http://www.irisa.fr/kerdata/) was created on July 1, 2009 as a joint research team of INRIA Rennes - Bretagne Atlantique and École Normale Supérieure de Cachan - Antenne de Bretagne. It is strongly engaged in the process of becoming an INRIA project-team. KerData was created by Luc Bougé, Professor at ENS Cachan - Antenne de Bretagne (team leader) and Gabriel Antoniu, Research Scientist at INRIA, both former members of the PARIS Project-Team. KerData focuses on cloud storage for very large distributed data. It addresses the challenges raised by today's data-oriented high-performance applications that exhibit the need to handle massive, non-structured data - BLOBs: binary large objects (on the order of terabytes) - stored on a large number of nodes (thousands to tens of thousands), accessed under heavy concurrency by a large number of clients (thousands to tens of thousands at a time) with a relatively fine access grain (on the order of megabytes). These challenges are investigated through the design, implementation and experimental validation of a generic data-sharing platform called BlobSeer. Among the applications targeted by the KerData Team, one class is particularly relevant for the work proposed for this project: service-based distributed applications executed in cloud environments.

Background: large-scale distributed data management. The main contribution of Gabriel Antoniu, Luc Bougé and their students during the last 6 years (mainly while they were members of the PARIS Project-Team) was to propose the concept of a grid data-sharing service whose goal is to provide a transparent data access model at grid scale. The service provides grid applications with the abstraction of a globally shared memory, in which data can be easily stored and accessed through global identifiers. This concept has been illustrated through an architecture called JuxMem (http://juxmem.gforge.inria.fr/) that leverages results from several fields: consistency protocols inspired by DSM systems; P2P discovery and data exchange with good scalability and volatility support; algorithms for fault-tolerant distributed systems for dynamic group management in volatile environments. This work was the subject of 3 Ph.D. theses: Mathieu Jan and Sébastien Monnet (defended in 2006) and Loïc Cudennec (defended in 2009). It has been validated at several levels.

This work led to collaborations with several partners.

Second French partner: the PARIS Project-Team from INRIA Rennes - Bretagne Atlantique

The PARIS Project-Team (http://www.irisa.fr/paris/web/) from the INRIA Rennes - Bretagne Atlantique research center aims at contributing to the programming of large-scale, parallel and distributed systems. It investigates new approaches to build software mechanisms that hide the complexity of programming computing infrastructures that are both parallel and distributed. Its contribution to the field can thus be summarized as follows: combining parallel and distributed processing whilst preserving performance and transparency. The PARIS Project-Team has carried out research activities on the design and implementation of Grid-aware operating systems. It has designed and implemented Vigne [34,36,37], a system for large-scale dynamic Grids. Since June 2006, Christine Morin has been the scientific coordinator of the XtreemOS European Integrated Project. The objective of the XtreemOS Project [20] is to design, implement and promote a Linux-based Grid operating system providing native virtual organization support. The research activities of the PARIS Project-Team in XtreemOS are focused on the design and implementation of a fault-tolerance service offering transparent checkpointing to Grid applications [31,32], on the design of virtual organization and security services [19,22,35], on the design and implementation of system services to manage virtualized infrastructures and on the design and implementation of LinuxSSI, leveraging the Kerrighed SSI operating system for the cluster flavor of XtreemOS. The PARIS Project-Team has been involved in the Grid'5000 project since its beginning in 2003 (http://www.grid5000.fr/). Grid'5000 is an infrastructure distributed across 9 sites around France, for research in large-scale parallel and distributed systems.

PUB Team: permanent staff involved

Prof. Valentin Cristea (coordinator for PUB) is the Head of the Computer Science and Engineering Department of Politehnica University of Bucharest. His main fields of expertise are Distributed Systems, Grid Computing and E-Services. He is the Director of the National Center for Information Technology, within which he leads the CoLaborator, Distributed Systems and Grid, and e-Business/e-Government laboratories. He has long experience in the development, management and/or coordination of international and national research projects. He collaborates with Prof. Harvey Newman (from Caltech), Iosif Legrand (from CERN) and Prof. Nicolae Tapus (from PUB-NCIT) to define RoDiCA - Romanian Distributed Collaborative Architectures, which led to the development of MonALISA, MONARC2 and other projects. He also collaborates with the University of Wisconsin, USA (NetPy project) and with Rutgers University, USA (VNSim, Prof. Liviu Iftode). He co-supervised the PUB Team in SEE-GRID-SCI (Contract FP7 nr. 211338), EGEE III (Contract FP7), EU-NCIT (Contract INCO-CT-2005-017101), COOPER (Contract no. 027073, FP6), CoLaborator (World Bank and CNCSIS Contract Nr. 26389/2000), and others. In 2003 he received the IBM award for excellence. He is the Romanian coordinator of the Master program on Parallel and Distributed Computer Systems co-developed with Free University of Amsterdam. He has been a visiting professor in European and US Universities, such as: Free University of Berlin (Germany), Oulu University (Finland), Free University of Amsterdam (The Netherlands), Politecnico di Torino (Italy) and Rutgers University (USA). Detailed CV.

Prof. Nicolae Tapus is the vice-rector of the Politehnica University of Bucharest. His main fields of expertise are Distributed Systems, Local Area Networks, Computer Architecture and Grid Computing. He is also a member of the NCIT board. He has long experience in the development, management and/or coordination of national and international research projects. He actively collaborates with IT companies (CISCO, Microsoft, HP) and participates in the elaboration of the strategy for research development in ICT (including Grid development) in Romania, as a member of the government's Experts Council on these matters. He serves as a coordinator of IEEE Computer Society Chapters for IEEE Region 8 (Europe, Middle East and Africa). He co-supervised the PUB involvement in several international projects: EGEE III (Contract FP7), P2P-Next (Contract FP7), SENSEI (Contract FP7), EU-NCIT (Contract INCO-CT-2005-017101), SEE-GRID-SCI (Contract FP7 nr. 211338), CoLaborator (World Bank and CNCSIS Contract Nr. 26389/2000) and others. He was a visiting professor in European and US Universities, such as: Grenoble (France), Free University of Amsterdam (The Netherlands), Politecnico di Torino (Italy) and Maryland University (USA). He is a member of the IEEE and ACM professional organizations, Chair of the Romania Computer Society Chapter, and Director of the ACM International Collegiate Programming Contest for South-Eastern Europe. He manages the organization of yearly ACM programming contests in Romania, within the PUB. He is a member of the New York Academy of Sciences (1991) and of the Romanian Technical Science Academy (2004). Detailed CV.

Florin Pop is an assistant professor in the Computer Science and Engineering Department of the Politehnica University of Bucharest. His research interests include scheduling in Grid environments (his Ph.D. research), distributed systems, parallel computation, communication protocols and numerical methods. He received his Ph.D. in Computer Science in 2008 with “Magna cum laude” distinction. He is a member of the RoGrid consortium and participates in several research projects in these domains, in collaboration with other universities and research centers in Romania and abroad, as a developer in national projects (CNCSIS, GridMOSI, MedioGRID) and international projects (EGEE, SEE-GRID, EU-NCIT). He received an IBM Ph.D. Assistantship in 2006 (ranked 1st out of 17 awarded students) and a Ph.D. Excellency Grant from Oracle in 2006-2008. Detailed CV.

Ciprian Dobre, currently a post-doc researcher, received his Ph.D. in Computer Science at the Politehnica University of Bucharest in 2008. His main research interests are Grid Computing, Monitoring and Control of Distributed Systems, Modeling and Simulation, Advanced Networking Architectures, Parallel and Distributed Algorithms. He is a member of the RoGrid consortium and is involved in a number of national projects (CNCSIS, GridMOSI, MedioGRID, PEGAF) and international projects (MonALISA, MONARC, VINCI, VNSim, EGEE, SEE-GRID, EU-NCIT). His research activities were recognized with the Innovations in Networking Award for Experimental Applications in 2008 by the Corporation for Education Network Initiatives in California (CENIC). Detailed CV.

PUB Team: Ph.D. and Master students involved

Alexandru Costan is a Ph.D. student and Teaching Assistant at the Computer Science department of the Politehnica University of Bucharest. His research interests include: Grid Computing, Data Storage and Modeling, P2P systems. He is actively involved in several national and international research projects related to these domains, among which MonALISA, MedioGRID, EGEE, P2P-NEXT and BlobSeer are worth mentioning. His Ph.D. thesis focuses on Data Storage, Representation and Interpretation in Grid Environments. He received a Ph.D. Excellency Grant from Oracle in 2006-2009 and was awarded an IBM Ph.D. Fellowship in 2009. Detailed CV.

Eliana-Dina Tîrşa is a Ph.D. student in Computer Science at the Politehnica University of Bucharest. In July-September 2009 she worked as an INRIA intern on Dynamic Provisioning of Resources from Clouds in the XtreemOS System. Her research interests are Fault Tolerance, Monitoring and Virtualization in Distributed Systems, Cloud Computing and Peer-to-Peer Systems. She participates in several national (PEGAF, Depsys) and international (MonALISA, EU-NCIT, P2P-Next) research projects. Detailed CV.

Catalin Leordeanu is a Ph.D. student in the Computer Science Department at the Politehnica University of Bucharest. He received his Master's degree in Computer Science in 2009. His research interests include Security of Distributed Systems, Intrusion Detection and Dependability of Large-Scale Distributed Systems. He also works on numerous national and international projects on these subjects. Detailed CV.

Mugurel Ionut Andreica is a Ph.D. student in the Computer Science Department at the Politehnica University of Bucharest. His research interests include theoretical and practical aspects of Communication Optimization in Distributed Systems, Peer-to-Peer Systems, Multi-core Programming, Distributed Data Storage, Sequential and Distributed Algorithms and Data Structures. He has participated in several EU and national research projects focused on the previously mentioned topics. He was awarded a Ph.D. research scholarship from Oracle and a Ph.D. Fellowship from IBM. Detailed CV.

KerData Team: permanent staff involved

Gabriel Antoniu (coordinator for KerData) is a Research Scientist at INRIA Rennes - Bretagne Atlantique (CR1) and a member of the KerData research team. His research interests include: grid and cloud distributed storage, large-scale distributed data management and sharing, data consistency models and protocols, grid and peer-to-peer systems. He coordinates the involvement of INRIA Rennes - Bretagne Atlantique in the SCALUS project of the Marie-Curie Initial Training Networks programme (ITN), call FP7-PEOPLE-ITN-2008 (2009-2012) and in the CoreGRID ERCIM Working Group. He has been involved in several other international, European and national collaborative projects in these fields, including the CoreGRID European Network of Excellence. Gabriel Antoniu received his Bachelor of Engineering degree from INSA Lyon in 1997, his Master's degree in Computer Science from ENS Lyon in 1998, his Ph.D. degree in Computer Science from ENS Lyon in 2001, and his Habilitation for Research Supervision (HDR) from ENS Cachan in 2009. Detailed CV.

Luc Bougé, Professor, is the Chair of the Informatics and Telecommunication Department (DIT) at ENS Cachan - Antenne de Bretagne. He is also the leader of the KerData Joint Team of INRIA Rennes - Bretagne Atlantique and ENS Cachan - Antenne de Bretagne. His research interests include the design and semantics of parallel programming languages and the management of data in very large distributed systems such as grids, clouds and peer-to-peer (P2P) networks. Detailed CV.

KerData Team: Ph.D. and Master students involved

Bogdan Nicolae is a Ph.D. student at the University of Rennes 1, France, working in the KerData Team at INRIA Rennes - Bretagne Atlantique. His research interests include: parallel and distributed computing, cloud computing, large-scale distributed data-storage solutions, versioning, transactional concurrency control. His main focus is BlobSeer, a storage service for data-intensive distributed applications designed to sustain a high data throughput under heavy access concurrency. He is also actively involved in research activities related to several other projects: LEGO, Gfarm, Hadoop, Grid'5000. His Ph.D. thesis is funded by the French Ministry of Education (2007-2010). Detailed CV.

Alexandra Carpen-Amarie is a Ph.D. student in Computer Science at ENS Cachan, working in the KerData Team at INRIA Rennes - Bretagne Atlantique. Her research interests include large-scale distributed systems, distributed data storage, cloud computing, monitoring in distributed systems. She is working on the design of an introspection layer for the BlobSeer data storage system, a first step towards an autonomic steering for this system. This layer is being implemented as an extension of the MonALISA monitoring framework and evaluated on the Grid'5000 testbed. This work is being carried out in collaboration with Alexandru Costan, Ph.D. student in the PUB Team and member of the MonALISA project (see above). Her Ph.D. thesis is funded through a grant from the INRIA CORDIS program (2008-2011). Detailed CV.

Diana Moise is a Ph.D. student in Computer Science at ENS Cachan, working in the KerData Team at INRIA Rennes - Bretagne Atlantique. Her research interests comprise parallel and distributed computing, distributed file systems, distributed data storage, data-intensive applications, the Map/Reduce paradigm. Her work has so far focused on two projects: developing a file system interface on top of the BlobSeer data storage service; integrating BlobSeer with the Hadoop framework by replacing the default storage layer, the Hadoop File System (HDFS), with BlobSeer in order to improve data throughput and add functionalities. Her Ph.D. thesis is funded through a joint grant of the Brittany Region and of INRIA (2008-2011). Detailed CV.

Viet-Trung Tran is a Ph.D. student in Computer Science at ENS Cachan, working in the KerData Team at INRIA Rennes - Bretagne Atlantique. His research interests are parallel and distributed computing, distributed file systems, high-performance computing and distributed data storage. His recent Master’s thesis focused on efficient use of BlobSeer, a large-scale data-management platform, as an underlying substrate for grid file systems. His Ph.D. thesis is funded by the French Ministry of Education (2009-2012). Detailed CV.

PARIS Team: permanent staff involved

The following staff from the PARIS Project-Team will be involved in this associate team:

Christine Morin (coordinator for PARIS) is a senior researcher at INRIA Rennes - Bretagne Atlantique. Since July 2009, she has been the scientific leader of the INRIA PARIS Project-Team. She has been leading research activities on single system image operating systems for high-performance computing in clusters, resulting in the Kerrighed cluster OS, now developed as open source. She is the scientific coordinator of the XtreemOS Project, a 4-year European integrated project started in June 2006. She is a co-founder of the Kerlabs start-up, created in 2006 to exploit Kerrighed technology. Her research interests are in operating systems, distributed systems, fault tolerance, cluster, grid and cloud computing. Detailed CV.

Yvon Jégou received his engineering degree from Institut National des Sciences Appliquées (INSA) of Rennes, France, and his Ph.D. degree from the University of Rennes in 1979. He is a full-time INRIA researcher in the PARIS Project-Team. His research activities are focused on architecture, operating systems and compilation techniques for parallel and distributed computing. His current work is focused on the development of a distributed shared memory (DSM) for the implementation of runtime systems on large clusters and for the management of data repositories on the Grid. In the recent past, he participated in the IST POP project on the implementation of an OpenMP runtime system for clusters using DSM. He is currently involved in the XtreemOS IP Project. He is also involved in the Grid'5000 Project and serves as the leader of the Grid'5000 Team at INRIA Rennes - Bretagne Atlantique.

PARIS Team: Ph.D. and Master students involved

Background of the collaboration between the teams

This collaboration started 2 years ago, after 2 visits by Gabriel Antoniu to PUB. We then set up a bilateral project co-funded by the CNRS and the Romanian Academy of Science. Since then, our joint activities have intensified, especially thanks to the strong involvement of French and Romanian Ph.D. and Master students through long internships and short mutual visits. Note that 3 Ph.D. students who are now members of INRIA's KerData Team previously graduated as engineers at PUB, in Bucharest: this substantially improves the efficiency of our contacts. Most of the Ph.D. students mentioned above will be involved in the associate team. A summary of the main stages of this collaboration is given below.

To get more details on the steps of the collaboration: read more...

Relevant joint papers

The collaboration established between PUB and the KerData and PARIS Teams at INRIA Rennes - Bretagne Atlantique began to be productive in terms of co-authored research papers, especially thanks to the presence of 5 PUB interns (4 Ph.D. students, 1 Master student, see above) hosted by the two INRIA teams in 2009. Note that most of these results are very recent: they have been (or are in the process of being) published as INRIA Research reports. Some of them have already been submitted for publication to international conferences and workshops.

Other joint papers (less relevant to the focus of this project) include:

Links

PUB Team:

KerData Team:

PARIS Team:

3. Impact

The goal of this associate team is to set up a large-scale distributed data management system for cloud platforms, with self-reconfiguring, self-managing, self-adapting, self-healing autonomous capabilities, coping with the constraints of multiple virtual organizations regarding administration, authentication and security restrictions.

To the best of our knowledge, no such system exists yet, though various projects have addressed parts of these requirements. In the case of this project, we believe we can address all these requirements at once in an integrated approach, by leveraging the strong, specific and complementary expertise brought by each of the 3 partners:

What is the expected impact on the domain?

Such a data management system is eagerly awaited worldwide, and several research groups are competing to provide one. It is of course expected by end-users with high storage needs, but, more importantly, by the managers of the first cloud computing platforms which have recently been released and opened for public use. For instance, the Amazon Elastic Compute Cloud (EC2) is a service which provides resizable compute capacity in the cloud through a web service interface. It is designed to make web-scale computing easier for developers, but it only provides basic services regarding data management through the Simple Storage Service (S3). Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. Each object is stored in a bucket and may contain from 1 byte to 5 gigabytes of data. Each stored object is retrieved via a unique, developer-assigned key. Only basic authentication mechanisms are provided: objects can be made private or public and rights can be granted to specific users, but no other high-level mechanism is available.

Our goal is to leverage our combined skills and experience to do better than today's basic cloud storage services, such as S3:

What is the expected impact on the partners?

KerData is a young team, with only two permanent members at this time, launched with a specific focus: providing storage for very large distributed data in grids and clouds. The emphasis has been on building a prototype for managing massive data at large scales (BlobSeer), to validate our initial ideas. Of course, rebuilding everything from scratch is out of the question: reusing available technology as much as possible has been a priority. For instance, the whole system is based on the Boost C++ libraries. Regarding monitoring, being able to reuse the proven MonALISA technology will save a lot of time, as it directly provides mechanisms that enable self-* behavior in BlobSeer. Also, PUB has been a major source of brilliant Ph.D. students for us in prior years: this substantially helped the KerData Team to be set up in a very efficient and productive way.

The PUB Group has been developing MonALISA for a number of years in close collaboration with Caltech and CERN. This collaboration has mainly targeted the monitoring of very large distributed systems spread at continental level. In contrast, the collaboration of PUB with KerData and PARIS opens new fields of application for MonALISA, whose integration into other systems makes them aware of their own behavior so that they can take appropriate self-* actions. These new fields bring about very interesting research questions. For instance, the flexibility and modularity of the MonALISA design is of utmost importance in order to let it manage a very wide spectrum of parameters.

The PARIS Project-Team has been working for many years in the field of operating systems for clusters and now for grids, with the objective of presenting the user with the illusion of a single computer (SSI, Single System Image), whatever the number of machines or virtual organizations. Highly sophisticated mechanisms have been implemented at the kernel level for this purpose. In contrast, KerData and PUB have always been working at the user level, with as little dependency on the (Linux) kernel as possible. Collaborating with these groups will open new perspectives for the PARIS Project-Team. Also, it will bring them the sophisticated MonALISA introspection technology, enabling their operating system to become not only fault-tolerant, but fully self-healing. Like the KerData Team, PARIS has benefited from a flow of brilliant interns coming from PUB.

The detailed history of our prior contacts (see above) makes it clear that all three teams have the objective interest, the scientific capacity and the strategic means to collaborate through this project. This is a major win-win project for all of us.

What is the expected impact on our respective institutions?

On a higher level, there is a growing interest of PUB in further enhancing the scientific collaboration with INRIA.

So far, the collaboration has developed in the framework of the GridDataViz Project supported by the French CNRS and the Romanian Academy of Science. This project focused on Grid monitoring and data management in Grid environments. This bilateral project facilitated closer contact between members of the two institutions and a broadening of the collaboration. For instance, the application of 3 former PUB students for Ph.D. theses under the supervision of Luc Bougé and Gabriel Antoniu was a remarkable outcome of these contacts.

This fruitful experience is a major incentive to launch this new cooperation project. The emerging cloud computing environments provide a new context for all partners and bring about an opportunity to federate our research efforts. The setup of an INRIA Associate Team will contribute to the extension of the scientific exchanges, and it will also encourage more Master and Ph.D. students from PUB to apply for internships and other academic programs at Rennes and to be involved in this project.

In the other direction, the involvement of INRIA in projects with Romania in the broad field of grid and cloud computing has been very limited. Among the 80 Associate Teams currently funded by INRIA, only 3 have partners from Eastern Europe (2 with Russia, 1 with Ukraine) and none has a Romanian partner! Also, the contacts between INRIA and Romanian teams involved in the deployment of a national grid in Romania (the RoGrid Project) are rather recent: in Rennes we have set up bilateral cooperations with PUB and the Technical University of Cluj-Napoca. This global situation is rather unsatisfactory, as this huge potential for collaboration is clearly underused. There have been numerous contacts in the past with Romanian partners, in the field of applied mathematics and related fields of Informatics (operational research, databases, etc.). It is clear that, in these fields, Romanian collaborations have been quite beneficial for INRIA. This associate team project can thus be a very good step towards correcting this imbalance, at least in the area of distributed computing. We can expect the synergy created through such a team to lead to the submission of joint proposals for European projects in the future.

4. Miscellaneous

PUB Team: relevant publications

Ciprian Dobre, Florin Pop, Valentin Cristea. "Simulation Framework for the Evaluation of Dependable Distributed Systems". In Scalable Computing: Practice and Experience, Scientific International Journal for Parallel and Distributed Computing (SCPE), Volume 10, Number 1, pp. 13-23. http://www.scpe.org, 2009.

Eliana-Dina Tîrsa, Mugurel Ionut Andreica, Alexandru Costan. "Data Replication Techniques with Applications to the MonAlisa Distributed Monitoring System". In Proceedings of the IEEE International Conference on "Computer as a Tool" (EUROCON), pp. 339-346, Sankt-Petersburg, Russia, 18-23 May, 2009. 

Florin Pop, Ciprian Dobre, Corina Stratan, Alexandru Costan, Valentin Cristea. "Dynamic Meta-Scheduling Architecture based on Monitoring in Distributed Systems". In Proceedings of The Third International Conference on Complex, Intelligent and Software Intensive System, Third International Workshop on P2P, Parallel, Grid and Internet computing - 3PGIC-2009 (CISIS'09), March 16-19, 2009, Fukuoka, Japan, Published by IEEE Computer Society.

Alexandru Costan, Ciprian Dobre, Ramiro Voicu, Valentin Cristea. "A Monitoring Architecture for High-Speed Networks in Large-Scale Distributed Collaborations". In the 7th IEEE International Symposium on Parallel and Distributed Computing, ISPDC 2008, July 1-5 2008 Krakow, Poland.

Florin Pop, Alexandru Costan, Ciprian Dobre, Corina Stratan, Valentin Cristea. "Monitoring of Complex Applications Execution in Distributed Dependable Systems". In the 8th International Symposium on Parallel and Distributed Computing, ISPDC 2009, July 1-3 2009 Lisbon, Portugal.

Florin Pop, Ciprian Dobre, Valentin Cristea. "Evaluation of Multi-Objective Decentralized Scheduling for Applications in Grid Environment". In Proceedings of 2008 IEEE 4th International Conference on Intelligent Computer Communication and Processing, pp. 231-238, August 28-30, 2008, Cluj-Napoca, Romania, Published by IEEE Computer Society.

KerData Team: relevant publications

Bogdan Nicolae, Gabriel Antoniu, and Luc Bougé. "Enabling high data throughput in desktop grids through decentralized data and metadata management: The BlobSeer approach". In Proc. 15th International Euro-Par Conference on Parallel Processing (Euro-Par ’09), volume 5704 of Lect. Notes in Comp. Science, pages 404–416, Delft, The Netherlands, August 2009. Springer-Verlag.

Viet Trung Tran, Gabriel Antoniu, Bogdan Nicolae, Luc Bougé and Osamu Tatebe. "Towards a Grid File System Based on a Large-Scale  BLOB Management Service". In Proc. CoreGRID ERCIM Working Group Workshop on Grids, P2P and Service computing, held in conjunction with Euro-Par 2009, Delft, The Netherlands, August 2009.

Bogdan Nicolae, Gabriel Antoniu, and Luc Bougé. "BlobSeer: How to enable efficient versioning for large object storage under heavy access concurrency". In Proc. 2nd Workshop on Data Management in Peer-to-Peer Systems (DAMAP’2009), Saint Petersburg, Russia, March 2009. Held in conjunction with EDBT’2009.

Bogdan Nicolae, Gabriel Antoniu, Luc Bougé. "Enabling lock-free concurrent fine-grain access to massive distributed data: Application to supernovae detection". In Proc. International Conference CLUSTER 2008, pages 310-315, Tsukuba, Japan, September 2008.

Gabriel Antoniu, Jean-François Deverge, Sébastien Monnet. "How to bring together fault tolerance and data consistency to enable Grid data sharing". In Concurrency and Computation: Practice and Experience 18(13): 1705-1723 (2006).

PARIS Team: relevant publications

John Mehnert-Spahn, Thomas Ropars, Michael Schoettner, and Christine Morin. "The architecture of the XtreemOS grid checkpointing service". In Proc. of EuroPar 2009, LNCS, Delft, The Netherlands, August 2009. Springer.

Massimo Coppola, Yvon Jégou, Brian Matthews, Christine Morin, Luis Pablo Prieto, Oscar David Sanchez, Erica Yang, and Haiyan Yu. "Virtual organization support within a grid-wide operating system". In IEEE Internet Computing, 12(2):20-28, March 2008.

Christine Morin, Jérôme Gallard, Yvon Jégou, and Pierre Riteau. "Clouds: a new playground for the XtreemOS grid operating system". In Parallel Processing Letters, 2009. To appear.

Christine Morin. "XtreemOS: A grid operating system making your computer ready for participating in virtual organizations". In ISORC'07: Proceedings of the 10th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing, pages 393-402, Santorini Island, Greece, May 2007. IEEE Computer Society.

Emmanuel Jeanvoine, Louis Rilling, Christine Morin, and Daniel Leprince. "Using overlay networks to build operating system services for large-scale grids". In Proceedings of the 5th International Symposium on Parallel and Distributed Computing (ISPDC 2006), pages 191-198, Timisoara, Romania, July 2006.


II. PREVISIONS 2010
/ 2010 Forecast

Programme de travail
Work programme

Description du programme scientifique de travail (1 à 2 pages maximum)
/Description of the scientific work programme (maximum 1 to 2 pages)

Methodology used for task definition. Our work in 2010 will focus on Direction 1: Using BlobSeer for sharing application data in an IaaS. We have identified three main tasks, described below. The successful completion of these tasks relies on the substantial involvement of Ph.D. students from all partner teams. Therefore, we defined these tasks by first identifying the Ph.D. students involved, the object of their joint activities and the schedule of their visits. Note that the expected outcome of these tasks will also partly serve (from a technical point of view) the other two directions of the project, planned for the following years.

Task 1: Introducing self-adaptation in BlobSeer based on MonALISA

Ph.D. students involved: Alexandru Costan (PUB), Alexandra Carpen-Amarie (KerData)

Goals. The final goal of this project is to enable autonomic storage for cloud services. As a first milestone, we aim to introduce self-management and self-adaptation facilities in BlobSeer. A preliminary introspection layer has already been jointly defined this year by the PUB and the KerData teams [20]. Based on advanced introspection mechanisms that we will build within this framework using MonALISA, we will target several features: an automatic management of the replication degree used by the storage (data) providers, automatic load balancing through data migration from overloaded to underloaded data providers, removal of providers with poor communication links or poor performance, along with automatic replacement of failed data providers.
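As an illustration only, the targeted features could be driven by a decision step of the following kind. This is a minimal sketch, not the BlobSeer or MonALISA API: the `ProviderStats` record, the `plan_actions` function, the metric names and all thresholds are hypothetical, and the management of the replication degree is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ProviderStats:
    """Hypothetical snapshot of one data provider, as reported by monitoring."""
    name: str
    alive: bool
    load: float        # fraction of capacity in use, between 0.0 and 1.0
    latency_ms: float  # observed communication latency


def plan_actions(providers, high_load=0.8, low_load=0.3, max_latency_ms=200.0):
    """Derive adaptation actions from monitoring data (thresholds are assumptions)."""
    actions = []
    healthy = []
    for p in providers:
        if not p.alive:
            # Failed provider: schedule an automatic replacement.
            actions.append(("replace", p.name))
        elif p.latency_ms > max_latency_ms:
            # Poor communication link: remove the provider from the pool.
            actions.append(("remove", p.name))
        else:
            healthy.append(p)

    # Pair the most loaded providers with the least loaded ones for migration.
    overloaded = sorted((p for p in healthy if p.load > high_load),
                        key=lambda p: -p.load)
    underloaded = sorted((p for p in healthy if p.load < low_load),
                         key=lambda p: p.load)
    for src, dst in zip(overloaded, underloaded):
        actions.append(("migrate", src.name, dst.name))
    return actions
```

In such a design, the monitoring layer would only supply the snapshots, while the decision step stays a pure function that can be tested without a running deployment.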

Main challenges and difficulties. Integration of BlobSeer with MonALISA has proven to be nontrivial, as demonstrated by our preliminary work carried out in 2009. Intrusiveness, fault tolerance and scaling are key issues: the monitoring system should seamlessly fulfill its function with a very large number of nodes, even in the presence of failures. A serious difficulty comes from the need to simultaneously address multiple properties, some of which are difficult to reconcile: the ability for self-protection, failure recovery, self-reconfiguration in response to changes in the environment, while maintaining near-optimal performance. Existing approaches to these problems typically assume the existence of a performance model that allows optimizations or predictions of the observed behavior. However, creating performance models is inherently difficult and requires knowledge about the application environment. In addition, we are also interested in extending the adaptation strategies to support opportunistic process migrations within the cloud. This, however, requires the development and deployment of new MonALISA modules for dynamic monitoring of a wide range of parameters.

Human resources and organization. Alexandru Costan, Ph.D. student at PUB, is one of the main contributors to the MonALISA distributed monitoring framework. In June 2009, he visited the KerData team and worked together with Alexandra Carpen-Amarie, Ph.D. student in the KerData team, on the design and implementation of an introspection layer for BlobSeer, based on MonALISA. This work was a perfect validation for the monitoring mechanisms he developed during his Ph.D. thesis. In 2010, Alexandru will visit INRIA again and will work on the design of an upper layer using this introspection layer: the self-adaptation engine. A Master student (whom we hope to recruit through INRIA's Internships program) will also contribute to this task.

Task 2: Security and client monitoring

Ph.D. students involved: Catalin Leordeanu (PUB), Alexandra Carpen-Amarie (KerData), Diana Moise (KerData), Sylvain Jeuland (PARIS).

Goals. The following situations can be detected through the analysis of the stored user activity logs: users breaking existing policies, abnormal client activity or incorrect client requests. The restrictions of the provider must be enforced, so all attempts to break them must be detected. These restrictions can take various forms, for example allowing each client to use only certain resources or restricting the bandwidth in certain time periods. Through strict monitoring of client activity, the system can detect when the actions of a client fall outside these restrictions, and can then restrict the actions of that user or temporarily suspend their access rights. Through the same analysis we can determine what falls into the category of normal client activity, thus detecting unexpected events, such as a sudden increase in the number of requests. Such suspicious activity can lead to the detection of users who may have been compromised by an external attacker. In this case the attacker may have taken control of the client and may be using it to access unauthorised data or attempting to affect the system. A compromised client may also try to damage the system by using large numbers of malformed or incomplete requests, as a form of Denial of Service attack by an external intruder masquerading as one of the clients. Advanced mechanisms must be developed to quickly detect such dangerous activity and isolate the client which may be a security risk. All of these objectives could be reached by developing a specific security system which will continually monitor and analyze the client activity and the state of the system to detect security threats, malicious activity or other kinds of intrusions. Through monitoring, the security system will define (and continuously refine) a suspicion level for each client. When security alerts occur, corresponding to suspicious behavior, the system will automatically take appropriate actions (e.g., ban the client or revoke some of its access rights, according to some predefined policy).
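To make the notion of a per-client suspicion level more concrete, here is a minimal sketch under stated assumptions. The event names, their weights, the decay factor and the two thresholds are all hypothetical placeholders for a predefined policy; the actual detection methods remain to be designed within this task.

```python
# Hypothetical weights: how much each observed event raises the suspicion level.
EVENT_WEIGHTS = {
    "policy_violation": 5.0,
    "malformed_request": 2.0,
    "request_burst": 1.0,
}


class SuspicionTracker:
    """Keeps a suspicion level per client and maps it to a policy action."""

    def __init__(self, decay=0.5, revoke_at=5.0, ban_at=10.0):
        self.decay = decay          # factor applied at the end of each period
        self.revoke_at = revoke_at  # level at which some rights are revoked
        self.ban_at = ban_at        # level at which the client is banned
        self.levels = {}

    def record(self, client, event):
        """Refine the level of a client after an observed event."""
        weight = EVENT_WEIGHTS.get(event, 0.0)
        self.levels[client] = self.levels.get(client, 0.0) + weight
        return self.action_for(client)

    def end_of_period(self):
        """Let old suspicion fade so that past incidents weigh less over time."""
        for client in self.levels:
            self.levels[client] *= self.decay

    def action_for(self, client):
        """Apply the (assumed) predefined policy to the current level."""
        level = self.levels.get(client, 0.0)
        if level >= self.ban_at:
            return "ban"
        if level >= self.revoke_at:
            return "revoke_rights"
        return "none"
```

A decaying score is only one possible choice; it is used here because it lets a client recover from an isolated incident while repeated violations still accumulate.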

Main challenges and difficulties. A very important challenge related to the above goals is the management of the client's history, which must be stored in a secure, fault-tolerant and scalable manner. Given the large number of nodes, a centralized solution does not seem appropriate for this purpose. A possible solution is to develop a distributed security system where the policies and user logs are managed by a number of entities which are in constant communication. It could be based on the software architecture internally used by BlobSeer for distributed metadata storage. The development of detection methods for malicious activity in the context described above is very difficult. It will be necessary to create novel intelligent algorithms capable of defining and detecting both insecure client activity (according to some predefined pattern) and suspicious behavior (corresponding to previously unknown activity patterns). These novel methods must also meet performance and scalability requirements, in order to be efficiently used in BlobSeer.

Human resources and organization. Catalin Leordeanu, Ph.D. student at PUB, whose thesis focuses on security in distributed systems, will visit INRIA again for 3 months to work on this task and will interact with the Ph.D. students from KerData and PARIS. Afterwards, at PUB, one Romanian Master student will contribute to this task during their master research internship, under the supervision of Catalin.

Task 3: Deploying BlobSeer on an XtreemOS-enabled IaaS based on Nimbus

Ph.D. students involved: Eliana Tirsa (PUB), Alexandra Carpen-Amarie (KerData), Bogdan Nicolae (KerData), Pierre Riteau (PARIS), Jérôme Gallard (PARIS).

Goals. This task aims at enabling BlobSeer as a storage service for sharing data of applications running in a Nimbus-enabled IaaS. There are two main goals that need to be reached. First, we need to design and implement an IaaS client access interface that supports the deployment and management of a BlobSeer instance. This BlobSeer instance is used to share the application data among the VMs running the application. The client access interface must be integrated with the Nimbus cloud software (developed at Argonne National Lab). It should offer the same level of functionality as the client access interface offered by Nimbus for virtual machine deployment and management. Second, we need to design and implement an interface through which the application running inside the VM accesses the BlobSeer data-sharing service. This access interface must reach the same BlobSeer instance from within any VM, regardless of the physical machine on which the VM is deployed.

Main challenges and difficulties. As regards deployment and management of BlobSeer instances, there are several aspects that need to be addressed. One such aspect is to define a security policy. In the simplest scenario, each client application is allowed to manage and access only its own BlobSeer instance. In a more complex scenario, access rights to a BlobSeer instance may be shared by multiple client applications. Another aspect is directly related to the integration with the Nimbus cloud software. Nimbus provides a client access interface that enables the control and monitoring of the deployed VMs. The interaction works as follows: the client listens for event notifications that are sent by a so-called workspace service and can react by sending commands to the same service. Providing a similar interface to control and monitor the BlobSeer instance requires reasoning about what commands and events are meaningful from the client point of view and then implementing a corresponding workspace service. With respect to the access interface to BlobSeer from inside a VM, several aspects are to be discussed as well. The access interface needs to know about the key actors of BlobSeer with which it needs to interact. These actors are defined in a special configuration file. Each VM must keep up with the configuration changes: such changes may occur since a BlobSeer instance can be directly manipulated by the client at any time. Providing uniform and up-to-date BlobSeer configuration files across all VMs is not trivial.
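One possible way to keep the configuration uniform across VMs is to attach a version number to it and let each VM-side access interface refresh its copy when the version changes. The sketch below only illustrates that idea: `ConfigRegistry`, `VmAccessInterface`, the role name and the host names are hypothetical, and neither the Nimbus nor the BlobSeer interfaces are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigRegistry:
    """Hypothetical shared view of the actors of one BlobSeer instance."""
    version: int = 0
    actors: dict = field(default_factory=dict)

    def update(self, **changes):
        """Apply a client-side change and bump the version number."""
        self.actors.update(changes)
        self.version += 1


class VmAccessInterface:
    """Hypothetical access interface running inside one VM."""

    def __init__(self, registry):
        self.registry = registry
        self.version = -1   # forces a refresh on first use
        self.actors = {}

    def refresh_if_stale(self):
        """Reload the local copy only when the shared version has changed."""
        if self.version != self.registry.version:
            self.actors = dict(self.registry.actors)
            self.version = self.registry.version
            return True
        return False

    def actor(self, role):
        """Return the address of an actor, using an up-to-date configuration."""
        self.refresh_if_stale()
        return self.actors[role]
```

In a real deployment the registry would itself be a distributed service, so the cost and consistency of the version check are exactly the open questions raised above.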

Human resources and organization. Eliana Tirsa, Ph.D. student at PUB, already worked on integrating Nimbus with XtreemOS during her 3-month internship hosted by PARIS in 2009. She will visit the KerData and PARIS Teams for another 3 months in 2010 to work on this task together with the Ph.D. students of our INRIA teams. Additionally, two Master students will be involved. On one side, a Romanian Master student at PUB, co-advised by Eliana, will contribute to this task through their Master research project. On the other side, at INRIA, a student from the local Master Program in Rennes, hosted by the KerData team for their research internship, will also be involved in this task.


Programme d'échanges avec budget prévisionnel
Exchanges schedule and estimated budget

Actions planned for FY 2010

Common actions

We plan to hold an N+N France-Romania Residential Workshop in Romania. It will last 3 days within one week, possibly in spring 2010. The subject will be "Towards autonomic storage environment for very large-scale infrastructures". This workshop will gather 9 members from the French partners and roughly as many from the Romanian partner. Additional scientists may also be invited to join the workshop, for instance from Kate Keahey's group at ANL. The participants will be invited to spend the rest of the week in Bucharest to hold specific technical discussions at PUB. We budget an overall additional cost of 1000 Euros for this action, excluding the traveling and accommodation expenses mentioned below.

French visits to the Romanian partner

For each visit, we budget 600 Euros for traveling expenses, plus 80-100 Euros a day for short visits. Longer visits of French students of Romanian origin receive a specific estimation, as hosting expenses can be kept low if accommodation is possible through personal links.

Romanian visits to French partners

We budget visits on the same financial basis in the opposite direction.
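As a worked example of this costing rule for short visits, the sketch below computes the lower and upper estimate for a visit of a given length. The function names are ours, and long visits, which receive a specific estimation, are deliberately not covered.

```python
TRAVEL_EUROS = 600          # flat traveling expenses per visit
DAILY_RATE_RANGE = (80, 100)  # daily allowance range for short visits, in Euros


def visit_cost(days, daily_rate, travel=TRAVEL_EUROS):
    """Cost of one short visit: travel plus a per-day allowance."""
    low, high = DAILY_RATE_RANGE
    if not low <= daily_rate <= high:
        raise ValueError("daily rate outside the budgeted 80-100 Euros range")
    if days < 0:
        raise ValueError("number of days must be non-negative")
    return travel + days * daily_rate


def visit_cost_range(days):
    """Lower and upper estimate for a short visit of the given length."""
    low, high = DAILY_RATE_RANGE
    return visit_cost(days, low), visit_cost(days, high)
```

For instance, a 5-day visit is budgeted between 600 + 5 x 80 = 1000 and 600 + 5 x 100 = 1100 Euros.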

1. ESTIMATION DES DÉPENSES EN MISSIONS INRIA VERS LE PARTENAIRE
Estimated spending for missions of INRIA researchers abroad

Catégorie / Category | Nombre de personnes / Number of persons | Coût estimé / Estimated cost (Euros)
Chercheurs confirmés / Senior researchers | 3 | 4000
Post-doctorants / Postdoctoral fellows | 0 |
Doctorants / Ph.D. students | 4 | 7000
Stagiaires / Interns | 1 |
Autre (précisez) / Other (detail): Additional funding for Workshop | 0 | 1000
Total | 8 | 12000

 

2. ESTIMATION DES DÉPENSES EN INVITATIONS DES PARTENAIRES
Estimated spending for invitations of Partner researchers in France

Catégorie / Category | Nombre de personnes / Number of persons | Coût estimé / Estimated cost (Euros)
Chercheurs confirmés / Senior researchers | 2 | 2000
Post-doctorants / Postdoctoral fellows | 1 | 2000
Doctorants / Ph.D. students | 3 | 6000
Stagiaires / Interns | 1 | 2000
Autre (précisez) / Other (detail) | 0 | 0
Total | 7 | 12000

2. Cofinancing

Is this collaboration already supported by INRIA, by the partner institution or by a third party (European project, National Science Foundation, etc)? Please indicate the related amount of funding.

Additional support for travel

Additional support from the French site

Additional support from the Partner side

3. Proposed budget

Indicate in the table below the estimated overall cost of your project and the budget requested from the DRI for this Associate Team (maximum 20 K€).

Comments and amounts (Euros)

A. Coût global de la proposition (total des tableaux 1 et 2 : invitations, missions, ...)
A. Global cost of the collaboration project (total of tables 1 and 2: invitations, missions, ...)
   Mutual visits: 24,000

B. Cofinancements utilisés (financements autres que Equipe Associée)
B. Cofinancing (other than Associate Team programme)
   French Embassy: 2,000
   University Rennes 1: 2,000

C. Additional supports not included in this application
   ENS Cachan International Internship Programme: 5,000
   Eiffel Doctoral Scholarship Program: 14,000

Financement "Équipe Associée" demandé (A - B)
Funding requested from the Associate Team programme (A - B, maximum 20 000 €)
   20,000
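The arithmetic behind the requested amount can be checked with the short sketch below. All figures are taken from the tables above; the variable names are hypothetical and serve only this illustration.

```python
# Illustrative consistency check; figures are copied from the budget tables (Euros).
MISSIONS_EUR = {
    "senior researchers": 4000,
    "Ph.D. students": 7000,
    "additional funding for workshop": 1000,
}
INVITATIONS_EUR = {
    "senior researchers": 2000,
    "postdoctoral fellows": 2000,
    "Ph.D. students": 6000,
    "interns": 2000,
}
COFINANCING_EUR = {"French Embassy": 2000, "University Rennes 1": 2000}
PROGRAMME_CAP_EUR = 20000  # maximum funding allowed by the programme

# A: global cost = total of tables 1 and 2
global_cost = sum(MISSIONS_EUR.values()) + sum(INVITATIONS_EUR.values())
# B: cofinancing other than the Associate Team programme
cofinancing = sum(COFINANCING_EUR.values())
# Requested funding = A - B, which must not exceed the programme cap
requested = global_cost - cofinancing

if __name__ == "__main__":
    print(f"A = {global_cost}, B = {cofinancing}, A - B = {requested}")
    print("Within cap:", requested <= PROGRAMME_CAP_EUR)
```

The additional supports listed under C are left out on purpose, since they are not included in this application.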


References

[1] L. M. Vaquero, L. Rodero-Merino, J. Caceres, M. Lindner. "A break in the clouds: towards a cloud definition". SIGCOMM Comput. Commun. Rev. 39, 1 (Dec. 2008), 50-55.

[2] A. Lenk, M. Klems, J. Nimis, S. Tai, T. Sandholm. "What's inside the Cloud? An architectural map of the Cloud landscape". ICSE Workshop on Software Engineering Challenges of Cloud Computing (CLOUD '09), 23 May 2009, 23-31.

[3] R. Buyya, Chee Shin Yeo, S. Venugopal. "Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities". 10th IEEE International Conference on High Performance Computing and Communications (HPCC '08), 25-27 Sept. 2008, 5-13.

[4] The Amazon Elastic Compute Cloud: http://aws.amazon.com/ec2/

[5] The Nimbus project: http://workspace.globus.org/

[6] The Eucalyptus project: http://open.eucalyptus.com/

[7] J. Dean and S. Ghemawat. "MapReduce: simplified data processing on large clusters". Communications of the ACM, 51(1):107-113, 2008.

[8] Google App Engine http://code.google.com/appengine/

[9] Microsoft Azure http://www.microsoft.com/azure/default.mspx

[10] Google Docs: http://www.google.com/google-d-s/tour1.html

[11] Microsoft Office Live: http://www.officelive.com/

[12] Kate Keahey, Tim Freeman. "Science Clouds: Early Experiences in Cloud Computing for Scientific Applications". Cloud Computing and Its Applications 2008 (CCA-08), Chicago, IL. October 2008.

[13] Christine Morin. "XtreemOS: a Grid Operating System Making your Computer Ready for Participating in Virtual Organizations". IEEE International Symposium on Object/component/service-oriented Real-time distributed Computing (ISORC), Santorini Island, Greece, May 2007.

[14] The XtreemOS project: http://www.xtreemos.eu/

[15] Christine Morin, Jérôme Gallard, Yvon Jégou, and Pierre Riteau. "Clouds: a new playground for the XtreemOS grid operating system". Parallel Processing Letters, 2009. To appear. Published as INRIA Research Report No RR-6824, February 2009. Available on HAL: http://hal.inria.fr/inria-00358594_v1/

[16] The BlobSeer project: http://blobseer.gforge.inria.fr/

[17] Bogdan Nicolae, Gabriel Antoniu, Luc Bougé. "BlobSeer: How to Enable Efficient Versioning for Large Object Storage under Heavy Access Concurrency". Data Management in Peer-to-Peer Systems, St-Petersburg, Russia, 2009.

[18] I. Legrand, H. Newman, R. Voicu, et al. "MonALISA: An agent based, dynamic service system to monitor, control and optimize grid based applications". In Computing for High Energy Physics, Interlaken, Switzerland, 2004.

[19] The MonALISA project: http://monalisa.cern.ch/

[20] Alexandra Carpen-Amarie, Cai Jing, Alexandru Costan, Gabriel Antoniu, Luc Bougé. "Bringing Introspection Into the BlobSeer Data-Management System Using the MonALISA Distributed Monitoring Framework". INRIA Research Report No RR-7043, September 2009. Submitted for publication. Available on HAL: http://hal.inria.fr/inria-00419978/.

[21] Yvon Jégou, Stéphane Lanteri, Julien Leduc, Noredine Melab, Guillaume Mornet, Raymond Namyst, Pascale Primet, Benjamin Quetier, Olivier Richard, El-Ghazali Talbi, and Iréa Touche. "Grid'5000: a large-scale and highly reconfigurable experimental grid testbed". International Journal of High Performance Computing Applications, 20(4):481-494, November 2006.

[22] The Grid'5000 project: http://www.grid5000.org/

[23] The Amazon Simple Storage Service: http://aws.amazon.com/s3/

[24] F. Hupfeld, T. Cortes, B. Kolbeck, E. Focht, M. Hess, J. Malo, J. Marti, J. Stender, E. Cesario. "XtreemFS - a case for object-based file systems in Grids". In Concurrency and Computation: Practice and Experience. Volume 20 Issue 8 June 2008.

[25] The XtreemFS project: http://www.xtreemfs.org/

 

 

© INRIA - updated 17/09/2009