Towards new solutions to build large scale distributed applications in the cloud

Team and supervisors
Department / Team: 
Team website: 
Thesis supervisor
David Bromberg
Co-supervisor(s)
Thesis topic

Modern distributed applications are becoming increasingly large and complex. They often bring together independently developed sub-systems (e.g. for storage, batch processing, streaming, application logic, logging, caching) into large distributed architectures. Combining, configuring, and deploying these architectures is a difficult and multifaceted task: individual services have their own requirements, configuration spaces, programming models, and distribution logic, all of which must be carefully tuned to ensure the overall performance, resilience, and evolvability of the resulting system. Today this integration effort remains largely an ad hoc activity that is either manual or relies on tool-specific scripting capabilities. Unfortunately, this low-level approach scales poorly in the face of the increasingly complex deployment requirements and topologies of the services involved.

To write and maintain the low-level glue code or configuration files required to realize such applications, developers must (i) have a deep understanding of the distributed services involved, their specific semantics, and their individual programming models; (ii) cater for the unavoidable volatility of the workloads and of the cloud infrastructures in which these services typically operate; and (iii) allow for a continuous integration process in which a deployed system is modified on the fly. One possible way to help developers in this task is to embed services within microservice architectures. Even so, the result is an increasingly complex system that can be hard to describe, monitor, and adapt.

PhD topic

To cope with the inherent complexity of building large-scale distributed systems while fostering evolvability, efficiency, maintainability, and scalability, we propose in this thesis to raise the level of abstraction provided to developers. In particular, we will apply software engineering best practices, starting with separation of concerns: each operation will be isolated in its own microservice, the microservices will form a graph, and each of them will run in a lightweight container. We will provide the new programming models, abstractions, and tools needed to describe the distributed workflow and to deploy and orchestrate the underlying microservices.
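
As a purely illustrative sketch of what such a higher-level description could look like, an application can be declared as a graph of containerized microservices, from which a deployment order is derived rather than scripted by hand. The proposal does not define any syntax: the `Service` class, the `deployment_order` helper, and all service names and image names below are hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Service:
    """Hypothetical declarative description of one microservice."""
    name: str
    image: str               # container image (illustrative value)
    depends_on: tuple = ()   # names of services that must be started first
    replicas: int = 1


def deployment_order(services):
    """Return service names so that every dependency precedes its dependents."""
    known = {s.name for s in services}
    graph = {}
    for s in services:
        missing = set(s.depends_on) - known
        if missing:
            raise ValueError(
                f"{s.name} depends on undeclared services: {sorted(missing)}"
            )
        graph[s.name] = set(s.depends_on)
    # TopologicalSorter raises graphlib.CycleError on cyclic dependencies.
    return list(TopologicalSorter(graph).static_order())


# Illustrative application: names and images are made up for the example.
APP = [
    Service("storage", "example/storage:1"),
    Service("cache", "example/cache:1", depends_on=("storage",)),
    Service("logic", "example/logic:1", depends_on=("storage", "cache"), replicas=3),
    Service("logging", "example/logging:1"),
]

if __name__ == "__main__":
    print(deployment_order(APP))
```

The developer states only what each service needs; the ordering, and in a fuller system the placement and orchestration, are computed from the graph.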

This thesis is structured around three key axes:

  • Software engineering. One of our objectives will be to design a high-level domain-specific language (DSL) that declares what should be achieved and how it should be deployed at scale. It will make it possible to describe, in an abstract manner, how to combine, deploy, and orchestrate microservices. In particular, it will shield developers from the underlying cloud infrastructures and from the intricacies of writing the low-level code needed to build a distributed application that scales.
  • Systems and large-scale systems. Isolating microservices in VMs is not the most suitable approach, as it requires hypervisors, or virtual machine monitors (VMMs), to virtualize hardware resources. VMMs are known to be heavyweight, with both boot-time and run-time overheads that can strongly affect performance. Consequently, we will explore promising research directions from the systems community, such as unikernels, which enable smaller footprints, more optimization, and faster boot times. One of the key underlying challenges will be to compile the aforementioned DSL directly into a dedicated, customized machine image, ready to be deployed directly on top of a large set of bare-metal servers.
  • Statistics and probability. Depending on the workload of a cloud infrastructure, a large-scale distributed application may not behave adequately, i.e. it may not deliver the expected performance. There is a strong need to dynamically adapt the way resources are allocated to running applications. Accordingly, empirical studies will be conducted on a selection of use cases to fine-tune the design of the DSL so that developers can express orchestration algorithms based on linear regression, with or without a probabilistic classifier.
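
To make the last axis more concrete, here is a minimal sketch of an orchestration rule based on linear regression. It is an illustration under stated assumptions, not the method the thesis will produce: the function names (`fit_line`, `replicas_needed`), the sample measurements, and the latency target are all hypothetical. The sketch fits latency as a linear function of per-replica load by ordinary least squares, then picks the smallest replica count whose predicted latency meets the target.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) samples of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def replicas_needed(total_load, target_latency, a, b):
    """Smallest replica count whose predicted latency stays within the target.

    Assumes load is spread evenly, so each replica sees total_load / replicas,
    and that latency grows with load (a > 0).
    """
    if a <= 0 or target_latency <= b:
        raise ValueError("model cannot meet the target by adding replicas")
    max_load_per_replica = (target_latency - b) / a
    return max(1, math.ceil(total_load / max_load_per_replica))


# Made-up measurements: requests/s per replica vs. observed latency in ms.
LOADS = [100, 200, 300, 400]
LATENCIES = [30, 50, 70, 90]

if __name__ == "__main__":
    a, b = fit_line(LOADS, LATENCIES)
    print(replicas_needed(total_load=1400, target_latency=60, a=a, b=b))
```

A probabilistic classifier could be layered on top of such a rule, for instance to decide whether the current workload is one for which the linear model is trustworthy, but that part is left out of this sketch.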

Keywords
Domain-specific language, large-scale distributed systems, unikernel, exokernel, microkernel, microservices, schedulers, linear regression.
Start date: 
As soon as possible
IRISA - Campus universitaire de Beaulieu, Rennes