We are hosting John Wilkes from Google. He will give a talk on Monday, April 27, at 10:30 AM in room Markov (G105, blue level).

Title: Large-scale cluster management at Google with Borg

Abstract: 

Google’s Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.

It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior.

I’ll present a longer version of the paper talk that will be given at EuroSys on April 22. It’ll include a quick summary of the Borg system architecture and features, but focus mostly on a quantitative analysis of some of its policy decisions.