Clustera: A Data-Centric Approach to Scalable Cluster Management
Speaker: David DeWitt, University of Wisconsin & Microsoft Research
Date: December 11, 2008
Time: 4:00PM to 5:30PM
Host: Hari Balakrishnan & Sam Madden, CSAIL
Contact: Colleen Russell, 3-0145, firstname.lastname@example.org
Twenty-five years ago, when we built our first cluster management system using a collection of twenty VAX 11/750 computers, the idea of a compute cluster was an exotic concept. Today, clusters of 1,000 nodes are common and some of the biggest have in excess of 10,000 nodes. Such clusters are simply awash in data about machines, users, jobs, and files. Many of the tasks that such systems are asked to perform are very similar to database transactions. For example, the system must accept jobs from users and send them off to be executed. The system should not drop jobs or lose files due to hardware or software failures. The software must also allow users to stop failed computations or change their minds and retract thousands of submitted but not yet completed jobs. Amazingly, no cluster management system that we are aware of uses a database system for managing its data.
In this talk I will describe Clustera, a new cluster management system we have been working on for the last three years. As one would expect from some database types, Clustera uses a relational DBMS to store all its operational data, including information about jobs, users, machines, and files (executable, input, and output). One unique aspect of the Clustera design is its use of an application server (currently JBoss) in front of the relational DBMS. Application servers have a number of appealing capabilities. First, they can handle tens of thousands of clients. Second, they provide fault tolerance and scalability by running on multiple server nodes. Third, they multiplex connections to the database system down to a level that the database system can comfortably support. Compute nodes in a Clustera cluster appear as web clients to the application server and make SOAP calls to request jobs to execute and to update status information that is stored in the relational database.
Extensibility is a second key goal of the Clustera project. Traditional cluster management systems such as Condor were targeted toward long-running, computationally intensive jobs. Newer systems such as MapReduce are targeted toward a specific type of data-intensive parallel computation. Parallel SQL database systems represent a third type of cluster management system. The Clustera framework was designed to handle each of these classes of jobs in a common execution and data framework.
David J. DeWitt joined the Computer Sciences Department at the University of Wisconsin in September 1976 after receiving his Ph.D. degree from the University of Michigan. He served as department chair from July 1999 to July 2004. He held the title of John P. Morgridge Professor of Computer Sciences when he retired from the University of Wisconsin and joined Microsoft as a Technical Fellow in 2008. Professor DeWitt is a member of the National Academy of Engineering (1998), a fellow of the American Academy of Arts and Sciences (2007), and an ACM Fellow (1995). He received the 1995 SIGMOD Innovations Award for his contributions to the database systems field. Professor DeWitt has authored over 120 technical publications and served on numerous program committees and NSF review panels. He was a member of the NSF CISE Advisory Committee from 2000 to 2003 and of the CSTB from 2005 to 2007, and has served on several NRC and DARPA study panels. He was the program chair of the 1983 SIGMOD conference, program co-chair of the 1988 VLDB conference, and general chair of the 2002 SIGMOD conference.
See other events that are part of Dertouzos Lecturer Series 2008/2009