CSAIL Event Calendar


Taming Big Data with Berkeley Data Analytics Stack (BDAS)

Speaker: Ion Stoica, UC Berkeley
Date: Wednesday, November 7, 2012
Time: 4:00PM to 5:00PM
Refreshments: 3:45PM
Location: 32-G449 (Patil/Kiva)
Host: Samuel Madden, CSAIL
Contact: Sheila Marian, x3-1996, sheila@csail.mit.edu


Abstract: One of the most interesting developments of the past decade is the rapid increase in data; we are now deluged by data from online services (PBs per day), scientific instruments (PBs per minute), gene sequencing (250GB per person), and many other sources. Researchers and practitioners collect this massive data with one goal in mind: to extract "value" through sophisticated exploratory analysis and use it as the basis for decisions as varied as personalized treatment and ad targeting. Unfortunately, today's data analytics tools are slow to answer even simple queries, as they typically must sift through huge amounts of data stored on disk, and they are even less suitable for complex computations such as machine learning algorithms. These limitations leave the potential for extracting value from big data unfulfilled.

To address this challenge, we are developing BDAS, an open-source data analytics stack that provides interactive response times for complex computations on massive data. To achieve this goal, BDAS supports efficient, large-scale in-memory data processing and allows users and applications to trade off query accuracy, time, and cost. In this talk, I'll present the architecture, challenges, and early results. Some BDAS components have already been released: Mesos, a platform for cluster resource management, has been deployed by Twitter on 1,500+ servers, while Spark, an in-memory cluster computing framework, is already being used by tens of companies and research institutions.

Bio: Ion Stoica is a Professor in the EECS Department at the University of California, Berkeley. He received his PhD from Carnegie Mellon University in 2000. He does research on cloud computing and networked computer systems. Past work includes Dynamic Packet State (DPS), the Chord DHT, the Internet Indirection Infrastructure (i3), declarative networks, replay debugging, and multi-layer tracing in distributed systems. His current research focuses on resource management and scheduling for data centers, cluster computing frameworks, and network architectures. In 2006, he co-founded Conviva, a startup to commercialize technologies for large-scale video distribution.

This talk is part of the Big Data Lecture Series 2012/2013.