BlueDBM: Distributed Flash Storage for Big Data Analytics
BlueDBM is a computer cluster architecture that combines fast distributed flash storage with in-storage accelerators. It often outperforms larger and more expensive clusters in applications such as graph analytics.
BlueDBM is a system architecture for accelerating Big Data analytics. It combines distributed flash-based storage that has in-storage processing capability with a low-latency, high-throughput inter-controller network. By moving high-performance processing near storage and organizing data accesses into flash-optimized patterns, BlueDBM has demonstrated performance and power efficiency exceeding those of existing cluster architectures.

When processing massive amounts of data, performance is often bound by the capacity of fast local DRAM. In cluster systems with more RAM, the network software stack often becomes a bottleneck. BlueDBM aims to mitigate these issues by providing extremely fast access to a scalable network of flash-based storage devices, together with a platform for application-specific hardware accelerators on the datapath of each storage device. We have demonstrated BlueDBM on a wide array of applications, including terasort, graph analytics, and key-value stores. Our current goals are to give programmers a simple programming interface that makes it easy to achieve the maximum performance of the BlueDBM architecture, and to explore more applications that can benefit from BlueDBM.
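
To make the benefit of processing near storage concrete, the following is a toy Python model, not BlueDBM code. It compares how many bytes cross the link to the host when a filter runs on the host versus on the storage device. The page size, the function names `host_side_filter` and `in_storage_filter`, and the predicate are all assumptions chosen for illustration; a real in-storage accelerator would be implemented in hardware on the storage datapath.

```python
from typing import Callable, List, Sequence, Tuple

# Illustrative page size in bytes; an assumption, not a BlueDBM parameter.
PAGE_SIZE = 8192

Page = bytes
Predicate = Callable[[Page], bool]


def host_side_filter(pages: Sequence[Page], keep: Predicate) -> Tuple[List[Page], int]:
    """Conventional path: every page is shipped to the host, then filtered."""
    bytes_moved = 0
    matches: List[Page] = []
    for page in pages:
        bytes_moved += len(page)  # the whole page crosses the link
        if keep(page):
            matches.append(page)
    return matches, bytes_moved


def in_storage_filter(pages: Sequence[Page], keep: Predicate) -> Tuple[List[Page], int]:
    """Near-storage path: the filter runs at the device, only matches are shipped."""
    matches = [page for page in pages if keep(page)]
    bytes_moved = sum(len(page) for page in matches)
    return matches, bytes_moved


# Hypothetical workload: 1000 pages, each filled with one repeated byte value.
pages = [bytes([i % 256]) * PAGE_SIZE for i in range(1000)]


def starts_with_zero(page: Page) -> bool:
    """Hypothetical selective predicate: keep pages whose first byte is 0."""
    return page[0] == 0


host_matches, host_bytes = host_side_filter(pages, starts_with_zero)
near_matches, near_bytes = in_storage_filter(pages, starts_with_zero)

print(f"host-side filter moved  {host_bytes} bytes")
print(f"in-storage filter moved {near_bytes} bytes")
```

Both paths return the same matching pages; only the volume of data moved differs. The more selective the filter, the larger the gap, which is the intuition behind placing accelerators on the storage datapath. The model ignores latency, bandwidth, and flash access patterns, so it says nothing about actual BlueDBM performance.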