We're developing a flexible, high-performance storage architecture for database-backed applications. It is built around a dynamic set of developer-specified queries, which Soup automatically optimizes.

Data soups are a fundamental rethink of the storage architecture for database-backed applications. They achieve high read performance and make applications easier to evolve. The user provides the data soup with a set of pre-declared relational queries (its “recipe”) similar to SQL prepared statements, and the system compiles these expressions into a dynamic data-flow graph and caches the results for efficient reading. The recipe changes over time as the application evolves and new queries are added or old ones retired, and the data soup reconfigures itself accordingly by modifying the data-flow graph.
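To make the idea concrete, here is a minimal toy sketch in Python. It is not Soup's actual API or implementation: the names ToySoup, CountByKey, add_query, remove_query, write, and read are all hypothetical, and a single grouped count stands in for a general relational query and its data-flow graph. It only illustrates the three moves described above: declaring queries up front, updating cached results on each write, and changing the recipe while data is already loaded.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List

Row = Dict[str, object]


class CountByKey:
    """Hypothetical recipe entry: COUNT(*) WHERE predicate GROUP BY key."""

    def __init__(self, key: Callable[[Row], Hashable],
                 predicate: Callable[[Row], bool] = lambda row: True) -> None:
        self.key = key
        self.predicate = predicate
        # Cached (materialized) result of the query, one entry per group.
        self.cache: Dict[Hashable, int] = defaultdict(int)

    def on_insert(self, row: Row) -> None:
        # Incremental update: only the group this row belongs to changes.
        if self.predicate(row):
            self.cache[self.key(row)] += 1


class ToySoup:
    """Toy stand-in for a data soup (assumption: insert-only, one base table)."""

    def __init__(self) -> None:
        self.base: List[Row] = []
        self.recipe: Dict[str, CountByKey] = {}

    def add_query(self, name: str, query: CountByKey) -> None:
        # Recipe change while live: backfill the new view from existing rows.
        for row in self.base:
            query.on_insert(row)
        self.recipe[name] = query

    def remove_query(self, name: str) -> None:
        # Retiring a query simply drops its cached result.
        del self.recipe[name]

    def write(self, row: Row) -> None:
        # Work happens on the write path: every declared query is updated.
        self.base.append(row)
        for query in self.recipe.values():
            query.on_insert(row)

    def read(self, name: str, key: Hashable) -> int:
        # Reads are cache lookups; no query is evaluated here.
        return self.recipe[name].cache.get(key, 0)


soup = ToySoup()
soup.add_query("votes_per_story", CountByKey(key=lambda r: r["story"]))
soup.write({"story": "a", "user": 1})
soup.write({"story": "a", "user": 2})
soup.write({"story": "b", "user": 1})

# The application evolves: a new query joins the recipe after data exists.
soup.add_query("votes_per_user", CountByKey(key=lambda r: r["user"]))
print(soup.read("votes_per_story", "a"), soup.read("votes_per_user", 1))
```

In this sketch a read never scans the base rows; the cost of keeping each answer current is paid when rows are written or when a query is added to the recipe.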

The data soup paradigm has many benefits compared to classic databases. Qualitatively, data soups automate complex and error-prone schema migrations and allow them to happen while the system is live. Quantitatively, data soups improve performance by shifting work from ad-hoc query processing on reads to pre-computation on writes as needed. Additionally, automated analysis and optimization of the data-flow graph, as well as parallel processing and scale-out distribution, can help data soups achieve high performance without manual tuning.

Research Areas