Aurum is a data discovery system that helps people find relevant data at large scale.

Organizations face a data discovery problem when their analysts spend more time looking for relevant data than analyzing it. This problem has become commonplace in modern organizations because: i) data is stored across many storage systems, from databases to data lakes; and ii) data scientists do not work within a few well-defined schemas or a small number of data sources; to answer complex questions, they must access data spread across thousands of sources.

To address this problem, we are building Aurum, a data discovery system. Aurum introduces a new discovery algebra, the Source Retrieval Query Language (SRQL), which lets users declaratively search for relevant data sources through a set of primitives that expose the relations among the underlying data. We are also investigating new abstractions for representing all of an organization's data assets, and methods for finding them efficiently.
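The idea of declarative discovery through composable primitives can be illustrated with a small sketch. This is not SRQL or Aurum's implementation: the graph model (columns named `table.column` as nodes, typed relations such as `content_similar` and `schema_similar` as edges), the class `DiscoveryGraph`, and the primitives `search`, `neighbors`, and `tables` are all hypothetical. They are chosen only to show how primitives that expose relations among data can be combined with ordinary set operations to answer a discovery question.

```python
from collections import defaultdict


class DiscoveryGraph:
    """Hypothetical model: nodes are 'table.column' names, edges are typed relations."""

    def __init__(self):
        self.columns = set()
        # (column, relation) -> set of related columns
        self.edges = defaultdict(set)

    def relate(self, a, b, relation):
        # Relations are treated as symmetric in this sketch.
        self.columns.update((a, b))
        self.edges[(a, relation)].add(b)
        self.edges[(b, relation)].add(a)


def search(graph, keyword):
    """Hypothetical primitive: columns whose name contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return {c for c in graph.columns if kw in c.lower()}


def neighbors(graph, columns, relation):
    """Hypothetical primitive: columns one hop away over the given relation."""
    result = set()
    for c in columns:
        # .get avoids inserting empty entries into the defaultdict
        result |= graph.edges.get((c, relation), set())
    return result


def tables(columns):
    """Project a set of 'table.column' names onto their table names."""
    return {c.split(".", 1)[0] for c in columns}


# Illustrative toy data, not drawn from any real organization.
g = DiscoveryGraph()
g.relate("sales.customer_id", "crm.customer_id", "content_similar")
g.relate("crm.customer_id", "support.cust_id", "content_similar")
g.relate("sales.region", "geo.region_name", "schema_similar")

# "Which tables contain data overlapping with customer columns in the sales table?"
seeds = search(g, "customer") & {c for c in g.columns if c.startswith("sales.")}
related = neighbors(g, seeds, "content_similar")
print(sorted(tables(related)))  # ['crm']
```

Because each primitive takes and returns a set of columns, queries compose by chaining primitives or by intersecting and unioning their results, which is the property that makes a declarative discovery language possible. A real system would derive these relations automatically from the data rather than declaring them by hand.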
