Aurum is a data discovery system that works at large scale, helping people find relevant data.
Organizations face a data discovery problem when their analysts spend more time looking for relevant data than analyzing it. This problem has become commonplace in modern organizations because (i) data is stored across multiple storage systems, from databases to data lakes, and (ii) data scientists do not operate within the limits of well-defined schemas or a small number of data sources. Instead, to answer complex questions, they must access data spread across thousands of data sources. To address this problem we are building Aurum, a data discovery system. Aurum introduces a new discovery algebra, called the Source Retrieval Query Language (SRQL), that lets users declaratively search for relevant data sources through a set of primitives that expose the relations of the underlying data. We are investigating new abstractions to represent all data assets within an organization and methods to find them efficiently.
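
To make the idea of declarative discovery primitives more concrete, here is a minimal Python sketch. It is a toy illustration only, not Aurum's implementation and not actual SRQL syntax: the class `ToyDiscoveryIndex`, the primitives `keyword_search` and `neighbors`, the relation name `content_similar`, and the sample sources are all hypothetical names chosen for this example.

```python
from collections import defaultdict


class ToyDiscoveryIndex:
    """Toy in-memory index of columns and the relations between them.

    Hypothetical sketch; this is NOT Aurum's data model or the SRQL API.
    """

    def __init__(self):
        self.columns = set()            # nodes: (source, column) pairs
        self.edges = defaultdict(set)   # (node, relation) -> related nodes

    def add_column(self, source, column):
        self.columns.add((source, column))

    def relate(self, a, b, relation):
        # Relations are assumed to be symmetric in this sketch.
        self.edges[(a, relation)].add(b)
        self.edges[(b, relation)].add(a)

    def keyword_search(self, keyword):
        """Primitive 1: columns whose name contains the keyword."""
        kw = keyword.lower()
        return {node for node in self.columns if kw in node[1].lower()}

    def neighbors(self, nodes, relation):
        """Primitive 2: columns linked to `nodes` by the given relation."""
        result = set()
        for node in nodes:
            result |= self.edges.get((node, relation), set())
        return result - set(nodes)


# Made-up sources spread across a database, a data lake file, and another DB.
emp = ("hr_db.employees", "employee_id")
pay = ("lake/payroll.csv", "emp_id")
bld = ("facilities.buildings", "building_name")

index = ToyDiscoveryIndex()
for source, column in (emp, pay, bld):
    index.add_column(source, column)
index.relate(emp, pay, "content_similar")

# Compose the two primitives: start from a keyword, then follow a relation.
seeds = index.keyword_search("employee")
related = index.neighbors(seeds, "content_similar")
print(sorted(related))
```

In this sketch a query is a composition of primitives: the user states what is wanted (columns matching a keyword, then columns related to those), and the index resolves it across storage systems without the user knowing in advance where the related data lives.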