Organizations face a data discovery problem when their analysts spend more time looking for relevant data than analyzing it. This problem has become commonplace in modern organizations for two reasons: (i) data is stored across multiple storage systems, from databases to data lakes; and (ii) data scientists do not operate within the limits of well-defined schemas or a small number of data sources. Instead, to answer complex questions, they must access data spread across thousands of data sources. To address this problem we are building AURUM, a data discovery system. AURUM introduces a new discovery algebra, called the Source Retrieval Query Language (SRQL), that lets users declaratively search for relevant data sources through a set of primitives that expose the relations of the underlying data. We are also investigating new abstractions to represent all data assets within an organization, along with methods to find those assets efficiently.
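To make the idea of discovery primitives more concrete, here is a minimal Python sketch. It is an illustration only and does not show AURUM's implementation or SRQL's actual syntax. Every name in it (`Column`, `DiscoveryIndex`, `keyword_search`, `neighbors`) is hypothetical, and Jaccard overlap of column values is an assumed stand-in for whatever relations the real system exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical unit of discovery: one column of one data source."""
    source: str          # table or file the column belongs to
    name: str            # column name
    values: frozenset    # distinct values observed in the column


def jaccard(a, b):
    """Fraction of values shared by two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class DiscoveryIndex:
    """Toy index: columns are nodes, content overlap creates edges."""

    def __init__(self, columns, threshold=0.5):
        self.columns = list(columns)
        self.edges = {c: set() for c in self.columns}
        # Assumption: two columns are "related" when their values overlap
        # by at least `threshold`. This is a stand-in relation only.
        for i, a in enumerate(self.columns):
            for b in self.columns[i + 1:]:
                if jaccard(a.values, b.values) >= threshold:
                    self.edges[a].add(b)
                    self.edges[b].add(a)

    def keyword_search(self, keyword):
        """Primitive 1: find columns whose name contains the keyword."""
        kw = keyword.lower()
        return [c for c in self.columns if kw in c.name.lower()]

    def neighbors(self, column):
        """Primitive 2: find columns related to the given column."""
        return sorted(self.edges[column], key=lambda c: (c.source, c.name))


# Made-up example data: three columns from three different sources.
columns = [
    Column("hr.employees", "employee_id", frozenset({1, 2, 3, 4})),
    Column("payroll.salaries", "emp_id", frozenset({1, 2, 3})),
    Column("sales.orders", "order_id", frozenset({100, 101})),
]

index = DiscoveryIndex(columns)
hits = index.keyword_search("employee")   # start from a keyword
related = index.neighbors(hits[0])        # then follow relations
```

Composing the two primitives lets a user start from a keyword and reach a related column in another source whose name does not match that keyword, which is the kind of search a discovery algebra is meant to express declaratively.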
If you would like to contact us about our work, please scroll down to the people section and open one of the group leads' pages, where you can reach out to them directly.