Organizations face a data discovery problem when their analysts spend more time finding relevant data than analyzing it. This problem has become common because: i) data is stored across multiple storage systems, from databases to data lakes; and ii) data scientists do not operate within the limits of well-defined schemas; instead, they want to find data across their organization to answer increasingly complex business questions. We have built Aurum, a system to tackle data discovery problems at large, as part of the Data Civilizer project. Aurum introduces a new discovery language, SRQL, that lets users declare their intuition of what is relevant through a set of data primitives that expose the relations in the underlying data. SRQL's algebra relies on an enterprise knowledge graph (EKG) to answer queries at human-scale latencies. Aurum is scalable: it builds the EKG in linear time, despite the difficulty of extracting complex relationships among thousands of data sources.
If you would like to contact us about our work, please scroll down to the people section and open one of the group leads' pages, where you can reach out to them directly.