Project

Data Discovery

As part of Data Civilizer, we are designing abstractions and building tools and systems to help people with their data-related tasks, from discovering data to cleaning and transforming it. The aim is to shape the data so that it is easy to analyze, for example to fit a model or fill in a report.

Organizations face a data discovery problem when their analysts spend more time finding relevant data than analyzing it. This problem has become common because: i) data is stored across multiple storage systems, from databases to data lakes; and ii) data scientists do not operate within the limits of well-defined schemas; instead, they want to find data across their organization to answer increasingly complex business questions.

We have built Aurum as part of the Data Civilizer project to tackle data discovery problems at large. Aurum introduces a new discovery language, SRQL, that lets users declare their intuition of what is relevant through a set of data primitives that expose the relations in the underlying data. The language's algebra relies on an enterprise knowledge graph (EKG) to answer queries in human-scale latencies. Aurum is scalable: it builds the EKG in linear time, despite the difficulty of extracting complex relationships among thousands of data sources.
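To make the idea of a knowledge graph queried through discovery primitives more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Aurum's implementation or SRQL syntax: the names `ToyEKG`, `build_ekg`, `neighbors`, and `path`, the `content_similar` relation label, and the 0.7 threshold are all hypothetical. The sketch also compares every pair of columns, which is quadratic and does not reproduce the linear-time construction described above.

```python
from collections import defaultdict, deque
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections of values."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class ToyEKG:
    """Hypothetical toy graph: nodes are (table, column) pairs,
    edges are labelled with the relation that links two columns."""

    def __init__(self):
        # relation name -> node -> set of neighbouring nodes
        self.edges = defaultdict(lambda: defaultdict(set))

    def add_edge(self, relation, u, v):
        self.edges[relation][u].add(v)
        self.edges[relation][v].add(u)

    def neighbors(self, node, relation):
        """Primitive: columns directly related to `node` via `relation`."""
        return sorted(self.edges.get(relation, {}).get(node, set()))

    def path(self, source, target, relation):
        """Primitive: shortest chain of related columns (breadth-first search)."""
        parents = {source: None}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            if current == target:
                chain = []
                while current is not None:
                    chain.append(current)
                    current = parents[current]
                return chain[::-1]
            for nxt in self.neighbors(current, relation):
                if nxt not in parents:
                    parents[nxt] = current
                    queue.append(nxt)
        return None  # no chain of related columns exists


def build_ekg(columns, threshold=0.7):
    """Link columns whose value sets overlap strongly.
    Assumption: naive pairwise comparison, for illustration only."""
    ekg = ToyEKG()
    for (u, u_vals), (v, v_vals) in combinations(columns.items(), 2):
        if jaccard(u_vals, v_vals) >= threshold:
            ekg.add_edge("content_similar", u, v)
    return ekg


# Made-up sample data: three columns holding department ids, one holding names.
columns = {
    ("employees", "dept_id"): [1, 2, 3, 4],
    ("departments", "id"): [1, 2, 3, 4, 5],
    ("payroll", "dept"): [1, 2, 3],
    ("employees", "name"): ["ann", "bob"],
}
ekg = build_ekg(columns)
```

With this sample data, `ekg.neighbors(("employees", "dept_id"), "content_similar")` returns the two other department-id columns, and `ekg.path(("payroll", "dept"), ("departments", "id"), "content_similar")` finds a chain through `employees.dept_id`, since the two endpoints fall below the threshold when compared directly.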

Group

Data Systems Group

Contact us

If you would like to contact us about our work, please refer to our members below and reach out to one of the group leads directly.

Last updated Oct 12 '17

Research Areas

Systems & Networking

Members

Michael Stonebraker

Sam Madden