Data scientists universally report that they spend at least 80% of their time finding data sets of interest, accessing them, cleaning them and assembling them into a unified whole.
Data Civilizer is an end-to-end project to reduce that 80% figure. It consists of sub-projects on data discovery (Aurum), view construction, data cleaning, data transformation and golden record construction. A complete prototype is close to being operational.