Large training datasets have revolutionized AI research, but enabling similar breakthroughs in other fields, such as robotics, requires a new understanding of how to acquire, clean, and structure emergent forms of large-scale, unstructured sequential data. My talk presents a systematic approach to handling such dirty data in the context of modern AI applications. I start by introducing a statistical formalization of data cleaning in this setting, including research on (1) how common data cleaning operations affect model training, (2) how data cleaning programs can be expected to generalize to unseen data, and (3) how to prioritize limited human intervention in rapidly growing datasets. Then, using surgical robotics as a motivating example, I present a series of robust Bayesian models for automatically extracting hierarchical structure from highly varied and noisy robot trajectory data, which facilitates imitation learning and reinforcement learning on short, consistent sub-problems. I show how the combination of clean training data and structured learning tasks enables learning highly accurate control policies in tasks ranging from surgical cutting to debridement.
Sanjay Krishnan is a Computer Science PhD candidate in the RISELab and in the Berkeley Laboratory for Automation Science and Engineering at UC Berkeley. His research studies problems at the intersection of database theory, machine learning, and robotics. Sanjay's work has received a number of awards, including the 2016 SIGMOD Best Demonstration award, the 2015 IEEE GHTC Best Paper award, and the Sage Scholar award. https://www.ocf.berkeley.edu/~sanjayk/