PLSE Seminar: Shashank Srikant: "ML models for programming tasks -- Do these models learn good representations of programs? Can cognitive science help learn better representations?"

Speaker: Shashank Srikant, MIT CSAIL

Abstract: Machine learning (ML) models that learn and predict properties of computer programs are increasingly being adopted and deployed. These models encode programs into distributed representations, which are then used in downstream tasks such as auto-completing code, summarizing large programs, and detecting bugs and malware in programs.

A central question for such models is: what makes a good distributed representation of programs? Do state-of-the-art models capture key program properties well? How do we probe them? Do these models mimic how humans read and understand code?

In this talk, I'll present two sets of results -- one from ML and one from cognitive science.

- To test whether current ML models have learned code properties well, we evaluated them against 'adversarial programs' containing small semantics-preserving changes. We present a first-order optimization technique to generate such changes. We show that these small changes are enough to fool code models, suggesting that the way we model programs needs to improve.
These results are based on work to appear in ICLR 2021. Details -- https://shashank-srikant.github.io/notes/iclr21/

- In a cognitive neuroscience experiment, we investigated which parts of the brain are involved in reading programs. Does the brain treat programming languages like natural languages? We conclude that the language centers of our brain are not involved in code comprehension. We ask whether these results can help inform the design of computational models that comprehend and process code.
These results are based on work published in eLife in 2020. Details -- https://shashank-srikant.github.io/notes/elife20/



Speaker Bio: Shashank is currently a Ph.D. candidate in computer science at MIT CSAIL, advised by Una-May O'Reilly. His research interests lie at the intersection of machine learning, program analysis, and cognitive neuroscience. Details on his work are available on his webpage: https://shashank-srikant.github.io/