Working towards smarter automation for fixing code


In the world of software development, there are plenty of automated tools for fixing bugs. However, most such programs can only find cut-and-paste-style errors - they break down if you ask them to find higher-level patterns, and especially the sorts of patterns that might even evade the notice of expert programmers.

A new system developed by researchers at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) aims to help. Dubbed Yogo, the tool makes it much easier to make systematic changes in code - and can do so in multiple programming languages.

For example, consider an E-commerce app that represents the items in a shopping cart as an unordered array with duplicates. Yogo could help you find all code that counts the frequency of a given item in the list as part of some larger refactoring - whether to replace all of them with a more efficient implementation of frequency counting, or to switch to an alternate representation of shopping carts altogether.

Other patterns Yogo can assist with include:

1) finding code that does some work to compute a log message to print, and then doesn't print it because logging is turned off. (This pattern is useful in many languages.)

2) reverse-engineering. Yogo can recognize individual pieces of functions, then bigger and bigger building blocks that use them, until it's identified higher-level algorithms.

3) APIs in multiple languages. Plenty of software, from Facebook's public API to MongoDB, have APIs for multiple languages; often they work the same way in each language.

“Using Yogo, you can write a single declarative query to find a given concept throughout the entire codebase,” says MIT master’s student Pond Premtoon, lead author on a new paper about the project. 

Co-author Jimmy Koppel says that search engines like Google treat searching for code almost the same as searching for text in a webpage. “It’s very fast, but very unsophisticated,” says Koppel, a PhD student at MIT CSAIL.

Many existing code-search approaches by researchers aim to be more generalizable by abstracting away the exact ordering of statements in favor of identifying when one statement uses the result of another (data dependence) or is guarded by a condition (control dependence). However, these so-called “program dependence graphs” (PDGs) are unable to deal with alternate ways of expressing the computations, let alone entire alternate APIs.

Enter Yogo. Yogo works by considering not only the data-flow graph of a function, but also the dataflow graphs of all equivalent functions reachable via a set of rewrite rules. In doing so, it can recognize an operation even if it uses alternate APIs, is in a different but mathematically-equivalent form, or is split apart with temporary variables.

And while most "smart" tools like Yogo are built for one language and can't easily be ported, Yogo is built on Koppel’s PhD work, which focused on better ways of building multi-language tools. Yogo supports Java and Python, and it is even possible to write a single query that works on both languages.

As a next step, the authors plan to investigate whether Yogo can be used to identify design patterns and give design-level feedback on code. They also are exploring the possibility of turning Yogo into a commercial bug-finding project.

Premtoon and Koppel co-wrote the paper with professor Armando Solar-Lezama. They will present it virtually next month at the Programming Language Design and Implementation (PLDI) conference.