Tapir is a compiler intermediate representation that aims to empower compilers to optimize parallel programs on multicore machines.

Compilers are programs that translate human-readable source code (describing, for example, an algorithm or a solution to a problem) into instructions that computer hardware can execute directly. Modern compilers extensively analyze the code they translate to derive fast implementations for modern hardware. Introduce parallel computing into the mix, though, and until recently this optimization process would come to a standstill at parallel control constructs.
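To picture why optimization can stall at a parallel control construct, consider the following simplified sketch in C. It assumes, for illustration only, a compiler that lowers a parallel loop early into a call to a runtime library, passing the loop body as a separate function. The names `total`, `scale_serial`, `scale_lowered`, `scale_body`, `scale_ctx`, and `runtime_parallel_for` are hypothetical and are not taken from Tapir, LLVM, or any Cilk runtime. The stand-in runtime here runs serially so that the example can be compiled anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of in[0..n). Each call costs O(n) work. */
double total(const double *in, size_t n) {
    assert(in != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += in[i];
    return sum;
}

/* Serial version: the whole loop is visible to the optimizer, so it has a
 * chance to notice that total(in, n) does not depend on i and to compute
 * it once before the loop. */
void scale_serial(double *out, const double *in, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] / total(in, n);
}

/* Hypothetical stand-in for a parallel runtime entry point. A real runtime
 * would divide the iterations among worker threads and would typically live
 * in a separate library; this sketch simply runs them in order. */
typedef void (*loop_body_fn)(size_t i, void *ctx);

void runtime_parallel_for(size_t n, loop_body_fn body, void *ctx) {
    for (size_t i = 0; i < n; ++i)
        body(i, ctx);
}

/* Variables captured by the outlined loop body. */
struct scale_ctx {
    double *out;
    const double *in;
    size_t n;
};

/* The loop body after outlining: the optimizer now sees one iteration in
 * isolation, with no surrounding loop to move total() out of. */
void scale_body(size_t i, void *ctx) {
    struct scale_ctx *c = ctx;
    c->out[i] = c->in[i] / total(c->in, c->n);
}

/* What the parallel loop looks like after early lowering: the loop itself
 * is gone, replaced by a runtime call that takes a function pointer. */
void scale_lowered(double *out, const double *in, size_t n) {
    struct scale_ctx c = { out, in, n };
    runtime_parallel_for(n, scale_body, &c);
}
```

Both functions compute the same result. The difference is in what the optimizer can see: in `scale_lowered` the loop has been replaced by an opaque call, so an optimization as routine as moving the repeated `total` computation out of the loop may no longer apply.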

To unlock this optimization process for parallel programs, one needs to modify a compiler built for serial code so that it understands parallel computing. But embedding an understanding of parallelism into such a compiler was expected to require extensive changes throughout nearly all parts of the compiler, and such changes would take a huge amount of effort. The LLVM compiler, for example, is half a million lines of code, and the overall LLVM system, built up over many years by hundreds of programmers, is 6 million lines of code.

The challenge, then, was to make a change of this magnitude to a modern compiler, such as LLVM, without damaging the existing code or introducing bugs into any of the 80 optimization passes that the compiler performs.

“There are lots of people who said it couldn’t be done,” said Dr. Tao B. Schardl of MIT CSAIL, lead author of the Tapir paper, which went on to win the Best Paper Award at the Symposium on Principles and Practice of Parallel Programming (PPoPP) in 2017. “That effectively, it would take years. It was a research project. But we figured out a way where the core functionality could be added in 3,000 lines of code.”

With Tapir as its intermediate representation, a compiler can compile parallel languages (languages for describing parallel programs) and optimize them much more effectively than any other compiler that deals with parallelism, while requiring just a few minor changes to the original compiler. The Tapir/LLVM compiler has been customized for Cilk, a fork-join language, and was recently adopted by OpenCilk. The researchers’ work on this project has opened the door to many potential code optimizations in parallel computing.


Winner of the PPoPP Best Paper Award 2017: “Tapir: Embedding Fork-Join Parallelism into LLVM’s Intermediate Representation”