Creating your own programming language


Among the thousands of programming languages in existence are hundreds of “domain-specific languages” (DSLs) that have been adapted from traditional languages so that non-programmers can do their work more efficiently. DSLs are used in a wide range of fields, from web development (HTML) and database management (SQL) to genetics and machine learning (TensorFlow). 

One challenge with DSLs is that they’re not always easy to create. In languages like C++, you usually have to hire an expert programmer to design the DSL: someone who studies the programs you’ve written for your work, figures out which common elements should be incorporated, and then writes a brand-new parser, compiler and code generator to get the DSL to work.

But what if creating a new DSL could be done without coding experts, and without nearly as much time and effort?

A new tool developed by researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) aims to do just that. 

Dubbed “BuildIt,” the team’s framework for C++ - which they say could be easily adapted to other languages - lets domain experts create new DSLs just by taking their existing programs and making a few tweaks. The CSAIL group has made BuildIt available online so that users can put programs into the system to see how it works in action.  

CSAIL PhD student Ajay Brahmakshatriya says that an important aspect of software engineering is to strike a balance between generality and specialization. Writing generalized libraries allows engineers to implement more applications with less effort and time, but leads to lower performance than if the libraries are tailored for specialized uses. (Libraries are collections of frequently used routines that many programs can share, so that the routines don’t have to be rewritten for every program.)

To try to get the best of both worlds, programmers use techniques like “multi-staging,” in which you write one generalized implementation that handles all inputs and then supply the inputs that are known ahead of time, generating customized code that runs more efficiently.
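As a rough illustration of the idea, here is a minimal two-stage sketch in plain C++. It does not use BuildIt, and the function name `generate_power` is hypothetical. The first stage runs with the exponent already known and emits the text of a second-stage function specialized for that exponent, so the loop disappears from the generated code.

```cpp
#include <cassert>
#include <string>

// Stage one: runs now, with the exponent already known.
// Returns the source text of a stage-two function that only needs the base.
// (Hypothetical helper for illustration; not part of BuildIt.)
std::string generate_power(int exponent) {
    assert(exponent >= 0);  // this sketch only handles non-negative exponents
    std::string code =
        "int power_" + std::to_string(exponent) + "(int base) {\n";
    code += "    int result = 1;\n";
    // This loop executes during generation, so the emitted code is unrolled:
    // one multiplication line per iteration, and no loop at all.
    for (int i = 0; i < exponent; ++i) {
        code += "    result = result * base;\n";
    }
    code += "    return result;\n}\n";
    return code;
}
```

Calling `generate_power(3)` produces a function named `power_3` whose body is three straight-line multiplications. The general version works for any exponent, while each generated version is tailored to one.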

BuildIt is a particular kind of multi-staging framework that reduces the complexity of a programming language to a set of common features found in most languages.

“BuildIt doesn’t have a full view of the program, instead examining it through the narrow window of individual simple operations happening in the program, like multiplication and division,” says Brahmakshatriya, who co-wrote a new paper on the system with MIT professor Saman Amarasinghe. “It’s the equivalent of a person walking through a maze: even if they can only see one part of it at a time, they can navigate it by recording their observations and leaving markers on different paths they’ve explored.” 

The team describes BuildIt as lightweight, in the sense that it does not require any specialized compiler or execution environment and works as a simple library. It doesn't introduce any new syntax or any special operators for control flow like conditions and loops - which means that the developer doesn't have to learn a new syntax and can easily migrate existing code with very little or no change.
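One common way for an ordinary library to observe a program one operation at a time, without a special compiler, is operator overloading. The sketch below illustrates that general technique only. The `Traced` type and `scaled_area` function are hypothetical and are not BuildIt's API, and nothing here should be read as a description of how BuildIt is actually implemented.

```cpp
#include <string>

// Hypothetical wrapper type: instead of holding a number, it holds the text
// of the expression that would compute that number later.
struct Traced {
    std::string expr;
};

// Each overloaded operator sees only its own two operands and records
// the operation instead of performing it.
Traced operator*(const Traced& a, const Traced& b) {
    return Traced{"(" + a.expr + " * " + b.expr + ")"};
}

Traced operator/(const Traced& a, const Traced& b) {
    return Traced{"(" + a.expr + " / " + b.expr + ")"};
}

// Ordinary-looking C++ with no new syntax: only the variable type differs
// from what a version written with plain numbers would use.
Traced scaled_area(const Traced& width, const Traced& height,
                   const Traced& scale) {
    return (width * height) / scale;
}
```

Passing `Traced{"w"}`, `Traced{"h"}` and `Traced{"s"}` to `scaled_area` yields the recorded expression `((w * h) / s)`. The function body reads like regular arithmetic, which is the sense in which a library-only approach lets existing code be migrated with few changes.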

“Having simplified the requirements of multi-stage programming, this work could help democratize multi-stage programming and provide a principled approach to building domain-specific languages for all sorts of fields, including computer graphics and machine learning,” says Nada Amin, an assistant professor of computer science at Harvard who was not involved in the project.

Indeed, Amarasinghe - who previously developed Halide, one of the most popular DSLs for image processing - says that he hopes the project will inspire more non-coders to use programming in their work, without actually having to know much about programming.

“With BuildIt, anybody with the knowledge of the application domain, from a physics researcher to a vaccine developer, can write their own DSL instead of creating a large library,” he says. “That opens up a lot of exciting possibilities in terms of pushing forward new innovations in the coming years.”

The team will present the paper virtually on Monday, March 1 at the International Symposium on Code Generation and Optimization (CGO). The project was supported in part by the Applications Driving Architectures (ADA) Research Center and the Defense Advanced Research Projects Agency (DARPA).