Software for hardware accelerators and distributed machines is currently written using low-level APIs and languages such as MPI, OpenCL and CUDA. Although this approach may provide the best possible performance on one particular architecture, it has many limitations: these low-level languages and APIs have a steep learning curve; they are laborious and error-prone to program with; it is difficult for a compiler to analyze and optimize such code automatically; and, most importantly, they lack performance portability. The performance of an accelerated application may vary dramatically across platforms, and a huge amount of effort is needed to port software from one architecture to another. Developing software at this level is unattractive and costly. With the continuous arrival of new architectures and hardware features, such as complex memory hierarchies with non-uniform memory access (NUMA), GPUs (Graphics Processing Units) and Xeon Phi accelerators, the problem of writing high-performance software using low-level languages and APIs is becoming more and more serious.
The primary goal of the Tiramisu project is to provide a framework for code optimization and generation. Tiramisu takes code generated from high-level languages and domain-specific languages (DSLs), optimizes it, and generates LLVM code. Tiramisu can perform a large set of advanced code optimizations and can target multiple architectures, including GPUs, multicore machines with vector instructions, and distributed machines. All the user needs to do is express the algorithm and provide schedule and data layout commands, which Tiramisu applies to generate highly optimized code.
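The separation between the algorithm, the schedule and the data layout can be illustrated with a small, self-contained sketch. It is written in plain Python and does not use the Tiramisu API: every name in it (`algorithm`, `default_schedule`, `tiled_schedule`, `row_major`, `column_major`, `run`) is hypothetical and chosen only for illustration. The point it tries to show is that changing the iteration order or the storage layout leaves the computed values unchanged.

```python
# Illustrative sketch only: these names are hypothetical, not Tiramisu's API.

def algorithm(i, j):
    """The 'what': the value of output element (i, j), independent of loop order."""
    return 3 * i + j

def default_schedule(rows, cols):
    """Visit the iteration space in plain row-by-row order."""
    for i in range(rows):
        for j in range(cols):
            yield i, j

def tiled_schedule(rows, cols, tile=2):
    """Visit the same iteration space one tile x tile block at a time."""
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    yield i, j

def row_major(i, j, rows, cols):
    """Data layout: element (i, j) is stored at offset i * cols + j."""
    return i * cols + j

def column_major(i, j, rows, cols):
    """Data layout: element (i, j) is stored at offset j * rows + i."""
    return j * rows + i

def run(rows, cols, schedule, layout):
    """Evaluate the algorithm in the order given by `schedule`, storing via `layout`."""
    buffer = [None] * (rows * cols)
    for i, j in schedule(rows, cols):
        buffer[layout(i, j, rows, cols)] = algorithm(i, j)
    return buffer

ROWS, COLS = 4, 6
reference = run(ROWS, COLS, default_schedule, row_major)
tiled = run(ROWS, COLS, tiled_schedule, row_major)
transposed = run(ROWS, COLS, tiled_schedule, column_major)
```

Here `reference` and `tiled` hold identical buffers even though the elements were computed in a different order, and `transposed` holds the same values at different offsets. In a real compiler the choice of schedule and layout affects performance (locality, parallelism, vectorization) rather than results, which this toy model does not attempt to measure.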
If you would like to contact us about our work, please scroll down to the people section and open the page of one of the group leads, where you can reach out to them directly.