The Multicore Computing Lab (MCL), located in the Department of Computer Science and Automation at the Indian Institute of Science, conducts research on compiler and code generation infrastructure for multicore processors and emerging ML/AI accelerators. This includes developing new and robust compiler transformation techniques, domain-specific languages, and compiler infrastructure that make it easier to deal with parallelism while delivering high performance. Our current research focuses on building compiler infrastructure for ML/AI computations, with an emphasis on automatic code generation and the polyhedral compiler framework. A large part of our research and development activity is currently based on the MLIR compiler infrastructure.

Research

Research at the Multicore Computing Lab has spanned the design of compiler and runtime techniques for general-purpose multicore processors, compilation for heterogeneous architectures, the polyhedral framework for compiler optimization, high-performance domain-specific languages and compilers, and compilation for distributed-memory systems (clusters of multicores). The compute domains of interest include image processing pipelines, dense linear algebra, and deep learning.

Tools

Public repository: https://github.com/mcl-csa
  • HIR

    HIR is an intermediate representation for hardware design. Implemented as a dialect in MLIR, it is built to enable automatic optimization of hardware designs and subsequent lowering to SystemVerilog. As part of the MLIR infrastructure, it shares many compiler optimization passes (such as constant propagation and inlining) with software compilers.
  • Pluto/Pluto+

    Pluto/Pluto+ is a source-to-source parallelization and optimization tool based on the polyhedral compiler framework. It can automatically optimize affine loop nests (sequences of imperfectly nested loops with regular data access patterns) for parallelism and locality using affine transformations. It can target both shared-memory multicore architectures (by generating code with OpenMP parallel pragmas) and distributed-memory architectures (by generating message passing MPI code). Pluto/Pluto+ is extensively used for advanced experimentation with loop optimization and parallelization, optimization of scientific stencil computations, and in university courses teaching loop transformations. More
  • PolyMage

    PolyMage is a domain-specific language and compiler for automatic parallelization and optimization of image processing pipelines. PolyMage takes an image processing pipeline expressed by the user in a high-level language (embedded in Python) and generates a C++ implementation of the pipeline, optimized using the polyhedral framework as the intermediate representation. It uses OpenCV for image I/O, islpy/ISL for integer set operations, cgen for C/C++ AST generation, and OpenMP pragmas to mark parallel loops. PolyMage uses an asymmetric overlapped tiling technique (overlapped tiling extended to handle heterogeneous accesses and non-constant dependence vectors) to exploit locality and parallelism simultaneously. It uses a model-driven approach to automatically fuse image processing pipeline stages for tiling, and employs a built-in autotuner to find the best-performing code within a small, well-defined search space. More
  • SMO

    SMO is a storage optimization tool for regular loop nests. The input to SMO is a specification of the set of conflicting array indices – two indices are said to be in conflict if the corresponding array elements are simultaneously live. A specified conflict can therefore be intra-array or inter-array. The output is a modulo storage mapping, computed using our technique, for each array written in the regular loop nest. When only one statement is involved, the global conflict set specification defines the set of conflicts associated with the array space written by that statement. More
  • TREEBEARD

    TREEBEARD is an optimizing compiler for decision tree inference. It generates an optimized, user-callable inference function from a high-level model description (XGBoost JSON, for example). TREEBEARD combines several novel optimizations at various abstraction levels to mitigate architectural bottlenecks and enable SIMD vectorization of tree walks. It is implemented using the MLIR compiler infrastructure, and the code it generates is significantly faster than that produced by state-of-the-art systems.
  • GPU codegen for tensor cores (available "as is" under the Apache 2.0 License)
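
To give a flavor of the loop transformations Pluto/Pluto+ applies automatically, the sketch below tiles a matrix-multiply loop nest by hand. This is a hand-written Python analogue for illustration only, with an arbitrary tile size; Pluto itself operates on C sources and emits tiled, OpenMP-annotated C code.

```python
# Hand-written sketch of loop tiling, the core locality transformation
# Pluto applies automatically (Pluto itself emits C/OpenMP, not Python).
N, TILE = 64, 16
A = [[float(i + j) for j in range(N)] for i in range(N)]
B = [[float(i - j) for j in range(N)] for i in range(N)]

# Original loop nest: C = A * B in the usual (i, j, k) order.
C1 = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        for k in range(N):
            C1[i][j] += A[i][k] * B[k][j]

# Tiled loop nest: the same iterations, grouped into TILE-sized blocks
# so each block's working set stays cache-resident; in Pluto's output,
# tile loops can additionally be marked parallel where dependences allow.
C2 = [[0.0] * N for _ in range(N)]
for it in range(0, N, TILE):
    for jt in range(0, N, TILE):
        for kt in range(0, N, TILE):
            for i in range(it, min(it + TILE, N)):
                for j in range(jt, min(jt + TILE, N)):
                    for k in range(kt, min(kt + TILE, N)):
                        C2[i][j] += A[i][k] * B[k][j]

assert C1 == C2  # tiling only reorders iterations; the result is unchanged
```

Because tiling only reorders the iteration space without violating dependences, the two versions compute identical results; the payoff is cache reuse (and, in Pluto's generated code, coarse-grained parallelism across tiles).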
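
The storage-contraction idea behind SMO can be illustrated on a 1-D Jacobi stencil. In the sketch below (a hand-rolled example, not SMO's actual interface or algorithm), only rows t and t-1 are simultaneously live, so indices in adjacent rows conflict, and the modulo mapping (t, i) -> (t mod 2, i) gives conflicting elements distinct storage while shrinking the array from T rows to 2.

```python
# Illustrative sketch of modulo storage mapping (not SMO's interface):
# for a 1-D Jacobi stencil, rows t and t-1 are simultaneously live, so
# the mapping (t, i) -> (t mod 2, i) is a valid contraction to 2 rows.
T, N = 8, 16

# Full storage: one row per time step (T x N).
full = [[0.0] * N for _ in range(T)]
full[0] = [float(i) for i in range(N)]
for t in range(1, T):
    for i in range(1, N - 1):
        full[t][i] = (full[t-1][i-1] + full[t-1][i] + full[t-1][i+1]) / 3.0

# Contracted storage: two rows, addressed through t mod 2 (2 x N).
buf = [[float(i) for i in range(N)], [0.0] * N]
for t in range(1, T):
    cur, prev = buf[t % 2], buf[(t - 1) % 2]
    cur[0] = cur[N - 1] = 0.0  # match the untouched boundaries of `full`
    for i in range(1, N - 1):
        cur[i] = (prev[i-1] + prev[i] + prev[i+1]) / 3.0

# The contracted computation produces the same final row.
assert buf[(T - 1) % 2] == full[T - 1]
```

SMO generalizes this: given a conflict-set specification, it derives such modulo mappings automatically, for multiple statements and both intra- and inter-array conflicts.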
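
To make the notion of compiled decision tree inference concrete, the sketch below walks a tree stored in flat arrays rather than pointer-linked nodes. The layout and names here are purely illustrative assumptions, not TREEBEARD's actual in-memory representation; they only show the general array-based style that makes tree walks amenable to optimization and vectorization.

```python
# Illustrative array layout for one decision tree (hypothetical, not
# TREEBEARD's format): a perfect-tree layout where node i's children
# are 2*i + 1 and 2*i + 2. Internal node i tests x[feature[i]] against
# threshold[i]; leaves are marked feature[i] == -1 and reuse
# threshold[i] to hold the predicted value.
feature   = [0,   1,   -1,  -1,  -1]   # nodes 2, 3, 4 are leaves
threshold = [0.5, 0.3, 2.0, 0.0, 1.0]

def predict(x):
    """Walk the tree from the root to a leaf and return its value."""
    i = 0
    while feature[i] != -1:
        i = 2 * i + 1 if x[feature[i]] < threshold[i] else 2 * i + 2
    return threshold[i]

assert predict([0.2, 0.1]) == 0.0   # left, then left
assert predict([0.2, 0.9]) == 1.0   # left, then right
assert predict([0.9, 0.0]) == 2.0   # right leaf
```

A compiler like TREEBEARD starts from many such trees in a model description and specializes the walk to the model and target architecture, which is where its reported speedups come from.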

Grants, Awards & Collaboration