Skip to content

jsarrao/numba-mlir

 
 

Repository files navigation

MLIR-based numba backend

The goal of this project is to provide efficient code generation for CPUs and GPUs using Multi-Level Intermediate Representation (MLIR) infrastructure. It uses Numba infrastructure as a frontend but have completely separate codepaths through MLIR infrastructure for low level code generation.

Package provides set of decorators similar to Numba decorators to compile python code.

Example:

from numba_mlir import njit
import numpy as np

@njit
def foo(a, b):
    return a + b

result = foo(np.array([1,2,3]), np.array([4,5,6]))
print(result)

Building and testing

You will need LLVM built from specific commit, found in llvm-sha.txt.

Linux

Building llvm

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout $SHA
cd ..
mkdir llvm-build
cd llvm-build
cmake ../llvm-project/llvm -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_RTTI=ON -DLLVM_USE_LINKER=gold -DLLVM_INSTALL_UTILS=ON -DCMAKE_INSTALL_PREFIX=../llvm-install
ninja install

Building and testing Python package

cd numba_mlir
conda create -n test-env python=3.9 numba=0.58 numpy=1.24 "setuptools<65.6" scikit-learn pytest-xdist ninja scipy pybind11 pytest lit tbb=2021.10.0 tbb-devel=2021.10.0 cmake "mkl-devel-dpcpp=2024.0.0" dpcpp_linux-64 level-zero-devel -c conda-forge -c intel -c numba
conda activate test-env
export LLVM_PATH=<...>/llvm-install
export NUMBA_MLIR_USE_SYCL=ON # Optional
python setup.py develop
pytest -n8 --capture=tee-sys -rXF

Windows

TBD

Using GPU offload

Kernel offload example:

from numba_mlir.kernel import kernel, get_global_id, DEFAULT_LOCAL_SIZE
import numpy as np
import dpctl.tensor as dpt

@kernel
def foo(a, b, c):
    i = get_global_id(0)
    j = get_global_id(1)
    c[i, j] = a[i, j] + b[i, j]

a = np.array([[1,2,3],[4,5,6]])
b = np.array([[7,8,9],[-1,-2,-3]])

print(a + b)

device = "gpu"
a = dpt.asarray(a, device=device)
b = dpt.asarray(b, device=device)
c = dpt.empty(a.shape, dtype=a.dtype, device=device)

foo[a.shape, DEFAULT_LOCAL_SIZE] (a, b, c)

result = dpt.asnumpy(c)
print(result)

Numpy offload example:

from numba_mlir import njit
import numpy as np
import dpctl.tensor as dpt

@njit(parallel=True)
def foo(a, b):
    return a + b

a = np.array([[1,2,3],[4,5,6]])
b = np.array([[1,2,3]])

print(a + b)

a = dpt.asarray(a, device="gpu")
b = dpt.asarray(b, device="gpu")

result = foo(a, b)
print(result)

Contributing

We are using github issues to report issues and github pull requests for development.

Code of Conduct

About

POC work on MLIR backend

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • C++ 56.1%
  • Python 30.2%
  • MLIR 10.0%
  • CMake 2.4%
  • LLVM 1.3%
  • Batchfile 0.0%