Currently CUDA code is compiled by doing a set of system calls from Python in `setup.py`. It will be perhaps cleaner to do so via Makefile.