TensorCraft-HPC

TensorCraft-HPC is a modern C++/CUDA AI kernel library for studying and validating GEMM, attention, convolution, normalization, sparse operators, and quantization.

Repository Overview

  • Header-first kernel library under include/tensorcraft/
  • Python bindings in src/python_ops/
  • Tests in tests/
  • Benchmarks in benchmarks/
  • Project docs on GitHub Pages

Quick Start

Recommended on a CUDA development machine:

cmake --preset dev
cmake --build --preset dev --parallel 2
ctest --preset dev --output-on-failure
python -m pip install -e .
python -c "import tensorcraft_ops as tc; print(tc.__version__)"

Build Presets

  • dev: recommended day-to-day CUDA development preset; single architecture, tests on, Python on
  • python-dev: lighter CUDA preset focused on building tensorcraft_ops
  • release: heavier full build, including benchmarks
  • cpu-smoke: CPU-only configure/install smoke validation; tests and Python bindings are disabled
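The presets above share one configure/build/test shape, which can be sketched as a small helper. This is an illustration only: `preset_commands` is a hypothetical function, not part of the repository. It assumes every listed preset has a matching build preset, and it runs `ctest` only for `dev`, since that is the only preset documented here as having tests on.

```python
import shlex

# Preset names taken from the list above.
KNOWN_PRESETS = {"dev", "python-dev", "release", "cpu-smoke"}

# Assumption: only "dev" is documented as "tests on", so the sketch
# skips ctest for every other preset.
PRESETS_WITH_TESTS = {"dev"}


def preset_commands(preset, jobs=2):
    """Return the configure/build/test argv lists for one preset."""
    if preset not in KNOWN_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    commands = [
        ["cmake", "--preset", preset],
        ["cmake", "--build", "--preset", preset, "--parallel", str(jobs)],
    ]
    if preset in PRESETS_WITH_TESTS:
        commands.append(["ctest", "--preset", preset, "--output-on-failure"])
    return commands


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch is safe
    # to execute on a machine without CMake or CUDA.
    for argv in preset_commands("dev"):
        print(shlex.join(argv))
```

Printing instead of executing keeps the helper side-effect free; passing each list to `subprocess.run(argv, check=True)` would be one way to actually run the steps.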

Build Notes

  • This repository targets a local CUDA 12.8 toolkit, with nvcc expected at /usr/local/cuda/bin/nvcc
  • CMake presets and Python builds pin CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
  • If CUDA is unavailable, CMake disables tests, benchmarks, and Python bindings automatically
  • If the build strains memory or CPU on your machine, prefer the dev or python-dev preset, keep the --parallel value low, and set CMAKE_CUDA_ARCHITECTURES to the single architecture of your GPU
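As a rough illustration of these notes, the sketch below checks for nvcc at the pinned path and falls back to the CPU-only preset when it is missing. Both functions are hypothetical helpers, not part of the repository, and the architecture value `86` in the usage line is only an example; substitute the value for your GPU.

```python
import os

# Compiler path pinned by the presets, per the build notes.
DEFAULT_NVCC = "/usr/local/cuda/bin/nvcc"


def pick_preset(nvcc_path=DEFAULT_NVCC):
    """Return "dev" when nvcc exists and is executable, else "cpu-smoke"."""
    if os.path.isfile(nvcc_path) and os.access(nvcc_path, os.X_OK):
        return "dev"
    return "cpu-smoke"


def configure_command(preset, cuda_arch=None):
    """Build the configure argv, optionally pinning one CUDA architecture."""
    argv = ["cmake", "--preset", preset]
    if cuda_arch is not None:
        # A -D option on the command line overrides the preset's cache value.
        argv.append(f"-DCMAKE_CUDA_ARCHITECTURES={cuda_arch}")
    return argv


if __name__ == "__main__":
    preset = pick_preset()
    # Only pass an architecture when configuring a CUDA preset.
    arch = 86 if preset == "dev" else None  # 86 is an example value
    print(" ".join(configure_command(preset, arch)))
```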

Python Bindings

The pybind11 module is exposed as tensorcraft_ops.

python -m pip install -e .
python -c "import tensorcraft_ops as tc; print(tc.__version__)"
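The one-line import check raises a traceback when the module is not installed. A slightly more forgiving version might look like the sketch below; `module_version` is a hypothetical helper, and the only names taken from this README are `tensorcraft_ops` and its `__version__` attribute.

```python
import importlib


def module_version(name="tensorcraft_ops"):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Fall back to None if the module does not define __version__.
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    version = module_version()
    if version is None:
        print("tensorcraft_ops is not importable; try: python -m pip install -e .")
    else:
        print(f"tensorcraft_ops {version}")
```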

Docs

  • Project docs: https://lessup.github.io/modern-ai-kernels/
  • Installation: docs/INSTALL.md
  • Troubleshooting: docs/TROUBLESHOOTING.md
  • Contribution workflow: CONTRIBUTING.md

License

MIT License