
Setting up kernels

Highlight:
RNAPro's kernel setup (custom CUDA kernels, triangle attention, triangle multiplicative, etc.) follows the configuration used by Protenix.
Most acceleration options and installation steps are identical or compatible with Protenix.
⚡ The guide below is adapted from the Protenix documentation.

  • Custom CUDA layernorm kernels, modified from FastFold and OneFlow, speed up different training stages by roughly 30%-50%. To enable this feature, run:

    export LAYERNORM_TYPE=fast_layernorm

    If the environment variable LAYERNORM_TYPE is set to fast_layernorm, the model uses the custom layernorm; otherwise, it falls back to the native PyTorch layernorm. The kernels are compiled the first time fast_layernorm is called.
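The switch above can be sketched as a simple environment-variable dispatch. Note that select_layernorm_impl is a hypothetical helper for illustration only; the actual selection logic lives inside RNAPro/Protenix:

```python
import os

def select_layernorm_impl() -> str:
    """Return which layernorm implementation the model would use,
    mirroring the documented LAYERNORM_TYPE switch (illustrative sketch)."""
    if os.environ.get("LAYERNORM_TYPE") == "fast_layernorm":
        # Custom fused CUDA layernorm, JIT-compiled on first call.
        return "fast_layernorm"
    # Default: naive PyTorch nn.LayerNorm.
    return "torch_layernorm"

# Example: enabling the fast kernel for the current process.
os.environ["LAYERNORM_TYPE"] = "fast_layernorm"
assert select_layernorm_impl() == "fast_layernorm"
```

Because the variable is read from the process environment, it must be exported before the training or inference script starts.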

  • Triangle attention kernel options

    The model supports four implementations for triangle attention, configurable in configs_base.py:

    triangle_attention = "cuequivariance"  # or "triattention"/"deepspeed"/"torch"
    1. cuEquivariance kernel (default): an optimized implementation using NVIDIA's cuEquivariance library.

    2. TriAttention kernel: a custom kernel implementation from rnapro/model/tri_attention/.

    3. DeepSpeed DS4Sci_EvoformerAttention kernel: a memory-efficient attention kernel developed in a collaboration between OpenFold and the DeepSpeed4Science initiative.

      DS4Sci_EvoformerAttention is implemented on top of CUTLASS. To use this feature, you need to clone the CUTLASS repository and point the CUTLASS_PATH environment variable at it. The Dockerfile already includes this setting:

      RUN git clone -b v3.5.1 https://github.com/NVIDIA/cutlass.git  /opt/cutlass
      ENV CUTLASS_PATH=/opt/cutlass

      If you set up RNAPro via pip, you can set the CUTLASS_PATH environment variable as follows:

      git clone -b v3.5.1 https://github.com/NVIDIA/cutlass.git  /path/to/cutlass
      export CUTLASS_PATH=/path/to/cutlass

      The kernels will be compiled when DS4Sci_EvoformerAttention is called for the first time.

    4. Torch native: the standard PyTorch implementation.
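Since a missing or wrong CUTLASS_PATH only surfaces as a compile error on the first kernel call, it can help to verify the path up front. The following check_cutlass_path helper is a hypothetical sketch, not part of RNAPro; it only checks for the include/cutlass directory that a CUTLASS checkout contains:

```python
import os
from pathlib import Path

def check_cutlass_path() -> Path:
    """Sanity-check that CUTLASS_PATH points at a CUTLASS source tree
    before the DeepSpeed kernel is JIT-compiled on first use (illustrative)."""
    path = os.environ.get("CUTLASS_PATH")
    if not path:
        raise RuntimeError(
            "CUTLASS_PATH is not set; clone CUTLASS v3.5.1 and export CUTLASS_PATH"
        )
    root = Path(path)
    # A CUTLASS checkout ships its headers under include/cutlass.
    if not (root / "include" / "cutlass").is_dir():
        raise RuntimeError(f"{root} does not look like a CUTLASS source tree")
    return root
```

Running such a check at startup turns a confusing first-call compilation failure into an immediate, actionable error message.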

  • Triangle multiplicative kernel options

    The triangle multiplicative operation supports two implementations, configurable in configs_base.py:

    triangle_multiplicative = "cuequivariance"  # or "torch"
    1. cuEquivariance kernel (default): an optimized implementation using NVIDIA's cuEquivariance library.

    2. Torch native: the standard PyTorch implementation.
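The two option sets above can be collected into a small validator. This validate_kernel_config function is a hypothetical sketch (the real settings live in configs_base.py); only the option names come from the guide:

```python
# Option names as documented for configs_base.py.
TRIANGLE_ATTENTION_OPTIONS = ("cuequivariance", "triattention", "deepspeed", "torch")
TRIANGLE_MULTIPLICATIVE_OPTIONS = ("cuequivariance", "torch")

def validate_kernel_config(triangle_attention: str = "cuequivariance",
                           triangle_multiplicative: str = "cuequivariance") -> dict:
    """Illustrative validator: reject unknown kernel names early,
    mirroring the documented defaults and option sets."""
    if triangle_attention not in TRIANGLE_ATTENTION_OPTIONS:
        raise ValueError(
            f"triangle_attention must be one of {TRIANGLE_ATTENTION_OPTIONS}, "
            f"got {triangle_attention!r}"
        )
    if triangle_multiplicative not in TRIANGLE_MULTIPLICATIVE_OPTIONS:
        raise ValueError(
            f"triangle_multiplicative must be one of {TRIANGLE_MULTIPLICATIVE_OPTIONS}, "
            f"got {triangle_multiplicative!r}"
        )
    return {"triangle_attention": triangle_attention,
            "triangle_multiplicative": triangle_multiplicative}

# Defaults correspond to the cuEquivariance kernels.
config = validate_kernel_config()
```

Validating kernel names before model construction avoids failures deep inside the first forward pass.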