ImportError - fast_transformers/causal_product undefined symbol - unable to train or finetune #6

@kevingreenman

Description

After downloading the data, I run bash run_finetune_h298.sh and get the following error:

Traceback (most recent call last):
  File "finetune_pubchem_light.py", line 14, in <module>
    from rotate_attention.rotate_builder import RotateEncoderBuilder as rotate_builder
  File "/home/kpg/molformer/finetune/rotate_attention/rotate_builder.py", line 3, in <module>
    from .attention_layer import RotateAttentionLayer
  File "/home/kpg/molformer/finetune/rotate_attention/attention_layer.py", line 8, in <module>
    from fast_transformers.attention import AttentionLayer
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/__init__.py", line 13, in <module>
    from .causal_linear_attention import CausalLinearAttention
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/causal_linear_attention.py", line 15, in <module>
    from ..causal_product import causal_dot_product
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/__init__.py", line 9, in <module>
    from .causal_product_cpu import causal_dot_product as causal_dot_product_cpu, \
ImportError: /home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/causal_product_cpu.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c107complexIfEEEEPKNS_6detail12TypeMetaDataEv
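For what it's worth, the missing symbol is a mangled C++ name, so it can be demangled to see which PyTorch internal the prebuilt causal_product_cpu extension expects (this assumes c++filt from binutils is on PATH):

```shell
# Demangle the undefined symbol reported in the ImportError above
echo '_ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c107complexIfEEEEPKNS_6detail12TypeMetaDataEv' | c++filt
```

It resolves to a caffe2::TypeMeta::_typeMetaDataInstance<c10::complex<float>>() instantiation, i.e. a libtorch-internal symbol, which suggests the wheel was compiled against a different PyTorch build than the one installed here.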

I get a similar error when running bash run_pubchem_light.sh:

Traceback (most recent call last):
  File "train_pubchem_light.py", line 18, in <module>
    from fast_transformers.builders import TransformerEncoderBuilder
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/builders/__init__.py", line 42, in <module>
    from ..attention import \
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/__init__.py", line 13, in <module>
    from .causal_linear_attention import CausalLinearAttention
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/causal_linear_attention.py", line 15, in <module>
    from ..causal_product import causal_dot_product
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/__init__.py", line 9, in <module>
    from .causal_product_cpu import causal_dot_product as causal_dot_product_cpu, \
ImportError: /home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/causal_product_cpu.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c107complexIfEEEEPKNS_6detail12TypeMetaDataEv

I set up my environment based on the instructions in environment.md as follows:

conda create --name MolTran_CUDA11 python=3.8.10
conda activate MolTran_CUDA11

conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
conda install rdkit==2021.03.2 pandas=1.2.4 scikit-learn=0.24.2 scipy=1.6.3 -c conda-forge

pip install transformers==4.6.0 pytorch-lightning==1.1.5 pytorch-fast-transformers==0.4.0 datasets==1.6.2 jupyterlab==3.4.0 ipywidgets==7.7.0 bertviz==1.4.0

git clone https://github.com/NVIDIA/apex
cd apex
export CUDA_HOME='/usr'
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

The differences between the above and the original instructions were:

  1. added -c conda-forge to the second conda install command (it couldn't find the packages otherwise)
  2. export CUDA_HOME='/usr' (the actual location on my system, found using which nvcc, which returned /usr/bin/nvcc)
  3. changed the pytorch and cudatoolkit versions to match my installed nvcc version, 11.6 (compiling Apex failed otherwise). I used the oldest pytorch version that supports cudatoolkit=11.6 (based on instructions here) to maximize the likelihood of compatibility, since this repo was created with pytorch==1.7.1 and cudatoolkit=11.0.
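As a sanity check on point 3, this sketch prints the installed torch's own view of its build (run inside the activated env; torch.version.cuda and torch.backends.cudnn.version() are standard torch attributes):

```shell
# Confirm what the installed torch was actually built against
python - <<'EOF'
import torch
print("torch:", torch.__version__)              # expect 1.12.0
print("built with CUDA:", torch.version.cuda)   # expect 11.6
print("cuDNN:", torch.backends.cudnn.version())
EOF
```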

Additional information that may be useful:

nvidia-smi: NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7

(MolTran_CUDA11) ~/molformer/finetune$ conda list | grep 'torch\|cuda'
cudatoolkit               11.6.0              hecad31d_10    conda-forge
ffmpeg                    4.3                  hf484d3e_0    pytorch
pytorch                   1.12.0          py3.8_cuda11.6_cudnn8.3.2_0    pytorch
pytorch-fast-transformers 0.4.0                    pypi_0    pypi
pytorch-lightning         1.1.5                    pypi_0    pypi
pytorch-mutex             1.0                        cuda    pytorch
torchaudio                0.12.0               py38_cu116    pytorch
torchvision               0.13.0               py38_cu116    pytorch
(MolTran_CUDA11) ~/molformer/finetune$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

Based on similar errors people have gotten with other repos (e.g. here, here), it seems that the problem is related to my version of PyTorch, but I'm not sure how to resolve this while still allowing Apex to compile on my system. Is it possible to run this repo on a system using nvcc 11.6 / CUDA 11.7?
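One workaround commonly suggested for this class of undefined-symbol errors is rebuilding the extension from source against the installed torch, rather than using the prebuilt wheel. A sketch of what I could try (untested; --no-build-isolation is included so the build can see the already-installed torch):

```shell
# Hypothetical fix: force a source build of pytorch-fast-transformers so its
# C++/CUDA extensions compile against the torch in this environment
pip uninstall -y pytorch-fast-transformers
pip install --no-cache-dir --no-build-isolation --no-binary :all: pytorch-fast-transformers==0.4.0

# Smoke test: this import is exactly the one that fails above
python -c "from fast_transformers.causal_product import causal_dot_product; print('ok')"
```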
