ImportError - fast_transformers/causal_product undefined symbol - unable to train or finetune #6

@kevingreenman

Description

After downloading the data, I run bash run_finetune_h298.sh and get the following error:

Traceback (most recent call last):
  File "finetune_pubchem_light.py", line 14, in <module>
    from rotate_attention.rotate_builder import RotateEncoderBuilder as rotate_builder
  File "/home/kpg/molformer/finetune/rotate_attention/rotate_builder.py", line 3, in <module>
    from .attention_layer import RotateAttentionLayer
  File "/home/kpg/molformer/finetune/rotate_attention/attention_layer.py", line 8, in <module>
    from fast_transformers.attention import AttentionLayer
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/__init__.py", line 13, in <module>
    from .causal_linear_attention import CausalLinearAttention
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/causal_linear_attention.py", line 15, in <module>
    from ..causal_product import causal_dot_product
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/__init__.py", line 9, in <module>
    from .causal_product_cpu import causal_dot_product as causal_dot_product_cpu, \
ImportError: /home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/causal_product_cpu.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c107complexIfEEEEPKNS_6detail12TypeMetaDataEv
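For what it's worth, the missing symbol is a mangled C++ name, so it can be demangled to see which PyTorch internal the prebuilt causal_product_cpu extension expects (this assumes c++filt from binutils is on PATH):

```shell
# Demangle the undefined symbol reported in the ImportError above
echo '_ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c107complexIfEEEEPKNS_6detail12TypeMetaDataEv' | c++filt
```

It resolves to a caffe2::TypeMeta::_typeMetaDataInstance<c10::complex<float>>() instantiation, i.e. a libtorch-internal symbol, which suggests the wheel was compiled against a different PyTorch build than the one installed here.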

I get a similar error when running bash run_pubchem_light.sh:

Traceback (most recent call last):
  File "train_pubchem_light.py", line 18, in <module>
    from fast_transformers.builders import TransformerEncoderBuilder
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/builders/__init__.py", line 42, in <module>
    from ..attention import \
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/__init__.py", line 13, in <module>
    from .causal_linear_attention import CausalLinearAttention
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/causal_linear_attention.py", line 15, in <module>
    from ..causal_product import causal_dot_product
  File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/__init__.py", line 9, in <module>
    from .causal_product_cpu import causal_dot_product as causal_dot_product_cpu, \
ImportError: /home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/causal_product_cpu.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c107complexIfEEEEPKNS_6detail12TypeMetaDataEv

I set up my environment based on the instructions in environment.md as follows:

conda create --name MolTran_CUDA11 python=3.8.10
conda activate MolTran_CUDA11

conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
conda install rdkit==2021.03.2 pandas=1.2.4 scikit-learn=0.24.2 scipy=1.6.3 -c conda-forge

pip install transformers==4.6.0 pytorch-lightning==1.1.5 pytorch-fast-transformers==0.4.0 datasets==1.6.2 jupyterlab==3.4.0 ipywidgets==7.7.0 bertviz==1.4.0

git clone https://github.com/NVIDIA/apex
cd apex
export CUDA_HOME='/usr'
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

The differences between the above and the original instructions were:

  1. added -c conda-forge to the second conda install command (it couldn't find the packages otherwise)
  2. export CUDA_HOME='/usr' (the actual location on my system, found using which nvcc, which returned /usr/bin/nvcc)
  3. changed the pytorch and cudatoolkit versions to match my installed nvcc version, 11.6 (compiling Apex failed otherwise). I used the oldest pytorch version that supports cudatoolkit=11.6 (based on instructions here) to maximize the likelihood of compatibility, since this repo was created with pytorch==1.7.1 and cudatoolkit=11.0.
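As a sanity check on point 3, this sketch prints the installed torch's own view of its build (run inside the activated env; torch.version.cuda and torch.backends.cudnn.version() are standard torch attributes):

```shell
# Confirm what the installed torch was actually built against
python - <<'EOF'
import torch
print("torch:", torch.__version__)              # expect 1.12.0
print("built with CUDA:", torch.version.cuda)   # expect 11.6
print("cuDNN:", torch.backends.cudnn.version())
EOF
```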

Additional information that may be useful:

nvidia-smi: NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7

(MolTran_CUDA11) ~/molformer/finetune$ conda list | grep 'torch\|cuda'
cudatoolkit               11.6.0              hecad31d_10    conda-forge
ffmpeg                    4.3                  hf484d3e_0    pytorch
pytorch                   1.12.0          py3.8_cuda11.6_cudnn8.3.2_0    pytorch
pytorch-fast-transformers 0.4.0                    pypi_0    pypi
pytorch-lightning         1.1.5                    pypi_0    pypi
pytorch-mutex             1.0                        cuda    pytorch
torchaudio                0.12.0               py38_cu116    pytorch
torchvision               0.13.0               py38_cu116    pytorch
(MolTran_CUDA11) ~/molformer/finetune$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

Based on similar errors people have gotten with other repos (e.g. here, here), it seems that the problem is related to my version of PyTorch, but I'm not sure how to resolve this while still allowing Apex to compile on my system. Is it possible to run this repo on a system using nvcc 11.6 / CUDA 11.7?
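One workaround commonly suggested for this class of undefined-symbol errors is rebuilding the extension from source against the installed torch, rather than using the prebuilt wheel. A sketch of what I could try (untested; --no-build-isolation is included so the build can see the already-installed torch):

```shell
# Hypothetical fix: force a source build of pytorch-fast-transformers so its
# C++/CUDA extensions compile against the torch in this environment
pip uninstall -y pytorch-fast-transformers
pip install --no-cache-dir --no-build-isolation --no-binary :all: pytorch-fast-transformers==0.4.0

# Smoke test: this import is exactly the one that fails above
python -c "from fast_transformers.causal_product import causal_dot_product; print('ok')"
```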
