After downloading the data, I ran bash run_finetune_h298.sh and got the following error:
Traceback (most recent call last):
File "finetune_pubchem_light.py", line 14, in <module>
from rotate_attention.rotate_builder import RotateEncoderBuilder as rotate_builder
File "/home/kpg/molformer/finetune/rotate_attention/rotate_builder.py", line 3, in <module>
from .attention_layer import RotateAttentionLayer
File "/home/kpg/molformer/finetune/rotate_attention/attention_layer.py", line 8, in <module>
from fast_transformers.attention import AttentionLayer
File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/__init__.py", line 13, in <module>
from .causal_linear_attention import CausalLinearAttention
File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/causal_linear_attention.py", line 15, in <module>
from ..causal_product import causal_dot_product
File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/__init__.py", line 9, in <module>
from .causal_product_cpu import causal_dot_product as causal_dot_product_cpu, \
ImportError: /home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/causal_product_cpu.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c107complexIfEEEEPKNS_6detail12TypeMetaDataEv
I get a similar error when running bash run_pubchem_light.sh:
Traceback (most recent call last):
File "train_pubchem_light.py", line 18, in <module>
from fast_transformers.builders import TransformerEncoderBuilder
File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/builders/__init__.py", line 42, in <module>
from ..attention import \
File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/__init__.py", line 13, in <module>
from .causal_linear_attention import CausalLinearAttention
File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/attention/causal_linear_attention.py", line 15, in <module>
from ..causal_product import causal_dot_product
File "/home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/__init__.py", line 9, in <module>
from .causal_product_cpu import causal_dot_product as causal_dot_product_cpu, \
ImportError: /home/kpg/miniconda3/envs/MolTran_CUDA11/lib/python3.8/site-packages/fast_transformers/causal_product/causal_product_cpu.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c107complexIfEEEEPKNS_6detail12TypeMetaDataEv
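Both tracebacks end on the same undefined symbol. As a rough aid to reading it, here is a minimal sketch that pulls the length-prefixed identifiers out of the mangled name. It is not a real demangler (c++filt would be the more robust tool), and the helper name length_prefixed_names is hypothetical, introduced only for this illustration.

```python
from typing import List

# The symbol reported in both ImportError messages above.
SYMBOL = (
    "_ZN6caffe28TypeMeta21_typeMetaDataInstance"
    "IN3c107complexIfEEEEPKNS_6detail12TypeMetaDataEv"
)


def length_prefixed_names(symbol: str) -> List[str]:
    """Collect the <length><identifier> pieces of an Itanium-mangled symbol.

    Hypothetical helper, not a full demangler: it skips every character
    that is not part of a length prefix, so template and qualifier codes
    are ignored, and symbols using numbered substitutions (e.g. S0_)
    would confuse it.
    """
    names = []
    i = 0
    while i < len(symbol):
        if symbol[i].isdigit():
            # Read the whole decimal length, then take that many characters.
            j = i
            while j < len(symbol) and symbol[j].isdigit():
                j += 1
            length = int(symbol[i:j])
            names.append(symbol[j:j + length])
            i = j + length
        else:
            i += 1
    return names


print(length_prefixed_names(SYMBOL))
```

For this symbol the pieces are caffe2, TypeMeta, _typeMetaDataInstance, c10, complex, detail and TypeMetaData. The caffe2 and c10 namespaces belong to PyTorch's C++ core, so the compiled fast_transformers extension is looking for a PyTorch symbol that the installed PyTorch library does not provide. That usually points to the extension having been built against a different PyTorch build than the one being loaded.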
I set up my environment based on the instructions in environment.md as follows:
conda create --name MolTran_CUDA11 python=3.8.10
conda activate MolTran_CUDA11
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge
conda install rdkit==2021.03.2 pandas=1.2.4 scikit-learn=0.24.2 scipy=1.6.3 -c conda-forge
pip install transformers==4.6.0 pytorch-lightning==1.1.5 pytorch-fast-transformers==0.4.0 datasets==1.6.2 jupyterlab==3.4.0 ipywidgets==7.7.0 bertviz==1.4.0
git clone https://github.com/NVIDIA/apex
cd apex
export CUDA_HOME='/usr'
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
The differences between the above and the original instructions were:
- added -c conda-forge to the 2nd conda install command (it couldn't find the packages otherwise)
- export CUDA_HOME='/usr' (the actual location on my system, found using which nvcc, which gave the output /usr/bin/nvcc)
- changed the pytorch and cudatoolkit versions to match the nvcc version I have installed, which is 11.6 (compiling Apex failed otherwise). I used the oldest pytorch version that supported cudatoolkit=11.6 (based on instructions here) to maximize likelihood of compatibility since this repo was created using pytorch==1.7.1 cudatoolkit=11.0.
Additional information that may be useful:
nvidia-smi: NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7
(MolTran_CUDA11) ~/molformer/finetune$ conda list | grep 'torch\|cuda'
cudatoolkit 11.6.0 hecad31d_10 conda-forge
ffmpeg 4.3 hf484d3e_0 pytorch
pytorch 1.12.0 py3.8_cuda11.6_cudnn8.3.2_0 pytorch
pytorch-fast-transformers 0.4.0 pypi_0 pypi
pytorch-lightning 1.1.5 pypi_0 pypi
pytorch-mutex 1.0 cuda pytorch
torchaudio 0.12.0 py38_cu116 pytorch
torchvision 0.13.0 py38_cu116 pytorch
(MolTran_CUDA11) ~/molformer/finetune$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
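The version information above can be cross-checked mechanically. The following is a small sketch, assuming only the nvcc output and the pytorch build string shown in this report; the function names nvcc_release and build_cuda are hypothetical.

```python
import re
from typing import Optional

# Copied from the nvcc --version output above.
NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
"""

# Build string of the pytorch package from the conda list output above.
PYTORCH_BUILD = "py3.8_cuda11.6_cudnn8.3.2_0"


def nvcc_release(text: str) -> Optional[str]:
    """Return the 'major.minor' CUDA release reported by nvcc, if present."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def build_cuda(build: str) -> Optional[str]:
    """Return the 'major.minor' CUDA version embedded in a conda build string."""
    match = re.search(r"cuda(\d+\.\d+)", build)
    return match.group(1) if match else None


print("nvcc release:      ", nvcc_release(NVCC_OUTPUT))
print("pytorch CUDA build:", build_cuda(PYTORCH_BUILD))
print("match:             ", nvcc_release(NVCC_OUTPUT) == build_cuda(PYTORCH_BUILD))
```

Both values come out as 11.6 here, so the compiler and the PyTorch build agree on the CUDA version. That matches Apex compiling successfully, and it suggests the ImportError is about which PyTorch the fast_transformers extension was built against rather than about the CUDA toolkit itself.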
Based on similar errors people have gotten with other repos (e.g. here, here), it seems that the problem is related to my version of PyTorch, but I'm not sure how to resolve this while still allowing Apex to compile on my system. Is it possible to run this repo on a system using nvcc 11.6 / CUDA 11.7?
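One thing that may be worth trying, as an untested assumption rather than a confirmed fix for this repo, is forcing the extension to be rebuilt against the currently installed PyTorch, for example with pip install --no-cache-dir --force-reinstall --no-deps pytorch-fast-transformers==0.4.0. After any such change, a minimal import check like the sketch below shows whether the compiled module loads. The helper try_import is hypothetical; the module names are taken from the tracebacks above.

```python
import importlib
from typing import Tuple


def try_import(module_name: str) -> Tuple[bool, str]:
    """Try to import a module and report success or the error raised.

    Catches any exception on purpose: a broken compiled extension can
    fail with ImportError (as in the tracebacks above) or other errors.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(getattr(module, "__version__", "unknown version"))


if __name__ == "__main__":
    # torch first, then the compiled package named in the ImportError.
    for name in ("torch", "fast_transformers.causal_product"):
        ok, detail = try_import(name)
        print(f"{name}: {'OK' if ok else 'FAILED'} ({detail})")
```

If torch imports but fast_transformers.causal_product still fails with the same undefined symbol, the extension is still not matched to the installed PyTorch.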