We have already had quite a bit of trouble running the example training on the systems we have access to (our MacBooks and Ubuntu workstations). Here is what we have attempted so far, with the result and the blocker for each:

| System | Result | Blocker | Details | Solution | Related Issue |
| --- | --- | --- | --- | --- | --- |
| macOS Monterey 12.6 Intel i7 | Fails | C++ compile error (at runtime) | fatal error: 'omp.h' file not found. | | AllenCell/cyto-dl#184 |
| macOS Ventura 13.3.1 Intel i7 | Fails | C++ compile error (at runtime) | `libomp.dylib` not found. | `brew install libomp`, then `export DYLD_LIBRARY_PATH=/usr/local/opt/libomp/lib:/usr/local/lib` | |
| macOS Monterey 12.4 Apple M1 | Fails | C++ compile error (at runtime) | fatal error: 'omp.h' file not found. | | AllenCell/cyto-dl#184 |
| Ubuntu 16 | Fails | GPU driver runtime error | `RuntimeError: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver.` | Update GPU driver from nvidia.com OR install PyTorch version compiled with current CUDA driver. | |
| Ubuntu 20 (EC2) | Succeeds | . | . | . | . |
| Slurm (CPU) | . | . | . | . | . |
| Slurm (GPU) | . | . | . | . | . |
| AWS cluster (GPU) | . | . | . | . | . |

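All three macOS failures above point at a missing OpenMP header or library (`omp.h`, `libomp.dylib`). As a rough aid for triage, here is a minimal sketch of a preflight check that looks for both files before training starts. The function names and the list of prefixes are hypothetical: `/usr/local/opt/libomp` comes from the workaround in the table, while `/opt/homebrew/opt/libomp` is an assumption about where Homebrew installs on Apple Silicon.

```python
import platform
from pathlib import Path

# Assumed Homebrew prefixes for libomp; adjust for your machine.
LIBOMP_PREFIXES = ("/usr/local/opt/libomp", "/opt/homebrew/opt/libomp")


def find_libomp(prefixes=LIBOMP_PREFIXES):
    """Return the first prefix holding both omp.h and libomp.dylib, else None."""
    for prefix in prefixes:
        root = Path(prefix)
        has_header = (root / "include" / "omp.h").is_file()
        has_dylib = (root / "lib" / "libomp.dylib").is_file()
        if has_header and has_dylib:
            return str(root)
    return None


def preflight(system=None, prefixes=LIBOMP_PREFIXES):
    """Return a list of warnings for the macOS OpenMP blocker (empty if none)."""
    system = system or platform.system()
    warnings = []
    # Only macOS ("Darwin") hit the omp.h / libomp.dylib errors in the table.
    if system == "Darwin" and find_libomp(prefixes) is None:
        warnings.append(
            "libomp not found; try `brew install libomp` and set "
            "DYLD_LIBRARY_PATH to its lib directory"
        )
    return warnings


if __name__ == "__main__":
    for message in preflight():
        print("WARNING:", message)
```

Running this on one of the failing Macs before and after `brew install libomp` should show whether the install landed where the compiler looks. It does not cover the Ubuntu 16 driver error.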
In all cases, the setup steps were:
- Create a fresh venv based on Python 3.8 or 3.9
- Upgrade pip
- `pip install wheel`
- `pip install boto3`
- `pip install -e .`
- `pip install -r requirements/requirements.txt`
- `python scripts/download_test_data.py`

And the experiment run was `python aics_im2im/train.py experiment=im2im/segmentation.yaml trainer=cpu`
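
To make the reproduction identical across machines, the steps above could be captured in a small script. This is only a sketch under a few assumptions: it is run from the repository root with a Python 3.8 or 3.9 interpreter, the venv has a POSIX layout (`bin/python`), the requirements step is meant to use `-r`, and the download script lives at `scripts/download_test_data.py`. The names `setup_commands` and `run_all` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def setup_commands(venv_dir="venv"):
    """Return the setup and training steps as argv lists, in order."""
    # Assumes a POSIX venv layout; on Windows the interpreter is under Scripts/.
    py = str(Path(venv_dir) / "bin" / "python")
    return [
        [sys.executable, "-m", "venv", venv_dir],
        [py, "-m", "pip", "install", "--upgrade", "pip"],
        [py, "-m", "pip", "install", "wheel"],
        [py, "-m", "pip", "install", "boto3"],
        [py, "-m", "pip", "install", "-e", "."],
        # Assumption: the requirements step was meant to pass -r.
        [py, "-m", "pip", "install", "-r", "requirements/requirements.txt"],
        # Assumption: the script path is scripts/download_test_data.py.
        [py, "scripts/download_test_data.py"],
        [py, "aics_im2im/train.py",
         "experiment=im2im/segmentation.yaml", "trainer=cpu"],
    ]


def run_all(venv_dir="venv"):
    """Run each step in order, stopping at the first failure."""
    for cmd in setup_commands(venv_dir):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all()
```

Because `check=True` raises on the first non-zero exit code, the last command printed identifies the failing step, which makes reports from different systems easier to compare.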