Determine / document which systems we intend to support training / executing models on #154

We have already had quite a bit of trouble running the example training on the systems we have access to (our MacBooks and Ubuntu workstations). Here is what we have attempted so far, the result, and the blockers:

| System | Result | Blocker | Details | Solution | Related Issue |
| --- | --- | --- | --- | --- | --- |
| macOS Monterey 12.6 (Intel i7) | Fails | C++ compile error (at runtime) | `fatal error: 'omp.h' file not found` | . | AllenCellModeling/aics-im2im#184 |
| macOS Ventura 13.3.1 (Intel i7) | Fails | C++ compile error (at runtime) | `libomp.dylib` not found | `brew install libomp`<br>`export DYLD_LIBRARY_PATH=/usr/local/opt/libomp/lib:/usr/local/lib` | . |
| macOS Monterey 12.4 (Apple M1) | Fails | C++ compile error (at runtime) | `fatal error: 'omp.h' file not found` | . | AllenCellModeling/aics-im2im#184 |
| Ubuntu 16 | Fails | GPU driver runtime error | `RuntimeError: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver.` | Update the GPU driver from nvidia.com<br>OR install a PyTorch build compiled for the installed CUDA driver version | . |
| Ubuntu 20 (EC2) | Succeeds | . | . | . | . |
| Slurm (CPU) | . | . | . | . | . |
| Slurm (GPU) | . | . | . | . | . |
| AWS cluster (GPU) | . | . | . | . | . |
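
Two of these blockers depend on environment details that can be checked before launching training: whether the OpenMP runtime (`libomp`) and its `omp.h` header are visible, and which CUDA version the NVIDIA driver reports. The CUDA driver encodes its version as `1000 * major + 10 * minor`, so `found version 9010` means the driver only supports CUDA 9.1. The helper below is a hypothetical preflight sketch, not part of the repo, and the Homebrew paths (`/usr/local/opt/libomp` on Intel, `/opt/homebrew/opt/libomp` on Apple Silicon) are assumptions based on Homebrew's default prefixes.

```python
import ctypes.util
import os
import platform
from pathlib import Path
from typing import Dict, Optional, Tuple

# Assumed Homebrew prefixes for libomp: /usr/local on Intel Macs,
# /opt/homebrew on Apple Silicon. Adjust if Homebrew lives elsewhere.
LIBOMP_PREFIXES = [Path("/usr/local/opt/libomp"), Path("/opt/homebrew/opt/libomp")]


def decode_cuda_version(version: int) -> Tuple[int, int]:
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def find_omp_header() -> Optional[Path]:
    """Return the first omp.h under the assumed prefixes, or None if absent."""
    for prefix in LIBOMP_PREFIXES:
        header = prefix / "include" / "omp.h"
        if header.is_file():
            return header
    return None


def preflight() -> Dict[str, object]:
    """Collect the facts behind the OpenMP and NVIDIA driver blockers."""
    report: Dict[str, object] = {
        "system": platform.system(),
        "machine": platform.machine(),
        # None means ctypes could not locate libomp on the loader's search path.
        "libomp": ctypes.util.find_library("omp"),
        "omp_header": find_omp_header(),
        "DYLD_LIBRARY_PATH": os.environ.get("DYLD_LIBRARY_PATH"),
    }
    try:
        import torch  # only present once the requirements are installed

        report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch_cuda_build"] = None
        report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in preflight().items():
        print(f"{key}: {value}")
    major, minor = decode_cuda_version(9010)
    print(f"driver version 9010 -> CUDA {major}.{minor}")  # CUDA 9.1
```

On the Ventura machine, `libomp` resolving to `None` until `DYLD_LIBRARY_PATH` is exported would be consistent with the failure above; on Ubuntu 16, a decoded driver version older than `torch_cuda_build` points to the driver/PyTorch mismatch.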

In all cases, the setup steps were:
- Create a fresh venv based on Python 3.8 or 3.9
- Upgrade pip
- `pip install wheel`
- `pip install boto3`
- `pip install -e .`
- `pip install -r requirements/requirements.txt`
- `python scripts/download_test_data.py`
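
These steps can be scripted so every machine is prepared identically. A minimal sketch, assuming it is run from the repository root with the Python 3.8/3.9 interpreter the venv should use; the `.venv` directory name and the `--run` flag are hypothetical choices. It calls pip through the venv's own interpreter (`python -m pip`) so no activation step is needed, and only prints the commands unless `--run` is passed.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

VENV_DIR = Path(".venv")  # hypothetical location for the fresh venv


def venv_python(venv_dir: Path) -> Path:
    """Path to the interpreter inside a venv (Windows uses Scripts/python.exe)."""
    if sys.platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def setup_commands(venv_dir: Path) -> List[List[str]]:
    """The documented setup steps as argv lists, in order."""
    py = str(venv_python(venv_dir))
    return [
        # --clear empties an existing venv directory so the environment is fresh.
        [sys.executable, "-m", "venv", "--clear", str(venv_dir)],
        [py, "-m", "pip", "install", "--upgrade", "pip"],
        [py, "-m", "pip", "install", "wheel"],
        [py, "-m", "pip", "install", "boto3"],
        [py, "-m", "pip", "install", "-e", "."],
        [py, "-m", "pip", "install", "-r", "requirements/requirements.txt"],
        [py, "scripts/download_test_data.py"],
    ]


def run_setup(venv_dir: Path = VENV_DIR) -> None:
    if sys.version_info[:2] not in ((3, 8), (3, 9)):
        raise SystemExit(f"Expected Python 3.8 or 3.9, got {sys.version.split()[0]}")
    for cmd in setup_commands(venv_dir):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    if "--run" in sys.argv[1:]:
        run_setup()
    else:  # dry run: show what would be executed
        for cmd in setup_commands(VENV_DIR):
            print(" ".join(cmd))
```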

The experiment we ran was `python aics_im2im/train.py experiment=im2im/segmentation.yaml trainer=cpu`.
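
To fill in the remaining rows (Slurm, AWS) consistently, the same command can be wrapped so each run reports its system and outcome in the table's format. This is a sketch only: `describe_system`, `last_error_line`, and `table_row` are hypothetical helpers, and treating the last line of stderr as the error detail is an assumption that matches the single-line errors recorded above but may not hold for every failure.

```python
import platform
import subprocess
import sys
from typing import List

# The experiment command from this issue, run with the current interpreter.
TRAIN_CMD = [
    sys.executable,
    "aics_im2im/train.py",
    "experiment=im2im/segmentation.yaml",
    "trainer=cpu",
]


def describe_system() -> str:
    """Human-readable OS and architecture for the table's System column."""
    if platform.system() == "Darwin":
        return f"macOS {platform.mac_ver()[0]} {platform.machine()}"
    return f"{platform.system()} {platform.release()} {platform.machine()}"


def last_error_line(stderr: str) -> str:
    """Return the last non-empty stderr line, or '.' when there is none."""
    lines = [line.strip() for line in stderr.splitlines() if line.strip()]
    return lines[-1] if lines else "."


def table_row(cells: List[str]) -> str:
    """Format one Markdown table row, escaping pipes inside cells."""
    return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"


def run_experiment() -> str:
    proc = subprocess.run(TRAIN_CMD, capture_output=True, text=True)
    ok = proc.returncode == 0
    detail = "." if ok else last_error_line(proc.stderr)
    return table_row([describe_system(), "Succeeds" if ok else "Fails", ".", detail, ".", "."])


if __name__ == "__main__":
    # Training only starts when explicitly requested with --run.
    print(run_experiment() if "--run" in sys.argv[1:] else describe_system())
```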
