
make it build on arch/py39 #3

Open
min-xu-ai wants to merge 1 commit into msbaines:main from min-xu-ai:main


@min-xu-ai

MPI_HOME can still be used when the include and lib directories live under the same directory. Otherwise, use separate environment variables to control them.
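As a rough illustration of that lookup order, here is a minimal sketch of how a build script might resolve the two directories. The variable names `MPI_INCLUDE_DIR` and `MPI_LIB_DIR` and the helper `resolve_mpi_dirs` are hypothetical and chosen only for this example. They are not necessarily the names this PR uses, and the `include`/`lib` subdirectory layout under `MPI_HOME` is also an assumption.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_mpi_dirs(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (include_dir, lib_dir) for an MPI installation.

    MPI_INCLUDE_DIR and MPI_LIB_DIR are hypothetical variable names used
    only for this sketch. Each one, when set, takes precedence over the
    directory derived from MPI_HOME.
    """
    env = os.environ if env is None else env
    home = env.get("MPI_HOME")

    # Assumed layout: headers in $MPI_HOME/include, libraries in $MPI_HOME/lib.
    default_include = os.path.join(home, "include") if home else None
    default_lib = os.path.join(home, "lib") if home else None

    include_dir = env.get("MPI_INCLUDE_DIR") or default_include
    lib_dir = env.get("MPI_LIB_DIR") or default_lib

    if include_dir is None or lib_dir is None:
        raise RuntimeError(
            "Set MPI_HOME, or set both MPI_INCLUDE_DIR and MPI_LIB_DIR"
        )
    return include_dir, lib_dir
```

With only `MPI_HOME` set, both paths are derived from it. On a distribution that splits headers and libraries across unrelated directories, the two separate variables would be set instead.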

It builds and installs fine on my Arch Linux machine, but running the tests still segfaults:

Jan 27 18:22:49 titan audit[30317]: ANOM_ABEND auid=1000 uid=1000 gid=1001 ses=13 pid=30317 comm="python" exe="/usr/bin/python3.9" sig=11 res=1
Jan 27 18:22:49 titan kernel: python[30462]: segfault at 55f100000483 ip 00007f0cc0a549aa sp 00007f0b05058db0 error 4 in libpython3.9.so.1.0[7f0cc089a000+217000]

It could be that I don't have enough memory. Will need to debug more.
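One possible starting point for that debugging is Python's standard-library `faulthandler` module, which can dump the Python-level traceback when the interpreter receives SIGSEGV. The sketch below is only a suggestion and not something this PR does. The helper name `enable_crash_tracebacks` is hypothetical.

```python
import faulthandler
import sys


def enable_crash_tracebacks() -> bool:
    """Ask the interpreter to dump Python tracebacks on fatal signals.

    Hypothetical helper for this sketch. It would be called early, for
    example at the top of a conftest.py or test module, before the code
    that crashes is executed.
    """
    if not faulthandler.is_enabled():
        # Write to the original stderr so the dump bypasses any stream
        # that a test runner may have swapped in for sys.stderr.
        faulthandler.enable(file=sys.__stderr__, all_threads=True)
    return faulthandler.is_enabled()
```

The same effect is available without code changes by starting the interpreter as `python -X faulthandler -m pytest ...`. The traceback shows which Python frame was active at the crash, which may help tell a memory shortage apart from a bug in the extension.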

@min-xu-ai
Author

@msbaines, does this look ok?

Actually, it seems only test_backward is segfaulting. The following 4 tests pass on my system:

mpirun -n 4 python -m pytest -p torch_pg.pytest --only-mpi --junitxml=test-results/junit.xml --verbose tests/nn/moe/test_moe_layer.py::test_forward
mpirun -n 4 python -m pytest -p torch_pg.pytest --only-mpi --junitxml=test-results/junit.xml --verbose tests/nn/moe/test_moe_layer.py::test_forward_multi
mpirun -n 4 python -m pytest -p torch_pg.pytest --only-mpi --junitxml=test-results/junit.xml --verbose tests/nn/moe/test_moe_layer.py::test_forward_routing
mpirun -n 4 python -m pytest -p torch_pg.pytest --only-mpi --junitxml=test-results/junit.xml --verbose tests/nn/moe/test_moe_layer.py::test_forward_routing_multi
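The four invocations above differ only in the test name, so they could be generated by a small helper. This is a sketch for convenience only. The function `build_mpi_pytest_cmd` and the constant names are hypothetical, while the flags are copied verbatim from the commands above.

```python
from typing import List

TEST_FILE = "tests/nn/moe/test_moe_layer.py"

# Test names taken from the commands listed in this comment.
PASSING_TESTS = [
    "test_forward",
    "test_forward_multi",
    "test_forward_routing",
    "test_forward_routing_multi",
]


def build_mpi_pytest_cmd(test_name: str, num_procs: int = 4) -> List[str]:
    """Build the mpirun/pytest argument list for a single test."""
    return [
        "mpirun", "-n", str(num_procs),
        "python", "-m", "pytest",
        "-p", "torch_pg.pytest",
        "--only-mpi",
        "--junitxml=test-results/junit.xml",
        "--verbose",
        f"{TEST_FILE}::{test_name}",
    ]


if __name__ == "__main__":
    # Print one command line per test so they can be pasted into a shell.
    for name in PASSING_TESTS:
        print(" ".join(build_mpi_pytest_cmd(name)))
```

Passing `"test_backward"` to the same helper yields the command for the failing test, and each returned list can be handed directly to `subprocess.run`.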

This is strange because CI is failing test_forward with py3.7.

I am using py3.9 BTW.
