I have built the apex module following the procedure explained, but when trying to train the model on cifar10, I get:
```
/lustre03/project/6054857/mehranag/vdvae/data.py:147: FutureWarning: arrays to stack must be passed as a "sequence" type such as list or tuple. Support for non-sequence iterables such as generators is deprecated as of NumPy 1.16 and will raise an error in the future.
  trX = np.vstack(data['data'] for data in tr_data)
```
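As an aside, that FutureWarning is unrelated to apex: `np.vstack` is being passed a generator expression at data.py:147, and wrapping it in a list should silence it. A minimal standalone sketch of the fix (with dummy data in place of the real CIFAR-10 batches):

```python
import numpy as np

# Standalone illustration of the data.py:147 fix -- give np.vstack a list
# (a "sequence"), not a generator expression, since generator input is
# deprecated as of NumPy 1.16.
tr_data = [{'data': np.zeros((2, 3))}, {'data': np.ones((2, 3))}]  # dummy batches
trX = np.vstack([data['data'] for data in tr_data])  # list, not generator
print(trX.shape)  # (4, 3)
```

The actual failure, though, is this: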
```
Traceback (most recent call last):
  File "train.py", line 144, in <module>
    main()
  File "train.py", line 140, in main
    train_loop(H, data_train, data_valid_or_test, preprocess_fn, vae, ema_vae, logprint)
  File "train.py", line 59, in train_loop
    optimizer, scheduler, cur_eval_loss, iterate, starting_epoch = load_opt(H, vae, logprint)
  File "/lustre03/project/6054857/mehranag/vdvae/train_helpers.py", line 180, in load_opt
    optimizer = AdamW(vae.parameters(), weight_decay=H.wd, lr=H.lr, betas=(H.adam_beta1, H.adam_beta2))
  File "/home/mehranag/anaconda3/envs/env/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/optimizers/fused_adam.py", line 79, in __init__
    raise RuntimeError('apex.optimizers.FusedAdam requires cuda extensions')
RuntimeError: apex.optimizers.FusedAdam requires cuda extensions
```
I understand that this is an apex-related issue since I get the following error when trying to run examples/simple/distributed in the apex repo:
```
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ImportError("/lib64/libm.so.6: version `GLIBC_2.29' not found (required by /home/mehranag/anaconda3/envs/env/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/amp_C.cpython-36m-x86_64-linux-gnu.so)",)
final loss = tensor(0.5392, device='cuda:0', grad_fn=<MseLossBackward>)
```
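The `GLIBC_2.29` ImportError suggests the compiled `amp_C` extension was built against a newer glibc than the cluster's `/lib64/libm.so.6` provides, so apex silently falls back to the Python path. A quick way to check whether the compiled extension imports at all (a sketch; `amp_C` is just the module named in the warning above):

```python
# Sanity check (a sketch): amp_C is the compiled extension named in the
# warning above. If apex really built with --cuda_ext, this should import
# cleanly; on this cluster it presumably reproduces the GLIBC ImportError.
try:
    import amp_C  # noqa: F401
    print("amp_C imports fine -- compiled extensions are present")
except ImportError as e:
    print("amp_C failed to import:", e)
```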
I have tried many things to fix this issue, but no luck. I have two questions:
- Does anybody know why I get `FusedAdam requires cuda extensions` even though I built apex with the `--global-option="--cpp_ext" --global-option="--cuda_ext"` options?
- How can I avoid using apex? I am only trying to test some stuff on cifar10 and don't need the distributed training feature, considering that I'm getting some weird errors! (One possible workaround is sketched below.)
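On the second question: the traceback shows that the `AdamW` name in train_helpers.py resolves to apex's `FusedAdam`, so one workaround might be to swap in PyTorch's built-in `torch.optim.AdamW`, which accepts the same constructor arguments here. A minimal sketch, not tested on this codebase:

```python
from torch import nn
from torch.optim import AdamW  # built-in replacement for apex's FusedAdam

# In train_helpers.py the change would just be the import above plus the
# same call as before (traceback, line 180):
#   optimizer = AdamW(vae.parameters(), weight_decay=H.wd, lr=H.lr,
#                     betas=(H.adam_beta1, H.adam_beta2))

# Standalone demo that the constructor signature matches (dummy values):
model = nn.Linear(8, 8)
optimizer = AdamW(model.parameters(), weight_decay=0.01, lr=2e-4,
                  betas=(0.9, 0.999))
```

This would give up apex's fused kernel, but for single-GPU cifar10 experiments that seems like an acceptable trade-off.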