@mzweilin (Contributor) commented May 1, 2025

What does this PR do?

This PR changes training_step(), validation_step(), and test_step() to return the full output dictionary instead of a subset, so that a Callback can visualize intermediate representations in the on_train_batch_end(), on_validation_batch_end(), and on_test_batch_end() hooks.

This is useful because Lightning's built-in logging only handles scalar data. We need custom callbacks to log media data, such as images and text.
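As an illustration of what this enables, here is a minimal sketch of a callback that reads the step output in on_train_batch_end(). The class name MediaCollector, the keys "input" and "preds", and the every_n_steps parameter are hypothetical and not taken from this PR; only the hook signature is Lightning's. The sketch falls back to a plain object base class so it can run without Lightning installed.

```python
try:
    from lightning.pytorch import Callback  # real base class, if installed
except ImportError:
    Callback = object  # fallback so the sketch runs without Lightning


class MediaCollector(Callback):
    """Hypothetical callback: keeps selected step outputs for later media logging."""

    def __init__(self, keys=("input", "preds"), every_n_steps=100):
        # `keys` are assumed names of entries in the step output dictionary.
        self.keys = tuple(keys)
        self.every_n_steps = every_n_steps
        self.collected = []  # list of (batch_idx, {key: value}) pairs

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # `outputs` is what training_step() returned; with this PR it is
        # assumed to be the full dictionary rather than only the loss.
        if not isinstance(outputs, dict) or batch_idx % self.every_n_steps != 0:
            return
        media = {k: outputs[k] for k in self.keys if k in outputs}
        if media:
            self.collected.append((batch_idx, media))
```

A real callback would forward the collected entries to a logger that supports images or text instead of storing them in a list; the validation and test hooks would follow the same pattern.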

I wonder if we should detach all tensors except the loss to avoid memory leaks. I did not observe any difference when running the test workload (CUDA_VISIBLE_DEVICES=0 python -m mart experiment=CIFAR10_CNN_Adv trainer=gpu trainer.precision=16).
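If detaching turns out to be worthwhile, it could look roughly like the sketch below. This is not part of the PR: detach_except_loss and the "loss" key are assumptions, and the helper is duck-typed so that it calls .detach() on any value that exposes it (as torch.Tensor does) and leaves everything else untouched.

```python
def detach_except_loss(outputs, loss_key="loss"):
    """Return a shallow copy of `outputs` with every non-loss tensor detached.

    Hypothetical helper: the loss entry keeps its autograd graph so that
    backward() still works, while other entries drop theirs.
    """
    detached = {}
    for key, value in outputs.items():
        if key != loss_key and hasattr(value, "detach"):
            value = value.detach()
        detached[key] = value
    return detached
```

The helper only handles top-level entries; nested dictionaries or lists of tensors would need a recursive variant.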

Type of change

Please check all relevant options.

  • Improvement (non-breaking)
  • Bug fix (non-breaking)
  • New feature (non-breaking)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Testing

Please describe the tests that you ran to verify your changes. Consider listing any relevant details of your test configuration.

  • pytest
  • CUDA_VISIBLE_DEVICES=0 python -m mart experiment=CIFAR10_CNN_Adv trainer=gpu trainer.precision=16 reports 70% (21 sec/epoch).
  • CUDA_VISIBLE_DEVICES=0,1 python -m mart experiment=CIFAR10_CNN_Adv trainer=ddp trainer.precision=16 trainer.devices=2 model.optimizer.lr=0.2 trainer.max_steps=2925 datamodule.ims_per_batch=256 datamodule.world_size=2 reports 70% (14 sec/epoch).

Before submitting

  • The title is self-explanatory and the description concisely explains the PR
  • My PR does only one thing, instead of bundling different changes together
  • I list all the breaking changes introduced by this pull request
  • I have commented my code
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have run pre-commit hooks with pre-commit run -a command without errors

Did you have fun?

Make sure you had fun coding 🙃

@mzweilin mzweilin requested a review from dxoigmn May 1, 2025 22:29
@mzweilin mzweilin merged commit 5c093c5 into main May 1, 2025
6 checks passed
@mzweilin mzweilin deleted the mzweilin/litmodular_return_full_dict_in_step branch May 1, 2025 22:35