Auto shutdown fails when GPU memory usage is ~0 but utilization is >5% #4

@AmirTuring

Description

Reproduction

from trl import ...

outputs:

Traceback (most recent call last):
  File "example.py", line 42, in <module>
    ...

System Info

When GPU memory usage is near 0, GPU utilization can still be 100%, or at least above 5%. In this case the auto shutdown does not trigger and the pod remains running.
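The issue does not include the shutdown logic itself. As a minimal sketch, assuming the auto-shutdown check treats a GPU as idle only when both memory usage and utilization are low, the following shows why the case above keeps the pod running, and one possible alternative rule. `GpuSample`, the function names, and the 100 MiB memory threshold are hypothetical; only the 5% utilization threshold comes from the issue title. The input is assumed to be the output of `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits`.

```python
"""Hypothetical idle check for a GPU pod auto shutdown (illustrative only)."""

from dataclasses import dataclass

# Assumed thresholds: 5% utilization comes from the issue title;
# the memory threshold is a hypothetical value chosen for illustration.
UTIL_THRESHOLD_PCT = 5.0
MEM_THRESHOLD_MIB = 100.0


@dataclass
class GpuSample:
    utilization_pct: float  # nvidia-smi "utilization.gpu"
    memory_used_mib: float  # nvidia-smi "memory.used"


def parse_nvidia_smi_csv(text: str) -> list[GpuSample]:
    """Parse one "util, mem" line per GPU, as printed by
    nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits
    """
    samples = []
    for line in text.strip().splitlines():
        util, mem = (float(field) for field in line.split(","))
        samples.append(GpuSample(utilization_pct=util, memory_used_mib=mem))
    return samples


def is_idle_strict(s: GpuSample) -> bool:
    """Idle only if BOTH memory and utilization are low.
    Never fires when memory is ~0 but utilization is above 5%."""
    return (s.memory_used_mib < MEM_THRESHOLD_MIB
            and s.utilization_pct < UTIL_THRESHOLD_PCT)


def is_idle_memory_first(s: GpuSample) -> bool:
    """One possible alternative (hypothetical): treat near-zero memory
    as idle regardless of utilization."""
    return s.memory_used_mib < MEM_THRESHOLD_MIB


def should_shut_down(samples: list[GpuSample], rule) -> bool:
    """Shut down only if every GPU on the pod is idle under `rule`."""
    return all(rule(s) for s in samples)


if __name__ == "__main__":
    # The situation reported in the issue: memory ~0, utilization 100%.
    gpus = parse_nvidia_smi_csv("100, 0\n")
    print(should_shut_down(gpus, is_idle_strict))        # False: pod stays up
    print(should_shut_down(gpus, is_idle_memory_first))  # True
```

The alternative trades one failure mode for another: a workload that genuinely runs with very little allocated GPU memory would be treated as idle and shut down.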

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots, more on code blocks)
  • Any traceback provided is complete

Metadata

Assignees

Labels

bug (Something isn't working)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
