Fix low cpu #1253
base: main
Conversation
for more information, see https://pre-commit.ci
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
…ix_0108 # Conflicts: # auto_round/compressors/base.py
Pull request overview
This PR addresses low CPU memory issues during model quantization by optimizing memory management and refactoring immediate saving functionality. The changes focus on reducing memory pressure during the quantization process through better cleanup and more efficient shard management.
Key changes:
- Renamed `immediate_saving` to `immediate_save` and related boolean flags for consistency
- Enhanced memory cleanup in layer release and shard flushing operations
- Reduced the default shard size from 5GB to 1GB and removed unnecessary `clear_memory()` calls
- Added caching for `quantized_layer_names_outside_blocks` to avoid repeated computation (see the sketch after this list)
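To illustrate the caching item above, here is a minimal sketch of the pattern: compute the name list once and reuse it on later accesses. The class, the predicate, and the attribute names are placeholders for illustration, not the actual auto_round internals.

```python
import functools

class CompressorSketch:
    """Illustrative stand-in for the auto_round compressor; names are assumed."""

    def __init__(self, model):
        self.model = model  # assumed to be a torch.nn.Module

    @functools.cached_property
    def quantized_layer_names_outside_blocks(self):
        # Computed once on first access, then reused, instead of re-walking
        # the whole module tree every time the names are needed.
        return [
            name
            for name, module in self.model.named_modules()
            if self._is_quantized_layer_outside_blocks(name, module)
        ]

    def _is_quantized_layer_outside_blocks(self, name, module):
        # Hypothetical predicate standing in for the real membership test.
        return hasattr(module, "scale") and "." not in name
```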
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| auto_round/export/utils.py | Added torch import and enhanced release_layer_safely to handle additional tensor attributes (scale, zp) with type checking |
| auto_round/compressors/utils.py | Renamed function to immediate_save, reduced default shard size, cached layer names, and removed redundant memory clearing calls |
| auto_round/compressors/base.py | Renamed immediate saving/packing flags for consistency, added caching for module names, and updated all references throughout the file |
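The `release_layer_safely` change in auto_round/export/utils.py (first row of the table above) might look roughly like the following sketch; the exact attribute set and signature in the PR may differ.

```python
import torch

def release_layer_safely(layer) -> None:
    # Sketch of the enhanced cleanup: drop the usual parameters plus
    # quantization artifacts (scale, zero-point), with an isinstance check
    # so non-tensor attributes are left untouched.
    for attr in ("weight", "bias", "scale", "zp"):
        value = getattr(layer, attr, None)
        if isinstance(value, torch.Tensor):
            setattr(layer, attr, None)  # drop the reference so memory can be reclaimed
```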
Context:

```python
pass
```
Copilot AI (Jan 9, 2026):
This new method is empty and appears to serve no purpose. If it's intended for future implementation, add a docstring explaining its purpose and raise `NotImplementedError`. Otherwise, remove it to avoid confusion.
Suggested change:

```diff
- pass
+ """Quantize layers using RTN with immediate saving of quantized results.
+ This method is intended to perform RTN-based quantization on individual layers
+ and immediately persist the quantized artifacts, instead of deferring saving
+ until the entire model quantization process completes.
+ """
+ raise NotImplementedError("quantize_layer_via_rtn_immediate_save is not implemented yet.")
```
Context:

```python
# from pympler import muppy, summary
#
# sum1 = summary.summarize(muppy.get_objects())
# summary.print_(sum1)
```
Copilot AI (Jan 9, 2026):
Commented-out debug code should be removed before merging to production. This appears to be temporary debugging code left in the implementation.
Suggested change:

```diff
- # from pympler import muppy, summary
- #
- # sum1 = summary.summarize(muppy.get_objects())
- # summary.print_(sum1)
```
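If this memory profiling is still useful during development, one alternative to committing commented-out code is to gate it behind an environment flag. The flag name and helper below are hypothetical, not part of the PR; only the pympler calls mirror the removed snippet.

```python
import os

def maybe_print_memory_summary() -> None:
    """Print a pympler object summary only when debugging is opted in."""
    if os.environ.get("AUTO_ROUND_MEM_DEBUG") != "1":  # hypothetical flag name
        return
    from pympler import muppy, summary  # lazy import: optional dev-only dependency

    summary.print_(summary.summarize(muppy.get_objects()))
```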
Context:

```python
all_to_quantized_module_names.remove(m.tmp_name)
if not self.immediate_saving:
    ...  # body elided in the review context

immediate_save(self, block, last_group=False)
```
Copilot AI (Jan 9, 2026):
This call to `immediate_save` is unconditional and will execute even when `is_immediate_saving` is False. This should likely be guarded with `if self.is_immediate_saving:` to match the pattern used elsewhere in the code (lines 1172, 1411, 1697, 1940).
Suggested change:

```diff
- immediate_save(self, block, last_group=False)
+ if self.is_immediate_saving:
+     immediate_save(self, block, last_group=False)
```
No description provided.