
Conversation

@wenhuach21
Contributor

No description provided.

Copilot AI review requested due to automatic review settings January 9, 2026 11:12

Copilot AI left a comment


Pull request overview

This PR addresses low CPU memory issues during model quantization by optimizing memory management and refactoring immediate saving functionality. The changes focus on reducing memory pressure during the quantization process through better cleanup and more efficient shard management.

Key changes:

  • Renamed the immediate_saving function to immediate_save, along with its related boolean flags, for consistency
  • Enhanced memory cleanup in layer release and shard flushing operations
  • Reduced default shard size from 5GB to 1GB and removed unnecessary clear_memory() calls
  • Added caching for quantized_layer_names_outside_blocks to avoid repeated computation
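The caching point in the last bullet can be sketched as follows. Everything here except the name quantized_layer_names_outside_blocks is hypothetical; the real computation in auto_round walks actual model modules rather than a config dict:

```python
class CompressorSketch:
    """Illustrative only: caches a derived layer-name list so repeated
    calls do not recompute it, mirroring the PR's caching of
    quantized_layer_names_outside_blocks."""

    def __init__(self, layer_config):
        self.layer_config = layer_config
        self._cached_outside_block_names = None  # filled on first use

    def quantized_layer_names_outside_blocks(self):
        # Compute once; later calls return the cached list unchanged.
        if self._cached_outside_block_names is None:
            self._cached_outside_block_names = [
                name for name, cfg in self.layer_config.items()
                if cfg.get("outside_block")
            ]
        return self._cached_outside_block_names
```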

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Files reviewed:

  • auto_round/export/utils.py: Added a torch import and enhanced release_layer_safely to handle additional tensor attributes (scale, zp) with type checking
  • auto_round/compressors/utils.py: Renamed the saving function to immediate_save, reduced the default shard size, cached layer names, and removed redundant memory-clearing calls
  • auto_round/compressors/base.py: Renamed the immediate saving/packing flags for consistency, added caching for module names, and updated all references throughout the file


Comment on lines +1299 to +1300:

    pass


Copilot AI Jan 9, 2026


This new method is empty and appears to serve no purpose. If it's intended for future implementation, add a docstring explaining its purpose and raise NotImplementedError. Otherwise, remove it to avoid confusion.

Suggested change:

    - pass
    + """Quantize layers using RTN with immediate saving of quantized results.
    +
    + This method is intended to perform RTN-based quantization on individual layers
    + and immediately persist the quantized artifacts, instead of deferring saving
    + until the entire model quantization process completes.
    + """
    + raise NotImplementedError("quantize_layer_via_rtn_immediate_save is not implemented yet.")

Comment on lines +1287 to +1290:

    # from pympler import muppy, summary
    #
    # sum1 = summary.summarize(muppy.get_objects())
    # summary.print_(sum1)

Copilot AI Jan 9, 2026


Commented-out debug code should be removed before merging to production; this appears to be temporary pympler-based memory profiling left in the implementation.

Suggested change (remove the lines):

    - # from pympler import muppy, summary
    - #
    - # sum1 = summary.summarize(muppy.get_objects())
    - # summary.print_(sum1)

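Rather than leaving the pympler calls commented out, a pattern like the following keeps the profiling available without shipping dead code. This is an illustrative suggestion, not part of the PR, and the AUTO_ROUND_DEBUG_MEM environment-variable name is made up:

```python
import os

def maybe_print_memory_summary() -> bool:
    """Debug-only helper: prints a pympler memory summary when a
    (hypothetical) AUTO_ROUND_DEBUG_MEM flag is set; otherwise a no-op.
    Returns True only when the summary was printed."""
    if os.environ.get("AUTO_ROUND_DEBUG_MEM") != "1":
        return False
    # Imported lazily so pympler stays an optional, debug-only dependency.
    from pympler import muppy, summary
    summary.print_(summary.summarize(muppy.get_objects()))
    return True
```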
Quoted code:

    all_to_quantized_module_names.remove(m.tmp_name)
    if not self.immediate_saving:

    immediate_save(self, block, last_group=False)
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call to immediate_save is unconditional and will execute even when is_immediate_saving is False. This should likely be guarded with if self.is_immediate_saving: to match the pattern used elsewhere in the code (lines 1172, 1411, 1697, 1940).

Suggested change:

    - immediate_save(self, block, last_group=False)
    + if self.is_immediate_saving:
    +     immediate_save(self, block, last_group=False)
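For context on the immediate_save path this comment guards: the PR lowers the default shard size from 5 GB to 1 GB, so pending quantized tensors are flushed to disk sooner and peak CPU memory stays lower. A minimal sketch of that flushing idea follows; all names here are illustrative and not the real auto_round API (only the 1 GB default comes from the PR):

```python
DEFAULT_SHARD_SIZE = 1 << 30  # 1 GB; the PR lowers this from 5 GB

def flush_shard_if_needed(pending: dict, pending_bytes: int, save_fn,
                          shard_size: int = DEFAULT_SHARD_SIZE) -> int:
    """Write the pending shard out once it crosses the size threshold,
    then clear it so the memory can be reclaimed. Returns the byte
    count still pending after the check."""
    if pending_bytes >= shard_size:
        save_fn(dict(pending))  # persist a snapshot of the shard
        pending.clear()         # drop references to the saved tensors
        return 0
    return pending_bytes
```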

3 participants