
Conversation

@wenhuach21
Contributor

No description provided.

Copilot AI review requested due to automatic review settings January 9, 2026 11:12

Copilot AI left a comment


Pull request overview

This PR addresses low CPU memory issues during model quantization by optimizing memory management and refactoring immediate saving functionality. The changes focus on reducing memory pressure during the quantization process through better cleanup and more efficient shard management.

Key changes:

  • Renamed the immediate_saving function to immediate_save, along with its related boolean flags, for consistency
  • Enhanced memory cleanup in layer release and shard flushing operations
  • Reduced default shard size from 5GB to 1GB and removed unnecessary clear_memory() calls
  • Added caching for quantized_layer_names_outside_blocks to avoid repeated computation
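The caching point in the last bullet can be sketched as follows. Everything here except the name quantized_layer_names_outside_blocks is hypothetical; the real computation in auto_round walks actual model modules rather than a config dict:

```python
class CompressorSketch:
    """Illustrative only: caches a derived layer-name list so repeated
    calls do not recompute it, mirroring the PR's caching of
    quantized_layer_names_outside_blocks."""

    def __init__(self, layer_config):
        self.layer_config = layer_config
        self._cached_outside_block_names = None  # filled on first use

    def quantized_layer_names_outside_blocks(self):
        # Compute once; later calls return the cached list unchanged.
        if self._cached_outside_block_names is None:
            self._cached_outside_block_names = [
                name for name, cfg in self.layer_config.items()
                if cfg.get("outside_block")
            ]
        return self._cached_outside_block_names
```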

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Files reviewed:

  • auto_round/export/utils.py: Added a torch import and enhanced release_layer_safely to handle additional tensor attributes (scale, zp) with type checking
  • auto_round/compressors/utils.py: Renamed the saving function to immediate_save, reduced the default shard size, cached layer names, and removed redundant memory-clearing calls
  • auto_round/compressors/base.py: Renamed the immediate saving/packing flags for consistency, added caching for module names, and updated all references throughout the file


Comment on lines +1299 to +1300:

    pass


Copilot AI Jan 9, 2026


This new method is empty and appears to serve no purpose. If it's intended for future implementation, add a docstring explaining its purpose and raise NotImplementedError. Otherwise, remove it to avoid confusion.

Suggested change:

    - pass
    + """Quantize layers using RTN with immediate saving of quantized results.
    +
    + This method is intended to perform RTN-based quantization on individual layers
    + and immediately persist the quantized artifacts, instead of deferring saving
    + until the entire model quantization process completes.
    + """
    + raise NotImplementedError("quantize_layer_via_rtn_immediate_save is not implemented yet.")

Comment on lines +1287 to +1290:

    # from pympler import muppy, summary
    #
    # sum1 = summary.summarize(muppy.get_objects())
    # summary.print_(sum1)

Copilot AI Jan 9, 2026


Commented-out debug code should be removed before merging to production; this appears to be temporary pympler-based memory profiling left in the implementation.

Suggested change (remove the lines):

    - # from pympler import muppy, summary
    - #
    - # sum1 = summary.summarize(muppy.get_objects())
    - # summary.print_(sum1)

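Rather than leaving the pympler calls commented out, a pattern like the following keeps the profiling available without shipping dead code. This is an illustrative suggestion, not part of the PR, and the AUTO_ROUND_DEBUG_MEM environment-variable name is made up:

```python
import os

def maybe_print_memory_summary() -> bool:
    """Debug-only helper: prints a pympler memory summary when a
    (hypothetical) AUTO_ROUND_DEBUG_MEM flag is set; otherwise a no-op.
    Returns True only when the summary was printed."""
    if os.environ.get("AUTO_ROUND_DEBUG_MEM") != "1":
        return False
    # Imported lazily so pympler stays an optional, debug-only dependency.
    from pympler import muppy, summary
    summary.print_(summary.summarize(muppy.get_objects()))
    return True
```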
Quoted code:

    all_to_quantized_module_names.remove(m.tmp_name)
    if not self.immediate_saving:

    immediate_save(self, block, last_group=False)
Copy link

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call to immediate_save is unconditional and will execute even when is_immediate_saving is False. This should likely be guarded with if self.is_immediate_saving: to match the pattern used elsewhere in the code (lines 1172, 1411, 1697, 1940).

Suggested change:

    - immediate_save(self, block, last_group=False)
    + if self.is_immediate_saving:
    +     immediate_save(self, block, last_group=False)
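For context on the immediate_save path this comment guards: the PR lowers the default shard size from 5 GB to 1 GB, so pending quantized tensors are flushed to disk sooner and peak CPU memory stays lower. A minimal sketch of that flushing idea follows; all names here are illustrative and not the real auto_round API (only the 1 GB default comes from the PR):

```python
DEFAULT_SHARD_SIZE = 1 << 30  # 1 GB; the PR lowers this from 5 GB

def flush_shard_if_needed(pending: dict, pending_bytes: int, save_fn,
                          shard_size: int = DEFAULT_SHARD_SIZE) -> int:
    """Write the pending shard out once it crosses the size threshold,
    then clear it so the memory can be reclaimed. Returns the byte
    count still pending after the check."""
    if pending_bytes >= shard_size:
        save_fn(dict(pending))  # persist a snapshot of the shard
        pending.clear()         # drop references to the saved tensors
        return 0
    return pending_bytes
```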

3 participants