
mem: add strict mechanism in mdp#698

Open
happy-lx wants to merge 1 commit into xs-dev from mdp-strict-perf

Conversation

@happy-lx
Contributor

@happy-lx happy-lx commented Jan 12, 2026

Change-Id: Ic8ecdc370e8ff3eb7d99065c7b389b062a133f5e

Summary by CodeRabbit

Release Notes

  • New Features
    • Added enable_storeSet_strict_wait configuration parameter to control strict-wait behavior in CPU memory dependency tracking
    • Enhanced store-set mechanism with advanced pending store management for improved dependency resolution


@happy-lx happy-lx requested a review from tastynoob January 12, 2026 06:20
@coderabbitai

coderabbitai bot commented Jan 12, 2026

Walkthrough

A new strict-wait mechanism is introduced to the StoreSet store dependency predictor. A configuration parameter enables the feature, which is propagated through the MemDepUnit to the StoreSet. The StoreSet tracks pending stores and strict-wait eligibility to refine load dependency predictions based on store-set strictness.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration: src/cpu/o3/BaseO3CPU.py | Added boolean parameter enable_storeSet_strict_wait (default: True) to control strict-wait behavior in the StoreSet predictor. |
| MemDepUnit Integration: src/cpu/o3/mem_dep_unit.cc | Propagates the configuration parameter to StoreSet via depPred.setStrictWaitEnabled() in the constructor and the init() method. |
| StoreSet Interface & Implementation: src/cpu/o3/store_set.hh, src/cpu/o3/store_set.cc | Adds new data structures (SSITStrict vector, pendingStores set) to track strict-wait eligibility and pending store sequence numbers; introduces the setStrictWaitEnabled() method and updates insertStore(), issued(), squash(), clear(), and checkInst() to implement strict-wait load waiting semantics. |
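To make the data flow in the table above easier to follow, here is a minimal toy model of the strict-wait idea. It is a sketch under stated assumptions, not the PR's code: the class name ToyStoreSet, the helpers markStrict() and strictDeps(), and the modulo indexing are hypothetical; only the names setStrictWaitEnabled, insertStore, issued, SSITStrict, and pendingStores come from the walkthrough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

using SeqNum = std::uint64_t;

// Toy model only: layout and signatures are assumptions for illustration.
class ToyStoreSet
{
  public:
    explicit ToyStoreSet(std::size_t ssit_size)
        : SSITStrict(ssit_size, false) {}

    // Mirrors the configuration toggle described in the walkthrough.
    void setStrictWaitEnabled(bool enable) { enableStrictWait = enable; }

    // Hypothetical helper: mark the entry for a load PC as strict.
    void markStrict(std::uint64_t pc) { SSITStrict[index(pc)] = true; }

    // Every store is recorded as pending until it issues.
    void insertStore(SeqNum seq) { pendingStores.insert(seq); }
    void issued(SeqNum seq) { pendingStores.erase(seq); }

    // Hypothetical helper: stores a load at `pc` would wait for under
    // strict-wait. Empty when the feature is off or the entry is not strict.
    std::vector<SeqNum> strictDeps(std::uint64_t pc) const
    {
        if (!enableStrictWait || !SSITStrict[index(pc)])
            return {};
        return std::vector<SeqNum>(pendingStores.begin(),
                                   pendingStores.end());
    }

  private:
    // Assumed indexing scheme; the real predictor hashes the PC differently.
    std::size_t index(std::uint64_t pc) const
    {
        return static_cast<std::size_t>(pc % SSITStrict.size());
    }

    std::vector<bool> SSITStrict;
    std::unordered_set<SeqNum> pendingStores;
    bool enableStrictWait = true;
};
```

In this sketch a strict load depends on every store still in pendingStores, so its dependency list shrinks as stores issue and becomes empty once the toggle is switched off.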

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • Yakkhini

Poem

🐰 A rabbit hops through store-set paths so fine,
With pending records and strict-wait lines,
Each load now waits with cautious care,
For stores to finish—strict and fair!
Configuration flows, dependencies flow,
Better predictions help the O3 go! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'mem: add strict mechanism in mdp' directly describes the main changes: adding a strict-wait mechanism to the memory dependency predictor (store-set). It is concise and clearly indicates the primary modification across all modified files. |


@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: bf722ed
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 19.88 | 19.90 | -0.07 🔴 |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/cpu/o3/store_set.cc (1)

230-238: Clarify the strict-wait promotion logic.

When store_SSID == load_SSID (both already in the same store set), setting SSITStrict[load_index] = true promotes the load to strict-wait mode. This means repeated violations on the same store set trigger the more conservative strict-wait behavior, which aligns with the paper's approach of escalating conservatism for problematic PCs.

Consider adding a brief comment explaining this escalation policy for maintainability.

📝 Suggested documentation
         } else {
             SSIT[load_index] = store_SSID;

+            // Escalate to strict-wait when load/store are already in the same
+            // store set (repeated violation on the same set).
             SSITStrict[load_index] = (store_SSID == load_SSID);

             DPRINTF(StoreSet, "StoreSet: Store had smaller store set: %i; "
                     "for load %#x, store %#x\n",
                     store_SSID, load_PC, store_PC);
         }
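The escalation policy discussed in this nitpick can be sketched in isolation as follows. This is an illustrative reduction, assuming both PCs already hold valid store-set IDs; ToyTables and mergeOnViolation are hypothetical names, and the first branch (store adopts the load's smaller ID) is an assumption about the surrounding code that the diff does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using SSID = unsigned;

// Hypothetical container for the two tables named in the review.
struct ToyTables
{
    std::vector<SSID> SSIT;
    std::vector<bool> SSITStrict;
};

// Both entries are assumed valid; merge toward the smaller store-set ID.
void mergeOnViolation(ToyTables &t, std::size_t load_index,
                      std::size_t store_index)
{
    const SSID load_SSID = t.SSIT[load_index];
    const SSID store_SSID = t.SSIT[store_index];

    if (store_SSID > load_SSID) {
        // Assumed branch: the store joins the load's (smaller) set.
        t.SSIT[store_index] = load_SSID;
    } else {
        t.SSIT[load_index] = store_SSID;
        // Escalate to strict-wait only when the load and store were
        // already in the same set, i.e. the violation repeated.
        t.SSITStrict[load_index] = (store_SSID == load_SSID);
    }
}
```

Note that the flag is assigned rather than OR-ed, so in this sketch a later merge into a different, smaller set clears the strict bit again.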
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39cd6af and bf722ed.

📒 Files selected for processing (4)
  • src/cpu/o3/BaseO3CPU.py
  • src/cpu/o3/mem_dep_unit.cc
  • src/cpu/o3/store_set.cc
  • src/cpu/o3/store_set.hh
🧰 Additional context used
🧬 Code graph analysis (2)
src/cpu/o3/mem_dep_unit.cc (1)
src/cpu/o3/mem_dep_unit.hh (1)
  • params (109-109)
src/cpu/o3/store_set.cc (3)
src/cpu/o3/store_set.hh (9)
  • StoreSet (71-71)
  • StoreSet (74-74)
  • StoreSet (77-77)
  • load_PC (95-95)
  • store_PC (84-84)
  • store_PC (99-99)
  • store_SSID (125-125)
  • PC (105-105)
  • squashed_num (113-113)
src/cpu/o3/inst_queue.hh (1)
  • store_seq_num (208-208)
src/cpu/o3/mem_dep_unit.hh (1)
  • squashed_num (152-152)
🪛 Ruff (0.14.10)
src/cpu/o3/BaseO3CPU.py

200-200: Param may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
🔇 Additional comments (16)
src/cpu/o3/mem_dep_unit.cc (2)

66-66: LGTM!

The setStrictWaitEnabled call is appropriately placed after the depPred object is constructed, correctly propagating the configuration parameter to the store-set predictor.


101-103: LGTM!

The setStrictWaitEnabled call in init() mirrors the constructor call, ensuring the strict-wait setting is consistently applied after depPred.init() completes.

src/cpu/o3/BaseO3CPU.py (1)

200-200: LGTM!

The new enable_storeSet_strict_wait parameter follows the established pattern and is logically grouped with enable_storeSet_train. The static analysis hint about Param being undefined is a false positive—it's imported via from m5.params import * on line 40.

src/cpu/o3/store_set.hh (3)

35-35: LGTM!

The <unordered_set> include is correctly added to support the new pendingStores member.


107-108: LGTM!

The inline setter is simple and follows the pattern of other configuration toggles in the codebase.


152-160: LGTM!

The pendingStores set and enableStrictWait flag are well-documented and appropriately declared. The default value of true for enableStrictWait matches the Python configuration default.

src/cpu/o3/store_set.cc (10)

57-62: LGTM!

SSITStrict is correctly resized and initialized alongside validSSIT in the constructor.


74-74: LGTM!

pendingStores.clear() ensures a clean state at construction.


118-127: LGTM!

The init() method correctly initializes SSITStrict and clears pendingStores, mirroring the constructor logic for the separate initialization path.


178-183: LGTM!

Resetting SSITStrict to false when establishing new store-set entries is correct—strict mode should only be enabled after repeated violations on the same store set.


275-275: LGTM!

Inserting into pendingStores before the SSIT validity check ensures all stores are tracked for strict-wait purposes, regardless of whether they have a valid store-set entry.


339-347: LGTM!

The strict-wait logic correctly returns all pending stores when the feature is enabled and the instruction is marked strict. This provides the conservative waiting behavior described in the PR.


380-381: LGTM!

Removing the store from pendingStores upon issue correctly maintains the set of outstanding stores.


450-450: LGTM!

SSITStrict is correctly reset alongside validSSIT in clear().


458-458: LGTM!

Clearing pendingStores in clear() ensures the strict-wait tracking is reset along with other predictor state.


422-428: Current implementation is appropriate for gem5's C++17 requirement.

The manual iterator-based erasure loop is the correct pattern for C++17. The std::erase_if suggestion would require C++20, which is not the project's target standard.
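For reference, the C++17 erase-while-iterating pattern mentioned here looks roughly like the sketch below. The function name squashPending and the condition (drop sequence numbers greater than squashed_num) are assumptions for illustration; the PR's actual loop at lines 422-428 is not reproduced on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

using SeqNum = std::uint64_t;

// Remove every pending store younger than the squash point.
// unordered_set::erase(iterator) returns the next valid iterator, which
// keeps the loop safe without the C++20 std::erase_if helper.
void squashPending(std::unordered_set<SeqNum> &pending, SeqNum squashed_num)
{
    for (auto it = pending.begin(); it != pending.end(); ) {
        if (*it > squashed_num)
            it = pending.erase(it);   // advance via the returned iterator
        else
            ++it;
    }
}
```

With C++20 the same effect could be written as a single std::erase_if call, which is why the review notes that the manual loop matches the project's current language standard.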

@github-actions
Copy link

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.1691 | - |
| This PR | 2.1691 | ➡️ 0.0000 (0.00%) |

✅ Difftest smoke test passed!
