
cpu-o3: add 2Fetch features #700

Open
jensen-yan wants to merge 3 commits into xs-dev from 2fetch-align

Conversation

@jensen-yan
Collaborator

@jensen-yan jensen-yan commented Jan 12, 2026

  • Introduced support for 2Fetch features in the branch predictor and fetch stage.
  • Enhanced the FetchTargetQueue to manage the next FTQ entry for 2Fetch functionality.
  • Now, if two fetch blocks fall within the same 64-byte fetch buffer, both can be fetched in the same cycle.
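As a rough illustration (the helper name and constant are hypothetical, not code from this PR), the 64-byte condition amounts to a simple span check:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not code from this PR: two fetch blocks can be
// fetched in one cycle only when the second block's end stays within
// maxFetchBytesPerCycle (64 here) of the first block's start.
constexpr std::uint64_t kMaxFetchBytesPerCycle = 64;

bool fitsInOneFetchCycle(std::uint64_t firstStartPC, std::uint64_t secondEndPC)
{
    return secondEndPC - firstStartPC <= kMaxFetchBytesPerCycle;
}
```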

Change-Id: I3b112cc844c485d81cea1f4ed0ff221cb37d2782

Summary by CodeRabbit

  • New Features

    • Added 2Taken/2Fetch support with new configuration knobs and an example config enabling 2Fetch.
    • Added performance statistics to track 2Fetch attempts and outcomes.
  • Refactor

    • Centralized per-cycle fetch-stop decision to simplify fetch control.
    • In decoupled frontend mode, fetch logic now forces a new fetch entry and invalidates the current buffer when the PC moves outside the buffer.


@coderabbitai

coderabbitai bot commented Jan 12, 2026

📝 Walkthrough

Walkthrough

Introduces a "2Fetch" feature for the decoupled branch predictor (lookahead into the next FTQ entry) and centralizes per-cycle fetch stopping into shouldStopFetchThisCycle(bool predictedBranch). Adds FTQ peek/advance APIs, config knobs, statistics, and conditional fetch-extension logic in the decoupled predictor.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Fetch loop control: src/cpu/o3/fetch.cc, src/cpu/o3/fetch.hh, src/cpu/o3/fetch.md | Replaced multi-condition loop stop predicates with shouldStopFetchThisCycle(bool predictedBranch) and added a decoupled-frontend invalidation path forcing a new FTQ entry when the architectural PC escapes the fetch buffer. |
| Branch predictor config: src/cpu/pred/BranchPredictor.py | Added enable2Taken, enable2Fetch, and maxFetchBytesPerCycle parameters to DecoupledBPUWithBTB. |
| 2Fetch core logic: src/cpu/pred/btb/decoupled_bpred.cc, src/cpu/pred/btb/decoupled_bpred.hh | Added enable2Fetch, has1Fetched, and helpers canExtendToNextFTQ() / extendToNextFTQ(); integrated 2Fetch checks into the decoupledPredict flow to optionally extend to the next FTQ entry instead of consuming the current one. |
| FTQ lookahead API: src/cpu/pred/btb/fetch_target_queue.cc, src/cpu/pred/btb/fetch_target_queue.hh | Added hasNext(), peekNext(), and advance() to inspect/advance to the next FTQ entry without dequeuing. |
| 2Fetch stats: src/cpu/pred/btb/decoupled_bpred_stats.cc, src/cpu/pred/btb/decoupled_bpred.hh | Added six new statistics: fetch2Attempts, fetch2Successes, fetch2SpanTooLarge, fetch2NoNextFTQ, fetch2FirstNotTaken, fetch2FirstNotAtStart. |
| Config example: configs/example/kmhv3.py | Enabled cpu.branchPred.enable2Fetch = True in the example config. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Fetch
    participant BPU as DecoupledBPUWithBTB
    participant FTQQ as FetchTargetQueue
    participant FTQ
    Fetch->>BPU: request prediction / decoupledPredict
    BPU->>FTQQ: check hasNext() / peekNext()
    alt canExtendToNextFTQ == true
        BPU->>FTQQ: extendToNextFTQ()
        FTQQ->>FTQ: advance supply state (no dequeue)
        BPU->>Fetch: return extended PC (2Fetch success)
    else
        BPU->>FTQ: consume current FTQ entry, process completion
        BPU->>Fetch: return normal PC (no extension)
    end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

perf

Suggested reviewers

  • Yakkhini
  • CJ362ff

Poem

🐰 I peek ahead with whiskers twitching bright,
A second fetch to catch the flight,
FTQ whispers what comes next in line,
Two strides forward, one step fine,
Hopping onward, predictor delight! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'cpu-o3: add 2Fetch features' directly and clearly summarizes the main objective of the changeset, which is to introduce 2Fetch functionality across multiple components of the CPU's O3 fetch pipeline. |


📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f95836f and 1222442.

📒 Files selected for processing (2)
  • configs/example/kmhv3.py
  • src/cpu/o3/fetch.cc
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/cpu/o3/fetch.cc
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (1)
configs/example/kmhv3.py (1)

111-112: LGTM!

The configuration correctly enables the 2Fetch feature within the DecoupledBPUWithBTB branch predictor section. The enable2Fetch parameter is properly defined in BranchPredictor.py and the placement alongside other predictor settings is appropriate.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
src/cpu/pred/btb/fetch_target_queue.cc (1)

287-293: Minor: Redundant map lookup in peekNext().

The method performs the same ftq.find(fetchDemandTargetId + 1) lookup twice - once in the assert(hasNext()) call and once explicitly. Consider caching the result or using a single lookup.

♻️ Suggested optimization
 const FtqEntry&
 FetchTargetQueue::peekNext() const
 {
-    assert(hasNext());
-    auto next_it = ftq.find(fetchDemandTargetId + 1);
+    auto next_it = ftq.find(fetchDemandTargetId + 1);
+    assert(next_it != ftq.end() && "peekNext() requires hasNext()");
     return next_it->second;
 }
src/cpu/pred/btb/decoupled_bpred.cc (1)

167-167: Formatting issue: Misaligned closing brace.

The closing brace appears to have lost its indentation from surrounding code.

♻️ Suggested fix
         for (int i = 0; i < numStages; i++) {
             predsOfEachStage[i].btbEntries.clear();
-}
+        }
src/cpu/pred/btb/decoupled_bpred.hh (1)

1013-1027: Consider encapsulation: has1Fetched is internal state that should be private.

The has1Fetched flag is internal state tracking whether the first FTQ has been processed in a 2Fetch cycle. It's currently in the public section along with enable2Fetch and maxFetchBytesPerCycle, but unlike those configuration parameters, has1Fetched is runtime state that shouldn't be accessed externally.

♻️ Suggested reorganization

Move has1Fetched to the private section near other internal state variables, while keeping enable2Fetch and maxFetchBytesPerCycle as public configuration parameters:

+  private:
+    // 2Fetch internal state
+    bool has1Fetched{false};  ///< Whether first FTQ has been fetched in current 2Fetch cycle
+
   public:
     // NEW: 2Fetch support variables
     /**
      * @brief Enable 2fetch capability
      */
     bool enable2Fetch{true};

-    /**
-     * @brief Whether fetched first FTQ
-     */
-    bool has1Fetched{false};
-
     /**
      * @brief Maximum fetch bytes per cycle for 2fetch
      */
     unsigned maxFetchBytesPerCycle{64};
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39cd6af and f95836f.

📒 Files selected for processing (9)
  • src/cpu/o3/fetch.cc
  • src/cpu/o3/fetch.hh
  • src/cpu/o3/fetch.md
  • src/cpu/pred/BranchPredictor.py
  • src/cpu/pred/btb/decoupled_bpred.cc
  • src/cpu/pred/btb/decoupled_bpred.hh
  • src/cpu/pred/btb/decoupled_bpred_stats.cc
  • src/cpu/pred/btb/fetch_target_queue.cc
  • src/cpu/pred/btb/fetch_target_queue.hh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-12T02:49:08.514Z
Learnt from: jensen-yan
Repo: OpenXiangShan/GEM5 PR: 686
File: src/cpu/pred/btb/btb_mgsc.cc:226-230
Timestamp: 2026-01-12T02:49:08.514Z
Learning: In src/cpu/pred/btb/btb_mgsc.cc, the calculateScaledPercsum function returns the unscaled percsum (weight scaling disabled) to align with RTL implementation. The RTL currently does not have a weight table, so setting weight to 1 (equivalent to disabling the weight table) maintains hardware/software consistency with minimal performance impact.

Applied to files:

  • src/cpu/pred/btb/decoupled_bpred.cc
🧬 Code graph analysis (2)
src/cpu/o3/fetch.cc (1)
src/cpu/o3/fetch.hh (2)
  • predictedBranch (522-531)
  • predictedBranch (522-522)
src/cpu/pred/btb/decoupled_bpred.cc (2)
src/cpu/pred/btb/decoupled_bpred.hh (7)
  • pc (1041-1041)
  • target_to_fetch (854-854)
  • current_pc (1037-1037)
  • fetchTargetQueue (510-513)
  • fetchTargetQueue (515-518)
  • fetchTargetQueue (520-523)
  • fetchTargetQueue (524-527)
src/cpu/pred/btb/test/decoupled_bpred.cc (2)
  • processFetchTargetCompletion (343-361)
  • processFetchTargetCompletion (344-344)
🪛 Ruff (0.14.10)
src/cpu/pred/BranchPredictor.py

1185-1185: Param may be undefined, or defined from star imports

(F405)


1186-1186: Param may be undefined, or defined from star imports

(F405)


1187-1187: Param may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.3c)
🔇 Additional comments (12)
src/cpu/o3/fetch.cc (1)

2071-2091: LGTM - Clean refactoring of the fetch loop termination condition.

The replacement of the inline condition with shouldStopFetchThisCycle(predictedBranch) properly centralizes the fetch stopping logic. The method in fetch.hh correctly handles:

  1. waitForVsetvl gating
  2. Decoupled frontend usedUpFetchTargets check
  3. predictedBranch for traditional frontends

This sets a good foundation for future 2Fetch extensions.

src/cpu/pred/BranchPredictor.py (1)

1184-1187: LGTM - New 2Fetch configuration parameters are well-defined.

The new parameters follow gem5 conventions:

  • Features disabled by default (False) ensures backward compatibility
  • maxFetchBytesPerCycle=64 is a sensible default matching typical fetch buffer sizes
  • The static analysis warnings about Param are false positives since it's imported via the star import from m5.params at line 29
src/cpu/o3/fetch.hh (1)

517-531: LGTM - Well-documented centralized fetch stop decision.

The shouldStopFetchThisCycle() method cleanly encapsulates the per-cycle fetch termination logic with:

  • Clear documentation explaining the three cases
  • Proper priority ordering (waitForVsetvl → decoupled frontend → predictedBranch)
  • Inline implementation appropriate for this lightweight decision function

This consolidation improves maintainability and prepares the codebase for 2Fetch extensions.
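The three-way priority described above can be sketched as follows (an illustrative standalone function; the real gem5 method reads member state rather than taking these flags as parameters):

```cpp
#include <cassert>

// Illustrative sketch of the three-way stop decision; the real gem5
// method reads member state instead of taking these flags as parameters.
bool shouldStopFetchThisCycleSketch(bool waitForVsetvl,
                                    bool isDecoupledFrontend,
                                    bool usedUpFetchTargets,
                                    bool predictedBranch)
{
    if (waitForVsetvl)
        return true;                 // 1. vsetvl gating has top priority
    if (isDecoupledFrontend)
        return usedUpFetchTargets;   // 2. decoupled frontend: stop when FTQ runs dry
    return predictedBranch;          // 3. traditional frontend: stop on predicted branch
}
```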

src/cpu/o3/fetch.md (2)

344-353: Documentation accurately reflects the code changes.

The updated documentation correctly shows:

  • The new shouldStopFetchThisCycle(bool predictedBranch) signature
  • The three-way logic (waitForVsetvl → decoupled frontend → predictedBranch)
  • Consistent with the implementation in fetch.hh

This keeps the documentation in sync with the refactored code.


584-586: Code example updated to match implementation.

The fetch loop condition example now correctly uses !shouldStopFetchThisCycle(predictedBranch), maintaining consistency between documentation and actual code.

src/cpu/pred/btb/fetch_target_queue.hh (1)

160-181: Well-designed 2Fetch lookahead API.

The new methods provide a clean interface for 2Fetch traversal:

  • hasNext() / peekNext() / advance() follow a standard lookahead pattern
  • Returning const FtqEntry& from peekNext() prevents unintended modifications
  • peekNext() properly asserts the precondition via assert(hasNext()), ensuring safe use
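The lookahead pattern these bullets describe can be sketched like this (FtqEntry and the map layout are simplified stand-ins, not the real fetch_target_queue types):

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Minimal sketch of the hasNext()/peekNext()/advance() lookahead pattern.
// FtqEntry and the map layout are simplified stand-ins for the real types.
struct FtqEntry { std::uint64_t startPC; std::uint64_t endPC; };

struct MiniFtq {
    std::map<std::uint64_t, FtqEntry> ftq;   // keyed by fetch target id
    std::uint64_t fetchDemandTargetId = 0;

    bool hasNext() const {
        return ftq.find(fetchDemandTargetId + 1) != ftq.end();
    }
    const FtqEntry &peekNext() const {       // inspect without dequeuing
        auto it = ftq.find(fetchDemandTargetId + 1);
        assert(it != ftq.end() && "peekNext() requires hasNext()");
        return it->second;
    }
    void advance() { ++fetchDemandTargetId; }
};
```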
src/cpu/pred/btb/decoupled_bpred_stats.cc (1)

508-514: LGTM! Statistics for 2Fetch feature are well-structured.

The new statistics cover all decision paths in the 2Fetch logic: attempts, successes, and specific rejection reasons. This provides good observability for tuning and debugging the feature.

src/cpu/pred/btb/fetch_target_queue.cc (1)

301-319: LGTM! advance() correctly updates supply state for 2Fetch.

The method properly updates supplyFetchTargetState to point to the current demand target (which has already been incremented by finishCurrentFetchTarget()). The fallback to invalidate supply state when the entry doesn't exist is appropriate.

src/cpu/pred/btb/decoupled_bpred.cc (2)

1401-1446: LGTM! canExtendToNextFTQ has well-structured validation.

The method correctly validates all preconditions for 2Fetch extension:

  1. First block must end with a taken branch (to jump within the fetch buffer)
  2. Next FTQ entry must be available
  3. Current PC must match next FTQ's start (ensuring continuity)
  4. Total span must fit within maxFetchBytesPerCycle

The early exit checks are ordered efficiently and each rejection path records a specific statistic for observability.
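The four checks can be sketched in isolation as follows (FtqView is a hypothetical flattened view of the two FTQ entries involved, not the PR's actual types):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the four validation steps listed above; FtqView is a
// hypothetical flattened view of the two FTQ entries involved.
struct FtqView {
    bool firstEndsTaken;          // 1. first block ends with a taken branch
    bool hasNextEntry;            // 2. a next FTQ entry exists
    std::uint64_t currentPC;      // 3. must equal nextStartPC for continuity
    std::uint64_t nextStartPC;
    std::uint64_t firstStartPC;   // 4. span = nextEndPC - firstStartPC
    std::uint64_t nextEndPC;
};

bool canExtendToNextFTQSketch(const FtqView &v, std::uint64_t maxFetchBytesPerCycle)
{
    if (!v.firstEndsTaken) return false;             // fetch2FirstNotTaken
    if (!v.hasNextEntry) return false;               // fetch2NoNextFTQ
    if (v.currentPC != v.nextStartPC) return false;  // continuity broken
    return v.nextEndPC - v.firstStartPC <= maxFetchBytesPerCycle;  // else fetch2SpanTooLarge
}
```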


1456-1477: LGTM! extendToNextFTQ correctly sets up for processing the second FTQ entry.

The method properly:

  1. Advances to the next FTQ entry via fetchTargetQueue.advance()
  2. Resets the instruction counter for the new FTQ entry
  3. Updates the PC to the start of the new FTQ entry
  4. Records the successful 2Fetch metric
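Those steps can be bundled into a single helper, sketched here with MiniState as a hypothetical stand-in for the predictor fields the review mentions:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the steps above; MiniState is a hypothetical stand-in
// for the predictor fields the review mentions.
struct MiniState {
    std::uint64_t demandTargetId = 0;  // advanced by fetchTargetQueue.advance()
    unsigned fetchedInsts = 7;         // per-entry instruction counter
    std::uint64_t pc = 0x123;          // current fetch PC
};

void extendToNextFTQSketch(MiniState &s, std::uint64_t nextStartPC)
{
    ++s.demandTargetId;   // 1. advance to the next FTQ entry
    s.fetchedInsts = 0;   // 2. reset the instruction counter
    s.pc = nextStartPC;   // 3. redirect PC to the new entry's start
}
```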
src/cpu/pred/btb/decoupled_bpred.hh (2)

383-389: LGTM! Statistics declarations are well-documented.

The statistics cover all decision paths in the 2Fetch logic with clear documentation comments explaining each counter's purpose.


1029-1041: LGTM! Method declarations are appropriate.

The method signatures correctly use const references where modification isn't needed and non-const reference for the PC that gets updated.

Comment on lines +443 to 457
    // Check if 2fetch is enabled, not fetched first FTQ yet, and if we can extend to the next FTQ
    // NEW: 2Fetch extension check - before processing completion
    dbpBtbStats.fetch2Attempts++;
    if (enable2Fetch && !has1Fetched && canExtendToNextFTQ(pc, target_to_fetch)) {
        DPRINTF(DecoupleBP, "2Fetch: extending to next FTQ in same cycle\n");
        has1Fetched = true;
        processFetchTargetCompletion(target_to_fetch);
        extendToNextFTQ(pc);
        // first fetchBlock is always taken, do not run out of FTQ now
        return std::make_pair(true, false);
    }

    processFetchTargetCompletion(target_to_fetch);
    has1Fetched = false; // reset 2fetch flag
}

⚠️ Potential issue | 🟡 Minor

fetch2Attempts is incremented even when 2Fetch is disabled or already used.

The counter fetch2Attempts is incremented unconditionally at line 445, before checking enable2Fetch and !has1Fetched. This means:

  1. Attempts are counted even when enable2Fetch is false
  2. The second FTQ completion in a 2Fetch cycle (when has1Fetched=true) also counts as an attempt

Consider moving the increment inside the condition, or after the early-exit checks in canExtendToNextFTQ.

🐛 Suggested fix
     if (run_out_of_this_entry) {
         // Check if 2fetch is enabled, not fetched first FTQ yet, and if we can extend to the next FTQ
         // NEW: 2Fetch extension check - before processing completion
-        dbpBtbStats.fetch2Attempts++;
-        if (enable2Fetch && !has1Fetched && canExtendToNextFTQ(pc, target_to_fetch)) {
+        if (enable2Fetch && !has1Fetched) {
+            dbpBtbStats.fetch2Attempts++;
+            if (canExtendToNextFTQ(pc, target_to_fetch)) {
             DPRINTF(DecoupleBP, "2Fetch: extending to next FTQ in same cycle\n");
             has1Fetched = true;
             processFetchTargetCompletion(target_to_fetch);
             extendToNextFTQ(pc);
             // first fetchBlock is always taken, do not run out of FTQ now
             return std::make_pair(true, false);
+            }
         }

         processFetchTargetCompletion(target_to_fetch);

@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
| --- | --- | --- |
| Base (xs-dev) | 2.1691 | - |
| This PR | 2.1691 | ➡️ 0.0000 (0.00%) |

✅ Difftest smoke test passed!

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f95836f92e


Comment on lines +1435 to +1439
Addr span = next_ftq.endPC - current_ftq.startPC;
if (span > maxFetchBytesPerCycle) {
    DPRINTF(DecoupleBP, "2Fetch rejected: span %d exceeds %d bytes\n",
            span, maxFetchBytesPerCycle);
    dbpBtbStats.fetch2SpanTooLarge++;


P2: Gate 2fetch extension by actual fetch-buffer span

The 2fetch gate only checks next_ftq.endPC - current_ftq.startPC <= maxFetchBytesPerCycle, but it doesn’t guarantee that the next FTQ entry actually lies within the current fetch buffer. If maxFetchBytesPerCycle is configured larger than fetchBufferSize (or the next entry crosses the buffer), extendToNextFTQ returns (true, false) and leaves usedUpFetchTargets false; checkMemoryNeeds() will then stall because the PC is outside the buffer, yet needNewFTQEntry() won’t issue a new cache request, so fetch can hang. Consider clamping the span check to the fetch-buffer size or invalidating the buffer when extension spans beyond it.
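One possible direction for the suggested fix, sketched here purely as an illustration (names are hypothetical): clamp the span limit to the fetch buffer size before the comparison, so an extension can never reach bytes outside the buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative only: clamp the 2Fetch span limit so an extension can
// never reach bytes outside the current fetch buffer, even when
// maxFetchBytesPerCycle is configured larger than fetchBufferSize.
std::uint64_t effective2FetchLimit(std::uint64_t maxFetchBytesPerCycle,
                                   std::uint64_t fetchBufferSize)
{
    return std::min(maxFetchBytesPerCycle, fetchBufferSize);
}
```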


- Added handling in the fetch stage to invalidate the fetch buffer and force a new FTQ entry when the architectural fetch PC exceeds the buffered FTQ window in decoupled-frontend mode. This prevents potential deadlocks during ICache stalls.

Change-Id: I38a6003e136a51c115f33358a62138dc987336da
Change-Id: Iaa4a853508ae80e412a8b3bdd41dea26d9eb8c29
@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
| --- | --- | --- |
| Base (xs-dev) | 2.1691 | - |
| This PR | 2.1691 | ➡️ 0.0000 (0.00%) |

✅ Difftest smoke test passed!

@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
| --- | --- | --- |
| Base (xs-dev) | 2.1691 | - |
| This PR | 2.1617 | 📉 -0.0074 (-0.34%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 1222442
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

| | PR | Master | Diff(%) |
| --- | --- | --- | --- |
| Score | 17.54 | 17.52 | +0.12 🟢 |

[Generated by GEM5 Performance Robot]
commit: 1222442
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

| | PR | Previous Commit | Diff(%) |
| --- | --- | --- | --- |
| Score | 17.54 | 17.52 | +0.12 🟢 |
