
Kmh smt #745

Open
tastynoob wants to merge 2 commits into xs-dev from kmh-smt

Conversation

@tastynoob
Collaborator

@tastynoob tastynoob commented Jan 29, 2026

Do not merge; this PR cannot build successfully yet.

Summary by CodeRabbit

  • Refactor
    • Made fetch and branch-prediction thread-aware with per-thread state and pipelines for improved multi-threaded handling.
    • Reworked fetch-target queue to manage per-thread queues with round-robin selection for fairer fetch scheduling.
    • Updated commit and tracing paths to use thread-local target access, improving per-thread statistics and trace accuracy.

@tastynoob tastynoob requested a review from jensen-yan January 29, 2026 09:44
@coderabbitai

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough


Refactors BTB/predictor and O3 fetch to be thread-aware: introduces a per-thread FetchTargetQueue and per-thread predictor/fetch state, replaces FTQ "head" APIs with "fetching"/target-based APIs, renames predictor update() to commit(), and updates fetch/tracing and cache-request handling to use per-thread fetch buffers.

Changes

Cohort / File(s) — Summary

  • O3 Fetch (per-thread buffers) — src/cpu/o3/fetch.cc, src/cpu/o3/fetch.hh, src/cpu/o3/trace/TraceFetch.cc
    Replace global/per-core fetch state with a per-thread threads[tid] FetchBuffer (fetchpc, data, valid, startPC, cacheReq). Use per-thread cacheReq, fetchpc, startPC, and valid; replace fetchBuffer[] and pc[] uses with threads[tid] members; update fetch-address, multi-cacheline fetch, and commit/update callsites to per-thread signatures.
  • BTB Predictor — decoupled (thread-aware) — src/cpu/pred/btb/decoupled_bpred.hh, src/cpu/pred/btb/decoupled_bpred.cc, src/cpu/pred/btb/decoupled_bpred_stats.cc
    Migrate predictor internals to per-thread state and per-thread FTQ access: add ThreadID parameters to many methods, introduce threads[MaxThreads] storage for histories/preds, replace getTarget/ftqHeadId usage with ftq.get(..., tid)/ftqFetchingTarget(tid), rename update() to commit(), and adapt squash/resolve/prepare flows per thread.
  • Fetch Target Queue (FTQ) implementation — src/cpu/pred/btb/ftq.hh, src/cpu/pred/btb/ftq.cc
    Add a new FetchTargetQueue class with per-thread queues, a per-thread ftqSize, round-robin getTargetTid(), and implementations of insert, finishTarget, commitTarget, and squashAfter operating on per-thread queues.
  • BTB data structures — src/cpu/pred/btb/common.hh
    Add ThreadID tid to FullBTBPrediction so predictions carry thread context.
  • Build integration — src/cpu/pred/SConscript
    Add btb/ftq.cc to the build sources so the FTQ implementation is compiled in.

Sequence Diagram(s)

mermaid
sequenceDiagram
    participant Fetch as O3 Fetch
    participant FTQ as FetchTargetQueue
    participant BPU as DecoupledBPUWithBTB
    participant BTB as BTB Predictor
    participant Cache as CacheSubsystem
    Fetch->>FTQ: query fetching target (ftqHasFetching(tid) / ftqFetchingTarget(tid))
    FTQ-->>Fetch: return FetchTarget (per-thread)
    Fetch->>Cache: prepare/issue cacheReq (threads[tid].cacheReq)
    Cache-->>Fetch: deliver fetch data -> threads[tid].data / valid
    Fetch->>BPU: hand off FetchTarget / fetchpc (tid)
    BPU->>BTB: requestNewPrediction / processNewPrediction (tid)
    BTB-->>BPU: prediction (per-thread FullBTBPrediction)
    BPU->>FTQ: commitTarget / finishTarget / squashAfter (tid)
    BPU->>BTB: commit(pred_id, tid) -- previously update()

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

perf

Suggested reviewers

  • jensen-yan
  • Yakkhini

Poem

🐇 I hopped between targets, thread by thread,
I carried PCs where old heads once led;
Buffers snug in burrows, caches quick and neat,
Fetches per-rabbit, predictions all meet;
A tiny clap — commits now bound and fed.

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 20.62%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title check — ❓ Inconclusive: the title "Kmh smt" is vague and does not clearly convey the changeset's purpose or scope. Resolution: use a more descriptive title, such as "Add SMT support to O3 CPU fetch stage".
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/cpu/pred/btb/common.hh (1)

452-484: Initialize the new tid member in the default constructor.

The newly added tid field is not initialized in the default constructor, which could lead to undefined behavior if read before being explicitly set. Other members like bbStart are initialized to 0.

🛡️ Proposed fix
     FullBTBPrediction() :
+        tid(0),
         bbStart(0),
         btbEntries(),
src/cpu/o3/fetch.cc (1)

632-638: Pass the tid parameter to per-thread FTQ accessors.

The methods ftqHasFetching() and ftqFetchingTarget() in decoupled_bpred.hh require a ThreadID tid parameter (lines 404-406), but these calls omit it. The variable tid is available in all these contexts, so update the calls to pass it:

dbpbtb->ftqHasFetching(tid)
dbpbtb->ftqFetchingTarget(tid)

This same pattern appears in multiple locations: line 632-633, 785-786, 1760, and 2054 in fetch.cc, as well as line 971 in fetch.hh.

🤖 Fix all issues with AI agents
In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Line 381: Remove the stray extra semicolon after the assignment to s0PC:
change the line using threads[tid].finalPred.getTarget(predictWidth);; to a
single semicolon so it becomes threads[tid].finalPred.getTarget(predictWidth); —
ensure this occurs where s0PC is assigned (references: s0PC, threads, finalPred,
getTarget, predictWidth) and rebuild to confirm no warnings.
- Line 757: The entry's thread id is incorrectly hardcoded to 0; change the
assignment from setting entry.tid = 0 to using the function parameter (entry.tid
= tid) so the entry records the correct thread; this ensures subsequent calls
that use entry.tid (fillAheadPipeline and updateHistoryForPrediction) access the
correct per-thread state rather than always thread 0.

In `@src/cpu/pred/btb/decoupled_bpred.hh`:
- Line 150: scheduleThread() currently always returns 0 which forces
single-threaded behavior; update the scheduleThread implementation (declared in
decoupled_bpred.hh) to compute and return the correct ThreadID based on the
branch predictor's thread scheduling state (e.g., using the class's
last-scheduled thread counter, a thread queue, or the CPU/ThreadContext APIs
already used elsewhere in decoupled_bpred.cc TODO), making it multi-thread aware
and thread-safe rather than hard-coding 0.

In `@src/cpu/pred/btb/ftq.cc`:
- Around line 40-45: Add a guard/assertion in FetchTargetQueue::commitTarget to
ensure you never call queue[tid].cap.pop_front() on an empty deque: check that
queue[tid].cap is not empty (e.g., assert(!queue[tid].cap.empty()) or an if with
an error/log) before calling pop_front() and only increment
queue[tid].baseTargetId when the pop actually happens; reference the function
FetchTargetQueue::commitTarget and the members queue[tid].cap and
queue[tid].baseTargetId when adding this check.

In `@src/cpu/pred/btb/ftq.hh`:
- Line 46: The backId(ThreadID tid) method can underflow when
queue[tid].cap.size() is 0 because it computes baseTargetId + 0 - 1; update
backId in src/cpu/pred/btb/ftq.hh to guard against empty queues by checking
queue[tid].cap.empty(): either assert/throw a clear precondition (e.g.,
assert(!queue[tid].cap.empty()) or throw std::out_of_range) or return a defined
invalid/sentinel FetchTargetId value instead; reference the backId function,
queue[tid].baseTargetId and queue[tid].cap.size()/cap.empty() when making the
change.
🧹 Nitpick comments (3)
src/cpu/pred/btb/ftq.hh (1)

62-68: Consider renaming anyEmpty() to allEmpty() for clarity.

The current implementation returns true only when ALL queues are empty (i.e., no queue has any targets). The name anyEmpty() suggests it would return true if ANY single queue is empty, which is the opposite behavior. Consider renaming to allEmpty() to match the semantics.

src/cpu/pred/btb/decoupled_bpred.hh (1)

70-78: Hardcoded numThreads = 2 should be parameterized.

The number of threads should be configurable via params rather than hardcoded, especially since MaxThreads is defined as a limit. Also consider removing the commented-out code rather than leaving it in the codebase.

♻️ Suggested change
-    // unsigned fetchTargetQueueSize;
-    // std::deque<FetchTarget> fetchTargetQueue;
-    // FetchTargetId fetchTargetBaseId{1}; // ID of fetchTargetQueue.front()
-    // FetchTargetId ftqId{1}; // next FSQ id to allocate (monotonic)
-    // FetchTargetId fetchHeadFtqId{1}; // next FSQ id to be consumed by fetch

     CPU *cpu;

-    const int numThreads = 2;
+    const int numThreads;  // Set from params in constructor
src/cpu/pred/btb/decoupled_bpred.cc (1)

27-30: Hardcoded thread ID needs to be addressed for SMT support.

The tid = 0 is hardcoded with a TODO comment. For proper SMT support, this function should receive ThreadID as a parameter, similar to other refactored functions in this PR.

FetchTarget entry = createFetchTargetEntry(tid);

// 2. Update global PC state to target or fall-through
s0PC = threads[tid].finalPred.getTarget(predictWidth);;

⚠️ Potential issue | 🟡 Minor

Double semicolon.

Minor typo: there's an extra semicolon at the end of this line.

🧹 Proposed fix
-    s0PC = threads[tid].finalPred.getTarget(predictWidth);;
+    s0PC = threads[tid].finalPred.getTarget(predictWidth);

Comment on lines +40 to +45
void
FetchTargetQueue::commitTarget(ThreadID tid)
{
    queue[tid].cap.pop_front();
    queue[tid].baseTargetId++;
}

⚠️ Potential issue | 🟡 Minor

Add assertion to guard against committing from an empty queue.

Calling pop_front() on an empty deque is undefined behavior. Add a guard to prevent this:

🛡️ Proposed fix
 void
 FetchTargetQueue::commitTarget(ThreadID tid)
 {
+    assert(!empty(tid) && "Cannot commit from empty queue");
     queue[tid].cap.pop_front();
     queue[tid].baseTargetId++;
 }



inline FetchTargetId frontId(ThreadID tid) const { return queue[tid].baseTargetId; }
inline FetchTargetId backId(ThreadID tid) const { return queue[tid].baseTargetId + queue[tid].cap.size() - 1; }

⚠️ Potential issue | 🟡 Minor

backId() underflows on empty queue.

When cap.size() is 0, this computes baseTargetId + 0 - 1, which underflows the unsigned FetchTargetId. Consider adding a precondition or handling the empty case:

🛡️ Proposed fix
-    inline FetchTargetId backId(ThreadID tid) const { return queue[tid].baseTargetId + queue[tid].cap.size() - 1; }
+    inline FetchTargetId backId(ThreadID tid) const {
+        assert(!empty(tid) && "backId() called on empty queue");
+        return queue[tid].baseTargetId + queue[tid].cap.size() - 1;
+    }


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cpu/pred/btb/common.hh (1)

452-485: ⚠️ Potential issue | 🟠 Major

Initialize FullBTBPrediction::tid to avoid indeterminate values.

Line 454 adds tid but the constructor doesn’t set it, so it can carry garbage until assigned. This is risky if any code reads it before initialization.

🛠 Proposed fix
-    FullBTBPrediction() :
-        bbStart(0),
+    FullBTBPrediction() :
+        tid(InvalidThreadID),
+        bbStart(0),
         btbEntries(),
         condTakens(),
         indirectTargets(),
         returnTarget(0),
         tageInfoForMgscs(),
         predSource(0),
         predTick(0),
         s1Source(-1),
         s3Source(-1) {}
🤖 Fix all issues with AI agents
In `@src/cpu/o3/fetch.cc`:
- Around line 783-787: The code computes ThreadID tid from inst->threadNumber
but then uses a hard-coded thread 0 for FTQ operations; change calls to use tid
instead of 0 so SMT threads access the correct FTQ: replace
dbpbtb->ftqHasFetching(0) with dbpbtb->ftqHasFetching(tid) and
dbpbtb->ftqFetchingTarget(0) with dbpbtb->ftqFetchingTarget(tid) (ensure tid is
the same ThreadID variable computed earlier).

In `@src/cpu/o3/fetch.hh`:
- Around line 969-972: The ftqEmpty() helper currently calls
dbpbtb->ftqHasFetching(0) with a hard-coded thread 0 which breaks SMT; change
ftqEmpty to accept a ThreadID (or tid) parameter and forward that to
dbpbtb->ftqHasFetching(tid) so the check is per-thread, and then update all call
sites in fetch.cc that call ftqEmpty() to pass the current thread id (e.g., the
local/current tid variable used in fetch code) so SMT fetch uses the correct FTQ
state.

In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Line 50: The FTQ is being constructed with a hardcoded thread count (ftq(2,
p.ftq_size)); change this to use the configured thread count variable
(numThreads) instead so the FTQ is initialized consistently (e.g.,
ftq(numThreads, p.ftq_size)). Locate the FTQ construction in the decoupled_bpred
constructor/initializer list where ftq(...) is used and replace the literal 2
with the numThreads identifier so the FTQ respects the current thread
configuration.
- Around line 635-640: The method DecoupledBPUWithBTB::blockPredictionOnce
currently hardcodes threads[0].blockPredictionPending = true which breaks SMT;
change the API to accept a ThreadID parameter (e.g., ThreadID tid) and set
threads[tid].blockPredictionPending = true, validating tid is in range, or
alternatively iterate all threads and set threads[i].blockPredictionPending =
true to block everyone; update all call sites of blockPredictionOnce() to pass
the ThreadID (or rely on the new all-threads behavior) and add bounds checks
where appropriate.

In `@src/cpu/pred/btb/ftq.hh`:
- Around line 62-68: The method anyEmpty() currently returns true only when
every queue is empty (it checks queue[i].cap.empty() for all threads), so rename
the function to allEmpty() to match its behavior (and update all callers/tests),
or alternatively change its implementation to return true as soon as any
queue[i].cap.empty() is found if you prefer the original anyEmpty name; locate
the symbol anyEmpty(), the loop over numThreads, and the queue[].cap.empty()
check to make the corrective change and adjust usages accordingly.
🧹 Nitpick comments (5)
src/cpu/pred/btb/decoupled_bpred_stats.cc (2)

801-808: Avoid copying FetchTarget here; use a const reference.

ftq.get(...) returns a FetchTarget. Copying it can be expensive and can desync if callers expect the live entry.

♻️ Proposed refactor
-    auto entry = ftq.get(inst->ftqId, inst->threadNumber);
+    const auto &entry = ftq.get(inst->ftqId, inst->threadNumber);

913-924: Reduce double lookup by caching the FTQ entry reference.

You call ftq.get(...) twice; caching it avoids repeated lookups and keeps logging consistent with the increment.

♻️ Proposed refactor
-    ftq.get(inst->ftqId, inst->threadNumber).commitInstNum++;
+    auto &entry = ftq.get(inst->ftqId, inst->threadNumber);
+    entry.commitInstNum++;
     ...
-            ftq.get(inst->ftqId, inst->threadNumber).commitInstNum);
+            entry.commitInstNum);
src/cpu/pred/btb/ftq.hh (1)

45-61: Missing bounds validation on tid parameter.

All accessor methods use tid to index queue[tid] without validation. If tid >= numThreads, the code accesses uninitialized state (or causes out-of-bounds access if tid >= MaxThreads). Consider adding debug assertions.

🛡️ Proposed fix example
     inline FetchTargetId frontId(ThreadID tid) const {
+        assert(tid < numThreads && "Invalid thread ID");
         return queue[tid].baseTargetId;
     }

Apply similar checks to other methods or create a helper macro/inline function.

src/cpu/pred/btb/decoupled_bpred.cc (1)

24-31: Hardcoded tid = 0 limits multi-threading support.

The function doesn't accept ThreadID as a parameter and uses a hardcoded value. This should be addressed when completing SMT support.

♻️ Suggested signature change
 void
-DecoupledBPUWithBTB::consumeFetchTarget(unsigned fetched_inst_num)
+DecoupledBPUWithBTB::consumeFetchTarget(unsigned fetched_inst_num, ThreadID tid)
 {
-    ThreadID tid = 0; // TODO: multi-threading
     ftq.fetching(tid).fetchInstNum = fetched_inst_num;
     ftq.finishTarget(tid);
 }
src/cpu/pred/btb/decoupled_bpred.hh (1)

78-78: Hardcoded numThreads should be configurable.

numThreads is hardcoded to 2 instead of being derived from simulation parameters. This limits flexibility and could cause inconsistencies if changed in one place but not others (e.g., ftq(2, ...) in the constructor).

♻️ Consider parameterizing
-    const int numThreads = 2;
+    const int numThreads;  // Initialize from params in constructor

Then in constructor initialization list:

numThreads(p.numThreads),

// uras(p.uras),
bpDBSwitches(p.bpDBSwitches),
numStages(p.numStages),
ftq(2, p.ftq_size),

🛠️ Refactor suggestion | 🟠 Major

Hardcoded thread count in FTQ initialization.

The FTQ is initialized with ftq(2, p.ftq_size) but should use numThreads for consistency and maintainability.

♻️ Proposed fix
-      ftq(2, p.ftq_size),
+      ftq(numThreads, p.ftq_size),

Comment on lines 635 to 640
void
DecoupledBPUWithBTB::blockPredictionOnce()
{
    blockPredictionPending = true;
    // smtTODO
    threads[0].blockPredictionPending = true;
}

🛠️ Refactor suggestion | 🟠 Major

blockPredictionOnce() only affects thread 0.

This hardcodes thread 0, breaking SMT support. The function should accept ThreadID parameter or block all threads.

♻️ Proposed fix
 void
-DecoupledBPUWithBTB::blockPredictionOnce()
+DecoupledBPUWithBTB::blockPredictionOnce(ThreadID tid)
 {
-    // smtTODO
-    threads[0].blockPredictionPending = true;
+    threads[tid].blockPredictionPending = true;
 }

Comment on lines +62 to +68
inline bool anyEmpty() const {
    for (uint32_t i = 0; i < numThreads; ++i) {
        if (!queue[i].cap.empty())
            return false;
    }
    return true;
}

⚠️ Potential issue | 🟡 Minor

Misleading method name: anyEmpty() actually checks if ALL queues are empty.

The method returns true only when every queue is empty, which is semantically allEmpty(). The current name suggests it returns true if at least one queue is empty.

♻️ Proposed fix
-    inline bool anyEmpty() const {
+    inline bool allEmpty() const {
         for (uint32_t i = 0; i < numThreads; ++i) {
             if (!queue[i].cap.empty())
                 return false;
         }
         return true;
     }

@github-actions

🚀 Coremark Smoke Test Results

Branch          IPC      Change
Base (xs-dev)   2.1727   -
This PR         2.1907   📈 +0.0181 (+0.83%)

✅ Difftest smoke test passed!

Change-Id: Ie21dbd89df9c1d5d80106d71ae4f9b51415be897

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/cpu/o3/fetch.cc`:
- Around line 459-461: The BPU seeding uses thread 0's PC only
(dbpbtb->resetPC(threads[0].fetchpc->instAddr())), which mis-initializes SMT
threads with different start PCs; update the reset logic to either add/use a
per-thread reset API (e.g., a resetPC method that accepts a thread id or
per-thread predictor instance) or iterate over all threads and call
dbpbtb->resetPC(threads[i].fetchpc->instAddr()) for each thread so every
thread's fetchpc is used to seed the predictor.
- Around line 1537-1545: handleIEWSignals is using iewInfo[0] and passing a
hardcoded 0 to dbpbtb instead of the current thread; update the resolve-update
path to accept and use the ThreadID from checkSignalsAndUpdate(ThreadID tid) by
adding a tid parameter to handleIEWSignals, read resolved CFIs from
fromIEW->iewInfo[tid].resolvedCFIs, and replace the hardcoded 0 with tid in the
dbpbtb calls (prepareResolveUpdateEntries, markCFIResolved, resolveUpdate); also
consider adding ThreadID into ResolvedCFIEntry if you need to preserve thread
context per-entry.

In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Around line 303-305: The current check uses ftq.backId(tid) and ftq.back(tid)
which underflow/are invalid when the FTQ is empty; change the guard to test
emptiness (e.g. !ftq.empty(tid) or ftq.size(tid) > 0) before calling
ftq.backId/tid or ftq.back(tid). Update the conditional around abtb->isEnabled()
to first ensure the FTQ has entries, then obtain previous_block_startpc from
ftq.back(tid) and call abtb->updateUsingS3Pred(...).
🧹 Nitpick comments (1)
src/cpu/o3/fetch.hh (1)

851-875: Clarify ownership of FetchBuffer::data to avoid leaks.

data is allocated with new[] in Fetch but there’s no corresponding cleanup. Consider explicit deletion in Fetch::~Fetch() (or switch to RAII).

🧹 Suggested fix (explicit cleanup in Fetch::~Fetch)
-Fetch::~Fetch() = default;
+Fetch::~Fetch()
+{
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        delete[] threads[tid].data;
+        threads[tid].data = nullptr;
+    }
+}

Comment on lines 459 to 461
    assert(dbpbtb);
    dbpbtb->resetPC(pc[0]->instAddr());
    dbpbtb->resetPC(threads[0].fetchpc->instAddr());
}

⚠️ Potential issue | 🟠 Major

SMT init: predictor PC seeded from thread 0 only.

Line 460 seeds the BPU using thread 0’s PC; other threads with distinct start PCs will be mis-initialized. Consider adding a per-thread resetPC API or looping per thread during reset.


Comment on lines 1537 to 1545
if (!resolveQueue.empty()) {
    auto &entry = resolveQueue.front();
    unsigned int stream_id = entry.resolvedFTQId;
    dbpbtb->prepareResolveUpdateEntries(stream_id);
    dbpbtb->prepareResolveUpdateEntries(stream_id, 0);
    for (const auto resolvedInstPC : entry.resolvedInstPC) {
        dbpbtb->markCFIResolved(stream_id, resolvedInstPC);
        dbpbtb->markCFIResolved(stream_id, resolvedInstPC, 0);
    }
    bool success = dbpbtb->resolveUpdate(stream_id);
    bool success = dbpbtb->resolveUpdate(stream_id, 0);
    if (success) {

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "resolvedCFIs" src/cpu/o3 -g'*.hh' -g'*.cc'

Repository: OpenXiangShan/GEM5

Length of output: 471


🏁 Script executed:

rg -n "struct.*Resolved|Resolved.*CFI|ResolvedCFI" src/cpu/o3 -g'*.hh'

Repository: OpenXiangShan/GEM5

Length of output: 198


🏁 Script executed:

# Also check the FetchBufferEntry or similar structure that contains resolvedCFIs
rg -n "resolvedInstPC\|resolvedCFIs" src/cpu/o3 -B 5 -A 5 -g'*.hh'

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

cat -n src/cpu/o3/comm.hh | sed -n '240,270p'

Repository: OpenXiangShan/GEM5

Length of output: 1155


🏁 Script executed:

rg -n "struct.*IEWInfo|IEWInfo.*{" src/cpu/o3 -A 20 -g'*.hh'

Repository: OpenXiangShan/GEM5

Length of output: 177


🏁 Script executed:

cat -n src/cpu/o3/fetch.cc | sed -n '1495,1550p'

Repository: OpenXiangShan/GEM5

Length of output: 2346


🏁 Script executed:

rg -n "fromIEW\s*=" src/cpu/o3/fetch.cc -B 3 -A 3

Repository: OpenXiangShan/GEM5

Length of output: 396


🏁 Script executed:

rg -n "class Fetch" src/cpu/o3/fetch.hh -A 30 | head -50

Repository: OpenXiangShan/GEM5

Length of output: 1498


🏁 Script executed:

rg -n "handleIEWSignals\|handleIEWSignals\(" src/cpu/o3 -g'*.hh' -g'*.cc' -B 2 -A 2

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

rg -n "void handleIEWSignals|void.*handleIEW" src/cpu/o3/fetch.hh -A 2

Repository: OpenXiangShan/GEM5

Length of output: 142


🏁 Script executed:

# Search for where handleIEWSignals is called
rg -n "handleIEWSignals" src/cpu/o3/fetch.cc -B 5 -A 2

Repository: OpenXiangShan/GEM5

Length of output: 488


🏁 Script executed:

# Check if Fetch class has tid or thread-specific members
rg -n "tid\|threadId\|numThreads" src/cpu/o3/fetch.hh | head -20

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

# Look at the full context around line 1496 to see if there's thread context
cat -n src/cpu/o3/fetch.cc | sed -n '1480,1510p'

Repository: OpenXiangShan/GEM5

Length of output: 1243


🏁 Script executed:

# Find the function that contains line 1452 where handleIEWSignals is called
cat -n src/cpu/o3/fetch.cc | sed -n '1420,1460p'

Repository: OpenXiangShan/GEM5

Length of output: 1360


🏁 Script executed:

# Also check the function signature
rg -n "bool.*tid\|void.*tid" src/cpu/o3/fetch.cc | grep -E "14[0-9]{2}|13[0-9]{2}|11[0-9]{2}" | head -5

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

# Search backwards from line 1450 to find function definition
ast-grep --pattern 'void $_($_) { $$$ if (fetchStatus[tid] == WaitingCache && threads[tid].cacheReq.getOverallStatus() == AccessComplete) $$$ }'

Repository: OpenXiangShan/GEM5

Length of output: 44


Wire correct thread ID through resolve-update path.

handleIEWSignals() is called from checkSignalsAndUpdate(ThreadID tid) (line 1436) but receives no tid parameter. It then accesses fromIEW->iewInfo->resolvedCFIs without indexing by thread (line 1503), reading from iewInfo[0] implicitly instead of iewInfo[tid]. The subsequent dbpbtb calls hardcode tid=0 (lines 1540, 1542, 1544) instead of using the actual thread ID. This causes the resolve-update path to process the wrong thread's resolved CFIs and update the wrong thread's predictor state in multi-threaded execution.

Pass tid to handleIEWSignals(), access fromIEW->iewInfo[tid].resolvedCFIs, and use tid instead of 0 in the dbpbtb calls. Additionally, verify whether ResolvedCFIEntry should include ThreadID to preserve thread context at the entry level.

🤖 Prompt for AI Agents
In `@src/cpu/o3/fetch.cc` around lines 1537 - 1545, handleIEWSignals is using
iewInfo[0] and passing a hardcoded 0 to dbpbtb instead of the current thread;
update the resolve-update path to accept and use the ThreadID from
checkSignalsAndUpdate(ThreadID tid) by adding a tid parameter to
handleIEWSignals, read resolved CFIs from fromIEW->iewInfo[tid].resolvedCFIs,
and replace the hardcoded 0 with tid in the dbpbtb calls
(prepareResolveUpdateEntries, markCFIResolved, resolveUpdate); also consider
adding ThreadID into ResolvedCFIEntry if you need to preserve thread context
per-entry.

Comment on lines +303 to 305
if (abtb->isEnabled() && ftq.backId(tid)) {
auto previous_block_startpc = ftq.back(tid).startPC;
abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], previous_block_startpc);

⚠️ Potential issue | 🟠 Major

Avoid backId()/back() when the FTQ is empty.

ftq.backId(tid) underflows on empty queues, and ftq.back(tid) will be invalid. Guard on emptiness instead.

🛠️ Proposed fix
-        if (abtb->isEnabled() && ftq.backId(tid)) {
+        if (abtb->isEnabled() && !ftq.empty(tid)) {
             auto previous_block_startpc = ftq.back(tid).startPC;
             abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], previous_block_startpc);
         } else if (abtb->isEnabled()) {
             abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], 0);
         }
🤖 Prompt for AI Agents
In `@src/cpu/pred/btb/decoupled_bpred.cc` around lines 303 - 305, the current
check uses ftq.backId(tid) and ftq.back(tid), which underflow/are invalid when
the FTQ is empty; change the guard to test emptiness (e.g. !ftq.empty(tid) or
ftq.size(tid) > 0) before calling ftq.backId(tid) or ftq.back(tid). Update the
conditional around abtb->isEnabled() to first ensure the FTQ has entries, then
obtain previous_block_startpc from ftq.back(tid) and call
abtb->updateUsingS3Pred(...).

@github-actions
Copy link

github-actions bot commented Feb 3, 2026

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.1727 -
This PR 2.1907 📈 +0.0181 (+0.83%)

✅ Difftest smoke test passed!

@tastynoob tastynoob added the perf label Feb 3, 2026
@github-actions
Copy link

github-actions bot commented Feb 3, 2026

🚀 Performance test triggered: spec06-0.8c

@XiangShanRobot
Copy link

[Generated by GEM5 Performance Robot]
commit: 7fc14cb
workflow: On-Demand SPEC Test (Tier 1.5)

Ideal BTB Performance

Overall Score

PR Master Diff(%)
Score 20.27 20.37 -0.48 🔴
