
Kmh smt #745

Open
tastynoob wants to merge 2 commits into xs-dev from kmh-smt

Conversation

@tastynoob
Collaborator

@tastynoob tastynoob commented Jan 29, 2026

Do not merge; this PR cannot build successfully yet.

Summary by CodeRabbit

  • Refactor
    • Made fetch and branch-prediction thread-aware with per-thread state and pipelines for improved multi-threaded handling.
    • Reworked fetch-target queue to manage per-thread queues with round-robin selection for fairer fetch scheduling.
    • Updated commit and tracing paths to use thread-local target access, improving per-thread statistics and trace accuracy.

@tastynoob tastynoob requested a review from jensen-yan January 29, 2026 09:44
@coderabbitai

coderabbitai bot commented Jan 29, 2026

📝 Walkthrough


Refactors BTB/predictor and O3 fetch to be thread-aware: introduces a per-thread FetchTargetQueue and per-thread predictor/fetch state, replaces FTQ "head" APIs with "fetching"/target-based APIs, renames predictor update() to commit(), and updates fetch/tracing and cache-request handling to use per-thread fetch buffers.

Changes

Cohort / File(s) — Summary

  • O3 Fetch (per-thread buffers) — src/cpu/o3/fetch.cc, src/cpu/o3/fetch.hh, src/cpu/o3/trace/TraceFetch.cc
    Replace global/per-core fetch state with a per-thread threads[tid] FetchBuffer (fetchpc, data, valid, startPC, cacheReq). Use per-thread cacheReq, fetchpc, startPC, and valid; replace fetchBuffer[] and pc[] uses with threads[tid] members; update fetch-address, multi-cacheline fetch, and commit/update callsites to per-thread signatures.
  • BTB Predictor — decoupled (thread-aware) — src/cpu/pred/btb/decoupled_bpred.hh, src/cpu/pred/btb/decoupled_bpred.cc, src/cpu/pred/btb/decoupled_bpred_stats.cc
    Migrate predictor internals to per-thread state and per-thread FTQ access: add ThreadID parameters to many methods, introduce threads[MaxThreads] storage for histories/preds, replace getTarget/ftqHeadId usage with ftq.get(..., tid)/ftqFetchingTarget(tid), rename update() to commit(), and adapt squash/resolve/prepare flows per thread.
  • Fetch Target Queue (FTQ) implementation — src/cpu/pred/btb/ftq.hh, src/cpu/pred/btb/ftq.cc
    Add a new FetchTargetQueue class with per-thread queues, a per-thread ftqSize, round-robin getTargetTid(), and implementations of insert, finishTarget, commitTarget, and squashAfter operating on per-thread queues.
  • BTB data structures — src/cpu/pred/btb/common.hh
    Add ThreadID tid to FullBTBPrediction so predictions carry thread context.
  • Build integration — src/cpu/pred/SConscript
    Add btb/ftq.cc to the build sources so the FTQ implementation is compiled in.

Sequence Diagram(s)

mermaid
sequenceDiagram
    participant Fetch as O3 Fetch
    participant FTQ as FetchTargetQueue
    participant BPU as DecoupledBPUWithBTB
    participant BTB as BTB Predictor
    participant Cache as CacheSubsystem
    Fetch->>FTQ: query fetching target (ftqHasFetching(tid) / ftqFetchingTarget(tid))
    FTQ-->>Fetch: return FetchTarget (per-thread)
    Fetch->>Cache: prepare/issue cacheReq (threads[tid].cacheReq)
    Cache-->>Fetch: deliver fetch data -> threads[tid].data / valid
    Fetch->>BPU: hand off FetchTarget / fetchpc (tid)
    BPU->>BTB: requestNewPrediction / processNewPrediction (tid)
    BTB-->>BPU: prediction (per-thread FullBTBPrediction)
    BPU->>FTQ: commitTarget / finishTarget / squashAfter (tid)
    BPU->>BTB: commit(pred_id, tid) -- previously update()

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

perf

Suggested reviewers

  • jensen-yan
  • Yakkhini

Poem

🐇 I hopped between targets, thread by thread,
I carried PCs where old heads once led;
Buffers snug in burrows, caches quick and neat,
Fetches per-rabbit, predictions all meet;
A tiny clap — commits now bound and fed.

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 20.62%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title check — ❓ Inconclusive: the title "Kmh smt" is vague and does not clearly convey the changeset's purpose or scope. Resolution: use a more descriptive title, such as "Add SMT support to O3 CPU fetch stage".
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/cpu/pred/btb/common.hh (1)

452-484: Initialize the new tid member in the default constructor.

The newly added tid field is not initialized in the default constructor, which could lead to undefined behavior if read before being explicitly set. Other members like bbStart are initialized to 0.

🛡️ Proposed fix
     FullBTBPrediction() :
+        tid(0),
         bbStart(0),
         btbEntries(),
src/cpu/o3/fetch.cc (1)

632-638: Pass the tid parameter to per-thread FTQ accessors.

The methods ftqHasFetching() and ftqFetchingTarget() in decoupled_bpred.hh require a ThreadID tid parameter (lines 404-406), but these calls omit it. The variable tid is available in all these contexts, so update the calls to pass it:

dbpbtb->ftqHasFetching(tid)
dbpbtb->ftqFetchingTarget(tid)

This same pattern appears in multiple locations: line 632-633, 785-786, 1760, and 2054 in fetch.cc, as well as line 971 in fetch.hh.

🤖 Fix all issues with AI agents
In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Line 381: Remove the stray extra semicolon after the assignment to s0PC:
change the line using threads[tid].finalPred.getTarget(predictWidth);; to a
single semicolon so it becomes threads[tid].finalPred.getTarget(predictWidth); —
ensure this occurs where s0PC is assigned (references: s0PC, threads, finalPred,
getTarget, predictWidth) and rebuild to confirm no warnings.
- Line 757: The entry's thread id is incorrectly hardcoded to 0; change the
assignment from setting entry.tid = 0 to using the function parameter (entry.tid
= tid) so the entry records the correct thread; this ensures subsequent calls
that use entry.tid (fillAheadPipeline and updateHistoryForPrediction) access the
correct per-thread state rather than always thread 0.

In `@src/cpu/pred/btb/decoupled_bpred.hh`:
- Line 150: scheduleThread() currently always returns 0 which forces
single-threaded behavior; update the scheduleThread implementation (declared in
decoupled_bpred.hh) to compute and return the correct ThreadID based on the
branch predictor's thread scheduling state (e.g., using the class's
last-scheduled thread counter, a thread queue, or the CPU/ThreadContext APIs
already used elsewhere in decoupled_bpred.cc TODO), making it multi-thread aware
and thread-safe rather than hard-coding 0.

In `@src/cpu/pred/btb/ftq.cc`:
- Around line 40-45: Add a guard/assertion in FetchTargetQueue::commitTarget to
ensure you never call queue[tid].cap.pop_front() on an empty deque: check that
queue[tid].cap is not empty (e.g., assert(!queue[tid].cap.empty()) or an if with
an error/log) before calling pop_front() and only increment
queue[tid].baseTargetId when the pop actually happens; reference the function
FetchTargetQueue::commitTarget and the members queue[tid].cap and
queue[tid].baseTargetId when adding this check.

In `@src/cpu/pred/btb/ftq.hh`:
- Line 46: The backId(ThreadID tid) method can underflow when
queue[tid].cap.size() is 0 because it computes baseTargetId + 0 - 1; update
backId in src/cpu/pred/btb/ftq.hh to guard against empty queues by checking
queue[tid].cap.empty(): either assert/throw a clear precondition (e.g.,
assert(!queue[tid].cap.empty()) or throw std::out_of_range) or return a defined
invalid/sentinel FetchTargetId value instead; reference the backId function,
queue[tid].baseTargetId and queue[tid].cap.size()/cap.empty() when making the
change.
🧹 Nitpick comments (3)
src/cpu/pred/btb/ftq.hh (1)

62-68: Consider renaming anyEmpty() to allEmpty() for clarity.

The current implementation returns true only when ALL queues are empty (i.e., no queue has any targets). The name anyEmpty() suggests it would return true if ANY single queue is empty, which is the opposite behavior. Consider renaming to allEmpty() to match the semantics.

src/cpu/pred/btb/decoupled_bpred.hh (1)

70-78: Hardcoded numThreads = 2 should be parameterized.

The number of threads should be configurable via params rather than hardcoded, especially since MaxThreads is defined as a limit. Also consider removing the commented-out code rather than leaving it in the codebase.

♻️ Suggested change
-    // unsigned fetchTargetQueueSize;
-    // std::deque<FetchTarget> fetchTargetQueue;
-    // FetchTargetId fetchTargetBaseId{1}; // ID of fetchTargetQueue.front()
-    // FetchTargetId ftqId{1}; // next FSQ id to allocate (monotonic)
-    // FetchTargetId fetchHeadFtqId{1}; // next FSQ id to be consumed by fetch

     CPU *cpu;

-    const int numThreads = 2;
+    const int numThreads;  // Set from params in constructor
src/cpu/pred/btb/decoupled_bpred.cc (1)

27-30: Hardcoded thread ID needs to be addressed for SMT support.

The tid = 0 is hardcoded with a TODO comment. For proper SMT support, this function should receive ThreadID as a parameter, similar to other refactored functions in this PR.

FetchTarget entry = createFetchTargetEntry(tid);

// 2. Update global PC state to target or fall-through
s0PC = threads[tid].finalPred.getTarget(predictWidth);;

⚠️ Potential issue | 🟡 Minor

Double semicolon.

Minor typo: there's an extra semicolon at the end of this line.

🧹 Proposed fix
-    s0PC = threads[tid].finalPred.getTarget(predictWidth);;
+    s0PC = threads[tid].finalPred.getTarget(predictWidth);

Comment on lines +40 to +45
void
FetchTargetQueue::commitTarget(ThreadID tid)
{
    queue[tid].cap.pop_front();
    queue[tid].baseTargetId++;
}

⚠️ Potential issue | 🟡 Minor

Add assertion to guard against committing from an empty queue.

Calling pop_front() on an empty deque is undefined behavior. Add a guard to prevent this:

🛡️ Proposed fix
 void
 FetchTargetQueue::commitTarget(ThreadID tid)
 {
+    assert(!empty(tid) && "Cannot commit from empty queue");
     queue[tid].cap.pop_front();
     queue[tid].baseTargetId++;
 }



inline FetchTargetId frontId(ThreadID tid) const { return queue[tid].baseTargetId; }
inline FetchTargetId backId(ThreadID tid) const { return queue[tid].baseTargetId + queue[tid].cap.size() - 1; }

⚠️ Potential issue | 🟡 Minor

backId() underflows on empty queue.

When cap.size() is 0, this computes baseTargetId + 0 - 1, which underflows the unsigned FetchTargetId. Consider adding a precondition or handling the empty case:

🛡️ Proposed fix
-    inline FetchTargetId backId(ThreadID tid) const { return queue[tid].baseTargetId + queue[tid].cap.size() - 1; }
+    inline FetchTargetId backId(ThreadID tid) const {
+        assert(!empty(tid) && "backId() called on empty queue");
+        return queue[tid].baseTargetId + queue[tid].cap.size() - 1;
+    }


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cpu/pred/btb/common.hh (1)

452-485: ⚠️ Potential issue | 🟠 Major

Initialize FullBTBPrediction::tid to avoid indeterminate values.

Line 454 adds tid but the constructor doesn’t set it, so it can carry garbage until assigned. This is risky if any code reads it before initialization.

🛠 Proposed fix
-    FullBTBPrediction() :
-        bbStart(0),
+    FullBTBPrediction() :
+        tid(InvalidThreadID),
+        bbStart(0),
         btbEntries(),
         condTakens(),
         indirectTargets(),
         returnTarget(0),
         tageInfoForMgscs(),
         predSource(0),
         predTick(0),
         s1Source(-1),
         s3Source(-1) {}
🤖 Fix all issues with AI agents
In `@src/cpu/o3/fetch.cc`:
- Around line 783-787: The code computes ThreadID tid from inst->threadNumber
but then uses a hard-coded thread 0 for FTQ operations; change calls to use tid
instead of 0 so SMT threads access the correct FTQ: replace
dbpbtb->ftqHasFetching(0) with dbpbtb->ftqHasFetching(tid) and
dbpbtb->ftqFetchingTarget(0) with dbpbtb->ftqFetchingTarget(tid) (ensure tid is
the same ThreadID variable computed earlier).

In `@src/cpu/o3/fetch.hh`:
- Around line 969-972: The ftqEmpty() helper currently calls
dbpbtb->ftqHasFetching(0) with a hard-coded thread 0 which breaks SMT; change
ftqEmpty to accept a ThreadID (or tid) parameter and forward that to
dbpbtb->ftqHasFetching(tid) so the check is per-thread, and then update all call
sites in fetch.cc that call ftqEmpty() to pass the current thread id (e.g., the
local/current tid variable used in fetch code) so SMT fetch uses the correct FTQ
state.

In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Line 50: The FTQ is being constructed with a hardcoded thread count (ftq(2,
p.ftq_size)); change this to use the configured thread count variable
(numThreads) instead so the FTQ is initialized consistently (e.g.,
ftq(numThreads, p.ftq_size)). Locate the FTQ construction in the decoupled_bpred
constructor/initializer list where ftq(...) is used and replace the literal 2
with the numThreads identifier so the FTQ respects the current thread
configuration.
- Around line 635-640: The method DecoupledBPUWithBTB::blockPredictionOnce
currently hardcodes threads[0].blockPredictionPending = true which breaks SMT;
change the API to accept a ThreadID parameter (e.g., ThreadID tid) and set
threads[tid].blockPredictionPending = true, validating tid is in range, or
alternatively iterate all threads and set threads[i].blockPredictionPending =
true to block everyone; update all call sites of blockPredictionOnce() to pass
the ThreadID (or rely on the new all-threads behavior) and add bounds checks
where appropriate.

In `@src/cpu/pred/btb/ftq.hh`:
- Around line 62-68: The method anyEmpty() currently returns true only when
every queue is empty (it checks queue[i].cap.empty() for all threads), so rename
the function to allEmpty() to match its behavior (and update all callers/tests),
or alternatively change its implementation to return true as soon as any
queue[i].cap.empty() is found if you prefer the original anyEmpty name; locate
the symbol anyEmpty(), the loop over numThreads, and the queue[].cap.empty()
check to make the corrective change and adjust usages accordingly.
🧹 Nitpick comments (5)
src/cpu/pred/btb/decoupled_bpred_stats.cc (2)

801-808: Avoid copying FetchTarget here; use a const reference.

ftq.get(...) returns a FetchTarget. Copying it can be expensive and can desync if callers expect the live entry.

♻️ Proposed refactor
-    auto entry = ftq.get(inst->ftqId, inst->threadNumber);
+    const auto &entry = ftq.get(inst->ftqId, inst->threadNumber);

913-924: Reduce double lookup by caching the FTQ entry reference.

You call ftq.get(...) twice; caching it avoids repeated lookups and keeps logging consistent with the increment.

♻️ Proposed refactor
-    ftq.get(inst->ftqId, inst->threadNumber).commitInstNum++;
+    auto &entry = ftq.get(inst->ftqId, inst->threadNumber);
+    entry.commitInstNum++;
     ...
-            ftq.get(inst->ftqId, inst->threadNumber).commitInstNum);
+            entry.commitInstNum);
src/cpu/pred/btb/ftq.hh (1)

45-61: Missing bounds validation on tid parameter.

All accessor methods use tid to index queue[tid] without validation. If tid >= numThreads, the code accesses uninitialized state (or causes out-of-bounds access if tid >= MaxThreads). Consider adding debug assertions.

🛡️ Proposed fix example
     inline FetchTargetId frontId(ThreadID tid) const {
+        assert(tid < numThreads && "Invalid thread ID");
         return queue[tid].baseTargetId;
     }

Apply similar checks to other methods or create a helper macro/inline function.

src/cpu/pred/btb/decoupled_bpred.cc (1)

24-31: Hardcoded tid = 0 limits multi-threading support.

The function doesn't accept ThreadID as a parameter and uses a hardcoded value. This should be addressed when completing SMT support.

♻️ Suggested signature change
 void
-DecoupledBPUWithBTB::consumeFetchTarget(unsigned fetched_inst_num)
+DecoupledBPUWithBTB::consumeFetchTarget(unsigned fetched_inst_num, ThreadID tid)
 {
-    ThreadID tid = 0; // TODO: multi-threading
     ftq.fetching(tid).fetchInstNum = fetched_inst_num;
     ftq.finishTarget(tid);
 }
src/cpu/pred/btb/decoupled_bpred.hh (1)

78-78: Hardcoded numThreads should be configurable.

numThreads is hardcoded to 2 instead of being derived from simulation parameters. This limits flexibility and could cause inconsistencies if changed in one place but not others (e.g., ftq(2, ...) in the constructor).

♻️ Consider parameterizing
-    const int numThreads = 2;
+    const int numThreads;  // Initialize from params in constructor

Then in constructor initialization list:

numThreads(p.numThreads),

// uras(p.uras),
bpDBSwitches(p.bpDBSwitches),
numStages(p.numStages),
ftq(2, p.ftq_size),

🛠️ Refactor suggestion | 🟠 Major

Hardcoded thread count in FTQ initialization.

The FTQ is initialized with ftq(2, p.ftq_size) but should use numThreads for consistency and maintainability.

♻️ Proposed fix
-      ftq(2, p.ftq_size),
+      ftq(numThreads, p.ftq_size),

Comment on lines 635 to 640
void
DecoupledBPUWithBTB::blockPredictionOnce()
{
    blockPredictionPending = true;
    // smtTODO
    threads[0].blockPredictionPending = true;
}

🛠️ Refactor suggestion | 🟠 Major

blockPredictionOnce() only affects thread 0.

This hardcodes thread 0, breaking SMT support. The function should accept ThreadID parameter or block all threads.

♻️ Proposed fix
 void
-DecoupledBPUWithBTB::blockPredictionOnce()
+DecoupledBPUWithBTB::blockPredictionOnce(ThreadID tid)
 {
-    // smtTODO
-    threads[0].blockPredictionPending = true;
+    threads[tid].blockPredictionPending = true;
 }

Comment on lines +62 to +68
inline bool anyEmpty() const {
    for (uint32_t i = 0; i < numThreads; ++i) {
        if (!queue[i].cap.empty())
            return false;
    }
    return true;
}

⚠️ Potential issue | 🟡 Minor

Misleading method name: anyEmpty() actually checks if ALL queues are empty.

The method returns true only when every queue is empty, which is semantically allEmpty(). The current name suggests it returns true if at least one queue is empty.

♻️ Proposed fix
-    inline bool anyEmpty() const {
+    inline bool allEmpty() const {
         for (uint32_t i = 0; i < numThreads; ++i) {
             if (!queue[i].cap.empty())
                 return false;
         }
         return true;
     }

@github-actions

🚀 Coremark Smoke Test Results

Branch          IPC      Change
Base (xs-dev)   2.1727   -
This PR         2.1907   📈 +0.0181 (+0.83%)

✅ Difftest smoke test passed!

Change-Id: Ie21dbd89df9c1d5d80106d71ae4f9b51415be897

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@src/cpu/o3/fetch.cc`:
- Around line 459-461: The BPU seeding uses thread 0's PC only
(dbpbtb->resetPC(threads[0].fetchpc->instAddr())), which mis-initializes SMT
threads with different start PCs; update the reset logic to either add/use a
per-thread reset API (e.g., a resetPC method that accepts a thread id or
per-thread predictor instance) or iterate over all threads and call
dbpbtb->resetPC(threads[i].fetchpc->instAddr()) for each thread so every
thread's fetchpc is used to seed the predictor.
- Around line 1537-1545: handleIEWSignals is using iewInfo[0] and passing a
hardcoded 0 to dbpbtb instead of the current thread; update the resolve-update
path to accept and use the ThreadID from checkSignalsAndUpdate(ThreadID tid) by
adding a tid parameter to handleIEWSignals, read resolved CFIs from
fromIEW->iewInfo[tid].resolvedCFIs, and replace the hardcoded 0 with tid in the
dbpbtb calls (prepareResolveUpdateEntries, markCFIResolved, resolveUpdate); also
consider adding ThreadID into ResolvedCFIEntry if you need to preserve thread
context per-entry.

In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Around line 303-305: The current check uses ftq.backId(tid) and ftq.back(tid)
which underflow/are invalid when the FTQ is empty; change the guard to test
emptiness (e.g. !ftq.empty(tid) or ftq.size(tid) > 0) before calling
ftq.backId/tid or ftq.back(tid). Update the conditional around abtb->isEnabled()
to first ensure the FTQ has entries, then obtain previous_block_startpc from
ftq.back(tid) and call abtb->updateUsingS3Pred(...).
🧹 Nitpick comments (1)
src/cpu/o3/fetch.hh (1)

851-875: Clarify ownership of FetchBuffer::data to avoid leaks.

data is allocated with new[] in Fetch but there’s no corresponding cleanup. Consider explicit deletion in Fetch::~Fetch() (or switch to RAII).

🧹 Suggested fix (explicit cleanup in Fetch::~Fetch)
-Fetch::~Fetch() = default;
+Fetch::~Fetch()
+{
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        delete[] threads[tid].data;
+        threads[tid].data = nullptr;
+    }
+}

Comment on lines 459 to 461
    assert(dbpbtb);
    dbpbtb->resetPC(pc[0]->instAddr());
    dbpbtb->resetPC(threads[0].fetchpc->instAddr());
}

⚠️ Potential issue | 🟠 Major

SMT init: predictor PC seeded from thread 0 only.

Line 460 seeds the BPU using thread 0’s PC; other threads with distinct start PCs will be mis-initialized. Consider adding a per-thread resetPC API or looping per thread during reset.


Comment on lines 1537 to 1545
if (!resolveQueue.empty()) {
    auto &entry = resolveQueue.front();
    unsigned int stream_id = entry.resolvedFTQId;
    dbpbtb->prepareResolveUpdateEntries(stream_id);
    dbpbtb->prepareResolveUpdateEntries(stream_id, 0);
    for (const auto resolvedInstPC : entry.resolvedInstPC) {
        dbpbtb->markCFIResolved(stream_id, resolvedInstPC);
        dbpbtb->markCFIResolved(stream_id, resolvedInstPC, 0);
    }
    bool success = dbpbtb->resolveUpdate(stream_id);
    bool success = dbpbtb->resolveUpdate(stream_id, 0);
    if (success) {

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "resolvedCFIs" src/cpu/o3 -g'*.hh' -g'*.cc'

Repository: OpenXiangShan/GEM5

Length of output: 471


🏁 Script executed:

rg -n "struct.*Resolved|Resolved.*CFI|ResolvedCFI" src/cpu/o3 -g'*.hh'

Repository: OpenXiangShan/GEM5

Length of output: 198


🏁 Script executed:

# Also check the FetchBufferEntry or similar structure that contains resolvedCFIs
rg -n "resolvedInstPC\|resolvedCFIs" src/cpu/o3 -B 5 -A 5 -g'*.hh'

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

cat -n src/cpu/o3/comm.hh | sed -n '240,270p'

Repository: OpenXiangShan/GEM5

Length of output: 1155


🏁 Script executed:

rg -n "struct.*IEWInfo|IEWInfo.*{" src/cpu/o3 -A 20 -g'*.hh'

Repository: OpenXiangShan/GEM5

Length of output: 177


🏁 Script executed:

cat -n src/cpu/o3/fetch.cc | sed -n '1495,1550p'

Repository: OpenXiangShan/GEM5

Length of output: 2346


🏁 Script executed:

rg -n "fromIEW\s*=" src/cpu/o3/fetch.cc -B 3 -A 3

Repository: OpenXiangShan/GEM5

Length of output: 396


🏁 Script executed:

rg -n "class Fetch" src/cpu/o3/fetch.hh -A 30 | head -50

Repository: OpenXiangShan/GEM5

Length of output: 1498


🏁 Script executed:

rg -n "handleIEWSignals\|handleIEWSignals\(" src/cpu/o3 -g'*.hh' -g'*.cc' -B 2 -A 2

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

rg -n "void handleIEWSignals|void.*handleIEW" src/cpu/o3/fetch.hh -A 2

Repository: OpenXiangShan/GEM5

Length of output: 142


🏁 Script executed:

# Search for where handleIEWSignals is called
rg -n "handleIEWSignals" src/cpu/o3/fetch.cc -B 5 -A 2

Repository: OpenXiangShan/GEM5

Length of output: 488


🏁 Script executed:

# Check if Fetch class has tid or thread-specific members
rg -n "tid\|threadId\|numThreads" src/cpu/o3/fetch.hh | head -20

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

# Look at the full context around line 1496 to see if there's thread context
cat -n src/cpu/o3/fetch.cc | sed -n '1480,1510p'

Repository: OpenXiangShan/GEM5

Length of output: 1243


🏁 Script executed:

# Find the function that contains line 1452 where handleIEWSignals is called
cat -n src/cpu/o3/fetch.cc | sed -n '1420,1460p'

Repository: OpenXiangShan/GEM5

Length of output: 1360


🏁 Script executed:

# Also check the function signature
rg -n "bool.*tid\|void.*tid" src/cpu/o3/fetch.cc | grep -E "14[0-9]{2}|13[0-9]{2}|11[0-9]{2}" | head -5

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

# Search backwards from line 1450 to find function definition
ast-grep --pattern 'void $_($_) { $$$ if (fetchStatus[tid] == WaitingCache && threads[tid].cacheReq.getOverallStatus() == AccessComplete) $$$ }'

Repository: OpenXiangShan/GEM5

Length of output: 44


Wire correct thread ID through resolve-update path.

handleIEWSignals() is called from checkSignalsAndUpdate(ThreadID tid) (line 1436) but receives no tid parameter. It then accesses fromIEW->iewInfo->resolvedCFIs without indexing by thread (line 1503), reading from iewInfo[0] implicitly instead of iewInfo[tid]. The subsequent dbpbtb calls hardcode tid=0 (lines 1540, 1542, 1544) instead of using the actual thread ID. This causes the resolve-update path to process the wrong thread's resolved CFIs and update the wrong thread's predictor state in multi-threaded execution.

Pass tid to handleIEWSignals(), access fromIEW->iewInfo[tid].resolvedCFIs, and use tid instead of 0 in the dbpbtb calls. Additionally, verify whether ResolvedCFIEntry should include ThreadID to preserve thread context at the entry level.

🤖 Prompt for AI Agents
In `@src/cpu/o3/fetch.cc` around lines 1537 - 1545, handleIEWSignals is using
iewInfo[0] and passing a hardcoded 0 to dbpbtb instead of the current thread;
update the resolve-update path to accept and use the ThreadID from
checkSignalsAndUpdate(ThreadID tid) by adding a tid parameter to
handleIEWSignals, read resolved CFIs from fromIEW->iewInfo[tid].resolvedCFIs,
and replace the hardcoded 0 with tid in the dbpbtb calls
(prepareResolveUpdateEntries, markCFIResolved, resolveUpdate); also consider
adding ThreadID into ResolvedCFIEntry if you need to preserve thread context
per-entry.

Comment on lines +303 to 305
if (abtb->isEnabled() && ftq.backId(tid)) {
auto previous_block_startpc = ftq.back(tid).startPC;
abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], previous_block_startpc);

⚠️ Potential issue | 🟠 Major

Avoid backId()/back() when the FTQ is empty.

ftq.backId(tid) underflows on empty queues, and ftq.back(tid) will be invalid. Guard on emptiness instead.

🛠️ Proposed fix
-        if (abtb->isEnabled() && ftq.backId(tid)) {
+        if (abtb->isEnabled() && !ftq.empty(tid)) {
             auto previous_block_startpc = ftq.back(tid).startPC;
             abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], previous_block_startpc);
         } else if (abtb->isEnabled()) {
             abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], 0);
         }
🤖 Prompt for AI Agents
In `@src/cpu/pred/btb/decoupled_bpred.cc` around lines 303 - 305, the current
check uses ftq.backId(tid) and ftq.back(tid), which underflow/are invalid when
the FTQ is empty; change the guard to test emptiness (e.g. !ftq.empty(tid) or
ftq.size(tid) > 0) before calling ftq.backId(tid) or ftq.back(tid). Update the
conditional around abtb->isEnabled() to first ensure the FTQ has entries, then
obtain previous_block_startpc from ftq.back(tid) and call
abtb->updateUsingS3Pred(...).

@github-actions
Copy link

github-actions bot commented Feb 3, 2026

🚀 Coremark Smoke Test Results

Branch IPC Change
Base (xs-dev) 2.1727 -
This PR 2.1907 📈 +0.0181 (+0.83%)

✅ Difftest smoke test passed!

@tastynoob tastynoob added the perf label Feb 3, 2026
@github-actions
Copy link

github-actions bot commented Feb 3, 2026

🚀 Performance test triggered: spec06-0.8c

@XiangShanRobot
Copy link

[Generated by GEM5 Performance Robot]
commit: 7fc14cb
workflow: On-Demand SPEC Test (Tier 1.5)

Ideal BTB Performance

Overall Score

PR Master Diff(%)
Score 20.27 20.37 -0.48 🔴
