Conversation
📝 Walkthrough

Refactors the BTB/predictor and O3 fetch to be thread-aware: introduces a per-thread FetchTargetQueue and per-thread predictor/fetch state, replaces the FTQ "head" APIs with "fetching"/target-based APIs, and renames the predictor APIs accordingly.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: 1 passed, 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/cpu/pred/btb/common.hh (1)

452-484: Initialize the new tid member in the default constructor. The newly added tid field is not initialized in the default constructor, which could lead to undefined behavior if it is read before being explicitly set. Other members like bbStart are initialized to 0.

🛡️ Proposed fix

 FullBTBPrediction() :
+    tid(0),
     bbStart(0),
     btbEntries(),

src/cpu/o3/fetch.cc (1)
632-638: Pass the tid parameter to the per-thread FTQ accessors. The methods ftqHasFetching() and ftqFetchingTarget() in decoupled_bpred.hh require a ThreadID tid parameter (lines 404-406), but these calls omit it. The variable tid is available in all these contexts, so update the calls to pass it:

dbpbtb->ftqHasFetching(tid)
dbpbtb->ftqFetchingTarget(tid)

This same pattern appears in multiple locations: lines 632-633, 785-786, 1760, and 2054 in fetch.cc, as well as line 971 in fetch.hh.
🤖 Fix all issues with AI agents
In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Line 381: Remove the stray extra semicolon after the assignment to s0PC:
change the line using threads[tid].finalPred.getTarget(predictWidth);; to a
single semicolon so it becomes threads[tid].finalPred.getTarget(predictWidth); —
ensure this occurs where s0PC is assigned (references: s0PC, threads, finalPred,
getTarget, predictWidth) and rebuild to confirm no warnings.
- Line 757: The entry's thread id is incorrectly hardcoded to 0; change the
assignment from setting entry.tid = 0 to using the function parameter (entry.tid
= tid) so the entry records the correct thread; this ensures subsequent calls
that use entry.tid (fillAheadPipeline and updateHistoryForPrediction) access the
correct per-thread state rather than always thread 0.
In `@src/cpu/pred/btb/decoupled_bpred.hh`:
- Line 150: scheduleThread() currently always returns 0 which forces
single-threaded behavior; update the scheduleThread implementation (declared in
decoupled_bpred.hh) to compute and return the correct ThreadID based on the
branch predictor's thread scheduling state (e.g., using the class's
last-scheduled thread counter, a thread queue, or the CPU/ThreadContext APIs
already used elsewhere in decoupled_bpred.cc TODO), making it multi-thread aware
and thread-safe rather than hard-coding 0.
In `@src/cpu/pred/btb/ftq.cc`:
- Around line 40-45: Add a guard/assertion in FetchTargetQueue::commitTarget to
ensure you never call queue[tid].cap.pop_front() on an empty deque: check that
queue[tid].cap is not empty (e.g., assert(!queue[tid].cap.empty()) or an if with
an error/log) before calling pop_front() and only increment
queue[tid].baseTargetId when the pop actually happens; reference the function
FetchTargetQueue::commitTarget and the members queue[tid].cap and
queue[tid].baseTargetId when adding this check.
In `@src/cpu/pred/btb/ftq.hh`:
- Line 46: The backId(ThreadID tid) method can underflow when
queue[tid].cap.size() is 0 because it computes baseTargetId + 0 - 1; update
backId in src/cpu/pred/btb/ftq.hh to guard against empty queues by checking
queue[tid].cap.empty(): either assert/throw a clear precondition (e.g.,
assert(!queue[tid].cap.empty()) or throw std::out_of_range) or return a defined
invalid/sentinel FetchTargetId value instead; reference the backId function,
queue[tid].baseTargetId and queue[tid].cap.size()/cap.empty() when making the
change.
🧹 Nitpick comments (3)

src/cpu/pred/btb/ftq.hh (1)

62-68: Consider renaming anyEmpty() to allEmpty() for clarity. The current implementation returns true only when ALL queues are empty (i.e., no queue has any targets). The name anyEmpty() suggests it would return true if ANY single queue is empty, which is the opposite behavior. Consider renaming to allEmpty() to match the semantics.

src/cpu/pred/btb/decoupled_bpred.hh (1)

70-78: Hardcoded numThreads = 2 should be parameterized. The number of threads should be configurable via params rather than hardcoded, especially since MaxThreads is defined as a limit. Also consider removing the commented-out code rather than leaving it in the codebase.

♻️ Suggested change

-    // unsigned fetchTargetQueueSize;
-    // std::deque<FetchTarget> fetchTargetQueue;
-    // FetchTargetId fetchTargetBaseId{1}; // ID of fetchTargetQueue.front()
-    // FetchTargetId ftqId{1}; // next FSQ id to allocate (monotonic)
-    // FetchTargetId fetchHeadFtqId{1}; // next FSQ id to be consumed by fetch
     CPU *cpu;
-    const int numThreads = 2;
+    const int numThreads; // Set from params in constructor

src/cpu/pred/btb/decoupled_bpred.cc (1)

27-30: Hardcoded thread ID needs to be addressed for SMT support. The tid = 0 is hardcoded with a TODO comment. For proper SMT support, this function should receive ThreadID as a parameter, similar to other refactored functions in this PR.
    FetchTarget entry = createFetchTargetEntry(tid);

    // 2. Update global PC state to target or fall-through
    s0PC = threads[tid].finalPred.getTarget(predictWidth);;
Double semicolon.
Minor typo: there's an extra semicolon at the end of this line.
🧹 Proposed fix
- s0PC = threads[tid].finalPred.getTarget(predictWidth);;
+ s0PC = threads[tid].finalPred.getTarget(predictWidth);
void
FetchTargetQueue::commitTarget(ThreadID tid)
{
    queue[tid].cap.pop_front();
    queue[tid].baseTargetId++;
}
Add assertion to guard against committing from an empty queue.
Calling pop_front() on an empty deque is undefined behavior. Add a guard to prevent this:
🛡️ Proposed fix
void
FetchTargetQueue::commitTarget(ThreadID tid)
{
+ assert(!empty(tid) && "Cannot commit from empty queue");
queue[tid].cap.pop_front();
queue[tid].baseTargetId++;
}
inline FetchTargetId frontId(ThreadID tid) const { return queue[tid].baseTargetId; }
inline FetchTargetId backId(ThreadID tid) const { return queue[tid].baseTargetId + queue[tid].cap.size() - 1; }
backId() underflows on empty queue.
When cap.size() is 0, this computes baseTargetId + 0 - 1, which underflows the unsigned FetchTargetId. Consider adding a precondition or handling the empty case:
🛡️ Proposed fix
- inline FetchTargetId backId(ThreadID tid) const { return queue[tid].baseTargetId + queue[tid].cap.size() - 1; }
+ inline FetchTargetId backId(ThreadID tid) const {
+ assert(!empty(tid) && "backId() called on empty queue");
+ return queue[tid].baseTargetId + queue[tid].cap.size() - 1;
+ }
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cpu/pred/btb/common.hh (1)
452-485: ⚠️ Potential issue | 🟠 Major

Initialize FullBTBPrediction::tid to avoid indeterminate values. Line 454 adds tid but the constructor doesn't set it, so it can carry garbage until assigned. This is risky if any code reads it before initialization.

🛠 Proposed fix

-    FullBTBPrediction() :
-        bbStart(0),
+    FullBTBPrediction() :
+        tid(InvalidThreadID),
+        bbStart(0),
         btbEntries(), condTakens(), indirectTargets(), returnTarget(0),
         tageInfoForMgscs(), predSource(0), predTick(0),
         s1Source(-1), s3Source(-1) {}
🤖 Fix all issues with AI agents
In `@src/cpu/o3/fetch.cc`:
- Around line 783-787: The code computes ThreadID tid from inst->threadNumber
but then uses a hard-coded thread 0 for FTQ operations; change calls to use tid
instead of 0 so SMT threads access the correct FTQ: replace
dbpbtb->ftqHasFetching(0) with dbpbtb->ftqHasFetching(tid) and
dbpbtb->ftqFetchingTarget(0) with dbpbtb->ftqFetchingTarget(tid) (ensure tid is
the same ThreadID variable computed earlier).
In `@src/cpu/o3/fetch.hh`:
- Around line 969-972: The ftqEmpty() helper currently calls
dbpbtb->ftqHasFetching(0) with a hard-coded thread 0 which breaks SMT; change
ftqEmpty to accept a ThreadID (or tid) parameter and forward that to
dbpbtb->ftqHasFetching(tid) so the check is per-thread, and then update all call
sites in fetch.cc that call ftqEmpty() to pass the current thread id (e.g., the
local/current tid variable used in fetch code) so SMT fetch uses the correct FTQ
state.
In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Line 50: The FTQ is being constructed with a hardcoded thread count (ftq(2,
p.ftq_size)); change this to use the configured thread count variable
(numThreads) instead so the FTQ is initialized consistently (e.g.,
ftq(numThreads, p.ftq_size)). Locate the FTQ construction in the decoupled_bpred
constructor/initializer list where ftq(...) is used and replace the literal 2
with the numThreads identifier so the FTQ respects the current thread
configuration.
- Around line 635-640: The method DecoupledBPUWithBTB::blockPredictionOnce
currently hardcodes threads[0].blockPredictionPending = true which breaks SMT;
change the API to accept a ThreadID parameter (e.g., ThreadID tid) and set
threads[tid].blockPredictionPending = true, validating tid is in range, or
alternatively iterate all threads and set threads[i].blockPredictionPending =
true to block everyone; update all call sites of blockPredictionOnce() to pass
the ThreadID (or rely on the new all-threads behavior) and add bounds checks
where appropriate.
In `@src/cpu/pred/btb/ftq.hh`:
- Around line 62-68: The method anyEmpty() currently returns true only when
every queue is empty (it checks queue[i].cap.empty() for all threads), so rename
the function to allEmpty() to match its behavior (and update all callers/tests),
or alternatively change its implementation to return true as soon as any
queue[i].cap.empty() is found if you prefer the original anyEmpty name; locate
the symbol anyEmpty(), the loop over numThreads, and the queue[].cap.empty()
check to make the corrective change and adjust usages accordingly.
🧹 Nitpick comments (5)

src/cpu/pred/btb/decoupled_bpred_stats.cc (2)

801-808: Avoid copying FetchTarget here; use a const reference. ftq.get(...) returns a FetchTarget. Copying it can be expensive and can desync if callers expect the live entry.

♻️ Proposed refactor

-    auto entry = ftq.get(inst->ftqId, inst->threadNumber);
+    const auto &entry = ftq.get(inst->ftqId, inst->threadNumber);

913-924: Reduce the double lookup by caching the FTQ entry reference. You call ftq.get(...) twice; caching it avoids repeated lookups and keeps logging consistent with the increment.

♻️ Proposed refactor

-    ftq.get(inst->ftqId, inst->threadNumber).commitInstNum++;
+    auto &entry = ftq.get(inst->ftqId, inst->threadNumber);
+    entry.commitInstNum++;
     ...
-    ftq.get(inst->ftqId, inst->threadNumber).commitInstNum);
+    entry.commitInstNum);

src/cpu/pred/btb/ftq.hh (1)

45-61: Missing bounds validation on the tid parameter. All accessor methods use tid to index queue[tid] without validation. If tid >= numThreads, the code accesses uninitialized state (or causes out-of-bounds access if tid >= MaxThreads). Consider adding debug assertions.

🛡️ Proposed fix example

 inline FetchTargetId frontId(ThreadID tid) const {
+    assert(tid < numThreads && "Invalid thread ID");
     return queue[tid].baseTargetId;
 }

Apply similar checks to the other methods, or create a helper macro/inline function.

src/cpu/pred/btb/decoupled_bpred.cc (1)

24-31: Hardcoded tid = 0 limits multi-threading support. The function doesn't accept ThreadID as a parameter and uses a hardcoded value. This should be addressed when completing SMT support.

♻️ Suggested signature change

 void
-DecoupledBPUWithBTB::consumeFetchTarget(unsigned fetched_inst_num)
+DecoupledBPUWithBTB::consumeFetchTarget(unsigned fetched_inst_num, ThreadID tid)
 {
-    ThreadID tid = 0; // TODO: multi-threading
     ftq.fetching(tid).fetchInstNum = fetched_inst_num;
     ftq.finishTarget(tid);
 }

src/cpu/pred/btb/decoupled_bpred.hh (1)

78-78: Hardcoded numThreads should be configurable. numThreads is hardcoded to 2 instead of being derived from simulation parameters. This limits flexibility and could cause inconsistencies if changed in one place but not others (e.g., ftq(2, ...) in the constructor).

♻️ Consider parameterizing

-    const int numThreads = 2;
+    const int numThreads; // Initialize from params in constructor

Then in the constructor initialization list: numThreads(p.numThreads),
    // uras(p.uras),
    bpDBSwitches(p.bpDBSwitches),
    numStages(p.numStages),
    ftq(2, p.ftq_size),
🛠️ Refactor suggestion | 🟠 Major
Hardcoded thread count in FTQ initialization.
The FTQ is initialized with ftq(2, p.ftq_size) but should use numThreads for consistency and maintainability.
♻️ Proposed fix
- ftq(2, p.ftq_size),
+ ftq(numThreads, p.ftq_size),
void
DecoupledBPUWithBTB::blockPredictionOnce()
{
    blockPredictionPending = true;
    // smtTODO
    threads[0].blockPredictionPending = true;
}
🛠️ Refactor suggestion | 🟠 Major
blockPredictionOnce() only affects thread 0.
This hardcodes thread 0, breaking SMT support. The function should accept ThreadID parameter or block all threads.
♻️ Proposed fix
void
-DecoupledBPUWithBTB::blockPredictionOnce()
+DecoupledBPUWithBTB::blockPredictionOnce(ThreadID tid)
{
- // smtTODO
- threads[0].blockPredictionPending = true;
+ threads[tid].blockPredictionPending = true;
}
inline bool anyEmpty() const {
    for (uint32_t i = 0; i < numThreads; ++i) {
        if (!queue[i].cap.empty())
            return false;
    }
    return true;
}
Misleading method name: anyEmpty() actually checks if ALL queues are empty.
The method returns true only when every queue is empty, which is semantically allEmpty(). The current name suggests it returns true if at least one queue is empty.
♻️ Proposed fix
- inline bool anyEmpty() const {
+ inline bool allEmpty() const {
for (uint32_t i = 0; i < numThreads; ++i) {
if (!queue[i].cap.empty())
return false;
}
return true;
}
🚀 Coremark Smoke Test Results

✅ Difftest smoke test passed!
Change-Id: Ie21dbd89df9c1d5d80106d71ae4f9b51415be897
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/cpu/o3/fetch.cc`:
- Around line 459-461: The BPU seeding uses thread 0's PC only
(dbpbtb->resetPC(threads[0].fetchpc->instAddr())), which mis-initializes SMT
threads with different start PCs; update the reset logic to either add/use a
per-thread reset API (e.g., a resetPC method that accepts a thread id or
per-thread predictor instance) or iterate over all threads and call
dbpbtb->resetPC(threads[i].fetchpc->instAddr()) for each thread so every
thread's fetchpc is used to seed the predictor.
- Around line 1537-1545: handleIEWSignals is using iewInfo[0] and passing a
hardcoded 0 to dbpbtb instead of the current thread; update the resolve-update
path to accept and use the ThreadID from checkSignalsAndUpdate(ThreadID tid) by
adding a tid parameter to handleIEWSignals, read resolved CFIs from
fromIEW->iewInfo[tid].resolvedCFIs, and replace the hardcoded 0 with tid in the
dbpbtb calls (prepareResolveUpdateEntries, markCFIResolved, resolveUpdate); also
consider adding ThreadID into ResolvedCFIEntry if you need to preserve thread
context per-entry.
In `@src/cpu/pred/btb/decoupled_bpred.cc`:
- Around line 303-305: The current check uses ftq.backId(tid) and ftq.back(tid)
which underflow/are invalid when the FTQ is empty; change the guard to test
emptiness (e.g. !ftq.empty(tid) or ftq.size(tid) > 0) before calling
ftq.backId/tid or ftq.back(tid). Update the conditional around abtb->isEnabled()
to first ensure the FTQ has entries, then obtain previous_block_startpc from
ftq.back(tid) and call abtb->updateUsingS3Pred(...).
🧹 Nitpick comments (1)

src/cpu/o3/fetch.hh (1)

851-875: Clarify ownership of FetchBuffer::data to avoid leaks. data is allocated with new[] in Fetch but there's no corresponding cleanup. Consider explicit deletion in Fetch::~Fetch() (or switch to RAII).

🧹 Suggested fix (explicit cleanup in Fetch::~Fetch)

-Fetch::~Fetch() = default;
+Fetch::~Fetch()
+{
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        delete[] threads[tid].data;
+        threads[tid].data = nullptr;
+    }
+}
     assert(dbpbtb);
-    dbpbtb->resetPC(pc[0]->instAddr());
+    dbpbtb->resetPC(threads[0].fetchpc->instAddr());
 }
SMT init: predictor PC seeded from thread 0 only.
Line 460 seeds the BPU using thread 0’s PC; other threads with distinct start PCs will be mis-initialized. Consider adding a per-thread resetPC API or looping per thread during reset.
     if (!resolveQueue.empty()) {
         auto &entry = resolveQueue.front();
         unsigned int stream_id = entry.resolvedFTQId;
-        dbpbtb->prepareResolveUpdateEntries(stream_id);
+        dbpbtb->prepareResolveUpdateEntries(stream_id, 0);
         for (const auto resolvedInstPC : entry.resolvedInstPC) {
-            dbpbtb->markCFIResolved(stream_id, resolvedInstPC);
+            dbpbtb->markCFIResolved(stream_id, resolvedInstPC, 0);
         }
-        bool success = dbpbtb->resolveUpdate(stream_id);
+        bool success = dbpbtb->resolveUpdate(stream_id, 0);
         if (success) {
Wire correct thread ID through resolve-update path.
handleIEWSignals() is called from checkSignalsAndUpdate(ThreadID tid) (line 1436) but receives no tid parameter. It then accesses fromIEW->iewInfo->resolvedCFIs without indexing by thread (line 1503), reading from iewInfo[0] implicitly instead of iewInfo[tid]. The subsequent dbpbtb calls hardcode tid=0 (lines 1540, 1542, 1544) instead of using the actual thread ID. This causes the resolve-update path to process the wrong thread's resolved CFIs and update the wrong thread's predictor state in multi-threaded execution.
Pass tid to handleIEWSignals(), access fromIEW->iewInfo[tid].resolvedCFIs, and use tid instead of 0 in the dbpbtb calls. Additionally, verify whether ResolvedCFIEntry should include ThreadID to preserve thread context at the entry level.
if (abtb->isEnabled() && ftq.backId(tid)) {
    auto previous_block_startpc = ftq.back(tid).startPC;
    abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], previous_block_startpc);
Avoid backId()/back() when the FTQ is empty.
ftq.backId(tid) underflows on empty queues, and ftq.back(tid) will be invalid. Guard on emptiness instead.
🛠️ Proposed fix
- if (abtb->isEnabled() && ftq.backId(tid)) {
+ if (abtb->isEnabled() && !ftq.empty(tid)) {
auto previous_block_startpc = ftq.back(tid).startPC;
abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], previous_block_startpc);
} else if (abtb->isEnabled()) {
abtb->updateUsingS3Pred(predsOfEachStage[numStages - 1], 0);
}
🚀 Coremark Smoke Test Results

✅ Difftest smoke test passed!
🚀 Performance test triggered: spec06-0.8c

[Generated by GEM5 Performance Robot] Ideal BTB Performance: Overall Score
Do not merge; this PR can't build successfully.