cpu-o3: using reverse ordered tick & refactor the stalls logic#756
Change-Id: I9b599a4e0d704215ad1a3bf543dbd075384fe1f4
📝 Walkthrough
This change comprehensively refactors the O3 pipeline's inter-stage buffering and stall signaling architecture. It introduces a centralized StallSignals structure, replaces multiple per-thread queue implementations with a unified circular_buffer approach across the commit, decode, IEW, and rename stages, simplifies the SMT fetch policy to single-thread selection, reorders CPU tick execution (commit→IEW→rename→decode→fetch), and consolidates stall/activation control flow while removing legacy status tracking mechanisms.
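To illustrate why the tick order matters, here is a minimal sketch of a reverse-ordered tick. It is not the PR's code: ToyStallSignals, ToyPipeline, and the one-entry buffers are hypothetical, and only three stages are modeled. The point it shows is that when downstream stages tick first, a stall signal they publish is visible to the upstream stages within the same cycle.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, single-thread stand-in for the PR's StallSignals.
struct ToyStallSignals
{
    bool blockFetch = false;   // decode to fetch
    bool blockDecode = false;  // rename to decode
};

// Hypothetical three-stage pipeline; each stage owns a one-entry buffer.
struct ToyPipeline
{
    ToyStallSignals sig;
    bool renameFull = false;
    bool decodeFull = false;
    std::vector<std::string> order;  // records the stage tick order

    void tickRename(bool can_drain)
    {
        order.push_back("rename");
        if (can_drain)
            renameFull = false;
        // Publish this cycle's stall for the upstream stage.
        sig.blockDecode = renameFull;
    }

    void tickDecode()
    {
        order.push_back("decode");
        if (!sig.blockDecode && decodeFull) {
            renameFull = true;   // hand the instruction to rename
            decodeFull = false;
        }
        sig.blockFetch = decodeFull;
    }

    void tickFetch()
    {
        order.push_back("fetch");
        if (!sig.blockFetch)
            decodeFull = true;   // fetch a new instruction
    }

    // Downstream first, so upstream stages observe same-cycle stalls.
    void tick(bool rename_can_drain)
    {
        tickRename(rename_can_drain);
        tickDecode();
        tickFetch();
    }
};
```

Calling tick(false) repeatedly on a ToyPipeline fills rename and then decode, after which both blockDecode and blockFetch are raised in the same cycle; a single tick(true) clears them again.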
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
src/cpu/o3/fetch.hh (1)
634-641: ⚠️ Potential issue | 🟡 Minor
Remove dead Stalls struct from fetch header; clarify incomplete drain handling.
The Stalls struct at lines 634-638 is unused since stalls[MaxThreads] was replaced by StallSignals* stallSig. Unlike the Rename stage, which still actively uses its Stalls struct, fetch only references stallSig with its blockFetch member. The struct definition can be removed.
Additionally, StallSignals provides no drain equivalent, and drainStall() in fetch.cc is an empty stub. This suggests drain handling was intentionally removed or remains incomplete. Clarify whether drain functionality is intentional or needs implementation. The stallSig pointer initialization via setStallSignals() called during CPU construction is safe and does not require nullptr initialization.
src/cpu/o3/commit.cc (1)
1789-1857: ⚠️ Potential issue | 🟠 Major
Protect fixedbuffer intake when prior batch hasn't drained.
getInsts() assumes the per-thread fixedbuffer is empty whenever rename delivers a new batch. If that invariant ever breaks, the new batch is mixed with older entries. A runtime guard makes this robust even in non-assert builds.
🛠️ Suggested guard
- ThreadID tid = insts_from_rename > 0 ? fromRename->insts[0]->threadNumber : -1;
- if (tid != -1) assert(fixedbuffer[tid].empty());
+ ThreadID tid = insts_from_rename > 0 ? fromRename->insts[0]->threadNumber
+                                      : InvalidThreadID;
+ if (tid != InvalidThreadID && !fixedbuffer[tid].empty()) {
+     stallSig->blockIEW[tid] = true;
+     DPRINTF(Commit, "[tid:%i] Fixedbuffer not drained; deferring rename intake\n", tid);
+     return;
+ }
src/cpu/o3/rename.cc (2)
652-675: ⚠️ Potential issue | 🟠 Major
Rename stage never deactivates.
any_unblocking is hardcoded to true, so the stage stays Active even when idle, preventing CPU idling. Compute it from stall signals/buffer state.
🛠️ Suggested fix
- bool any_unblocking = true;
+ bool any_unblocking = false;
+ for (ThreadID tid = 0; tid < numThreads; ++tid) {
+     if (!stallSig->blockDecode[tid] && !fixedbuffer[tid].empty()) {
+         any_unblocking = true;
+         break;
+     }
+ }
477-606: ⚠️ Potential issue | 🟡 Minor
Fix DPRINTF format/argument mismatch causing undefined behavior.
The format string at lines 501-502 has 2 specifiers (%llu, %s) but 3 arguments are passed (tid, inst->seqNum, inst->pcState()). No specifier for tid appears in the format string, so the printf-style formatter misaligns the arguments and triggers undefined behavior.
Suggested fix
- DPRINTF(Rename, "[sn:%llu] instruction with PC %s is squashed, skipping.\n",
-         tid, inst->seqNum, inst->pcState());
+ DPRINTF(Rename, "[tid:%i] [sn:%llu] instruction with PC %s is squashed, skipping.\n",
+         tid, inst->seqNum, inst->pcState());
src/cpu/o3/decode.cc (1)
336-371: ⚠️ Potential issue | 🟡 Minor
updateActivate is effectively never called.
status_change is never set in tick(), so updateActivate() won't reflect block/unblock transitions. Call it unconditionally or compute status_change from stall-signal changes.
🛠️ Minimal fix
- if (status_change) {
-     updateActivate();
- }
+ updateActivate();
Also applies to: 464-469
src/cpu/o3/iew.hh (1)
556-559: ⚠️ Potential issue | 🟡 Minor
Typo in comment: "fetrenamech" should be "rename".
Proposed fix
- /** Distribution of number of fetrenamech stall reasons each tick. */
+ /** Distribution of number of rename stall reasons each tick. */
🤖 Fix all issues with AI agents
In `@src/cpu/o3/comm.hh`:
- Around line 328-335: The StallSignals struct's bool arrays (blockFetch,
blockDecode, blockRename, blockIEW) are left uninitialized causing undefined
behavior; update StallSignals to value-initialize these arrays (e.g., add
default member initializers or a constructor that sets all elements to false) so
every entry for MaxThreads is explicitly false on construction, ensuring any
reader sees deterministic values.
In `@src/cpu/o3/commit.hh`:
- Around line 179-182: Initialize the raw pointer member stallSig to nullptr in
the commit class declaration and add a direct include for
<boost/circular_buffer.hpp> to this header; specifically, in commit.hh set
StallSignals* stallSig = nullptr; (so uses in commit.cc at symbols referencing
stallSig are safe) and add `#include` <boost/circular_buffer.hpp> near other
includes so fixedbuffer (boost::circular_buffer<DynInstPtr>
fixedbuffer[MaxThreads]) no longer relies on a transitive include.
In `@src/cpu/o3/cpu.hh`:
- Around line 529-546: Remove the redundant member declaration stallSig from the
class (it duplicates the existing StallSignals stallSignals and is never used);
delete the line declaring StallSignals stallSig; and ensure no other code relies
on that member (the stages use setStallSignals() and their own stallSig
members), then rebuild to verify there are no references to the removed symbol.
In `@src/cpu/o3/decode.cc`:
- Around line 112-115: Decode::clearStates currently does nothing but must purge
any per-thread buffered state for the given ThreadID to avoid stale entries when
TIDs are reused; implement it to remove/clear entries belonging to tid from
Decode's per-thread containers (e.g., fixedbuffer and stallBuffer) and reset any
other per-thread bookkeeping used by Decode so removeThread() no longer leaves
stale instructions. Locate Decode::clearStates and ensure it iterates the
fixedbuffer and stallBuffer structures, erasing elements whose ThreadID matches
the tid argument (and clearing any related per-thread maps/counters), leaving
other threads' data intact.
In `@src/cpu/o3/decode.hh`:
- Around line 186-187: The member pointer stallSig is declared uninitialized
(StallSignals* stallSig;) and should be set to a null pointer to avoid
indeterminate-value access before setStallSignals is called; initialize stallSig
to nullptr at declaration or assign nullptr in the class constructor (the class
that contains stallSig and the setStallSignals method) so all code paths see a
well-defined value until setStallSignals populates it.
In `@src/cpu/o3/fetch.cc`:
- Around line 1294-1325: The code hardcodes ThreadID tid = 0 which disables SMT;
replace this with the proper thread selection logic (use selectFetchThread() or
the same selection used by sendInstructionsToDecode()) so tid is set to the
chosen thread before checking stallSig->blockFetch[tid] and pulling from
fetchQueue[tid]; if SMT selection is not yet implemented, add a fail-fast
(ASSERT/ fatal) when numThreads > 1 to avoid silent starvation and document that
selectFetchThread()/sendInstructionsToDecode() must be used to pick tid.
In `@src/cpu/o3/iew.cc`:
- Around line 820-837: The loop currently sets stallSig->blockRename for both
threads when multiple active threads are found (using tid,
stallSig->blockRename, fixedbuffer, canInsertLDSTQue), which stalls all SMT
threads; change it to allow exactly one active thread to proceed (e.g., pick a
single thread to keep blockRename=false and set blockRename=true for all other
active threads) and implement a simple selection policy (round-robin using a
persistent nextThread index or a priority tie-breaker) so that when scanning
fixedbuffer[] you assign tid to the chosen thread and mark every other active
thread's stallSig->blockRename true instead of blocking both.
- Line 429: The IEW→Commit timing is broken because toCommit is reading
iewQueue->getWire(0) while Commit expects the data at offset -iewToCommitDelay;
restore the intended zero-latency alignment by changing the producer read to use
the negative delay: set toCommit = iewQueue->getWire(-iewToCommitDelay)
(symbols: toCommit, iewQueue, getWire, iewToCommitDelay, fromIEW, Commit) so
both IEW and Commit use the same buffer offset convention.
- Around line 798-811: The issue is that canInsertLDSTQue(ThreadID) performs
side-effectful calls getAndResetLastLQPopEntries/getAndResetLastSQPopEntries
which are only executed when stallSig->blockIEW[i] is true (due to short-circuit
&&), so counters aren't reset otherwise; fix by invoking the reset logic
unconditionally and then using its result in the block condition: call a new or
existing getter that performs
getAndResetLastLQPopEntries/getAndResetLastSQPopEntries (or call
canInsertLDSTQue(i) and store its bool result) before computing bool block =
stallSig->blockIEW[i] && !... so that the reset side-effects always run, then
use the stored boolean in the original condition (or refactor canInsertLDSTQue
into pure check + separate reset function and call reset unconditionally in
dispatchInsts()).
- Around line 724-748: When detecting a squash in the loop
(fromCommit->commitInfo[i].squash) clear fixedbuffer[i] before calling squash(i)
and returning so any stale instructions are discarded; specifically, move or add
fixedbuffer[i].clear() immediately when fromCommit->commitInfo[i].squash is true
(before squash(i) and the return) because IEW::squash() does not clear the
buffer unlike decode/rename; keep the existing clear for the robSquashing path
unchanged.
- Around line 118-119: The postfix increment and decrement operators for the
wire class are returning references to temporaries; change the signatures of
wire::operator++(int) and wire::operator--(int) to return by value (wire)
instead of wire& and update their definitions accordingly so they return the
temporary (wire(this, i)) by value; ensure both the declaration in the class and
the out-of-line definitions (operator++(int) and operator--(int)) are adjusted
to match the new return type.
- Around line 755-759: The assert fixedbuffer[tid].empty() can fire legitimately
when dispatch stalls; change it to tolerate non-empty buffers by replacing the
unconditional assert with a conditional that either clears retained instructions
or asserts only if blockRename[tid] is false. Specifically, in the block
handling insts_from_rename/fromRename, replace "if (tid != -1)
assert(fixedbuffer[tid].empty());" with logic that: if tid != -1 and
fixedbuffer[tid] is not empty, then either call a clearing/rollback helper to
remove the newly moved instructions (so moveInstsToBuffer() leaves no leftovers)
or assert(!blockRename[tid]) (i.e., only assert emptiness when blockRename[tid]
is not set); reference functions/fields: insts_from_rename, fromRename->insts,
fixedbuffer, IEW::tick(), dispatchInsts(), blockRename, moveInstsToBuffer().
In `@src/cpu/o3/iew.hh`:
- Line 157: Initialize the raw pointer StallSignals* stallSig to nullptr (e.g.,
in IEW's member initializer or declaration) and update uses to check/assert
before dereferencing: add a null-check or assertion at the start of methods that
use stallSig (notably squash and squashDueToBranch) and in any code paths that
might run before setStallSignals() is called; ensure setStallSignals() still
assigns the pointer when available.
- Line 51: Remove the unused boost include: delete the line with `#include`
<boost/circular_buffer.hpp> from iew.hh because this header is not referenced in
this file (the container used is fixedbuffer declared as
std::deque<DynInstPtr>); ensure no other symbols in iew.hh depend on
boost::circular_buffer after removal and run a quick build to confirm no missing
includes.
In `@src/cpu/o3/regfile.hh`:
- Around line 423-427: The debug print in the VecPredRegClass case uses the
wrong regClass instance: replace the call to
vectorRegFile.regClass.valString(val) with
vecPredRegFile.regClass.valString(val) so the DPRINTF reflects the same register
file that vecPredRegFile.set(idx, val) writes to; update the VecPredRegClass
case handling (the DPRINTF line) to use vecPredRegFile.regClass.valString(val).
In `@src/cpu/o3/rename.cc`:
- Around line 389-410: The bug is that releaseSeq is computed using
historyBuffer->empty() which only checks thread 0; in Rename::releasePhysRegs
you must check the current thread’s history buffer (use
historyBuffer[tid].empty()) and, when non-empty, use
historyBuffer[tid].back().instSeqNum to set releaseSeq; update the assignment
inside the loop where fromCommit->commitInfo[tid].doneSeqNum is handled so it
references historyBuffer[tid] instead of historyBuffer.
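The short-circuit issue flagged above for src/cpu/o3/iew.cc (lines 798-811) can be reproduced in isolation. The sketch below is an illustration under assumptions, not the PR's code: ToyLSQ and both helper functions are hypothetical, and "can insert" is assumed to mean that at least one entry was popped since the last reset. It shows that with `&&` the get-and-reset call is skipped whenever the left operand is false, while evaluating it first makes the reset unconditional.

```cpp
#include <cassert>

// Hypothetical stand-in for an LSQ that counts entries popped since last query.
struct ToyLSQ
{
    int lastPopEntries = 0;

    // Side-effectful getter: returns the counter and clears it.
    int getAndResetLastPopEntries()
    {
        int n = lastPopEntries;
        lastPopEntries = 0;
        return n;
    }
};

// Short-circuit form: the reset is skipped whenever block_iew is false.
bool blockShortCircuit(bool block_iew, ToyLSQ &lsq)
{
    return block_iew && lsq.getAndResetLastPopEntries() == 0;
}

// Reset-first form: the side effect always runs, then the stored result is used.
bool blockResetFirst(bool block_iew, ToyLSQ &lsq)
{
    const bool can_insert = lsq.getAndResetLastPopEntries() > 0;
    return block_iew && !can_insert;
}
```

Both functions return the same boolean for the same inputs; they differ only in whether the counter is cleared when block_iew is false, which is the behavior the comment asks to make deterministic.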
🧹 Nitpick comments (5)
src/cpu/o3/commit.cc (1)
168-187: Prefer fatal_if for the renameToROBDelay invariant.
This constraint is fundamental to the new buffering path; enforcing it in release builds avoids silent misconfigurations.
🔧 Suggested change
- assert(renameToROBDelay == 1);
+ fatal_if(renameToROBDelay != 1,
+          "renameToROBDelay must be 1 for O3 commit (got %d).",
+          renameToROBDelay);
src/cpu/o3/rename.cc (1)
76-84: Prefer fatal_if for the decodeToRenameDelay invariant.
This is a hard requirement for the refactor; enforcing it in release builds avoids silent misconfigurations.
🔧 Suggested change
- assert(decodeToRenameDelay == 1);
+ fatal_if(decodeToRenameDelay != 1,
+          "decodeToRenameDelay must be 1 for O3 rename (got %d).",
+          decodeToRenameDelay);
src/cpu/o3/iew.hh (1)
391-391: Consider renaming fixedbuffer to something more descriptive.
The name fixedbuffer is vague: it doesn't convey that this is the per-thread dispatch buffer between rename and IEW. A name like renameBuffer or dispatchBuffer would improve readability and align with the role described in the AI summary. Also, std::deque is not "fixed" in any meaningful sense.
src/cpu/o3/iew.cc (2)
896-901: Commented-out serialize stall check should be removed or restored.
This entire block is commented out with no explanation. If the serialize handling is intentionally removed as part of this refactor, delete the dead code. If it's temporarily disabled for debugging, add a TODO or FIXME comment explaining why.
1650-1653: Tick ordering: moveInstsToBuffer() before checkSquash() may process instructions that should be squashed.
In the reverse-ordered tick, moveInstsToBuffer() runs first (line 1651), populating fixedbuffer with instructions from rename. Then checkSquash() runs (line 1652), potentially discovering a squash from commit. As noted in a separate comment, checkSquash doesn't clear fixedbuffer on the squash path. Even if it did, the ordering means one wasted cycle of work moving instructions that will immediately be discarded. Consider swapping the order so checkSquash() runs first.
Proposed reordering
  // dispatch
- moveInstsToBuffer();
  checkSquash();
+ moveInstsToBuffer();
  dispatchInsts();
struct StallSignals
{

    bool blockFetch[MaxThreads];// decode to fetch
    bool blockDecode[MaxThreads];// rename to decode
    bool blockRename[MaxThreads];// iew to rename (if iew is stalling, rename all threads would be stalled)
    bool blockIEW[MaxThreads];// commit to iew
};
StallSignals members are left uninitialized by default — risk of undefined behavior.
The bool arrays have no default member initializers and no constructor, so they contain indeterminate values until explicitly written. Any stage that reads a stall signal before the first write will observe undefined behavior.
Add default member initializers or a constructor:
🛡️ Proposed fix — value-initialize all arrays
struct StallSignals
{
-
- bool blockFetch[MaxThreads];// decode to fetch
- bool blockDecode[MaxThreads];// rename to decode
- bool blockRename[MaxThreads];// iew to rename (if iew is stalling, rename all threads would be stalled)
- bool blockIEW[MaxThreads];// commit to iew
+ bool blockFetch[MaxThreads]{}; // decode to fetch
+ bool blockDecode[MaxThreads]{}; // rename to decode
+ bool blockRename[MaxThreads]{}; // iew to rename
+ bool blockIEW[MaxThreads]{}; // commit to iew
};
🤖 Prompt for AI Agents
In `@src/cpu/o3/comm.hh` around lines 328 - 335, The StallSignals struct's bool
arrays (blockFetch, blockDecode, blockRename, blockIEW) are left uninitialized
causing undefined behavior; update StallSignals to value-initialize these arrays
(e.g., add default member initializers or a constructor that sets all elements
to false) so every entry for MaxThreads is explicitly false on construction,
ensuring any reader sees deterministic values.
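As a quick check of the proposed fix, the sketch below shows that brace default member initializers zero every array element on construction. The value of MaxThreads (4) and the anyBlocked helper are assumptions made for illustration; only the struct layout follows the suggestion above.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Assumption: MaxThreads is a small compile-time constant; 4 is illustrative.
constexpr int MaxThreads = 4;

struct StallSignals
{
    bool blockFetch[MaxThreads]{};   // decode to fetch
    bool blockDecode[MaxThreads]{};  // rename to decode
    bool blockRename[MaxThreads]{};  // iew to rename
    bool blockIEW[MaxThreads]{};     // commit to iew
};

// Hypothetical helper: true if any stall signal is raised for any thread.
inline bool anyBlocked(const StallSignals &sig)
{
    auto any_set = [](const bool (&arr)[MaxThreads]) {
        return std::any_of(std::begin(arr), std::end(arr),
                           [](bool b) { return b; });
    };
    return any_set(sig.blockFetch) || any_set(sig.blockDecode) ||
           any_set(sig.blockRename) || any_set(sig.blockIEW);
}
```

With the `{}` initializers, a default-constructed StallSignals reads as "nothing blocked" even if a stage polls it before any writer has run.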
boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads];

StallSignals* stallSig;
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check includes in commit.hh
echo "=== Includes in commit.hh (lines 44-69) ==="
sed -n '44,69p' src/cpu/o3/commit.hh
echo ""
echo "=== Lines around 179-182 in commit.hh ==="
sed -n '175,185p' src/cpu/o3/commit.hh
echo ""
echo "=== Check for stallSig declaration and initialization ==="
rg -n 'stallSig' src/cpu/o3/commit.hh src/cpu/o3/commit.cc
echo ""
echo "=== Commit constructor signature and initialization ==="
rg -n -A 50 'Commit::Commit' src/cpu/o3/commit.cc | head -80
Repository: OpenXiangShan/GEM5
Length of output: 5937
🏁 Script executed:
#!/bin/bash
# Check fixedbuffer initialization in constructor body
echo "=== Search for fixedbuffer initialization ==="
rg -n -B 5 -A 5 'fixedbuffer' src/cpu/o3/commit.cc
echo ""
echo "=== Check full constructor initialization list ==="
sed -n '103,200p' src/cpu/o3/commit.cc | grep -E '(fixedbuffer|stallSig|:.*\()'
Repository: OpenXiangShan/GEM5
Length of output: 2935
Initialize stallSig to nullptr and add direct include for boost::circular_buffer.
The raw pointer stallSig is declared uninitialized and later used directly in multiple places (lines 1807, 1813, 1814, 1855 in commit.cc) without null-checks. It should be initialized to nullptr.
Also, <boost/circular_buffer.hpp> is not directly included here—only transitively available via decode.hh. Adding a direct include would be more robust.
Note: fixedbuffer is already properly initialized with capacity in the constructor (line 186: fixedbuffer[tid] = boost::circular_buffer<DynInstPtr>(renameWidth)), so no changes needed there.
Proposed fix
boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads];
- StallSignals* stallSig;
+ StallSignals* stallSig = nullptr;
And add to the includes section:
`#include` <boost/circular_buffer.hpp>
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads];
- StallSignals* stallSig;
+ boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads];
+ StallSignals* stallSig = nullptr;
🤖 Prompt for AI Agents
In `@src/cpu/o3/commit.hh` around lines 179 - 182, Initialize the raw pointer
member stallSig to nullptr in the commit class declaration and add a direct
include for <boost/circular_buffer.hpp> to this header; specifically, in
commit.hh set StallSignals* stallSig = nullptr; (so uses in commit.cc at symbols
referencing stallSig are safe) and add `#include` <boost/circular_buffer.hpp> near
other includes so fixedbuffer (boost::circular_buffer<DynInstPtr>
fixedbuffer[MaxThreads]) no longer relies on a transitive include.
StallSignals stallSignals;

/** The main time buffer to do backwards communication. */
TimeBuffer<TimeStruct> timeBuffer;

/** The fetch stage's instruction queue. */
TimeBuffer<FetchStruct> fetchQueue;
TimeBuffer<FetchStruct> fetchTimebuffer;

/** The decode stage's instruction queue. */
TimeBuffer<DecodeStruct> decodeQueue;
TimeBuffer<DecodeStruct> decodeTimebuffer;

/** The rename stage's instruction queue. */
TimeBuffer<RenameStruct> renameQueue;
TimeBuffer<RenameStruct> renameTimebuffer;

/** The IEW stage's instruction queue. */
TimeBuffer<IEWStruct> iewQueue;
TimeBuffer<IEWStruct> iewTimebuffer;

StallSignals stallSig;
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check which StallSignals member is actually used in cpu.cc
echo "=== StallSignals usage in cpu.cc ==="
rg -n 'stallSignals|stallSig|setStallSignals' --type=cpp -g '*cpu.cc'
echo ""
echo "=== StallSignals declarations in cpu.hh ==="
rg -n 'StallSignals' --type=cpp -g '*cpu.hh'
Repository: OpenXiangShan/GEM5
Length of output: 566
🏁 Script executed:
#!/bin/bash
# Search entire codebase for stallSig usage
echo "=== All references to stallSig in the entire codebase ==="
rg -n 'stallSig' --type=cpp
echo ""
echo "=== All references to stallSignals in the entire codebase ==="
rg -n 'stallSignals' --type=cpp | head -30
Repository: OpenXiangShan/GEM5
Length of output: 4386
Remove unused stallSig member on line 546.
The stallSig member declared at line 546 is unused dead code. All stages correctly receive stallSignals (line 529) via setStallSignals() in cpu.cc (lines 199–203). Each stage stores this pointer in their own local stallSig member; the CPU's duplicate stallSig is never accessed. Remove line 546 to eliminate confusion about which member serves as the shared stall signal object.
🤖 Prompt for AI Agents
In `@src/cpu/o3/cpu.hh` around lines 529 - 546, Remove the redundant member
declaration stallSig from the class (it duplicates the existing StallSignals
stallSignals and is never used); delete the line declaring StallSignals
stallSig; and ensure no other code relies on that member (the stages use
setStallSignals() and their own stallSig members), then rebuild to verify there
are no references to the removed symbol.
Decode::clearStates(ThreadID tid)
{
    decodeStatus[tid] = Idle;
    stalls[tid].rename = false;

}
clearStates should purge per-thread buffers.
removeThread() relies on decode.clearStates(), but this is now empty; stale instructions can remain in fixedbuffer/stallBuffer if the TID is reused.
🛠️ Suggested fix
void
Decode::clearStates(ThreadID tid)
{
-
+ fixedbuffer[tid].clear();
+
+ auto delIt = stallBuffer.begin();
+ for (auto it0 = eachstallSize.begin(); it0 != eachstallSize.end();) {
+ int size = *it0;
+ auto start_it = delIt;
+ auto end_it = start_it + size;
+ if ((*start_it)->threadNumber == tid) {
+ delIt = stallBuffer.erase(start_it, end_it);
+ it0 = eachstallSize.erase(it0);
+ } else {
+ delIt = end_it;
+ ++it0;
+ }
+ }
}
📝 Committable suggestion
- Decode::clearStates(ThreadID tid)
- {
-     decodeStatus[tid] = Idle;
-     stalls[tid].rename = false;
- }
+ Decode::clearStates(ThreadID tid)
+ {
+     fixedbuffer[tid].clear();
+     auto delIt = stallBuffer.begin();
+     for (auto it0 = eachstallSize.begin(); it0 != eachstallSize.end();) {
+         int size = *it0;
+         auto start_it = delIt;
+         auto end_it = start_it + size;
+         if ((*start_it)->threadNumber == tid) {
+             delIt = stallBuffer.erase(start_it, end_it);
+             it0 = eachstallSize.erase(it0);
+         } else {
+             delIt = end_it;
+             ++it0;
+         }
+     }
+ }
🤖 Prompt for AI Agents
In `@src/cpu/o3/decode.cc` around lines 112 - 115, Decode::clearStates currently
does nothing but must purge any per-thread buffered state for the given ThreadID
to avoid stale entries when TIDs are reused; implement it to remove/clear
entries belonging to tid from Decode's per-thread containers (e.g., fixedbuffer
and stallBuffer) and reset any other per-thread bookkeeping used by Decode so
removeThread() no longer leaves stale instructions. Locate Decode::clearStates
and ensure it iterates the fixedbuffer and stallBuffer structures, erasing
elements whose ThreadID matches the tid argument (and clearing any related
per-thread maps/counters), leaving other threads' data intact.
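The grouped-erase walk in the suggested fix can be tried on plain containers. The sketch below is hypothetical: ToyInst, purgeThread, and the container types (std::deque for the stall buffer, std::list for the group sizes) are assumptions, as is the premise that every group holds instructions of a single thread. It also guards against an empty group before dereferencing, which the suggested fix does not do.

```cpp
#include <cassert>
#include <deque>
#include <list>

// Hypothetical instruction record; only the fields the walk needs.
struct ToyInst
{
    int threadNumber;
    int seq;
};

// Erase every group of stalled instructions that belongs to tid.
// Assumption: group_sizes[i] consecutive entries in stall_buffer form group i,
// and all entries of a group share one thread.
void purgeThread(std::deque<ToyInst> &stall_buffer,
                 std::list<int> &group_sizes, int tid)
{
    auto pos = stall_buffer.begin();
    for (auto g = group_sizes.begin(); g != group_sizes.end();) {
        auto last = pos + *g;
        if (*g > 0 && pos->threadNumber == tid) {
            // erase() returns a valid iterator to the element after the range.
            pos = stall_buffer.erase(pos, last);
            g = group_sizes.erase(g);
        } else {
            pos = last;
            ++g;
        }
    }
}
```

Only the iterator returned by erase() is reused after each removal, since erasing from the middle of a std::deque invalidates all other iterators.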
StallSignals* stallSig;
stallSig pointer should be initialized to nullptr.
Without initialization, stallSig holds an indeterminate value until setStallSignals is called, risking undefined behavior on any early access.
Proposed fix
- StallSignals* stallSig;
+ StallSignals* stallSig = nullptr;
📝 Committable suggestion
- StallSignals* stallSig;
+ StallSignals* stallSig = nullptr;
🤖 Prompt for AI Agents
In `@src/cpu/o3/decode.hh` around lines 186 - 187, The member pointer stallSig is
declared uninitialized (StallSignals* stallSig;) and should be set to a null
pointer to avoid indeterminate-value access before setStallSignals is called;
initialize stallSig to nullptr at declaration or assign nullptr in the class
constructor (the class that contains stallSig and the setStallSignals method) so
all code paths see a well-defined value until setStallSignals populates it.
// check threads stall & status
ThreadID tid = InvalidThreadID;
for (int i = 0; i < numThreads; i++) {
    bool block = stallSig->blockIEW[i] && !canInsertLDSTQue(i);
    bool active = !block && !fixedbuffer[i].empty();

    stallSig->blockRename[i] = block;
    if (active) {
        if (tid == InvalidThreadID) tid = i;
        else {
            // if there are multiple active threads, must exhaust all threads first
            // to avoid starvation of other threads and also avoid resource conflict
            stallSig->blockRename[tid] = true;
            stallSig->blockRename[i] = true;
            DPRINTF(IEW, "Multiple active threads detected, blocking all threads\n");
        }
    }
}
SMT multi-thread handling blocks all threads — may cause livelock or severe throughput loss.
When multiple threads are both active (!block && !fixedbuffer[tid].empty()), lines 832-833 set blockRename for both threads, effectively stalling all dispatch. The comment says "must exhaust all threads first to avoid starvation" but this creates a scenario where neither thread can make progress if both consistently have instructions ready. This seems overly conservative for SMT.
Consider implementing a round-robin or priority-based selection instead of blocking both threads.
🤖 Prompt for AI Agents
In `@src/cpu/o3/iew.cc` around lines 820 - 837, The loop currently sets
stallSig->blockRename for both threads when multiple active threads are found
(using tid, stallSig->blockRename, fixedbuffer, canInsertLDSTQue), which stalls
all SMT threads; change it to allow exactly one active thread to proceed (e.g.,
pick a single thread to keep blockRename=false and set blockRename=true for all
other active threads) and implement a simple selection policy (round-robin using
a persistent nextThread index or a priority tie-breaker) so that when scanning
fixedbuffer[] you assign tid to the chosen thread and mark every other active
thread's stallSig->blockRename true instead of blocking both.
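One possible shape for the round-robin policy suggested above is sketched here. It is an illustration only: selectDispatchThread, the vector-based signatures, and the local InvalidThreadID constant are hypothetical and do not come from the PR. The scan starts at a persistent next_thread index, lets the first active thread proceed, and raises block_rename for every other active thread.

```cpp
#include <cassert>
#include <vector>

// Local stand-in for the simulator's invalid thread marker (assumption).
constexpr int InvalidThreadID = -1;

// Pick exactly one active thread, starting the scan at next_thread.
// Precondition: active and block_rename have the same size n, and
// 0 <= next_thread < n whenever n > 0.
int selectDispatchThread(const std::vector<bool> &active,
                         std::vector<bool> &block_rename,
                         int &next_thread)
{
    const int n = static_cast<int>(active.size());
    int chosen = InvalidThreadID;
    for (int k = 0; k < n; ++k) {
        const int tid = (next_thread + k) % n;
        if (!active[tid])
            continue;
        if (chosen == InvalidThreadID)
            chosen = tid;              // first active thread wins this cycle
        else
            block_rename[tid] = true;  // other active threads wait
    }
    // Rotate priority so the chosen thread is scanned last next time.
    if (chosen != InvalidThreadID)
        next_thread = (chosen + 1) % n;
    return chosen;
}
```

With two threads that are always active, successive calls alternate between thread 0 and thread 1, so neither starves and at most one dispatches per cycle.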
#include <set>
#include <vector>

#include <boost/circular_buffer.hpp>
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if boost::circular_buffer is used anywhere in iew.hh or iew.cc
rg -n 'circular_buffer' src/cpu/o3/iew.hh src/cpu/o3/iew.cc
Repository: OpenXiangShan/GEM5
Length of output: 119
🏁 Script executed:
#!/bin/bash
# Check if circular_buffer is used anywhere else in the o3 directory or related files
rg -n 'circular_buffer' src/cpu/o3/ --type h --type cc
# Also verify fixedbuffer declaration to confirm it's std::deque
rg -n 'fixedbuffer' src/cpu/o3/iew.hh | head -5
Repository: OpenXiangShan/GEM5
Length of output: 161
🏁 Script executed:
#!/bin/bash
# Search for circular_buffer more broadly to ensure it's not used anywhere
rg 'circular_buffer' src/cpu/o3/
Repository: OpenXiangShan/GEM5
Length of output: 1260
Remove unused #include <boost/circular_buffer.hpp>.
fixedbuffer is declared as std::deque<DynInstPtr> (line 391) in this file. While boost::circular_buffer is used in other o3 header files (rename.hh, decode.hh, lsq_unit.hh, commit.hh), it is not used in iew.hh. This include is a leftover and should be removed.
Proposed fix
-#include <boost/circular_buffer.hpp>
-
📝 Committable suggestion
- #include <boost/circular_buffer.hpp>
🤖 Prompt for AI Agents
In `@src/cpu/o3/iew.hh` at line 51, Remove the unused boost include: delete the
line with `#include` <boost/circular_buffer.hpp> from iew.hh because this header
is not referenced in this file (the container used is fixedbuffer declared as
std::deque<DynInstPtr>); ensure no other symbols in iew.hh depend on
boost::circular_buffer after removal and run a quick build to confirm no missing
includes.
ProbePointArg<DynInstPtr> *ppToCommit;

bool disp_stall = false;
StallSignals* stallSig;
stallSig raw pointer is not initialized in the constructor.
stallSig is a raw StallSignals* with no default initializer. If any code path in IEW executes before setStallSignals() is called (e.g., during squash or squashDueToBranch), dereferencing stallSig reads an indeterminate pointer, which is undefined behavior. Consider initializing it to nullptr here and adding a guard or assertion before first use.
Proposed fix
- StallSignals* stallSig;
+ StallSignals* stallSig = nullptr;
📝 Committable suggestion
- StallSignals* stallSig;
+ StallSignals* stallSig = nullptr;
🤖 Prompt for AI Agents
In `@src/cpu/o3/iew.hh` at line 157, Initialize the raw pointer StallSignals*
stallSig to nullptr (e.g., in IEW's member initializer or declaration) and
update uses to check/assert before dereferencing: add a null-check or assertion
at the start of methods that use stallSig (notably squash and squashDueToBranch)
and in any code paths that might run before setStallSignals() is called; ensure
setStallSignals() still assigns the pointer when available.
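A small sketch of the "initialize to nullptr, then guard before use" pattern follows. ToyStage and its exception-based guard are hypothetical stand-ins, not the IEW class; in the simulator an assertion or a fatal-style check would likely take the place of the exception used here to keep the example testable.

```cpp
#include <cassert>
#include <stdexcept>

// Minimal stand-in for the shared stall-signal object (4 threads assumed).
struct StallSignals
{
    bool blockRename[4]{};
};

// Hypothetical pipeline stage holding a non-owning pointer to the signals.
class ToyStage
{
  public:
    void setStallSignals(StallSignals *sig) { stallSig = sig; }

    bool wired() const { return stallSig != nullptr; }

    // Guarded use: fail loudly if the stage runs before it is wired.
    void squash(int tid)
    {
        if (!stallSig)
            throw std::logic_error("stall signals not wired");
        stallSig->blockRename[tid] = false;
    }

  private:
    StallSignals *stallSig = nullptr;  // well-defined until the setter runs
};
```

Because the member starts as nullptr, a call that arrives before setStallSignals() is detected by the guard instead of dereferencing an indeterminate pointer.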
case VecPredRegClass:
    DPRINTF(IEW, "RegFile: Setting predicate register %i to %s\n",
    DPRINTF(Scoreboard, "RegFile: Setting predicate register %i to %s\n",
            idx, vectorRegFile.regClass.valString(val));
    vecPredRegFile.set(idx, val);
    break;
Pre-existing bug: wrong regClass used for predicate register value formatting.
Line 425 uses vectorRegFile.regClass.valString(val) to format a VecPredRegClass value, but it should use vecPredRegFile.regClass.valString(val) to match the actual register file being written on line 426.
🐛 Proposed fix
case VecPredRegClass:
DPRINTF(Scoreboard, "RegFile: Setting predicate register %i to %s\n",
- idx, vectorRegFile.regClass.valString(val));
+ idx, vecPredRegFile.regClass.valString(val));
vecPredRegFile.set(idx, val);
🤖 Prompt for AI Agents
In `@src/cpu/o3/regfile.hh` around lines 423 - 427, The debug print in the
VecPredRegClass case uses the wrong regClass instance: replace the call to
vectorRegFile.regClass.valString(val) with
vecPredRegFile.regClass.valString(val) so the DPRINTF reflects the same register
file that vecPredRegFile.set(idx, val) writes to; update the VecPredRegClass
case handling (the DPRINTF line) to use vecPredRegFile.regClass.valString(val).
  void
  Rename::releasePhysRegs()
  {
      // Release physical registers up to releaseWidth
      auto threads = activeThreads->begin();
      if (releaseSeq + releaseWidth < finalCommitSeq) {
          releaseSeq += releaseWidth;
      } else {
          releaseSeq = finalCommitSeq;
      }

-     while (threads != end) {
+     while (threads != activeThreads->end()) {
          ThreadID tid = *threads++;

          removeFromHistory(releaseSeq, tid);

          // If we committed this cycle then doneSeqNum will be > 0
          if (fromCommit->commitInfo[tid].doneSeqNum != 0 &&
-             !fromCommit->commitInfo[tid].squash &&
-             renameStatus[tid] != Squashing) {
+             !fromCommit->commitInfo[tid].squash) {

              finalCommitSeq = fromCommit->commitInfo[tid].doneSeqNum;
              releaseSeq = historyBuffer->empty() ? 0 : historyBuffer[tid].back().instSeqNum;
          }
      }
releaseSeq uses the wrong thread’s history buffer.
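historyBuffer is indexed per thread, so historyBuffer->empty() is equivalent to historyBuffer[0].empty() and checks only thread 0. For any other thread this can set releaseSeq incorrectly, and it can call back() on an empty buffer when thread 0 has history but thread tid does not.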
🛠️ Suggested fix
- releaseSeq = historyBuffer->empty() ? 0 : historyBuffer[tid].back().instSeqNum;
+ releaseSeq = historyBuffer[tid].empty() ? 0 : historyBuffer[tid].back().instSeqNum;
🤖 Prompt for AI Agents
In `@src/cpu/o3/rename.cc` around lines 389 - 410, The bug is that releaseSeq is
computed using historyBuffer->empty() which only checks thread 0; in
Rename::releasePhysRegs you must check the current thread’s history buffer (use
historyBuffer[tid].empty()) and, when non-empty, use
historyBuffer[tid].back().instSeqNum to set releaseSeq; update the assignment
inside the loop where fromCommit->commitInfo[tid].doneSeqNum is handled so it
references historyBuffer[tid] instead of historyBuffer.
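The array-decay pitfall behind this comment can be reproduced in a few lines. The sketch below is illustrative only: `RenameSketch`, `emptyViaArrow()`, `emptyForThread()`, and `lastSeq()` are hypothetical names, `MaxThreads` is fixed at 2 for brevity, and the per-thread buffer is assumed to be an array of `std::list` holding bare sequence numbers rather than the real history entries.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

using ThreadID = int;          // assumption: a small integer thread index
constexpr int MaxThreads = 2;  // illustrative value only

// Hypothetical reduction of the rename stage's per-thread history.
struct RenameSketch
{
    std::list<std::uint64_t> historyBuffer[MaxThreads];

    // The array name decays to &historyBuffer[0], so this only ever
    // inspects thread 0, whichever thread the caller had in mind.
    bool emptyViaArrow() const { return historyBuffer->empty(); }

    // Per-thread check, as in the suggested fix.
    bool emptyForThread(ThreadID tid) const
    {
        assert(tid >= 0 && tid < MaxThreads);
        return historyBuffer[tid].empty();
    }

    // Last sequence number for the thread, or 0 when it has no history.
    std::uint64_t lastSeq(ThreadID tid) const
    {
        return emptyForThread(tid) ? 0 : historyBuffer[tid].back();
    }
};
```

If only thread 1 has history, `emptyViaArrow()` still reports an empty buffer while `emptyForThread(1)` does not, which is the mismatch the suggested fix removes.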
Change-Id: I9b599a4e0d704215ad1a3bf543dbd075384fe1f4
Summary by CodeRabbit
Release Notes