
cpu-o3: using reverse ordered tick & refactor the stalls logic#756

Open
tastynoob wants to merge 1 commit into xs-dev from refactor-stalls

Conversation

@tastynoob
Collaborator

@tastynoob tastynoob commented Feb 9, 2026

Change-Id: I9b599a4e0d704215ad1a3bf543dbd075384fe1f4

Summary by CodeRabbit

Release Notes

  • Refactor
    • Reorganized CPU pipeline execution ordering to improve instruction flow and scheduling efficiency.
    • Streamlined pipeline stage buffering and stall signal handling across fetch, decode, rename, execute, and commit stages for reduced complexity.
    • Consolidated per-thread queue management into unified circular buffers for more efficient instruction movement through the pipeline.

@coderabbitai

coderabbitai bot commented Feb 9, 2026

Walkthrough

This change comprehensively refactors the O3 pipeline's inter-stage buffering and stall-signaling architecture. It introduces a centralized StallSignals structure, replaces multiple per-thread queue implementations with a unified circular_buffer approach across the commit, decode, IEW, and rename stages, simplifies the SMT fetch policy to single-thread selection, reverses the CPU tick order (commit→IEW→rename→decode→fetch), and consolidates stall/activation control flow while removing legacy status-tracking mechanisms.
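The effect of the reversed tick order can be illustrated with a minimal sketch. This is not the PR's code: `MiniStallSignals`, `ToyPipeline`, and `tickReverse` are hypothetical names, only three stages are modeled, and each stage is reduced to a one-entry slot for a single thread. What it aims to show is that when a consumer stage ticks before its producer, a stall flag written by the consumer is already visible to the producer in the same cycle.

```cpp
#include <cassert>

// Hypothetical stand-in for the PR's StallSignals, single thread only.
struct MiniStallSignals
{
    bool blockFetch = false;   // written by decode, read by fetch
    bool blockDecode = false;  // written by rename, read by decode
};

// A toy three-stage pipeline: fetch -> decode -> rename.
// Each stage holds at most one instruction (0 means empty).
struct ToyPipeline
{
    MiniStallSignals stall;
    int decodeSlot = 0;
    int renameSlot = 0;
    int nextInst = 1;
    bool renameBusy = false;  // when true, rename cannot retire its slot

    void tickRename()
    {
        if (!renameBusy)
            renameSlot = 0;                 // instruction leaves the pipeline
        stall.blockDecode = (renameSlot != 0);
    }

    void tickDecode()
    {
        if (!stall.blockDecode && decodeSlot != 0) {
            renameSlot = decodeSlot;        // hand over to rename
            decodeSlot = 0;
        }
        stall.blockFetch = (decodeSlot != 0);
    }

    void tickFetch()
    {
        if (!stall.blockFetch)
            decodeSlot = nextInst++;        // fetch a new instruction
    }

    // Reverse order: each consumer runs before its producer.
    void tickReverse()
    {
        tickRename();
        tickDecode();
        tickFetch();
    }
};
```

In this sketch, setting `renameBusy` and calling `tickReverse()` once raises both `blockDecode` and `blockFetch` within that single call, so fetch does not advance `nextInst` in the cycle where the stall begins.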

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Communication Structures: src/cpu/o3/comm.hh | Introduced new StallSignals struct to centralize stall signals (blockFetch, blockDecode, blockRename, blockIEW per thread); removed per-thread free entry broadcasts and block/unblock pairs from IewComm/CommitComm; streamlined inter-stage signaling. |
| Commit Stage: src/cpu/o3/commit.hh, src/cpu/o3/commit.cc | Added per-thread fixedbuffer (circular_buffer) for instruction buffering; introduced StallSignals* pointer and setter; replaced fromRename direct processing with buffered flow; removed ROB free entry broadcasts; simplified stall detection via buffer occupancy checks; added trace logging for committed instructions. |
| CPU Controller: src/cpu/o3/cpu.hh, src/cpu/o3/cpu.cc | Added StallSignals members; renamed stage queues (fetchQueue → fetchTimebuffer, decodeQueue → decodeTimebuffer, etc.); added setStallSignals() calls for all stages; changed tick ordering (commit-first instead of fetch-first); updated buffer advancement and drain paths to use timebuffer variants. |
| Fetch Stage: src/cpu/o3/fetch.hh, src/cpu/o3/fetch.cc | Replaced per-thread stall tracking with StallSignals*; removed SMT fetch policy (getFetchingThread, roundRobin, iqCount, lsqCount, branchCount); simplified thread selection to single-thread (tid=0); updated cache completion and sendInstructionsToDecode to infer activity from stall signals; removed checkStall logic. |
| Decode Stage: src/cpu/o3/decode.hh, src/cpu/o3/decode.cc | Replaced skidBuffer/insts with per-thread fixedbuffer (circular_buffer); introduced stallBuffer and eachstallSize; renamed sortInsts → moveInstsToBuffer, updateStatus → updateActivate, squash → selfSquash; removed ThreadStatus enum and stall tracking; simplified to single active thread per cycle with global stall on multi-thread contention. |
| IEW Stage: src/cpu/o3/iew.hh, src/cpu/o3/iew.cc | Replaced insts/skidBuffer with unified fixedbuffer[MaxThreads]; renamed updateStatus → updateActivate, sortInsts → moveInstsToBuffer, dispatch → canInsertLDSTQue, checkSignalsAndUpdate → checkSquash; removed dispatchStatus array and skid helpers; changed dispatchInsts/dispatchInstFromDispQue signatures to parameterless; updated toCommit wire from -iewToCommitDelay to 0 (zero-latency). |
| Rename Stage: src/cpu/o3/rename.hh, src/cpu/o3/rename.cc | Replaced InstQueue/skidBuffer with per-thread fixedbuffer (circular_buffer); removed renameStatus enum and multiple occupancy counters (instsInProgress, loadsInProgress, storesInProgress, freeEntries); renamed sortInsts → moveInstsToBuffer; simplified canRename() signature; asserted decodeToRenameDelay==1; eliminated serialize state tracking and block/unblock paths; streamlined tick flow to single-pass fixedbuffer-based decision. |
| Supporting Files: src/cpu/o3/dyn_inst.hh, src/cpu/o3/lsq_unit.cc, src/cpu/o3/regfile.hh, src/cpu/o3/rename_map.cc | Removed isTempSerializeAfter() method; removed printDisassemblyAndResult() call in LSQUnit; changed debug logging category from IEW/Rename to Scoreboard for register file and rename map operations; updated includes and log message references. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Suggested labels

architecture, o3-pipeline, refactoring

Suggested reviewers

  • jensen-yan
  • happy-lx

Poem

🐰 Whispers from the Warren

Through pipelines deep we hopped and skipped,
Where buffers round now hold our tricks,
One signal stalls them all at once—
No more per-thread confusion!
The cycles dance in new refrains,
Commit first, then wisdom reigns! 🎯

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: introducing reverse-ordered CPU ticking and refactoring stall signal handling into a centralized StallSignals mechanism. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
src/cpu/o3/fetch.hh (1)

634-641: ⚠️ Potential issue | 🟡 Minor

Remove dead Stalls struct from fetch header; clarify incomplete drain handling.

The Stalls struct at lines 634-638 is unused since stalls[MaxThreads] was replaced by StallSignals* stallSig. Unlike the Rename stage which still actively uses its Stalls struct, fetch only references stallSig with its blockFetch member. The struct definition can be removed.

Additionally, StallSignals provides no drain equivalent, and drainStall() in fetch.cc is an empty stub. This suggests drain handling was intentionally removed or remains incomplete. Clarify whether drain functionality is intentional or needs implementation. The stallSig pointer initialization via setStallSignals() called during CPU construction is safe and does not require nullptr initialization.

src/cpu/o3/commit.cc (1)

1789-1857: ⚠️ Potential issue | 🟠 Major

Protect fixedbuffer intake when prior batch hasn’t drained.

getInsts() assumes the per-thread fixedbuffer is empty whenever rename delivers a new batch. If that invariant ever breaks, the new batch is mixed with older entries. A runtime guard makes this robust even in non-assert builds.

🛠️ Suggested guard
-    ThreadID tid = insts_from_rename > 0 ? fromRename->insts[0]->threadNumber : -1;
-    if (tid != -1) assert(fixedbuffer[tid].empty());
+    ThreadID tid = insts_from_rename > 0 ? fromRename->insts[0]->threadNumber
+                                         : InvalidThreadID;
+    if (tid != InvalidThreadID && !fixedbuffer[tid].empty()) {
+        stallSig->blockIEW[tid] = true;
+        DPRINTF(Commit, "[tid:%i] Fixedbuffer not drained; deferring rename intake\n", tid);
+        return;
+    }
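The difference between an `assert` and the runtime guard suggested above can be sketched in isolation. The names here (`ToyIntake`, `accept`, `blockUpstream`) are hypothetical, `int` stands in for `DynInstPtr`, and a plain `bool` stands in for the stall signal; this is an illustration of the pattern under those assumptions, not the commit stage's actual intake path.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical model of a per-thread intake buffer.
struct ToyIntake
{
    std::deque<int> buffer;
    bool blockUpstream = false;

    // Accept a new batch only when the previous one has fully drained.
    // Returns true if the batch was taken, false if it was deferred.
    bool accept(const std::vector<int> &batch)
    {
        if (batch.empty())
            return true;  // nothing to do
        if (!buffer.empty()) {
            // Unlike assert(), this check survives NDEBUG builds.
            blockUpstream = true;
            return false;
        }
        buffer.assign(batch.begin(), batch.end());
        blockUpstream = false;
        return true;
    }
};
```

One design consequence worth noting: a deferred batch is not stored anywhere by `accept`, so the caller still owns it and has to present it again once the buffer drains. Whether the real pipeline can replay a batch this way is something the PR author would need to confirm.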
src/cpu/o3/rename.cc (2)

652-675: ⚠️ Potential issue | 🟠 Major

Rename stage never deactivates.

any_unblocking is hardcoded to true, so the stage stays Active even when idle, preventing CPU idling. Compute it from stall signals/buffer state.

🛠️ Suggested fix
-    bool any_unblocking = true;
+    bool any_unblocking = false;
+    for (ThreadID tid = 0; tid < numThreads; ++tid) {
+        if (!stallSig->blockDecode[tid] && !fixedbuffer[tid].empty()) {
+            any_unblocking = true;
+            break;
+        }
+    }

477-606: ⚠️ Potential issue | 🟡 Minor

Fix DPRINTF format/argument mismatch causing undefined behavior.

The format string at lines 501-502 has 2 specifiers (%llu, %s) but 3 arguments are passed (tid, inst->seqNum, inst->pcState()). The tid argument is missing from the format string, which causes the printf-style formatter to misalign arguments and trigger undefined behavior.

Suggested fix
-            DPRINTF(Rename, "[sn:%llu] instruction  with PC %s is squashed, skipping.\n",
-                    tid, inst->seqNum, inst->pcState());
+            DPRINTF(Rename, "[tid:%i] [sn:%llu] instruction with PC %s is squashed, skipping.\n",
+                    tid, inst->seqNum, inst->pcState());
src/cpu/o3/decode.cc (1)

336-371: ⚠️ Potential issue | 🟡 Minor

updateActivate is effectively never called.

status_change is never set in tick(), so updateActivate() won’t reflect block/unblock transitions. Call it unconditionally or compute status_change from stall-signal changes.

🛠️ Minimal fix
-    if (status_change) {
-        updateActivate();
-    }
+    updateActivate();

Also applies to: 464-469

src/cpu/o3/iew.hh (1)

556-559: ⚠️ Potential issue | 🟡 Minor

Typo in comment: "fetrenamech" should be "rename".

Proposed fix
-        /** Distribution of number of fetrenamech stall reasons each tick. */
+        /** Distribution of number of rename stall reasons each tick. */
🤖 Fix all issues with AI agents
In `@src/cpu/o3/comm.hh`:
- Around line 328-335: The StallSignals struct's bool arrays (blockFetch,
blockDecode, blockRename, blockIEW) are left uninitialized causing undefined
behavior; update StallSignals to value-initialize these arrays (e.g., add
default member initializers or a constructor that sets all elements to false) so
every entry for MaxThreads is explicitly false on construction, ensuring any
reader sees deterministic values.

In `@src/cpu/o3/commit.hh`:
- Around line 179-182: Initialize the raw pointer member stallSig to nullptr in
the commit class declaration and add a direct include for
<boost/circular_buffer.hpp> to this header; specifically, in commit.hh set
StallSignals* stallSig = nullptr; (so uses in commit.cc at symbols referencing
stallSig are safe) and add `#include` <boost/circular_buffer.hpp> near other
includes so fixedbuffer (boost::circular_buffer<DynInstPtr>
fixedbuffer[MaxThreads]) no longer relies on a transitive include.

In `@src/cpu/o3/cpu.hh`:
- Around line 529-546: Remove the redundant member declaration stallSig from the
class (it duplicates the existing StallSignals stallSignals and is never used);
delete the line declaring StallSignals stallSig; and ensure no other code relies
on that member (the stages use setStallSignals() and their own stallSig
members), then rebuild to verify there are no references to the removed symbol.

In `@src/cpu/o3/decode.cc`:
- Around line 112-115: Decode::clearStates currently does nothing but must purge
any per-thread buffered state for the given ThreadID to avoid stale entries when
TIDs are reused; implement it to remove/clear entries belonging to tid from
Decode's per-thread containers (e.g., fixedBuffer and stallBuffer) and reset any
other per-thread bookkeeping used by Decode so removeThread() no longer leaves
stale instructions. Locate Decode::clearStates and ensure it iterates the
fixedBuffer and stallBuffer structures, erasing elements whose ThreadID matches
the tid argument (and clearing any related per-thread maps/counters), leaving
other threads' data intact.

In `@src/cpu/o3/decode.hh`:
- Around line 186-187: The member pointer stallSig is declared uninitialized
(StallSignals* stallSig;) and should be set to a null pointer to avoid
indeterminate-value access before setStallSignals is called; initialize stallSig
to nullptr at declaration or assign nullptr in the class constructor (the class
that contains stallSig and the setStallSignals method) so all code paths see a
well-defined value until setStallSignals populates it.

In `@src/cpu/o3/fetch.cc`:
- Around line 1294-1325: The code hardcodes ThreadID tid = 0 which disables SMT;
replace this with the proper thread selection logic (use selectFetchThread() or
the same selection used by sendInstructionsToDecode()) so tid is set to the
chosen thread before checking stallSig->blockFetch[tid] and pulling from
fetchQueue[tid]; if SMT selection is not yet implemented, add a fail-fast
(ASSERT/ fatal) when numThreads > 1 to avoid silent starvation and document that
selectFetchThread()/sendInstructionsToDecode() must be used to pick tid.

In `@src/cpu/o3/iew.cc`:
- Around line 820-837: The loop currently sets stallSig->blockRename for both
threads when multiple active threads are found (using tid,
stallSig->blockRename, fixedbuffer, canInsertLDSTQue), which stalls all SMT
threads; change it to allow exactly one active thread to proceed (e.g., pick a
single thread to keep blockRename=false and set blockRename=true for all other
active threads) and implement a simple selection policy (round-robin using a
persistent nextThread index or a priority tie-breaker) so that when scanning
fixedbuffer[] you assign tid to the chosen thread and mark every other active
thread's stallSig->blockRename true instead of blocking both.
- Line 429: The IEW→Commit timing is broken because toCommit is reading
iewQueue->getWire(0) while Commit expects the data at offset -iewToCommitDelay;
restore the intended zero-latency alignment by changing the producer read to use
the negative delay: set toCommit = iewQueue->getWire(-iewToCommitDelay)
(symbols: toCommit, iewQueue, getWire, iewToCommitDelay, fromIEW, Commit) so
both IEW and Commit use the same buffer offset convention.
- Around line 798-811: The issue is that canInsertLDSTQue(ThreadID) performs
side-effectful calls getAndResetLastLQPopEntries/getAndResetLastSQPopEntries
which are only executed when stallSig->blockIEW[i] is true (due to short-circuit
&&), so counters aren't reset otherwise; fix by invoking the reset logic
unconditionally and then using its result in the block condition: call a new or
existing getter that performs
getAndResetLastLQPopEntries/getAndResetLastSQPopEntries (or call
canInsertLDSTQue(i) and store its bool result) before computing bool block =
stallSig->blockIEW[i] && !... so that the reset side-effects always run, then
use the stored boolean in the original condition (or refactor canInsertLDSTQue
into pure check + separate reset function and call reset unconditionally in
dispatchInsts()).
- Around line 724-748: When detecting a squash in the loop
(fromCommit->commitInfo[i].squash) clear fixedbuffer[i] before calling squash(i)
and returning so any stale instructions are discarded; specifically, move or add
fixedbuffer[i].clear() immediately when fromCommit->commitInfo[i].squash is true
(before squash(i) and the return) because IEW::squash() does not clear the
buffer unlike decode/rename; keep the existing clear for the robSquashing path
unchanged.
- Around line 118-119: The postfix increment and decrement operators for the
wire class are returning references to temporaries; change the signatures of
wire::operator++(int) and wire::operator--(int) to return by value (wire)
instead of wire& and update their definitions accordingly so they return the
temporary (wire(this, i)) by value; ensure both the declaration in the class and
the out-of-line definitions (operator++(int) and operator--(int)) are adjusted
to match the new return type.
- Around line 755-759: The assert fixedbuffer[tid].empty() can fire legitimately
when dispatch stalls; change it to tolerate non-empty buffers by replacing the
unconditional assert with a conditional that either clears retained instructions
or asserts only if blockRename[tid] is false. Specifically, in the block
handling insts_from_rename/fromRename, replace "if (tid != -1)
assert(fixedbuffer[tid].empty());" with logic that: if tid != -1 and
fixedbuffer[tid] is not empty, then either call a clearing/rollback helper to
remove the newly moved instructions (so moveInstsToBuffer() leaves no leftovers)
or assert(!blockRename[tid]) (i.e., only assert emptiness when blockRename[tid]
is not set); reference functions/fields: insts_from_rename, fromRename->insts,
fixedbuffer, IEW::tick(), dispatchInsts(), blockRename, moveInstsToBuffer().
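
The short-circuit concern raised for lines 798-811 can be reproduced in a small, self-contained sketch. `ToyQueue` and both helper functions are hypothetical; the read-and-reset accessor only stands in for `getAndResetLastLQPopEntries`/`getAndResetLastSQPopEntries`, whose real behavior is assumed from their names.

```cpp
#include <cassert>

// Hypothetical counter with a read-and-reset accessor.
struct ToyQueue
{
    int lastPopped = 0;

    int getAndResetLastPopped()
    {
        int n = lastPopped;
        lastPopped = 0;
        return n;
    }
};

// Short-circuit form: the reset only runs when `blocked` is true.
inline bool stillBlockedShortCircuit(bool blocked, ToyQueue &q)
{
    return blocked && q.getAndResetLastPopped() == 0;
}

// Hoisted form: the reset runs on every call, then feeds the condition.
inline bool stillBlockedHoisted(bool blocked, ToyQueue &q)
{
    const bool freed = q.getAndResetLastPopped() > 0;
    return blocked && !freed;
}
```

With `blocked == false`, the short-circuit form leaves `lastPopped` untouched, so a stale count carries into a later cycle; the hoisted form clears it regardless of the flag.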

In `@src/cpu/o3/iew.hh`:
- Line 157: Initialize the raw pointer StallSignals* stallSig to nullptr (e.g.,
in IEW's member initializer or declaration) and update uses to check/assert
before dereferencing: add a null-check or assertion at the start of methods that
use stallSig (notably squash and squashDueToBranch) and in any code paths that
might run before setStallSignals() is called; ensure setStallSignals() still
assigns the pointer when available.
- Line 51: Remove the unused boost include: delete the line with `#include`
<boost/circular_buffer.hpp> from iew.hh because this header is not referenced in
this file (the container used is fixedbuffer declared as
std::deque<DynInstPtr>); ensure no other symbols in iew.hh depend on
boost::circular_buffer after removal and run a quick build to confirm no missing
includes.

In `@src/cpu/o3/regfile.hh`:
- Around line 423-427: The debug print in the VecPredRegClass case uses the
wrong regClass instance: replace the call to
vectorRegFile.regClass.valString(val) with
vecPredRegFile.regClass.valString(val) so the DPRINTF reflects the same register
file that vecPredRegFile.set(idx, val) writes to; update the VecPredRegClass
case handling (the DPRINTF line) to use vecPredRegFile.regClass.valString(val).

In `@src/cpu/o3/rename.cc`:
- Around line 389-410: The bug is that releaseSeq is computed using
historyBuffer->empty() which only checks thread 0; in Rename::releasePhysRegs
you must check the current thread’s history buffer (use
historyBuffer[tid].empty()) and, when non-empty, use
historyBuffer[tid].back().instSeqNum to set releaseSeq; update the assignment
inside the loop where fromCommit->commitInfo[tid].doneSeqNum is handled so it
references historyBuffer[tid] instead of historyBuffer.
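
The `historyBuffer->empty()` item above rests on a C++ language rule: an array used with `->` decays to a pointer to its first element. A minimal sketch, with hypothetical names (`ToyRename`, `ToyMaxThreads`) and a placeholder element type, shows why that expression ignores the thread ID:

```cpp
#include <cassert>
#include <list>

constexpr int ToyMaxThreads = 2;

// An array of per-thread containers, shaped like historyBuffer[MaxThreads].
// The element type (int) and container (std::list) are placeholders.
struct ToyRename
{
    std::list<int> historyBuffer[ToyMaxThreads];

    // The array decays to a pointer to element 0, so `tid` is ignored.
    bool emptyViaArrow(int /*tid*/) const { return historyBuffer->empty(); }

    // Indexing selects the buffer that belongs to `tid`.
    bool emptyViaIndex(int tid) const { return historyBuffer[tid].empty(); }
};
```

If only thread 1 has history entries, `emptyViaArrow(1)` still reports empty because it inspects thread 0's list, while `emptyViaIndex(1)` reports the correct state.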
🧹 Nitpick comments (5)
src/cpu/o3/commit.cc (1)

168-187: Prefer fatal_if for the renameToROBDelay invariant.

This constraint is fundamental to the new buffering path; enforcing it in release builds avoids silent misconfigurations.

🔧 Suggested change
-    assert(renameToROBDelay == 1);
+    fatal_if(renameToROBDelay != 1,
+             "renameToROBDelay must be 1 for O3 commit (got %d).",
+             renameToROBDelay);
src/cpu/o3/rename.cc (1)

76-84: Prefer fatal_if for the decodeToRenameDelay invariant.

This is a hard requirement for the refactor; enforcing it in release builds avoids silent misconfigurations.

🔧 Suggested change
-    assert(decodeToRenameDelay == 1);
+    fatal_if(decodeToRenameDelay != 1,
+             "decodeToRenameDelay must be 1 for O3 rename (got %d).",
+             decodeToRenameDelay);
src/cpu/o3/iew.hh (1)

391-391: Consider renaming fixedbuffer to something more descriptive.

The name fixedbuffer is vague — it doesn't convey that this is the per-thread dispatch buffer between rename and IEW. A name like renameBuffer or dispatchBuffer would improve readability and align with the role described in the AI summary. Also, std::deque is not "fixed" in any meaningful sense.

src/cpu/o3/iew.cc (2)

896-901: Commented-out serialize stall check should be removed or restored.

This entire block is commented out with no explanation. If the serialize handling is intentionally removed as part of this refactor, delete the dead code. If it's temporarily disabled for debugging, add a TODO or FIXME comment explaining why.


1650-1653: Tick ordering: moveInstsToBuffer() before checkSquash() may process instructions that should be squashed.

In the reverse-ordered tick, moveInstsToBuffer() runs first (line 1651), populating fixedbuffer with instructions from rename. Then checkSquash() runs (line 1652), potentially discovering a squash from commit. As noted in a separate comment, checkSquash doesn't clear fixedbuffer on the squash path. Even if it did, the ordering means one wasted cycle of work moving instructions that will immediately be discarded. Consider swapping the order so checkSquash() runs first.

Proposed reordering
     // dispatch
-    moveInstsToBuffer();
     checkSquash();
+    moveInstsToBuffer();
     dispatchInsts();

Comment on lines +328 to 335
struct StallSignals
{

bool blockFetch[MaxThreads];// decode to fetch
bool blockDecode[MaxThreads];// rename to decode
bool blockRename[MaxThreads];// iew to rename (if iew is stalling, rename all threads would be stalled)
bool blockIEW[MaxThreads];// commit to iew
};

⚠️ Potential issue | 🟠 Major

StallSignals members are left uninitialized by default — risk of undefined behavior.

The bool arrays have no default member initializers and no constructor, so they contain indeterminate values until explicitly written. Any stage that reads a stall signal before the first write will observe undefined behavior.

Add default member initializers or a constructor:

🛡️ Proposed fix — value-initialize all arrays
 struct StallSignals
 {
-
-    bool blockFetch[MaxThreads];// decode to fetch
-    bool blockDecode[MaxThreads];// rename to decode
-    bool blockRename[MaxThreads];// iew to rename (if iew is stalling, rename all threads would be stalled)
-    bool blockIEW[MaxThreads];// commit to iew
+    bool blockFetch[MaxThreads]{};  // decode to fetch
+    bool blockDecode[MaxThreads]{}; // rename to decode
+    bool blockRename[MaxThreads]{}; // iew to rename
+    bool blockIEW[MaxThreads]{};    // commit to iew
 };
🤖 Prompt for AI Agents
In `@src/cpu/o3/comm.hh` around lines 328 - 335, The StallSignals struct's bool
arrays (blockFetch, blockDecode, blockRename, blockIEW) are left uninitialized
causing undefined behavior; update StallSignals to value-initialize these arrays
(e.g., add default member initializers or a constructor that sets all elements
to false) so every entry for MaxThreads is explicitly false on construction,
ensuring any reader sees deterministic values.
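
As a standalone illustration of the proposed fix, the sketch below uses empty-brace default member initializers, which value-initialize each array so that every `bool` starts as `false`. `ToyStallSignals`, `ToyMaxThreads`, and `allClear` are hypothetical names chosen for the example; the thread count of 4 is arbitrary.

```cpp
#include <cassert>

constexpr int ToyMaxThreads = 4;

// Default member initializers `{}` value-initialize each array,
// so every element is false as soon as the object exists.
struct ToyStallSignals
{
    bool blockFetch[ToyMaxThreads]{};
    bool blockDecode[ToyMaxThreads]{};
    bool blockRename[ToyMaxThreads]{};
    bool blockIEW[ToyMaxThreads]{};
};

// Returns true when no stage is blocked for any thread.
inline bool allClear(const ToyStallSignals &s)
{
    for (int t = 0; t < ToyMaxThreads; ++t) {
        if (s.blockFetch[t] || s.blockDecode[t] ||
            s.blockRename[t] || s.blockIEW[t])
            return false;
    }
    return true;
}
```

A default-constructed `ToyStallSignals` therefore satisfies `allClear` without any explicit reset call, which is the property the review comment asks for.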

Comment on lines +179 to +182
boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads];

StallSignals* stallSig;


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check includes in commit.hh
echo "=== Includes in commit.hh (lines 44-69) ==="
sed -n '44,69p' src/cpu/o3/commit.hh

echo ""
echo "=== Lines around 179-182 in commit.hh ==="
sed -n '175,185p' src/cpu/o3/commit.hh

echo ""
echo "=== Check for stallSig declaration and initialization ==="
rg -n 'stallSig' src/cpu/o3/commit.hh src/cpu/o3/commit.cc

echo ""
echo "=== Commit constructor signature and initialization ==="
rg -n -A 50 'Commit::Commit' src/cpu/o3/commit.cc | head -80

Repository: OpenXiangShan/GEM5

Length of output: 5937


🏁 Script executed:

#!/bin/bash
# Check fixedbuffer initialization in constructor body
echo "=== Search for fixedbuffer initialization ==="
rg -n -B 5 -A 5 'fixedbuffer' src/cpu/o3/commit.cc

echo ""
echo "=== Check full constructor initialization list ==="
sed -n '103,200p' src/cpu/o3/commit.cc | grep -E '(fixedbuffer|stallSig|:.*\()'

Repository: OpenXiangShan/GEM5

Length of output: 2935


Initialize stallSig to nullptr and add direct include for boost::circular_buffer.

The raw pointer stallSig is declared uninitialized and later used directly in multiple places (lines 1807, 1813, 1814, 1855 in commit.cc) without null-checks. It should be initialized to nullptr.

Also, <boost/circular_buffer.hpp> is not directly included here—only transitively available via decode.hh. Adding a direct include would be more robust.

Note: fixedbuffer is already properly initialized with capacity in the constructor (line 186: fixedbuffer[tid] = boost::circular_buffer<DynInstPtr>(renameWidth)), so no changes needed there.

Proposed fix
     boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads];
 
-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;

And add to the includes section:

 `#include` <boost/circular_buffer.hpp>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads];
-    StallSignals* stallSig;
+    boost::circular_buffer<DynInstPtr> fixedbuffer[MaxThreads];
+    StallSignals* stallSig = nullptr;
🤖 Prompt for AI Agents
In `@src/cpu/o3/commit.hh` around lines 179 - 182, Initialize the raw pointer
member stallSig to nullptr in the commit class declaration and add a direct
include for <boost/circular_buffer.hpp> to this header; specifically, in
commit.hh set StallSignals* stallSig = nullptr; (so uses in commit.cc at symbols
referencing stallSig are safe) and add `#include` <boost/circular_buffer.hpp> near
other includes so fixedbuffer (boost::circular_buffer<DynInstPtr>
fixedbuffer[MaxThreads]) no longer relies on a transitive include.
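
The pointer-initialization pattern recommended here can be sketched as follows. `ToyStage` and `ToySignals` are hypothetical; the sketch only assumes, as the review describes, that the pointer is wired through a `setStallSignals()` call after construction.

```cpp
#include <cassert>

struct ToySignals { bool blockIEW = false; };

// Hypothetical stage that receives its stall signals after construction.
class ToyStage
{
  public:
    void setStallSignals(ToySignals *sig) { stallSig = sig; }

    bool isWired() const { return stallSig != nullptr; }

    // Fails loudly (in assert-enabled builds) if wiring was skipped.
    bool blocked() const
    {
        assert(stallSig && "setStallSignals() must be called first");
        return stallSig->blockIEW;
    }

  private:
    ToySignals *stallSig = nullptr;  // well-defined until wired
};
```

With the `nullptr` default, a missed wiring step shows up as a deterministic assertion failure or null dereference rather than a read through an indeterminate pointer.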

Comment on lines +529 to +546
+    StallSignals stallSignals;

     /** The main time buffer to do backwards communication. */
     TimeBuffer<TimeStruct> timeBuffer;

     /** The fetch stage's instruction queue. */
-    TimeBuffer<FetchStruct> fetchQueue;
+    TimeBuffer<FetchStruct> fetchTimebuffer;

     /** The decode stage's instruction queue. */
-    TimeBuffer<DecodeStruct> decodeQueue;
+    TimeBuffer<DecodeStruct> decodeTimebuffer;

     /** The rename stage's instruction queue. */
-    TimeBuffer<RenameStruct> renameQueue;
+    TimeBuffer<RenameStruct> renameTimebuffer;

     /** The IEW stage's instruction queue. */
-    TimeBuffer<IEWStruct> iewQueue;
+    TimeBuffer<IEWStruct> iewTimebuffer;

+    StallSignals stallSig;

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check which StallSignals member is actually used in cpu.cc
echo "=== StallSignals usage in cpu.cc ==="
rg -n 'stallSignals|stallSig|setStallSignals' --type=cpp -g '*cpu.cc'

echo ""
echo "=== StallSignals declarations in cpu.hh ==="
rg -n 'StallSignals' --type=cpp -g '*cpu.hh'

Repository: OpenXiangShan/GEM5

Length of output: 566


🏁 Script executed:

#!/bin/bash
# Search entire codebase for stallSig usage
echo "=== All references to stallSig in the entire codebase ==="
rg -n 'stallSig' --type=cpp

echo ""
echo "=== All references to stallSignals in the entire codebase ==="
rg -n 'stallSignals' --type=cpp | head -30

Repository: OpenXiangShan/GEM5

Length of output: 4386


Remove unused stallSig member on line 546.

The stallSig member declared at line 546 is unused dead code. All stages correctly receive stallSignals (line 529) via setStallSignals() in cpu.cc (lines 199–203). Each stage stores this pointer in their own local stallSig member; the CPU's duplicate stallSig is never accessed. Remove line 546 to eliminate confusion about which member serves as the shared stall signal object.

🤖 Prompt for AI Agents
In `@src/cpu/o3/cpu.hh` around lines 529 - 546, Remove the redundant member
declaration stallSig from the class (it duplicates the existing StallSignals
stallSignals and is never used); delete the line declaring StallSignals
stallSig; and ensure no other code relies on that member (the stages use
setStallSignals() and their own stallSig members), then rebuild to verify there
are no references to the removed symbol.

Comment on lines 112 to 115
 Decode::clearStates(ThreadID tid)
 {
-    decodeStatus[tid] = Idle;
-    stalls[tid].rename = false;

 }

⚠️ Potential issue | 🟠 Major

clearStates should purge per-thread buffers.

removeThread() relies on decode.clearStates(), but this is now empty; stale instructions can remain in fixedbuffer/stallBuffer if the TID is reused.

🛠️ Suggested fix
 void
 Decode::clearStates(ThreadID tid)
 {
-
+    fixedbuffer[tid].clear();
+
+    auto delIt = stallBuffer.begin();
+    for (auto it0 = eachstallSize.begin(); it0 != eachstallSize.end();) {
+        int size = *it0;
+        auto start_it = delIt;
+        auto end_it = start_it + size;
+        if ((*start_it)->threadNumber == tid) {
+            delIt = stallBuffer.erase(start_it, end_it);
+            it0 = eachstallSize.erase(it0);
+        } else {
+            delIt = end_it;
+            ++it0;
+        }
+    }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-Decode::clearStates(ThreadID tid)
-{
-    decodeStatus[tid] = Idle;
-    stalls[tid].rename = false;
-}
+Decode::clearStates(ThreadID tid)
+{
+    fixedbuffer[tid].clear();
+    auto delIt = stallBuffer.begin();
+    for (auto it0 = eachstallSize.begin(); it0 != eachstallSize.end();) {
+        int size = *it0;
+        auto start_it = delIt;
+        auto end_it = start_it + size;
+        if ((*start_it)->threadNumber == tid) {
+            delIt = stallBuffer.erase(start_it, end_it);
+            it0 = eachstallSize.erase(it0);
+        } else {
+            delIt = end_it;
+            ++it0;
+        }
+    }
+}
🤖 Prompt for AI Agents
In `@src/cpu/o3/decode.cc` around lines 112 - 115, Decode::clearStates currently
does nothing but must purge any per-thread buffered state for the given ThreadID
to avoid stale entries when TIDs are reused; implement it to remove/clear
entries belonging to tid from Decode's per-thread containers (e.g., fixedBuffer
and stallBuffer) and reset any other per-thread bookkeeping used by Decode so
removeThread() no longer leaves stale instructions. Locate Decode::clearStates
and ensure it iterates the fixedBuffer and stallBuffer structures, erasing
elements whose ThreadID matches the tid argument (and clearing any related
per-thread maps/counters), leaving other threads' data intact.
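
The grouped-erase logic in the suggested fix can be exercised outside gem5 with a small sketch. Everything here is an assumption made for illustration: `ToyInst` and `purgeThread` are hypothetical, both containers are taken to be `std::deque`, and each group is assumed to be non-empty and to belong to a single thread.

```cpp
#include <cassert>
#include <deque>

// Hypothetical instruction record; only the thread id matters here.
struct ToyInst { int threadNumber; int seq; };

// stallBuffer holds instructions in groups; groupSizes[i] is the length
// of the i-th group. Every group is assumed non-empty and single-thread.
inline void purgeThread(std::deque<ToyInst> &stallBuffer,
                        std::deque<int> &groupSizes, int tid)
{
    auto pos = stallBuffer.begin();
    for (auto g = groupSizes.begin(); g != groupSizes.end();) {
        auto groupEnd = pos + *g;
        if (pos->threadNumber == tid) {
            pos = stallBuffer.erase(pos, groupEnd);  // drop whole group
            g = groupSizes.erase(g);
        } else {
            pos = groupEnd;                          // skip to next group
            ++g;
        }
    }
}
```

Both `erase` calls return the iterator that follows the removed range, and the sketch continues from those return values because erasing from a `std::deque` can invalidate previously obtained iterators.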

Comment on lines +186 to +187
StallSignals* stallSig;


⚠️ Potential issue | 🟡 Minor

stallSig pointer should be initialized to nullptr.

Without initialization, stallSig holds an indeterminate value until setStallSignals is called, risking undefined behavior on any early access.

Proposed fix
-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;
🤖 Prompt for AI Agents
In `@src/cpu/o3/decode.hh` around lines 186 - 187, The member pointer stallSig is
declared uninitialized (StallSignals* stallSig;) and should be set to a null
pointer to avoid indeterminate-value access before setStallSignals is called;
initialize stallSig to nullptr at declaration or assign nullptr in the class
constructor (the class that contains stallSig and the setStallSignals method) so
all code paths see a well-defined value until setStallSignals populates it.

Comment on lines +820 to +837
// check threads stall & status
ThreadID tid = InvalidThreadID;
for (int i = 0; i < numThreads; i++) {
bool block = stallSig->blockIEW[i] && !canInsertLDSTQue(i);
bool active = !block && !fixedbuffer[i].empty();

stallSig->blockRename[i] = block;
if (active) {
if (tid == InvalidThreadID) tid = i;
else {
// if there are multiple active threads, must exhaust all threads first
// to avoid starvation of other threads and also avoid resource conflict
stallSig->blockRename[tid] = true;
stallSig->blockRename[i] = true;
DPRINTF(IEW, "Multiple active threads detected, blocking all threads\n");
}
}
}

⚠️ Potential issue | 🟠 Major

SMT multi-thread handling blocks all threads — may cause livelock or severe throughput loss.

When multiple threads are both active (!block && !fixedbuffer[tid].empty()), lines 832-833 set blockRename for both threads, effectively stalling all dispatch. The comment says "must exhaust all threads first to avoid starvation" but this creates a scenario where neither thread can make progress if both consistently have instructions ready. This seems overly conservative for SMT.

Consider implementing a round-robin or priority-based selection instead of blocking both threads.

🤖 Prompt for AI Agents
In `@src/cpu/o3/iew.cc` around lines 820 - 837, The loop currently sets
stallSig->blockRename for both threads when multiple active threads are found
(using tid, stallSig->blockRename, fixedbuffer, canInsertLDSTQue), which stalls
all SMT threads; change it to allow exactly one active thread to proceed (e.g.,
pick a single thread to keep blockRename=false and set blockRename=true for all
other active threads) and implement a simple selection policy (round-robin using
a persistent nextThread index or a priority tie-breaker) so that when scanning
fixedbuffer[] you assign tid to the chosen thread and mark every other active
thread's stallSig->blockRename true instead of blocking both.
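
One way to realize the suggested policy is a round-robin pick, sketched below. The helper names `pickRoundRobin` and `applySelection`, the `std::vector<bool>` signal arrays, and the `last` bookkeeping are all assumptions made for illustration; the PR itself does not define them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: choose one dispatch thread per cycle.
// active[i] is true when thread i is unblocked and has buffered instructions.
// last is the thread chosen on the previous cycle, or -1 if none yet.
// Returns the chosen thread index, or -1 when no thread is active.
inline int pickRoundRobin(const std::vector<bool> &active, int last)
{
    assert(last >= -1);
    const int n = static_cast<int>(active.size());
    for (int step = 1; step <= n; ++step) {
        // Scan starting just after the previous winner, so it is tried last.
        const int cand = (last + step) % n;
        if (active[cand])
            return cand;
    }
    return -1;
}

// Block every active thread except the winner for this cycle.
// Threads that were already blocked are left untouched.
inline void applySelection(const std::vector<bool> &active, int winner,
                           std::vector<bool> &blockRename)
{
    for (std::size_t i = 0; i < active.size(); ++i) {
        if (active[i] && static_cast<int>(i) != winner)
            blockRename[i] = true;
    }
}
```

With two threads that are always active, successive calls alternate between them, so each thread makes progress every other cycle rather than both stalling.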

#include <set>
#include <vector>

#include <boost/circular_buffer.hpp>

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if boost::circular_buffer is used anywhere in iew.hh or iew.cc
rg -n 'circular_buffer' src/cpu/o3/iew.hh src/cpu/o3/iew.cc

Repository: OpenXiangShan/GEM5

Length of output: 119


🏁 Script executed:

#!/bin/bash
# Check if circular_buffer is used anywhere else in the o3 directory or related files
rg -n 'circular_buffer' src/cpu/o3/ --type h --type cc
# Also verify fixedbuffer declaration to confirm it's std::deque
rg -n 'fixedbuffer' src/cpu/o3/iew.hh | head -5

Repository: OpenXiangShan/GEM5

Length of output: 161


🏁 Script executed:

#!/bin/bash
# Search for circular_buffer more broadly to ensure it's not used anywhere
rg 'circular_buffer' src/cpu/o3/

Repository: OpenXiangShan/GEM5

Length of output: 1260


Remove unused #include <boost/circular_buffer.hpp>.

fixedbuffer is declared as std::deque<DynInstPtr> (line 391) in this file. While boost::circular_buffer is used in other o3 header files (rename.hh, decode.hh, lsq_unit.hh, commit.hh), it is not used in iew.hh. This include is a leftover and should be removed.

Proposed fix
-#include <boost/circular_buffer.hpp>
-
🤖 Prompt for AI Agents
In `@src/cpu/o3/iew.hh` at line 51, Remove the unused boost include: delete the
line `#include <boost/circular_buffer.hpp>` from iew.hh because this header
is not referenced in this file (the container used is fixedbuffer declared as
std::deque<DynInstPtr>); ensure no other symbols in iew.hh depend on
boost::circular_buffer after removal and run a quick build to confirm no missing
includes.

ProbePointArg<DynInstPtr> *ppToCommit;

bool disp_stall = false;
StallSignals* stallSig;

⚠️ Potential issue | 🟠 Major

stallSig raw pointer is not initialized in the constructor.

stallSig is a raw StallSignals* with no default initializer, so it holds an indeterminate value until setStallSignals() is called. If any code path in IEW executes before then (e.g., during squash or squashDueToBranch), dereferencing stallSig is undefined behavior rather than a clean null-pointer fault. Consider initializing it to nullptr here and adding a guard or assertion before first use.

Proposed fix
-    StallSignals* stallSig;
+    StallSignals* stallSig = nullptr;
🤖 Prompt for AI Agents
In `@src/cpu/o3/iew.hh` at line 157, Initialize the raw pointer StallSignals*
stallSig to nullptr (e.g., in IEW's member initializer or declaration) and
update uses to check/assert before dereferencing: add a null-check or assertion
at the start of methods that use stallSig (notably squash and squashDueToBranch)
and in any code paths that might run before setStallSignals() is called; ensure
setStallSignals() still assigns the pointer when available.
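
A minimal sketch of the initialize-and-guard pattern follows. `StageSketch`, `isWired`, `blocked`, and the four-entry `blockIEW` array are hypothetical stand-ins; only the names `StallSignals`, `stallSig`, `blockIEW`, and `setStallSignals` come from the PR.

```cpp
#include <cassert>

// Stand-in for the PR's StallSignals; the array size of 4 is an assumption.
struct StallSignals
{
    bool blockIEW[4] = {};
};

// Hypothetical miniature of a stage holding a non-owning StallSignals pointer.
class StageSketch
{
  public:
    void setStallSignals(StallSignals *sig) { stallSig = sig; }

    bool isWired() const { return stallSig != nullptr; }

    // All reads go through this guard, so use before wiring fails loudly
    // in debug builds instead of reading through an indeterminate pointer.
    bool blocked(int tid) const
    {
        assert(stallSig && "setStallSignals() must run before first use");
        return stallSig->blockIEW[tid];
    }

  private:
    StallSignals *stallSig = nullptr;  // default member initializer
};
```

The default member initializer covers every constructor, and the assertion turns a use-before-wiring bug into a deterministic failure at the call site.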

Comment on lines 423 to 427
case VecPredRegClass:
-DPRINTF(IEW, "RegFile: Setting predicate register %i to %s\n",
+DPRINTF(Scoreboard, "RegFile: Setting predicate register %i to %s\n",
idx, vectorRegFile.regClass.valString(val));
vecPredRegFile.set(idx, val);
break;

⚠️ Potential issue | 🟡 Minor

Pre-existing bug: wrong regClass used for predicate register value formatting.

Line 425 uses vectorRegFile.regClass.valString(val) to format a VecPredRegClass value, but it should use vecPredRegFile.regClass.valString(val) to match the actual register file being written on line 426.

🐛 Proposed fix
          case VecPredRegClass:
            DPRINTF(Scoreboard, "RegFile: Setting predicate register %i to %s\n",
-                    idx, vectorRegFile.regClass.valString(val));
+                    idx, vecPredRegFile.regClass.valString(val));
            vecPredRegFile.set(idx, val);
🤖 Prompt for AI Agents
In `@src/cpu/o3/regfile.hh` around lines 423 - 427, The debug print in the
VecPredRegClass case uses the wrong regClass instance: replace the call to
vectorRegFile.regClass.valString(val) with
vecPredRegFile.regClass.valString(val) so the DPRINTF reflects the same register
file that vecPredRegFile.set(idx, val) writes to; update the VecPredRegClass
case handling (the DPRINTF line) to use vecPredRegFile.regClass.valString(val).

Comment on lines +389 to 410
void
Rename::releasePhysRegs()
{
// Release physical registers up to releaseWidth
auto threads = activeThreads->begin();
if (releaseSeq + releaseWidth < finalCommitSeq) {
releaseSeq += releaseWidth;
} else {
releaseSeq = finalCommitSeq;
}

-while (threads != end) {
+while (threads != activeThreads->end()) {
ThreadID tid = *threads++;

removeFromHistory(releaseSeq, tid);

// If we committed this cycle then doneSeqNum will be > 0
if (fromCommit->commitInfo[tid].doneSeqNum != 0 &&
-!fromCommit->commitInfo[tid].squash &&
-renameStatus[tid] != Squashing) {
+!fromCommit->commitInfo[tid].squash) {

finalCommitSeq = fromCommit->commitInfo[tid].doneSeqNum;
releaseSeq = historyBuffer->empty() ? 0 : historyBuffer[tid].back().instSeqNum;
}
}

⚠️ Potential issue | 🟠 Major

releaseSeq uses the wrong thread’s history buffer.

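Because historyBuffer is indexed per thread, historyBuffer->empty() is equivalent to historyBuffer[0].empty() and checks only thread 0. If thread 0's buffer is non-empty while historyBuffer[tid] is empty, back() is called on an empty container, which is undefined behavior; in the opposite case releaseSeq is wrongly reset to 0 for a thread that still has history.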

🛠️ Suggested fix
-            releaseSeq = historyBuffer->empty() ? 0 : historyBuffer[tid].back().instSeqNum;
+            releaseSeq = historyBuffer[tid].empty() ? 0 : historyBuffer[tid].back().instSeqNum;
🤖 Prompt for AI Agents
In `@src/cpu/o3/rename.cc` around lines 389 - 410, The bug is that releaseSeq is
computed using historyBuffer->empty() which only checks thread 0; in
Rename::releasePhysRegs you must check the current thread’s history buffer (use
historyBuffer[tid].empty()) and, when non-empty, use
historyBuffer[tid].back().instSeqNum to set releaseSeq; update the assignment
inside the loop where fromCommit->commitInfo[tid].doneSeqNum is handled so it
references historyBuffer[tid] instead of historyBuffer.
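
The difference between the two expressions can be reproduced in isolation. The sketch below assumes the history is a plain array of per-thread `std::list` containers; `RenameSketch`, `HistoryEntry`, `kMaxThreads`, and the two member functions are hypothetical names for illustration.

```cpp
#include <cassert>
#include <list>

constexpr int kMaxThreads = 2;  // assumed thread count for the sketch

struct HistoryEntry
{
    unsigned long instSeqNum;
};

struct RenameSketch
{
    // Assumed shape: one history container per thread.
    std::list<HistoryEntry> historyBuffer[kMaxThreads];

    // Buggy form: the array decays to a pointer to element 0,
    // so the thread argument is ignored.
    bool emptyBuggy(int /*tid*/) const { return historyBuffer->empty(); }

    // Fixed form: inspect the buffer of the thread being processed.
    bool emptyFixed(int tid) const { return historyBuffer[tid].empty(); }
};
```

With one entry in thread 0 and none in thread 1, `emptyBuggy(1)` reports the buffer as non-empty, which is exactly the case where a following `back()` on thread 1's buffer would be undefined behavior.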
