wip #706

Open
tastynoob wants to merge 1 commit into xs-dev from issueque-improve

Conversation

@tastynoob
Collaborator

@tastynoob tastynoob commented Jan 13, 2026

Summary by CodeRabbit

  • Refactor
    • Switched to per-port inflight pipelines for more precise instruction tracking and scheduling.
    • Unified ready-queue insertion path and per-port wiring for consistent readiness handling.
    • Moved inflight advancement to centralized scheduler for clearer per-cycle behavior.
    • Reworked timing buffer internals to contiguous storage with improved construction, copy/move, and lifetime management.

✏️ Tip: You can customize this high-level summary in your review settings.

@tastynoob tastynoob added the perf label Jan 13, 2026
@github-actions

🚀 Performance test triggered: spec06-0.8c

@coderabbitai

coderabbitai bot commented Jan 13, 2026

📝 Walkthrough


IssueQueue and Scheduler now use per-port inflight pipelines and method-based ready-queue insertion; TimeBuffer was refactored to use typed contiguous storage with copy/move semantics. Scheduler centralizes advancing per-port inflight buffers; IssueQue exposes setIssuePipe for wiring.
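
The per-port inflight pipeline described above can be pictured as a small delay line. The sketch below is a simplified stand-in and not the real TimeBuffer: `DelayPipe`, `at`, and `advance` are hypothetical names, and it assumes that a value written at offset `delay` should surface at offset 0 after `delay` cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a TimeBuffer-like pipe: a ring of `delay + 1`
// slots. T must be default-constructible; T{} marks an empty slot.
template <typename T>
class DelayPipe {
  public:
    explicit DelayPipe(std::size_t delay) : slots(delay + 1), base(0) {}

    // Offset 0 is "this cycle"; offset `delay()` is the farthest future slot.
    T& at(std::size_t offset) {
        assert(offset < slots.size());
        return slots[(base + offset) % slots.size()];
    }

    // Advance one cycle: the slot that was at offset 0 is cleared and
    // becomes the new farthest-future slot.
    void advance() {
        slots[base] = T{};
        base = (base + 1) % slots.size();
    }

    std::size_t delay() const { return slots.size() - 1; }

  private:
    std::vector<T> slots;
    std::size_t base;
};
```

Under this model a scheduler would own one `DelayPipe` per output port, for example in a `std::vector`, and call `advance()` on each of them once per cycle after every queue has written its slot.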

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Issue Queue Instruction Flow: src/cpu/o3/issue_queue.cc | Removed READYQ_PUSH macro usages; replaced with readyQInsert(...); updated issue/schedule paths to insert into per-port ready queues and route via per-port wires. |
| Issue Queue Declarations & API: src/cpu/o3/issue_queue.hh | Replaced single-buffer/inflight members with per-port vectors toIssue/toFu; added readyQInsert() (private) and setIssuePipe(...) (public); reorganized visibility, removed skidBuffer/old inflight types. |
| Scheduler Inflight Centralization: src/cpu/o3/... (constructor wiring in Scheduler) | Added std::vector<TimeBuffer<DynInstPtr>> inflightIssues; Scheduler initializes per-port TimeBuffers and calls IssueQue::setIssuePipe for each IQ outport; Scheduler::tick now advances per-port inflight pipes. |
| TimeBuffer Implementation: src/cpu/timebuf.hh | Internal storage changed to typed contiguous buffer (T* datas); added default/copy/move ctors and proper destructor; advance/access operations updated to use datas; in-class default initializers added. |

Sequence Diagram(s)

sequenceDiagram
  actor CPU
  participant IssueQue
  participant Scheduler
  participant TimeBuffer

  CPU->>IssueQue: submit instruction
  IssueQue->>IssueQue: readyQInsert(inst) (per-port)
  IssueQue->>TimeBuffer: toIssue[port].enqueue(inst)  %% via wire
  Scheduler->>TimeBuffer: tick() -> advance per-port pipes
  TimeBuffer-->>Scheduler: ready entries appear at wire
  Scheduler->>IssueQue: deliver toFu[port] entries
  IssueQue->>CPU: issue to functional unit

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • happy-lx
  • jensen-yan

Poem

🐰 Little hops through pipes so neat,
Per-port wires hum a rhythmic beat,
Typed buffers cradle each tiny feat,
Scheduler taps, and issues meet—
I nibble code and celebrate this treat! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The pull request title "wip" is vague and non-descriptive, using a generic term that doesn't convey meaningful information about the significant architectural changes in the changeset. | Replace "wip" with a descriptive title that captures the main change, such as "Refactor issue queue inflight tracking to per-port pipelines" or "Switch IssueQue to per-port inflight management." |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @src/cpu/timebuf.hh:
- Around line 150-170: The class TimeBuffer defines a destructor, copy ctor and
move ctor but lacks copy and move assignment operators; implement TimeBuffer&
operator=(const TimeBuffer<T>&) and TimeBuffer& operator=(TimeBuffer<T>&&) to
satisfy the Rule of 5: in copy-assignment guard against self-assignment, free
existing resources held in datas, deep-copy other's metadata (_id, past, future,
size, base) and allocate/copy elements into datas (handling non-POD via
placement-new or proper element copy), and in move-assignment guard
self-assignment, free existing resources, transfer metadata and datas pointer
from other to this, set other.datas = nullptr and other.size = 0, and return
*this; ensure consistency with the existing destructor and move/copy ctors to
avoid double-free or leaks.
- Line 150: The default constructor TimeBuffer() leaves size==0 and
datas==nullptr which makes methods like advance() and access() undefined; either
remove the default constructor or initialize members to a safe state and add
runtime guards: initialize size to a valid capacity and datas to an allocated
buffer (or explicitly set a flag), and/or update advance() and access() to check
size>0 and datas!=nullptr (or assert/throw std::runtime_error with a clear
message) before operating; reference the TimeBuffer constructor, advance(),
access(), size and datas members when making the change.
🧹 Nitpick comments (4)
src/cpu/o3/issue_queue.hh (1)

210-213: Consider adding bounds checking in setIssuePipe.

The method directly indexes into toIssue and toFu vectors without verifying pi < outports. While callers may ensure valid indices, defensive bounds checking would prevent subtle bugs.

🔧 Suggested bounds check
     inline void setIssuePipe(TimeBuffer<DynInstPtr>& issuepipe, int pi) {
+      assert(pi >= 0 && pi < outports && "Invalid port index");
       toIssue[pi] = issuepipe.getWire(scheduleToExecDelay);
       toFu[pi] = issuepipe.getWire(0);
     }
src/cpu/o3/issue_queue.cc (2)

1386-1393: Minor: Add space after for keyword for consistency.

The iteration logic correctly nullifies canceled instructions across all inflight pipes.

🔧 Formatting suggestion
     for (auto& it : inflightIssues) {
-        for(int i = 0; i < it.getSize(); i++) {
+        for (int i = 0; i < it.getSize(); i++) {
             auto& inst = it[i];
             if (inst && inst->canceled()) {
                 inst = nullptr;
             }
         }
     }

1491-1499: Same formatting nit as in loadCancel.

The squash logic correctly nullifies squashed instructions. Consider consistent formatting.

🔧 Formatting suggestion
     for (auto& it : inflightIssues) {
-        for(int i = 0; i < it.getSize(); i++) {
+        for (int i = 0; i < it.getSize(); i++) {
             auto& inst = it[i];
             if (inst && inst->isSquashed()) {
                 inst = nullptr;
             }
         }
     }
src/cpu/timebuf.hh (1)

161-166: Move constructor should reset source's size to prevent double destruction.

The move constructor nulls the source's datas pointer but doesn't reset size. If the destructor is later called on the moved-from object, it will skip deletion (since datas is null with the proposed fix), but for clarity and to maintain invariants, size should also be reset.

♻️ Suggested improvement
 TimeBuffer(TimeBuffer<T> &&other) noexcept
-    :  _id(other._id), past(other.past), future(other.future), size(other.size), base(other.base), datas(other.datas)
+    : _id(other._id), past(other.past), future(other.future), size(other.size), base(other.base), datas(other.datas)
 {
     // Null out the other datas pointer to avoid double deletion
     other.datas = nullptr;
+    other.size = 0;
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 024b0bf and 85d0254.

📒 Files selected for processing (3)
  • src/cpu/o3/issue_queue.cc
  • src/cpu/o3/issue_queue.hh
  • src/cpu/timebuf.hh
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/o3/issue_queue.cc (1)
src/cpu/timebuf.hh (9)
  • i (82-87)
  • i (82-82)
  • i (113-118)
  • i (126-131)
  • TimeBuffer (138-148)
  • TimeBuffer (150-150)
  • TimeBuffer (152-159)
  • TimeBuffer (161-166)
  • TimeBuffer (168-171)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run SPEC Test / XS-GEM5 - Run performance test (spec06-0.8c)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
🔇 Additional comments (13)
src/cpu/o3/issue_queue.hh (4)

109-117: LGTM on visibility reorganization.

The separation of public members (inports, outports, iqsize, replayQsize, scheduleToExecDelay, iqname) from private implementation details is a reasonable design choice for the per-port wiring mechanism.


139-140: Per-port wire vectors look correct.

The change from single wires to per-port vectors aligns with the new per-port inflight tracking design. Ensure these are properly sized before use in setIssuePipe.


200-206: readyQInsert implementation is correct.

Using std::lower_bound with select_policy maintains sorted order for the ready queue. The method correctly:

  1. Sets the instruction's inReadyQ flag
  2. Looks up the correct queue via readyQclassify
  3. Inserts at the sorted position
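
The three steps above can be sketched as follows. This is only an illustration under stated assumptions: `Inst`, `OlderFirst`, and `readyQInsertSketch` are hypothetical names, the ready queue is assumed to be a `std::vector` of raw pointers, and the ordering is assumed to be oldest-first by sequence number. The PR's actual `select_policy` and queue type are not shown in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical instruction record; only the fields the sketch needs.
struct Inst {
    explicit Inst(unsigned long s) : seqNum(s), inReadyQ(false) {}
    unsigned long seqNum;
    bool inReadyQ;
};

// Assumed ordering: smaller sequence number (older) sorts first.
struct OlderFirst {
    bool operator()(const Inst* a, const Inst* b) const {
        return a->seqNum < b->seqNum;
    }
};

// Mark the instruction as queued, then insert it at the position that
// keeps the queue sorted. lower_bound is a binary search, so finding the
// position costs O(log n) comparisons; the vector insert itself is O(n).
void readyQInsertSketch(std::vector<Inst*>& readyQ, Inst* inst) {
    inst->inReadyQ = true;
    auto pos = std::lower_bound(readyQ.begin(), readyQ.end(), inst,
                                OlderFirst{});
    readyQ.insert(pos, inst);
}
```

Because the queue stays sorted after every insertion, a selector can always take the front element as the highest-priority candidate under the chosen policy.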

307-308: Centralized per-port inflight management looks good.

Moving inflightIssues to a flat std::vector<TimeBuffer<DynInstPtr>> at the scheduler level (one TimeBuffer per port across all IQs) simplifies lifecycle management and aligns with the per-port wiring via setIssuePipe.

src/cpu/o3/issue_queue.cc (7)

191-192: LGTM on vector initialization.

Resizing toIssue and toFu to outports ensures proper sizing before setIssuePipe is called.


384-400: Per-port iteration in issueToFu is correct.

The logic correctly:

  1. Iterates through each output port
  2. Dereferences the toFu wire to access the instruction
  3. Uses readyQInsert for re-queuing failed issues

517-517: Consistent use of readyQInsert.

Replacing the macro with the inline method improves readability and maintainability.


620-625: Per-port scheduling looks correct.

The instruction is assigned to the appropriate port's issue pipe via *toIssue[pi] = inst, which will propagate through the TimeBuffer delay.


833-839: TimeBuffer construction per outport.

Each IQ's outports get a dedicated TimeBuffer<DynInstPtr> with past=0 and future=scheduleToExecDelay. This correctly models the pipeline delay.


882-888: Per-port wiring logic is correct.

The global portid counter correctly maps each IQ's local port index to the global inflightIssues vector index.
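
A running counter is one simple way to flatten (queue, local port) pairs into a single index space. The sketch below is an assumption-laden illustration: `assignGlobalPorts` is a hypothetical helper, and the real Scheduler wires TimeBuffers rather than returning an index table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each queue, return the global indices assigned to its local ports.
// Input: the number of output ports of each queue, in queue order.
std::vector<std::vector<std::size_t>>
assignGlobalPorts(const std::vector<std::size_t>& outportsPerQueue) {
    std::vector<std::vector<std::size_t>> mapping;
    std::size_t portid = 0;  // global counter shared by all queues
    for (std::size_t nports : outportsPerQueue) {
        std::vector<std::size_t> local;
        for (std::size_t pi = 0; pi < nports; ++pi) {
            local.push_back(portid++);  // local port pi -> global portid
        }
        mapping.push_back(local);
    }
    return mapping;
}
```

The mapping is only valid if the buffers are created in the same queue order that the wiring loop later walks, which is the invariant the review comment is checking.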


1017-1020: Advancing all inflight pipes in tick().

The centralized advance of all inflightIssues ensures pipeline progression happens uniformly after all IQs have ticked.

src/cpu/timebuf.hh (2)

192-194: advance() correctly manages element lifecycle.

The sequence of explicitly calling the destructor, zeroing memory, then placement-new is correct for resetting the oldest slot in the circular buffer.
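
The destroy, zero, reconstruct sequence can be isolated in a small helper. This is a sketch, not the code in timebuf.hh: `resetSlot` and `Payload` are hypothetical names, and the slot is assumed to hold a live, default-constructible object.

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <string>

// Hypothetical non-trivial element type, used only for illustration.
struct Payload {
    std::string name;
    int seq = 0;
};

// Reset one slot of raw storage that currently holds a live T:
//   1. end the old object's lifetime,
//   2. zero the bytes (redundant for well-behaved types, since step 3
//      value-initializes the new object anyway),
//   3. construct a fresh T in the same storage.
template <typename T>
T* resetSlot(T* slot) {
    slot->~T();
    std::memset(static_cast<void*>(slot), 0, sizeof(T));
    return new (slot) T();
}
```

Using the pointer returned by placement-new, rather than the old pointer, keeps the code on the safe side of the object-lifetime rules.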


220-234: LGTM on access pattern changes.

The direct indexing into datas array is correct and aligns with the new storage model.

Comment on lines +150 to +170
+TimeBuffer() {}
+
+TimeBuffer(const TimeBuffer<T> &other)
+    : _id(other._id), past(other.past), future(other.future), size(other.size), base(other.base)
+{
+    datas = new T[size];
+    for (unsigned i = 0; i < size; i++) {
+        datas[i] = other.datas[i]; // must use explicit copy to handle non-POD types
+    }
+}
+
+TimeBuffer(TimeBuffer<T> &&other) noexcept
+    : _id(other._id), past(other.past), future(other.future), size(other.size), base(other.base), datas(other.datas)
+{
+    // Null out the other datas pointer to avoid double deletion
+    other.datas = nullptr;
+}

 ~TimeBuffer()
 {
-    for (unsigned i = 0; i < size; ++i)
-        (reinterpret_cast<T *>(index[i]))->~T();
-    delete [] data;
+    delete [] datas;

⚠️ Potential issue | 🟠 Major

Missing copy and move assignment operators violates Rule of 5.

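The class defines a destructor, copy constructor, and move constructor, but lacks copy assignment and move assignment operators. Because a move constructor is user-declared, the implicit copy assignment operator is defined as deleted and no implicit move assignment operator is declared, so:

  1. Any assignment between TimeBuffer objects fails to compile
  2. Defaulting the operators to work around that would copy the raw datas pointer, leading to a double-free or a leak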
🐛 Proposed addition of assignment operators
TimeBuffer& operator=(const TimeBuffer<T>& other)
{
    if (this != &other) {
        // Clean up existing resources
        if (datas) {
            for (unsigned i = 0; i < size; i++) {
                datas[i].~T();
            }
            delete[] reinterpret_cast<char*>(datas);
        }
        
        // Copy from other
        _id = other._id;
        past = other.past;
        future = other.future;
        size = other.size;
        base = other.base;
        
        if (size > 0) {
            datas = (T*)new char[sizeof(T) * size];
            for (unsigned i = 0; i < size; i++) {
                new (datas + i) T(other.datas[i]);
            }
        } else {
            datas = nullptr;
        }
    }
    return *this;
}

TimeBuffer& operator=(TimeBuffer<T>&& other) noexcept
{
    if (this != &other) {
        // Clean up existing resources
        if (datas) {
            for (unsigned i = 0; i < size; i++) {
                datas[i].~T();
            }
            delete[] reinterpret_cast<char*>(datas);
        }
        
        // Move from other
        _id = other._id;
        past = other.past;
        future = other.future;
        size = other.size;
        base = other.base;
        datas = other.datas;
        
        other.datas = nullptr;
        other.size = 0;
    }
    return *this;
}
🤖 Prompt for AI Agents
In @src/cpu/timebuf.hh around lines 150 - 170, The class TimeBuffer defines a
destructor, copy ctor and move ctor but lacks copy and move assignment
operators; implement TimeBuffer& operator=(const TimeBuffer<T>&) and TimeBuffer&
operator=(TimeBuffer<T>&&) to satisfy the Rule of 5: in copy-assignment guard
against self-assignment, free existing resources held in datas, deep-copy
other's metadata (_id, past, future, size, base) and allocate/copy elements into
datas (handling non-POD via placement-new or proper element copy), and in
move-assignment guard self-assignment, free existing resources, transfer
metadata and datas pointer from other to this, set other.datas = nullptr and
other.size = 0, and return *this; ensure consistency with the existing
destructor and move/copy ctors to avoid double-free or leaks.
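
As an alternative to writing two separate assignment operators, the copy-and-swap idiom covers both with one. The sketch below applies it to a hypothetical `OwningBuf` class, not to the real TimeBuffer, and assumes storage allocated with `new T[]` and released with `delete[]`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer illustrating the Rule of 5 via copy-and-swap.
template <typename T>
class OwningBuf {
  public:
    OwningBuf() = default;

    explicit OwningBuf(std::size_t n)
        : size_(n), data_(n ? new T[n]() : nullptr) {}

    OwningBuf(const OwningBuf& o)
        : size_(o.size_), data_(o.size_ ? new T[o.size_]() : nullptr) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = o.data_[i];
    }

    OwningBuf(OwningBuf&& o) noexcept : size_(o.size_), data_(o.data_) {
        o.size_ = 0;       // leave the source empty but destructible
        o.data_ = nullptr;
    }

    // One operator for both cases: the by-value parameter is built with the
    // copy or move constructor at the call site, then swapped into *this.
    // The old contents are released when `o` goes out of scope.
    OwningBuf& operator=(OwningBuf o) noexcept {
        swap(o);
        return *this;
    }

    ~OwningBuf() { delete[] data_; }

    void swap(OwningBuf& o) noexcept {
        std::swap(size_, o.size_);
        std::swap(data_, o.data_);
    }

    std::size_t size() const { return size_; }
    T& operator[](std::size_t i) { return data_[i]; }

  private:
    std::size_t size_ = 0;
    T* data_ = nullptr;
};
```

Self-assignment needs no special check here, because the parameter is always a distinct object. The cost is one extra allocation on copy assignment even when the existing storage could have been reused.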

@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
| --- | --- | --- |
| Base (xs-dev) | 2.1691 | - |
| This PR | 2.1691 | ➡️ 0.0000 (0.00%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 85d0254
workflow: On-Demand SPEC Test (Tier 1.5)

Ideal BTB Performance

Overall Score

| | PR | Master | Diff(%) |
| --- | --- | --- | --- |
| Score | 19.91 | 19.90 | +0.04 🟢 |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/cpu/o3/issue_queue.hh`:
- Around line 210-213: setIssuePipe currently writes to toIssue[pi] and toFu[pi]
without validating pi; add a bounds check at the start of setIssuePipe to ensure
0 <= pi < N (where N is the size/capacity of toIssue/toFu) and early-return or
assert/log an error if out of range, then only call
issuepipe.getWire(scheduleToExecDelay) and issuepipe.getWire(0) when pi is
valid; reference setIssuePipe, toIssue, toFu, issuepipe.getWire, and
scheduleToExecDelay when implementing the check.
♻️ Duplicate comments (3)
src/cpu/timebuf.hh (3)

138-148: Memory allocation/deallocation mismatch (previously flagged).

The constructor allocates with new char[sizeof(T) * size] and casts to T*, but the destructor at line 170 uses delete[] datas. This is undefined behavior because:

  1. Memory allocated as char[] is deleted as T[]
  2. Element destructors are not explicitly called before deallocation

This was already flagged in a previous review with a proposed fix.
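
One way to keep allocation and deallocation matched when elements are built with placement-new is shown below. This is a sketch under assumptions: `RawSlots` and `Probe` are hypothetical names, T is assumed not to be over-aligned, and copying is disabled to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical probe type that counts how many instances are alive.
struct Probe {
    static int& live() { static int n = 0; return n; }
    Probe() { ++live(); }
    ~Probe() { --live(); }
};

// Raw storage is allocated as unsigned char[], elements are constructed in
// place, and the destructor mirrors that exactly: destroy each element,
// then free the storage through the same unsigned char pointer.
template <typename T>
class RawSlots {
  public:
    explicit RawSlots(std::size_t n)
        : size_(n),
          raw_(new unsigned char[sizeof(T) * n]),
          data_(reinterpret_cast<T*>(raw_)) {
        for (std::size_t i = 0; i < size_; ++i) new (data_ + i) T();
    }

    ~RawSlots() {
        for (std::size_t i = 0; i < size_; ++i) data_[i].~T();
        delete[] raw_;  // matches new unsigned char[]
    }

    RawSlots(const RawSlots&) = delete;
    RawSlots& operator=(const RawSlots&) = delete;

    T& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

  private:
    std::size_t size_;
    unsigned char* raw_;
    T* data_;
};
```

The simpler alternative is to allocate with `new T[size]` everywhere and release with `delete[]` on the `T*`, which lets the language run the element constructors and destructors.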


150-159: Copy constructor allocation inconsistency and missing assignment operators (previously flagged).

The copy constructor uses new T[size] (line 155) which differs from the main constructor's new char[] approach. This creates inconsistent behavior and the destructor will behave differently for copied vs originally-constructed objects.

Additionally, the class defines destructor, copy constructor, and move constructor but lacks copy and move assignment operators (Rule of 5 violation).

These issues were flagged in previous reviews with proposed fixes.


161-170: Move constructor and destructor.

The move constructor correctly transfers ownership by copying the datas pointer and nulling other.datas. However, the destructor issue (deleting char[]-allocated memory as T[]) remains unresolved as noted in previous reviews.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 85d0254 and a3dcd57.

📒 Files selected for processing (3)
  • src/cpu/o3/issue_queue.cc
  • src/cpu/o3/issue_queue.hh
  • src/cpu/timebuf.hh
🧰 Additional context used
🧬 Code graph analysis (1)
src/cpu/o3/issue_queue.cc (1)
src/cpu/timebuf.hh (9)
  • i (82-87)
  • i (82-82)
  • i (113-118)
  • i (126-131)
  • TimeBuffer (138-148)
  • TimeBuffer (150-150)
  • TimeBuffer (152-159)
  • TimeBuffer (161-166)
  • TimeBuffer (168-171)
🔇 Additional comments (14)
src/cpu/o3/issue_queue.hh (4)

109-117: Visibility changes look intentional for per-port wiring.

The exposure of inports, outports, iqsize, replayQsize, scheduleToExecDelay, and iqname as public members enables external access needed for the centralized per-port inflight management in Scheduler. This aligns with the architectural changes in this PR.


139-140: Per-port wire vectors correctly replace single inflight tracking.

The change from a single IssueStream to per-port std::vector<TimeBuffer<DynInstPtr>::wire> for both toIssue and toFu properly supports the new per-port inflight pipeline architecture.


200-205: Helper method for ready queue insertion looks correct.

The readyQInsert method properly:

  1. Marks the instruction as in the ready queue
  2. Looks up the correct ready queue by opClass
  3. Uses binary search for ordered insertion

307-308: Centralized inflight management in Scheduler.

The std::vector<TimeBuffer<DynInstPtr>> inflightIssues centralizes inflight issue tracking at the Scheduler level, which is then wired to individual IQs via setIssuePipe. This design properly separates ownership (Scheduler owns the buffers) from access (IQs use wires to read/write).

src/cpu/timebuf.hh (1)

192-194: Placement new in advance() is correct pattern.

The advance() method properly destructs the old element, zeroes memory, and constructs a new element in place. This is consistent with the main constructor's approach.

src/cpu/o3/issue_queue.cc (9)

191-192: Per-port wire vectors properly initialized.

The toIssue and toFu vectors are resized to match outports, preparing them for the per-port wiring performed later by the Scheduler through IssueQue::setIssuePipe.


384-400: Per-port iteration in issueToFu looks correct.

The loop properly iterates over each outport's toFu wire and handles issue failures by reinserting into the ready queue via readyQInsert.


517-517: Consistent use of readyQInsert helper.

The helper method replaces what was likely a macro, improving code clarity.


620-625: Schedule path correctly uses per-port wiring.

The arbitration retry path uses readyQInsert, and scheduled instructions are written to the per-port toIssue[pi] wire.


833-839: TimeBuffer construction for per-port inflight management.

Each IQ outport gets its own TimeBuffer<DynInstPtr>(0, iq->scheduleToExecDelay) which creates a buffer with past=0 and future=scheduleToExecDelay. This allows accessing wire positions from 0 (current) to scheduleToExecDelay (future), matching the issue-to-execute pipeline depth.


882-888: Per-port wiring connects Scheduler's buffers to IQ wires.

The loop iterates through all IQs and their outports, calling setIssuePipe to wire each inflight buffer to the corresponding IQ port. The portid counter correctly tracks the global port index across all IQs.


1017-1020: Centralized inflight buffer advancement.

Moving the advance() calls to Scheduler::tick() centralizes timing control. All inflight buffers are advanced together after individual IQ ticks complete.


1386-1393: Load cancel clears canceled instructions from inflight buffers.

The iteration properly traverses all inflight buffers and their time slots, nulling any canceled instructions to prevent them from proceeding through the pipeline.


1491-1499: Squash clears squashed instructions from inflight buffers.

Similar to loadCancel, this properly clears squashed instructions from all inflight buffer slots to maintain pipeline consistency after a squash event.
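
The clearing pattern used by both loadCancel and the squash path can be sketched generically. The names here are hypothetical: `Inst`, `InstPtr`, and `clearSquashed` stand in for the PR's types, and each inflight buffer is modelled as a plain vector of slots rather than a TimeBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical instruction with only the flag the sketch needs.
struct Inst {
    bool squashed = false;
};
using InstPtr = std::shared_ptr<Inst>;

// Walk every slot of every inflight pipe and drop squashed instructions.
// Empty slots (null pointers) are skipped. Returns how many were cleared.
std::size_t clearSquashed(std::vector<std::vector<InstPtr>>& inflight) {
    std::size_t cleared = 0;
    for (auto& pipe : inflight) {
        for (auto& slot : pipe) {
            if (slot && slot->squashed) {
                slot = nullptr;  // the slot stays in place, now empty
                ++cleared;
            }
        }
    }
    return cleared;
}
```

Nulling the slot instead of erasing it preserves the timing of the remaining entries, since each position in the pipe corresponds to a fixed number of cycles until issue.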

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +210 to +213
     inline void setIssuePipe(TimeBuffer<DynInstPtr>& issuepipe, int pi) {
       toIssue[pi] = issuepipe.getWire(scheduleToExecDelay);
       toFu[pi] = issuepipe.getWire(0);
     }

⚠️ Potential issue | 🟡 Minor

Missing bounds check in setIssuePipe.

The method accesses toIssue[pi] and toFu[pi] without validating that pi is within bounds. If called with an invalid port index, this will cause undefined behavior.

🐛 Proposed fix to add bounds checking
     inline void setIssuePipe(TimeBuffer<DynInstPtr>& issuepipe, int pi) {
+      assert(pi >= 0 && pi < outports);
       toIssue[pi] = issuepipe.getWire(scheduleToExecDelay);
       toFu[pi] = issuepipe.getWire(0);
     }
🤖 Prompt for AI Agents
In `@src/cpu/o3/issue_queue.hh` around lines 210 - 213, setIssuePipe currently
writes to toIssue[pi] and toFu[pi] without validating pi; add a bounds check at
the start of setIssuePipe to ensure 0 <= pi < N (where N is the size/capacity of
toIssue/toFu) and early-return or assert/log an error if out of range, then only
call issuepipe.getWire(scheduleToExecDelay) and issuepipe.getWire(0) when pi is
valid; reference setIssuePipe, toIssue, toFu, issuepipe.getWire, and
scheduleToExecDelay when implementing the check.

@github-actions

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
| --- | --- | --- |
| Base (xs-dev) | 2.1691 | - |
| This PR | 2.1691 | ➡️ 0.0000 (0.00%) |

✅ Difftest smoke test passed!
