
exit-slot TAGE#754

Open
jensen-yan wants to merge 7 commits into xs-dev from bank-tage1-align

Conversation

Collaborator

@jensen-yan jensen-yan commented Feb 4, 2026

Use the new bank-tage instead of the old per-branch TAGE [WIP].

Summary by CodeRabbit

  • New Features

    • Added offline analysis tools for exit-slot TAGE patterns and upper-bound reports.
  • Refactor

    • Migrated predictor to a block-based exit-slot/provider model with dual payloads, richer mapping, and per-prediction confidence.
  • Bug Fixes

    • Path/history updates now evolve on fall-throughs to improve prediction consistency.
  • Tests

    • Reworked tests for batch/block-style predictions and the revised metadata/prediction shape.
  • Documentation

    • New design and upper-bound docs describing block-based exit-slot TAGE behavior.
  • Chore

    • Disabled bank-conflict handling in the default predictor config.

@coderabbitai

coderabbitai bot commented Feb 4, 2026

📝 Walkthrough

Refactors BTB TAGE to an Exit‑Slot v2 provider‑based prediction path with dual exit‑slot encodings and selector counters, adds payload‑mapping and tracers, makes folded/path histories evolve on fall‑throughs (pseudo‑edge), rewrites tests, adds ping‑pong/upper‑bound analysis scripts, and disables TAGE bank‑conflict in kmhv3 config.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Configuration**<br>`configs/example/kmhv3.py` | Add `cpu.branchPred.tage.enableBankConflict = False` to disable TAGE bank-conflict handling. |
| **BTB TAGE core**<br>`src/cpu/pred/btb/btb_tage.hh`, `src/cpu/pred/btb/btb_tage.cc` | Replace the single-prediction flow with provider-based lookup (`lookupProviders`) and `PredSource`; introduce dual exit-slot encodings (`exitSlotEnc0/1`), a selector (`selCtr`), new enc/base/pred mapping (`getBaseExitSlotEnc`, `mapExitSlotToCondPC`), an update/allocation refactor, tag/index signature changes, and many new trace fields and stats. |
| **Trace struct**<br>`src/cpu/pred/btb/common.hh` | Extend the `TageMissTrace::set()` signature with 12 exit-slot debug fields (`mainTag`, `altTag`, `mainPayload`, `altPayload`, `mainPayload1`, `altPayload1`, `mainSel`, `altSel`, `baseEnc`, `predEnc`, `realEnc`, `predSource`). |
| **Folded & path history**<br>`src/cpu/pred/btb/folded_hist.cc`, `src/cpu/pred/btb/btb_ittage.cc`, `src/cpu/pred/btb/btb_mgsc.cc`, `src/cpu/pred/btb/decoupled_bpred.cc` | Unconditionally evolve folded/path histories on fall-throughs using a pseudo-edge strategy; adjust callers and recovery/spec updates to use pseudo edges. |
| **MGSC / ITTAGE adjustments**<br>`src/cpu/pred/btb/btb_mgsc.cc`, `src/cpu/pred/btb/btb_ittage.cc` | Ensure folded/history updates run for non-taken cases by substituting pseudo edges and always updating folded histories. |
| **Tests**<br>`src/cpu/pred/btb/test/btb_tage.test.cc`, `src/cpu/pred/btb/test/btb_mgsc.test.cc` | Rewrite tests to batch/vector interfaces, adapt to the new `TagePrediction` shape (`predEnc`, `predCondPC`, `source`, `useAlt`), assert allocation/fallback/payload-mapping behaviors, and incorporate pseudo-edge history semantics. |
| **Analysis tooling**<br>`util/xs_scripts/bp_db_tage_pingpong.py`, `util/xs_scripts/bp_db_upperbound.py` | Add `bp_db_tage_pingpong.py` to analyze TAGEMISSTRACE ping-pong behaviors and `bp_db_upperbound.py` to compute offline upper bounds for block/dir predictions from `bp.db` traces. |
| **Docs**<br>`docs/Gem5_Docs/frontend/block-tage.md`, `docs/Gem5_Docs/frontend/upperbound_report2.md` | Add a design doc for Block-Based Exit-Slot TAGE and an offline upper-bound report describing methodology and results. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Prediction Request
    participant Lookup as lookupProviders()
    participant MainTbl as Main TAGE Table
    participant AltTbl as Alt TAGE Table
    participant PayloadMap as mapExitSlotToCondPC()
    participant Result as TagePrediction

    Client->>Lookup: startPC, predMeta
    Lookup->>MainTbl: read main table entry
    Lookup->>AltTbl: read alt table entry
    Lookup-->>Client: return (mainInfo, altInfo)

    Client->>Client: select source (Provider / Alt / Base) using conf & selCtr
    Client->>PayloadMap: predEnc, BTB entries
    PayloadMap->>PayloadMap: map predEnc -> predCondPC
    alt Mapping Success
        PayloadMap-->>Result: predCondPC, payloadMapped=true
    else Mapping fails
        PayloadMap-->>Result: baseEnc, payloadMapped=false
    end

    Result-->>Client: TagePrediction (source, predEnc, baseEnc, predCondPC)
```
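The diagram's selection and mapping steps can be sketched as a toy Python model. The function names mirror the diagram, but the thresholds, the weak-state set, and the encoding convention (`0` = no taken branch, `k` = cond branch at slot `k-1`) are illustrative assumptions, not the actual `btb_tage.cc` logic.

```python
def choose_source(main_hit, main_conf, alt_hit, alt_conf, sel_ctr):
    """Pick the prediction source: 0=provider(main), 1=alt, 2=base.

    Toy rule (assumption): use the alt table only when the main (provider)
    entry is weak and the selector counter prefers alt; fall back to base
    when neither table hits.
    """
    WEAK = (0, -1)   # weak confidence states (assumption)
    if main_hit and main_conf not in WEAK:
        return 0                      # confident provider wins
    if main_hit and alt_hit and sel_ctr >= 0:
        return 1                      # weak provider, selector says use alt
    if main_hit:
        return 0
    if alt_hit:
        return 1
    return 2                          # base (MBTB ctr) fallback


def map_exit_slot_to_cond_pc(pred_enc, btb_cond_slots, aligned_start_pc,
                             inst_shift_amt=1):
    """Map an exit-slot encoding to a conditional-branch PC.

    Assumption: enc 0 means "no taken cond branch"; enc k (1-based) means
    the cond branch at slot k-1. Returns None when the encoded slot has no
    matching cond entry in the BTB (payload-map failure -> base fallback).
    """
    if pred_enc == 0:
        return None
    slot = pred_enc - 1
    if slot not in btb_cond_slots:
        return None                   # unmappable payload
    return aligned_start_pc + (slot << inst_shift_amt)


print(choose_source(True, 3, True, 0, -1))    # prints: 0 (confident provider)
print(choose_source(True, 0, True, 2, 1))     # prints: 1 (weak -> alt)
print(choose_source(False, 0, False, 0, 0))   # prints: 2 (base fallback)
print(hex(map_exit_slot_to_cond_pc(5, {4, 10}, 0x80001000)))  # prints: 0x80001008
```

The mapping-failure branch (`None` here, `payloadMapped=false` in the diagram) is what triggers the fall-back to the base encoding.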

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

align-kmhv3

Suggested reviewers

  • Yakkhini
  • CJ362ff
  • tastynoob

Poem

🐰 I hopped through tags and folded trails,
Two slots to choose, two tiny tails,
A pseudo edge made history dance,
Tests and traces got their chance,
Ping‑pong patterns jump and prance.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.58%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title "exit-slot TAGE" is vague and does not clearly convey the scope of the substantial changes across configuration, core predictor logic, test utilities, and documentation. | Consider a more descriptive title that indicates the main architectural change, such as "Implement Exit-Slot TAGE predictor with provider-based lookup" or "Migrate BTB TAGE to Exit-Slot v2 model with payload mapping". |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee97182abd


Comment on lines +335 to +336
```cpp
if (predEnc == 0 || predEnc > 32) {
    return 0;
```


P2: Use `maxBranchPositions` instead of hard-coded 32

The new mapExitSlotToCondPC hard-codes predEnc > 32 as invalid. This breaks non-default configurations where maxBranchPositions is set above 32 (e.g., larger fetch blocks or smaller instruction sizes), because getBranchIndexInBlock/training can legitimately produce encodings >32 while mapExitSlotToCondPC will always return 0, forcing payload-map failure and base fallback. This causes provider predictions to be discarded even when valid. Consider validating against maxBranchPositions (or a derived limit) instead of the constant 32.
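The suggested validation can be sketched in a few lines of Python (a toy model for illustration; the function name, the 1-based encoding convention, and `max_branch_positions` as the configured limit are assumptions, not the gem5 code):

```python
def is_valid_exit_slot_enc(pred_enc: int, max_branch_positions: int) -> bool:
    """Validate an exit-slot encoding against the configured limit.

    Assumption: encodings are 1..max_branch_positions, with 0 reserved for
    "no taken branch". Hard-coding 32 here would reject legal encodings
    whenever max_branch_positions is configured above 32.
    """
    return 0 < pred_enc <= max_branch_positions


# With the default 64B block / 2B slots, the limit happens to be 32 ...
assert is_valid_exit_slot_enc(32, 32)
assert not is_valid_exit_slot_enc(33, 32)
assert not is_valid_exit_slot_enc(0, 32)
# ... but a larger fetch block must not silently reject legal encodings:
assert is_valid_exit_slot_enc(40, 64)
```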


@github-actions

github-actions bot commented Feb 4, 2026

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.1727 | - |
| This PR | 1.4528 | 📉 -0.7199 (-33.14%) |

✅ Difftest smoke test passed!

@XiangShanRobot

[Generated by GEM5 Performance Robot]
commit: 0defcb8
workflow: gem5 Align BTB Performance Test(0.3c)

Align BTB Performance

Overall Score

| | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 15.73 | 17.55 | -10.39 🔴 |


| | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 15.73 | 13.96 | +12.67 🟢 |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@docs/Gem5_Docs/frontend/upperbound_report2.md`:
- Around line 3-29: Add a short definition sentence before the data table that
explicitly defines "opt" and "ref" (e.g., "opt = optimized/new implementation
under test; ref = reference/baseline run") and clarify how they are used in
metric names such as acc_dir(ref) and UB_dir/UB_exit (these UB_* values are
computed per-key from TAGEMISSTRACE using majority-vote on the indicated feature
key); keep it one or two lines so readers immediately understand that "ref"
refers to the baseline trace used for comparisons and "opt" to the candidate
implementation whose metrics are being reported.

In `@util/xs_scripts/bp_db_upperbound.py`:
- Around line 97-142: After computing n = int(cur.execute("select count(*) from
TAGEMISSTRACE;").fetchone()[0]) add an early guard for n == 0 that returns
immediately (matching the behavior in _dir_upperbounds) to avoid executing the
subsequent aggregate queries that can return NULL and cause float(None) errors;
locate the block around variables cur, n, actual_acc/provider_acc/base_acc and
ub_startpc and implement the early return before any float(...) conversions or
the UB(startPC) CTE runs.
🧹 Nitpick comments (8)
docs/Gem5_Docs/frontend/block-tage.md (4)

50-51: Clarify the 32B alignment rationale for a 64B fetch block.

The document states predictWidth = 64B but then mentions alignedStartPC 取 fetch 起始地址按 32B 对齐. This appears intentional (likely for MBTB half-alignment compatibility), but the rationale isn't explicitly stated. This could confuse readers who expect 64B-aligned blocks for a 64B predictor.

📝 Suggested clarification

Consider adding a brief explanation after line 51:

 * **Slot (instruction position slot)**: the 64B block is divided at **2B granularity** into `32` slots, range `0..31`. The slot computation matches the current implementation: `slot = (branchPC - alignedStartPC) >> instShiftAmt`, where `instShiftAmt=1` and `alignedStartPC` is the fetch start address aligned to 32B (MBTB half-aligned), so the slots cover the address range `[alignedStartPC, alignedStartPC+64B)`.
+  > Note: 32B alignment (rather than 64B) is used to stay compatible with the existing MBTB half-bank boundary, allowing fetch blocks that cross a 32B boundary.

121-127: Clarify Conf counter representation (signed vs. unsigned).

The document specifies Conf in {0, -1} as the weak threshold (lines 124, 318) but doesn't explicitly state whether the Conf counter is signed or unsigned. For a 3-bit field:

  • Signed (two's complement): range [-4, 3], weak={0, -1} is clear
  • Unsigned: range [0, 7], the notation "-1" is ambiguous
📝 Suggested clarification

Add explicit representation details in the table or in section 3.3.8:

 | **Conf** | 2-3 | **Confidence counter (suggested 3 bits)**: indicates whether this payload is stable and reliable under the correlated history.<br>The weak-state threshold follows existing practice: `Conf in {0, -1}` is treated as weak.<br>The update rule differs from per-branch taken/nt: **Conf is updated by whether the prediction was correct** (see 3.3). |
+  > Note: Conf is a signed saturating counter (3-bit range [-4, 3]), initialized to 0 or -1 (weak), saturating at 3 above and -4 below.
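The signed saturating behavior proposed above can be sketched as a toy counter (the 3-bit range and the weak set follow the suggestion; the class and method names are invented for illustration):

```python
class SatConf:
    """3-bit signed saturating confidence counter, range [-4, 3].

    Updated by prediction correctness (not by taken/not-taken), as the
    design doc proposes; weak states are {0, -1}.
    """
    LO, HI = -4, 3

    def __init__(self, value: int = 0):
        self.value = max(self.LO, min(self.HI, value))

    def update(self, correct: bool) -> None:
        # Saturating increment on a correct prediction, decrement otherwise.
        if correct:
            self.value = min(self.HI, self.value + 1)
        else:
            self.value = max(self.LO, self.value - 1)

    def is_weak(self) -> bool:
        return self.value in (0, -1)


c = SatConf()          # starts at 0, i.e. weak
assert c.is_weak()
for _ in range(10):
    c.update(True)     # saturates at +3, never wraps
assert c.value == 3 and not c.is_weak()
for _ in range(10):
    c.update(False)    # saturates at -4
assert c.value == -4
```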

257-273: Consider specifying allocation failure handling for strong-but-wrong case.

The three-rule update strategy is sound, but the "strong-but-wrong" case (lines 264-268) attempts to allocate in longer history tables without specifying what happens if all candidate entries are occupied and cannot be replaced (e.g., all have U=1).

📝 Suggested addition

After line 268, add a fallback specification:

    - Behavior: try to allocate `RealEnc` into longer-history tables; the original entry's payload is not changed immediately (anti-thrash).
+   - If no candidate entry can be allocated (all have U=1, or the replacement policy disallows it), degrade to the weak-and-wrong strategy: rewrite the payload in place.

This ensures the specification is complete for all edge cases.


370-375: Discussion point 1 is already resolved in the design.

The first discussion point (lines 371-372) about MBTB miss/unmappable payload handling asks for a recommendation, but this is already specified in section 3.2.4 (lines 168-173): the design explicitly recommends falling back to base MBTB ctr-based prediction rather than forcing fallthrough.

📝 Suggested revision

Update the discussion point to reflect that this is resolved:

-1. **Handling MBTB miss / unmappable payloads**: if the `ExitSlotEnc` predicted by TAGE has no matching cond entry in the current `btbEntries` (MBTB miss / not yet learned / filtered out), the recommendation is to fall back to Base (predict by the cond `ctr` direction inside the MBTB) rather than forcing fallthrough; otherwise unnecessary performance regressions may occur.
+1. **Handling MBTB miss / unmappable payloads**: already settled in section 3.2.4: fall back to Base (predict by the cond `ctr` direction inside the MBTB). The performance impact of this policy still needs to be validated in the implementation.
docs/Gem5_Docs/frontend/upperbound_report2.md (1)

1-1: Clarify whether this path is temporary or should be generalized.

The title includes a specific debug path /tmp/debug/tage-new6. If this is auto-generated report output, consider adding a note explaining that the path is example-specific. If this is meant to be permanent documentation, consider using a more generic placeholder or removing the path entirely.

util/xs_scripts/bp_db_upperbound.py (3)

245-371: Repeated majority-vote CTE pattern could be extracted into a helper.

The same SQL structure (group-by key+label → max per key → sum/total) appears 9+ times across _tage_upperbounds and _dir_upperbounds with only the key columns and label column varying. A small helper would eliminate the duplication and make adding new key combinations trivial.

♻️ Example helper sketch
```python
def _majority_vote_ub(
    cur: sqlite3.Cursor,
    table: str,
    key_cols: list[str],
    label_col: str,
) -> float:
    keys = ", ".join(key_cols)
    return float(
        cur.execute(
            f"""
            WITH per_label AS (
              SELECT {keys}, {label_col}, count(*) AS c
              FROM {table}
              GROUP BY {keys}, {label_col}
            ),
            per_key AS (
              SELECT {keys}, max(c) AS mx
              FROM per_label
              GROUP BY {keys}
            )
            SELECT 1.0*sum(mx)/(SELECT count(*) FROM {table})
            FROM per_key;
            """
        ).fetchone()[0]
    )
```
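The helper sketch can be exercised against a tiny synthetic trace to check the math. The rows below are hypothetical `TAGEMISSTRACE` contents, and the helper is inlined here to keep the snippet self-contained:

```python
import sqlite3


def majority_vote_ub(cur, table, key_cols, label_col):
    """Per-key majority-vote upper bound: an oracle that always answers the
    most frequent label for each key is right on sum(max-count)/total rows
    (same shape as the helper sketched above)."""
    keys = ", ".join(key_cols)
    return float(cur.execute(f"""
        WITH per_label AS (
          SELECT {keys}, {label_col}, count(*) AS c
          FROM {table} GROUP BY {keys}, {label_col}
        ),
        per_key AS (
          SELECT {keys}, max(c) AS mx FROM per_label GROUP BY {keys}
        )
        SELECT 1.0*sum(mx)/(SELECT count(*) FROM {table}) FROM per_key;
    """).fetchone()[0])


# Hypothetical trace: startPC 0x100 sees realEnc 1 three times and
# realEnc 2 once; startPC 0x200 always sees realEnc 3.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TAGEMISSTRACE (startPC INT, realEnc INT)")
con.executemany("INSERT INTO TAGEMISSTRACE VALUES (?, ?)",
                [(0x100, 1), (0x100, 1), (0x100, 1), (0x100, 2), (0x200, 3)])

ub = majority_vote_ub(con.cursor(), "TAGEMISSTRACE", ["startPC"], "realEnc")
print(ub)   # prints: 0.8  -- oracle gets (3 + 1) of 5 rows right
```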

405-411: Connection not closed on exception — use a context manager.

If _tage_upperbounds or _dir_upperbounds raises (e.g., the n == 0 crash above), con.close() is skipped.

♻️ Proposed fix
 def _analyze_one(db: str) -> Dict[str, object]:
-    con = _connect(db)
-    ub = _tage_upperbounds(con)
-    dir_ub = _dir_upperbounds(con)
-    bp = _mispred_rate(con)
-    con.close()
+    con = _connect(db)
+    try:
+        ub = _tage_upperbounds(con)
+        dir_ub = _dir_upperbounds(con)
+        bp = _mispred_rate(con)
+    finally:
+        con.close()
     return {"db": db, "ub": ub, "dir_ub": dir_ub, "bp": bp}

516-543: _bp_fmt and _pct are redefined on every loop iteration; _pct duplicates _fmt_pct.

_pct (line 540) is functionally identical to _fmt_pct (line 386). Both helpers can be moved outside the loop — neither captures loop state.

♻️ Remove _pct, reuse _fmt_pct, hoist _bp_fmt

Move _bp_fmt before the loop (or to module level), and replace all _pct(...) calls with _fmt_pct(...):

+    def _bp_fmt(x: Optional[Tuple[int, int, float]]) -> str:
+        if x is None:
+            return "n/a"
+        return f"{x[2]*100:5.2f}%"
+
     for base, opt, ref in rows:
         opt_bp = opt["bp"] if opt else None  # type: ignore[index]
         ref_bp = ref["bp"] if ref else None  # type: ignore[index]
 
-        def _bp_fmt(x: Optional[Tuple[int, int, float]]) -> str:
-            if x is None:
-                return "n/a"
-            return f"{x[2]*100:5.2f}%"
-
         ...
 
-        def _pct(x: Optional[float]) -> str:
-            if x is None:
-                return "n/a"
-            return f"{x*100:5.1f}%"
-
         print(
             ...
-                _pct(acc_exit),
-                _pct(ub_exit2),
+                _fmt_pct(acc_exit),
+                _fmt_pct(ub_exit2),
                 ...
         )

Comment on lines +97 to +142
```python
cur = con.cursor()
n = int(cur.execute("select count(*) from TAGEMISSTRACE;").fetchone()[0])

actual_acc = None
provider_acc = None
base_acc = None
if "predEnc" in cols:
    actual_acc = float(
        cur.execute(
            "select 1.0*sum(case when predEnc=realEnc then 1 else 0 end)/count(*) "
            "from TAGEMISSTRACE;"
        ).fetchone()[0]
    )
    if "predSource" in cols:
        v = cur.execute(
            "select case when count(*)=0 then null else "
            "1.0*sum(case when predEnc=realEnc then 1 else 0 end)/count(*) end "
            "from TAGEMISSTRACE where predSource=0;"
        ).fetchone()[0]
        provider_acc = (None if v is None else float(v))
        v = cur.execute(
            "select case when count(*)=0 then null else "
            "1.0*sum(case when predEnc=realEnc then 1 else 0 end)/count(*) end "
            "from TAGEMISSTRACE where predSource=2;"
        ).fetchone()[0]
        base_acc = (None if v is None else float(v))

# UB(startPC)
ub_startpc = float(
    cur.execute(
        """
        with per_label as (
          select startPC, realEnc, count(*) as c
          from TAGEMISSTRACE
          group by startPC, realEnc
        ),
        per_startpc as (
          select startPC, max(c) as mx
          from per_label
          group by startPC
        )
        select 1.0*sum(mx)/(select count(*) from TAGEMISSTRACE)
        from per_startpc;
        """
    ).fetchone()[0]
)
```

⚠️ Potential issue | 🟡 Minor

Missing n == 0 guard — float(None) crash when TAGEMISSTRACE is empty.

Unlike _dir_upperbounds (line 223), this function lacks an early return when n == 0. If the table exists with the realEnc column but has no rows, the CTE produces sum(mx) → NULL divided by count(*) → 0, SQLite returns NULL, and float(None) raises TypeError.

🐛 Proposed fix: add an early return guard for n == 0
     n = int(cur.execute("select count(*) from TAGEMISSTRACE;").fetchone()[0])
+    if n == 0:
+        return UBRes(
+            n=0,
+            actual_acc=None,
+            provider_acc=None,
+            base_acc=None,
+            ub_startpc=None,
+            ub_startpc_hist=None,
+            ub_startpc_fullhist=None,
+        )
 
     actual_acc = None
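The failure mode is easy to reproduce standalone: SQLite returns NULL (Python `None`) for `sum()` over zero rows, so an unguarded `float(...)` raises `TypeError`. A minimal demonstration of the crash path and the guard:

```python
import sqlite3

# Empty table with the relevant columns (hypothetical minimal schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TAGEMISSTRACE (predEnc INT, realEnc INT)")

cur = con.cursor()
n = int(cur.execute("SELECT count(*) FROM TAGEMISSTRACE;").fetchone()[0])
v = cur.execute(
    "SELECT 1.0*sum(CASE WHEN predEnc=realEnc THEN 1 ELSE 0 END)/count(*) "
    "FROM TAGEMISSTRACE;"
).fetchone()[0]
print(n, v)          # prints: 0 None -- float(v) here would raise TypeError

# The early n == 0 guard proposed above avoids ever reaching float(None):
acc = None if n == 0 else float(v)
print(acc)           # prints: None
```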

@github-actions

github-actions bot commented Feb 6, 2026

🚀 Coremark Smoke Test Results

| Branch | IPC | Change |
|---|---|---|
| Base (xs-dev) | 2.1727 | - |
| This PR | 1.9525 | 📉 -0.2201 (-10.13%) |

✅ Difftest smoke test passed!

@jensen-yan jensen-yan changed the title Bank tage1 align exit-slot TAGE Feb 27, 2026
