Conversation
Change-Id: I15f641d120697785bee6f8ea5fa89e64820ace89
This reverts commit ea054b5.
📝 Walkthrough

Refactors BTB TAGE to an Exit‑Slot v2 provider‑based prediction path with dual exit‑slot encodings and selector counters, adds payload‑mapping and tracers, makes folded/path histories evolve on fall‑throughs (pseudo‑edge), rewrites tests, adds ping‑pong/upper‑bound analysis scripts, and disables TAGE bank‑conflict in kmhv3 config.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Prediction Request
    participant Lookup as lookupProviders()
    participant MainTbl as Main TAGE Table
    participant AltTbl as Alt TAGE Table
    participant PayloadMap as mapExitSlotToCondPC()
    participant Result as TagePrediction
    Client->>Lookup: startPC, predMeta
    Lookup->>MainTbl: read main table entry
    Lookup->>AltTbl: read alt table entry
    Lookup-->>Client: return (mainInfo, altInfo)
    Client->>Client: select source (Provider / Alt / Base) using conf & selCtr
    Client->>PayloadMap: predEnc, BTB entries
    PayloadMap->>PayloadMap: map predEnc -> predCondPC
    alt Mapping Success
        PayloadMap-->>Result: predCondPC, payloadMapped=true
    else Mapping fails
        PayloadMap-->>Result: baseEnc, payloadMapped=false
    end
    Result-->>Client: TagePrediction (source, predEnc, baseEnc, predCondPC)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ❌ Failed checks (1 warning, 1 inconclusive) | ✅ Passed checks (1 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee97182abd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```cpp
if (predEnc == 0 || predEnc > 32) {
    return 0;
```
Use maxBranchPositions instead of hard-coded 32
The new mapExitSlotToCondPC hard-codes predEnc > 32 as invalid. This breaks non-default configurations where maxBranchPositions is set above 32 (e.g., larger fetch blocks or smaller instruction sizes), because getBranchIndexInBlock/training can legitimately produce encodings >32 while mapExitSlotToCondPC will always return 0, forcing payload-map failure and base fallback. This causes provider predictions to be discarded even when valid. Consider validating against maxBranchPositions (or a derived limit) instead of the constant 32.
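The suggested change can be modeled in a few lines. This is a hedged Python sketch only (the real code is C++ in gem5): the function name mirrors `mapExitSlotToCondPC`, but the `cond_pcs` dictionary and the keyword parameter are hypothetical stand-ins used purely to illustrate validating against the configured limit instead of the constant 32.

```python
def map_exit_slot_to_cond_pc(pred_enc: int, cond_pcs: dict,
                             max_branch_positions: int = 32) -> int:
    """Return the cond PC mapped to pred_enc, or 0 if the encoding is invalid
    or unmappable. Illustrative model, not the gem5 implementation."""
    # Encoding 0 means "no exit slot"; anything above the *configured* limit
    # is invalid. Taking the limit as a parameter keeps non-default configs
    # (maxBranchPositions > 32) working, which the hard-coded check breaks.
    if pred_enc == 0 or pred_enc > max_branch_positions:
        return 0
    return cond_pcs.get(pred_enc, 0)
```

With the default limit an encoding of 40 is rejected, while a configuration with `max_branch_positions=64` maps it successfully instead of forcing a payload-map failure.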
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed!
also update allocate method
[Generated by GEM5 Performance Robot] Align BTB PerformanceOverall Score
Change-Id: Iabd67fce1c2217a02f0a09708db13006ffb482d0
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/Gem5_Docs/frontend/upperbound_report2.md`:
- Around line 3-29: Add a short definition sentence before the data table that
explicitly defines "opt" and "ref" (e.g., "opt = optimized/new implementation
under test; ref = reference/baseline run") and clarify how they are used in
metric names such as acc_dir(ref) and UB_dir/UB_exit (these UB_* values are
computed per-key from TAGEMISSTRACE using majority-vote on the indicated feature
key); keep it one or two lines so readers immediately understand that "ref"
refers to the baseline trace used for comparisons and "opt" to the candidate
implementation whose metrics are being reported.
In `@util/xs_scripts/bp_db_upperbound.py`:
- Around line 97-142: After computing n = int(cur.execute("select count(*) from
TAGEMISSTRACE;").fetchone()[0]) add an early guard for n == 0 that returns
immediately (matching the behavior in _dir_upperbounds) to avoid executing the
subsequent aggregate queries that can return NULL and cause float(None) errors;
locate the block around variables cur, n, actual_acc/provider_acc/base_acc and
ub_startpc and implement the early return before any float(...) conversions or
the UB(startPC) CTE runs.
🧹 Nitpick comments (8)
docs/Gem5_Docs/frontend/block-tage.md (4)
50-51: Clarify the 32B alignment rationale for a 64B fetch block.

The document states `predictWidth = 64B` but then mentions that `alignedStartPC` is the fetch start address aligned to 32B. This appears intentional (likely for MBTB half-alignment compatibility), but the rationale isn't explicitly stated. This could confuse readers who expect 64B-aligned blocks for a 64B predictor.

📝 Suggested clarification

Consider adding a brief explanation after line 51:

```diff
 * **Slot(指令位置槽)**:以 **2B 粒度**划分 64B block,共 `32` 个 slot,范围 `0..31`。slot 计算方式与当前实现一致:`slot = (branchPC - alignedStartPC) >> instShiftAmt`,其中 `instShiftAmt=1`。其中 `alignedStartPC` 取 fetch 起始地址按 32B 对齐(MBTB half-aligned),因此 slot 覆盖的地址范围是 `[alignedStartPC, alignedStartPC+64B)`。
+> 注:使用 32B 对齐而非 64B 对齐是为了与现有 MBTB 的 half-bank 边界保持兼容,允许跨 32B 边界的 fetch block。
```
121-127: Clarify Conf counter representation (signed vs. unsigned).

The document specifies `Conf in {0, -1}` as the weak threshold (lines 124, 318) but doesn't explicitly state whether the Conf counter is signed or unsigned. For a 3-bit field:

- Signed (two's complement): range [-4, 3], so weak = {0, -1} is clear
- Unsigned: range [0, 7], so the notation "-1" is ambiguous

📝 Suggested clarification

Add explicit representation details in the table or in section 3.3.8:

```diff
 | **Conf** | 2-3 | **置信计数器(建议 3 bits)**:表示该 payload 在该相关历史下是否稳定可靠。<br>弱态阈值建议沿用现有经验:`Conf in {0, -1}` 视为 weak。<br>更新规则与 per-branch 的 taken/nt 不同:**用 "是否预测正确" 来更新 Conf**(见 3.3)。 |
+> 注:Conf 采用有符号饱和计数器(3-bit 范围 [-4, 3]),初始值为 0 或 -1(weak),饱和上界为 3,饱和下界为 -4。
```
257-273: Consider specifying allocation failure handling for the strong-but-wrong case.

The three-rule update strategy is sound, but the "strong-but-wrong" case (lines 264-268) attempts to allocate in longer-history tables without specifying what happens if all candidate entries are occupied and cannot be replaced (e.g., all have U=1).

📝 Suggested addition

After line 268, add a fallback specification:

```diff
 - 行为:在更长历史表中尝试 allocate 写入 `RealEnc`,原 entry payload 不立刻改(防抖)。
+- 若所有候选表项均无法分配(所有 U=1 或替换策略不允许),则降级为 weak-and-wrong 策略:原地 rewrite payload。
```

This ensures the specification is complete for all edge cases.
370-375: Discussion point 1 is already resolved in the design.

The first discussion point (lines 371-372) about MBTB miss/unmappable payload handling asks for a recommendation, but this is already specified in section 3.2.4 (lines 168-173): the design explicitly recommends falling back to base MBTB ctr-based prediction rather than forcing fallthrough.

📝 Suggested revision

Update the discussion point to reflect that this is resolved:

```diff
-1. **MBTB Miss / 不可映射 payload 的处理**: 若 TAGE 预测的 `ExitSlotEnc` 在当前 `btbEntries` 中找不到对应的 cond entry(MBTB miss/未学到/过滤导致),推荐回退到 Base(按 MBTB 内 cond 的 `ctr` 方向预测),而不是强制 fallthrough;否则可能出现不必要的性能回退。
+1. **MBTB Miss / 不可映射 payload 的处理**: 已在 3.2.4 节明确:回退到 Base(按 MBTB 内 cond 的 `ctr` 方向预测)。需要在实现中验证此策略的性能影响。
```

docs/Gem5_Docs/frontend/upperbound_report2.md (1)
1-1: Clarify whether this path is temporary or should be generalized.

The title includes a specific debug path `/tmp/debug/tage-new6`. If this is auto-generated report output, consider adding a note explaining that the path is example-specific. If this is meant to be permanent documentation, consider using a more generic placeholder or removing the path entirely.

util/xs_scripts/bp_db_upperbound.py (3)
245-371: Repeated majority-vote CTE pattern could be extracted into a helper.

The same SQL structure (group by key+label → max per key → sum/total) appears 9+ times across `_tage_upperbounds` and `_dir_upperbounds` with only the key columns and label column varying. A small helper would eliminate the duplication and make adding new key combinations trivial.

♻️ Example helper sketch

```python
def _majority_vote_ub(
    cur: sqlite3.Cursor,
    table: str,
    key_cols: list[str],
    label_col: str,
) -> float:
    keys = ", ".join(key_cols)
    return float(
        cur.execute(
            f"""
            WITH per_label AS (
                SELECT {keys}, {label_col}, count(*) AS c
                FROM {table}
                GROUP BY {keys}, {label_col}
            ),
            per_key AS (
                SELECT {keys}, max(c) AS mx
                FROM per_label
                GROUP BY {keys}
            )
            SELECT 1.0*sum(mx)/(SELECT count(*) FROM {table})
            FROM per_key;
            """
        ).fetchone()[0]
    )
```
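The majority-vote upper bound this pattern computes can be sanity-checked on a toy trace. A self-contained example with an in-memory database (table and column names follow the script; the data is made up): for each key the most common label is the best any key-indexed predictor can do, so summing those maxima over keys and dividing by the row count gives the accuracy ceiling.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TAGEMISSTRACE (startPC INTEGER, realEnc INTEGER)")
rows = [(0x100, 3), (0x100, 3), (0x100, 7),   # majority label 3 (2 of 3)
        (0x200, 5), (0x200, 5)]               # majority label 5 (2 of 2)
con.executemany("INSERT INTO TAGEMISSTRACE VALUES (?, ?)", rows)

ub = con.execute("""
    WITH per_label AS (
        SELECT startPC, realEnc, count(*) AS c
        FROM TAGEMISSTRACE GROUP BY startPC, realEnc
    ),
    per_key AS (
        SELECT startPC, max(c) AS mx FROM per_label GROUP BY startPC
    )
    SELECT 1.0*sum(mx)/(SELECT count(*) FROM TAGEMISSTRACE) FROM per_key;
""").fetchone()[0]
print(ub)  # (2 + 2) / 5 = 0.8
```

A perfect startPC-indexed predictor would get 4 of these 5 rows right, matching the 0.8 upper bound.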
405-411: Connection not closed on exception — use a context manager.

If `_tage_upperbounds` or `_dir_upperbounds` raises (e.g., the `n == 0` crash above), `con.close()` is skipped.

♻️ Proposed fix

```diff
 def _analyze_one(db: str) -> Dict[str, object]:
     con = _connect(db)
-    ub = _tage_upperbounds(con)
-    dir_ub = _dir_upperbounds(con)
-    bp = _mispred_rate(con)
-    con.close()
+    try:
+        ub = _tage_upperbounds(con)
+        dir_ub = _dir_upperbounds(con)
+        bp = _mispred_rate(con)
+    finally:
+        con.close()
     return {"db": db, "ub": ub, "dir_ub": dir_ub, "bp": bp}
```
516-543: `_bp_fmt` and `_pct` are redefined on every loop iteration; `_pct` duplicates `_fmt_pct`.

`_pct` (line 540) is functionally identical to `_fmt_pct` (line 386). Both helpers can be moved outside the loop — neither captures loop state.

♻️ Remove `_pct`, reuse `_fmt_pct`, hoist `_bp_fmt`

Move `_bp_fmt` before the loop (or to module level), and replace all `_pct(...)` calls with `_fmt_pct(...)`:

```diff
+def _bp_fmt(x: Optional[Tuple[int, int, float]]) -> str:
+    if x is None:
+        return "n/a"
+    return f"{x[2]*100:5.2f}%"
+
 for base, opt, ref in rows:
     opt_bp = opt["bp"] if opt else None  # type: ignore[index]
     ref_bp = ref["bp"] if ref else None  # type: ignore[index]
-    def _bp_fmt(x: Optional[Tuple[int, int, float]]) -> str:
-        if x is None:
-            return "n/a"
-        return f"{x[2]*100:5.2f}%"
     ...
-    def _pct(x: Optional[float]) -> str:
-        if x is None:
-            return "n/a"
-        return f"{x*100:5.1f}%"
     print(
         ...
-        _pct(acc_exit),
-        _pct(ub_exit2),
+        _fmt_pct(acc_exit),
+        _fmt_pct(ub_exit2),
         ...
     )
```
```python
cur = con.cursor()
n = int(cur.execute("select count(*) from TAGEMISSTRACE;").fetchone()[0])

actual_acc = None
provider_acc = None
base_acc = None
if "predEnc" in cols:
    actual_acc = float(
        cur.execute(
            "select 1.0*sum(case when predEnc=realEnc then 1 else 0 end)/count(*) "
            "from TAGEMISSTRACE;"
        ).fetchone()[0]
    )
if "predSource" in cols:
    v = cur.execute(
        "select case when count(*)=0 then null else "
        "1.0*sum(case when predEnc=realEnc then 1 else 0 end)/count(*) end "
        "from TAGEMISSTRACE where predSource=0;"
    ).fetchone()[0]
    provider_acc = (None if v is None else float(v))
    v = cur.execute(
        "select case when count(*)=0 then null else "
        "1.0*sum(case when predEnc=realEnc then 1 else 0 end)/count(*) end "
        "from TAGEMISSTRACE where predSource=2;"
    ).fetchone()[0]
    base_acc = (None if v is None else float(v))

# UB(startPC)
ub_startpc = float(
    cur.execute(
        """
        with per_label as (
            select startPC, realEnc, count(*) as c
            from TAGEMISSTRACE
            group by startPC, realEnc
        ),
        per_startpc as (
            select startPC, max(c) as mx
            from per_label
            group by startPC
        )
        select 1.0*sum(mx)/(select count(*) from TAGEMISSTRACE)
        from per_startpc;
        """
    ).fetchone()[0]
)
```
Missing `n == 0` guard — `float(None)` crash when TAGEMISSTRACE is empty.

Unlike `_dir_upperbounds` (line 223), this function lacks an early return when `n == 0`. If the table exists with the `realEnc` column but has no rows, the CTE produces `sum(mx)` → NULL divided by `count(*)` → 0, SQLite returns NULL, and `float(None)` raises `TypeError`.
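The failure mode reproduces in a few lines with an in-memory database. This is a standalone Python sketch, not the script's code: `ub_startpc` below is a simplified stand-in whose query is trivial, shown only to illustrate the guarded shape of the fix.

```python
import sqlite3

# sum() over an empty table yields SQL NULL, which arrives in Python as
# None, so an unguarded float(...) raises TypeError.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TAGEMISSTRACE (startPC INTEGER, realEnc INTEGER)")

v = con.execute("SELECT sum(realEnc) FROM TAGEMISSTRACE").fetchone()[0]
assert v is None  # float(v) would raise TypeError here

def ub_startpc(con):
    """Simplified stand-in showing the early-return guard."""
    n = con.execute("SELECT count(*) FROM TAGEMISSTRACE").fetchone()[0]
    if n == 0:  # guard before any float(...) conversion or aggregate query
        return None
    return float(con.execute(
        "SELECT 1.0 * count(*) / count(*) FROM TAGEMISSTRACE").fetchone()[0])

print(ub_startpc(con))  # prints None instead of crashing on an empty trace
```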
🐛 Proposed fix: add an early return guard for n == 0

```diff
 n = int(cur.execute("select count(*) from TAGEMISSTRACE;").fetchone()[0])
+if n == 0:
+    return UBRes(
+        n=0,
+        actual_acc=None,
+        provider_acc=None,
+        base_acc=None,
+        ub_startpc=None,
+        ub_startpc_hist=None,
+        ub_startpc_fullhist=None,
+    )
 actual_acc = None
```
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed!
use new bank-tage instead of old per-branch tage [WIP]
Summary by CodeRabbit

- New Features
- Refactor
- Bug Fixes
- Tests
- Documentation
- Chore