Ahead utage history perf by CJ362ff · Pull Request #715 · OpenXiangShan/GEM5

CJ362ff · 2026-01-15T08:17:57Z

Summary by CodeRabbit

New Features
- Added a MicroTAGE branch predictor option with folded-history prediction, bank-conflict handling, and extended prediction statistics.
- Configuration switches now allow enabling MicroTAGE per simulation profile.
Chores
- Integrated MicroTAGE into the build and predictor type selection.
- Updated branch-predictor wiring and configuration interfaces to support the new predictor.
Configuration
- Example configs updated: one profile enables MicroTAGE; another contains a commented placeholder for disabling it.


Change-Id: I8b6a4bda945564cc961251fc1297a6bacd8636cb

…cases

Change-Id: I44b6175022a3a593ed385407bd95ec0c40c74642

Change-Id: I16ce79a7d8488d9a138d0a26f5500576ae54132e

Change-Id: I8be037a1cdacd7151f4fe8e743a32cca90a85036

Change-Id: I6b9c92c18c574a12c19532ae0b894e64c1187342

Change-Id: I953965b8e6feb1c15a238ac832d65bc16b32f496

Change-Id: I39cd471d452aa343c3dd741a80fdfa7d126e3a9f

Change-Id: I3567ae93652aac218c5b4646003abadddaf7cf32

Change-Id: If7c9d3aa68a23c36dde74d8cf3a286c9c48f3e3c

Change-Id: I83277ae5c801e9d22b594286580459d12cdec69b

Change-Id: I394246af184d3f07e02b85f06e4e5ceed368ec22

Change-Id: I762c11f8d15262fb9f1c9d443f77895fa76bbc79

coderabbitai · 2026-01-15T08:18:16Z

📝 Walkthrough

Walkthrough

Adds a new standalone MicroTAGE branch predictor (header + implementation), integrates it into build/config and DecoupledBPUWithBTB wiring, expands predictor parameters in BranchPredictor, and toggles microtage usage in example configs.

Changes

Cohort / File(s)	Summary
Configuration Updates `configs/example/idealkmhv3.py`, `configs/example/kmhv3.py`	Example configs adjusted: one contains a commented placeholder for disabling microtage; the other enables `cpu.branchPred.microtage.enabled` when `bp_type` is `DecoupledBPUWithBTB`.
Branch Predictor Parameters `src/cpu/pred/BranchPredictor.py`	Replaced prior BTBTAGE-based placeholder with a full `MicroTAGE` Param type; expanded parameter set (histories, tables, tags, banks, latencies, etc.) and changed DecoupledBPUWithBTB microtage declaration to `Param.MicroTAGE(...)`.
Build / SimObject List `src/cpu/pred/SConscript`	Added `MicroTAGE` to BpType enum and added `btb/microtage.cc` to sources.
Decoupled BTB Integration Header `src/cpu/pred/btb/decoupled_bpred.hh`	Updated includes and member type for microtage from `BTBTAGE` to `MicroTAGE`.
MicroTAGE Implementation `src/cpu/pred/btb/microtage.hh`, `src/cpu/pred/btb/microtage.cc`	New MicroTAGE class implementing TAGE-style prediction: generation, folded-history indexing/tagging, updates/allocation, bank-conflict handling, LRU, extensive stats, and unit-test hooks.

Sequence Diagram(s)

sequenceDiagram
    participant Fetch as Fetch Unit
    participant MT as MicroTAGE
    participant Hist as History Mgmt
    participant Tables as TAGE Tables
    participant Meta as Meta Storage

    rect rgba(200,220,240,0.5)
    note over Fetch,Meta: Prediction Path
    Fetch->>MT: putPCHistory(startPC, history)
    MT->>Hist: update folded history
    MT->>Tables: generateSinglePrediction(startPC, foldedHist)
    Tables-->>MT: index/tag matches & candidate
    MT->>Meta: store prediction snapshot (TageMeta)
    MT-->>Fetch: return FullBTBPrediction
    end

    rect rgba(220,240,200,0.5)
    note over Fetch,Tables: Update Path
    Fetch->>MT: update(FetchStream)
    MT->>Hist: compute actual outcome & spec/history updates
    MT->>Tables: check allocation & update counters
    alt allocate new entry
        MT->>Tables: handleNewEntryAllocation(...)
        Tables-->>MT: allocated index/way
    end
    MT->>Hist: doUpdateHist(resolve history)
    MT->>Meta: finalize prediction state
    MT-->>Fetch: complete update
    end
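The prediction path above hashes a long global history into a short index or tag through folded histories. The sketch below illustrates the general technique, a circular-shift-register fold of the kind used in TAGE-style predictors. It is not MicroTAGE's own folded-history class, whose update() takes different arguments. It maintains the folded value incrementally and checks it against a direct XOR fold. The names FoldedHistory, foldDirect and checkFolding are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical folded (compressed) global history register.
// histLen history bits are XOR-folded into foldedLen bits; assumes 0 < foldedLen < 64.
class FoldedHistory
{
  public:
    FoldedHistory(unsigned histLen, unsigned foldedLen)
        : histLen_(histLen), foldedLen_(foldedLen),
          mask_((uint64_t(1) << foldedLen) - 1)
    {}

    // newBit enters the history; outBit is the bit leaving the histLen-bit
    // window (the oldest bit before this update, or 0 if the window is not full).
    void update(bool newBit, bool outBit)
    {
        folded_ = (folded_ << 1) | uint64_t(newBit);             // age every bit by one
        folded_ ^= uint64_t(outBit) << (histLen_ % foldedLen_);  // cancel the evicted bit
        folded_ ^= folded_ >> foldedLen_;                        // rotate top bit into bit 0
        folded_ &= mask_;
    }

    uint64_t get() const { return folded_; }

  private:
    unsigned histLen_;
    unsigned foldedLen_;
    uint64_t mask_;
    uint64_t folded_ = 0;
};

// Reference fold: history bit i (0 = newest) is XORed into bit i % foldedLen.
uint64_t foldDirect(const std::deque<bool> &hist, unsigned histLen, unsigned foldedLen)
{
    uint64_t f = 0;
    for (unsigned i = 0; i < histLen && i < hist.size(); ++i) {
        if (hist[i]) {
            f ^= uint64_t(1) << (i % foldedLen);
        }
    }
    return f;
}

// Drive both versions with a pseudo-random outcome stream and compare.
bool checkFolding(unsigned histLen, unsigned foldedLen, int steps)
{
    std::deque<bool> hist;  // hist[0] is the newest outcome
    FoldedHistory fh(histLen, foldedLen);
    uint32_t x = 12345;
    for (int s = 0; s < steps; ++s) {
        x = x * 1103515245u + 12345u;  // simple LCG
        bool taken = (x >> 16) & 1u;
        bool out = hist.size() >= histLen ? hist[histLen - 1] : false;
        fh.update(taken, out);
        hist.push_front(taken);
        if (hist.size() > histLen) {
            hist.pop_back();
        }
        if (fh.get() != foldDirect(hist, histLen, foldedLen)) {
            return false;
        }
    }
    return true;
}
```

The incremental form needs only the incoming bit and the evicted bit per update. This is why hardware keeps folded histories instead of re-folding the full history on every prediction.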

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Tage1 align #615 — adds bank-aware TAGE support (numBanks, enableBankConflict) overlapping MicroTAGE bank logic and params.
cpu-o3: fix bugs https://github.com/OpenXiangShan/GEM5/pull/595 #613 — touches MicroTAGE integration and folded-history checks used by the new MicroTAGE wiring.
Ubtb multihit and utage counter #625 — modifies microTAGE/uBTB code paths and misprediction/duplicate-hit handling that intersect with this change.

Suggested labels

perf, align-kmhv3

Suggested reviewers

jensen-yan
Yakkhini

Poem

🐇 I hopped through headers, code, and lore,
Tables and tags now hum and roar.
Folded histories tucked so neat,
Banks and LRU tap their feet.
Hoppity-hop—predictions fleet.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 56.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Ahead utage history perf' is vague and does not clearly convey the actual changes; the PR implements a new MicroTAGE predictor with extensive refactoring but the title provides minimal meaningful context.	Consider revising the title to be more descriptive, such as 'Implement MicroTAGE branch predictor with lookahead history' or similar, to better reflect the substantial changes being made.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@src/cpu/pred/btb/microtage.cc`:
- Around line 905-920: In MicroTAGE::recoverPHist guard uses of
aheadindexFoldedHist.front() and aheadtagFoldedHist.front() by checking each
queue's emptiness before calling .front(): for each type, if the corresponding
queue (aheadindexFoldedHist or aheadtagFoldedHist) is empty, skip the inner
recovery loop (or otherwise handle the empty case) so you don't call .front() on
an empty container; then, when non-empty, retrieve foldedHistQueuefront =
...front() and call foldedHistQueuefront[i].recover(predMeta->indexFoldedHist[i]
or predMeta->tagFoldedHist[i]) as currently done. Ensure both queues are checked
and that predMeta->indexFoldedHist / predMeta->tagFoldedHist lengths match
numPredictors.
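Below is a minimal sketch of the guard described above. It uses std::queue and a stub history type in place of MicroTAGE's folded-history class. FoldedHistStub, HistVec and recoverAheadFront are hypothetical names. Binding front() by reference matters: a plain `auto` copy would restore a temporary instead of the queued snapshot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-in for a folded-history register; only recover() matters here.
struct FoldedHistStub {
    uint64_t value = 0;
    void recover(const FoldedHistStub &snapshot) { value = snapshot.value; }
};

using HistVec = std::vector<FoldedHistStub>;

// Restore the oldest queued ahead-history entry from the prediction-time
// snapshot, but only if the queue is non-empty. Returns false on an empty
// queue so the caller can decide how to handle that case.
bool recoverAheadFront(std::queue<HistVec> &aheadQueue, const HistVec &snapshot,
                       std::size_t numPredictors)
{
    if (aheadQueue.empty()) {
        return false;  // front() on an empty std::queue is undefined behavior
    }
    assert(snapshot.size() >= numPredictors);
    HistVec &front = aheadQueue.front();  // reference: update the queued element itself
    assert(front.size() >= numPredictors);
    for (std::size_t i = 0; i < numPredictors; ++i) {
        front[i].recover(snapshot[i]);
    }
    return true;
}
```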

🧹 Nitpick comments (3)

src/cpu/pred/btb/microtage.cc (3)
1022-1054: Consider removing unused LRU functions.

These LRU helper functions (updateLRU, getLRUVictim) appear to be unused - comments in generateSinglePrediction (line 226) and the allocation logic both indicate LRU is not used. If these are truly dead code, consider removing them to reduce maintenance burden.

724-751: Consider adding bounds check for shift amounts.

If tableTagBits[t] or tableIndexBits[t] equals or exceeds 64, the shift operations at lines 728 and 744 would invoke undefined behavior. While typical TAGE configurations use values well under 64, adding a defensive check or static_assert would prevent subtle bugs with unusual configurations.
♻️ Optional defensive check
 Addr
 MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position)
 {
+    assert(tableTagBits[t] < 64 && "Tag bits must be < 64 to avoid undefined shift");
     // Create mask for tableTagBits[t] to limit result size
     Addr mask = (1ULL << tableTagBits[t]) - 1;
     ...
 }

 Addr
 MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist)
 {
+    assert(tableIndexBits[t] < 64 && "Index bits must be < 64 to avoid undefined shift");
     // Create mask for tableIndexBits[t] to limit result size
     Addr mask = (1ULL << tableIndexBits[t]) - 1;
     ...
 }
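A plain assert documents the limit, but it is compiled out when NDEBUG is defined. The following is a minimal sketch of a mask helper that is well defined for every width. It assumes Addr is a 64-bit unsigned integer, as gem5 defines it. The helper name lowMask is hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

using Addr = uint64_t;  // assumption: gem5's Addr is a 64-bit unsigned type

// Mask with the low `bits` bits set, without ever shifting by >= the width of Addr.
constexpr Addr lowMask(unsigned bits)
{
    constexpr unsigned addrBits = sizeof(Addr) * CHAR_BIT;
    return bits >= addrBits ? ~Addr(0) : (Addr(1) << bits) - 1;
}

static_assert(lowMask(0) == 0, "zero-width mask");
static_assert(lowMask(8) == 0xff, "8-bit mask");
static_assert(lowMask(64) == ~Addr(0), "full-width mask without UB");
```

With a helper like this, getTageTag and getTageIndex could compute `Addr mask = lowMask(tableTagBits[t]);` and `Addr mask = lowMask(tableIndexBits[t]);` respectively.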
858-871: Minor: Consider using unsigned for loop variables.

Loop variables at lines 861, 914, and 932 use int while numPredictors is unsigned. While safe in practice, using unsigned would be more consistent and avoid potential compiler warnings.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 081db5f and b9efab2.

📒 Files selected for processing (7)

configs/example/idealkmhv3.py
configs/example/kmhv3.py
src/cpu/pred/BranchPredictor.py
src/cpu/pred/SConscript
src/cpu/pred/btb/decoupled_bpred.hh
src/cpu/pred/btb/microtage.cc
src/cpu/pred/btb/microtage.hh

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2026-01-12T02:49:12.886Z

Learnt from: jensen-yan
Repo: OpenXiangShan/GEM5 PR: 686
File: src/cpu/pred/btb/btb_mgsc.cc:226-230
Timestamp: 2026-01-12T02:49:12.886Z
Learning: In src/cpu/pred/btb/btb_mgsc.cc, the calculateScaledPercsum function returns the unscaled percsum (weight scaling disabled) to align with RTL implementation. The RTL currently does not have a weight table, so setting weight to 1 (equivalent to disabling the weight table) maintains hardware/software consistency with minimal performance impact.

Applied to files:

src/cpu/pred/btb/decoupled_bpred.hh

🧬 Code graph analysis (2)

configs/example/idealkmhv3.py (2)

- src/cpu/o3/fetch.hh (5): branchPred (484-484), branchPred (955-955), branchPred (957-957), branchPred (959-959), branchPred (961-961)
- src/cpu/pred/btb/timed_base_pred.hh (1): enabled (108-108)

src/cpu/pred/btb/microtage.cc (3)

- src/cpu/pred/btb/microtage.hh (36): MicroTAGE (49-49), MicroTAGE (113-113), MicroTAGE (115-115), entry (88-90), entry (152-152), entry (153-153), entry (154-154), entry (405-408), TageTableInfo (85-85), TageTableInfo (86-87), taken (267-267), TagePrediction (104-104), TagePrediction (105-108), pred (365-365), history (130-130), history (134-135), history (139-142), history (139-139), history (144-148), history (144-145), history (163-163), history (199-199), history (377-377), TageEntry (66-66), TageEntry (68-69), table (422-422), table (423-423), counter (70-72), max (270-270), pc (173-173), pc (176-176), pc (180-180), pc (184-184), pc (187-189), pc (187-187), pc (196-196)
- src/cpu/pred/BranchPredictor.py (2): MicroTAGE (1066-1092), TAGE (169-174)
- src/mem/ruby/common/Histogram.hh (1): hist (51-51)

🪛 Cppcheck (2.19.0)

src/cpu/pred/btb/microtage.cc

[error] 119-119: Reference to temporary returned. (returnTempReference)
[error] 132-132: Reference to temporary returned. (returnTempReference)
[error] 125-125: Shifting 64-bit value by 64 bits is undefined behaviour (shiftTooManyBits)
[error] 138-138: Shifting 64-bit value by 64 bits is undefined behaviour (shiftTooManyBits)

🪛 Ruff (0.14.11)

src/cpu/pred/BranchPredictor.py

Param may be undefined, or defined from star imports (F405): lines 1072, 1073, 1074, 1076, 1080, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1182
VectorParam may be undefined, or defined from star imports (F405): lines 1077, 1078, 1079, 1082

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Quick Build, Unit Tests & Smoke Test
GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.8c)

🔇 Additional comments (19)

configs/example/idealkmhv3.py (1)

82-82: LGTM!

Disabling the microtage predictor in the ideal configuration is consistent with the paired configuration pattern (kmhv3.py enables it). The enabled parameter is inherited from TimedBaseBTBPredictor.

configs/example/kmhv3.py (1)

94-109: LGTM!

The microtage predictor is correctly enabled alongside other predictor components (ubtb, abtb, mbtb, tage, ittage, ras) for the DecoupledBPUWithBTB configuration. The placement follows the existing pattern.

src/cpu/pred/btb/decoupled_bpred.hh (2)

26-27: LGTM!

The new microtage.hh include is correctly added for the standalone MicroTAGE class.

107-107: LGTM!

The member type change from BTBTAGE* to MicroTAGE* correctly reflects the new standalone MicroTAGE class introduced in this PR.

src/cpu/pred/SConscript (2)

52-53: LGTM!

MicroTAGE is correctly registered as a SimObject alongside other BTB-based predictors (BTBTAGE, BTBITTAGE, BTBMGSC).

105-105: LGTM!

The new btb/microtage.cc source file is correctly added to the build in the appropriate location with other BTB predictor sources.

src/cpu/pred/BranchPredictor.py (2)

1182-1182: LGTM!

The microtage parameter is correctly wired using Param.MicroTAGE(MicroTAGE(), ...), consistent with the new standalone MicroTAGE class.

1066-1092: LGTM! Well-structured MicroTAGE predictor definition.

The MicroTAGE class is correctly defined as a smaller, faster variant of BTBTAGE with proper parameter consistency validated in the C++ constructor:

numPredictors = 1 with matching single-element vectors (tableSizes, TTagBitSizes, TTagPcShifts, histLengths)

C++ constructor enforces vector length validation via assertions (lines 109, 118, 120, 123 in microtage.cc)

numDelay = 0 for low-latency prediction alongside uBTB

enableBankConflict = False is appropriate for a single-table predictor (the bank-conflict simulation logic is only active when enabled)

src/cpu/pred/btb/microtage.hh (4)

1-26: LGTM - Include guards and conditional includes are well-structured.

The header uses proper include guards and clearly separates UNIT_TEST from production dependencies.

56-109: LGTM - Well-documented nested data structures for TAGE prediction.

The structs TageEntry, TageTableInfo, and TagePrediction are clearly documented and properly encapsulate the TAGE prediction state. The counter range (-4 to 3) noted in the comment is appropriately enforced by saturation logic in the implementation.
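For reference, here is a sketch of a saturating counter in the -4 to 3 range. The struct name, the "taken when >= 0" convention, and the definition of the weak states are assumptions for illustration. They are not taken from microtage.cc.

```cpp
#include <cassert>

// Hypothetical 3-bit signed prediction counter saturating in [-4, 3].
struct SatCounter3 {
    static constexpr int kMin = -4;
    static constexpr int kMax = 3;
    int value = 0;

    // Move one step toward the observed outcome, saturating at the ends.
    void update(bool taken)
    {
        if (taken) {
            if (value < kMax) ++value;
        } else {
            if (value > kMin) --value;
        }
    }

    bool predictTaken() const { return value >= 0; }        // assumed sign convention
    bool isWeak() const { return value == 0 || value == -1; } // the two states nearest the threshold
};
```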

298-300: Verify queue safety in usage sites.

These queues store ahead-of-time folded history snapshots. Ensure all .front() calls in the implementation are guarded by .empty() checks to avoid undefined behavior.

383-426: LGTM - TageMeta and class members are appropriately structured.

The metadata structure properly captures folded history snapshots for update-time index/tag recalculation.

src/cpu/pred/btb/microtage.cc (7)
36-135: LGTM - Constructor initializes predictor tables and folded histories correctly.

Both UNIT_TEST and production constructors properly initialize the TAGE tables, index/tag masks, and folded history structures. The assertions ensure parameter vectors have sufficient entries.

192-256: LGTM - TAGE table lookup logic is correct.

The search from highest to lowest table, position encoding in tags, and single-match enforcement align with standard TAGE predictor design.
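Below is a minimal sketch of the longest-history-first provider search. It assumes one entry per index (no ways) and tags that already include the position bits, as getTageTag does by XORing in `position`. With numPredictors = 1 in the MicroTAGE config, the loop reduces to a single lookup. Entry, Provider and findProvider are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical entry layout; the real TageEntry carries more state.
struct Entry {
    bool valid = false;
    uint64_t tag = 0;
    int counter = 0;  // signed; >= 0 predicts taken (assumption)
};

struct Provider {
    int table;    // index of the matching table
    int counter;  // its prediction counter
};

// tables[t][idx[t]] is the entry read from table t. Tables are ordered from
// shortest (t = 0) to longest history, so the highest-numbered hit wins.
std::optional<Provider> findProvider(const std::vector<std::vector<Entry>> &tables,
                                     const std::vector<uint64_t> &idx,
                                     const std::vector<uint64_t> &tag)
{
    for (int t = static_cast<int>(tables.size()) - 1; t >= 0; --t) {
        const Entry &e = tables[t][idx[t]];
        if (e.valid && e.tag == tag[t]) {
            return Provider{t, e.counter};
        }
    }
    return std::nullopt;  // no tag hit: the caller falls back to another predictor
}
```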

848-871: Potential logic issue: member variables may not be updated as intended.

The second loop creates a copy of the folded history vectors:
auto foldedHist = type == 0 ? indexFoldedHist : tagFoldedHist; // copy, not reference
Updates at line 862 modify this local copy, and while the copy is pushed to the queue, the actual member variables (indexFoldedHist, tagFoldedHist) are never updated in the second loop.

If the intent is to update ahead-of-time history in the queue while keeping current history unchanged until consumed from the queue (in the first loop), this is correct. However, the first loop has an early break if the queue is empty, which means on the first invocation, member variables won't be updated at all.

Please verify this is the intended behavior for the ahead-history optimization.
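The distinction the review points at can be reproduced in isolation. When both operands of `?:` are lvalues of the same type, `auto` deduces a value type and copies, while `auto &` binds to the selected member. The Owner and Hist types below are hypothetical stand-ins. Whether the copy is intended for the ahead-history snapshot remains the author's call.

```cpp
#include <cassert>
#include <vector>

struct Hist { int v = 0; };

struct Owner {
    std::vector<Hist> indexFoldedHist = std::vector<Hist>(1);
    std::vector<Hist> tagFoldedHist = std::vector<Hist>(1);

    void bumpCopy(int type)
    {
        // `auto` deduces std::vector<Hist>: this is a copy of the member.
        auto h = type == 0 ? indexFoldedHist : tagFoldedHist;
        h[0].v += 1;  // changes the local copy only
    }

    void bumpRef(int type)
    {
        // `auto &` binds to whichever member the condition selects.
        auto &h = type == 0 ? indexFoldedHist : tagFoldedHist;
        h[0].v += 1;  // changes the member itself
    }
};
```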

608-679: LGTM - Update logic correctly handles prediction state updates and allocation.

The update path properly:

Re-reads providers when updateOnRead is enabled (for bank conflict simulation)

Falls back to cached predictions otherwise

Triggers allocation only on mispredictions

487-559: LGTM - Allocation logic follows established TAGE replacement policy.

The allocation strategy (prefer invalid/weak ways, apply age penalty, periodic useful-bit reset) aligns with CBP-style TAGE predictors.
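Below is a simplified sketch of a victim preference order: an invalid way first, then a weak way that is not useful, then any way that is not useful. The age penalty and periodic useful-bit reset mentioned above are not modeled. The Way fields and the pickVictim name are hypothetical and do not describe the exact policy in microtage.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical per-way state.
struct Way {
    bool valid = false;
    bool useful = false;
    int counter = 0;  // signed counter in [-4, 3]
};

bool isWeakCounter(int counter) { return counter == 0 || counter == -1; }

// Returns the way to allocate into, or nullopt if every way is valid and useful.
std::optional<std::size_t> pickVictim(const std::vector<Way> &ways)
{
    for (std::size_t w = 0; w < ways.size(); ++w)
        if (!ways[w].valid) return w;  // free slot
    for (std::size_t w = 0; w < ways.size(); ++w)
        if (!ways[w].useful && isWeakCounter(ways[w].counter)) return w;  // cheap to evict
    for (std::size_t w = 0; w < ways.size(); ++w)
        if (!ways[w].useful) return w;  // any not-useful entry
    return std::nullopt;
}
```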

783-806: LGTM - Defensive handling of edge cases in position calculation.

The function appropriately warns about unusual cases (branch preceding block, exceeding max positions) while ensuring a valid position is always returned.
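Below is a sketch of a defensive position calculation of this kind. It assumes positions count 2-byte instruction slots from the block start and that out-of-range cases are clamped. The function name, the default shift amount, and the clamping choices are assumptions, not the code at lines 783-806.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

using Addr = uint64_t;  // assumption: 64-bit address type

// Hypothetical: slot index of branchPC inside the block starting at blockStart.
unsigned branchPosition(Addr blockStart, Addr branchPC, unsigned maxPositions,
                        unsigned instShiftAmt = 1)
{
    assert(maxPositions > 0);
    if (branchPC < blockStart) {
        std::fprintf(stderr, "warn: branch 0x%llx precedes block start 0x%llx\n",
                     static_cast<unsigned long long>(branchPC),
                     static_cast<unsigned long long>(blockStart));
        return 0;  // clamp to the first slot
    }
    Addr pos = (branchPC - blockStart) >> instShiftAmt;
    if (pos >= maxPositions) {
        std::fprintf(stderr, "warn: position %llu exceeds max %u, clamping\n",
                     static_cast<unsigned long long>(pos), maxPositions);
        return maxPositions - 1;  // clamp to the last valid slot
    }
    return static_cast<unsigned>(pos);
}
```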

1056-1093: LGTM - Commit-time statistics tracking is correctly implemented.


src/cpu/pred/btb/microtage.cc

XiangShanRobot · 2026-01-15T09:22:44Z

[Generated by GEM5 Performance Robot]
commit: b9efab2
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

	PR	Master	Diff(%)
Score	19.78	20.37	-2.89 🔴

	PR	Previous Commit	Diff(%)
Score	19.78	19.89	-0.52 🔴

Change-Id: Ia9bcdc028235447e254889d95e5ea98e7f067664

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@src/cpu/pred/btb/microtage.cc`:
- Around line 858-869: The assert on foldedHistqueue.size() is executed
immediately after foldedHistqueue.push(), which can trigger when the queue was
already at capacity; change the logic in the loop that handles foldedHistqueue
(the block that calls foldedHistqueue.push(), assert(foldedHistqueue.size() <=
2), and foldedHistqueue.pop()) so the overflow is trimmed before asserting:
either call foldedHistqueue.pop() if size > 2 and then perform the assert, or
move the assert to after the pop. Update the sequence around
foldedHistqueue.push()/pop()/assert to ensure the queue is never asserted while
transiently over capacity (refer to foldedHistqueue, push, pop, and the assert).
- Around line 723-751: The mask computation in getTageTag and getTageIndex uses
(1ULL << tableTagBits[t]) and (1ULL << tableIndexBits[t]) which is UB when the
shift equals the word width; replace these with a safe conditional that computes
mask = (bits >= sizeof(Addr)*8 ? ~static_cast<Addr>(0) : ((static_cast<Addr>(1)
<< bits) - 1)), where bits is tableTagBits[t] or tableIndexBits[t] respectively,
so getTageTag and getTageIndex produce correct masks without out-of-range
shifts.

♻️ Duplicate comments (1)

src/cpu/pred/btb/microtage.cc (1)

909-923: Empty-queue recovery returns without restoring history.

At Line 910, recoverPHist returns early when queues are empty, skipping both restoration from predMeta and the doUpdateHist update. This can leave folded histories stale after early mispredicts. Consider falling back to the snapshot and still updating.

🐛 Proposed fix

     std::shared_ptr<TageMeta> predMeta = std::static_pointer_cast<TageMeta>(entry.predMetas[getComponentIdx()]);
     if (aheadindexFoldedHist.empty() || aheadtagFoldedHist.empty()) {
-        DPRINTF(TAGE, "recoverPHist: ahead folded history queues are empty, cannot recover\n");
-        return;
+        DPRINTF(TAGE, "recoverPHist: ahead folded history queues are empty, recovering from snapshot\n");
+        for (int i = 0; i < numPredictors; i++) {
+            indexFoldedHist[i].recover(predMeta->indexFoldedHist[i]);
+            tagFoldedHist[i].recover(predMeta->tagFoldedHist[i]);
+        }
+        doUpdateHist(history, cond_taken, entry.getControlPC(), entry.getTakenTarget());
+        return;
     }

coderabbitai · 2026-01-23T03:03:13Z

src/cpu/pred/btb/microtage.cc

+// Calculate TAGE tag with folded history - optimized version using bitwise operations
+Addr
+MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position)
+{
+    // Create mask for tableTagBits[t] to limit result size
+    Addr mask = (1ULL << tableTagBits[t]) - 1;
+
+    // Extract lower bits of PC directly (remove instruction alignment bits)
+    Addr pcBits = (pc >> bankBaseShift) & mask;
+
+    // Extract and prepare folded history bits
+    Addr foldedBits = foldedHist & mask;
+
+    // XOR all components together, including position (like RTL)
+    return pcBits ^ foldedBits ^ position;
+}
+
+Addr
+MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist)
+{
+    // Create mask for tableIndexBits[t] to limit result size
+    Addr mask = (1ULL << tableIndexBits[t]) - 1;
+
+    const unsigned pcShift = enableBankConflict ? indexShift : bankBaseShift;
+    Addr pcBits = (pc >> pcShift) & mask;
+    Addr foldedBits = foldedHist & mask;
+
+    return pcBits ^ foldedBits;
+}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed (repository: OpenXiangShan/GEM5):

# First, find the file and check its size
wc -l src/cpu/pred/btb/microtage.cc

# Look for Addr typedef definition
rg "typedef.*Addr|using.*Addr" --type cc --type h -B2 -A2

# Examine the getTageTag and getTageIndex functions at the flagged lines
sed -n '723,751p' src/cpu/pred/btb/microtage.cc

# Search for tableTagBits and tableIndexBits definition and usage
rg "tableTagBits|tableIndexBits" --type cc --type h -B3 -A3 | head -80

# Search for Addr typedef without file type restriction
rg "typedef.*Addr|using.*Addr" -B2 -A2 | head -40

# Find header files in the pred/btb directory
find . -path "*pred/btb*" -name "*.hh" -o -path "*pred/btb*" -name "*.h"

# Search for MicroTAGE class definition
rg "class MicroTAGE" -B2 -A10

# Look for tableTagBits and tableIndexBits declarations
rg "tableTagBits\[|tableIndexBits\[" -B3 -A3

# Find Addr type definition
rg "typedef.*Addr|using.*Addr" | grep -v "AddressMap\|AddressIndex\|Addr:" | head -20

# Look for Addr in common headers
find . -path "*/sim/*" -name "*.hh" -o -path "*/base/*" -name "*.hh" | xargs rg "typedef.*Addr[^a-zA-Z]" 2>/dev/null | head -10

# Search for ceilLog2 function
rg "ceilLog2" -B5 -A5 | head -60

# Check the BranchPredictor.py for MicroTAGE parameter ranges
sed -n '/class MicroTAGE/,/^class /p' src/cpu/pred/BranchPredictor.py | head -100

# Find ceilLog2 definition
rg "ceilLog2\(" -B3 -A10 --max-count=5

# Check microtage.hh header for member variable types
sed -n '1,150p' src/cpu/pred/btb/microtage.hh | grep -A5 -B5 "tableTagBits\|tableIndexBits"

# Look at full constructor to see initialization constraints
sed -n '/MicroTAGE::MicroTAGE/,/^}/p' src/cpu/pred/btb/microtage.cc | head -100

# Check for any bounds checking on tableTagBits or tableIndexBits values
rg "tableTagBits|tableIndexBits" src/cpu/pred/btb/microtage.cc -B2 -A2 | grep -E "assert|if.*<|if.*>|warning|fatal"

# Look at the microtage.hh header to see member variable declarations
cat src/cpu/pred/btb/microtage.hh | head -200

# Check actual table size limits from Python config
grep -A30 "class MicroTAGE" src/cpu/pred/BranchPredictor.py | grep -E "tableSizes|TTagBitSizes|blockSize"

# Check for any parameter validation or clamping on tableTagBits or tableIndexBits
rg "TTagBitSizes|tableTagBits|tableIndexBits" src/cpu/pred/ -B5 -A5 | grep -E "assert|clamp|min|max|fatal|if.*<|if.*>" | head -30

# Look at the intmath implementation to confirm ceilLog2 behavior
sed -n '/^ceilLog2/,/^}/p' src/base/intmath.hh

Guard mask generation against 64-bit shifts.

On lines 728 and 744, (1ULL << tableTagBits[t]) and (1ULL << tableIndexBits[t]) have undefined behavior when the shift amount is 64 or more (the bit width of Addr). The default parameters stay well below this limit, but a configuration with a tag or index width of 64 bits or more would make the shift out of range. Implement safe masking.

🐛 Proposed fix

 Addr
 MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position)
 {
     // Create mask for tableTagBits[t] to limit result size
-    Addr mask = (1ULL << tableTagBits[t]) - 1;
+    const unsigned bits = tableTagBits[t];
+    const unsigned addrBits = sizeof(Addr) * 8;
+    Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1);
@@
 Addr
 MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist)
 {
     // Create mask for tableIndexBits[t] to limit result size
-    Addr mask = (1ULL << tableIndexBits[t]) - 1;
+    const unsigned bits = tableIndexBits[t];
+    const unsigned addrBits = sizeof(Addr) * 8;
+    Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1);

📝 Committable suggestion


Suggested change

// Calculate TAGE tag with folded history - optimized version using bitwise operations
Addr
MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position)
{
    // Create mask for tableTagBits[t] to limit result size
    const unsigned bits = tableTagBits[t];
    const unsigned addrBits = sizeof(Addr) * 8;
    Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1);

    // Extract lower bits of PC directly (remove instruction alignment bits)
    Addr pcBits = (pc >> bankBaseShift) & mask;

    // Extract and prepare folded history bits
    Addr foldedBits = foldedHist & mask;

    // XOR all components together, including position (like RTL)
    return pcBits ^ foldedBits ^ position;
}

Addr
MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist)
{
    // Create mask for tableIndexBits[t] to limit result size
    const unsigned bits = tableIndexBits[t];
    const unsigned addrBits = sizeof(Addr) * 8;
    Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1);

    const unsigned pcShift = enableBankConflict ? indexShift : bankBaseShift;
    Addr pcBits = (pc >> pcShift) & mask;
    Addr foldedBits = foldedHist & mask;

    return pcBits ^ foldedBits;
}

coderabbitai · 2026-01-23T03:03:13Z

src/cpu/pred/btb/microtage.cc

+    for (int type = 0; type < 2; type++) {
+        auto foldedHist = type == 0 ? indexFoldedHist : tagFoldedHist;
+        auto &foldedHistqueue = type == 0 ? aheadindexFoldedHist : aheadtagFoldedHist;
+        for (int t = 0; t < numPredictors; t++) {
+            foldedHist[t].update(history, 2, taken, pc, target);
+            DPRINTF(TAGEHistory, "t: %d, type: %d, foldedHist _folded 0x%lx\n", t, type, foldedHist[t].get());
+        }
+        foldedHistqueue.push(foldedHist);
+        assert(foldedHistqueue.size() <= 2);
+        if (foldedHistqueue.size() > 2) {
+            foldedHistqueue.pop();
+        }


⚠️ Potential issue | 🟠 Major

Queue-size assert fires before overflow is trimmed.

Line 866 asserts size <= 2 immediately after push. If the queue is already size 2, this will trip before the pop executes. Move the assert after the pop (or pop first).
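One way to make the invariant hold by construction is to trim before asserting, as in this small generic helper. The name pushBounded is hypothetical. Using `while` instead of `if` also tolerates a queue that was already over depth. When the invariant already holds, the two behave the same.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>

// Push a snapshot, then drop the oldest entries until at most `depth` remain.
template <typename T>
void pushBounded(std::queue<T> &q, T value, std::size_t depth)
{
    q.push(std::move(value));
    while (q.size() > depth) {
        q.pop();  // discard the oldest snapshot
    }
    assert(q.size() <= depth);  // checked only after trimming
}
```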

🐛 Proposed fix

         for (int t = 0; t < numPredictors; t++) {
             foldedHist[t].update(history, 2, taken, pc, target);
             DPRINTF(TAGEHistory, "t: %d, type: %d, foldedHist _folded 0x%lx\n", t, type, foldedHist[t].get());
         }
         foldedHistqueue.push(foldedHist);
-        assert(foldedHistqueue.size() <= 2);
         if (foldedHistqueue.size() > 2) {
             foldedHistqueue.pop();
         }
+        assert(foldedHistqueue.size() <= 2);

📝 Committable suggestion


Suggested change

    for (int type = 0; type < 2; type++) {
        auto foldedHist = type == 0 ? indexFoldedHist : tagFoldedHist;
        auto &foldedHistqueue = type == 0 ? aheadindexFoldedHist : aheadtagFoldedHist;
        for (int t = 0; t < numPredictors; t++) {
            foldedHist[t].update(history, 2, taken, pc, target);
            DPRINTF(TAGEHistory, "t: %d, type: %d, foldedHist _folded 0x%lx\n", t, type, foldedHist[t].get());
        }
        foldedHistqueue.push(foldedHist);
        if (foldedHistqueue.size() > 2) {
            foldedHistqueue.pop();
        }
        assert(foldedHistqueue.size() <= 2);

🤖 Prompt for AI Agents

In `@src/cpu/pred/btb/microtage.cc` around lines 858 - 869, The assert on foldedHistqueue.size() is executed immediately after foldedHistqueue.push(), which can trigger when the queue was already at capacity; change the logic in the loop that handles foldedHistqueue (the block that calls foldedHistqueue.push(), assert(foldedHistqueue.size() <= 2), and foldedHistqueue.pop()) so the overflow is trimmed before asserting: either call foldedHistqueue.pop() if size > 2 and then perform the assert, or move the assert to after the pop. Update the sequence around foldedHistqueue.push()/pop()/assert to ensure the queue is never asserted while transiently over capacity (refer to foldedHistqueue, push, pop, and the assert).
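The ordering the review asks for can be shown in isolation. The sketch below is minimal and assumes a plain `std::queue` of snapshots with a fixed depth of 2, mirroring the `size() <= 2` bound in the diff. `AheadHistoryQueue`, `Snapshot`, and `kDepth` are hypothetical names, not code from this PR. Because the queue is trimmed before the assert runs, the invariant is checked only once the queue is back within bounds, so a steady-state push at capacity no longer aborts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-in for one set of per-table folded histories.
using Snapshot = std::vector<uint64_t>;

// Keeps the most recent kDepth snapshots, oldest at the front.
class AheadHistoryQueue
{
  public:
    static constexpr std::size_t kDepth = 2;

    void push(const Snapshot &snap)
    {
        q.push(snap);
        // Trim first: a push at capacity makes the queue transiently
        // kDepth + 1 long, which is expected rather than an error.
        if (q.size() > kDepth)
            q.pop();
        // Only now check the steady-state invariant.
        assert(q.size() <= kDepth);
    }

    // Oldest retained snapshot: the history from kDepth pushes ago
    // once the queue is full.
    const Snapshot &oldest() const { return q.front(); }
    std::size_t size() const { return q.size(); }

  private:
    std::queue<Snapshot> q;
};
```

Each push adds exactly one element, so a single `if`/`pop` is enough to restore the bound, just as in the suggested change.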


XiangShanRobot · 2026-01-23T04:25:54Z

[Generated by GEM5 Performance Robot]
commit: ed95dbf
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| Metric | PR | Master | Diff(%) |
|---|---|---|---|
| Score | 19.77 | 20.29 | -2.56 🔴 |

[Generated by GEM5 Performance Robot]
commit: ed95dbf
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

| Metric | PR | Previous Commit | Diff(%) |
|---|---|---|---|
| Score | 19.77 | 19.78 | -0.04 🔴 |
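The robot does not state how Diff(%) is computed. Assuming it is the relative change of the PR score against the baseline score, the master comparison can be checked from the displayed values:

```latex
\mathrm{Diff} = \frac{S_{\mathrm{PR}} - S_{\mathrm{base}}}{S_{\mathrm{base}}} \times 100,
\qquad
\frac{19.77 - 20.29}{20.29} \times 100 = \frac{-0.52}{20.29} \times 100 \approx -2.56
```

Applying the same formula to the rounded previous-commit scores gives (19.77 - 19.78) / 19.78 × 100 ≈ -0.05, slightly off the reported -0.04. This suggests the robot computes Diff from unrounded scores.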

Cao Jiaming added 14 commits January 6, 2026 16:02

cpu-o3: split microtage out of tage

8ada7dd

cpu-o3: reorganize includes in microtage files for clarity

5cfbb4f

Change-Id: I8b6a4bda945564cc961251fc1297a6bacd8636cb

cpu-o3: enhance branch index calculation in MicroTAGE to handle edge cases

9469e8f

cpu-o3: update blockWidth calculation to handle zero blockSize

a089f57

Change-Id: I44b6175022a3a593ed385407bd95ec0c40c74642

cpu-o3: adjust MicroTAGE parameters and clean up code for clarity

3fa08b4

Change-Id: I16ce79a7d8488d9a138d0a26f5500576ae54132e

cpu-o3: remove unused variable 'usingBasetable'

e665826

Change-Id: I8be037a1cdacd7151f4fe8e743a32cca90a85036

cpu-o3: MicroTAGE to remove base table and alternative prediction

be0c88c

Change-Id: I6b9c92c18c574a12c19532ae0b894e64c1187342

cpu-o3: update MicroTAGE parameters for consistency

28084ef

Change-Id: I953965b8e6feb1c15a238ac832d65bc16b32f496

cpu-o3: remove alternative prediction logic

e57b012

Change-Id: I39cd471d452aa343c3dd741a80fdfa7d126e3a9f

cpu-o3: refine MicroTAGE parameters and remove unused variables

b8fc3d6

Change-Id: I3567ae93652aac218c5b4646003abadddaf7cf32

cpu-o3: update MicroTAGE history length parameter to 16

9a98605

Change-Id: If7c9d3aa68a23c36dde74d8cf3a286c9c48f3e3c

cpu-o3: enhance MicroTAGE history management with ahead queues

2c56778

Change-Id: I83277ae5c801e9d22b594286580459d12cdec69b

cpu-o3:open microtage

d3ee9e4

Change-Id: I394246af184d3f07e02b85f06e4e5ceed368ec22

cpu-o3:close utage in ideal

b9efab2

Change-Id: I762c11f8d15262fb9f1c9d443f77895fa76bbc79

coderabbitai bot reviewed Jan 15, 2026

src/cpu/pred/btb/microtage.cc

cpu-o3: update folded history queue size limit

02bc1ff

Change-Id: Ia9bcdc028235447e254889d95e5ea98e7f067664

coderabbitai bot reviewed Jan 23, 2026


cpu-o3: disable MicroTAGE in branch prediction configuration

ed95dbf

Change-Id: I56614e8ebc2dd33320d353562087ab456fc452da


Ahead utage history perf#715

CJ362ff wants to merge 16 commits into xs-dev from ahead-utage-history-perf

CJ362ff commented Jan 15, 2026 • edited by coderabbitai bot


coderabbitai bot commented Jan 15, 2026 • edited


Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem


coderabbitai bot left a comment


XiangShanRobot commented Jan 15, 2026


coderabbitai bot left a comment


coderabbitai bot Jan 23, 2026


coderabbitai bot Jan 23, 2026


XiangShanRobot commented Jan 23, 2026


2 participants
