Skip to content

Ahead utage history perf#715

Open
CJ362ff wants to merge 16 commits intoxs-devfrom
ahead-utage-history-perf
Open

Ahead utage history perf#715
CJ362ff wants to merge 16 commits intoxs-devfrom
ahead-utage-history-perf

Conversation

@CJ362ff
Copy link
Collaborator

@CJ362ff CJ362ff commented Jan 15, 2026

Summary by CodeRabbit

  • New Features

    • Added a MicroTAGE branch predictor option with folded-history prediction, bank-conflict handling, and extended prediction statistics.
    • Configuration switches now allow enabling MicroTAGE per simulation profile.
  • Chores

    • Integrated MicroTAGE into the build and predictor type selection.
    • Updated branch-predictor wiring and configuration interfaces to support the new predictor.
  • Configuration

    • Example configs updated: one profile enables MicroTAGE; another contains a commented placeholder for disabling it.

✏️ Tip: You can customize this high-level summary in your review settings.

Cao Jiaming added 14 commits January 6, 2026 16:02
Change-Id: I8b6a4bda945564cc961251fc1297a6bacd8636cb
Change-Id: I44b6175022a3a593ed385407bd95ec0c40c74642
Change-Id: I16ce79a7d8488d9a138d0a26f5500576ae54132e
Change-Id: I8be037a1cdacd7151f4fe8e743a32cca90a85036
Change-Id: I6b9c92c18c574a12c19532ae0b894e64c1187342
Change-Id: I953965b8e6feb1c15a238ac832d65bc16b32f496
Change-Id: I39cd471d452aa343c3dd741a80fdfa7d126e3a9f
Change-Id: I3567ae93652aac218c5b4646003abadddaf7cf32
Change-Id: If7c9d3aa68a23c36dde74d8cf3a286c9c48f3e3c
Change-Id: I83277ae5c801e9d22b594286580459d12cdec69b
Change-Id: I394246af184d3f07e02b85f06e4e5ceed368ec22
Change-Id: I762c11f8d15262fb9f1c9d443f77895fa76bbc79
@coderabbitai
Copy link

coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

Walkthrough

Adds a new standalone MicroTAGE branch predictor (header + implementation), integrates it into build/config and DecoupledBPUWithBTB wiring, expands predictor parameters in BranchPredictor, and toggles microtage usage in example configs.

Changes

Cohort / File(s) Summary
Configuration Updates
configs/example/idealkmhv3.py, configs/example/kmhv3.py
Example configs adjusted: one contains a commented placeholder for disabling microtage; the other enables cpu.branchPred.microtage.enabled when bp_type is DecoupledBPUWithBTB.
Branch Predictor Parameters
src/cpu/pred/BranchPredictor.py
Replaced prior BTBTAGE-based placeholder with a full MicroTAGE Param type; expanded parameter set (histories, tables, tags, banks, latencies, etc.) and changed DecoupledBPUWithBTB microtage declaration to Param.MicroTAGE(...).
Build / SimObject List
src/cpu/pred/SConscript
Added MicroTAGE to BpType enum and added btb/microtage.cc to sources.
Decoupled BTB Integration Header
src/cpu/pred/btb/decoupled_bpred.hh
Updated includes and member type for microtage from BTBTAGE* to MicroTAGE*.
MicroTAGE Implementation
src/cpu/pred/btb/microtage.hh, src/cpu/pred/btb/microtage.cc
New MicroTAGE class implementing TAGE-style prediction: generation, folded-history indexing/tagging, updates/allocation, bank-conflict handling, LRU, extensive stats, and unit-test hooks.

Sequence Diagram(s)

sequenceDiagram
    participant Fetch as Fetch Unit
    participant MT as MicroTAGE
    participant Hist as History Mgmt
    participant Tables as TAGE Tables
    participant Meta as Meta Storage

    rect rgba(200,220,240,0.5)
    note over Fetch,Meta: Prediction Path
    Fetch->>MT: putPCHistory(startPC, history)
    MT->>Hist: update folded history
    MT->>Tables: generateSinglePrediction(startPC, foldedHist)
    Tables-->>MT: index/tag matches & candidate
    MT->>Meta: store prediction snapshot (TageMeta)
    MT-->>Fetch: return FullBTBPrediction
    end

    rect rgba(220,240,200,0.5)
    note over Fetch,Tables: Update Path
    Fetch->>MT: update(FetchStream)
    MT->>Hist: compute actual outcome & spec/history updates
    MT->>Tables: check allocation & update counters
    alt allocate new entry
        MT->>Tables: handleNewEntryAllocation(...)
        Tables-->>MT: allocated index/way
    end
    MT->>Hist: doUpdateHist(resolve history)
    MT->>Meta: finalize prediction state
    MT-->>Fetch: complete update
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

perf, align-kmhv3

Suggested reviewers

  • jensen-yan
  • Yakkhini

Poem

🐇 I hopped through headers, code, and lore,

Tables and tags now hum and roar.
Folded histories tucked so neat,
Banks and LRU tap their feet.
Hoppity-hop—predictions fleet.

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 56.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Ahead utage history perf' is vague and does not clearly convey the actual changes; the PR implements a new MicroTAGE predictor with extensive refactoring but the title provides minimal meaningful context. Consider revising the title to be more descriptive, such as 'Implement MicroTAGE branch predictor with lookahead history' or similar, to better reflect the substantial changes being made.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/cpu/pred/btb/microtage.cc`:
- Around line 905-920: In MicroTAGE::recoverPHist guard uses of
aheadindexFoldedHist.front() and aheadtagFoldedHist.front() by checking each
queue's emptiness before calling .front(): for each type, if the corresponding
queue (aheadindexFoldedHist or aheadtagFoldedHist) is empty, skip the inner
recovery loop (or otherwise handle the empty case) so you don't call .front() on
an empty container; then, when non-empty, retrieve foldedHistQueuefront =
...front() and call foldedHistQueuefront[i].recover(predMeta->indexFoldedHist[i]
or predMeta->tagFoldedHist[i]) as currently done. Ensure both queues are checked
and that predMeta->indexFoldedHist / predMeta->tagFoldedHist lengths match
numPredictors.
🧹 Nitpick comments (3)
src/cpu/pred/btb/microtage.cc (3)

1022-1054: Consider removing unused LRU functions.

These LRU helper functions (updateLRU, getLRUVictim) appear to be unused - comments in generateSinglePrediction (line 226) and the allocation logic both indicate LRU is not used. If these are truly dead code, consider removing them to reduce maintenance burden.


724-751: Consider adding bounds check for shift amounts.

If tableTagBits[t] or tableIndexBits[t] equals or exceeds 64, the shift operations at lines 728 and 744 would invoke undefined behavior. While typical TAGE configurations use values well under 64, adding a defensive check or static_assert would prevent subtle bugs with unusual configurations.

♻️ Optional defensive check
 Addr
 MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position)
 {
+    assert(tableTagBits[t] < 64 && "Tag bits must be < 64 to avoid undefined shift");
     // Create mask for tableTagBits[t] to limit result size
     Addr mask = (1ULL << tableTagBits[t]) - 1;
     ...
 }

 Addr
 MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist)
 {
+    assert(tableIndexBits[t] < 64 && "Index bits must be < 64 to avoid undefined shift");
     // Create mask for tableIndexBits[t] to limit result size
     Addr mask = (1ULL << tableIndexBits[t]) - 1;
     ...
 }

858-871: Minor: Consider using unsigned for loop variables.

Loop variables at lines 861, 914, and 932 use int while numPredictors is unsigned. While safe in practice, using unsigned would be more consistent and avoid potential compiler warnings.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 081db5f and b9efab2.

📒 Files selected for processing (7)
  • configs/example/idealkmhv3.py
  • configs/example/kmhv3.py
  • src/cpu/pred/BranchPredictor.py
  • src/cpu/pred/SConscript
  • src/cpu/pred/btb/decoupled_bpred.hh
  • src/cpu/pred/btb/microtage.cc
  • src/cpu/pred/btb/microtage.hh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-12T02:49:12.886Z
Learnt from: jensen-yan
Repo: OpenXiangShan/GEM5 PR: 686
File: src/cpu/pred/btb/btb_mgsc.cc:226-230
Timestamp: 2026-01-12T02:49:12.886Z
Learning: In src/cpu/pred/btb/btb_mgsc.cc, the calculateScaledPercsum function returns the unscaled percsum (weight scaling disabled) to align with RTL implementation. The RTL currently does not have a weight table, so setting weight to 1 (equivalent to disabling the weight table) maintains hardware/software consistency with minimal performance impact.

Applied to files:

  • src/cpu/pred/btb/decoupled_bpred.hh
🧬 Code graph analysis (2)
configs/example/idealkmhv3.py (2)
src/cpu/o3/fetch.hh (5)
  • branchPred (484-484)
  • branchPred (955-955)
  • branchPred (957-957)
  • branchPred (959-959)
  • branchPred (961-961)
src/cpu/pred/btb/timed_base_pred.hh (1)
  • enabled (108-108)
src/cpu/pred/btb/microtage.cc (3)
src/cpu/pred/btb/microtage.hh (36)
  • MicroTAGE (49-49)
  • MicroTAGE (113-113)
  • MicroTAGE (115-115)
  • entry (88-90)
  • entry (152-152)
  • entry (153-153)
  • entry (154-154)
  • entry (405-408)
  • TageTableInfo (85-85)
  • TageTableInfo (86-87)
  • taken (267-267)
  • TagePrediction (104-104)
  • TagePrediction (105-108)
  • pred (365-365)
  • history (130-130)
  • history (134-135)
  • history (139-142)
  • history (139-139)
  • history (144-148)
  • history (144-145)
  • history (163-163)
  • history (199-199)
  • history (377-377)
  • TageEntry (66-66)
  • TageEntry (68-69)
  • table (422-422)
  • table (423-423)
  • counter (70-72)
  • max (270-270)
  • pc (173-173)
  • pc (176-176)
  • pc (180-180)
  • pc (184-184)
  • pc (187-189)
  • pc (187-187)
  • pc (196-196)
src/cpu/pred/BranchPredictor.py (2)
  • MicroTAGE (1066-1092)
  • TAGE (169-174)
src/mem/ruby/common/Histogram.hh (1)
  • hist (51-51)
🪛 Cppcheck (2.19.0)
src/cpu/pred/btb/microtage.cc

[error] 119-119: Reference to temporary returned.

(returnTempReference)


[error] 132-132: Reference to temporary returned.

(returnTempReference)


[error] 125-125: Shifting 64-bit value by 64 bits is undefined behaviour

(shiftTooManyBits)


[error] 138-138: Shifting 64-bit value by 64 bits is undefined behaviour

(shiftTooManyBits)

🪛 Ruff (0.14.11)
src/cpu/pred/BranchPredictor.py

1072-1072: Param may be undefined, or defined from star imports

(F405)


1073-1073: Param may be undefined, or defined from star imports

(F405)


1074-1074: Param may be undefined, or defined from star imports

(F405)


1076-1076: Param may be undefined, or defined from star imports

(F405)


1077-1077: VectorParam may be undefined, or defined from star imports

(F405)


1078-1078: VectorParam may be undefined, or defined from star imports

(F405)


1079-1079: VectorParam may be undefined, or defined from star imports

(F405)


1080-1080: Param may be undefined, or defined from star imports

(F405)


1082-1082: VectorParam may be undefined, or defined from star imports

(F405)


1083-1083: Param may be undefined, or defined from star imports

(F405)


1084-1084: Param may be undefined, or defined from star imports

(F405)


1085-1085: Param may be undefined, or defined from star imports

(F405)


1086-1086: Param may be undefined, or defined from star imports

(F405)


1087-1087: Param may be undefined, or defined from star imports

(F405)


1088-1088: Param may be undefined, or defined from star imports

(F405)


1089-1089: Param may be undefined, or defined from star imports

(F405)


1090-1090: Param may be undefined, or defined from star imports

(F405)


1091-1091: Param may be undefined, or defined from star imports

(F405)


1092-1092: Param may be undefined, or defined from star imports

(F405)


1182-1182: Param may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Quick Build, Unit Tests & Smoke Test
  • GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.8c)
🔇 Additional comments (19)
configs/example/idealkmhv3.py (1)

82-82: LGTM!

Disabling the microtage predictor in the ideal configuration is consistent with the paired configuration pattern (kmhv3.py enables it). The enabled parameter is inherited from TimedBaseBTBPredictor.

configs/example/kmhv3.py (1)

94-109: LGTM!

The microtage predictor is correctly enabled alongside other predictor components (ubtb, abtb, mbtb, tage, ittage, ras) for the DecoupledBPUWithBTB configuration. The placement follows the existing pattern.

src/cpu/pred/btb/decoupled_bpred.hh (2)

26-27: LGTM!

The new microtage.hh include is correctly added for the standalone MicroTAGE class.


107-107: LGTM!

The member type change from BTBTAGE* to MicroTAGE* correctly reflects the new standalone MicroTAGE class introduced in this PR.

src/cpu/pred/SConscript (2)

52-53: LGTM!

MicroTAGE is correctly registered as a SimObject alongside other BTB-based predictors (BTBTAGE, BTBITTAGE, BTBMGSC).


105-105: LGTM!

The new btb/microtage.cc source file is correctly added to the build in the appropriate location with other BTB predictor sources.

src/cpu/pred/BranchPredictor.py (2)

1182-1182: LGTM!

The microtage parameter is correctly wired using Param.MicroTAGE(MicroTAGE(), ...), consistent with the new standalone MicroTAGE class.


1066-1092: LGTM! Well-structured MicroTAGE predictor definition.

The MicroTAGE class is correctly defined as a smaller, faster variant of BTBTAGE with proper parameter consistency validated in the C++ constructor:

  • numPredictors = 1 with matching single-element vectors (tableSizes, TTagBitSizes, TTagPcShifts, histLengths)
  • C++ constructor enforces vector length validation via assertions (lines 109, 118, 120, 123 in microtage.cc)
  • numDelay = 0 for low-latency prediction alongside uBTB
  • enableBankConflict = False appropriate for single-table predictor (bank simulation logic only active when enabled)
src/cpu/pred/btb/microtage.hh (4)

1-26: LGTM - Include guards and conditional includes are well-structured.

The header uses proper include guards and clearly separates UNIT_TEST from production dependencies.


56-109: LGTM - Well-documented nested data structures for TAGE prediction.

The structs TageEntry, TageTableInfo, and TagePrediction are clearly documented and properly encapsulate the TAGE prediction state. The counter range (-4 to 3) noted in the comment is appropriately enforced by saturation logic in the implementation.


298-300: Verify queue safety in usage sites.

These queues store ahead-of-time folded history snapshots. Ensure all .front() calls in the implementation are guarded by .empty() checks to avoid undefined behavior.


383-426: LGTM - TageMeta and class members are appropriately structured.

The metadata structure properly captures folded history snapshots for update-time index/tag recalculation.

src/cpu/pred/btb/microtage.cc (7)

36-135: LGTM - Constructor initializes predictor tables and folded histories correctly.

Both UNIT_TEST and production constructors properly initialize the TAGE tables, index/tag masks, and folded history structures. The assertions ensure parameter vectors have sufficient entries.


192-256: LGTM - TAGE table lookup logic is correct.

The search from highest to lowest table, position encoding in tags, and single-match enforcement align with standard TAGE predictor design.


848-871: Potential logic issue: member variables may not be updated as intended.

The second loop creates a copy of the folded history vectors:

auto foldedHist = type == 0 ? indexFoldedHist : tagFoldedHist; // copy, not reference

Updates at line 862 modify this local copy, and while the copy is pushed to the queue, the actual member variables (indexFoldedHist, tagFoldedHist) are never updated in the second loop.

If the intent is to update ahead-of-time history in the queue while keeping current history unchanged until consumed from the queue (in the first loop), this is correct. However, the first loop has an early break if the queue is empty, which means on the first invocation, member variables won't be updated at all.

Please verify this is the intended behavior for the ahead-history optimization.


608-679: LGTM - Update logic correctly handles prediction state updates and allocation.

The update path properly:

  • Re-reads providers when updateOnRead is enabled (for bank conflict simulation)
  • Falls back to cached predictions otherwise
  • Triggers allocation only on mispredictions

487-559: LGTM - Allocation logic follows established TAGE replacement policy.

The allocation strategy (prefer invalid/weak ways, apply age penalty, periodic useful-bit reset) aligns with CBP-style TAGE predictors.


783-806: LGTM - Defensive handling of edge cases in position calculation.

The function appropriately warns about unusual cases (branch preceding block, exceeding max positions) while ensuring a valid position is always returned.


1056-1093: LGTM - Commit-time statistics tracking is correctly implemented.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@XiangShanRobot
Copy link

[Generated by GEM5 Performance Robot]
commit: b9efab2
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

PR Master Diff(%)
Score 19.78 20.37 -2.89 🔴

[Generated by GEM5 Performance Robot]
commit: b9efab2
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

PR Previous Commit Diff(%)
Score 19.78 19.89 -0.52 🔴

Change-Id: Ia9bcdc028235447e254889d95e5ea98e7f067664
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/cpu/pred/btb/microtage.cc`:
- Around line 858-869: The assert on foldedHistqueue.size() is executed
immediately after foldedHistqueue.push(), which can trigger when the queue was
already at capacity; change the logic in the loop that handles foldedHistqueue
(the block that calls foldedHistqueue.push(), assert(foldedHistqueue.size() <=
2), and foldedHistqueue.pop()) so the overflow is trimmed before asserting:
either call foldedHistqueue.pop() if size > 2 and then perform the assert, or
move the assert to after the pop. Update the sequence around
foldedHistqueue.push()/pop()/assert to ensure the queue is never asserted while
transiently over capacity (refer to foldedHistqueue, push, pop, and the assert).
- Around line 723-751: The mask computation in getTageTag and getTageIndex uses
(1ULL << tableTagBits[t]) and (1ULL << tableIndexBits[t]) which is UB when the
shift equals the word width; replace these with a safe conditional that computes
mask = (bits >= sizeof(Addr)*8 ? ~static_cast<Addr>(0) : ((static_cast<Addr>(1)
<< bits) - 1)), where bits is tableTagBits[t] or tableIndexBits[t] respectively,
so getTageTag and getTageIndex produce correct masks without out-of-range
shifts.
♻️ Duplicate comments (1)
src/cpu/pred/btb/microtage.cc (1)

909-923: Empty-queue recovery returns without restoring history.

At Line 910, recoverPHist returns early when queues are empty, skipping both restoration from predMeta and the doUpdateHist update. This can leave folded histories stale after early mispredicts. Consider falling back to the snapshot and still updating.

🐛 Proposed fix
     std::shared_ptr<TageMeta> predMeta = std::static_pointer_cast<TageMeta>(entry.predMetas[getComponentIdx()]);
     if (aheadindexFoldedHist.empty() || aheadtagFoldedHist.empty()) {
-        DPRINTF(TAGE, "recoverPHist: ahead folded history queues are empty, cannot recover\n");
-        return;
+        DPRINTF(TAGE, "recoverPHist: ahead folded history queues are empty, recovering from snapshot\n");
+        for (int i = 0; i < numPredictors; i++) {
+            indexFoldedHist[i].recover(predMeta->indexFoldedHist[i]);
+            tagFoldedHist[i].recover(predMeta->tagFoldedHist[i]);
+        }
+        doUpdateHist(history, cond_taken, entry.getControlPC(), entry.getTakenTarget());
+        return;
     }

Comment on lines +723 to +751
// Calculate TAGE tag with folded history - optimized version using bitwise operations
Addr
MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position)
{
// Create mask for tableTagBits[t] to limit result size
Addr mask = (1ULL << tableTagBits[t]) - 1;

// Extract lower bits of PC directly (remove instruction alignment bits)
Addr pcBits = (pc >> bankBaseShift) & mask;

// Extract and prepare folded history bits
Addr foldedBits = foldedHist & mask;

// XOR all components together, including position (like RTL)
return pcBits ^ foldedBits ^ position;
}

Addr
MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist)
{
// Create mask for tableIndexBits[t] to limit result size
Addr mask = (1ULL << tableIndexBits[t]) - 1;

const unsigned pcShift = enableBankConflict ? indexShift : bankBaseShift;
Addr pcBits = (pc >> pcShift) & mask;
Addr foldedBits = foldedHist & mask;

return pcBits ^ foldedBits;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the file and check its size
wc -l src/cpu/pred/btb/microtage.cc

Repository: OpenXiangShan/GEM5

Length of output: 96


🏁 Script executed:

# Look for Addr typedef definition
rg "typedef.*Addr|using.*Addr" --type cc --type h -B2 -A2

Repository: OpenXiangShan/GEM5

Length of output: 88


🏁 Script executed:

# Examine the getTageTag and getTageIndex functions at the flagged lines
sed -n '723,751p' src/cpu/pred/btb/microtage.cc

Repository: OpenXiangShan/GEM5

Length of output: 1040


🏁 Script executed:

# Search for tableTagBits and tableIndexBits definition and usage
rg "tableTagBits|tableIndexBits" --type cc --type h -B3 -A3 | head -80

Repository: OpenXiangShan/GEM5

Length of output: 88


🏁 Script executed:

# Search for Addr typedef without file type restriction
rg "typedef.*Addr|using.*Addr" -B2 -A2 | head -40

Repository: OpenXiangShan/GEM5

Length of output: 2356


🏁 Script executed:

# Find header files in the pred/btb directory
find . -path "*pred/btb*" -name "*.hh" -o -path "*pred/btb*" -name "*.h"

Repository: OpenXiangShan/GEM5

Length of output: 839


🏁 Script executed:

# Search for MicroTAGE class definition
rg "class MicroTAGE" -B2 -A10

Repository: OpenXiangShan/GEM5

Length of output: 1953


🏁 Script executed:

# Look for tableTagBits and tableIndexBits declarations
rg "tableTagBits\[|tableIndexBits\[" -B3 -A3

Repository: OpenXiangShan/GEM5

Length of output: 18657


🏁 Script executed:

# Find Addr type definition
rg "typedef.*Addr|using.*Addr" | grep -v "AddressMap\|AddressIndex\|Addr:" | head -20

Repository: OpenXiangShan/GEM5

Length of output: 1866


🏁 Script executed:

# Look for Addr in common headers
find . -path "*/sim/*" -name "*.hh" -o -path "*/base/*" -name "*.hh" | xargs rg "typedef.*Addr[^a-zA-Z]" 2>/dev/null | head -10

Repository: OpenXiangShan/GEM5

Length of output: 254


🏁 Script executed:

# Search for ceilLog2 function
rg "ceilLog2" -B5 -A5 | head -60

Repository: OpenXiangShan/GEM5

Length of output: 3418


🏁 Script executed:

# Check the BranchPredictor.py for MicroTAGE parameter ranges
sed -n '/class MicroTAGE/,/^class /p' src/cpu/pred/BranchPredictor.py | head -100

Repository: OpenXiangShan/GEM5

Length of output: 1867


🏁 Script executed:

# Find ceilLog2 definition
rg "ceilLog2\(" -B3 -A10 --max-count=5

Repository: OpenXiangShan/GEM5

Length of output: 36609


🏁 Script executed:

# Check microtage.hh header for member variable types
sed -n '1,150p' src/cpu/pred/btb/microtage.hh | grep -A5 -B5 "tableTagBits\|tableIndexBits"

Repository: OpenXiangShan/GEM5

Length of output: 44


🏁 Script executed:

# Look at full constructor to see initialization constraints
sed -n '/MicroTAGE::MicroTAGE/,/^}/p' src/cpu/pred/btb/microtage.cc | head -100

Repository: OpenXiangShan/GEM5

Length of output: 3574


🏁 Script executed:

# Check for any bounds checking on tableTagBits or tableIndexBits values
rg "tableTagBits|tableIndexBits" src/cpu/pred/btb/microtage.cc -B2 -A2 | grep -E "assert|if.*<|if.*>|warning|fatal"

Repository: OpenXiangShan/GEM5

Length of output: 223


🏁 Script executed:

# Look at the microtage.hh header to see member variable declarations
cat src/cpu/pred/btb/microtage.hh | head -200

Repository: OpenXiangShan/GEM5

Length of output: 7396


🏁 Script executed:

# Check actual table size limits from Python config
grep -A30 "class MicroTAGE" src/cpu/pred/BranchPredictor.py | grep -E "tableSizes|TTagBitSizes|blockSize"

Repository: OpenXiangShan/GEM5

Length of output: 298


🏁 Script executed:

# Check for any parameter validation or clamping on tableTagBits or tableIndexBits
rg "TTagBitSizes|tableTagBits|tableIndexBits" src/cpu/pred/ -B5 -A5 | grep -E "assert|clamp|min|max|fatal|if.*<|if.*>" | head -30

Repository: OpenXiangShan/GEM5

Length of output: 2702


🏁 Script executed:

# Look at the intmath implementation to confirm ceilLog2 behavior
sed -n '/^ceilLog2/,/^}/p' src/base/intmath.hh

Repository: OpenXiangShan/GEM5

Length of output: 176


Guard mask generation against 64-bit shifts.

On lines 728 and 744, (1ULL << tableTagBits[t]) and (1ULL << tableIndexBits[t]) trigger undefined behavior when the shift amount equals 64 (the bit width of Addr). While default parameters avoid this, unconstrained table sizes could exceed 2^64, causing the shift to be out-of-range. Implement safe masking.

🐛 Proposed fix
 Addr
 MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position)
 {
     // Create mask for tableTagBits[t] to limit result size
-    Addr mask = (1ULL << tableTagBits[t]) - 1;
+    const unsigned bits = tableTagBits[t];
+    const unsigned addrBits = sizeof(Addr) * 8;
+    Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1);
@@
 Addr
 MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist)
 {
     // Create mask for tableIndexBits[t] to limit result size
-    Addr mask = (1ULL << tableIndexBits[t]) - 1;
+    const unsigned bits = tableIndexBits[t];
+    const unsigned addrBits = sizeof(Addr) * 8;
+    Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Calculate TAGE tag with folded history - optimized version using bitwise operations
Addr
MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position)
{
// Create mask for tableTagBits[t] to limit result size
Addr mask = (1ULL << tableTagBits[t]) - 1;
// Extract lower bits of PC directly (remove instruction alignment bits)
Addr pcBits = (pc >> bankBaseShift) & mask;
// Extract and prepare folded history bits
Addr foldedBits = foldedHist & mask;
// XOR all components together, including position (like RTL)
return pcBits ^ foldedBits ^ position;
}
Addr
MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist)
{
// Create mask for tableIndexBits[t] to limit result size
Addr mask = (1ULL << tableIndexBits[t]) - 1;
const unsigned pcShift = enableBankConflict ? indexShift : bankBaseShift;
Addr pcBits = (pc >> pcShift) & mask;
Addr foldedBits = foldedHist & mask;
return pcBits ^ foldedBits;
}
// Calculate TAGE tag with folded history - optimized version using bitwise operations
Addr
MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position)
{
// Create mask for tableTagBits[t] to limit result size
const unsigned bits = tableTagBits[t];
const unsigned addrBits = sizeof(Addr) * 8;
Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1);
// Extract lower bits of PC directly (remove instruction alignment bits)
Addr pcBits = (pc >> bankBaseShift) & mask;
// Extract and prepare folded history bits
Addr foldedBits = foldedHist & mask;
// XOR all components together, including position (like RTL)
return pcBits ^ foldedBits ^ position;
}
Addr
MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist)
{
// Create mask for tableIndexBits[t] to limit result size
const unsigned bits = tableIndexBits[t];
const unsigned addrBits = sizeof(Addr) * 8;
Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1);
const unsigned pcShift = enableBankConflict ? indexShift : bankBaseShift;
Addr pcBits = (pc >> pcShift) & mask;
Addr foldedBits = foldedHist & mask;
return pcBits ^ foldedBits;
}
🤖 Prompt for AI Agents
In `@src/cpu/pred/btb/microtage.cc` around lines 723 - 751, The mask computation
in getTageTag and getTageIndex uses (1ULL << tableTagBits[t]) and (1ULL <<
tableIndexBits[t]) which is UB when the shift equals the word width; replace
these with a safe conditional that computes mask = (bits >= sizeof(Addr)*8 ?
~static_cast<Addr>(0) : ((static_cast<Addr>(1) << bits) - 1)), where bits is
tableTagBits[t] or tableIndexBits[t] respectively, so getTageTag and
getTageIndex produce correct masks without out-of-range shifts.

Comment on lines +858 to +869
for (int type = 0; type < 2; type++) {
auto foldedHist = type == 0 ? indexFoldedHist : tagFoldedHist;
auto &foldedHistqueue = type == 0 ? aheadindexFoldedHist : aheadtagFoldedHist;
for (int t = 0; t < numPredictors; t++) {
foldedHist[t].update(history, 2, taken, pc, target);
DPRINTF(TAGEHistory, "t: %d, type: %d, foldedHist _folded 0x%lx\n", t, type, foldedHist[t].get());
}
foldedHistqueue.push(foldedHist);
assert(foldedHistqueue.size() <= 2);
if (foldedHistqueue.size() > 2) {
foldedHistqueue.pop();
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Queue-size assert fires before overflow is trimmed.

Line 866 asserts size <= 2 immediately after push. If the queue is already size 2, this will trip before the pop executes. Move the assert after the pop (or pop first).

🐛 Proposed fix
         for (int t = 0; t < numPredictors; t++) {
             foldedHist[t].update(history, 2, taken, pc, target);
             DPRINTF(TAGEHistory, "t: %d, type: %d, foldedHist _folded 0x%lx\n", t, type, foldedHist[t].get());
         }
         foldedHistqueue.push(foldedHist);
-        assert(foldedHistqueue.size() <= 2);
         if (foldedHistqueue.size() > 2) {
             foldedHistqueue.pop();
         }
+        assert(foldedHistqueue.size() <= 2);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for (int type = 0; type < 2; type++) {
auto foldedHist = type == 0 ? indexFoldedHist : tagFoldedHist;
auto &foldedHistqueue = type == 0 ? aheadindexFoldedHist : aheadtagFoldedHist;
for (int t = 0; t < numPredictors; t++) {
foldedHist[t].update(history, 2, taken, pc, target);
DPRINTF(TAGEHistory, "t: %d, type: %d, foldedHist _folded 0x%lx\n", t, type, foldedHist[t].get());
}
foldedHistqueue.push(foldedHist);
assert(foldedHistqueue.size() <= 2);
if (foldedHistqueue.size() > 2) {
foldedHistqueue.pop();
}
for (int type = 0; type < 2; type++) {
auto foldedHist = type == 0 ? indexFoldedHist : tagFoldedHist;
auto &foldedHistqueue = type == 0 ? aheadindexFoldedHist : aheadtagFoldedHist;
for (int t = 0; t < numPredictors; t++) {
foldedHist[t].update(history, 2, taken, pc, target);
DPRINTF(TAGEHistory, "t: %d, type: %d, foldedHist _folded 0x%lx\n", t, type, foldedHist[t].get());
}
foldedHistqueue.push(foldedHist);
if (foldedHistqueue.size() > 2) {
foldedHistqueue.pop();
}
assert(foldedHistqueue.size() <= 2);
🤖 Prompt for AI Agents
In `@src/cpu/pred/btb/microtage.cc` around lines 858 - 869, The assert on
foldedHistqueue.size() is executed immediately after foldedHistqueue.push(),
which can trigger when the queue was already at capacity; change the logic in
the loop that handles foldedHistqueue (the block that calls
foldedHistqueue.push(), assert(foldedHistqueue.size() <= 2), and
foldedHistqueue.pop()) so the overflow is trimmed before asserting: either call
foldedHistqueue.pop() if size > 2 and then perform the assert, or move the
assert to after the pop. Update the sequence around
foldedHistqueue.push()/pop()/assert to ensure the queue is never asserted while
transiently over capacity (refer to foldedHistqueue, push, pop, and the assert).

Change-Id: I56614e8ebc2dd33320d353562087ab456fc452da
@XiangShanRobot
Copy link

[Generated by GEM5 Performance Robot]
commit: ed95dbf
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

PR Master Diff(%)
Score 19.77 20.29 -2.56 🔴

[Generated by GEM5 Performance Robot]
commit: ed95dbf
workflow: gem5 Ideal BTB Performance Test

Ideal BTB Performance

Overall Score

PR Previous Commit Diff(%)
Score 19.77 19.78 -0.04 🔴

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants