Conversation
Change-Id: I8b6a4bda945564cc961251fc1297a6bacd8636cb
Change-Id: I44b6175022a3a593ed385407bd95ec0c40c74642
Change-Id: I16ce79a7d8488d9a138d0a26f5500576ae54132e
Change-Id: I8be037a1cdacd7151f4fe8e743a32cca90a85036
Change-Id: I6b9c92c18c574a12c19532ae0b894e64c1187342
Change-Id: I953965b8e6feb1c15a238ac832d65bc16b32f496
Change-Id: I39cd471d452aa343c3dd741a80fdfa7d126e3a9f
Change-Id: I3567ae93652aac218c5b4646003abadddaf7cf32
Change-Id: If7c9d3aa68a23c36dde74d8cf3a286c9c48f3e3c
Change-Id: I83277ae5c801e9d22b594286580459d12cdec69b
Change-Id: I394246af184d3f07e02b85f06e4e5ceed368ec22
Change-Id: I762c11f8d15262fb9f1c9d443f77895fa76bbc79
📝 WalkthroughWalkthroughAdds a new standalone MicroTAGE branch predictor (header + implementation), integrates it into build/config and DecoupledBPUWithBTB wiring, expands predictor parameters in BranchPredictor, and toggles microtage usage in example configs. Changes
Sequence Diagram(s)sequenceDiagram
participant Fetch as Fetch Unit
participant MT as MicroTAGE
participant Hist as History Mgmt
participant Tables as TAGE Tables
participant Meta as Meta Storage
rect rgba(200,220,240,0.5)
note over Fetch,Meta: Prediction Path
Fetch->>MT: putPCHistory(startPC, history)
MT->>Hist: update folded history
MT->>Tables: generateSinglePrediction(startPC, foldedHist)
Tables-->>MT: index/tag matches & candidate
MT->>Meta: store prediction snapshot (TageMeta)
MT-->>Fetch: return FullBTBPrediction
end
rect rgba(220,240,200,0.5)
note over Fetch,Tables: Update Path
Fetch->>MT: update(FetchStream)
MT->>Hist: compute actual outcome & spec/history updates
MT->>Tables: check allocation & update counters
alt allocate new entry
MT->>Tables: handleNewEntryAllocation(...)
Tables-->>MT: allocated index/way
end
MT->>Hist: doUpdateHist(resolve history)
MT->>Meta: finalize prediction state
MT-->>Fetch: complete update
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/cpu/pred/btb/microtage.cc`:
- Around line 905-920: In MicroTAGE::recoverPHist guard uses of
aheadindexFoldedHist.front() and aheadtagFoldedHist.front() by checking each
queue's emptiness before calling .front(): for each type, if the corresponding
queue (aheadindexFoldedHist or aheadtagFoldedHist) is empty, skip the inner
recovery loop (or otherwise handle the empty case) so you don't call .front() on
an empty container; then, when non-empty, retrieve foldedHistQueuefront =
...front() and call foldedHistQueuefront[i].recover(predMeta->indexFoldedHist[i]
or predMeta->tagFoldedHist[i]) as currently done. Ensure both queues are checked
and that predMeta->indexFoldedHist / predMeta->tagFoldedHist lengths match
numPredictors.
🧹 Nitpick comments (3)
src/cpu/pred/btb/microtage.cc (3)
1022-1054: Consider removing unused LRU functions.These LRU helper functions (
updateLRU,getLRUVictim) appear to be unused - comments ingenerateSinglePrediction(line 226) and the allocation logic both indicate LRU is not used. If these are truly dead code, consider removing them to reduce maintenance burden.
724-751: Consider adding bounds check for shift amounts.If
tableTagBits[t]ortableIndexBits[t]equals or exceeds 64, the shift operations at lines 728 and 744 would invoke undefined behavior. While typical TAGE configurations use values well under 64, adding a defensive check or static_assert would prevent subtle bugs with unusual configurations.♻️ Optional defensive check
Addr MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position) { + assert(tableTagBits[t] < 64 && "Tag bits must be < 64 to avoid undefined shift"); // Create mask for tableTagBits[t] to limit result size Addr mask = (1ULL << tableTagBits[t]) - 1; ... } Addr MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist) { + assert(tableIndexBits[t] < 64 && "Index bits must be < 64 to avoid undefined shift"); // Create mask for tableIndexBits[t] to limit result size Addr mask = (1ULL << tableIndexBits[t]) - 1; ... }
858-871: Minor: Consider usingunsignedfor loop variables.Loop variables at lines 861, 914, and 932 use
intwhilenumPredictorsisunsigned. While safe in practice, usingunsignedwould be more consistent and avoid potential compiler warnings.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
configs/example/idealkmhv3.pyconfigs/example/kmhv3.pysrc/cpu/pred/BranchPredictor.pysrc/cpu/pred/SConscriptsrc/cpu/pred/btb/decoupled_bpred.hhsrc/cpu/pred/btb/microtage.ccsrc/cpu/pred/btb/microtage.hh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-12T02:49:12.886Z
Learnt from: jensen-yan
Repo: OpenXiangShan/GEM5 PR: 686
File: src/cpu/pred/btb/btb_mgsc.cc:226-230
Timestamp: 2026-01-12T02:49:12.886Z
Learning: In src/cpu/pred/btb/btb_mgsc.cc, the calculateScaledPercsum function returns the unscaled percsum (weight scaling disabled) to align with RTL implementation. The RTL currently does not have a weight table, so setting weight to 1 (equivalent to disabling the weight table) maintains hardware/software consistency with minimal performance impact.
Applied to files:
src/cpu/pred/btb/decoupled_bpred.hh
🧬 Code graph analysis (2)
configs/example/idealkmhv3.py (2)
src/cpu/o3/fetch.hh (5)
branchPred(484-484)branchPred(955-955)branchPred(957-957)branchPred(959-959)branchPred(961-961)src/cpu/pred/btb/timed_base_pred.hh (1)
enabled(108-108)
src/cpu/pred/btb/microtage.cc (3)
src/cpu/pred/btb/microtage.hh (36)
MicroTAGE(49-49)MicroTAGE(113-113)MicroTAGE(115-115)entry(88-90)entry(152-152)entry(153-153)entry(154-154)entry(405-408)TageTableInfo(85-85)TageTableInfo(86-87)taken(267-267)TagePrediction(104-104)TagePrediction(105-108)pred(365-365)history(130-130)history(134-135)history(139-142)history(139-139)history(144-148)history(144-145)history(163-163)history(199-199)history(377-377)TageEntry(66-66)TageEntry(68-69)table(422-422)table(423-423)counter(70-72)max(270-270)pc(173-173)pc(176-176)pc(180-180)pc(184-184)pc(187-189)pc(187-187)pc(196-196)src/cpu/pred/BranchPredictor.py (2)
MicroTAGE(1066-1092)TAGE(169-174)src/mem/ruby/common/Histogram.hh (1)
hist(51-51)
🪛 Cppcheck (2.19.0)
src/cpu/pred/btb/microtage.cc
[error] 119-119: Reference to temporary returned.
(returnTempReference)
[error] 132-132: Reference to temporary returned.
(returnTempReference)
[error] 125-125: Shifting 64-bit value by 64 bits is undefined behaviour
(shiftTooManyBits)
[error] 138-138: Shifting 64-bit value by 64 bits is undefined behaviour
(shiftTooManyBits)
🪛 Ruff (0.14.11)
src/cpu/pred/BranchPredictor.py
1072-1072: Param may be undefined, or defined from star imports
(F405)
1073-1073: Param may be undefined, or defined from star imports
(F405)
1074-1074: Param may be undefined, or defined from star imports
(F405)
1076-1076: Param may be undefined, or defined from star imports
(F405)
1077-1077: VectorParam may be undefined, or defined from star imports
(F405)
1078-1078: VectorParam may be undefined, or defined from star imports
(F405)
1079-1079: VectorParam may be undefined, or defined from star imports
(F405)
1080-1080: Param may be undefined, or defined from star imports
(F405)
1082-1082: VectorParam may be undefined, or defined from star imports
(F405)
1083-1083: Param may be undefined, or defined from star imports
(F405)
1084-1084: Param may be undefined, or defined from star imports
(F405)
1085-1085: Param may be undefined, or defined from star imports
(F405)
1086-1086: Param may be undefined, or defined from star imports
(F405)
1087-1087: Param may be undefined, or defined from star imports
(F405)
1088-1088: Param may be undefined, or defined from star imports
(F405)
1089-1089: Param may be undefined, or defined from star imports
(F405)
1090-1090: Param may be undefined, or defined from star imports
(F405)
1091-1091: Param may be undefined, or defined from star imports
(F405)
1092-1092: Param may be undefined, or defined from star imports
(F405)
1182-1182: Param may be undefined, or defined from star imports
(F405)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Quick Build, Unit Tests & Smoke Test
- GitHub Check: perf_test / XS-GEM5 - Run performance test (spec06-0.8c)
🔇 Additional comments (19)
configs/example/idealkmhv3.py (1)
82-82: LGTM!Disabling the microtage predictor in the ideal configuration is consistent with the paired configuration pattern (kmhv3.py enables it). The
enabledparameter is inherited fromTimedBaseBTBPredictor.configs/example/kmhv3.py (1)
94-109: LGTM!The microtage predictor is correctly enabled alongside other predictor components (ubtb, abtb, mbtb, tage, ittage, ras) for the
DecoupledBPUWithBTBconfiguration. The placement follows the existing pattern.src/cpu/pred/btb/decoupled_bpred.hh (2)
26-27: LGTM!The new
microtage.hhinclude is correctly added for the standalone MicroTAGE class.
107-107: LGTM!The member type change from
BTBTAGE*toMicroTAGE*correctly reflects the new standalone MicroTAGE class introduced in this PR.src/cpu/pred/SConscript (2)
52-53: LGTM!
MicroTAGEis correctly registered as a SimObject alongside other BTB-based predictors (BTBTAGE,BTBITTAGE,BTBMGSC).
105-105: LGTM!The new
btb/microtage.ccsource file is correctly added to the build in the appropriate location with other BTB predictor sources.src/cpu/pred/BranchPredictor.py (2)
1182-1182: LGTM!The
microtageparameter is correctly wired usingParam.MicroTAGE(MicroTAGE(), ...), consistent with the new standalone MicroTAGE class.
1066-1092: LGTM! Well-structured MicroTAGE predictor definition.The MicroTAGE class is correctly defined as a smaller, faster variant of BTBTAGE with proper parameter consistency validated in the C++ constructor:
numPredictors = 1with matching single-element vectors (tableSizes,TTagBitSizes,TTagPcShifts,histLengths)- C++ constructor enforces vector length validation via assertions (lines 109, 118, 120, 123 in microtage.cc)
numDelay = 0for low-latency prediction alongside uBTBenableBankConflict = Falseappropriate for single-table predictor (bank simulation logic only active when enabled)src/cpu/pred/btb/microtage.hh (4)
1-26: LGTM - Include guards and conditional includes are well-structured.The header uses proper include guards and clearly separates UNIT_TEST from production dependencies.
56-109: LGTM - Well-documented nested data structures for TAGE prediction.The structs
TageEntry,TageTableInfo, andTagePredictionare clearly documented and properly encapsulate the TAGE prediction state. The counter range (-4 to 3) noted in the comment is appropriately enforced by saturation logic in the implementation.
298-300: Verify queue safety in usage sites.These queues store ahead-of-time folded history snapshots. Ensure all
.front()calls in the implementation are guarded by.empty()checks to avoid undefined behavior.
383-426: LGTM - TageMeta and class members are appropriately structured.The metadata structure properly captures folded history snapshots for update-time index/tag recalculation.
src/cpu/pred/btb/microtage.cc (7)
36-135: LGTM - Constructor initializes predictor tables and folded histories correctly.Both UNIT_TEST and production constructors properly initialize the TAGE tables, index/tag masks, and folded history structures. The assertions ensure parameter vectors have sufficient entries.
192-256: LGTM - TAGE table lookup logic is correct.The search from highest to lowest table, position encoding in tags, and single-match enforcement align with standard TAGE predictor design.
848-871: Potential logic issue: member variables may not be updated as intended.The second loop creates a copy of the folded history vectors:
auto foldedHist = type == 0 ? indexFoldedHist : tagFoldedHist; // copy, not referenceUpdates at line 862 modify this local copy, and while the copy is pushed to the queue, the actual member variables (
indexFoldedHist,tagFoldedHist) are never updated in the second loop.If the intent is to update ahead-of-time history in the queue while keeping current history unchanged until consumed from the queue (in the first loop), this is correct. However, the first loop has an early
breakif the queue is empty, which means on the first invocation, member variables won't be updated at all.Please verify this is the intended behavior for the ahead-history optimization.
608-679: LGTM - Update logic correctly handles prediction state updates and allocation.The update path properly:
- Re-reads providers when
updateOnReadis enabled (for bank conflict simulation)- Falls back to cached predictions otherwise
- Triggers allocation only on mispredictions
487-559: LGTM - Allocation logic follows established TAGE replacement policy.The allocation strategy (prefer invalid/weak ways, apply age penalty, periodic useful-bit reset) aligns with CBP-style TAGE predictors.
783-806: LGTM - Defensive handling of edge cases in position calculation.The function appropriately warns about unusual cases (branch preceding block, exceeding max positions) while ensuring a valid position is always returned.
1056-1093: LGTM - Commit-time statistics tracking is correctly implemented.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
Change-Id: Ia9bcdc028235447e254889d95e5ea98e7f067664
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/cpu/pred/btb/microtage.cc`:
- Around line 858-869: The assert on foldedHistqueue.size() is executed
immediately after foldedHistqueue.push(), which can trigger when the queue was
already at capacity; change the logic in the loop that handles foldedHistqueue
(the block that calls foldedHistqueue.push(), assert(foldedHistqueue.size() <=
2), and foldedHistqueue.pop()) so the overflow is trimmed before asserting:
either call foldedHistqueue.pop() if size > 2 and then perform the assert, or
move the assert to after the pop. Update the sequence around
foldedHistqueue.push()/pop()/assert to ensure the queue is never asserted while
transiently over capacity (refer to foldedHistqueue, push, pop, and the assert).
- Around line 723-751: The mask computation in getTageTag and getTageIndex uses
(1ULL << tableTagBits[t]) and (1ULL << tableIndexBits[t]) which is UB when the
shift equals the word width; replace these with a safe conditional that computes
mask = (bits >= sizeof(Addr)*8 ? ~static_cast<Addr>(0) : ((static_cast<Addr>(1)
<< bits) - 1)), where bits is tableTagBits[t] or tableIndexBits[t] respectively,
so getTageTag and getTageIndex produce correct masks without out-of-range
shifts.
♻️ Duplicate comments (1)
src/cpu/pred/btb/microtage.cc (1)
909-923: Empty-queue recovery returns without restoring history.At Line 910,
recoverPHistreturns early when queues are empty, skipping both restoration frompredMetaand thedoUpdateHistupdate. This can leave folded histories stale after early mispredicts. Consider falling back to the snapshot and still updating.🐛 Proposed fix
std::shared_ptr<TageMeta> predMeta = std::static_pointer_cast<TageMeta>(entry.predMetas[getComponentIdx()]); if (aheadindexFoldedHist.empty() || aheadtagFoldedHist.empty()) { - DPRINTF(TAGE, "recoverPHist: ahead folded history queues are empty, cannot recover\n"); - return; + DPRINTF(TAGE, "recoverPHist: ahead folded history queues are empty, recovering from snapshot\n"); + for (int i = 0; i < numPredictors; i++) { + indexFoldedHist[i].recover(predMeta->indexFoldedHist[i]); + tagFoldedHist[i].recover(predMeta->tagFoldedHist[i]); + } + doUpdateHist(history, cond_taken, entry.getControlPC(), entry.getTakenTarget()); + return; }
| // Calculate TAGE tag with folded history - optimized version using bitwise operations | ||
| Addr | ||
| MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position) | ||
| { | ||
| // Create mask for tableTagBits[t] to limit result size | ||
| Addr mask = (1ULL << tableTagBits[t]) - 1; | ||
|
|
||
| // Extract lower bits of PC directly (remove instruction alignment bits) | ||
| Addr pcBits = (pc >> bankBaseShift) & mask; | ||
|
|
||
| // Extract and prepare folded history bits | ||
| Addr foldedBits = foldedHist & mask; | ||
|
|
||
| // XOR all components together, including position (like RTL) | ||
| return pcBits ^ foldedBits ^ position; | ||
| } | ||
|
|
||
| Addr | ||
| MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist) | ||
| { | ||
| // Create mask for tableIndexBits[t] to limit result size | ||
| Addr mask = (1ULL << tableIndexBits[t]) - 1; | ||
|
|
||
| const unsigned pcShift = enableBankConflict ? indexShift : bankBaseShift; | ||
| Addr pcBits = (pc >> pcShift) & mask; | ||
| Addr foldedBits = foldedHist & mask; | ||
|
|
||
| return pcBits ^ foldedBits; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find the file and check its size
wc -l src/cpu/pred/btb/microtage.ccRepository: OpenXiangShan/GEM5
Length of output: 96
🏁 Script executed:
# Look for Addr typedef definition
rg "typedef.*Addr|using.*Addr" --type cc --type h -B2 -A2Repository: OpenXiangShan/GEM5
Length of output: 88
🏁 Script executed:
# Examine the getTageTag and getTageIndex functions at the flagged lines
sed -n '723,751p' src/cpu/pred/btb/microtage.ccRepository: OpenXiangShan/GEM5
Length of output: 1040
🏁 Script executed:
# Search for tableTagBits and tableIndexBits definition and usage
rg "tableTagBits|tableIndexBits" --type cc --type h -B3 -A3 | head -80Repository: OpenXiangShan/GEM5
Length of output: 88
🏁 Script executed:
# Search for Addr typedef without file type restriction
rg "typedef.*Addr|using.*Addr" -B2 -A2 | head -40Repository: OpenXiangShan/GEM5
Length of output: 2356
🏁 Script executed:
# Find header files in the pred/btb directory
find . -path "*pred/btb*" -name "*.hh" -o -path "*pred/btb*" -name "*.h"Repository: OpenXiangShan/GEM5
Length of output: 839
🏁 Script executed:
# Search for MicroTAGE class definition
rg "class MicroTAGE" -B2 -A10Repository: OpenXiangShan/GEM5
Length of output: 1953
🏁 Script executed:
# Look for tableTagBits and tableIndexBits declarations
rg "tableTagBits\[|tableIndexBits\[" -B3 -A3Repository: OpenXiangShan/GEM5
Length of output: 18657
🏁 Script executed:
# Find Addr type definition
rg "typedef.*Addr|using.*Addr" | grep -v "AddressMap\|AddressIndex\|Addr:" | head -20Repository: OpenXiangShan/GEM5
Length of output: 1866
🏁 Script executed:
# Look for Addr in common headers
find . -path "*/sim/*" -name "*.hh" -o -path "*/base/*" -name "*.hh" | xargs rg "typedef.*Addr[^a-zA-Z]" 2>/dev/null | head -10Repository: OpenXiangShan/GEM5
Length of output: 254
🏁 Script executed:
# Search for ceilLog2 function
rg "ceilLog2" -B5 -A5 | head -60Repository: OpenXiangShan/GEM5
Length of output: 3418
🏁 Script executed:
# Check the BranchPredictor.py for MicroTAGE parameter ranges
sed -n '/class MicroTAGE/,/^class /p' src/cpu/pred/BranchPredictor.py | head -100Repository: OpenXiangShan/GEM5
Length of output: 1867
🏁 Script executed:
# Find ceilLog2 definition
rg "ceilLog2\(" -B3 -A10 --max-count=5Repository: OpenXiangShan/GEM5
Length of output: 36609
🏁 Script executed:
# Check microtage.hh header for member variable types
sed -n '1,150p' src/cpu/pred/btb/microtage.hh | grep -A5 -B5 "tableTagBits\|tableIndexBits"Repository: OpenXiangShan/GEM5
Length of output: 44
🏁 Script executed:
# Look at full constructor to see initialization constraints
sed -n '/MicroTAGE::MicroTAGE/,/^}/p' src/cpu/pred/btb/microtage.cc | head -100Repository: OpenXiangShan/GEM5
Length of output: 3574
🏁 Script executed:
# Check for any bounds checking on tableTagBits or tableIndexBits values
rg "tableTagBits|tableIndexBits" src/cpu/pred/btb/microtage.cc -B2 -A2 | grep -E "assert|if.*<|if.*>|warning|fatal"Repository: OpenXiangShan/GEM5
Length of output: 223
🏁 Script executed:
# Look at the microtage.hh header to see member variable declarations
cat src/cpu/pred/btb/microtage.hh | head -200Repository: OpenXiangShan/GEM5
Length of output: 7396
🏁 Script executed:
# Check actual table size limits from Python config
grep -A30 "class MicroTAGE" src/cpu/pred/BranchPredictor.py | grep -E "tableSizes|TTagBitSizes|blockSize"Repository: OpenXiangShan/GEM5
Length of output: 298
🏁 Script executed:
# Check for any parameter validation or clamping on tableTagBits or tableIndexBits
rg "TTagBitSizes|tableTagBits|tableIndexBits" src/cpu/pred/ -B5 -A5 | grep -E "assert|clamp|min|max|fatal|if.*<|if.*>" | head -30Repository: OpenXiangShan/GEM5
Length of output: 2702
🏁 Script executed:
# Look at the intmath implementation to confirm ceilLog2 behavior
sed -n '/^ceilLog2/,/^}/p' src/base/intmath.hhRepository: OpenXiangShan/GEM5
Length of output: 176
Guard mask generation against 64-bit shifts.
On lines 728 and 744, (1ULL << tableTagBits[t]) and (1ULL << tableIndexBits[t]) trigger undefined behavior when the shift amount equals 64 (the bit width of Addr). While default parameters avoid this, unconstrained table sizes could exceed 2^64, causing the shift to be out-of-range. Implement safe masking.
🐛 Proposed fix
Addr
MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position)
{
// Create mask for tableTagBits[t] to limit result size
- Addr mask = (1ULL << tableTagBits[t]) - 1;
+ const unsigned bits = tableTagBits[t];
+ const unsigned addrBits = sizeof(Addr) * 8;
+ Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1);
@@
Addr
MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist)
{
// Create mask for tableIndexBits[t] to limit result size
- Addr mask = (1ULL << tableIndexBits[t]) - 1;
+ const unsigned bits = tableIndexBits[t];
+ const unsigned addrBits = sizeof(Addr) * 8;
+ Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Calculate TAGE tag with folded history - optimized version using bitwise operations | |
| Addr | |
| MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position) | |
| { | |
| // Create mask for tableTagBits[t] to limit result size | |
| Addr mask = (1ULL << tableTagBits[t]) - 1; | |
| // Extract lower bits of PC directly (remove instruction alignment bits) | |
| Addr pcBits = (pc >> bankBaseShift) & mask; | |
| // Extract and prepare folded history bits | |
| Addr foldedBits = foldedHist & mask; | |
| // XOR all components together, including position (like RTL) | |
| return pcBits ^ foldedBits ^ position; | |
| } | |
| Addr | |
| MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist) | |
| { | |
| // Create mask for tableIndexBits[t] to limit result size | |
| Addr mask = (1ULL << tableIndexBits[t]) - 1; | |
| const unsigned pcShift = enableBankConflict ? indexShift : bankBaseShift; | |
| Addr pcBits = (pc >> pcShift) & mask; | |
| Addr foldedBits = foldedHist & mask; | |
| return pcBits ^ foldedBits; | |
| } | |
| // Calculate TAGE tag with folded history - optimized version using bitwise operations | |
| Addr | |
| MicroTAGE::getTageTag(Addr pc, int t, uint64_t foldedHist, Addr position) | |
| { | |
| // Create mask for tableTagBits[t] to limit result size | |
| const unsigned bits = tableTagBits[t]; | |
| const unsigned addrBits = sizeof(Addr) * 8; | |
| Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1); | |
| // Extract lower bits of PC directly (remove instruction alignment bits) | |
| Addr pcBits = (pc >> bankBaseShift) & mask; | |
| // Extract and prepare folded history bits | |
| Addr foldedBits = foldedHist & mask; | |
| // XOR all components together, including position (like RTL) | |
| return pcBits ^ foldedBits ^ position; | |
| } | |
| Addr | |
| MicroTAGE::getTageIndex(Addr pc, int t, uint64_t foldedHist) | |
| { | |
| // Create mask for tableIndexBits[t] to limit result size | |
| const unsigned bits = tableIndexBits[t]; | |
| const unsigned addrBits = sizeof(Addr) * 8; | |
| Addr mask = (bits >= addrBits) ? ~Addr(0) : ((Addr(1) << bits) - 1); | |
| const unsigned pcShift = enableBankConflict ? indexShift : bankBaseShift; | |
| Addr pcBits = (pc >> pcShift) & mask; | |
| Addr foldedBits = foldedHist & mask; | |
| return pcBits ^ foldedBits; | |
| } |
🤖 Prompt for AI Agents
In `@src/cpu/pred/btb/microtage.cc` around lines 723 - 751, The mask computation
in getTageTag and getTageIndex uses (1ULL << tableTagBits[t]) and (1ULL <<
tableIndexBits[t]) which is UB when the shift equals the word width; replace
these with a safe conditional that computes mask = (bits >= sizeof(Addr)*8 ?
~static_cast<Addr>(0) : ((static_cast<Addr>(1) << bits) - 1)), where bits is
tableTagBits[t] or tableIndexBits[t] respectively, so getTageTag and
getTageIndex produce correct masks without out-of-range shifts.
| for (int type = 0; type < 2; type++) { | ||
| auto foldedHist = type == 0 ? indexFoldedHist : tagFoldedHist; | ||
| auto &foldedHistqueue = type == 0 ? aheadindexFoldedHist : aheadtagFoldedHist; | ||
| for (int t = 0; t < numPredictors; t++) { | ||
| foldedHist[t].update(history, 2, taken, pc, target); | ||
| DPRINTF(TAGEHistory, "t: %d, type: %d, foldedHist _folded 0x%lx\n", t, type, foldedHist[t].get()); | ||
| } | ||
| foldedHistqueue.push(foldedHist); | ||
| assert(foldedHistqueue.size() <= 2); | ||
| if (foldedHistqueue.size() > 2) { | ||
| foldedHistqueue.pop(); | ||
| } |
There was a problem hiding this comment.
Queue-size assert fires before overflow is trimmed.
Line 866 asserts size <= 2 immediately after push. If the queue is already size 2, this will trip before the pop executes. Move the assert after the pop (or pop first).
🐛 Proposed fix
for (int t = 0; t < numPredictors; t++) {
foldedHist[t].update(history, 2, taken, pc, target);
DPRINTF(TAGEHistory, "t: %d, type: %d, foldedHist _folded 0x%lx\n", t, type, foldedHist[t].get());
}
foldedHistqueue.push(foldedHist);
- assert(foldedHistqueue.size() <= 2);
if (foldedHistqueue.size() > 2) {
foldedHistqueue.pop();
}
+ assert(foldedHistqueue.size() <= 2);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for (int type = 0; type < 2; type++) { | |
| auto foldedHist = type == 0 ? indexFoldedHist : tagFoldedHist; | |
| auto &foldedHistqueue = type == 0 ? aheadindexFoldedHist : aheadtagFoldedHist; | |
| for (int t = 0; t < numPredictors; t++) { | |
| foldedHist[t].update(history, 2, taken, pc, target); | |
| DPRINTF(TAGEHistory, "t: %d, type: %d, foldedHist _folded 0x%lx\n", t, type, foldedHist[t].get()); | |
| } | |
| foldedHistqueue.push(foldedHist); | |
| assert(foldedHistqueue.size() <= 2); | |
| if (foldedHistqueue.size() > 2) { | |
| foldedHistqueue.pop(); | |
| } | |
| for (int type = 0; type < 2; type++) { | |
| auto foldedHist = type == 0 ? indexFoldedHist : tagFoldedHist; | |
| auto &foldedHistqueue = type == 0 ? aheadindexFoldedHist : aheadtagFoldedHist; | |
| for (int t = 0; t < numPredictors; t++) { | |
| foldedHist[t].update(history, 2, taken, pc, target); | |
| DPRINTF(TAGEHistory, "t: %d, type: %d, foldedHist _folded 0x%lx\n", t, type, foldedHist[t].get()); | |
| } | |
| foldedHistqueue.push(foldedHist); | |
| if (foldedHistqueue.size() > 2) { | |
| foldedHistqueue.pop(); | |
| } | |
| assert(foldedHistqueue.size() <= 2); |
🤖 Prompt for AI Agents
In `@src/cpu/pred/btb/microtage.cc` around lines 858 - 869, The assert on
foldedHistqueue.size() is executed immediately after foldedHistqueue.push(),
which can trigger when the queue was already at capacity; change the logic in
the loop that handles foldedHistqueue (the block that calls
foldedHistqueue.push(), assert(foldedHistqueue.size() <= 2), and
foldedHistqueue.pop()) so the overflow is trimmed before asserting: either call
foldedHistqueue.pop() if size > 2 and then perform the assert, or move the
assert to after the pop. Update the sequence around
foldedHistqueue.push()/pop()/assert to ensure the queue is never asserted while
transiently over capacity (refer to foldedHistqueue, push, pop, and the assert).
Change-Id: I56614e8ebc2dd33320d353562087ab456fc452da
|
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
[Generated by GEM5 Performance Robot] Ideal BTB PerformanceOverall Score
|
Summary by CodeRabbit
New Features
Chores
Configuration
✏️ Tip: You can customize this high-level summary in your review settings.