Conversation
conditional missprediction see a significant increase after this commit, it might be that TAGE's history are affected by two-taken and thus needs fixing Change-Id: Ic4d0ea2cb7b05ed466e91ae7a606f6e8c376fe7d
this commit is based on the BPU tick implemented in the previous commit: "functional implementation of 2taken". Unlike the previous commit where 2 ubtbs are combined to produce 2 taken predictions, in this commit we modified our ubtb entry format as well as prediction/ training logic to support two taken. Change-Id: Ie9802a231f0e3607f49edb2415a040934a0beef3
This commit introduces new statistics to monitor the performance of 2taken: 1. tracks how many 1-taken and 2-taken are produced in BPU pipeline 2. control squashes caused by the second predictions in 2taken. 3. tracks how many second predictions are commited
this commit clears the ubtb during squashing, we aim to fix the pattern where there's a wrong 2nd branch in a ubtb entry that couldn't be replaced for a long time Change-Id: I9aa98da029f8ebf5890c78628b37421e66fa9f45
This commit ensure that the return address stack (ras) is updated when skipping the predictor update for the second prediction of a 2taken-pair. This change ensures the internal state of RAS is consistent and synchronized. (RAS tracks two stacks: one speculative, another commited) Change-Id: I9d719af64e6291
remember to delete this commit
This commit tracks the reason we can or cannot train a two taken entry in ubtb Change-Id: I5565927502b6af4a687d3d0b9025c74283ffb0b4
Change-Id: I8b38a9daa93de81ce056129e2f1e468d421f0a49
Implement optimization to allow MBTB to update 2nd prediction fetch blocks. Previously, always-taken conditional branches in the 2nd prediction of a 2-taken pair would suffer performance degradation when becoming bi-directional, as the first "not taken" squash wouldn't update MBTB due to missing mBTB meta, requiring multiple squashes before proper training. Solution: for 2nd pred of a 2taken pair, have uBTB generate and store a copy of BTBMeta during getTwoTakenPrediction(). DecoupledBPU now retrieves this stored meta when creating fetch stream entries for the second predictions, enabling immediate MBTB updates when the fetch block eventually get commited. Changes: - Add BTBMeta storage and retrieval to uBTB for second predictions - Integrate meta generation into getTwoTakenPrediction() flow - Modify DecoupledBPU to use uBTB's stored meta for MBTB's predMeta - Maintain backward compatibility for all other prediction scenarios This reduces squashes and improves performance for workloads with conditional branches in 2-taken prediction patterns. Change-Id: Ifb91988bce44a5e5d618b597c82babdbd6b54f96
* ABTB (the ahead-pipelined BTB) assumes that `putPCHistory()` is called once per consecutive fetch block (A, B, C, …). * With 2-taken the uBTB returns two blocks (A and B) in a single predictor tick. We therefore only call `putPCHistory()` once (for A). * For ABTB this looks like the sequence A → C, breaking the ahead-pipeline queue (aheadReadBtbEntries) and tanking its hit-rate in this commit, we Keep ABTB's *consecutive* invariant by silently pushing B into its ahead-read queue, but without asking it to produce a real prediction in the current cycle. Change-Id: I8a643866287f22575f38624594d2901ba20afc78 Result: ABTB’s ahead-pipeline queue remains consecutive even when uBTB emits two fetch blocks, recovering its hit-rate without affecting non-ABTB paths.
before this commit, the hisotry info of a S3 pred is directly trained into ubtb and its correctness will not be checked when trained with a new S3 pred, this results in a massive increase in intra flush caused by history info mismatch. Now, evertime a new S3pred gets trained into the ubtb, we check if the existing entry has the right amount of numNTCond, which is ubtb's way of storing the histInfo. Change-Id: I88fb40b8efd7e7a5702b4dc915ce5a210121396e
Change-Id: I401fc3bda70847d0ea9c4e31caa4f76019ce63ed
Change-Id: Ieae48d4fdee30e62cdca45d1d0eec6fe9a47e56f
Change-Id: Iafc909f31c2c3ec4dce487ccb443860e1c04d58d
Change-Id: I4d3c7561bb74c43dcbfb312bf9e8cc59fbbb0272
Change-Id: Ic6e6a1d47dc1ed60266007b53d53f9692356e984
This reverts commit 2b26c52.
Change-Id: I2a4b9b4530c740883b6c11a7ffd11e9fa5d14044
Change-Id: Ifa2218af9a88b1200606c909383d9ccd2a55f809
📝 WalkthroughWalkthroughThis PR introduces a comprehensive 2-taken branch prediction feature for GEM5's BTB-based decoupled BPU, enabling dual-prediction generation with dedicated second-prediction metadata, training workflows, and state machine updates. Workflow configurations are updated to target the Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Fetch as Fetch Engine
participant DecoupledBPU as DecoupledBPU<br/>(2-Taken Enabled)
participant UBTB as UBTB
participant BTB as BTB/Metadata<br/>(TAGE, RAS, etc.)
participant Training as Training<br/>Unit
Fetch->>DecoupledBPU: requestNewPrediction(PC)
activate DecoupledBPU
DecoupledBPU->>UBTB: putPCHistory2Taken(startAddr, history)
activate UBTB
UBTB->>UBTB: Predict first fetch block (S0)
UBTB->>UBTB: Predict second fetch block (S3)
UBTB-->>DecoupledBPU: (ubtbHitIndex, hasSecondPred)
deactivate UBTB
DecoupledBPU->>BTB: getSecondPredictionMeta()
activate BTB
BTB-->>DecoupledBPU: metaSnapshot (TAGE/RAS/ITTAGE)
deactivate BTB
DecoupledBPU->>DecoupledBPU: generateFinalPred() + Second Pred
DecoupledBPU->>DecoupledBPU: Create FSQ entries (first, second)
DecoupledBPU-->>Fetch: Enqueue predictions
deactivate DecoupledBPU
Fetch->>Fetch: Execute & retire
Fetch->>Training: commit(firstOutcome, secondOutcome)
activate Training
Training->>Training: Validate 2-taken conditions
Training->>UBTB: train2Taken(dff_pred, s3_pred, index)
activate UBTB
UBTB->>UBTB: Update entry or replace
UBTB->>UBTB: Update second prediction meta
UBTB-->>Training: Training complete
deactivate UBTB
Training->>BTB: train(meta, outcome)
activate BTB
BTB->>BTB: Update tables (TAGE, RAS)
deactivate BTB
deactivate Training
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1564af5987
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Generate new BTB entry for MBTB | ||
| btb->getAndSetNewBTBEntry(stream); | ||
|
|
||
| // Update only MBTB component | ||
| btb->update(stream); |
There was a problem hiding this comment.
Skip MBTB update on fall-through second predictions
When the second FB is a fall‑through (uBTB pt_2nd=false), finalPred.btbEntries is empty and stream.exeTaken is false. This block still calls DefaultBTB::getAndSetNewBTBEntry/update, which leaves updateNewBTBEntry as the default invalid entry and collectEntriesToUpdate then pushes it, potentially evicting a valid BTB entry and inserting an invalid one (pc=0). This corrupts MBTB state on every fall‑through second prediction. Consider guarding this path with stream.exeTaken or !stream.predBTBEntries.empty() similar to the normal update path.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI Agents
In @configs/example/xiangshan.py:
- Line 385: The config uses two different attribute names for the same
feature—cpu.ftb.enableTwoTaken and cpu.branchPred.enable2Taken—causing
inconsistency; pick one canonical name (e.g., enableTwoTaken) and rename the
other to match across both classes (update cpu.branchPred.enable2Taken ->
cpu.branchPred.enableTwoTaken or vice versa), then update any code references
(including the assignment using args.disable_2taken and any docs/tests) to the
chosen name so all consumers use the same attribute.
In @util/xs_scripts/kmh_v3_btb_2taken.sh:
- Around line 6-10: The loop currently checks GCBV_REF_SO, GCB_RESTORER and
gem5_home but the script later uses $gem5 and never uses GCBV_REF_SO; update the
variable validation loop to remove GCBV_REF_SO and include gem5 so that
checkForVariable is invoked for gem5 (e.g., for var in gem5 GCB_RESTORER
gem5_home; do checkForVariable $var; done) ensuring $gem5 is validated before
the $gem5 invocation in the call to xiangshan.py.
🧹 Nitpick comments (8)
src/cpu/pred/btb/btb.hh (1)
70-71: Verify that friend access is necessary.The friend declaration allows UBTB to access DefaultBTB's private members, which breaks encapsulation. While the comment indicates this is needed for second prediction support, consider whether:
- The required data could be exposed through a public accessor method
- A protected member or inheritance relationship would be more appropriate
- The tight coupling is justified for performance reasons
If friend access is truly necessary, the current implementation is acceptable.
src/cpu/pred/btb/btb.cc (1)
954-981:preloadBlockcorrectly mirrors ahead‑pipeline queue semantics, but shares logic withlookupSingleBlockThe helper cleanly enqueues
(block_pc, idx, set)intoaheadReadBtbEntrieswith the same size discipline (>= aheadPipelinedStages+1then pop) and guard conditions as the ahead‑pipelined path, without doing tag comparison or stats, which is what ABTB needs for 2‑taken padding.To reduce future drift, consider factoring the “read set + queue push + trim” portion into a small internal helper used both by
lookupSingleBlockandpreloadBlock. Also worth double‑checking that having bothlookupSingleBlockandpreloadBlockas producers cannot overfill or misalign the ahead‑pipeline window under 2‑taken traffic.docs/Gem5_Docs/frontend/2-Taken_代码实现指南.md (1)
135-151: Minor markdownlint issues: bold lines used as headingsThe sections like “情况1:pt_2nd=true(有分支)”, “情况2:pt_2nd=false(顺序执行)”, and the later “扩展2-taken从连续taken分支到包含顺序执行情况” are flagged by markdownlint (MD036) for using emphasis instead of actual headings.
If you care about a clean lint run, consider switching these to proper headings (e.g.
### 情况1:...) or disabling MD036 for this file.Also applies to: 572-590
src/cpu/pred/btb/decoupled_bpred.cc (2)
700-752: uBTB 2‑taken request flow: semantics are clear, naming of hit/miss counters may merit a quick checkThe uBTB-specific branch in
requestNewPrediction()correctly:
- Initializes all per-cycle second‑prediction state.
- Calls
ubtb->putPCHistory2Taken()only whenenable2Takenis true.- Stores the returned
hitIndexintoubtbHitIndexfor later DFF/training use.- Treats
secondAvailableas the “2‑taken produced” condition, settinghasSecondPredictionand bumpingtwoTakenHitvstwoTakenMiss.One small thing to double‑check:
twoTakenHit/twoTakenMissare currently keyed on whether a second prediction is produced by uBTB, not whether the combined 2‑taken pair is ultimately correct at commit. If your intent is “uBTB second‑lane availability rate,” this is fine; if you meant “2‑taken correctness,” you may want to clarify naming or split stats.
2248-2268: Second‑FB validation logic matches the design, but stats interpretation should be confirmed
validateSecondFBPrediction()enforces the key invariant:
- If
finalPred.predSource != 0, the uBTB stage‑0 prediction has been overridden and any uBTB‑generated second prediction is invalidated;hasSecondPredictionis cleared andtwoTakenDiscardedByOverrideincremented.- Otherwise,
twoTakenRemainsAfterOverrideis incremented.This is exactly the intended “second valid only if first from uBTB1 survives override” check. Just ensure downstream consumers interpret
twoTakenRemainsAfterOverrideas “2‑taken pairs where stage 0 still won” rather than “correct 2‑taken pairs,” since correctness is decided later at commit.src/cpu/pred/btb/decoupled_bpred.hh (1)
82-117: Unused private declarationupdate2TakenEntrycan be droppedThe header declares:
void update2TakenEntry(Addr prevAddr, const FullBTBPrediction& dff_pred, const FullBTBPrediction& s3_pred);but there is no corresponding definition or call site in the
.cc. This is harmless (no linker issue since it’s never odr‑used) but adds noise.Consider removing the declaration to keep the class interface minimal, or re‑introducing the helper if you plan to factor more 2‑taken logic into it later.
Also applies to: 175-181, 204-217, 339-417, 878-908
src/cpu/pred/btb/btb_ubtb.cc (2)
449-655:trainCommonencapsulates 1‑/2‑taken update behavior well; duplicate‑tag early return deserves a quick design checkThe consolidated training routine handles:
Hit cases:
trainHitFallThru: S0 hit, S3 fallthrough → decrementuctr, possibly invalidate the entry (and clearvalid_2nd) whenuctrreaches 0.trainHitMismatch: S0 hit, S3 taken but mismatch in (pc, target, numNTConds);uctris decremented and, when reaching 0,replaceEntryis called and an optional secondPred is attached.trainHitMatch: S0 and S3 agree →uctrincremented and optional secondPred added.Miss cases:
trainMissTaken: S0 miss, S3 taken → either fill an invalid slot or choose the LRU entry, thenreplaceEntryand maybe attach secondPred.trainMissFallThru: S0 miss, S3 fallthrough → no action.The only subtle point is the duplicate check in the miss path:
for (size_t i = 0; i < ubtb.size(); ++i) { if (ubtb[i].tag == getTag(pred.bbStart)) { ubtbStats.trainDuplicateEntry++; return; } }This ignores
ubtb[i].valid, so an entry that was previously invalidated (valid==false but tag left intact) will prevent re‑insertion for that tag. If the intent is “never resurrect entries once we decided they’re not worth caching,” this is fine; if you expect uBTB to relearn entries for branches that become hot again, you probably want to gate this onubtb[i].valid.Possible adjustment if relearning invalidated entries is desired
- for (size_t i = 0; i < ubtb.size(); ++i) { - if (ubtb[i].tag == getTag(pred.bbStart)) { + for (size_t i = 0; i < ubtb.size(); ++i) { + if (ubtb[i].valid && ubtb[i].tag == getTag(pred.bbStart)) { ubtbStats.trainDuplicateEntry++; return; } }
686-700:recoverHist’s global clearing ofvalid_2ndon second‑FB squash is aggressive but consistentThe new
recoverHistoverride:if (entry.isSecondFBPred) { for (auto &entry : ubtb) { entry.valid_2nd = false; } }implements a simple policy: any squash of a second‑FB stream clears all stored second‑branch info in the uBTB. This fits the stated goal of removing “persistently wrong” second predictions without touching the 1‑taken entries.
If you ever want more granular behavior (e.g., only clear second info for the entry that generated this FB), you could plumb the uBTB hit index through the meta and target only that entry, but the current approach is functionally safe and easy to reason about.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
.github/workflows/gem5-ideal-btb-perf-2taken.yml.github/workflows/gem5-ideal-btb-perf-weekly.yml.github/workflows/gem5-ideal-btb-perf.yml.github/workflows/gem5-ideal-rvv-simple-perf.yml.github/workflows/gem5-vector.yml.github/workflows/gem5.ymlconfigs/common/Options.pyconfigs/example/xiangshan.pydocs/Gem5_Docs/frontend/2-Taken_代码实现指南.mdsrc/cpu/pred/BranchPredictor.pysrc/cpu/pred/btb/btb.ccsrc/cpu/pred/btb/btb.hhsrc/cpu/pred/btb/btb_ittage.ccsrc/cpu/pred/btb/btb_ittage.hhsrc/cpu/pred/btb/btb_mgsc.ccsrc/cpu/pred/btb/btb_mgsc.hhsrc/cpu/pred/btb/btb_tage.ccsrc/cpu/pred/btb/btb_tage.hhsrc/cpu/pred/btb/btb_ubtb.ccsrc/cpu/pred/btb/btb_ubtb.hhsrc/cpu/pred/btb/decoupled_bpred.ccsrc/cpu/pred/btb/decoupled_bpred.hhsrc/cpu/pred/btb/ras.ccsrc/cpu/pred/btb/ras.hhsrc/cpu/pred/btb/stream_struct.hhsrc/cpu/pred/btb/timed_base_pred.hhutil/xs_scripts/kmh_v3_btb.shutil/xs_scripts/kmh_v3_btb_2taken.sh
💤 Files with no reviewable changes (4)
- .github/workflows/gem5-ideal-btb-perf-weekly.yml
- .github/workflows/gem5.yml
- .github/workflows/gem5-vector.yml
- .github/workflows/gem5-ideal-rvv-simple-perf.yml
🧰 Additional context used
🧬 Code graph analysis (6)
src/cpu/pred/btb/btb_tage.cc (3)
src/cpu/pred/btb/btb_ittage.cc (2)
getSecondPredictionMeta(191-200)getSecondPredictionMeta(192-192)src/cpu/pred/btb/btb_mgsc.cc (2)
getSecondPredictionMeta(484-495)getSecondPredictionMeta(485-485)src/cpu/pred/btb/ras.cc (2)
getSecondPredictionMeta(81-92)getSecondPredictionMeta(82-82)
src/cpu/pred/btb/ras.cc (4)
src/cpu/pred/btb/btb_ittage.cc (2)
getSecondPredictionMeta(191-200)getSecondPredictionMeta(192-192)src/cpu/pred/btb/btb_mgsc.cc (2)
getSecondPredictionMeta(484-495)getSecondPredictionMeta(485-485)src/cpu/pred/btb/btb_tage.cc (2)
getSecondPredictionMeta(306-315)getSecondPredictionMeta(307-307)src/cpu/pred/ftb/ras.cc (2)
getTop(317-341)getTop(318-318)
src/cpu/pred/btb/btb_ittage.cc (3)
src/cpu/pred/btb/btb_mgsc.cc (2)
getSecondPredictionMeta(484-495)getSecondPredictionMeta(485-485)src/cpu/pred/btb/btb_tage.cc (2)
getSecondPredictionMeta(306-315)getSecondPredictionMeta(307-307)src/cpu/pred/btb/ras.cc (2)
getSecondPredictionMeta(81-92)getSecondPredictionMeta(82-82)
src/cpu/pred/btb/decoupled_bpred.hh (4)
src/cpu/pred/btb/btb_ubtb.hh (3)
dff_pred(163-165)stream(170-170)stream(176-176)src/cpu/pred/btb/fetch_target_queue.cc (2)
enqueue(216-223)enqueue(217-217)src/cpu/pred/btb/btb.hh (6)
stream(151-151)stream(159-159)stream(161-161)stream(276-276)stream(282-282)stream(288-289)src/cpu/pred/btb/test/decoupled_bpred.hh (4)
stream(134-134)stream(410-416)stream(432-432)stream(433-433)
src/cpu/pred/btb/btb_ubtb.hh (2)
src/cpu/pred/btb/btb.hh (4)
e(168-172)e(168-168)e(174-176)e(174-174)src/cpu/pred/btb/stream_struct.hh (1)
entry(464-486)
src/cpu/pred/btb/btb_ubtb.cc (2)
src/cpu/pred/btb/btb_ubtb.hh (15)
branchInfo(306-306)prediction(351-351)entry(292-292)entry(298-299)UBTB(82-82)startAddr(129-130)startAddr(140-143)startAddr(286-286)history(193-194)history(197-197)history(197-197)TickedUBTBEntry(106-107)TickedUBTBEntry(108-109)secondPred(357-357)branch_info_2nd(345-345)src/cpu/pred/btb/stream_struct.hh (11)
entry(464-486)predictWidth(402-414)predictWidth(402-402)predictWidth(492-495)predictWidth(492-492)predictWidth(497-519)predictWidth(497-497)predictWidth(521-527)predictWidth(521-521)BTBEntry(169-169)BTBEntry(170-170)
🪛 Cppcheck (2.19.0)
src/cpu/pred/btb/btb_ubtb.cc
[error] 638-638: Iterators of different containers 'mruList' and 'ubtb' are used together.
(mismatchingContainers)
🪛 markdownlint-cli2 (0.18.1)
docs/Gem5_Docs/frontend/2-Taken_代码实现指南.md
135-135: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
151-151: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
572-572: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🪛 Ruff (0.14.10)
src/cpu/pred/BranchPredictor.py
1158-1158: Param may be undefined, or defined from star imports
(F405)
🪛 Shellcheck (0.11.0)
util/xs_scripts/kmh_v3_btb.sh
[warning] 10-10: gem5 is referenced but not assigned.
(SC2154)
[warning] 10-10: gem5_home is referenced but not assigned.
(SC2154)
util/xs_scripts/kmh_v3_btb_2taken.sh
[warning] 10-10: gem5 is referenced but not assigned.
(SC2154)
[warning] 10-10: gem5_home is referenced but not assigned.
(SC2154)
🔇 Additional comments (30)
.github/workflows/gem5-ideal-btb-perf.yml (1)
5-11: Verify the workflow configuration aligns with testing intent.This workflow now triggers on the
2-taken-v8branch but invokeskmh_v3_btb.sh, which now includes the--disable-2takenflag (as seen in the script changes). This means the workflow runs performance tests with the 2-taken feature disabled on a branch specifically for 2-taken development.Confirm whether this is intentional (e.g., for baseline comparison) or if this workflow should instead use
kmh_v3_btb_2taken.shto test with the feature enabled.
util/xs_scripts/kmh_v3_btb.sh (1)
10-10: LGTM!The addition of the
--disable-2takenflag is correct and provides a mechanism to disable the 2-taken feature when using this script. The static analysis warnings aboutgem5andgem5_homeare false positives—these variables are sourced fromcommon.shat line 4.
src/cpu/pred/btb/btb.hh (1)
199-199: LGTM!The
preloadBlockmethod declaration is appropriate for supporting ahead-pipelined BTB variants as part of the 2-taken feature. The method signature is clean and its purpose is clear from the context.
src/cpu/pred/BranchPredictor.py (1)
1158-1158: LGTM!The
enable2Takenparameter addition is well-structured:
- Default value of
Falseensures backward compatibility- Clear description
- Follows existing parameter patterns in the DecoupledBPUWithBTB class
The static analysis warning about
Paramis a false positive—it's imported viafrom m5.params import *at line 29.
src/cpu/pred/btb/btb_mgsc.hh (1)
152-152: LGTM!The
getSecondPredictionMeta()method declaration correctly mirrors the existinggetPredictionMeta()pattern at line 151. The signature is consistent with similar additions across other BTB components (UBTB, BTBTAGE, BTBITTAGE, BTBRAS) as noted in the AI summary, ensuring a uniform interface for second-prediction metadata retrieval.
src/cpu/pred/btb/btb_ittage.cc (1)
191-200: LGTM! Consistent implementation of second prediction metadata.The
getSecondPredictionMeta()method correctly:
- Creates an independent snapshot of the ITTAGE predictor state
- Captures all necessary folded history vectors for second prediction
- Follows the same pattern used across other BTB components (BTBTAGE, BTBRAS, BTBMGSC)
This allows the second prediction path to maintain its own metadata without interfering with the primary prediction flow.
src/cpu/pred/btb/stream_struct.hh (1)
303-303: LGTM! Appropriate addition of second prediction flag.The new
isSecondFBPredmember:
- Uses in-class initialization with a sensible default (false)
- Follows modern C++ best practices
- Is appropriately placed among other prediction-related state
This flag enables the system to distinguish between primary and second fetch-by-prediction streams in the 2-taken workflow.
src/cpu/pred/btb/ras.hh (1)
71-71: LGTM! Consistent interface addition for RAS second prediction.The
getSecondPredictionMeta()declaration:
- Properly overrides the base class virtual method
- Maintains consistent interface across all BTB predictor components (BTBITTAGE, BTBTAGE, BTBMGSC, BTBRAS)
- Is logically positioned alongside the primary
getPredictionMeta()methodThis enables the RAS to provide independent metadata snapshots for second predictions in the 2-taken workflow.
.github/workflows/gem5-ideal-btb-perf-2taken.yml (1)
1-12: LGTM! Workflow configuration aligns with 2-taken feature.The workflow has been appropriately updated to:
- Test the 2-taken branch prediction feature
- Target the correct branch (2-taken-v8)
- Use the dedicated 2-taken test script
The script path is correctly set and the referenced script exists at
util/xs_scripts/kmh_v3_btb_2taken.sh. The relative path../kmh_v3_btb_2taken.shresolves correctly when executed from the test subdirectory, consistent with existing performance test workflows.
configs/common/Options.py (1)
276-277: LGTM!The new
--disable-2takenoption follows the established pattern of similar options like--disable-scand is well-documented. The default value ofFalsecorrectly enables the 2-taken feature by default for DecoupledBPUWithBTB.
src/cpu/pred/btb/btb_tage.cc (1)
306-315: LGTM!The implementation correctly creates an independent metadata snapshot for the second prediction, mirroring the pattern used in
getPredictionMeta()and consistent with similar implementations inBTBITTAGE,BTBMGSC, andBTBRAS(as shown in the relevant code snippets).
src/cpu/pred/btb/timed_base_pred.hh (1)
42-42: LGTM!The new virtual method
getSecondPredictionMeta()is correctly added to the base class interface with a sensible default implementation returningnullptr. This allows derived classes to optionally provide second-prediction metadata support without breaking existing implementations.
src/cpu/pred/btb/btb_mgsc.cc (1)
484-495: LGTM!The implementation correctly snapshots all five folded history states (
indexBwFoldedHist,indexLFoldedHist,indexIFoldedHist,indexGFoldedHist,indexPFoldedHist) for the second prediction metadata, consistent with the meta initialization inputPCHistory()(lines 463-468) and the pattern established in other BTB predictors.
src/cpu/pred/btb/btb_tage.hh (1)
114-114: LGTM!The declaration correctly uses the
overridespecifier and is appropriately placed alongsidegetPredictionMeta().
src/cpu/pred/btb/btb_ittage.hh (1)
102-102: LGTM!The declaration is consistent with the pattern used across other BTB predictor headers (
btb_tage.hh,btb_mgsc.hh) and properly uses theoverridespecifier.
src/cpu/pred/btb/ras.cc (1)
81-92: Second‑prediction RAS meta snapshot is consistent with existing primary meta flowCapturing
ssp/sctr/TOSR/TOSWplusgetTop().retAddrinto a freshRASMetacleanly mirrorsgetTop_meta()without side effects. This looks correct for second‑prediction recovery and keeps the primarymetaobject untouched.
src/cpu/pred/btb/decoupled_bpred.cc (6)
482-570: 2‑taken statistics wiring and formulas look coherentThe DBPBTBStats extensions (indirectNum/IndirectMiss, predProduce{1,2}Taken, secondPredCommitted, 2‑taken hit/miss/override counters, totalPredCount, and the four ratio formulas) are internally consistent:
totalPredCountis incremented once per prediction cycle intick().predProduce2TakenvspredProduce1Takenreflect whetherhasSecondPredictionsurvived validation.- Hit/miss/remains/discarded counters are updated in
requestNewPrediction()andvalidateSecondFBPrediction().- Ratios (
predTwoTakenRatio,commitSecondPredRatio,twoTakenHitRatio,twoTakenRemainsRatio) are simple formula combinations over these base counters.Stats infra will handle zero denominators gracefully. No functional issues here.
586-648: 2‑taken tick() control flow and DFF usage are logically consistentThe new tick() sequence when idle and FSQ not full:
- Increments
totalPredCount.- Calls
requestNewPrediction()(which may populate a second prediction via uBTB).- Calls
trainUbtbFor2Taken(), which consumes the previous cycle’spredDFF(if valid) and trains uBTB in 1‑taken or 2‑taken mode.- Stores the current S3 prediction and
ubtbHitIndexintopredDFFbefore clearingpredsOfEachStage, ensuring the next cycle has a valid DFF input.- Generates
finalPred, computes bubbles, clears stage preds, and validates the second FB.- Optionally invokes
abtb->preloadBlock(secondPrediction.bbStart)only when the second prediction survives override.This ordering achieves the intended “train with previous S3, then snapshot current S3 for next cycle” behavior. The additional sampling of
fsqEntryDistis harmless and useful.
1895-1942: Second‑prediction metadata wiring into FetchStream looks correct and avoids training meta misuse
createFetchStreamEntry(bool is_second_pred)now:
- Marks
entry.isSecondFBPred = is_second_pred.- For first predictions, uses each component’s
getPredictionMeta()as before.- For second predictions:
- For MBTB (
components[i] == btb), injectsubtb->getSecondPredictionMetaForMBTB(), which is built in the 2‑taken uBTB path.- For all other components, calls
getSecondPredictionMeta(), which each predictor specializes only with the recovery‑relevant state.This separation between first vs second prediction metas is exactly what you need to keep training paths for second FBs restricted to RAS+MBTB and to avoid dereferencing inappropriate meta structures elsewhere. The
isSecondFBPredflag then correctly drives selective update and commit behavior downstream.
1177-1199: Selective update path for second FB predictions keeps training scoped to RAS/MBTBThe new branch in
update():
- For primary (non‑second) streams, continues to call
updatePredictorComponents(stream)unchanged.- For
stream.isSecondFBPred, performs:
ras->update(stream);stream.setUpdateInstEndPC(predictWidth);stream.setUpdateBTBEntries();btb->getAndSetNewBTBEntry(stream);btb->update(stream);and then increments
secondPredCommitted.This matches the documented design: second predictions only influence RAS state and MBTB’s always‑taken management, while leaving uBTB, TAGE, ITTAGE, MGSC, etc., untouched. The use of the MBTB meta seeded from uBTB in
createFetchStreamEntry(true)makes this safe.
1482-1488: Guarding componentcommitBranchcalls for second FB streams avoids invalid meta useThe additional guard:
if (!entry.isSecondFBPred) { for (auto &component : components) { component->commitBranch(entry, inst); } }prevents calling
commitBranchon predictors with second‑prediction metas that intentionally lack full training information. This is a good defensive change and aligns with the 2‑taken design where only RAS/MBTB are updated on second FB commits (handled separately inupdate()).
2225-2244:trainUbtbFor2Takencleanly encapsulates 1‑taken vs 2‑taken training modesThe new helper:
- Uses the last‑stage prediction
s3_predas the in‑cycle “ground truth”.- In 2‑taken mode, only trains when
predDFF.validis set, callingubtb->train2Taken(predDFF.prevS3Pred, s3_pred, predDFF.prevUbtbHitIndex);.- In 1‑taken mode, falls back to
ubtb->train1Taken(s3_pred);.- Always resets
predDFFat the end.This keeps the cross‑cycle DFF bookkeeping localized and keeps the main
tick()logic readable. The intentional “no training when enable2Taken && !predDFF.valid” behavior just means you skip the very first block after reset/squash, which is acceptable.
src/cpu/pred/btb/btb_ubtb.hh (1)
84-111: uBTB header extensions for 2‑taken support are well‑scoped and backward‑compatible
- Extending
TickedUBTBEntrywithvalid_2nd,pt_2nd, andbranch_info_2ndplus properly initializing them in both constructors keeps existing uses intact while adding second‑branch capacity.- The new APIs (
putPCHistory2Taken,train1Taken,train2Taken,getSecondPredictionMetaForMBTB,recoverHist) and their helpers cleanly separate 1‑taken logic from 2‑taken logic, and expose only what decoupled_bpred needs for second‑prediction and MBTB meta plumbing.- UBTBStats additions cover both condition‑checking and training scenarios with a single formula (
twoTakenTrainSuccessfulRatio) derived from the granular counters.Header and implementation remain in sync, with no ABI break for existing users beyond the new 2‑taken surface.
Also applies to: 132-144, 186-213, 224-253, 301-358, 419-449
src/cpu/pred/btb/btb_ubtb.cc (7)
146-205: Second‑prediction construction helpers align with FullBTBPrediction semantics
fillSecondPredictionandfillSecondPredictionFallthroughcorrectly:
- Initialize a fresh
FullBTBPrediction(clearingbtbEntries,condTakens, andindirectTargets).- Set
bbStartto the second FB’s start andpredSourceto 0 (uBTB stage).- For pt_2nd=true:
- Build a
BTBEntryfromBranchInfo.- Enforce the design constraint that the second branch is either unconditional or an
alwaysTakenconditional.- Populate
condTakensonly for the always‑taken conditional case and indirect/return targets as needed.- For pt_2nd=false:
- Leave
btbEntriesempty to represent a pure fallthrough block; onlybbStartand timing metadata are used downstream.This matches how
finalPredandsecondPredictionare consumed inDecoupledBPUWithBTBand keeps the 2‑taken invariants explicit.
228-303:putPCHistory2Takencleanly extends 1‑taken lookup with optional 2nd FB and MBTB metaThe new
putPCHistory2Taken:
- Reuses the existing lookup +
PredStatistics+fillStagePredictionspipeline.- Stores
hit_indexin bothlastPred.hit_indexand the returned pair, enabling DFF‑based training.- Only attempts a second prediction when
entry.valid && entry.valid_2nd.- For pt_2nd=true:
- Builds
secondPredictionviafillSecondPrediction.- Enforces that the second branch’s control PC lies within its own fetch block; otherwise, it discards the second prediction and bumps
twoTakenPredRangeFailed.- Creates an MBTB meta via
createSecondPredictionMetaForMBTB.- For pt_2nd=false:
- Builds a fallthrough
secondPredictionand always treats it as valid.- Provides an empty MBTB meta to keep pipelines consistent.
The returned
{hit_index, has_second_prediction}exactly matches what the decoupled BPU expects.
305-339: cppcheck “mismatchingContainers” warning on LRU index computation is a false positiveIn
lookup’s miss‑handling path you later reuse the MRU heap intrainCommon:std::make_heap(mruList.begin(), mruList.end(), older()); UBTBIter lru_iter = mruList.front(); int toBeReplacedIndex = lru_iter - ubtb.begin();Here:
UBTBIterisstd::vector<TickedUBTBEntry>::iterator.mruListis astd::vector<UBTBIter>, i.e., it stores iterators into theubtbvector.- Subtracting
ubtb.begin()fromlru_iteris exactly how you recover the index of that entry inubtb.So, although cppcheck reports “iterators of different containers,” you are in fact working with an iterator into
ubtb, not intomruList, and the arithmetic is safe as long asubtbisn’t reallocated (which it isn’t after construction). No change needed here.
342-403:replaceEntryandaddSecondPredictionToEntrycorrectly encode first and second branches into uBTB entries
replaceEntry:
- Asserts a valid index and a valid taken entry.
- Builds a
TickedUBTBEntryfrom the taken entry withtick=curTick()andvalid_2nd=false.- Updates
targetusingnewPrediction.getTarget(predictWidth)so RAS/ITTAGE‑corrected targets are captured.- Sets the uBTB‑specific tag via
getTag(newPrediction.bbStart)and computesnumNTCondsviacalculateNumNTConds(newPrediction).
addSecondPredictionToEntry:
- Fills
valid_2ndandpt_2ndonce, leaving existing second‑branch info untouched on subsequent calls.- For
pt_2nd=true, asserts the second FB’s taken entry matchesbtbEntries[0], then copies it intobranch_info_2ndand overwrites the target with the prediction’s final target (again honoring RAS/ITTAGE).- For
pt_2nd=false, setsvalid_2ndandpt_2ndbut leavesbranch_info_2ndas a benign default, since it’s unused in the fallthrough case.This is the right place to centralize the encoding of 2‑taken metadata into uBTB entries.
421-438:calculateNumNTCondslogic is sound but relies on getHistInfo’s contractThe helper:
int numNTConds = prediction.getHistInfo().first; if (prediction.getTakenEntry().isCond) { numNTConds--; assert(numNTConds >= 0); }matches the documented intent:
getHistInfo().firstis treated as “number of conditional branches in this FB,” and you subtract one when the taken branch itself is conditional, leaving “number of not‑taken cond branches before the taken branch.” This is then stored into the uBTB entry for later speculative history reconstruction.Given the existing usage of
getHistInfo()elsewhere, this is consistent, but it is worth keeping in mind that any future change togetHistInfo()’s semantics would need to preserve that interpretation.
657-684:train2Takencorrectly falls back to 1‑taken when 2‑taken preconditions fail
train2Taken:
- First enforces the “consecutive FB” relationship via
dff_pred.getTarget(predictWidth) == s3_pred.bbStart; on violation, it falls back totrainCommon(hit_index, dff_pred, nullptr).- Then runs
check2TakenConditions; failures also fall back to single‑block training ondff_pred.- Only when both checks pass does it call
trainCommon(hit_index, dff_pred, &s3_pred), i.e., encode the second FB as part of the entry.This is a robust way to ensure 2‑taken training never corrupts entries when the structural assumptions about the two blocks do not hold.
814-922: 2‑taken UBTBStats initialization and formula are wired correctlyThe extended UBTBStats constructor:
- Adds counters for each 2‑taken condition failure/acceptance case, pt_2nd prediction outcomes, and detailed training scenarios.
- Defines
twoTakenTrainSuccessfulRatioas aRate<Count,Count>formula and then sets it to:twoTakenTrainSuccessfulRatio = (twoTakenAcceptOther + twoTakenAcceptAlwaysTaken + twoTakenAcceptFallthrough) / twoTakenConditionChecks;This matches the natural definition “fraction of condition checks that led to accepting a 2‑taken pattern” and will be well‑behaved under the stats framework (0/0 handled by the infra).
No functional issues here.
| cpu.branchPred.btb.numEntries = 16384 | ||
| # TODO: BTB TAGE do not bave base table, do not support SC | ||
| cpu.branchPred.tage.tableSizes = [2048] * 14 # 2ways, 2048 sets | ||
| cpu.branchPred.enable2Taken = not args.disable_2taken |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Search for both attribute names to understand their usage and declarations
rg -nP --type=cc --type=hh --type=py -C3 '\b(enable2Taken|enableTwoTaken)\b'Repository: OpenXiangShan/GEM5
Length of output: 180
🏁 Script executed:
#!/bin/bash
# First, check what files exist and find xiangshan.py
find . -name "xiangshan.py" -type f 2>/dev/null | head -5Repository: OpenXiangShan/GEM5
Length of output: 92
🏁 Script executed:
#!/bin/bash
# Search for both attribute names in all file types
rg -n '\b(enable2Taken|enableTwoTaken)\b' -C 3Repository: OpenXiangShan/GEM5
Length of output: 9501
Unify attribute naming for 2-taken prediction across FTB and BTB.
enableTwoTaken (FTB, line 373) and enable2Taken (BTB, line 385) are separate attributes in different classes serving the same semantic purpose. The inconsistent naming—one spelling out "Two" and the other using the digit "2"—risks confusion during maintenance and development. Consider standardizing to a single naming convention across both implementations.
🤖 Prompt for AI Agents
In @configs/example/xiangshan.py around line 385, The config uses two different
attribute names for the same feature—cpu.ftb.enableTwoTaken and
cpu.branchPred.enable2Taken—causing inconsistency; pick one canonical name
(e.g., enableTwoTaken) and rename the other to match across both classes (update
cpu.branchPred.enable2Taken -> cpu.branchPred.enableTwoTaken or vice versa),
then update any code references (including the assignment using
args.disable_2taken and any docs/tests) to the chosen name so all consumers use
the same attribute.
| for var in GCBV_REF_SO GCB_RESTORER gem5_home; do | ||
| checkForVariable $var | ||
| done | ||
|
|
||
| $gem5 $gem5_home/configs/example/xiangshan.py --generic-rv-cpt=$1 --bp-type=DecoupledBPUWithBTB --ideal-kmhv3 No newline at end of file |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine common.sh
fd -t f 'common.sh' util/xs_scripts --exec cat -n {}Repository: OpenXiangShan/GEM5
Length of output: 958
🏁 Script executed:
# Examine the target script
cat -n util/xs_scripts/kmh_v3_btb_2taken.shRepository: OpenXiangShan/GEM5
Length of output: 427
Add gem5 to the variable check loop and remove unused GCBV_REF_SO.
The script references $gem5 on line 10 but doesn't validate it in the variable check loop (lines 6–8), even though gem5_home is validated. Additionally, GCBV_REF_SO is checked but never used in the script. While gem5 is always defined by common.sh, checking it maintains consistency and defensive programming.
Proposed fix
-for var in GCBV_REF_SO GCB_RESTORER gem5_home; do
+for var in GCB_RESTORER gem5_home gem5; do
checkForVariable $var
done📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for var in GCBV_REF_SO GCB_RESTORER gem5_home; do | |
| checkForVariable $var | |
| done | |
| $gem5 $gem5_home/configs/example/xiangshan.py --generic-rv-cpt=$1 --bp-type=DecoupledBPUWithBTB --ideal-kmhv3 | |
| for var in GCB_RESTORER gem5_home gem5; do | |
| checkForVariable $var | |
| done | |
| $gem5 $gem5_home/configs/example/xiangshan.py --generic-rv-cpt=$1 --bp-type=DecoupledBPUWithBTB --ideal-kmhv3 |
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 10-10: gem5 is referenced but not assigned.
(SC2154)
[warning] 10-10: gem5_home is referenced but not assigned.
(SC2154)
🤖 Prompt for AI Agents
In @util/xs_scripts/kmh_v3_btb_2taken.sh around lines 6-10, The loop currently
checks GCBV_REF_SO, GCB_RESTORER and gem5_home but the script later uses $gem5
and never uses GCBV_REF_SO; update the variable validation loop to remove
GCBV_REF_SO and include gem5 so that checkForVariable is invoked for gem5 (e.g.,
for var in gem5 GCB_RESTORER gem5_home; do checkForVariable $var; done) ensuring
$gem5 is validated before the $gem5 invocation in the call to xiangshan.py.
Summary by CodeRabbit
New Features
--disable-2takenconfiguration flag to control the feature.Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.