Conversation
📝 WalkthroughWalkthroughThe changes implement multi-level prefetch hierarchy in BertiPrefetcher by introducing an Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Change-Id: I01d9003bf78eaabc3619349a60c3f80a24afe58c
🚀 Coremark Smoke Test Results
✅ Difftest smoke test passed! |
There was a problem hiding this comment.
Pull request overview
This PR adds L2 ahead prefetch capability to the BERTI prefetcher. When BERTI predicts a memory access pattern with sufficient confidence, it now generates two prefetch requests: one for L1 cache and another for L2 cache at a further distance (4x the delta).
- Adds filterL2 member to support separate filtering for L2 prefetch requests
- Extends sendPFWithFilter with ahead_level parameter to distinguish between L1 and L2 prefetch requests
- Generates L2 ahead prefetch requests in non-aggressive mode by multiplying the best delta by 4 (shift left by 2)
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/mem/cache/prefetch/sms.cc | Initializes filterL2 for berti prefetcher instance when created by XSCompositePrefetcher |
| src/mem/cache/prefetch/berti.hh | Adds filterL2 member variable and updates sendPFWithFilter signature with ahead_level parameter |
| src/mem/cache/prefetch/berti.cc | Implements L2 ahead prefetch generation and filter selection logic based on ahead_level |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| boost::compute::detail::lru_cache<Addr, Addr>* filterHere; | ||
| filterHere = ahead_level > 1 ? filterL2 : filter; |
There was a problem hiding this comment.
Potential null pointer dereference: filterL2 is used without checking if it's null. While filterL2 is initialized in sms.cc for the berti instance, if BertiPrefetcher is instantiated independently outside of XSCompositePrefetcher, filterL2 will be null. This could cause a crash when ahead_level > 1. Consider adding a null check before using filterL2, similar to how XSStridePrefetcher handles this by checking ahead_level first and then using the appropriate filter.
| uint64_t l2_depth_ratio = 2; | ||
| Addr pf_addr_l2 = useByteAddr ? | ||
| pfi.getAddr() + (entry->bestDelta.delta << l2_depth_ratio) : | ||
| (blockIndex(pfi.getAddr()) + (entry->bestDelta.delta << l2_depth_ratio)) << lBlkSize; |
There was a problem hiding this comment.
Magic number without explanation: The value 2 for l2_depth_ratio is a magic number that controls how far ahead L2 prefetching occurs. While the PR description mentions improving the threshold from 2 to 3, this code still uses 2. Consider either making this a configurable parameter or adding a comment explaining why 2 is the appropriate multiplier for L2 ahead distance.
| uint64_t l2_depth_ratio = 2; | |
| Addr pf_addr_l2 = useByteAddr ? | |
| pfi.getAddr() + (entry->bestDelta.delta << l2_depth_ratio) : | |
| (blockIndex(pfi.getAddr()) + (entry->bestDelta.delta << l2_depth_ratio)) << lBlkSize; | |
| // L2 depth ratio controls how far ahead L2 prefetching occurs | |
| // relative to the learned delta. A value of 2 preserves the | |
| // original behavior of this prefetcher (using delta << 2 for L2). | |
| constexpr uint64_t l2DepthRatio = 2; | |
| Addr pf_addr_l2 = useByteAddr ? | |
| pfi.getAddr() + (entry->bestDelta.delta << l2DepthRatio) : | |
| (blockIndex(pfi.getAddr()) + (entry->bestDelta.delta << l2DepthRatio)) << lBlkSize; |
| @@ -166,6 +166,7 @@ class BertiPrefetcher : public Queued | |||
| public: | |||
|
|
|||
| boost::compute::detail::lru_cache<Addr, Addr> *filter; | |||
There was a problem hiding this comment.
Missing documentation: The new filterL2 member variable lacks a comment explaining its purpose. Consider adding a comment to clarify that it's used for filtering L2-level prefetch requests, similar to how other member variables in the class are documented.
| boost::compute::detail::lru_cache<Addr, Addr> *filter; | |
| boost::compute::detail::lru_cache<Addr, Addr> *filter; | |
| // LRU filter used to suppress redundant L2-level prefetch requests. |
| @@ -187,7 +188,7 @@ class BertiPrefetcher : public Queued | |||
| int getBestDelta() { return lastUsedBestDelta; } | |||
|
|
|||
There was a problem hiding this comment.
Missing documentation: The new ahead_level parameter lacks documentation in the function signature. Consider adding a parameter comment to explain what the different ahead_level values represent (e.g., 1 for L1, 2 for L2, etc.).
| /** | |
| * Decide whether to send a prefetch request with filtering. | |
| * | |
| * @param ahead_level Prefetch ahead level (e.g., 1 for L1, 2 for L2, etc.), | |
| * indicating the cache level or distance ahead for this prefetch. | |
| */ |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/mem/cache/prefetch/berti.cc (1)
256-266: Misleading variable name and missing address validation.
Variable naming issue:
l2_depth_ratiois used as a shift amount (delta << l2_depth_ratio), making the actual multiplier 4, not 2. This is confusing. Consider renaming tol2_depth_shiftor changing the value to 4 and using multiplication.Missing validation: The L2 prefetch address calculation could overflow or produce invalid addresses. Consider adding bounds checking.
Hardcoded values: The shift amount (2) and coverage threshold (8) are magic numbers that could be configuration parameters.
♻️ Suggested improvements
- uint64_t l2_depth_ratio = 2; + // L2 prefetches at 4x the delta distance + const uint64_t l2_depth_multiplier = 4; Addr pf_addr_l2 = useByteAddr ? - pfi.getAddr() + (entry->bestDelta.delta << l2_depth_ratio) : - (blockIndex(pfi.getAddr()) + (entry->bestDelta.delta << l2_depth_ratio)) << lBlkSize; + pfi.getAddr() + (entry->bestDelta.delta * l2_depth_multiplier) : + (blockIndex(pfi.getAddr()) + (entry->bestDelta.delta * l2_depth_multiplier)) << lBlkSize; + // Validate address is within reasonable bounds before prefetching sendPFWithFilter(pfi, pf_addr_l2, addresses, 32, PrefetchSourceType::Berti, entry->bestDelta.coverageCounter >= 8, 2);
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/mem/cache/prefetch/berti.ccsrc/mem/cache/prefetch/berti.hhsrc/mem/cache/prefetch/sms.cc
🔇 Additional comments (6)
src/mem/cache/prefetch/berti.hh (2)
190-191: LGTM - signature extension supports multi-level prefetch.The addition of the
ahead_levelparameter enables routing prefetches to different cache levels (L1 vs L2), which aligns with the PR objective of adding L2 ahead prefetch capability.
169-169: No action required. ThefilterL2pointer is safely initialized before use within the XSCompositePrefetcher constructor (sms.cc lines 64-67) and cannot be dereferenced without initialization. The initialization happens immediately in the constructor body after the berti object is received via the parameter, guaranteeing that filterL2 is assigned before any call to sendPFWithFilter. The existing conditional checkif (berti)ensures that when the BertiPrefetcher object exists and is non-null, filterL2 is always initialized.src/mem/cache/prefetch/sms.cc (1)
64-67: LGTM - proper L2 filter initialization.The addition of braces and the
filterL2assignment follows the existing pattern used forSstrideand properly guards against nullberti. This enables L2-level prefetch filtering.src/mem/cache/prefetch/berti.cc (3)
242-252: LGTM - aggressive prefetch path correctly updated.The
ahead_level=1parameter properly marks this as an L1 prefetch request, consistent with the multi-level prefetch hierarchy being introduced.
296-297: LGTM - proper metadata assignment for multi-level prefetch.The metadata fields
pfahead_hostandpfaheadcorrectly track the target prefetch level and whether it's ahead of the current cache, enabling downstream routing decisions.
254-271: Clarify if threshold change from 2 to 3 was applied or incomplete.The code in
berti.hhline 84 still showsinfo.coverageCounter >= 2forL2_PREFassignment, not>= 3. If the PR description claims this threshold was changed from 2 to 3, this change does not appear to be reflected in the codebase.Additionally, the lines under review (254-271) use different thresholds (
>= 8for L2 prefetches,> 5for PHT trigger), which don't relate to the 2→3 change mentioned. Confirm whether the threshold change is incomplete, refers to a different parameter, or the PR description is inaccurate.
| bool | ||
| BertiPrefetcher::sendPFWithFilter(const PrefetchInfo &pfi, Addr addr, std::vector<AddrPriority> &addresses, int prio, | ||
| PrefetchSourceType src, bool using_best_delta_and_confident) | ||
| PrefetchSourceType src, bool using_best_delta_and_confident, int ahead_level) | ||
| { | ||
| if (archDBer && cache->level() == 1) { | ||
| if (archDBer && cache->level() == 1 && ahead_level == 1) { | ||
| archDBer->l1PFTraceWrite(curTick(), pfi.getPC(), pfi.getAddr(), addr, src); | ||
| } | ||
| if (using_best_delta_and_confident) { | ||
| lastUsedBestDelta = blockIndex(addr) - blockIndex(pfi.getAddr()); | ||
| } | ||
| if (filter->contains(addr)) { | ||
| boost::compute::detail::lru_cache<Addr, Addr>* filterHere; | ||
| filterHere = ahead_level > 1 ? filterL2 : filter; | ||
| if (filterHere->contains(addr)) { | ||
| DPRINTF(BertiPrefetcher, "Skip recently prefetched: %lx\n", addr); | ||
| return false; | ||
| } else { | ||
| int64_t blk_delta = (int64_t)blockIndex(addr) - blockIndex(pfi.getAddr()); | ||
| topDeltas[blk_delta] = topDeltas.count(blk_delta) ? topDeltas[blk_delta] + 1 : 1; | ||
| DPRINTF(BertiPrefetcher, "Send pf: %lx\n", addr); | ||
| filter->insert(addr, 0); | ||
| filterHere->insert(addr, 0); | ||
| addresses.push_back(AddrPriority(addr, prio, src)); | ||
| addresses.back().pfahead_host = ahead_level; | ||
| addresses.back().pfahead = ahead_level > cache->level(); | ||
| return true; | ||
| } | ||
| } |
There was a problem hiding this comment.
Critical: Potential null pointer dereference of filterL2.
At line 286-287, filterL2 is selected when ahead_level > 1, but there's no null check before dereferencing it at line 287. While sms.cc initializes filterL2, other instantiation paths might not, leading to a segmentation fault.
🐛 Proposed fix with null check
BertiPrefetcher::sendPFWithFilter(const PrefetchInfo &pfi, Addr addr, std::vector<AddrPriority> &addresses, int prio,
PrefetchSourceType src, bool using_best_delta_and_confident, int ahead_level)
{
if (archDBer && cache->level() == 1 && ahead_level == 1) {
archDBer->l1PFTraceWrite(curTick(), pfi.getPC(), pfi.getAddr(), addr, src);
}
if (using_best_delta_and_confident) {
lastUsedBestDelta = blockIndex(addr) - blockIndex(pfi.getAddr());
}
boost::compute::detail::lru_cache<Addr, Addr>* filterHere;
filterHere = ahead_level > 1 ? filterL2 : filter;
+ if (!filterHere) {
+ DPRINTF(BertiPrefetcher, "Filter not initialized for ahead_level %d\n", ahead_level);
+ return false;
+ }
if (filterHere->contains(addr)) {
DPRINTF(BertiPrefetcher, "Skip recently prefetched: %lx\n", addr);
return false;
Change-Id: I01d9003bf78eaabc3619349a60c3f80a24afe58c
Config: idealkmhv3, 0.8 coverage
Result: a lot improvement in GemsFDTD.
Improve the threshold from 2 to 3 to balance L1_PREF and L2_PREF.

Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.