@beckwen commented Nov 12, 2025

…esctrl subsystem

commit 594902c986e269660302f09df9ec4bf1cf017b77 upstream.

In the resctrl subsystem's Sub-NUMA Cluster (SNC) mode, the rdt_mon_domain structure representing a NUMA node relies on the cacheinfo interface (rdt_mon_domain::ci) to store L3 cache information (e.g., shared_cpu_map) for monitoring. The L3 cache information of an SNC NUMA node determines which domains are summed for the "top-level" L3-scoped events.

rdt_mon_domain::ci is initialized using the first online CPU of a NUMA node. When this CPU goes offline, its shared_cpu_map is cleared so that it contains only the offline CPU itself. Subsequent attempts to read counters via smp_call_on_cpu(offline_cpu) then fail (and the error is ignored), so "top-level" events read as zero without any error indication.
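A minimal user-space model of the failure, assuming cpumasks are 64-bit bitmaps; mask64, cpu_offline(), struct old_mon_domain and old_read_top_level() are hypothetical names for illustration, not the kernel's API. The domain keeps a pointer to the first CPU's shared_cpu_map, so once that CPU is offline the read is aimed at it and silently yields zero:

```c
#include <assert.h>
#include <stdint.h>

/* Model only: bit n of a mask64 stands for CPU n (up to 64 CPUs). */
typedef uint64_t mask64;
#define CPU_BIT(cpu) ((mask64)1 << (cpu))

static mask64 online;             /* CPUs currently online */
static mask64 shared_cpu_map[64]; /* per-CPU L3 shared_cpu_map */

/* Lowest-numbered CPU in m, or -1 if m is empty. */
static int first_cpu(mask64 m)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (m & CPU_BIT(cpu))
            return cpu;
    return -1;
}

/* Offlining a CPU: its L3 siblings drop it from their maps, and its own
 * map shrinks to contain only itself. */
static void cpu_offline(int cpu)
{
    assert(cpu >= 0 && cpu < 64);
    for (int c = 0; c < 64; c++)
        if (c != cpu && (shared_cpu_map[cpu] & CPU_BIT(c)))
            shared_cpu_map[c] &= ~CPU_BIT(cpu);
    shared_cpu_map[cpu] = CPU_BIT(cpu);
    online &= ~CPU_BIT(cpu);
}

/* Old scheme (hypothetical): the domain keeps a pointer to the
 * shared_cpu_map of the CPU that was first online at creation time. */
struct old_mon_domain {
    const mask64 *ci_shared_cpu_map;
};

/* The read runs on a CPU taken from the cached map. If that CPU is
 * offline, the cross-call fails, the error is dropped and 0 is returned. */
static uint64_t old_read_top_level(const struct old_mon_domain *d,
                                   uint64_t hw_count)
{
    int cpu = first_cpu(*d->ci_shared_cpu_map);

    if (cpu < 0 || !(online & CPU_BIT(cpu)))
        return 0; /* failure indistinguishable from a real zero count */
    return hw_count;
}
```

With CPUs 0-3 sharing one L3 and the domain created from CPU 0, offlining CPU 0 shrinks shared_cpu_map[0] to {0}, and old_read_top_level() then returns 0 instead of the counter value.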

Replace the cacheinfo references in struct rdt_mon_domain and struct rmid_read with the cacheinfo ID (a unique identifier for the L3 cache).
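Storing an ID instead of a pointer turns "which SNC domains belong to this L3?" into a plain integer comparison. The sketch below is a user-space illustration under that assumption; struct snc_mon_domain and sum_top_level() are hypothetical names, not the resctrl code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an SNC monitoring domain with only the fields
 * needed here. ci_id stands in for the L3 cacheinfo ID that replaces the
 * old cacheinfo pointer. */
struct snc_mon_domain {
    int ci_id;      /* ID of the L3 cache this SNC node belongs to */
    uint64_t count; /* per-node counter value for one RMID/event */
};

/* Sum a "top-level" L3-scoped event: add up every SNC domain whose cache
 * ID matches the requested L3 cache. An integer ID cannot go stale when
 * a CPU goes offline, unlike a pointer into per-CPU cacheinfo. */
static uint64_t sum_top_level(const struct snc_mon_domain *doms, size_t n,
                              int ci_id)
{
    uint64_t total = 0;

    assert(doms != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (doms[i].ci_id == ci_id)
            total += doms[i].count;
    return total;
}
```

For two SNC nodes per L3, domains {ci_id 0: 10, ci_id 0: 20, ci_id 1: 5, ci_id 1: 7} sum to 30 for L3 cache 0 and 12 for L3 cache 1.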

rdt_domain_hdr::cpu_mask contains the online CPUs associated with that domain. When reading "top-level" events, select a CPU from rdt_domain_hdr::cpu_mask and use its L3 shared_cpu_map to determine the valid CPUs for reading the RMID counter via the MSR interface.
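A rough model of the new CPU selection, assuming cpumasks are 64-bit bitmaps and that a domain's cpu_mask only ever holds online CPUs; struct domain_hdr_model and top_level_read_cpus() are illustrative names, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model only: bit n of a mask64 stands for CPU n. */
typedef uint64_t mask64;
#define CPU_BIT(cpu) ((mask64)1 << (cpu))

static mask64 shared_cpu_map[64]; /* per-CPU L3 shared_cpu_map */

/* Lowest-numbered CPU in m, or -1 if m is empty. */
static int first_cpu(mask64 m)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (m & CPU_BIT(cpu))
            return cpu;
    return -1;
}

/* Hypothetical stand-in for rdt_domain_hdr: cpu_mask holds only the
 * domain's online CPUs. */
struct domain_hdr_model {
    mask64 cpu_mask;
};

/* Pick a CPU that is still in the domain's cpu_mask and return *its* L3
 * shared_cpu_map: the CPUs that may read the RMID counter for a
 * top-level event. Returns 0 if the domain has no online CPU. */
static mask64 top_level_read_cpus(const struct domain_hdr_model *hdr)
{
    int cpu;

    assert(hdr != NULL);
    cpu = first_cpu(hdr->cpu_mask);
    if (cpu < 0)
        return 0;
    return shared_cpu_map[cpu];
}
```

For example, if CPUs 0-3 share one L3, the domain's cpu_mask is {0,1}, and CPU 0 goes offline, cpu_mask becomes {1} and the function returns CPU 1's map {1,2,3}, which contains only CPUs that can actually perform the read.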

Considering all CPUs associated with the L3 cache improves the chances of picking a housekeeping CPU on which the counter reading work can be queued, avoiding an unnecessary IPI.
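The benefit of the wider candidate set can be illustrated with a hypothetical picker that prefers a CPU outside the nohz_full set (treated as "housekeeping" in this model, where the read can be queued as work) and otherwise falls back to a CPU that would need an IPI. pick_read_cpu() and its parameters are assumptions for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model only: bit n of a mask64 stands for CPU n. */
typedef uint64_t mask64;

/* Lowest-numbered CPU in m, or -1 if m is empty. */
static int first_cpu(mask64 m)
{
    for (int cpu = 0; cpu < 64; cpu++)
        if (m & ((mask64)1 << cpu))
            return cpu;
    return -1;
}

/* Return a CPU from `candidates` that is not in `nohz_full`, so the read
 * can be queued as work there (*needs_ipi = false). If every candidate is
 * nohz_full, fall back to any candidate (*needs_ipi = true). Returns -1
 * if `candidates` is empty. */
static int pick_read_cpu(mask64 candidates, mask64 nohz_full, bool *needs_ipi)
{
    int cpu;

    assert(needs_ipi != NULL);
    cpu = first_cpu(candidates & ~nohz_full);
    if (cpu >= 0) {
        *needs_ipi = false;
        return cpu;
    }
    *needs_ipi = true;
    return first_cpu(candidates);
}
```

With CPUs 0-1 in nohz_full, searching only the SNC node's CPUs {0,1} forces the IPI path, whereas searching the whole L3 map {0-3} finds housekeeping CPU 2.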

Intel-SIG: commit 594902c986e
x86,fs/resctrl: Remove inappropriate references to cacheinfo in the resctrl subsystem
backport to RDT driver for CWF

Test case: run the resctrl selftest tool from the kernel's tools/testing/selftests/resctrl directory:
  ./resctrl_tests

Fixes: 328ea68 ("x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files")
Signed-off-by: Qinyun Tan <qinyuntan@linux.alibaba.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250530182053.37502-2-qinyuntan@linux.alibaba.com
Signed-off-by: Kui Wen <kui.wen@intel.com>
@kchuyizhou

Hello, please also backport commit d2e1b84c5141ff2ad465279acfc3cf943c960b78 ("fs/resctrl: Eliminate false positive lockdep warning when reading SNC counters"), which is a fix for commit 594902c98.

@kchuyizhou

Please make sure the author and date of the patch match the upstream commit:

Author: Qinyun Tan <qinyuntan@linux.alibaba.com>
Date: Sat May 31 02:20:53 2025 +0800
