Skip to content

Conversation

@quanxianwang
Copy link

dependency #102 #103 #104

CPU topology series:
There are totally 5 series for this new cpu topology.
1st has been in 6.6, 2nd, 3rd, 4th, 5th series will be backported to 6.6. This PR is 5th patch series.

patch series 1: [patch V3 00/60] x86/apic: Decrapification and static calls
https://lore.kernel.org/all/20230801103042.936020332@linutronix.de/
Patches are merged in 6.6-rc1
Patch series 2: [patch V4 00/41] x86/cpu: Rework the topology evaluation
https://lore.kernel.org/all/20230814085006.593997112@linutronix.de/
Patch 1-22 merged in 6.7-rc1
Patch series 3: [patch V6 00/19] x86/cpu: Rework topology evaluation
https://lore.kernel.org/all/20240212153109.330805450@linutronix.de/
Patch series 4: [patch V3 00/22] x86/topology: More cleanups and preparatory work
https://lore.kernel.org/all/20240212154529.402604963@linutronix.de/
Patch series 5: [patch 00/30] x86/apic: Rework APIC registration
https://lore.kernel.org/all/20240213205415.307029033@linutronix.de/
Patches are merged in this merge window and should be included in 6.9-rc1.

Testing:
Intel platform
EMR/SRFSP/GNRSP/CWF/DMR simics testing - PASS

Testing Process:
git clone https://github.com/intel/lkvs
cd lkvs/BM

testcase-1:
#./runtests -c cpu_topology.sh -t numa_nodes_compare
[RESULT][cpu_topology.sh] [PASS] [0] [0.163s]
testcase-2:
./runtests -c cpu_topology.sh -t verify_thread_per_core
[RESULT][cpu_topology.sh] [PASS] [0] [0.158s]
testcase-3:
#./runtests -c cpu_topology.sh -t verify_cores_per_socket
[RESULT][cpu_topology.sh] [PASS] [0] [0.158s]
testcase-4:
#./runtests -c cpu_topology.sh -t verify_socket_num
[RESULT][cpu_topology.sh] [PASS] [0] [0.155s]
testcase-5:
#./runtests -c cpu_topology.sh -t verify_level_type
[RESULT][cpu_topology.sh] [PASS] [0] [0.158s]

package dependency: cpuid tools (used by test 1-5)

KAGA-KOKO and others added 30 commits December 15, 2025 22:12
commit 94f0b39 upstream.

Use the provided topology helper function instead of fiddling in cpu_data.

Intel-SIG: commit 94f0b39 hwmon: (fam15h_power) Use topology_core_id().
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.506988471@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit e952563 upstream.

Rename it to core_id and stick it to the other ID fields.

No functional change.

Intel-SIG: commit e952563 x86/cpu: Move cpu_core_id into topology info.
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.566519388@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit e3c0c5d upstream.

No functional change.

Intel-SIG: commit e3c0c5d x86/cpu: Move cu_id into topology info.
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.628405546@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 594957d upstream.

cpuinfo_x86::x86_coreid_bits is only used by the AMD numa topology code. No
point in evaluating it on non AMD systems.

No functional change.

Intel-SIG: commit 594957d x86/cpu: Remove pointless evaluation of x86_coreid_bits.
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.687588373@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 22dc963 upstream.

Yet another topology related data pair. Rename logical_proc_id to
logical_pkg_id so it fits the common naming conventions.

No functional change.

Intel-SIG: commit 22dc963 x86/cpu: Move logical package and die IDs into topology info.
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.745139505@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 6e29032 upstream.

The topology IDs which identify the LLC and L2 domains clearly belong to
the per CPU topology information.

Move them into cpuinfo_x86::cpuinfo_topo and get rid of the extra per CPU
data and the related exports.

This also paves the way to do proper topology evaluation during early boot
because it removes the only per CPU dependency for that.

No functional change.

Intel-SIG: commit 6e29032 x86/cpu: Move cpu_l[l2]c_id into topology info.
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.803864641@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 9ff4275 upstream.

APIC ID checks compare with BAD_APICID all over the place, but some
initializers and some code which fiddles with global data structure use
-1[U] instead. That simply cannot work at all.

Fix it up and use BAD_APICID consistently all over the place.

Intel-SIG: commit 9ff4275 x86/apic: Use BAD_APICID consistently.
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.862835121@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 4705243 upstream.

APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width and fixup the most obvious usage sites of that.

The APIC callbacks will be addressed separately.

Intel-SIG: commit 4705243 x86/apic: Use u32 for APIC IDs in global data.
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.922905727@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 5d376b8 upstream.

APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width and move the default implementation to local.h as there are
no users outside the apic directory.

Intel-SIG: commit 5d376b8 x86/apic: Use u32 for check_apicid_used().
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.981956102@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 8aa2a41 upstream.

APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width and fixup a few related usage sites for consistency sake.

Intel-SIG: commit 8aa2a41 x86/apic: Use u32 for cpu_present_to_apicid().
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085113.054064391@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 01ccf9b upstream.

APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width even if that callback going to be removed soonish.

Intel-SIG: commit 01ccf9b x86/apic: Use u32 for phys_pkg_id().
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085113.113097126@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 59f7928 upstream.

APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width.

Intel-SIG: commit 59f7928 x86/apic: Use u32 for [gs]et_apic_id().
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085113.172569282@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 92fe9bb upstream.

The data type for APIC IDs was standardized to 'u32' in the
following recent commit:

   db4a408 ("x86/apic: Use u32 for wakeup_secondary_cpu[_64]()")

Which changed the function arguments type signature of the
apic->wakeup_secondary_cpu() APIC driver function.

Propagate this to hv_snp_boot_ap() as well, which also addresses a
'assignment from incompatible pointer type' build warning that triggers
under the -Werror=incompatible-pointer-types GCC warning.

Fixes: db4a408 ("x86/apic: Use u32 for wakeup_secondary_cpu[_64]()")
Intel-SIG: commit 92fe9bb x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too.
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230814085113.233274223@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit db4a408 upstream.

APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width.

Intel-SIG: commit db4a408 x86/apic: Use u32 for wakeup_secondary_cpu[_64]().
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085113.233274223@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 90781f0 upstream.

Per CPU cpuinfo is used to persist the logical package and die IDs. That's
really not the right place simply because cpuinfo is subject to be
reinitialized when a CPU goes through an offline/online cycle.

This works by chance today, but that's far from correct and neither obvious
nor documented.

Add a per cpu datastructure which persists those logical IDs, which allows
to cleanup the CPUID evaluation code.

This is a temporary workaround until the larger topology management is in
place, which makes all of this logical management mechanics obsolete.

Intel-SIG: commit 90781f0 x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids.
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085113.292947071@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 48525fd upstream.

Provide debug files which dump the topology related information of
cpuinfo_x86. This is useful to validate the upcoming conversion of the
topology evaluation for correctness or bug compatibility.

Intel-SIG: commit 48525fd x86/cpu: Provide debug interface.
x86/cpu: Rework the topology evaluation - part1

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085113.353191313@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 43d86e3 upstream.

Provide a few helper functions to read CPUID leafs or individual registers
into a data structure without requiring unions.

Intel-SIG: commit 43d86e3 x86/cpu: Provide cpuid_read() et al..
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/878r3mg570.ffs@tglx
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 97c7d5723537de08e076892e07d6089ae9777965 upstream.

<asm/cpuid.h> uses static_assert() at multiple locations but it does not
include the CPP macro's definition at linux/build_bug.h.

Include the needed header to make <asm/cpuid.h> self-sufficient.

This gets triggered when cpuid.h is included in new C files, which is to
be done in further commits.

Fixes: 43d86e3 ("x86/cpu: Provide cpuid_read() et al.")
Intel-SIG: commit 97c7d5723537 x86/cpuid: Include <linux/build_bug.h> in <asm/cpuid.h>.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250304085152.51092-5-darwi@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit ebdb203 upstream.

Topology evaluation is a complete disaster and impenetrable mess. It's
scattered all over the place with some vendor implementations doing early
evaluation and some not. The most horrific part is the permanent
overwriting of smt_max_siblings and __max_die_per_package, instead of
establishing them once on the boot CPU and validating the result on the
APs.

The goals are:

  - One topology evaluation entry point

  - Proper sharing of pointlessly duplicated code

  - Proper structuring of the evaluation logic and preferences.

  - Evaluating important system wide information only once on the boot CPU

  - Making the 0xb/0x1f leaf parsing less convoluted and actually fixing
    the short comings of leaf 0x1f evaluation.

Start to consolidate the topology evaluation code by providing the entry
points for the early boot CPU evaluation and for the final parsing on the
boot CPU and the APs.

Move the trivial pieces into that new code:

   - The initialization of cpuinfo_x86::topo

   - The evaluation of CPUID leaf 1, which presets topo::initial_apicid

   - topo_apicid is set to topo::initial_apicid when invoked from early
     boot. When invoked for the final evaluation on the boot CPU it reads
     the actual APIC ID, which makes apic_get_initial_apicid() obsolete
     once everything is converted over.

Provide a temporary helper function topo_converted() which shields off the
not yet converted CPU vendors from invoking code which would break them.
This shielding covers all vendor CPUs which support SMP, but not the
historical pure UP ones as they only need the topology info init and
eventually the initial APIC initialization.

Provide two new members in cpuinfo_x86::topo to store the maximum number of
SMT siblings and the number of dies per package and add them to the debugfs
readout. These two members will be used to populate this information on the
boot CPU and to validate the APs against it.

Intel-SIG: commit ebdb203 x86/cpu: Provide cpu_init/parse_topology().
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240212153624.581436579@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit bda74aa upstream.

The legacy topology detection via CPUID leaf 4, which provides the number
of cores in the package and CPUID leaf 1 which provides the number of
logical CPUs in case that FEATURE_HT is enabled and the CMP_LEGACY feature
is not set, is shared for Intel, Centaur and Zhaoxin CPUs.

Lift the code from common.c without the early detection hack and provide it
as common fallback mechanism.

Will be utilized in later changes.

Intel-SIG: commit bda74aa x86/cpu: Add legacy topology parser.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.644448852@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 598e719 upstream.

Centaur and Zhaoxin CPUs use only the legacy SMP detection. Remove the
invocations from their 32bit path and exclude them from the 64-bit call
path.

No functional change intended.

Intel-SIG: commit 598e719 x86/cpu: Use common topology code for Centaur and Zhaoxin.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.706794189@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 92853a7 upstream.

In preparation of a complete replacement for the topology leaf 0xb/0x1f
evaluation, move __max_die_per_package into the common code.

Will be removed once everything is converted over.

Intel-SIG: commit 92853a7 x86/cpu: Move __max_die_per_package to common.c.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.768188958@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 3d41009 upstream.

detect_extended_topology() along with it's early() variant is a classic
example for duct tape engineering:

  - It evaluates an array of subleafs with a boatload of local variables
    for the relevant topology levels instead of using an array to save the
    enumerated information and propagate it to the right level

  - It has no boundary checks for subleafs

  - It prevents updating the die_id with a crude workaround instead of
    checking for leaf 0xb which does not provide die information.

  - It's broken vs. the number of dies evaluation as it uses:

      num_processors[DIE_LEVEL] / num_processors[CORE_LEVEL]

    which "works" only correctly if there is none of the intermediate
    topology levels (MODULE/TILE) enumerated.

There is zero value in trying to "fix" that code as the only proper fix is
to rewrite it from scratch.

Implement a sane parser with proper code documentation, which will be used
for the consolidated topology evaluation in the next step.

Intel-SIG: commit 3d41009 x86/cpu: Provide a sane leaf 0xb/0x1f parser.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.830571770@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 22d6366 upstream.

Intel CPUs use either topology leaf 0xb/0x1f evaluation or the legacy
SMP/HT evaluation based on CPUID leaf 0x1/0x4.

Move it over to the consolidated topology code and remove the random
topology hacks which are sprinkled into the Intel and the common code.

No functional change intended.

Intel-SIG: commit 22d6366 x86/cpu: Use common topology code for Intel.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153624.893644349@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 0c2f6d0 upstream.

Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If
this bit is set by the BIOS then CPUID evaluation including topology
enumeration does not work correctly as the evaluation code does not try
to analyze any leaf greater than two.

This went unnoticed before because the original topology code just
repeated evaluation several times and managed to overwrite the initial
limited information with the correct one later. The new evaluation code
does it once and therefore ends up with the limited and wrong
information.

Cure this by unlocking CPUID right before evaluating anything which
depends on the maximum CPUID leaf being greater than two instead of
rereading stuff after unlock.

Fixes: 22d6366 ("x86/cpu: Use common topology code for Intel")
Reported-by: Peter Schneider <pschneider1968@googlemail.com>
Intel-SIG: commit 0c2f6d0 x86/topology/intel: Unlock CPUID before evaluating anything.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Peter Schneider <pschneider1968@googlemail.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/fd3f73dc-a86f-4bcf-9c60-43556a21eb42@googlemail.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit f7fb3b2 upstream.

AMD/HYGON uses various methods for topology evaluation:

  - Leaf 0x80000008 and 0x8000001e based with an optional leaf 0xb,
    which is the preferred variant for modern CPUs.

    Leaf 0xb will be superseded by leaf 0x80000026 soon, which is just
    another variant of the Intel 0x1f leaf for whatever reasons.

  - Subleaf 0x80000008 and NODEID_MSR base

  - Legacy fallback

That code is following the principle of random bits and pieces all over the
place which results in multiple evaluations and impenetrable code flows in
the same way as the Intel parsing did.

Provide a sane implementation by clearly separating the three variants and
bringing them in the proper preference order in one place.

This provides the parsing for both AMD and HYGON because there is no point
in having a separate HYGON parser which only differs by 3 lines of
code. Any further divergence between AMD and HYGON can be handled in
different functions, while still sharing the existing parsers.

Intel-SIG: commit f7fb3b2 x86/cpu: Provide an AMD/HYGON specific topology parser.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20240212153625.020038641@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 1b3108f upstream.

CPUID 0x80000008 ECX.cpu_nthreads describes the number of threads in the
package. The parser uses this value to initialize the SMT domain level.

That's wrong because cpu_nthreads does not describe the number of threads
per physical core. So this needs to set the CORE domain level and let the
later parsers set the SMT shift if available.

Preset the SMT domain level with the assumption of one thread per core,
which is correct ifrt here are no other CPUID leafs to parse, and propagate
cpu_nthreads and the core level APIC bitwidth into the CORE domain.

Fixes: f7fb3b2 ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Reported-by: Laura Nao <laura.nao@collabora.com>
Intel-SIG: commit 1b3108f x86/cpu/amd: Make the CPUID 0x80000008 parser correct.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Laura Nao <laura.nao@collabora.com>
Link: https://lore.kernel.org/r/20240410194311.535206450@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit c064b53 upstream.

A system with NODEID_MSR was reported to crash during early boot without
any output.

The reason is that the union which is used for accessing the bitfields in
the MSR is written wrongly and the resulting executable code accesses the
wrong part of the MSR data.

As a consequence a later division by that value results in 0 and that
result is used for another division as divisor, which obviously does not
work well.

The magic world of C, unions and bitfields:

    union {
    	  u64   bita : 3,
	        bitb : 3;
	  u64   all;
    } x;

    x.all = foo();

    a = x.bita;
    b = x.bitb;

results in the effective executable code of:

   a = b = x.bita;

because bita and bitb are treated as union members and therefore both end
up at bit offset 0.

Wrapping the bitfield into an anonymous struct:

    union {
    	  struct {
    	     u64  bita : 3,
	          bitb : 3;
          };
	  u64	  all;
    } x;

works like expected.

Rework the NODEID_MSR union in exactly that way to cure the problem.

Fixes: f7fb3b2 ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Reported-by: Laura Nao <laura.nao@collabora.com>
Intel-SIG: commit c064b53 x86/cpu/amd: Make the NODEID_MSR union actually work.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Laura Nao <laura.nao@collabora.com>
Link: https://lore.kernel.org/r/20240410194311.596282919@linutronix.de
Closes: https://lore.kernel.org/all/20240322175210.124416-1-laura.nao@collabora.com/
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 7211274 upstream.

The topology rework missed that early_init_amd() tries to re-enable the
Topology Extensions when the BIOS disabled them.

The new parser is invoked before early_init_amd() so the re-enable attempt
happens too late.

Move it into the AMD specific topology parser code where it belongs.

Fixes: f7fb3b2 ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Intel-SIG: commit 7211274 x86/cpu/amd: Move TOPOEXT enablement into the topology parser.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/878r1j260l.ffs@tglx
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 5754ace upstream.

The original topology evaluation code initialized cpu_data::topo::llc_id
with the die ID initialy and then eventually overwrite it with information
gathered from a CPUID leaf.

The conversion analysis failed to spot that particular detail and omitted
this initial assignment under the assumption that each topology evaluation
path will set it up. That assumption is mostly correct, but turns out to be
wrong in case that the CPUID leaf 0x80000006 does not provide a LLC ID.

In that case, LLC ID is invalid and as a consequence the setup of the
scheduling domain CPU masks is incorrect which subsequently causes the
scheduler core to complain about it during CPU hotplug:

  BUG: arch topology borken
       the CLS domain not a subset of the MC domain

Cure it by reusing legacy_set_llc() and assigning the die ID if the LLC ID
is invalid after all possible parsers have been tried.

Fixes: f7fb3b2 ("x86/cpu: Provide an AMD/HYGON specific topology parser")
Reported-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Intel-SIG: commit 5754ace x86/topology/amd: Ensure that LLC ID is initialized.
x86/cpu: Rework the topology evaluation - part2

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Link: https://lore.kernel.org/r/PUZPR04MB63168AC442C12627E827368581292@PUZPR04MB6316.apcprd04.prod.outlook.com
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
KAGA-KOKO and others added 29 commits January 7, 2026 21:18
commit 0e53e7b upstream.

Move the actually required content of generic_processor_id() into the call
sites and use common helper functions for them. This separates the early
boot registration and the ACPI hotplug mechanism completely which allows
further cleanups and improvements.

Intel-SIG: commit 0e53e7b x86/cpu/topology: Sanitize the APIC admission logic.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.230433953@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 7c0edad upstream.

Managing possible CPUs is an unreadable and uncomprehensible maze. Aside of
that it's backwards because it applies command line limits after
registering all APICs.

Rewrite it so that it:

  - Applies the command line limits upfront so that only the allowed amount
    of APIC IDs can be registered.

  - Applies eventual late restrictions in an understandable way

  - Uses simple min_t() calculations which are trivial to follow.

  - Provides a separate function for resetting to UP mode late in the
    bringup process.

Intel-SIG: commit 7c0edad x86/cpu/topology: Rework possible CPU management.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.290098853@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 5c5682b upstream.

When a kdump kernel is started from a crashing CPU then there is no
guarantee that this CPU is the real boot CPU (BSP). If the kdump kernel
tries to online the BSP then the INIT sequence will reset the machine.

There is a command line option to prevent this, but in case of nested kdump
kernels this is wrong.

But that command line option is not required at all because the real
BSP is enumerated as the first CPU by firmware. Support for the only
known system which was different (Voyager) got removed long ago.

Detect whether the boot CPU APIC ID is the first APIC ID enumerated by
the firmware. If the first APIC ID enumerated is not matching the boot
CPU APIC ID then skip registering it.

Intel-SIG: commit 5c5682b x86/cpu: Detect real BSP on crash kernels.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.348542071@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit f1f758a upstream.

Topology on X86 is determined by the registered APIC IDs and the
segmentation information retrieved from CPUID. Depending on the granularity
of the provided CPUID information the most fine grained scheme looks like
this according to Intel terminology:

   [PKG][DIEGRP][DIE][TILE][MODULE][CORE][THREAD]

Not enumerated domain levels consume 0 bits in the APIC ID. This allows to
provide a consistent view at the topology and determine other information
precisely like the number of cores in a package on hybrid systems, where
the existing assumption that number or cores == number of threads / threads
per core does not hold.

Provide per domain level bitmaps which record the APIC ID split into the
domain levels to make later evaluation of domain level specific information
simple. This allows to calculate e.g. the logical IDs without any further
extra logic.

Contrary to the existing registration mechanism this records disabled CPUs,
which are subject to later hotplug as well. That's useful for boot time
sizing of package or die dependent allocations without using heuristics.

Intel-SIG: commit f1f758a x86/topology: Add a mechanism to track topology via APIC IDs.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.406985021@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 7cdcdab upstream.

The topology bitmaps track all possible APIC IDs which have been registered
during enumeration. As sizing and further topology information is going to
be derived from these bitmaps, reject attempts to hotplug an APIC ID which
was not registered during enumeration.

Intel-SIG: commit 7cdcdab x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.462231229@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit ea2dd8a upstream.

There is no point in assigning the CPU numbers during ACPI physical
hotplug. The number of possible hotplug CPUs is known when the possible map
is initialized, so the CPU numbers can be associated to the registered
non-present APIC IDs right there.

This allows to put more code into the __init section and makes the related
data __ro_after_init.

Intel-SIG: commit ea2dd8a x86/cpu/topology: Assign hotpluggable CPUIDs during init.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.517339971@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit c8f8082 upstream.

XEN/PV has a completely broken vCPU enumeration scheme, which just works by
chance and provides zero topology information. Each vCPU ends up being a
single core package.

Dom0 provides MADT which can be used for topology information, but that
table is the unmodified host table, which means that there can be more CPUs
registered than the number of vCPUs XEN provides for the dom0 guest.

DomU does not have ACPI and both rely on counting the possible vCPUs via an
hypercall.

To prepare for using CPUID topology information either via MADT or via fake
APIC IDs count the number of possible CPUs during early boot and adjust
nr_cpu_ids() accordingly.

Intel-SIG: commit c8f8082 x86/xen/smp_pv: Count number of vCPUs early.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.571795063@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 354da4c upstream.

It turns out that XEN/PV Dom0 has halfways usable CPUID/MADT enumeration
except that it cannot deal with CPUs which are enumerated as disabled in
MADT.

DomU has no MADT and provides at least rudimentary topology information in
CPUID leaves 1 and 4.

For both it's important that there are not more possible Linux CPUs than
vCPUs provided by the hypervisor.

As this is ensured by counting the vCPUs before enumeration happens:

  - lift the restrictions in the CPUID evaluation and the MADT parser

  - Utilize MADT registration for Dom0

  - Keep the fake APIC ID registration for DomU

  - Fix the XEN APIC fake so the readout of the local APIC ID works for
    Dom0 via the hypercall and for DomU by returning the registered
    fake APIC IDs.

With that the XEN/PV fake approximates usefulness.

Intel-SIG: commit 354da4c x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.626195405@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
…ckets platform"

This reverts commit af350ab.

upstream commmit implement this function.
b4bac27("x86/tsc: Use topology_max_packages() to get package
number")

Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 090610b upstream.

Now that all possible APIC IDs are tracked in the topology bitmaps, its
trivial to retrieve the real information from there.

This gets rid of the guesstimates for the maximal packages and dies per
package as the actual numbers can be determined before a single AP has been
brought up.

The number of SMT threads can now be determined correctly from the bitmaps
in all situations. Up to now a system which has SMT disabled in the BIOS
will still claim that it is SMT capable, because the lowest APIC ID bit is
reserved for that and CPUID leaf 0xb/0x1f still enumerates the SMT domain
accordingly. By calculating the bitmap weights of the SMT and the CORE
domain and setting them into relation the SMT disabled in BIOS situation
reports correctly that the system is not SMT capable.

It also handles the situation correctly when a hybrid systems boot CPU does
not have SMT as it takes the SMT capability of the APs fully into account.

Intel-SIG: commit 090610b x86/cpu/topology: Use topology bitmaps for sizing.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.681709880@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 882e0cf upstream.

The early initcall to initialize the primary thread mask is not longer
required because topology_init_possible_cpus() can mark primary threads
correctly when initializing the possible and present map as the number of
SMT threads is already determined correctly.

The XENPV workaround is not longer required because XENPV now registers
fake APIC IDs which will just work like any other enumeration.

Intel-SIG: commit 882e0cf x86/cpu/topology: Mop up primary thread mask handling.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.736104257@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 5e40fb2 upstream.

No point in creating a mask via fls(). smp_num_siblings is guaranteed to be
a power of 2. So just using (smp_num_siblings - 1) has the same effect.

Intel-SIG: commit 5e40fb2 x86/cpu/topology: Simplify cpu_mark_primary_thread().
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.791176581@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit b7065f4 upstream.

With the topology bitmaps in place the logical package and die IDs can
trivially be retrieved by determining the bitmap weight of the relevant
topology domain level up to and including the physical ID in question.

Provide a function to that effect.

Intel-SIG: commit b7065f4 x86/cpu/topology: Provide logical pkg/die mapping.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.846136196@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 380414b upstream.

Replace the logical package and die management functionality and retrieve
the logical IDs from the topology bitmaps.

Intel-SIG: commit 380414b x86/cpu/topology: Use topology logical mapping mechanism.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.901865302@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 3205c98 upstream.

Similar to other sizing information the number of cores per package can be
established from the topology bitmap.

Provide a function for retrieving that information and replace the buggy
hack in the CPUID evaluation with it.

Intel-SIG: commit 3205c98 x86/cpu/topology: Retrieve cores per package from topology bitmaps.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210252.956858282@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 8078f4d upstream.

It's really a non-intuitive name. Rename it to __max_threads_per_core which
is obvious.

Intel-SIG: commit 8078f4d x86/cpu/topology: Rename smp_num_siblings.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210253.011307973@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit bd745d1 upstream.

The plural of die is dies.

Intel-SIG: commit bd745d1 x86/cpu/topology: Rename topology_max_die_per_package().
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210253.065874205@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit fd43b8a upstream.

Expose properly accounted information and accessors so the fiddling with
other topology variables can be replaced.

Intel-SIG: commit fd43b8a x86/cpu/topology: Provide __num_[cores|threads]_per_package.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210253.120958987@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 89b0f15 upstream.

Now that __num_cores_per_package and __num_threads_per_package are
available, cpuinfo::x86_max_cores and the related math all over the place
can be replaced with the ready to consume data.

Intel-SIG: commit 89b0f15 x86/cpu/topology: Get rid of cpuinfo::x86_max_cores.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://lore.kernel.org/r/20240213210253.176147806@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit f0551af upstream.

Borislav reported that one of his systems has a broken MADT table which
advertises eight present APICs and 24 non-present APICs in the same
package.

The non-present ones are considered hot-pluggable by the topology
evaluation code, which is obviously bogus as there is no way to hot-plug
within the same package.

As the topology evaluation code accounts for hot-pluggable CPUs in a
package, the maximum number of cores per package is computed wrong, which
in turn causes the uncore performance counter driver to access non-existing
MSRs. It will probably confuse other entities which rely on the maximum
number of cores and threads per package too.

Cure this by ignoring hot-pluggable APIC IDs within a present package.

In theory it would be reasonable to just do this unconditionally, but then
there is this thing called reality^Wvirtualization which ruins
everything. Virtualization is the only existing user of "physical" hotplug
and the virtualization tools allow the above scenario. Whether that is
actually in use or not is unknown.

As it can be argued that the virtualization case is not affected by the
issues which exposed the reported problem, allow the bogosity if the kernel
determined that it is running in a VM for now.

Fixes: 89b0f15 ("x86/cpu/topology: Get rid of cpuinfo::x86_max_cores")
Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
Intel-SIG: commit f0551af x86/topology: Ignore non-present APIC IDs in a present package.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/87a5nbvccx.ffs@tglx
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 7af541c upstream.

The local APICs have not yet been enumerated so the logical ID evaluation
from the topology bitmaps does not work and would return an error code.

Skip the evaluation during the early boot CPUID evaluation and only apply
it on the final run.

Fixes: 380414b ("x86/cpu/topology: Use topology logical mapping mechanism")
Intel-SIG: commit 7af541c x86/topology: Don't evaluate logical IDs during early boot.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.186943142@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit a9025cd upstream.

topo_set_cpuids() updates cpu_present_map and cpu_possible map. It is
invoked during enumeration and "physical hotplug" operations. In the
latter case this results in a kernel crash because cpu_possible_map is
marked read only after init completes.

There is no reason to update cpu_possible_map in that function. During
enumeration cpu_possible_map is not relevant and gets fully initialized
after enumeration completed. On "physical hotplug" the bit is already set
because the kernel allows only CPUs to be plugged which have been
enumerated and associated to a CPU number during early boot.

Remove the bogus update of cpu_possible_map.

Fixes: 0e53e7b ("x86/cpu/topology: Sanitize the APIC admission logic")
Reported-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Intel-SIG: commit a9025cd x86/topology: Don't update cpu_possible_map in topo_set_cpuids().
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87ttkc6kwx.ffs@tglx
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 8a95db3 upstream.

The topology core expects the boot APIC to be registered from earhy APIC
detection first and then again when the firmware tables are evaluated. This
is used for detecting the real BSP CPU on a kexec kernel.

The recent conversion of XEN/PV to register fake APIC IDs failed to
register the boot CPU APIC correctly as it only registers it once. This
causes the BSP detection mechanism to trigger wrongly:

   CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1

Additionally this results in one CPU being ignored.

Register the boot CPU APIC twice so that the XEN/PV fake enumeration
behaves like real firmware.

Reported-by: Juergen Gross <jgross@suse.com>
Fixes: e753070 ("x86/xen/smp_pv: Register fake APICs")
Intel-SIG: commit 8a95db3 x86/xen/smp_pv: Register the boot CPU APIC properly.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/87a5l8s2fg.ffs@tglx
Signed-off-by: Juergen Gross <jgross@suse.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 9d22c96 upstream.

The ACPI specification clearly states how the processors should be
enumerated in the MADT:

 "To ensure that the boot processor is supported post initialization,
  two guidelines should be followed. The first is that OSPM should
  initialize processors in the order that they appear in the MADT. The
  second is that platform firmware should list the boot processor as the
  first processor entry in the MADT.
  ...
  Failure of OSPM implementations and platform firmware to abide by
  these guidelines can result in both unpredictable and non optimal
  platform operation."

The kernel relies on that ordering to detect the real BSP on crash kernels
which is important to avoid sending a INIT IPI to it as that would cause a
full machine reset.

On a Dell XPS 16 9640 the BIOS ignores this rule and enumerates the CPUs in
the wrong order. As a consequence the kernel falsely detects a crash kernel
and disables the corresponding CPU.

Prevent this by checking the IA32_APICBASE MSR for the BSP bit on the boot
CPU. If that bit is set, then the MADT based BSP detection can be safely
ignored. If the kernel detects a mismatch between the BSP bit and the first
enumerated MADT entry then emit a firmware bug message.

This obviously also has to be taken into account when the boot APIC ID and
the first enumerated APIC ID match. If the boot CPU does not have the BSP
bit set in the APICBASE MSR then there is no way for the boot CPU to
determine which of the CPUs is the real BSP. Sending an INIT to the real
BSP would reset the machine so the only sane way to deal with that is to
limit the number of CPUs to one and emit a corresponding warning message.

Fixes: 5c5682b ("x86/cpu: Detect real BSP on crash kernels")
Reported-by: Carsten Tolkmit <ctolkmit@ennit.de>
Intel-SIG: commit 9d22c96 x86/topology: Handle bogus ACPI tables correctly.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Carsten Tolkmit <ctolkmit@ennit.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87le48jycb.ffs@tglx
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218837
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 73da582a476ea6e3512f89f8ed57dfed945829a2 upstream.

The rework of possible CPUs management erroneously disabled SMP when the
IO/APIC is disabled either by the 'noapic' command line parameter or during
IO/APIC setup. SMP is possible without IO/APIC.

Remove the ioapic_is_disabled conditions from the relevant possible CPU
management code paths to restore the orgininal behaviour.

Fixes: 7c0edad ("x86/cpu/topology: Rework possible CPU management")
Intel-SIG: commit 73da582a476e x86/cpu/topology: Remove limit of CPUs due to disabled IO/APIC.
x86/apic: Rework APIC registration

Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241202145905.1482-1-ffmancera@riseup.net
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 8b37357a78d7fa13d88ea822b35b40137da1c85e upstream.

Xen disables ACPI for PV guests in DomU, which causes acpi_mps_check() to
return 1 when CONFIG_X86_MPPARSE is not set. As a result, the local APIC is
disabled and the guest is later limited to a single vCPU, despite being
configured with more.

This regression was introduced in version 6.9 in commit 7c0edad
("x86/cpu/topology: Rework possible CPU management"), which added an
early check that limits CPUs to 1 if apic_is_disabled.

Update the acpi_mps_check() logic to return 0 early when running as a Xen
PV guest in DomU, preventing APIC from being disabled in this specific case
and restoring correct multi-vCPU behaviour.

Fixes: 7c0edad ("x86/cpu/topology: Rework possible CPU management")
Intel-SIG: commit 8b37357a78d7 x86/acpi: Don't limit CPUs to 1 for Xen PV guests due to disabled ACPI.
x86/apic: Rework APIC registration

Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250407132445.6732-2-arkamar@atlas.cz
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit bf56c41 upstream.

Recent topology checks of the x86 boot code uncovered the need for
PV guests to have the boot cpu marked in the APICBASE MSR.

Fixes: 9d22c96 ("x86/topology: Handle bogus ACPI tables correctly")
Reported-by: Niels Dettenbach <nd@syndicat.com>
Intel-SIG: commit bf56c41 x86/xen: mark boot CPU of PV guest in MSR_IA32_APICBASE.
x86/apic: Rework APIC registration

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit 5e25eb2 upstream.

If there is no local APIC enumerated and registered then the topology
bitmaps are empty. Therefore, topology_init_possible_cpus() will die with
a division by zero exception.

Prevent this by registering a fake APIC id to populate the topology
bitmap. This also allows to use all topology query interfaces
unconditionally. It does not affect the actual APIC code because either
the local APIC address was not registered or no local APIC could be
detected.

Fixes: f1f758a ("x86/topology: Add a mechanism to track topology via APIC IDs")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Intel-SIG: commit 5e25eb2 x86/topology: Handle the !APIC case gracefully.
x86/apic: Rework APIC registration

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240322185305.242709302@linutronix.de
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
commit b4bac27 upstream.

Commit b50db70 ("x86/tsc: Disable clocksource watchdog for TSC on
qualified platorms") was introduced to solve problem that sometimes TSC
clocksource is wrongly judged as unstable by watchdog like 'jiffies', HPET,
etc.

In it, the hardware package number is a key factor for judging whether to
disable the watchdog for TSC, and 'nr_online_nodes' was chosen due to, at
that time (kernel v5.1x), it is available in early boot phase before
registering 'tsc-early' clocksource, where all non-boot CPUs are not
brought up yet.

Dave and Rui pointed out there are many cases in which 'nr_online_nodes'
is cheated and not accurate, like:

 * SNC (sub-numa cluster) mode enabled
 * numa emulation (numa=fake=8 etc.)
 * numa=off
 * platforms with CPU-less HBM nodes, CPU-less Optane memory nodes.
 * 'maxcpus=' cmdline setup, where chopped CPUs could be onlined later
 * 'nr_cpus=', 'possible_cpus=' cmdline setup, where chopped CPUs can
   not be onlined after boot

The SNC case is the most user-visible case, as many CSP (Cloud Service
Provider) enable this feature in their server fleets. When SNC3 enabled, a
2 socket machine will appear to have 6 NUMA nodes, and get impacted by the
issue in reality.

Thomas' recent patchset of refactoring x86 topology code improves
topology_max_packages() greatly, by making it more accurate and available
in early boot phase, which works well in most of the above cases.

The only exceptions are 'nr_cpus=' and 'possible_cpus=' setup, which may
under-estimate the package number. As during topology setup, the boot CPU
iterates through all enumerated APIC IDs and either accepts or rejects the
APIC ID. For accepted IDs, it figures out which bits of the ID map to the
package number.  It tracks which package numbers have been seen in a
bitmap.  topology_max_packages() just returns the number of bits set in
that bitmap.

'nr_cpus=' and 'possible_cpus=' can cause more APIC IDs to be rejected and
can artificially lower the number of bits in the package bitmap and thus
topology_max_packages().  This means that, for example, a system with 8
physical packages might reject all the CPUs on 6 of those packages and be
left with only 2 packages and 2 bits set in the package bitmap. It needs
the TSC watchdog, but would disable it anyway.  This isn't ideal, but it
only happens for debug-oriented options. This is fixable by tracking the
package numbers for rejected CPUs.  But it's not worth the trouble for
debugging.

So use topology_max_packages() to replace nr_online_nodes().

Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Intel-SIG: commit b4bac27 x86/tsc: Use topology_max_packages() to get package number.
x86/apic: Rework APIC registration

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/all/20240729021202.180955-1-feng.tang@intel.com
Closes: https://lore.kernel.org/lkml/a4860054-0f16-6513-f121-501048431086@intel.com/
[ Quanxian Wang: amend commit log ]
Signed-off-by: Quanxian Wang <quanxian.wang@intel.com>
@quanxianwang quanxianwang force-pushed the DMR-new-cpu-topo-6.6-PR4-withfixes branch from 327b269 to 3d5dcf5 Compare January 8, 2026 02:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants