build(deps): bump idna from 3.4 to 3.7 in /drivers/gpu/drm/ci/xfails by dependabot[bot] · Pull Request #1 · rafaeljw/linux-pm

dependabot · 2024-04-12T03:47:25Z

Bumps idna from 3.4 to 3.7.

Release notes

v3.7

What's Changed

Fix issue where specially crafted inputs to encode() could take exceptionally long amount of time to process. [CVE-2024-3651]

Thanks to Guido Vranken for reporting the issue.

Full Changelog: kjd/idna@v3.6...v3.7

Changelog

Sourced from idna's changelog.

3.7 (2024-04-11) ++++++++++++++++

Fix issue where specially crafted inputs to encode() could take exceptionally long amount of time to process. [CVE-2024-3651]

Thanks to Guido Vranken for reporting the issue.

3.6 (2023-11-25) ++++++++++++++++

Fix regression to include tests in source distribution.

3.5 (2023-11-24) ++++++++++++++++

Update to Unicode 15.1.0

String codec name is now "idna2008" as overriding the system codec "idna" was not working.

Fix typing error for codec encoding

"setup.cfg" has been added for this release due to some downstream lack of adherence to PEP 517. Should be removed in a future release so please prepare accordingly.

Removed reliance on a symlink for the "idna-data" tool to comport with PEP 517 and the Python Packaging User Guide for sdist archives.

Added security reporting protocol for project

Thanks Jon Ribbens, Diogo Teles Sant'Anna, Wu Tingfeng for contributions to this release.

Commits

1d365e1 Release v3.7
c1b3154 Merge pull request #172 from kjd/optimize-contextj
0394ec7 Merge branch 'master' into optimize-contextj
cd58a23 Merge pull request #152 from elliotwutingfeng/dev
5beb28b More efficient resolution of joiner contexts
1b12148 Update ossf/scorecard-action to v2.3.1
d516b87 Update Github actions/checkout to v4
c095c75 Merge branch 'master' into dev
60a0a4c Fix typo in GitHub Actions workflow key
5918a0e Merge branch 'master' into dev
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the Security Alerts page.

* thermal-core: thermal: gov_power_allocator: Avoid overwriting PID coefficients from setup time thermal: sysfs: Fix up white space in trip_point_temp_store() iwlwifi: mvm: Use for_each_thermal_trip() for walking trip points iwlwifi: mvm: Populate trip table before registering thermal zone iwlwifi: mvm: Drop unused fw_trips_index[] from iwl_mvm_thermal_device thermal: core: Change governor name to const char pointer thermal: gov_bang_bang: Fix possible cooling device state ping-pong thermal: gov_fair_share: Fix dependency on trip points ordering * thermal-intel: thermal/intel: Fix intel_tcc_get_temp() to support negative CPU temperature

… into linux-next * acpi-pm: ACPI: PM: s2idle: Enable Low-Power S0 Idle MSFT UUID for non-AMD systems * acpi-resource: ACPI: resource: Do IRQ override on Lunnen Ground laptops ACPI: resource: Add IRQ override quirk for ASUS ExpertBook B2502FBA ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CVA * acpi-bus: ACPI: bus: make acpi_bus_type const * acpi-scan: ACPI: scan: Defer enumeration of devices with a _DEP pointing to IVSC device

…-processor' into linux-next * acpi-tables: ACPI: NFIT: Switch to use acpi_evaluate_dsm_typed() * acpi-video: ACPI: video: Handle fetching EDID that is longer than 256 bytes * acpi-property: ACPI: property: Ignore bad graph port nodes on Dell XPS 9315 ACPI: utils: Make acpi_handle_path() not static * acpi-processor: ACPI: processor_idle: Fix memory leak in acpi_processor_power_exit()

* pm-sleep: Documentation: PM: Fix PCI hibernation support description PM: hibernate: Add support for LZ4 compression for hibernation PM: hibernate: Move to crypto APIs for LZO compression PM: hibernate: Rename lzo* to make it generic PM: sleep: Call dpm_async_fn() directly in each suspend phase PM: sleep: Move devices to new lists earlier in each suspend phase PM: sleep: Move some assignments from under a lock PM: sleep: stats: Log errors right after running suspend callbacks PM: sleep: stats: Use locking in dpm_save_failed_dev() PM: sleep: stats: Call dpm_save_failed_step() at most once per phase PM: sleep: stats: Define suspend_stats next to the code using it PM: sleep: stats: Use unsigned int for success and failure counters PM: sleep: stats: Use an array of step failure counters PM: sleep: stats: Use array of suspend step names PM: sleep: Relocate two device PM core functions PM: sleep: Simplify dpm_suspended_list walk in dpm_resume() PM: sleep: Use bool for all 1-bit fields in struct dev_pm_info

* pm-cpufreq: cpufreq: intel_pstate: remove cpudata::prev_cummulative_iowait cpufreq: Change default transition delay to 2ms cpufreq: amd-pstate: Fix min_perf assignment in amd_pstate_adjust_perf() Documentation: PM: amd-pstate: Fix section title underline Documentation: introduce amd-pstate preferrd core mode kernel command line options Documentation: amd-pstate: introduce amd-pstate preferred core cpufreq: amd-pstate: Update amd-pstate preferred core ranking dynamically ACPI: cpufreq: Add highest perf change notification cpufreq: amd-pstate: Enable amd-pstate preferred core support ACPI: CPPC: Add helper to get the highest performance value x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion * pm-cpuidle: cpuidle: Avoid potential overflow in integer multiplication cpuidle: haltpoll: do not shrink guest poll_limit_ns below grow_start

* pm-powercap: powercap: dtpm_cpu: Fix error check against freq_qos_add_request() powercap: intel_rapl: Add support for Arrow Lake powercap: intel_rapl: Add support for Lunar Lake-M paltform powercap: intel_rapl_tpmi: Fix System Domain probing powercap: intel_rapl_tpmi: Fix a register bug powercap: intel_rapl: Fix locking in TPMI RAPL powercap: intel_rapl: Fix a NULL pointer dereference

* pm-em: (23 commits) Documentation: EM: Update with runtime modification design PM: EM: Add em_dev_compute_costs() PM: EM: Remove old table PM: EM: Change debugfs configuration to use runtime EM table data drivers/thermal/devfreq_cooling: Use new Energy Model interface drivers/thermal/cpufreq_cooling: Use new Energy Model interface powercap/dtpm_devfreq: Use new Energy Model interface to get table powercap/dtpm_cpu: Use new Energy Model interface to get table PM: EM: Optimize em_cpu_energy() and remove division PM: EM: Support late CPUs booting and capacity adjustment PM: EM: Add performance field to struct em_perf_state and optimize PM: EM: Add em_perf_state_from_pd() to get performance states table PM: EM: Introduce em_dev_update_perf_domain() for EM updates PM: EM: Add functions for memory allocations for new EM tables PM: EM: Use runtime modified EM for CPUs energy estimation in EAS PM: EM: Introduce runtime modifiable table PM: EM: Split the allocation and initialization of the EM table PM: EM: Check if the get_cost() callback is present in em_compute_costs() PM: EM: Introduce em_compute_costs() PM: EM: Refactor em_pd_get_efficient_state() to be more flexible ...

* pm-runtime: PM: runtime: Add pm_runtime_put_autosuspend() replacement PM: runtime: Simplify pm_runtime_get_if_active() usage

* acpi-misc: ACPI: use %pe for better readability of errors while printing

Bumps [idna](https://github.com/kjd/idna) from 3.4 to 3.7. - [Release notes](https://github.com/kjd/idna/releases) - [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst) - [Commits](kjd/idna@v3.4...v3.7) --- updated-dependencies: - dependency-name: idna dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>

When the system is booted with kernel command line argument "nosmt" or "maxcpus" to limit the number of CPUs, disabling turbo via: echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo results in a crash: PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI ... RIP: 0010:store_no_turbo+0x100/0x1f0 ... This occurs because for_each_possible_cpu() returns CPUs even if they are not online. For those CPUs, all_cpu_data[] will be NULL. Since commit 973207a ("cpufreq: intel_pstate: Rearrange max frequency updates handling code"), all_cpu_data[] is dereferenced even for CPUs which are not online, causing the NULL pointer dereference. To fix that, pass CPU number to intel_pstate_update_max_freq() and use all_cpu_data[] for those CPUs for which there is a valid cpufreq policy. Fixes: 973207a ("cpufreq: intel_pstate: Rearrange max frequency updates handling code") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221068 Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 6.16+ <stable@vger.kernel.org> # 6.16+ Link: https://patch.msgid.link/20260225001752.890164-1-srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

A DABT is reported[1] on an android based system when resume from hiberate. This happens because swsusp_arch_suspend_exit() is marked with SYM_CODE_*() and does not have a CFI hash, but swsusp_arch_resume() will attempt to verify the CFI hash when calling a copy of swsusp_arch_suspend_exit(). Given that there's an existing requirement that the entrypoint to swsusp_arch_suspend_exit() is the first byte of the .hibernate_exit.text section, we cannot fix this by marking swsusp_arch_suspend_exit() with SYM_FUNC_*(). The simplest fix for now is to disable the CFI check in swsusp_arch_resume(). Mark swsusp_arch_resume() as __nocfi to disable the CFI check. [1] [ 22.991934][ T1] Unable to handle kernel paging request at virtual address 0000000109170ffc [ 22.991934][ T1] Mem abort info: [ 22.991934][ T1] ESR = 0x0000000096000007 [ 22.991934][ T1] EC = 0x25: DABT (current EL), IL = 32 bits [ 22.991934][ T1] SET = 0, FnV = 0 [ 22.991934][ T1] EA = 0, S1PTW = 0 [ 22.991934][ T1] FSC = 0x07: level 3 translation fault [ 22.991934][ T1] Data abort info: [ 22.991934][ T1] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000 [ 22.991934][ T1] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 22.991934][ T1] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 22.991934][ T1] [0000000109170ffc] user address but active_mm is swapper [ 22.991934][ T1] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP [ 22.991934][ T1] Dumping ftrace buffer: [ 22.991934][ T1] (ftrace buffer empty) [ 22.991934][ T1] Modules linked in: [ 22.991934][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.98-android15-8-g0b1d2aee7fc3-dirty-4k #1 688c7060a825a3ac418fe53881730b355915a419 [ 22.991934][ T1] Hardware name: Unisoc UMS9360-base Board (DT) [ 22.991934][ T1] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 22.991934][ T1] pc : swsusp_arch_resume+0x2ac/0x344 [ 22.991934][ T1] lr : swsusp_arch_resume+0x294/0x344 [ 22.991934][ T1] sp : ffffffc08006b960 [ 22.991934][ T1] x29: ffffffc08006b9c0 x28: 0000000000000000 x27: 0000000000000000 [ 22.991934][ T1] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000820 [ 22.991934][ T1] x23: ffffffd0817e3000 x22: ffffffd0817e3000 x21: 0000000000000000 [ 22.991934][ T1] x20: ffffff8089171000 x19: ffffffd08252c8c8 x18: ffffffc080061058 [ 22.991934][ T1] x17: 00000000529c6ef0 x16: 00000000529c6ef0 x15: 0000000000000004 [ 22.991934][ T1] x14: ffffff8178c88000 x13: 0000000000000006 x12: 0000000000000000 [ 22.991934][ T1] x11: 0000000000000015 x10: 0000000000000001 x9 : ffffffd082533000 [ 22.991934][ T1] x8 : 0000000109171000 x7 : 205b5d3433393139 x6 : 392e32322020205b [ 22.991934][ T1] x5 : 000000010916f000 x4 : 000000008164b000 x3 : ffffff808a4e0530 [ 22.991934][ T1] x2 : ffffffd08058e784 x1 : 0000000082326000 x0 : 000000010a283000 [ 22.991934][ T1] Call trace: [ 22.991934][ T1] swsusp_arch_resume+0x2ac/0x344 [ 22.991934][ T1] hibernation_restore+0x158/0x18c [ 22.991934][ T1] load_image_and_restore+0xb0/0xec [ 22.991934][ T1] software_resume+0xf4/0x19c [ 22.991934][ T1] software_resume_initcall+0x34/0x78 [ 22.991934][ T1] do_one_initcall+0xe8/0x370 [ 22.991934][ T1] do_initcall_level+0xc8/0x19c [ 22.991934][ T1] do_initcalls+0x70/0xc0 [ 22.991934][ T1] do_basic_setup+0x1c/0x28 [ 22.991934][ T1] kernel_init_freeable+0xe0/0x148 [ 22.991934][ T1] kernel_init+0x20/0x1a8 [ 22.991934][ T1] ret_from_fork+0x10/0x20 [ 22.991934][ T1] Code: a9400a61 f94013e0 f9438923 f9400a64 (b85fc110) Co-developed-by: Jeson Gao <jeson.gao@unisoc.com> Signed-off-by: Jeson Gao <jeson.gao@unisoc.com> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> [catalin.marinas@arm.com: commit log updated by Mark Rutland] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

When creating a synthetic event based on an existing synthetic event that had a stacktrace field and the new synthetic event used that field a kernel crash occurred: ~# cd /sys/kernel/tracing ~# echo 's:stack unsigned long stack[];' > dynamic_events ~# echo 'hist:keys=prev_pid:s0=common_stacktrace if prev_state & 3' >> events/sched/sched_switch/trigger ~# echo 'hist:keys=next_pid:s1=$s0:onmatch(sched.sched_switch).trace(stack,$s1)' >> events/sched/sched_switch/trigger The above creates a synthetic event that takes a stacktrace when a task schedules out in a non-running state and passes that stacktrace to the sched_switch event when that task schedules back in. It triggers the "stack" synthetic event that has a stacktrace as its field (called "stack"). ~# echo 's:syscall_stack s64 id; unsigned long stack[];' >> dynamic_events ~# echo 'hist:keys=common_pid:s2=stack' >> events/synthetic/stack/trigger ~# echo 'hist:keys=common_pid:s3=$s2,i0=id:onmatch(synthetic.stack).trace(syscall_stack,$i0,$s3)' >> events/raw_syscalls/sys_exit/trigger The above makes another synthetic event called "syscall_stack" that attaches the first synthetic event (stack) to the sys_exit trace event and records the stacktrace from the stack event with the id of the system call that is exiting. When enabling this event (or using it in a historgram): ~# echo 1 > events/synthetic/syscall_stack/enable Produces a kernel crash! BUG: unable to handle page fault for address: 0000000000400010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 6 UID: 0 PID: 1257 Comm: bash Not tainted 6.16.3+deb14-amd64 #1 PREEMPT(lazy) Debian 6.16.3-1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:trace_event_raw_event_synth+0x90/0x380 Code: c5 00 00 00 00 85 d2 0f 84 e1 00 00 00 31 db eb 34 0f 1f 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 <49> 8b 04 24 48 83 c3 01 8d 0c c5 08 00 00 00 01 cd 41 3b 5d 40 0f RSP: 0018:ffffd2670388f958 EFLAGS: 00010202 RAX: ffff8ba1065cc100 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: fffff266ffda7b90 RDI: ffffd2670388f9b0 RBP: 0000000000000010 R08: ffff8ba104e76000 R09: ffffd2670388fa50 R10: ffff8ba102dd42e0 R11: ffffffff9a908970 R12: 0000000000400010 R13: ffff8ba10a246400 R14: ffff8ba10a710220 R15: fffff266ffda7b90 FS: 00007fa3bc63f740(0000) GS:ffff8ba2e0f48000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000400010 CR3: 0000000107f9e003 CR4: 0000000000172ef0 Call Trace: <TASK> ? __tracing_map_insert+0x208/0x3a0 action_trace+0x67/0x70 event_hist_trigger+0x633/0x6d0 event_triggers_call+0x82/0x130 trace_event_buffer_commit+0x19d/0x250 trace_event_raw_event_sys_exit+0x62/0xb0 syscall_exit_work+0x9d/0x140 do_syscall_64+0x20a/0x2f0 ? trace_event_raw_event_sched_switch+0x12b/0x170 ? save_fpregs_to_fpstate+0x3e/0x90 ? _raw_spin_unlock+0xe/0x30 ? finish_task_switch.isra.0+0x97/0x2c0 ? __rseq_handle_notify_resume+0xad/0x4c0 ? __schedule+0x4b8/0xd00 ? restore_fpregs_from_fpstate+0x3c/0x90 ? switch_fpu_return+0x5b/0xe0 ? do_syscall_64+0x1ef/0x2f0 ? do_fault+0x2e9/0x540 ? __handle_mm_fault+0x7d1/0xf70 ? count_memcg_events+0x167/0x1d0 ? handle_mm_fault+0x1d7/0x2e0 ? do_user_addr_fault+0x2c3/0x7f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The reason is that the stacktrace field is not labeled as such, and is treated as a normal field and not as a dynamic event that it is. In trace_event_raw_event_synth() the event is field is still treated as a dynamic array, but the retrieval of the data is considered a normal field, and the reference is just the meta data: // Meta data is retrieved instead of a dynamic array str_val = (char *)(long)var_ref_vals[val_idx]; // Then when it tries to process it: len = *((unsigned long *)str_val) + 1; It triggers a kernel page fault. To fix this, first when defining the fields of the first synthetic event, set the filter type to FILTER_STACKTRACE. This is used later by the second synthetic event to know that this field is a stacktrace. When creating the field of the new synthetic event, have it use this FILTER_STACKTRACE to know to create a stacktrace field to copy the stacktrace into. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20260122194824.6905a38e@gandalf.local.home Fixes: 00cf3d6 ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Oneway transactions sent to frozen targets via binder_proc_transaction() return a BR_TRANSACTION_PENDING_FROZEN error but they are still treated as successful since the target is expected to thaw at some point. It is then not safe to access 't' after BR_TRANSACTION_PENDING_FROZEN errors as the transaction could have been consumed by the now thawed target. This is the case for binder_netlink_report() which derreferences 't' after a pending frozen error, as pointed out by the following KASAN report: ================================================================== BUG: KASAN: slab-use-after-free in binder_netlink_report.isra.0+0x694/0x6c8 Read of size 8 at addr ffff00000f98ba38 by task binder-util/522 CPU: 4 UID: 0 PID: 522 Comm: binder-util Not tainted 6.19.0-rc6-00015-gc03e9c42ae8f #1 PREEMPT Hardware name: linux,dummy-virt (DT) Call trace: binder_netlink_report.isra.0+0x694/0x6c8 binder_transaction+0x66e4/0x79b8 binder_thread_write+0xab4/0x4440 binder_ioctl+0x1fd4/0x2940 [...] Allocated by task 522: __kmalloc_cache_noprof+0x17c/0x50c binder_transaction+0x584/0x79b8 binder_thread_write+0xab4/0x4440 binder_ioctl+0x1fd4/0x2940 [...] Freed by task 488: kfree+0x1d0/0x420 binder_free_transaction+0x150/0x234 binder_thread_read+0x2d08/0x3ce4 binder_ioctl+0x488/0x2940 [...] ================================================================== Instead, make a transaction copy so the data can be safely accessed by binder_netlink_report() after a pending frozen error. While here, add a comment about not using t->buffer in binder_netlink_report(). Cc: stable@vger.kernel.org Fixes: 6374034 ("binder: introduce transaction reports via netlink") Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20260122180203.1502637-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…alled from deferred_grow_zone() Commit 3acb913 ("mm/mm_init: use deferred_init_memmap_chunk() in deferred_grow_zone()") made deferred_grow_zone() call deferred_init_memmap_chunk() within a pgdat_resize_lock() critical section with irqs disabled. It did check for irqs_disabled() in deferred_init_memmap_chunk() to avoid calling cond_resched(). For a PREEMPT_RT kernel build, however, spin_lock_irqsave() does not disable interrupt but rcu_read_lock() is called. This leads to the following bug report. BUG: sleeping function called from invalid context at mm/mm_init.c:2091 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 preempt_count: 0, expected: 0 RCU nest depth: 1, expected: 0 3 locks held by swapper/0/1: #0: ffff80008471b7a0 (sched_domains_mutex){+.+.}-{4:4}, at: sched_domains_mutex_lock+0x28/0x40 #1: ffff003bdfffef48 (&pgdat->node_size_lock){+.+.}-{3:3}, at: deferred_grow_zone+0x140/0x278 #2: ffff800084acf600 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0x1b4/0x408 CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.19.0-rc6-test #1 PREEMPT_{RT,(full) } Tainted: [W]=WARN Call trace: show_stack+0x20/0x38 (C) dump_stack_lvl+0xdc/0xf8 dump_stack+0x1c/0x28 __might_resched+0x384/0x530 deferred_init_memmap_chunk+0x560/0x688 deferred_grow_zone+0x190/0x278 _deferred_grow_zone+0x18/0x30 get_page_from_freelist+0x780/0xf78 __alloc_frozen_pages_noprof+0x1dc/0x348 alloc_slab_page+0x30/0x110 allocate_slab+0x98/0x2a0 new_slab+0x4c/0x80 ___slab_alloc+0x5a4/0x770 __slab_alloc.constprop.0+0x88/0x1e0 __kmalloc_node_noprof+0x2c0/0x598 __sdt_alloc+0x3b8/0x728 build_sched_domains+0xe0/0x1260 sched_init_domains+0x14c/0x1c8 sched_init_smp+0x9c/0x1d0 kernel_init_freeable+0x218/0x358 kernel_init+0x28/0x208 ret_from_fork+0x10/0x20 Fix it adding a new argument to deferred_init_memmap_chunk() to explicitly tell it if cond_resched() is allowed or not instead of relying on some current state information which may vary depending on the exact kernel configuration options that are enabled. Link: https://lkml.kernel.org/r/20260122184343.546627-1-longman@redhat.com Fixes: 3acb913 ("mm/mm_init: use deferred_init_memmap_chunk() in deferred_grow_zone()") Signed-off-by: Waiman Long <longman@redhat.com> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: <stable@vger.kernrl.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Fix a use-after-free which happens due to enslave failure after the new slave has been added to the array. Since the new slave can be used for Tx immediately, we can use it after it has been freed by the enslave error cleanup path which frees the allocated slave memory. Slave update array is supposed to be called last when further enslave failures are not expected. Move it after xdp setup to avoid any problems. It is very easy to reproduce the problem with a simple xdp_pass prog: ip l add bond1 type bond mode balance-xor ip l set bond1 up ip l set dev bond1 xdp object xdp_pass.o sec xdp_pass ip l add dumdum type dummy Then run in parallel: while :; do ip l set dumdum master bond1 1>/dev/null 2>&1; done; mausezahn bond1 -a own -b rand -A rand -B 1.1.1.1 -c 0 -t tcp "dp=1-1023, flags=syn" The crash happens almost immediately: [ 605.602850] Oops: general protection fault, probably for non-canonical address 0xe0e6fc2460000137: 0000 [#1] SMP KASAN NOPTI [ 605.602916] KASAN: maybe wild-memory-access in range [0x07380123000009b8-0x07380123000009bf] [ 605.602946] CPU: 0 UID: 0 PID: 2445 Comm: mausezahn Kdump: loaded Tainted: G B 6.19.0-rc6+ #21 PREEMPT(voluntary) [ 605.602979] Tainted: [B]=BAD_PAGE [ 605.602998] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 605.603032] RIP: 0010:netdev_core_pick_tx+0xcd/0x210 [ 605.603063] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 3e 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 6b 08 49 8d 7d 30 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 25 01 00 00 49 8b 45 30 4c 89 e2 48 89 ee 48 89 [ 605.603111] RSP: 0018:ffff88817b9af348 EFLAGS: 00010213 [ 605.603145] RAX: dffffc0000000000 RBX: ffff88817d28b420 RCX: 0000000000000000 [ 605.603172] RDX: 00e7002460000137 RSI: 0000000000000008 RDI: 07380123000009be [ 605.603199] RBP: ffff88817b541a00 R08: 0000000000000001 R09: fffffbfff3ed8c0c [ 605.603226] R10: ffffffff9f6c6067 R11: 0000000000000001 R12: 0000000000000000 [ 605.603253] R13: 073801230000098e R14: ffff88817d28b448 R15: ffff88817b541a84 [ 605.603286] FS: 00007f6570ef67c0(0000) GS:ffff888221dfa000(0000) knlGS:0000000000000000 [ 605.603319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 605.603343] CR2: 00007f65712fae40 CR3: 000000011371b000 CR4: 0000000000350ef0 [ 605.603373] Call Trace: [ 605.603392] <TASK> [ 605.603410] __dev_queue_xmit+0x448/0x32a0 [ 605.603434] ? __pfx_vprintk_emit+0x10/0x10 [ 605.603461] ? __pfx_vprintk_emit+0x10/0x10 [ 605.603484] ? __pfx___dev_queue_xmit+0x10/0x10 [ 605.603507] ? bond_start_xmit+0xbfb/0xc20 [bonding] [ 605.603546] ? _printk+0xcb/0x100 [ 605.603566] ? __pfx__printk+0x10/0x10 [ 605.603589] ? bond_start_xmit+0xbfb/0xc20 [bonding] [ 605.603627] ? add_taint+0x5e/0x70 [ 605.603648] ? add_taint+0x2a/0x70 [ 605.603670] ? end_report.cold+0x51/0x75 [ 605.603693] ? bond_start_xmit+0xbfb/0xc20 [bonding] [ 605.603731] bond_start_xmit+0x623/0xc20 [bonding] Fixes: 9e2ee5c ("net, bonding: Add XDP support to the bonding driver") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reported-by: Chen Zhen <chenzhen126@huawei.com> Closes: https://lore.kernel.org/netdev/fae17c21-4940-5605-85b2-1d5e17342358@huawei.com/ CC: Jussi Maki <joamaki@gmail.com> CC: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20260123120659.571187-1-razor@blackwall.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>

There was a lockdep warning in sprd_gpio: [ 6.258269][T329@C6] [ BUG: Invalid wait context ] [ 6.258270][T329@C6] 6.18.0-android17-0-g30527ad7aaae-ab00009-4k #1 Tainted: G W OE [ 6.258272][T329@C6] ----------------------------- [ 6.258273][T329@C6] modprobe/329 is trying to lock: [ 6.258275][T329@C6] ffffff8081c91690 (&sprd_gpio->lock){....}-{3:3}, at: sprd_gpio_irq_unmask+0x4c/0xa4 [gpio_sprd] [ 6.258282][T329@C6] other info that might help us debug this: [ 6.258283][T329@C6] context-{5:5} [ 6.258285][T329@C6] 3 locks held by modprobe/329: [ 6.258286][T329@C6] #0: ffffff808baca108 (&dev->mutex){....}-{4:4}, at: __driver_attach+0xc4/0x204 [ 6.258295][T329@C6] #1: ffffff80965e7240 (request_class#4){+.+.}-{4:4}, at: __setup_irq+0x1cc/0x82c [ 6.258304][T329@C6] #2: ffffff80965e70c8 (lock_class#4){....}-{2:2}, at: __setup_irq+0x21c/0x82c [ 6.258313][T329@C6] stack backtrace: [ 6.258314][T329@C6] CPU: 6 UID: 0 PID: 329 Comm: modprobe Tainted: G W OE 6.18.0-android17-0-g30527ad7aaae-ab00009-4k #1 PREEMPT 3ad5b0f45741a16e5838da790706e16ceb6717df [ 6.258316][T329@C6] Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 6.258317][T329@C6] Hardware name: Unisoc UMS9632-base Board (DT) [ 6.258318][T329@C6] Call trace: [ 6.258318][T329@C6] show_stack+0x20/0x30 (C) [ 6.258321][T329@C6] __dump_stack+0x28/0x3c [ 6.258324][T329@C6] dump_stack_lvl+0xac/0xf0 [ 6.258326][T329@C6] dump_stack+0x18/0x3c [ 6.258329][T329@C6] __lock_acquire+0x824/0x2c28 [ 6.258331][T329@C6] lock_acquire+0x148/0x2cc [ 6.258333][T329@C6] _raw_spin_lock_irqsave+0x6c/0xb4 [ 6.258334][T329@C6] sprd_gpio_irq_unmask+0x4c/0xa4 [gpio_sprd 814535e93c6d8e0853c45c02eab0fa88a9da6487] [ 6.258337][T329@C6] irq_startup+0x238/0x350 [ 6.258340][T329@C6] __setup_irq+0x504/0x82c [ 6.258342][T329@C6] request_threaded_irq+0x118/0x184 [ 6.258344][T329@C6] devm_request_threaded_irq+0x94/0x120 [ 6.258347][T329@C6] sc8546_init_irq+0x114/0x170 [sc8546_charger 223586ccafc27439f7db4f95b0c8e6e882349a99] [ 6.258352][T329@C6] sc8546_charger_probe+0x53c/0x5a0 [sc8546_charger 223586ccafc27439f7db4f95b0c8e6e882349a99] [ 6.258358][T329@C6] i2c_device_probe+0x2c8/0x350 [ 6.258361][T329@C6] really_probe+0x1a8/0x46c [ 6.258363][T329@C6] __driver_probe_device+0xa4/0x10c [ 6.258366][T329@C6] driver_probe_device+0x44/0x1b4 [ 6.258369][T329@C6] __driver_attach+0xd0/0x204 [ 6.258371][T329@C6] bus_for_each_dev+0x10c/0x168 [ 6.258373][T329@C6] driver_attach+0x2c/0x3c [ 6.258376][T329@C6] bus_add_driver+0x154/0x29c [ 6.258378][T329@C6] driver_register+0x70/0x10c [ 6.258381][T329@C6] i2c_register_driver+0x48/0xc8 [ 6.258384][T329@C6] init_module+0x28/0xfd8 [sc8546_charger 223586ccafc27439f7db4f95b0c8e6e882349a99] [ 6.258389][T329@C6] do_one_initcall+0x128/0x42c [ 6.258392][T329@C6] do_init_module+0x60/0x254 [ 6.258395][T329@C6] load_module+0x1054/0x1220 [ 6.258397][T329@C6] __arm64_sys_finit_module+0x240/0x35c [ 6.258400][T329@C6] invoke_syscall+0x60/0xec [ 6.258402][T329@C6] el0_svc_common+0xb0/0xe4 [ 6.258405][T329@C6] do_el0_svc+0x24/0x30 [ 6.258407][T329@C6] el0_svc+0x54/0x1c4 [ 6.258409][T329@C6] el0t_64_sync_handler+0x68/0xdc [ 6.258411][T329@C6] el0t_64_sync+0x1c4/0x1c8 This is because the spin_lock would change to rt_mutex in PREEMPT_RT, however the sprd_gpio->lock would use in hard-irq, this is unsafe. So change the spin_lock_t to raw_spin_lock_t to use the spinlock in hard-irq. Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20260126094209.9855-1-xuewen.yan@unisoc.com [Bartosz: tweaked the commit message] Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

Add NULL pointer checks in ice_vsi_set_napi_queues() to prevent crashes during resume from suspend when rings[q_idx]->q_vector is NULL. Tested adaptor: 60:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller E810-XXV for SFP [8086:159b] (rev 02) Subsystem: Intel Corporation Ethernet Network Adapter E810-XXV-2 [8086:4003] SR-IOV state: both disabled and enabled can reproduce this issue. kernel version: v6.18 Reproduce steps: Boot up and execute suspend like systemctl suspend or rtcwake. Log: <1>[ 231.443607] BUG: kernel NULL pointer dereference, address: 0000000000000040 <1>[ 231.444052] #PF: supervisor read access in kernel mode <1>[ 231.444484] #PF: error_code(0x0000) - not-present page <6>[ 231.444913] PGD 0 P4D 0 <4>[ 231.445342] Oops: Oops: 0000 [#1] SMP NOPTI <4>[ 231.446635] RIP: 0010:netif_queue_set_napi+0xa/0x170 <4>[ 231.447067] Code: 31 f6 31 ff c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 c9 74 0b <48> 83 79 30 00 0f 84 39 01 00 00 55 41 89 d1 49 89 f8 89 f2 48 89 <4>[ 231.447513] RSP: 0018:ffffcc780fc078c0 EFLAGS: 00010202 <4>[ 231.447961] RAX: ffff8b848ca30400 RBX: ffff8b848caf2028 RCX: 0000000000000010 <4>[ 231.448443] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8b848dbd4000 <4>[ 231.448896] RBP: ffffcc780fc078e8 R08: 0000000000000000 R09: 0000000000000000 <4>[ 231.449345] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 <4>[ 231.449817] R13: ffff8b848dbd4000 R14: ffff8b84833390c8 R15: 0000000000000000 <4>[ 231.450265] FS: 00007c7b29e9d740(0000) GS:ffff8b8c068e2000(0000) knlGS:0000000000000000 <4>[ 231.450715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 231.451179] CR2: 0000000000000040 CR3: 000000030626f004 CR4: 0000000000f72ef0 <4>[ 231.451629] PKRU: 55555554 <4>[ 231.452076] Call Trace: <4>[ 231.452549] <TASK> <4>[ 231.452996] ? ice_vsi_set_napi_queues+0x4d/0x110 [ice] <4>[ 231.453482] ice_resume+0xfd/0x220 [ice] <4>[ 231.453977] ? __pfx_pci_pm_resume+0x10/0x10 <4>[ 231.454425] pci_pm_resume+0x8c/0x140 <4>[ 231.454872] ? __pfx_pci_pm_resume+0x10/0x10 <4>[ 231.455347] dpm_run_callback+0x5f/0x160 <4>[ 231.455796] ? dpm_wait_for_superior+0x107/0x170 <4>[ 231.456244] device_resume+0x177/0x270 <4>[ 231.456708] dpm_resume+0x209/0x2f0 <4>[ 231.457151] dpm_resume_end+0x15/0x30 <4>[ 231.457596] suspend_devices_and_enter+0x1da/0x2b0 <4>[ 231.458054] enter_state+0x10e/0x570 Add defensive checks for both the ring pointer and its q_vector before dereferencing, allowing the system to resume successfully even when q_vectors are unmapped. Fixes: 2a5dc09 ("ice: move netif_queue_set_napi to rtnl-protected sections") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

When deleting TC steering flows, iterate only over actual devcom peers instead of assuming all possible ports exist. This avoids touching non-existent peers and ensures cleanup is limited to devices the driver is currently connected to. BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 133c8a067 P4D 0 Oops: Oops: 0002 [#1] SMP CPU: 19 UID: 0 PID: 2169 Comm: tc Not tainted 6.18.0+ #156 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5e_tc_del_fdb_peers_flow+0xbe/0x200 [mlx5_core] Code: 00 00 a8 08 74 a8 49 8b 46 18 f6 c4 02 74 9f 4c 8d bf a0 12 00 00 4c 89 ff e8 0e e7 96 e1 49 8b 44 24 08 49 8b 0c 24 4c 89 ff <48> 89 41 08 48 89 08 49 89 2c 24 49 89 5c 24 08 e8 7d ce 96 e1 49 RSP: 0018:ff11000143867528 EFLAGS: 00010246 RAX: 0000000000000000 RBX: dead000000000122 RCX: 0000000000000000 RDX: ff11000143691580 RSI: ff110001026e5000 RDI: ff11000106f3d2a0 RBP: dead000000000100 R08: 00000000000003fd R09: 0000000000000002 R10: ff11000101c75690 R11: ff1100085faea178 R12: ff11000115f0ae78 R13: 0000000000000000 R14: ff11000115f0a800 R15: ff11000106f3d2a0 FS: 00007f35236bf740(0000) GS:ff110008dc809000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000157a01001 CR4: 0000000000373eb0 Call Trace: <TASK> mlx5e_tc_del_flow+0x46/0x270 [mlx5_core] mlx5e_flow_put+0x25/0x50 [mlx5_core] mlx5e_delete_flower+0x2a6/0x3e0 [mlx5_core] tc_setup_cb_reoffload+0x20/0x80 fl_reoffload+0x26f/0x2f0 [cls_flower] ? mlx5e_tc_reoffload_flows_work+0xc0/0xc0 [mlx5_core] ? mlx5e_tc_reoffload_flows_work+0xc0/0xc0 [mlx5_core] tcf_block_playback_offloads+0x9e/0x1c0 tcf_block_unbind+0x7b/0xd0 tcf_block_setup+0x186/0x1d0 tcf_block_offload_cmd.isra.0+0xef/0x130 tcf_block_offload_unbind+0x43/0x70 __tcf_block_put+0x85/0x160 ingress_destroy+0x32/0x110 [sch_ingress] __qdisc_destroy+0x44/0x100 qdisc_graft+0x22b/0x610 tc_get_qdisc+0x183/0x4d0 rtnetlink_rcv_msg+0x2d7/0x3d0 ? rtnl_calcit.isra.0+0x100/0x100 netlink_rcv_skb+0x53/0x100 netlink_unicast+0x249/0x320 ? __alloc_skb+0x102/0x1f0 netlink_sendmsg+0x1e3/0x420 __sock_sendmsg+0x38/0x60 ____sys_sendmsg+0x1ef/0x230 ? copy_msghdr_from_user+0x6c/0xa0 ___sys_sendmsg+0x7f/0xc0 ? ___sys_recvmsg+0x8a/0xc0 ? __sys_sendto+0x119/0x180 __sys_sendmsg+0x61/0xb0 do_syscall_64+0x55/0x640 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f35238bb764 Code: 15 b9 86 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00 f3 0f 1e fa 80 3d e5 08 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55 RSP: 002b:00007ffed4c35638 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000055a2efcc75e0 RCX: 00007f35238bb764 RDX: 0000000000000000 RSI: 00007ffed4c356a0 RDI: 0000000000000003 RBP: 00007ffed4c35710 R08: 0000000000000010 R09: 00007f3523984b20 R10: 0000000000000004 R11: 0000000000000202 R12: 00007ffed4c35790 R13: 000000006947df8f R14: 000055a2efcc75e0 R15: 00007ffed4c35780 Fixes: 9be6c21 ("net/mlx5e: Handle offloads flows per peer") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1769411695-18820-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

During device unmapping (triggered by module unload or explicit unmap), a refcount underflow occurs causing a use-after-free warning: [14747.574913] ------------[ cut here ]------------ [14747.574916] refcount_t: underflow; use-after-free. [14747.574917] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x55/0x90, CPU#9: kworker/9:1/378 [14747.574924] Modules linked in: rnbd_client(-) rtrs_client rnbd_server rtrs_server rtrs_core ... [14747.574998] CPU: 9 UID: 0 PID: 378 Comm: kworker/9:1 Tainted: G O N 6.19.0-rc3lblk-fnext+ #42 PREEMPT(voluntary) [14747.575005] Workqueue: rnbd_clt_wq unmap_device_work [rnbd_client] [14747.575010] RIP: 0010:refcount_warn_saturate+0x55/0x90 [14747.575037] Call Trace: [14747.575038] <TASK> [14747.575038] rnbd_clt_unmap_device+0x170/0x1d0 [rnbd_client] [14747.575044] process_one_work+0x211/0x600 [14747.575052] worker_thread+0x184/0x330 [14747.575055] ? __pfx_worker_thread+0x10/0x10 [14747.575058] kthread+0x10d/0x250 [14747.575062] ? __pfx_kthread+0x10/0x10 [14747.575066] ret_from_fork+0x319/0x390 [14747.575069] ? __pfx_kthread+0x10/0x10 [14747.575072] ret_from_fork_asm+0x1a/0x30 [14747.575083] </TASK> [14747.575096] ---[ end trace 0000000000000000 ]--- Befor this patch :- The bug is a double kobject_put() on dev->kobj during device cleanup. Kobject Lifecycle: kobject_init_and_add() sets kobj.kref = 1 (initialization) kobject_put() sets kobj.kref = 0 (should be called once) * Before this patch: rnbd_clt_unmap_device() rnbd_destroy_sysfs() kobject_del(&dev->kobj) [remove from sysfs] kobject_put(&dev->kobj) PUT #1 (WRONG!) kref: 1 to 0 rnbd_dev_release() kfree(dev) [DEVICE FREED!] rnbd_destroy_gen_disk() [use-after-free!] rnbd_clt_put_dev() refcount_dec_and_test(&dev->refcount) kobject_put(&dev->kobj) PUT #2 (UNDERFLOW!) kref: 0 to -1 [WARNING!] The first kobject_put() in rnbd_destroy_sysfs() prematurely frees the device via rnbd_dev_release(), then the second kobject_put() in rnbd_clt_put_dev() causes refcount underflow. * After this patch :- Remove kobject_put() from rnbd_destroy_sysfs(). This function should only remove sysfs visibility (kobject_del), not manage object lifetime. Call Graph (FIXED): rnbd_clt_unmap_device() rnbd_destroy_sysfs() kobject_del(&dev->kobj) [remove from sysfs only] [kref unchanged: 1] rnbd_destroy_gen_disk() [device still valid] rnbd_clt_put_dev() refcount_dec_and_test(&dev->refcount) kobject_put(&dev->kobj) ONLY PUT (CORRECT!) kref: 1 to 0 [BALANCED] rnbd_dev_release() kfree(dev) [CLEAN DESTRUCTION] This follows the kernel pattern where sysfs removal (kobject_del) is separate from object destruction (kobject_put). Fixes: 581cf83 ("block: rnbd: add .release to rnbd_dev_ktype") Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Since the commit 25c6a5a ("net: phy: micrel: Dynamically control external clock of KSZ PHY"), the clock of Micrel PHY has been enabled by phy_driver::resume() and disabled by phy_driver::suspend(). However, devm_clk_get_optional_enabled() is used in kszphy_probe(), so the clock will automatically be disabled when the device is unbound from the bus. Therefore, this could cause the clock to be disabled twice, resulting in clk driver warnings. For example, this issue can be reproduced on i.MX6ULL platform, and we can see the following logs when removing the FEC MAC drivers. $ echo 2188000.ethernet > /sys/bus/platform/drivers/fec/unbind $ echo 20b4000.ethernet > /sys/bus/platform/drivers/fec/unbind [ 109.758207] ------------[ cut here ]------------ [ 109.758240] WARNING: drivers/clk/clk.c:1188 at clk_core_disable+0xb4/0xd0, CPU#0: sh/639 [ 109.771011] enet2_ref already disabled [ 109.793359] Call trace: [ 109.822006] clk_core_disable from clk_disable+0x28/0x34 [ 109.827340] clk_disable from clk_disable_unprepare+0xc/0x18 [ 109.833029] clk_disable_unprepare from devm_clk_release+0x1c/0x28 [ 109.839241] devm_clk_release from devres_release_all+0x98/0x100 [ 109.845278] devres_release_all from device_unbind_cleanup+0xc/0x70 [ 109.851571] device_unbind_cleanup from device_release_driver_internal+0x1a4/0x1f4 [ 109.859170] device_release_driver_internal from bus_remove_device+0xbc/0xe4 [ 109.866243] bus_remove_device from device_del+0x140/0x458 [ 109.871757] device_del from phy_mdio_device_remove+0xc/0x24 [ 109.877452] phy_mdio_device_remove from mdiobus_unregister+0x40/0xac [ 109.883918] mdiobus_unregister from fec_enet_mii_remove+0x40/0x78 [ 109.890125] fec_enet_mii_remove from fec_drv_remove+0x4c/0x158 [ 109.896076] fec_drv_remove from device_release_driver_internal+0x17c/0x1f4 [ 109.962748] WARNING: drivers/clk/clk.c:1047 at clk_core_unprepare+0xfc/0x13c, CPU#0: sh/639 [ 109.975805] enet2_ref already unprepared [ 110.002866] Call trace: [ 110.031758] clk_core_unprepare from clk_unprepare+0x24/0x2c [ 110.037440] clk_unprepare from devm_clk_release+0x1c/0x28 [ 110.042957] devm_clk_release from devres_release_all+0x98/0x100 [ 110.048989] devres_release_all from device_unbind_cleanup+0xc/0x70 [ 110.055280] device_unbind_cleanup from device_release_driver_internal+0x1a4/0x1f4 [ 110.062877] device_release_driver_internal from bus_remove_device+0xbc/0xe4 [ 110.069950] bus_remove_device from device_del+0x140/0x458 [ 110.075469] device_del from phy_mdio_device_remove+0xc/0x24 [ 110.081165] phy_mdio_device_remove from mdiobus_unregister+0x40/0xac [ 110.087632] mdiobus_unregister from fec_enet_mii_remove+0x40/0x78 [ 110.093836] fec_enet_mii_remove from fec_drv_remove+0x4c/0x158 [ 110.099782] fec_drv_remove from device_release_driver_internal+0x17c/0x1f4 After analyzing the process of removing the FEC driver, as shown below, it can be seen that the clock was disabled twice by the PHY driver. fec_drv_remove() --> fec_enet_close() --> phy_stop() --> phy_suspend() --> kszphy_suspend() #1 The clock is disabled --> fec_enet_mii_remove() --> mdiobus_unregister() --> phy_mdio_device_remove() --> device_del() --> devm_clk_release() #2 The clock is disabled again Therefore, devm_clk_get_optional() is used to fix the above issue. And to avoid the issue mentioned by the commit 9853294 ("net: phy: micrel: use devm_clk_get_optional_enabled for the rmii-ref clock"), the clock is enabled by clk_prepare_enable() to get the correct clock rate. Fixes: 25c6a5a ("net: phy: micrel: Dynamically control external clock of KSZ PHY") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260126081544.983517-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

HCA CAP structure is allocated in mlx5_hca_caps_alloc(). mlx5_mdev_init() mlx5_hca_caps_alloc() And HCA CAP is read from the device in mlx5_init_one(). The vhca_id's debugfs file is published even before above two operations are done. Due to this when user reads the vhca id before the initialization, following call trace is observed. Fix this by deferring debugfs publication until the HCA CAP is allocated and read from the device. BUG: kernel NULL pointer dereference, address: 0000000000000004 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 23 UID: 0 PID: 6605 Comm: cat Kdump: loaded Not tainted 6.18.0-rc7-sf+ #110 PREEMPT(full) Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016 RIP: 0010:vhca_id_show+0x17/0x30 [mlx5_core] Code: cb 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8b 47 70 48 c7 c6 45 f0 12 c1 48 8b 80 70 03 00 00 <8b> 50 04 0f ca 0f b7 d2 e8 8c 82 47 cb 31 c0 c3 cc cc cc cc 0f 1f RSP: 0018:ffffd37f4f337d40 EFLAGS: 00010203 RAX: 0000000000000000 RBX: ffff8f18445c9b40 RCX: 0000000000000001 RDX: ffff8f1109825180 RSI: ffffffffc112f045 RDI: ffff8f18445c9b40 RBP: 0000000000000000 R08: 0000645eac0d2928 R09: 0000000000000006 R10: ffffd37f4f337d48 R11: 0000000000000000 R12: ffffd37f4f337dd8 R13: ffffd37f4f337db0 R14: ffff8f18445c9b68 R15: 0000000000000001 FS: 00007f3eea099580(0000) GS:ffff8f2090f1f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000004 CR3: 00000008b64e4006 CR4: 00000000003726f0 Call Trace: <TASK> seq_read_iter+0x11f/0x4f0 ? _raw_spin_unlock+0x15/0x30 ? do_anonymous_page+0x104/0x810 seq_read+0xf6/0x120 ? srso_alias_untrain_ret+0x1/0x10 full_proxy_read+0x5c/0x90 vfs_read+0xad/0x320 ? handle_mm_fault+0x1ab/0x290 ksys_read+0x52/0xd0 do_syscall_64+0x61/0x11e0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: dd3dd72 ("net/mlx5: Expose vhca_id to debugfs") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1769503961-124173-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Fix race condition where PTP periodic work runs while VSI is being rebuilt, accessing NULL vsi->rx_rings. The sequence was: 1. ice_ptp_prepare_for_reset() cancels PTP work 2. ice_ptp_rebuild() immediately queues PTP work 3. VSI rebuild happens AFTER ice_ptp_rebuild() 4. PTP work runs and accesses NULL vsi->rx_rings Fix: Keep PTP work cancelled during rebuild, only queue it after VSI rebuild completes in ice_rebuild(). Added ice_ptp_queue_work() helper function to encapsulate the logic for queuing PTP work, ensuring it's only queued when PTP is supported and the state is ICE_PTP_READY. Error log: [ 121.392544] ice 0000:60:00.1: PTP reset successful [ 121.392692] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 121.392712] #PF: supervisor read access in kernel mode [ 121.392720] #PF: error_code(0x0000) - not-present page [ 121.392727] PGD 0 [ 121.392734] Oops: Oops: 0000 [#1] SMP NOPTI [ 121.392746] CPU: 8 UID: 0 PID: 1005 Comm: ice-ptp-0000:60 Tainted: G S 6.19.0-rc6+ #4 PREEMPT(voluntary) [ 121.392761] Tainted: [S]=CPU_OUT_OF_SPEC [ 121.392773] RIP: 0010:ice_ptp_update_cached_phctime+0xbf/0x150 [ice] [ 121.393042] Call Trace: [ 121.393047] <TASK> [ 121.393055] ice_ptp_periodic_work+0x69/0x180 [ice] [ 121.393202] kthread_worker_fn+0xa2/0x260 [ 121.393216] ? __pfx_ice_ptp_periodic_work+0x10/0x10 [ice] [ 121.393359] ? __pfx_kthread_worker_fn+0x10/0x10 [ 121.393371] kthread+0x10d/0x230 [ 121.393382] ? __pfx_kthread+0x10/0x10 [ 121.393393] ret_from_fork+0x273/0x2b0 [ 121.393407] ? __pfx_kthread+0x10/0x10 [ 121.393417] ret_from_fork_asm+0x1a/0x30 [ 121.393432] </TASK> Fixes: 803bef8 ("ice: factor out ice_ptp_rebuild_owner()") Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

This reverts commit d930ffa. According to Thomas, commit d930ffa ("drm/gma500: use drm_crtc_vblank_crtc()") breaks the driver with a NULL-ptr oops on startup. This is because the IRQ initialization in gma_irq_install() now uses CRTCs that are only allocated later in psb_modeset_init(). Stack trace is below. Revert. Go back to the drawing board. [ 65.831766] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 [#1] SMP KASAN NOPTI [ 65.832114] KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f] [ 65.832232] CPU: 1 UID: 0 PID: 296 Comm: (udev-worker) Tainted: G E 6.19.0-rc6-1-default+ #4622 PREEMPT(voluntary) [ 65.832376] Tainted: [E]=UNSIGNED_MODULE [ 65.832448] Hardware name: /DN2800MT, BIOS MTCDT10N.86A.0164.2012.1213.1024 12/13/2012 [ 65.832543] RIP: 0010:drm_crtc_vblank_crtc+0x24/0xd0 [ 65.832652] Code: 90 90 90 90 90 90 0f 1f 44 00 00 48 89 f8 48 81 c7 18 01 00 00 48 83 ec 10 48 ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 <0f> b6 14 11 84 d2 74 05 80 fa 03 7e 58 48 89 c6 8b 90 18 01 00 00 [ 65.832820] RSP: 0018:ffff88800c8f7688 EFLAGS: 00010006 [ 65.832919] RAX: fffffffffffffff0 RBX: ffff88800fff4928 RCX: 0000000000000021 [ 65.833011] RDX: dffffc0000000000 RSI: ffffc90000978130 RDI: 0000000000000108 [ 65.833107] RBP: ffffed1001ffea03 R08: 0000000000000000 R09: ffffed100191eec7 [ 65.833199] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880014480c8 [ 65.833289] R13: dffffc0000000000 R14: fffffffffffffff0 R15: ffff88800fff4000 [ 65.833380] FS: 00007fe53d4d5d80(0000) GS:ffff888148dd8000(0000) knlGS:0000000000000000 [ 65.833488] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 65.833575] CR2: 00007fac707420b8 CR3: 000000000ebd1000 CR4: 00000000000006f0 [ 65.833668] Call Trace: [ 65.833735] <TASK> [ 65.833808] gma_irq_preinstall+0x190/0x3e0 [gma500_gfx] [ 65.834054] gma_irq_install+0xb2/0x240 [gma500_gfx] [ 65.834282] psb_driver_load+0x7b2/0x1090 [gma500_gfx] [ 65.834516] ? __pfx_psb_driver_load+0x10/0x10 [gma500_gfx] [ 65.834726] ? ksize+0x1d/0x40 [ 65.834817] ? drmm_add_final_kfree+0x3b/0xb0 [ 65.834935] ? __pfx_psb_pci_probe+0x10/0x10 [gma500_gfx] [ 65.835164] psb_pci_probe+0xc8/0x150 [gma500_gfx] [ 65.835384] local_pci_probe+0xd5/0x190 [ 65.835492] pci_call_probe+0x167/0x4b0 [ 65.835594] ? __pfx_pci_call_probe+0x10/0x10 [ 65.835693] ? local_clock+0x11/0x30 [ 65.835808] ? __pfx___driver_attach+0x10/0x10 [ 65.835915] ? do_raw_spin_unlock+0x55/0x230 [ 65.836014] ? pci_match_device+0x303/0x790 [ 65.836124] ? pci_match_device+0x386/0x790 [ 65.836226] ? __pfx_pci_assign_irq+0x10/0x10 [ 65.836320] ? kernfs_create_link+0x16a/0x230 [ 65.836418] ? do_raw_spin_unlock+0x55/0x230 [ 65.836526] ? __pfx___driver_attach+0x10/0x10 [ 65.836626] pci_device_probe+0x175/0x2c0 [ 65.836735] call_driver_probe+0x64/0x1e0 [ 65.836842] really_probe+0x194/0x740 [ 65.836951] ? __pfx___driver_attach+0x10/0x10 [ 65.837053] __driver_probe_device+0x18c/0x3a0 [ 65.837163] ? __pfx___driver_attach+0x10/0x10 [ 65.837262] driver_probe_device+0x4a/0x120 [ 65.837369] __driver_attach+0x19c/0x550 [ 65.837474] ? __pfx___driver_attach+0x10/0x10 [ 65.837575] bus_for_each_dev+0xe6/0x150 [ 65.837669] ? local_clock+0x11/0x30 [ 65.837770] ? __pfx_bus_for_each_dev+0x10/0x10 [ 65.837891] bus_add_driver+0x2af/0x4f0 [ 65.838000] ? __pfx_psb_init+0x10/0x10 [gma500_gfx] [ 65.838236] driver_register+0x19f/0x3a0 [ 65.838342] ? rcu_is_watching+0x11/0xb0 [ 65.838446] do_one_initcall+0xb5/0x3a0 [ 65.838546] ? __pfx_do_one_initcall+0x10/0x10 [ 65.838644] ? __kasan_slab_alloc+0x2c/0x70 [ 65.838741] ? rcu_is_watching+0x11/0xb0 [ 65.838837] ? __kmalloc_cache_noprof+0x3e8/0x6e0 [ 65.838937] ? klp_module_coming+0x1a0/0x2e0 [ 65.839033] ? do_init_module+0x85/0x7f0 [ 65.839126] ? kasan_unpoison+0x40/0x70 [ 65.839230] do_init_module+0x26e/0x7f0 [ 65.839341] ? __pfx_do_init_module+0x10/0x10 [ 65.839450] init_module_from_file+0x13f/0x160 [ 65.839549] ? __pfx_init_module_from_file+0x10/0x10 [ 65.839651] ? __lock_acquire+0x578/0xae0 [ 65.839791] ? do_raw_spin_unlock+0x55/0x230 [ 65.839886] ? idempotent_init_module+0x585/0x720 [ 65.839993] idempotent_init_module+0x1ff/0x720 [ 65.840097] ? __pfx_cred_has_capability.isra.0+0x10/0x10 [ 65.840211] ? __pfx_idempotent_init_module+0x10/0x10 Reported-by: Thomas Zimmermann <tzimmermann@suse.de> Closes: https://lore.kernel.org/r/5aec1964-072c-4335-8f37-35e6efb4910e@suse.de Fixes: d930ffa ("drm/gma500: use drm_crtc_vblank_crtc()") Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patch.msgid.link/20260130151319.31264-1-jani.nikula@intel.com

An issue was triggered: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 15 UID: 0 PID: 658 Comm: bash Tainted: 6.19.0-rc6-next-2026012 Tainted: [O]=OOT_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), RIP: 0010:strcmp+0x10/0x30 RSP: 0018:ffffc900017f7dc0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888107cd4358 RDX: 0000000019f73907 RSI: ffffffff82cc381a RDI: 0000000000000000 RBP: ffff8881016bef0d R08: 000000006c0e7145 R09: 0000000056c0e714 R10: 0000000000000001 R11: ffff888107cd4358 R12: 0007ffffffffffff R13: ffff888101399200 R14: ffff888100fcb360 R15: 0007ffffffffffff CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000105c79000 CR4: 00000000000006f0 Call Trace: <TASK> dmemcg_limit_write.constprop.0+0x16d/0x390 ? __pfx_set_resource_max+0x10/0x10 kernfs_fop_write_iter+0x14e/0x200 vfs_write+0x367/0x510 ksys_write+0x66/0xe0 do_syscall_64+0x6b/0x390 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f42697e1887 It was trriggered setting max without limitation, the command is like: "echo test/region0 > dmem.max". To fix this issue, add check whether options is valid after parsing the region_name. Fixes: b168ed4 ("kernel/cgroup: Add "dmem" memory accounting cgroup") Cc: stable@vger.kernel.org # v6.14+ Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>

syzbot reported a kernel BUG in fib6_add_rt2node() when adding an IPv6 route. [0] Commit f72514b ("ipv6: clear RA flags when adding a static route") introduced logic to clear RTF_ADDRCONF from existing routes when a static route with the same nexthop is added. However, this causes a problem when the existing route has a gateway. When RTF_ADDRCONF is cleared from a route that has a gateway, that route becomes eligible for ECMP, i.e. rt6_qualify_for_ecmp() returns true. The issue is that this route was never added to the fib6_siblings list. This leads to a mismatch between the following counts: - The sibling count computed by iterating fib6_next chain, which includes the newly ECMP-eligible route - The actual siblings in fib6_siblings list, which does not include that route When a subsequent ECMP route is added, fib6_add_rt2node() hits BUG_ON(sibling->fib6_nsiblings != rt->fib6_nsiblings) because the counts don't match. Fix this by only clearing RTF_ADDRCONF when the existing route does not have a gateway. Routes without a gateway cannot qualify for ECMP anyway (rt6_qualify_for_ecmp() requires fib_nh_gw_family), so clearing RTF_ADDRCONF on them is safe and matches the original intent of the commit. [0]: kernel BUG at net/ipv6/ip6_fib.c:1217! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6010 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 RIP: 0010:fib6_add_rt2node+0x3433/0x3470 net/ipv6/ip6_fib.c:1217 [...] Call Trace: <TASK> fib6_add+0x8da/0x18a0 net/ipv6/ip6_fib.c:1532 __ip6_ins_rt net/ipv6/route.c:1351 [inline] ip6_route_add+0xde/0x1b0 net/ipv6/route.c:3946 ipv6_route_ioctl+0x35c/0x480 net/ipv6/route.c:4571 inet6_ioctl+0x219/0x280 net/ipv6/af_inet6.c:577 sock_do_ioctl+0xdc/0x300 net/socket.c:1245 sock_ioctl+0x576/0x790 net/socket.c:1366 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: f72514b ("ipv6: clear RA flags when adding a static route") Reported-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cb809def1baaac68ab92 Tested-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260204095837.1285552-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Fix PROCMAP_QUERY to fetch optional build ID only after dropping mmap_lock or per-VMA lock, whichever was used to lock VMA under question, to avoid deadlock reported by syzbot: -> #1 (&mm->mmap_lock){++++}-{4:4}: __might_fault+0xed/0x170 _copy_to_iter+0x118/0x1720 copy_page_to_iter+0x12d/0x1e0 filemap_read+0x720/0x10a0 blkdev_read_iter+0x2b5/0x4e0 vfs_read+0x7f4/0xae0 ksys_read+0x12a/0x250 do_syscall_64+0xcb/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: __lock_acquire+0x1509/0x26d0 lock_acquire+0x185/0x340 down_read+0x98/0x490 blkdev_read_iter+0x2a7/0x4e0 __kernel_read+0x39a/0xa90 freader_fetch+0x1d5/0xa80 __build_id_parse.isra.0+0xea/0x6a0 do_procmap_query+0xd75/0x1050 procfs_procmap_ioctl+0x7a/0xb0 __x64_sys_ioctl+0x18e/0x210 do_syscall_64+0xcb/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(&mm->mmap_lock); lock(&sb->s_type->i_mutex_key#8); lock(&mm->mmap_lock); rlock(&sb->s_type->i_mutex_key#8); *** DEADLOCK *** This seems to be exacerbated (as we haven't seen these syzbot reports before that) by the recent: 777a856 ("lib/buildid: use __kernel_read() for sleepable context") To make this safe, we need to grab file refcount while VMA is still locked, but other than that everything is pretty straightforward. Internal build_id_parse() API assumes VMA is passed, but it only needs the underlying file reference, so just add another variant build_id_parse_file() that expects file passed directly. [akpm@linux-foundation.org: fix up kerneldoc] Link: https://lkml.kernel.org/r/20260129215340.3742283-1-andrii@kernel.org Fixes: ed5d583 ("fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reported-by: <syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

tcf_ife_encode() must make sure ife_encode() does not return NULL. syzbot reported: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:ife_tlv_meta_encode+0x41/0xa0 net/ife/ife.c:166 CPU: 3 UID: 0 PID: 8990 Comm: syz.0.696 Not tainted syzkaller #0 PREEMPT(full) Call Trace: <TASK> ife_encode_meta_u32+0x153/0x180 net/sched/act_ife.c:101 tcf_ife_encode net/sched/act_ife.c:841 [inline] tcf_ife_act+0x1022/0x1de0 net/sched/act_ife.c:877 tc_act include/net/tc_wrapper.h:130 [inline] tcf_action_exec+0x1c0/0xa20 net/sched/act_api.c:1152 tcf_exts_exec include/net/pkt_cls.h:349 [inline] mall_classify+0x1a0/0x2a0 net/sched/cls_matchall.c:42 tc_classify include/net/tc_wrapper.h:197 [inline] __tcf_classify net/sched/cls_api.c:1764 [inline] tcf_classify+0x7f2/0x1380 net/sched/cls_api.c:1860 multiq_classify net/sched/sch_multiq.c:39 [inline] multiq_enqueue+0xe0/0x510 net/sched/sch_multiq.c:66 dev_qdisc_enqueue+0x45/0x250 net/core/dev.c:4147 __dev_xmit_skb net/core/dev.c:4262 [inline] __dev_queue_xmit+0x2998/0x46c0 net/core/dev.c:4798 Fixes: 295a6e0 ("net/sched: act_ife: Change to use ife module") Reported-by: syzbot+5cf914f193dffde3bd3c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6970d61d.050a0220.706b.0010.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yotam Gigi <yotam.gi@gmail.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260121133724.3400020-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

In mesh_rx_csa_frame(), elems->mesh_chansw_params_ie is dereferenced at lines 1638 and 1642 without a prior NULL check: ifmsh->chsw_ttl = elems->mesh_chansw_params_ie->mesh_ttl; ... pre_value = le16_to_cpu(elems->mesh_chansw_params_ie->mesh_pre_value); The mesh_matches_local() check above only validates the Mesh ID, Mesh Configuration, and Supported Rates IEs. It does not verify the presence of the Mesh Channel Switch Parameters IE (element ID 118). When a received CSA action frame omits that IE, ieee802_11_parse_elems() leaves elems->mesh_chansw_params_ie as NULL, and the unconditional dereference causes a kernel NULL pointer dereference. A remote mesh peer with an established peer link (PLINK_ESTAB) can trigger this by sending a crafted SPECTRUM_MGMT/CHL_SWITCH action frame that includes a matching Mesh ID and Mesh Configuration IE but omits the Mesh Channel Switch Parameters IE. No authentication beyond the default open mesh peering is required. Crash confirmed on kernel 6.17.0-5-generic via mac80211_hwsim: BUG: kernel NULL pointer dereference, address: 0000000000000000 Oops: Oops: 0000 [#1] SMP NOPTI RIP: 0010:ieee80211_mesh_rx_queued_mgmt+0x143/0x2a0 [mac80211] CR2: 0000000000000000 Fix by adding a NULL check for mesh_chansw_params_ie after mesh_matches_local() returns, consistent with how other optional IEs are guarded throughout the mesh code. The bug has been present since v3.13 (released 2014-01-19). Fixes: 8f2535b ("mac80211: process the CSA frame for mesh accordingly") Cc: stable@vger.kernel.org Signed-off-by: Vahagn Vardanian <vahagn@redrays.io> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Fix a circular locking dependency between dbg_mutex and the domain rx/tx mutexes that could lead to a deadlock. The dump path in dr_dump_domain_all() was acquiring locks in the order: dbg_mutex -> rx.mutex -> tx.mutex While the table/matcher creation paths acquire locks in the order: rx.mutex -> tx.mutex -> dbg_mutex This inverted lock ordering creates a circular dependency. Fix this by changing dr_dump_domain_all() to acquire the domain lock before dbg_mutex, matching the order used in mlx5dr_table_create() and mlx5dr_matcher_create(). Lockdep splat: ====================================================== WARNING: possible circular locking dependency detected 6.19.0-rc6net_next_e817c4e #1 Not tainted ------------------------------------------------------ sos/30721 is trying to acquire lock: ffff888102df5900 (&dmn->info.rx.mutex){+.+.}-{4:4}, at: dr_dump_start+0x131/0x450 [mlx5_core] but task is already holding lock: ffff888102df5bc0 (&dmn->dump_info.dbg_mutex){+.+.}-{4:4}, at: dr_dump_start+0x10b/0x450 [mlx5_core] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&dmn->dump_info.dbg_mutex){+.+.}-{4:4}: __mutex_lock+0x91/0x1060 mlx5dr_matcher_create+0x377/0x5e0 [mlx5_core] mlx5_cmd_dr_create_flow_group+0x62/0xd0 [mlx5_core] mlx5_create_flow_group+0x113/0x1c0 [mlx5_core] mlx5_chains_create_prio+0x453/0x2290 [mlx5_core] mlx5_chains_get_table+0x2e2/0x980 [mlx5_core] esw_chains_create+0x1e6/0x3b0 [mlx5_core] esw_create_offloads_fdb_tables.cold+0x62/0x63f [mlx5_core] esw_offloads_enable+0x76f/0xd20 [mlx5_core] mlx5_eswitch_enable_locked+0x35a/0x500 [mlx5_core] mlx5_devlink_eswitch_mode_set+0x561/0x950 [mlx5_core] devlink_nl_eswitch_set_doit+0x67/0xe0 genl_family_rcv_msg_doit+0xe0/0x130 genl_rcv_msg+0x188/0x290 netlink_rcv_skb+0x4b/0xf0 genl_rcv+0x24/0x40 netlink_unicast+0x1ed/0x2c0 netlink_sendmsg+0x210/0x450 __sock_sendmsg+0x38/0x60 __sys_sendto+0x119/0x180 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x70/0xd00 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #1 (&dmn->info.tx.mutex){+.+.}-{4:4}: __mutex_lock+0x91/0x1060 mlx5dr_table_create+0x11d/0x530 [mlx5_core] mlx5_cmd_dr_create_flow_table+0x62/0x140 [mlx5_core] __mlx5_create_flow_table+0x46f/0x960 [mlx5_core] mlx5_create_flow_table+0x16/0x20 [mlx5_core] esw_create_offloads_fdb_tables+0x136/0x240 [mlx5_core] esw_offloads_enable+0x76f/0xd20 [mlx5_core] mlx5_eswitch_enable_locked+0x35a/0x500 [mlx5_core] mlx5_devlink_eswitch_mode_set+0x561/0x950 [mlx5_core] devlink_nl_eswitch_set_doit+0x67/0xe0 genl_family_rcv_msg_doit+0xe0/0x130 genl_rcv_msg+0x188/0x290 netlink_rcv_skb+0x4b/0xf0 genl_rcv+0x24/0x40 netlink_unicast+0x1ed/0x2c0 netlink_sendmsg+0x210/0x450 __sock_sendmsg+0x38/0x60 __sys_sendto+0x119/0x180 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x70/0xd00 entry_SYSCALL_64_after_hwframe+0x4b/0x53 -> #0 (&dmn->info.rx.mutex){+.+.}-{4:4}: __lock_acquire+0x18b6/0x2eb0 lock_acquire+0xd3/0x2c0 __mutex_lock+0x91/0x1060 dr_dump_start+0x131/0x450 [mlx5_core] seq_read_iter+0xe3/0x410 seq_read+0xfb/0x130 full_proxy_read+0x53/0x80 vfs_read+0xba/0x330 ksys_read+0x65/0xe0 do_syscall_64+0x70/0xd00 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&dmn->dump_info.dbg_mutex); lock(&dmn->info.tx.mutex); lock(&dmn->dump_info.dbg_mutex); lock(&dmn->info.rx.mutex); *** DEADLOCK *** Fixes: 9222f0b ("net/mlx5: DR, Add support for dumping steering info") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260224114652.1787431-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

…ming KASAN reports a NULL instruction fetch (RIP=0x0) from dc_stream_program_cursor_position(): BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:0x0 Call Trace: dc_stream_program_cursor_position+0x344/0x920 [amdgpu] amdgpu_dm_atomic_commit_tail+... [ +1.041013] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ +0.000027] #PF: supervisor instruction fetch in kernel mode [ +0.000013] #PF: error_code(0x0010) - not-present page [ +0.000012] PGD 0 P4D 0 [ +0.000017] Oops: Oops: 0010 [#1] SMP KASAN NOPTI [ +0.000017] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G E 6.18.0+ #3 PREEMPT(voluntary) [ +0.000023] Tainted: [E]=UNSIGNED_MODULE [ +0.000010] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000016] Workqueue: events drm_mode_rmfb_work_fn [ +0.000022] RIP: 0010:0x0 [ +0.000017] Code: Unable to access opcode bytes at 0xffffffffffffffd6. [ +0.000015] RSP: 0018:ffffc9000017f4c8 EFLAGS: 00010246 [ +0.000016] RAX: 0000000000000000 RBX: ffff88810afdda80 RCX: 1ffff110457000d1 [ +0.000014] RDX: 1ffffffff87b75bd RSI: 0000000000000000 RDI: ffff88810afdda80 [ +0.000014] RBP: ffffc9000017f538 R08: 0000000000000000 R09: ffff88822b800690 [ +0.000013] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc3dbac20 [ +0.000014] R13: 0000000000000000 R14: ffff88811ab80000 R15: dffffc0000000000 [ +0.000014] FS: 0000000000000000(0000) GS:ffff888434599000(0000) knlGS:0000000000000000 [ +0.000015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000013] CR2: ffffffffffffffd6 CR3: 000000010ee88000 CR4: 0000000000350ef0 [ +0.000014] Call Trace: [ +0.000010] <TASK> [ +0.000010] dc_stream_program_cursor_position+0x344/0x920 [amdgpu] [ +0.001086] ? __pfx_mutex_lock+0x10/0x10 [ +0.000015] ? unwind_next_frame+0x18b/0xa70 [ +0.000019] amdgpu_dm_atomic_commit_tail+0x1124/0xfa20 [amdgpu] [ +0.001040] ? ret_from_fork_asm+0x1a/0x30 [ +0.000018] ? filter_irq_stacks+0x90/0xa0 [ +0.000022] ? __pfx_amdgpu_dm_atomic_commit_tail+0x10/0x10 [amdgpu] [ +0.001058] ? kasan_save_track+0x18/0x70 [ +0.000015] ? kasan_save_alloc_info+0x37/0x60 [ +0.000015] ? __kasan_kmalloc+0xc3/0xd0 [ +0.000013] ? __kmalloc_cache_noprof+0x1aa/0x600 [ +0.000016] ? drm_atomic_helper_setup_commit+0x788/0x1450 [ +0.000017] ? drm_atomic_helper_commit+0x7e/0x290 [ +0.000014] ? drm_atomic_commit+0x205/0x2e0 [ +0.000015] ? process_one_work+0x629/0xf80 [ +0.000016] ? worker_thread+0x87f/0x1570 [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __kasan_check_write+0x14/0x30 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? _raw_spin_lock_irq+0x8a/0xf0 [ +0.000015] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ +0.000016] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __wait_for_common+0x204/0x460 [ +0.000015] ? sched_clock_noinstr+0x9/0x10 [ +0.000014] ? __pfx_schedule_timeout+0x10/0x10 [ +0.000014] ? local_clock_noinstr+0xe/0xd0 [ +0.000015] ? __pfx___wait_for_common+0x10/0x10 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __wait_for_common+0x204/0x460 [ +0.000014] ? __pfx_schedule_timeout+0x10/0x10 [ +0.000015] ? __kasan_kmalloc+0xc3/0xd0 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? wait_for_completion_timeout+0x1d/0x30 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? drm_crtc_commit_wait+0x32/0x180 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? drm_atomic_helper_wait_for_dependencies+0x46a/0x800 [ +0.000019] commit_tail+0x231/0x510 [ +0.000017] drm_atomic_helper_commit+0x219/0x290 [ +0.000015] ? __pfx_drm_atomic_helper_commit+0x10/0x10 [ +0.000016] drm_atomic_commit+0x205/0x2e0 [ +0.000014] ? __pfx_drm_atomic_commit+0x10/0x10 [ +0.000013] ? __pfx_drm_connector_free+0x10/0x10 [ +0.000014] ? __pfx___drm_printfn_info+0x10/0x10 [ +0.000017] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? drm_atomic_set_crtc_for_connector+0x49e/0x660 [ +0.000015] ? drm_atomic_set_fb_for_plane+0x155/0x290 [ +0.000015] drm_framebuffer_remove+0xa9b/0x1240 [ +0.000014] ? finish_task_switch.isra.0+0x15a/0x840 [ +0.000015] ? __switch_to+0x385/0xda0 [ +0.000015] ? srso_safe_ret+0x1/0x20 [ +0.000013] ? __pfx_drm_framebuffer_remove+0x10/0x10 [ +0.000016] ? kasan_print_address_stack_frame+0x221/0x280 [ +0.000015] drm_mode_rmfb_work_fn+0x14b/0x240 [ +0.000015] process_one_work+0x629/0xf80 [ +0.000012] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000019] worker_thread+0x87f/0x1570 [ +0.000013] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.000014] ? __pfx_try_to_wake_up+0x10/0x10 [ +0.000017] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? kasan_print_address_stack_frame+0x227/0x280 [ +0.000017] ? __pfx_worker_thread+0x10/0x10 [ +0.000014] kthread+0x396/0x830 [ +0.000013] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ +0.000015] ? __pfx_kthread+0x10/0x10 [ +0.000012] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? recalc_sigpending+0x180/0x210 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __pfx_kthread+0x10/0x10 [ +0.000014] ret_from_fork+0x31c/0x3e0 [ +0.000014] ? __pfx_kthread+0x10/0x10 [ +0.000013] ret_from_fork_asm+0x1a/0x30 [ +0.000019] </TASK> [ +0.000010] Modules linked in: rfcomm(E) cmac(E) algif_hash(E) algif_skcipher(E) af_alg(E) snd_seq_dummy(E) snd_hrtimer(E) qrtr(E) xt_MASQUERADE(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) xt_mark(E) xt_tcpudp(E) nft_compat(E) nf_tables(E) x_tables(E) bnep(E) snd_hda_codec_alc882(E) snd_hda_codec_atihdmi(E) snd_hda_codec_realtek_lib(E) snd_hda_codec_hdmi(E) snd_hda_codec_generic(E) iwlmvm(E) snd_hda_intel(E) binfmt_misc(E) snd_hda_codec(E) snd_hda_core(E) mac80211(E) snd_intel_dspcfg(E) snd_intel_sdw_acpi(E) snd_hwdep(E) snd_pcm(E) libarc4(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_rawmidi(E) amd_atl(E) intel_rapl_msr(E) snd_seq(E) intel_rapl_common(E) iwlwifi(E) jc42(E) snd_seq_device(E) btusb(E) snd_timer(E) btmtk(E) btrtl(E) edac_mce_amd(E) eeepc_wmi(E) polyval_clmulni(E) btbcm(E) ghash_clmulni_intel(E) asus_wmi(E) ee1004(E) platform_profile(E) btintel(E) snd(E) nls_iso8859_1(E) aesni_intel(E) soundcore(E) i2c_piix4(E) cfg80211(E) sparse_keymap(E) wmi_bmof(E) bluetooth(E) k10temp(E) rapl(E) [ +0.000300] i2c_smbus(E) ccp(E) joydev(E) input_leds(E) gpio_amdpt(E) mac_hid(E) sch_fq_codel(E) msr(E) parport_pc(E) ppdev(E) lp(E) parport(E) efi_pstore(E) nfnetlink(E) dmi_sysfs(E) autofs4(E) cdc_ether(E) usbnet(E) amdgpu(E) amdxcp(E) hid_generic(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) drm_exec(E) drm_panel_backlight_quirks(E) gpu_sched(E) drm_suballoc_helper(E) video(E) drm_buddy(E) usbhid(E) drm_display_helper(E) r8152(E) hid(E) mii(E) cec(E) ahci(E) rc_core(E) igc(E) libahci(E) wmi(E) [ +0.000294] CR2: 0000000000000000 [ +0.000013] ---[ end trace 0000000000000000 ]--- The crash happens when we unconditionally call into the timing generator manual trigger hook: pipe_ctx->stream_res.tg->funcs->program_manual_trigger(...) On some configurations the timing generator (tg), its funcs table, or the program_manual_trigger callback can be NULL. Guard all of these before calling the hook. If the first pipe matching the stream cannot trigger, keep scanning to find another matching pipe with a valid hook. The issue was originally found on Vg20/DCE 12.1 Mario successfully tested on Polaris 11/DCE 11.2 Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Alexander Deucher <alexander.deucher@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Fixes: ba448f9 ("drm/amd/display: mouse event trigger to boost RR when idle") Suggested-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-and-tested-by: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

A userspace program can trigger the RIVA NV3 arbitration code by calling the FBIOPUT_VSCREENINFO ioctl on /dev/fb*. When doing so, the driver recomputes FIFO arbitration parameters in nv3_arb(), using state->mclk_khz (derived from the PRAMDAC MCLK PLL) as a divisor without validating it first. In a normal setup, state->mclk_khz is provided by the real hardware and is non-zero. However, an attacker can construct a malicious or misconfigured device (e.g. a crafted/emulated PCI device) that exposes a bogus PLL configuration, causing state->mclk_khz to become zero. Once nv3_get_param() calls nv3_arb(), the division by state->mclk_khz in the gns calculation causes a divide error and crashes the kernel. Fix this by checking whether state->mclk_khz is zero and bailing out before doing the division. The following log reveals it: rivafb: setting virtual Y resolution to 2184 divide error: 0000 [#1] PREEMPT SMP KASAN PTI CPU: 0 PID: 2187 Comm: syz-executor.0 Not tainted 5.18.0-rc1+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:nv3_arb drivers/video/fbdev/riva/riva_hw.c:439 [inline] RIP: 0010:nv3_get_param+0x3ab/0x13b0 drivers/video/fbdev/riva/riva_hw.c:546 Call Trace: nv3CalcArbitration.constprop.0+0x255/0x460 drivers/video/fbdev/riva/riva_hw.c:603 nv3UpdateArbitrationSettings drivers/video/fbdev/riva/riva_hw.c:637 [inline] CalcStateExt+0x447/0x1b90 drivers/video/fbdev/riva/riva_hw.c:1246 riva_load_video_mode+0x8a9/0xea0 drivers/video/fbdev/riva/fbdev.c:779 rivafb_set_par+0xc0/0x5f0 drivers/video/fbdev/riva/fbdev.c:1196 fb_set_var+0x604/0xeb0 drivers/video/fbdev/core/fbmem.c:1033 do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1109 fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1188 __x64_sys_ioctl+0x122/0x190 fs/ioctl.c:856 Fixes: 1da177e ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de>

There are two places where ksmbd_vfs_kern_path_end_removing() needs to be called in order to balance what the corresponding successful call to ksmbd_vfs_kern_path_start_removing() has done, i.e. drop inode locks and put the taken references. Otherwise there might be potential deadlocks and unbalanced locks which are caught like: BUG: workqueue leaked lock or atomic: kworker/5:21/0x00000000/7596 last function: handle_ksmbd_work 2 locks held by kworker/5:21/7596: #0: ffff8881051ae448 (sb_writers#3){.+.+}-{0:0}, at: ksmbd_vfs_kern_path_locked+0x142/0x660 #1: ffff888130e966c0 (&type->i_mutex_dir_key#3/1){+.+.}-{4:4}, at: ksmbd_vfs_kern_path_locked+0x17d/0x660 CPU: 5 PID: 7596 Comm: kworker/5:21 Not tainted 6.1.162-00456-gc29b353f383b #138 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 Workqueue: ksmbd-io handle_ksmbd_work Call Trace: <TASK> dump_stack_lvl+0x44/0x5b process_one_work.cold+0x57/0x5c worker_thread+0x82/0x600 kthread+0x153/0x190 ret_from_fork+0x22/0x30 </TASK> Found by Linux Verification Center (linuxtesting.org). Fixes: d5fc140 ("smb/server: avoid deadlock when linking with ReplaceIfExists") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>

Since a recent cpuset code change [1] the kernel emits warnings like this: WARNING: kernel/cgroup/cpuset.c:966 at rebuild_sched_domains_locked+0xe0/0x120, CPU#0: kworker/0:0/9 Modules linked in: CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.20.0-20260215.rc0.git3.bb7a3fc2c976.300.fc43.s390x+git #1 PREEMPTLAZY Hardware name: IBM 3931 A01 703 (KVM/Linux) Workqueue: events topology_work_fn Krnl PSW : 0704c00180000000 000002922e7af5c4 (rebuild_sched_domains_locked+0xe4/0x120) ... Call Trace: [<000002922e7af5c4>] rebuild_sched_domains_locked+0xe4/0x120 [<000002922e7af634>] rebuild_sched_domains+0x34/0x50 [<000002922e6ba232>] process_one_work+0x1b2/0x490 [<000002922e6bc4b8>] worker_thread+0x1f8/0x3b0 [<000002922e6c6a98>] kthread+0x148/0x170 [<000002922e645ffc>] __ret_from_fork+0x3c/0x240 [<000002922f51f492>] ret_from_fork+0xa/0x30 Reason for this is that the s390 specific smp initialization code schedules a work which rebuilds scheduling domains way before the scheduler is smp aware. With the mentioned commit the (invalid) rebuild request is not anymore silently discarded but instead leads to warning. Address this by avoiding the early rebuild request. Reported-by: Marc Hartmayer <marc@linux.ibm.com> Tested-by: Marc Hartmayer <marc@linux.ibm.com> Fixes: 6ee4304 ("cpuset: Remove unnecessary checks in rebuild_sched_domains_locked") [1] Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

With PREEMPT_RT as potential configuration option, spinlock_t is now considered as a sleeping lock, and thus might cause issues when used in an atomic context. But even with PREEMPT_RT as potential configuration option, raw_spinlock_t remains as a true spinning lock/atomic context. This creates potential issues with the s390 debug/tracing feature. The functions to trace errors are called in various contexts, including under lock of raw_spinlock_t, and thus the used spinlock_t in each debug area is in violation of the locking semantics. Here are two examples involving failing PCI Read accesses that are traced while holding `pci_lock` in `drivers/pci/access.c`: ============================= [ BUG: Invalid wait context ] 6.19.0-devel #18 Not tainted ----------------------------- bash/3833 is trying to lock: 0000027790baee30 (&rc->lock){-.-.}-{3:3}, at: debug_event_common+0xfc/0x300 other info that might help us debug this: context-{5:5} 5 locks held by bash/3833: #0: 0000027efbb29450 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x7c/0xf0 #1: 00000277f0504a90 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x13e/0x260 #2: 00000277beed8c18 (kn->active#339){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x164/0x260 #3: 00000277e9859190 (&dev->mutex){....}-{4:4}, at: pci_dev_lock+0x2e/0x40 #4: 00000383068a7708 (pci_lock){....}-{2:2}, at: pci_bus_read_config_dword+0x4a/0xb0 stack backtrace: CPU: 6 UID: 0 PID: 3833 Comm: bash Kdump: loaded Not tainted 6.19.0-devel #18 PREEMPTLAZY Hardware name: IBM 9175 ME1 701 (LPAR) Call Trace: [<00000383048afec2>] dump_stack_lvl+0xa2/0xe8 [<00000383049ba166>] __lock_acquire+0x816/0x1660 [<00000383049bb1fa>] lock_acquire+0x24a/0x370 [<00000383059e3860>] _raw_spin_lock_irqsave+0x70/0xc0 [<00000383048bbb6c>] debug_event_common+0xfc/0x300 [<0000038304900b0a>] __zpci_load+0x17a/0x1f0 [<00000383048fad88>] pci_read+0x88/0xd0 [<00000383054cbce0>] pci_bus_read_config_dword+0x70/0xb0 [<00000383054d55e4>] pci_dev_wait+0x174/0x290 [<00000383054d5a3e>] __pci_reset_function_locked+0xfe/0x170 [<00000383054d9b30>] pci_reset_function+0xd0/0x100 [<00000383054ee21a>] reset_store+0x5a/0x80 [<0000038304e98758>] kernfs_fop_write_iter+0x1e8/0x260 [<0000038304d995da>] new_sync_write+0x13a/0x180 [<0000038304d9c5d0>] vfs_write+0x200/0x330 [<0000038304d9c88c>] ksys_write+0x7c/0xf0 [<00000383059cfa80>] __do_syscall+0x210/0x500 [<00000383059e4c06>] system_call+0x6e/0x90 INFO: lockdep is turned off. ============================= [ BUG: Invalid wait context ] 6.19.0-devel #3 Not tainted ----------------------------- bash/6861 is trying to lock: 0000009da05c7430 (&rc->lock){-.-.}-{3:3}, at: debug_event_common+0xfc/0x300 other info that might help us debug this: context-{5:5} 5 locks held by bash/6861: #0: 000000acff404450 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x7c/0xf0 #1: 000000acff41c490 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x13e/0x260 #2: 0000009da36937d8 (kn->active#75){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x164/0x260 #3: 0000009dd15250d0 (&zdev->state_lock){+.+.}-{4:4}, at: enable_slot+0x2e/0xc0 #4: 000001a19682f708 (pci_lock){....}-{2:2}, at: pci_bus_read_config_byte+0x42/0xa0 stack backtrace: CPU: 16 UID: 0 PID: 6861 Comm: bash Kdump: loaded Not tainted 6.19.0-devel #3 PREEMPTLAZY Hardware name: IBM 9175 ME1 701 (LPAR) Call Trace: [<000001a194837ec2>] dump_stack_lvl+0xa2/0xe8 [<000001a194942166>] __lock_acquire+0x816/0x1660 [<000001a1949431fa>] lock_acquire+0x24a/0x370 [<000001a19596b810>] _raw_spin_lock_irqsave+0x70/0xc0 [<000001a194843b6c>] debug_event_common+0xfc/0x300 [<000001a194888b0a>] __zpci_load+0x17a/0x1f0 [<000001a194882d88>] pci_read+0x88/0xd0 [<000001a195453b88>] pci_bus_read_config_byte+0x68/0xa0 [<000001a195457bc2>] pci_setup_device+0x62/0xad0 [<000001a195458e70>] pci_scan_single_device+0x90/0xe0 [<000001a19488a0f6>] zpci_bus_scan_device+0x46/0x80 [<000001a19547f958>] enable_slot+0x98/0xc0 [<000001a19547f134>] power_write_file+0xc4/0x110 [<000001a194e20758>] kernfs_fop_write_iter+0x1e8/0x260 [<000001a194d215da>] new_sync_write+0x13a/0x180 [<000001a194d245d0>] vfs_write+0x200/0x330 [<000001a194d2488c>] ksys_write+0x7c/0xf0 [<000001a195957a30>] __do_syscall+0x210/0x500 [<000001a19596cbb6>] system_call+0x6e/0x90 INFO: lockdep is turned off. Since it is desired to keep it possible to create trace records in most situations, including this particular case (failing PCI config space accesses are relevant), convert the used spinlock_t in `struct debug_info` to raw_spinlock_t. The impact is small, as the debug area lock only protects bounded memory access without external dependencies, apart from one function debug_set_size() where kfree() is implicitly called with the lock held. Move debug_info_free() out of this lock, to keep remove this external dependency. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

…ilure() [BUG] There is a bug report that when btrfs hits ENOSPC error in a critical path, btrfs flips RO (this part is expected, although the ENOSPC bug still needs to be addressed). The problem is after the RO flip, if there is a read repair pending, we can hit the ASSERT() inside btrfs_repair_io_failure() like the following: BTRFS info (device vdc): relocating block group 30408704 flags metadata|raid1 ------------[ cut here ]------------ BTRFS: Transaction aborted (error -28) WARNING: fs/btrfs/extent-tree.c:3235 at __btrfs_free_extent.isra.0+0x453/0xfd0, CPU#1: btrfs/383844 Modules linked in: kvm_intel kvm irqbypass [...] ---[ end trace 0000000000000000 ]--- BTRFS info (device vdc state EA): 2 enospc errors during balance BTRFS info (device vdc state EA): balance: ended with status: -30 BTRFS error (device vdc state EA): parent transid verify failed on logical 30556160 mirror 2 wanted 8 found 6 BTRFS error (device vdc state EA): bdev /dev/nvme0n1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0 [...] assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938 ------------[ cut here ]------------ assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938 kernel BUG at fs/btrfs/bio.c:938! Oops: invalid opcode: 0000 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 868 Comm: kworker/u8:13 Tainted: G W N 6.19.0-rc6+ #4788 PREEMPT(full) Tainted: [W]=WARN, [N]=TEST Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014 Workqueue: btrfs-endio simple_end_io_work RIP: 0010:btrfs_repair_io_failure.cold+0xb2/0x120 RSP: 0000:ffffc90001d2bcf0 EFLAGS: 00010246 RAX: 0000000000000051 RBX: 0000000000001000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff8305cf42 RDI: 00000000ffffffff RBP: 0000000000000002 R08: 00000000fffeffff R09: ffffffff837fa988 R10: ffffffff8327a9e0 R11: 6f69747265737361 R12: ffff88813018d310 R13: ffff888168b8a000 R14: ffffc90001d2bd90 R15: ffff88810a169000 FS: 0000000000000000(0000) GS:ffff8885e752c000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 ------------[ cut here ]------------ [CAUSE] The cause of -ENOSPC error during the test case btrfs/124 is still unknown, although it's known that we still have cases where metadata can be over-committed but can not be fulfilled correctly, thus if we hit such ENOSPC error inside a critical path, we have no choice but abort the current transaction. This will mark the fs read-only. The problem is inside the btrfs_repair_io_failure() path that we require the fs not to be mount read-only. This is normally fine, but if we are doing a read-repair meanwhile the fs flips RO due to a critical error, we can enter btrfs_repair_io_failure() with super block set to read-only, thus triggering the above crash. [FIX] Just replace the ASSERT() with a proper return if the fs is already read-only. Reported-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-btrfs/20260126045555.GB31641@lst.de/ Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

The io_zcrx_put_niov_uref() function uses a non-atomic check-then-decrement pattern (atomic_read followed by separate atomic_dec) to manipulate user_refs. This is serialized against other callers by rq_lock, but io_zcrx_scrub() modifies the same counter with atomic_xchg() WITHOUT holding rq_lock. On SMP systems, the following race exists: CPU0 (refill, holds rq_lock) CPU1 (scrub, no rq_lock) put_niov_uref: atomic_read(uref) - 1 // window opens atomic_xchg(uref, 0) - 1 return_niov_freelist(niov) [PUSH #1] // window closes atomic_dec(uref) - wraps to -1 returns true return_niov(niov) return_niov_freelist(niov) [PUSH #2: DOUBLE-FREE] The same niov is pushed to the freelist twice, causing free_count to exceed nr_iovs. Subsequent freelist pushes then perform an out-of-bounds write (a u32 value) past the kvmalloc'd freelist array into the adjacent slab object. Fix this by replacing the non-atomic read-then-dec in io_zcrx_put_niov_uref() with an atomic_try_cmpxchg loop that atomically tests and decrements user_refs. This makes the operation safe against concurrent atomic_xchg from scrub without requiring scrub to acquire rq_lock. Fixes: 34a3e60 ("io_uring/zcrx: implement zerocopy receive pp memory provider") Cc: stable@vger.kernel.org Signed-off-by: Kai Aizen <kai@snailsploit.com> [pavel: removed a warning and a comment] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

XSK wakeup must use the async ICOSQ (with proper locking), as it is not guaranteed to run on the same CPU as the channel. The commit that converted the NAPI trigger path to use the sync ICOSQ incorrectly applied the same change to XSK, causing XSK wakeups to use the sync ICOSQ as well. Revert XSK flows to use the async ICOSQ. XDP program attach/detach triggers channel reopen, while XSK pool enable/disable can happen on-the-fly via NDOs without reopening channels. As a result, xsk_pool state cannot be reliably used at mlx5e_open_channel() time to decide whether an async ICOSQ is needed. Update the async_icosq_needed logic to depend on the presence of an XDP program rather than the xsk_pool, ensuring the async ICOSQ is available when XSK wakeups are enabled. This fixes multiple issues: 1. Illegal synchronize_rcu() in an RCU read- side critical section via mlx5e_xsk_wakeup() -> mlx5e_trigger_napi_icosq() -> synchronize_net(). The stack holds RCU read-lock in xsk_poll(). 2. Hitting a NULL pointer dereference in mlx5e_xsk_wakeup(): [] BUG: kernel NULL pointer dereference, address: 0000000000000240 [] #PF: supervisor read access in kernel mode [] #PF: error_code(0x0000) - not-present page [] PGD 0 P4D 0 [] Oops: Oops: 0000 [#1] SMP [] CPU: 0 UID: 0 PID: 2255 Comm: qemu-system-x86 Not tainted 6.19.0-rc5+ #229 PREEMPT(none) [] Hardware name: [...] [] RIP: 0010:mlx5e_xsk_wakeup+0x53/0x90 [mlx5_core] Reported-by: Daniel Borkmann <daniel@iogearbox.net> Closes: https://lore.kernel.org/all/20260123223916.361295-1-daniel@iogearbox.net/ Fixes: 56aca3e ("net/mlx5e: Use regular ICOSQ for triggering NAPI") Tested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Alice Mikityanska <alice.kernel@fastmail.im> Link: https://patch.msgid.link/20260217074525.1761454-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

The ALB RX path may access rx_hashtbl concurrently with bond teardown. During rapid bond up/down cycles, rlb_deinitialize() frees rx_hashtbl while RX handlers are still running, leading to a null pointer dereference detected by KASAN. However, the root cause is that rlb_arp_recv() can still be accessed after setting recv_probe to NULL, which is actually a use-after-free (UAF) issue. That is the reason for using the referenced commit in the Fixes tag. [ 214.174138] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001d: 0000 [#1] SMP KASAN PTI [ 214.186478] KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef] [ 214.194933] CPU: 30 UID: 0 PID: 2375 Comm: ping Kdump: loaded Not tainted 6.19.0-rc8+ #2 PREEMPT(voluntary) [ 214.205907] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.14.0 01/14/2022 [ 214.214357] RIP: 0010:rlb_arp_recv+0x505/0xab0 [bonding] [ 214.220320] Code: 0f 85 2b 05 00 00 48 b8 00 00 00 00 00 fc ff df 40 0f b6 ed 48 c1 e5 06 49 03 ad 78 01 00 00 48 8d 7d 28 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 12 05 00 00 80 7d 28 00 0f 84 8c 00 [ 214.241280] RSP: 0018:ffffc900073d8870 EFLAGS: 00010206 [ 214.247116] RAX: dffffc0000000000 RBX: ffff888168556822 RCX: ffff88816855681e [ 214.255082] RDX: 000000000000001d RSI: dffffc0000000000 RDI: 00000000000000e8 [ 214.263048] RBP: 00000000000000c0 R08: 0000000000000002 R09: ffffed11192021c8 [ 214.271013] R10: ffff8888c9010e43 R11: 0000000000000001 R12: 1ffff92000e7b119 [ 214.278978] R13: ffff8888c9010e00 R14: ffff888168556822 R15: ffff888168556810 [ 214.286943] FS: 00007f85d2d9cb80(0000) GS:ffff88886ccb3000(0000) knlGS:0000000000000000 [ 214.295966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 214.302380] CR2: 00007f0d047b5e34 CR3: 00000008a1c2e002 CR4: 00000000001726f0 [ 214.310347] Call Trace: [ 214.313070] <IRQ> [ 214.315318] ? __pfx_rlb_arp_recv+0x10/0x10 [bonding] [ 214.320975] bond_handle_frame+0x166/0xb60 [bonding] [ 214.326537] ? __pfx_bond_handle_frame+0x10/0x10 [bonding] [ 214.332680] __netif_receive_skb_core.constprop.0+0x576/0x2710 [ 214.339199] ? __pfx_arp_process+0x10/0x10 [ 214.343775] ? sched_balance_find_src_group+0x98/0x630 [ 214.349513] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10 [ 214.356513] ? arp_rcv+0x307/0x690 [ 214.360311] ? __pfx_arp_rcv+0x10/0x10 [ 214.364499] ? __lock_acquire+0x58c/0xbd0 [ 214.368975] __netif_receive_skb_one_core+0xae/0x1b0 [ 214.374518] ? __pfx___netif_receive_skb_one_core+0x10/0x10 [ 214.380743] ? lock_acquire+0x10b/0x140 [ 214.385026] process_backlog+0x3f1/0x13a0 [ 214.389502] ? process_backlog+0x3aa/0x13a0 [ 214.394174] __napi_poll.constprop.0+0x9f/0x370 [ 214.399233] net_rx_action+0x8c1/0xe60 [ 214.403423] ? __pfx_net_rx_action+0x10/0x10 [ 214.408193] ? lock_acquire.part.0+0xbd/0x260 [ 214.413058] ? sched_clock_cpu+0x6c/0x540 [ 214.417540] ? mark_held_locks+0x40/0x70 [ 214.421920] handle_softirqs+0x1fd/0x860 [ 214.426302] ? __pfx_handle_softirqs+0x10/0x10 [ 214.431264] ? __neigh_event_send+0x2d6/0xf50 [ 214.436131] do_softirq+0xb1/0xf0 [ 214.439830] </IRQ> The issue is reproducible by repeatedly running ip link set bond0 up/down while receiving ARP messages, where rlb_arp_recv() can race with rlb_deinitialize() and dereference a freed rx_hashtbl entry. Fix this by setting recv_probe to NULL and then calling synchronize_net() to wait for any concurrent RX processing to finish. This ensures that no RX handler can access rx_hashtbl after it is freed in bond_alb_deinitialize(). Reported-by: Liang Li <liali@redhat.com> Fixes: 3aba891 ("bonding: move processing of recv handlers into handle_frame()") Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Jay Vosburgh <jv@jvosburgh.net> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20260218060919.101574-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

@gfp

Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Commit d49004c ("arch, mm: consolidate initialization of nodes, zones and memory map") moved free_area_init() from setup_arch() to mm_core_init_early(), which runs after setup_arch() returns. This changed the ordering relative to init_cpu_to_node() on x86. Before the commit, free_area_init() ran during paging_init() (called from setup_arch()) *before* init_cpu_to_node(). After the commit, it runs *after* init_cpu_to_node(). On machines with memoryless NUMA nodes (e.g., node 0 has CPUs but no memory), this causes a NULL pointer dereference: 1. numa_register_nodes() skips memoryless nodes: no alloc_node_data() and no node_set_online() for them. 2. init_cpu_to_node() sets memoryless nodes online (they have CPUs) but does not allocate NODE_DATA. 3. free_area_init() checks "if (!node_online(nid))" to decide whether to call alloc_offline_node_data(). Since the memoryless node is now online, the allocation is skipped, leaving NODE_DATA(nid) == NULL. 4. The immediate "pgdat = NODE_DATA(nid)" dereferences NULL. The crash happens before console_init(), so no output is visible without earlyprintk. With earlyprintk enabled, the following panic is observed: BUG: unable to handle page fault for address: 000000000002a1e0 Oops: Oops: 0000 [#1] SMP NOPTI RIP: 0010:free_area_init_node+0x3a/0x540 Call Trace: <TASK> free_area_init+0x331/0x4e0 start_kernel+0x69/0x4a0 x86_64_start_reservations+0x24/0x30 x86_64_start_kernel+0x125/0x130 common_startup_64+0x13e/0x148 </TASK> Kernel panic - not syncing: Attempted to kill the idle task! Fix this by checking "if (!NODE_DATA(nid))" instead of "if (!node_online(nid))". This directly tests whether the per-node data structure needs to be allocated, regardless of the node's online status. This change is also safe for non-x86 architectures as they all allocate NODE_DATA for every node including memoryless ones, so the check simply evaluates to false with no change in behavior. Link: https://lkml.kernel.org/r/20260222115702.3659-1-ming.lei@redhat.com Fixes: d49004c ("arch, mm: consolidate initialization of nodes, zones and memory map") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

CXL testing environment can trigger following trace Oops: general protection fault, probably for non-canonical address 0xdffffc0000000092: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000490-0x0000000000000497] RIP: 0010:cxl_dpa_to_region+0x105/0x1f0 [cxl_core] Call Trace: <TASK> cxl_event_trace_record+0xd1/0xa70 [cxl_core] __cxl_event_trace_record+0x12f/0x1e0 [cxl_core] cxl_mem_get_records_log+0x261/0x500 [cxl_core] cxl_mem_get_event_records+0x7c/0xc0 [cxl_core] cxl_mock_mem_probe+0xd38/0x1c60 [cxl_mock_mem] platform_probe+0x9d/0x130 really_probe+0x1c8/0x960 __driver_probe_device+0x187/0x3e0 driver_probe_device+0x45/0x120 __device_attach_driver+0x15d/0x280 When CXL subsystem adds a cxl port to a hierarchy, there is a small window where the new port becomes visible before it is bound to a driver. This happens because device_add() adds a device to bus device list before bus_probe_device() binds it to a driver. So if two cxl memdevs are trying to add a dport to a same port via devm_cxl_enumerate_ports(), the second cxl memdev may observe the port and attempt to add a dport, but fails because the port has not yet been attached to cxl port driver. That causes the memdev->endpoint can not be updated. The sequence is like: CPU 0 CPU 1 devm_cxl_enumerate_ports() # port not found, add it add_port_attach_ep() # hold the parent port lock # to add the new port devm_cxl_create_port() device_add() # Add dev to bus devs list bus_add_device() devm_cxl_enumerate_ports() # found the port find_cxl_port_by_uport() # hold port lock to add a dport device_lock(the port) find_or_add_dport() cxl_port_add_dport() return -ENXIO because port->dev.driver is NULL device_unlock(the port) bus_probe_device() # hold the port lock # for attaching device_lock(the port) attaching the new port device_unlock(the port) To fix this race, require that dport addition holds the host lock of the target port(the host of CXL root and all cxl host bridge ports is the platform firmware device, the host of all other ports is their parent port). The CXL subsystem already requires holding the host lock while attaching a new port. Therefore, successfully acquiring the host lock guarantees that port attaching has completed. Fixes: 4f06d81 ("cxl: Defer dport allocation for switch ports") Signed-off-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20260210-fix-port-enumeration-failure-v3-2-06acce0b9ead@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>

…bled When FEAT_LPA2 is enabled, bits 8-9 of the PTE replace the shareability attribute with bits 50-51 of the output address. The _PAGE_GCS{,_RO} definitions include the PTE_SHARED bits as 0b11 (this matches the other _PAGE_* definitions) but using this macro directly leads to the following panic when enabling GCS on a system/model with LPA2: Unable to handle kernel paging request at virtual address fffff1ffc32d8008 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000060f4d000 [fffff1ffc32d8008] pgd=100000006184b003, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] SMP CPU: 0 UID: 0 PID: 513 Comm: gcs_write_fault Tainted: G M 7.0.0-rc1 #1 PREEMPT Tainted: [M]=MACHINE_CHECK Hardware name: QEMU QEMU Virtual Machine, BIOS 2025.02-8+deb13u1 11/08/2025 pstate: 03402005 (nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) pc : zap_huge_pmd+0x168/0x468 lr : zap_huge_pmd+0x2c/0x468 sp : ffff800080beb660 x29: ffff800080beb660 x28: fff00000c2058180 x27: ffff800080beb898 x26: fff00000c2058180 x25: ffff800080beb820 x24: 00c800010b600f41 x23: ffffc1ffc30af1a8 x22: fff00000c2058180 x21: 0000ffff8dc00000 x20: fff00000c2bc6370 x19: ffff800080beb898 x18: ffff800080bebb60 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000007 x14: 000000000000000a x13: 0000aaaacbbbffff x12: 0000000000000000 x11: 0000ffff8ddfffff x10: 00000000000001fe x9 : 0000ffff8ddfffff x8 : 0000ffff8de00000 x7 : 0000ffff8da00000 x6 : fff00000c2bc6370 x5 : 0000ffff8da00000 x4 : 000000010b600000 x3 : ffffc1ffc0000000 x2 : fff00000c2058180 x1 : fffff1ffc32d8000 x0 : 000000c00010b600 Call trace: zap_huge_pmd+0x168/0x468 (P) unmap_page_range+0xd70/0x1560 unmap_single_vma+0x48/0x80 unmap_vmas+0x90/0x180 unmap_region+0x88/0xe4 vms_complete_munmap_vmas+0xf8/0x1e0 do_vmi_align_munmap+0x158/0x180 do_vmi_munmap+0xac/0x160 __vm_munmap+0xb0/0x138 vm_munmap+0x14/0x20 gcs_free+0x70/0x80 mm_release+0x1c/0xc8 exit_mm_release+0x28/0x38 do_exit+0x190/0x8ec do_group_exit+0x34/0x90 get_signal+0x794/0x858 arch_do_signal_or_restart+0x11c/0x3e0 exit_to_user_mode_loop+0x10c/0x17c el0_da+0x8c/0x9c el0t_64_sync_handler+0xd0/0xf0 el0t_64_sync+0x198/0x19c Code: aa1603e2 d34cfc00 cb813001 8b011861 (f9400420) Similarly to how the kernel handles protection_map[], use a gcs_page_prot variable to store the protection bits and clear PTE_SHARED if LPA2 is enabled. Also remove the unused PAGE_GCS{,_RO} macros. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Fixes: 6497b66 ("arm64/mm: Map pages for guarded control stack") Reported-by: Emanuele Rocca <emanuele.rocca@arm.com> Cc: stable@vger.kernel.org Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>

test_progs run with ASAN reported [1]: ==126==ERROR: LeakSanitizer: detected memory leaks Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7f1ff3cfa340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77 #1 0x5610c15bb520 in bpf_program_attach_fd /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/lib/bpf/libbpf.c:13164 #2 0x5610c15bb740 in bpf_program__attach_xdp /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/lib/bpf/libbpf.c:13204 #3 0x5610c14f91d3 in test_xdp_flowtable /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c:138 #4 0x5610c1533566 in run_one_test /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/test_progs.c:1406 #5 0x5610c1537fb0 in main /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/test_progs.c:2097 #6 0x7f1ff25df1c9 (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb) #7 0x7f1ff25df28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb) #8 0x5610c0bd3180 in _start (/tmp/work/vmtest/vmtest/selftests/bpf/test_progs+0x593180) (BuildId: cdf9f103f42307dc0a2cd6cfc8afcbc1366cf8bd) Fix by properly destroying bpf_link on exit in xdp_flowtable test. [1] https://github.com/kernel-patches/vmtest/actions/runs/22361085418/job/64716490680 Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://lore.kernel.org/r/20260225003351.465104-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Lockdep found a bug in the event scheduling when a pinned event was failed and wakes up the threads in the ring buffer like below. It seems it should not grab a wait-queue lock under perf-context lock. Let's do it with irq_work. [ 39.913691] ============================= [ 39.914157] [ BUG: Invalid wait context ] [ 39.914623] 6.15.0-next-20250530-next-2025053 #1 Not tainted [ 39.915271] ----------------------------- [ 39.915731] repro/837 is trying to lock: [ 39.916191] ffff88801acfabd8 (&event->waitq){....}-{3:3}, at: __wake_up+0x26/0x60 [ 39.917182] other info that might help us debug this: [ 39.917761] context-{5:5} [ 39.918079] 4 locks held by repro/837: [ 39.918530] #0: ffffffff8725cd00 (rcu_read_lock){....}-{1:3}, at: __perf_event_task_sched_in+0xd1/0xbc0 [ 39.919612] #1: ffff88806ca3c6f8 (&cpuctx_lock){....}-{2:2}, at: __perf_event_task_sched_in+0x1a7/0xbc0 [ 39.920748] #2: ffff88800d91fc18 (&ctx->lock){....}-{2:2}, at: __perf_event_task_sched_in+0x1f9/0xbc0 [ 39.921819] #3: ffffffff8725cd00 (rcu_read_lock){....}-{1:3}, at: perf_event_wakeup+0x6c/0x470 Fixes: f4b07fd ("perf/core: Use POLLHUP for a pinned event in error") Closes: https://lore.kernel.org/lkml/aD2w50VDvGIH95Pf@ly-workstation Reported-by: "Lai, Yi" <yi1.lai@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: "Lai, Yi" <yi1.lai@linux.intel.com> Link: https://patch.msgid.link/20250603045105.1731451-1-namhyung@kernel.org

…kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.0, take #1 - Make sure we don't leak any S1POE state from guest to guest when the feature is supported on the HW, but not enabled on the host - Propagate the ID registers from the host into non-protected VMs managed by pKVM, ensuring that the guest sees the intended feature set - Drop double kern_hyp_va() from unpin_host_sve_state(), which could bite us if we were to change kern_hyp_va() to not being idempotent - Don't leak stage-2 mappings in protected mode - Correctly align the faulting address when dealing with single page stage-2 mappings for PAGE_SIZE > 4kB - Fix detection of virtualisation-capable GICv5 IRS, due to the maintainer being obviously fat fingered... - Remove duplication of code retrieving the ASID for the purpose of S1 PT handling - Fix slightly abusive const-ification in vgic_set_kvm_info()

The current cpuset partition code is able to dynamically update the sched domains of a running system and the corresponding HK_TYPE_DOMAIN housekeeping cpumask to perform what is essentially the "isolcpus=domain,..." boot command line feature at run time. The housekeeping cpumask update requires flushing a number of different workqueues which may not be safe with cpus_read_lock() held as the workqueue flushing code may acquire cpus_read_lock() or acquiring locks which have locking dependency with cpus_read_lock() down the chain. Below is an example of such circular locking problem. ====================================================== WARNING: possible circular locking dependency detected 6.18.0-test+ #2 Tainted: G S ------------------------------------------------------ test_cpuset_prs/10971 is trying to acquire lock: ffff888112ba4958 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180 but task is already holding lock: ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (cpuset_mutex){+.+.}-{4:4}: -> #3 (cpu_hotplug_lock){++++}-{0:0}: -> #2 (rtnl_mutex){+.+.}-{4:4}: -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}: -> #0 ((wq_completion)sync_wq){+.+.}-{0:0}: Chain exists of: (wq_completion)sync_wq --> cpu_hotplug_lock --> cpuset_mutex 5 locks held by test_cpuset_prs/10971: #0: ffff88816810e440 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0 #1: ffff8891ab620890 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x260/0x5f0 #2: ffff8890a78b83e8 (kn->active#187){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x2b6/0x5f0 #3: ffffffffadf32900 (cpu_hotplug_lock){++++}-{0:0}, at: cpuset_partition_write+0x77/0x130 #4: ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130 Call Trace: <TASK> : touch_wq_lockdep_map+0x93/0x180 __flush_workqueue+0x111/0x10b0 housekeeping_update+0x12d/0x2d0 update_parent_effective_cpumask+0x595/0x2440 update_prstate+0x89d/0xce0 cpuset_partition_write+0xc5/0x130 cgroup_file_write+0x1a5/0x680 kernfs_fop_write_iter+0x3df/0x5f0 vfs_write+0x525/0xfd0 ksys_write+0xf9/0x1d0 do_syscall_64+0x95/0x520 entry_SYSCALL_64_after_hwframe+0x76/0x7e To avoid such a circular locking dependency problem, we have to call housekeeping_update() without holding the cpus_read_lock() and cpuset_mutex. The current set of wq's flushed by housekeeping_update() may not have work functions that call cpus_read_lock() directly, but we are likely to extend the list of wq's that are flushed in the future. Moreover, the current set of work functions may hold locks that may have cpu_hotplug_lock down the dependency chain. So housekeeping_update() is now called after releasing cpus_read_lock and cpuset_mutex at the end of a cpuset operation. These two locks are then re-acquired later before calling rebuild_sched_domains_locked(). To enable mutual exclusion between the housekeeping_update() call and other cpuset control file write actions, a new top level cpuset_top_mutex is introduced. This new mutex will be acquired first to allow sharing variables used by both code paths. However, cpuset update from CPU hotplug can still happen in parallel with the housekeeping_update() call, though that should be rare in production environment. As cpus_read_lock() is now no longer held when tmigr_isolated_exclude_cpumask() is called, it needs to acquire it directly. The lockdep_is_cpuset_held() is also updated to return true if either cpuset_top_mutex or cpuset_mutex is held. Fixes: 03ff735 ("cpuset: Update HK_TYPE_DOMAIN cpumask from cpuset") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>

Offloading ETS requires computing each class' WRR weight: this is done by averaging over the sums of quanta as 'q_sum' and 'q_psum'. Using unsigned int, the same integer size as the individual DRR quanta, can overflow and even cause division by zero, like it happened in the following splat: Oops: divide error: 0000 [#1] SMP PTI CPU: 13 UID: 0 PID: 487 Comm: tc Tainted: G E 6.19.0-virtme #45 PREEMPT(full) Tainted: [E]=UNSIGNED_MODULE Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 RIP: 0010:ets_offload_change+0x11f/0x290 [sch_ets] Code: e4 45 31 ff eb 03 41 89 c7 41 89 cb 89 ce 83 f9 0f 0f 87 b7 00 00 00 45 8b 08 31 c0 45 01 cc 45 85 c9 74 09 41 6b c4 64 31 d2 <41> f7 f2 89 c2 44 29 fa 45 89 df 41 83 fb 0f 0f 87 c7 00 00 00 44 RSP: 0018:ffffd0a180d77588 EFLAGS: 00010246 RAX: 00000000ffffff38 RBX: ffff8d3d482ca000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffd0a180d77660 RBP: ffffd0a180d77690 R08: ffff8d3d482ca2d8 R09: 00000000fffffffe R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffe R13: ffff8d3d472f2000 R14: 0000000000000003 R15: 0000000000000000 FS: 00007f440b6c2740(0000) GS:ffff8d3dc9803000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000003cdd2000 CR3: 0000000007b58002 CR4: 0000000000172ef0 Call Trace: <TASK> ets_qdisc_change+0x870/0xf40 [sch_ets] qdisc_create+0x12b/0x540 tc_modify_qdisc+0x6d7/0xbd0 rtnetlink_rcv_msg+0x168/0x6b0 netlink_rcv_skb+0x5c/0x110 netlink_unicast+0x1d6/0x2b0 netlink_sendmsg+0x22e/0x470 ____sys_sendmsg+0x38a/0x3c0 ___sys_sendmsg+0x99/0xe0 __sys_sendmsg+0x8a/0xf0 do_syscall_64+0x111/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f440b81c77e Code: 4d 89 d8 e8 d4 bc 00 00 4c 8b 5d f8 41 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 11 c9 c3 0f 1f 80 00 00 00 00 48 8b 45 10 0f 05 <c9> c3 83 e2 39 83 fa 08 75 e7 e8 13 ff ff ff 0f 1f 00 f3 0f 1e fa RSP: 002b:00007fff951e4c10 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000481820 RCX: 00007f440b81c77e RDX: 0000000000000000 RSI: 00007fff951e4cd0 RDI: 0000000000000003 RBP: 00007fff951e4c20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff951f4fa8 R13: 00000000699ddede R14: 00007f440bb01000 R15: 0000000000486980 </TASK> Modules linked in: sch_ets(E) netdevsim(E) ---[ end trace 0000000000000000 ]--- RIP: 0010:ets_offload_change+0x11f/0x290 [sch_ets] Code: e4 45 31 ff eb 03 41 89 c7 41 89 cb 89 ce 83 f9 0f 0f 87 b7 00 00 00 45 8b 08 31 c0 45 01 cc 45 85 c9 74 09 41 6b c4 64 31 d2 <41> f7 f2 89 c2 44 29 fa 45 89 df 41 83 fb 0f 0f 87 c7 00 00 00 44 RSP: 0018:ffffd0a180d77588 EFLAGS: 00010246 RAX: 00000000ffffff38 RBX: ffff8d3d482ca000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffd0a180d77660 RBP: ffffd0a180d77690 R08: ffff8d3d482ca2d8 R09: 00000000fffffffe R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffffe R13: ffff8d3d472f2000 R14: 0000000000000003 R15: 0000000000000000 FS: 00007f440b6c2740(0000) GS:ffff8d3dc9803000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000003cdd2000 CR3: 0000000007b58002 CR4: 0000000000172ef0 Kernel panic - not syncing: Fatal exception Kernel Offset: 0x30000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception ]--- Fix this using 64-bit integers for 'q_sum' and 'q_psum'. Cc: stable@vger.kernel.org Fixes: d35eb52 ("net: sch_ets: Make the ETS qdisc offloadable") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/28504887df314588c7255e9911769c36f751edee.1771964872.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Since the conversion of ice to page pool, the ethtool loopback test crashes: BUG: kernel NULL pointer dereference, address: 000000000000000c #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 1100f1067 P4D 0 Oops: Oops: 0002 [#1] SMP NOPTI CPU: 23 UID: 0 PID: 5904 Comm: ethtool Kdump: loaded Not tainted 6.19.0-0.rc7.260128g1f97d9dcf5364.49.eln154.x86_64 #1 PREEMPT(lazy) Hardware name: [...] RIP: 0010:ice_alloc_rx_bufs+0x1cd/0x310 [ice] Code: 83 6c 24 30 01 66 41 89 47 08 0f 84 c0 00 00 00 41 0f b7 dc 48 8b 44 24 18 48 c1 e3 04 41 bb 00 10 00 00 48 8d 2c 18 8b 04 24 <89> 45 0c 41 8b 4d 00 49 d3 e3 44 3b 5c 24 24 0f 83 ac fe ff ff 44 RSP: 0018:ff7894738aa1f768 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000700 RDI: 0000000000000000 RBP: 0000000000000000 R08: ff16dcae79880200 R09: 0000000000000019 R10: 0000000000000001 R11: 0000000000001000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ff16dcae6c670000 FS: 00007fcf428850c0(0000) GS:ff16dcb149710000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000000c CR3: 0000000121227005 CR4: 0000000000773ef0 PKRU: 55555554 Call Trace: <TASK> ice_vsi_cfg_rxq+0xca/0x460 [ice] ice_vsi_cfg_rxqs+0x54/0x70 [ice] ice_loopback_test+0xa9/0x520 [ice] ice_self_test+0x1b9/0x280 [ice] ethtool_self_test+0xe5/0x200 __dev_ethtool+0x1106/0x1a90 dev_ethtool+0xbe/0x1a0 dev_ioctl+0x258/0x4c0 sock_do_ioctl+0xe3/0x130 __x64_sys_ioctl+0xb9/0x100 do_syscall_64+0x7c/0x700 entry_SYSCALL_64_after_hwframe+0x76/0x7e [...] It crashes because we have not initialized libeth for the rx ring. Fix it by treating ICE_VSI_LB VSIs slightly more like normal PF VSIs and letting them have a q_vector. It's just a dummy, because the loopback test does not use interrupts, but it contains a napi struct that can be passed to libeth_rx_fq_create() called from ice_vsi_cfg_rxq() -> ice_rxq_pp_create(). Fixes: 93f53db ("ice: switch to Page Pool") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

The libie_fwlog_deinit() function can be called during driver unload even when firmware logging was never properly initialized. This led to call trace: [ 148.576156] Oops: Oops: 0000 [#1] SMP NOPTI [ 148.576167] CPU: 80 UID: 0 PID: 12843 Comm: rmmod Kdump: loaded Not tainted 6.17.0-rc7next-queue-3oct-01915-g06d79d51cf51 #1 PREEMPT(full) [ 148.576177] Hardware name: HPE ProLiant DL385 Gen10 Plus/ProLiant DL385 Gen10 Plus, BIOS A42 07/18/2020 [ 148.576182] RIP: 0010:__dev_printk+0x16/0x70 [ 148.576196] Code: 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 41 55 41 54 49 89 d4 55 48 89 fd 53 48 85 f6 74 3c <4c> 8b 6e 50 48 89 f3 4d 85 ed 75 03 4c 8b 2e 48 89 df e8 f3 27 98 [ 148.576204] RSP: 0018:ffffd2fd7ea17a48 EFLAGS: 00010202 [ 148.576211] RAX: ffffd2fd7ea17aa0 RBX: ffff8eb288ae2000 RCX: 0000000000000000 [ 148.576217] RDX: ffffd2fd7ea17a70 RSI: 00000000000000c8 RDI: ffffffffb68d3d88 [ 148.576222] RBP: ffffffffb68d3d88 R08: 0000000000000000 R09: 0000000000000000 [ 148.576227] R10: 00000000000000c8 R11: ffff8eb2b1a49400 R12: ffffd2fd7ea17a70 [ 148.576231] R13: ffff8eb3141fb000 R14: ffffffffc1215b48 R15: ffffffffc1215bd8 [ 148.576236] FS: 00007f5666ba6740(0000) GS:ffff8eb2472b9000(0000) knlGS:0000000000000000 [ 148.576242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 148.576247] CR2: 0000000000000118 CR3: 000000011ad17000 CR4: 0000000000350ef0 [ 148.576252] Call Trace: [ 148.576258] <TASK> [ 148.576269] _dev_warn+0x7c/0x96 [ 148.576290] libie_fwlog_deinit+0x112/0x117 [libie_fwlog] [ 148.576303] ixgbe_remove+0x63/0x290 [ixgbe] [ 148.576342] pci_device_remove+0x42/0xb0 [ 148.576354] device_release_driver_internal+0x19c/0x200 [ 148.576365] driver_detach+0x48/0x90 [ 148.576372] bus_remove_driver+0x6d/0xf0 [ 148.576383] pci_unregister_driver+0x2e/0xb0 [ 148.576393] ixgbe_exit_module+0x1c/0xd50 [ixgbe] [ 148.576430] __do_sys_delete_module.isra.0+0x1bc/0x2e0 [ 148.576446] do_syscall_64+0x7f/0x980 It can be reproduced by trying to unload ixgbe driver in recovery mode. Fix that by checking if fwlog is supported before doing unroll. Fixes: 641585b ("ixgbe: fwlog support for e610") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never initialized because inet6_init() exits before ndisc_init() is called which initializes it. Then, if neigh_suppress is enabled and an ICMPv6 Neighbor Discovery packet reaches the bridge, br_do_suppress_nd() will dereference ipv6_stub->nd_tbl which is NULL, passing it to neigh_lookup(). This causes a kernel NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000268 Oops: 0000 [#1] PREEMPT SMP NOPTI [...] RIP: 0010:neigh_lookup+0x16/0xe0 [...] Call Trace: <IRQ> ? neigh_lookup+0x16/0xe0 br_do_suppress_nd+0x160/0x290 [bridge] br_handle_frame_finish+0x500/0x620 [bridge] br_handle_frame+0x353/0x440 [bridge] __netif_receive_skb_core.constprop.0+0x298/0x1110 __netif_receive_skb_one_core+0x3d/0xa0 process_backlog+0xa0/0x140 __napi_poll+0x2c/0x170 net_rx_action+0x2c4/0x3a0 handle_softirqs+0xd0/0x270 do_softirq+0x3f/0x60 Fix this by replacing IS_ENABLED(IPV6) call with ipv6_mod_enabled() in the callers. This is in essence disabling NS/NA suppression when IPv6 is disabled. Fixes: ed842fa ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports") Reported-by: Guruprasad C P <gurucp2005@gmail.com> Closes: https://lore.kernel.org/netdev/CAHXs0ORzd62QOG-Fttqa2Cx_A_VFp=utE2H2VTX5nqfgs7LDxQ@mail.gmail.com/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260304120357.9778-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>

When booting with the 'ipv6.disable=1' parameter, the nd_tbl is never initialized because inet6_init() exits before ndisc_init() is called which initializes it. If an IPv6 packet is injected into the interface, route_shortcircuit() is called and a NULL pointer dereference happens on neigh_lookup(). BUG: kernel NULL pointer dereference, address: 0000000000000380 Oops: Oops: 0000 [#1] SMP NOPTI [...] RIP: 0010:neigh_lookup+0x20/0x270 [...] Call Trace: <TASK> vxlan_xmit+0x638/0x1ef0 [vxlan] dev_hard_start_xmit+0x9e/0x2e0 __dev_queue_xmit+0xbee/0x14e0 packet_sendmsg+0x116f/0x1930 __sys_sendto+0x1f5/0x200 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x12f/0x1590 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fix this by adding an early check on route_shortcircuit() when protocol is ETH_P_IPV6. Note that ipv6_mod_enabled() cannot be used here because VXLAN can be built-in even when IPv6 is built as a module. Fixes: e15a00a ("vxlan: add ipv6 route short circuit support") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260304120357.9778-2-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>

…ipv6-nexthop-and-add-selftest' Jiayuan Chen says: ==================== net: ipv6: fix panic when IPv4 route references loopback IPv6 nexthop and add selftest syzbot reported a kernel panic [1] when an IPv4 route references a loopback IPv6 nexthop object: BUG: unable to handle page fault for address: ffff8d069e7aa000 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 6aa01067 P4D 6aa01067 PUD 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 2 UID: 0 PID: 530 Comm: ping Not tainted 6.19.0+ #193 PREEMPT RIP: 0010:ip_route_output_key_hash_rcu+0x578/0x9e0 RSP: 0018:ffffd2ffc1573918 EFLAGS: 00010286 RAX: ffff8d069e7aa000 RBX: ffffd2ffc1573988 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffd2ffc1573978 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d060d496000 R13: 0000000000000000 R14: ffff8d060399a600 R15: ffff8d06019a6ab8 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8d069e7aa000 CR3: 0000000106eb0001 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> ip_route_output_key_hash+0x86/0x1a0 __ip4_datagram_connect+0x2b5/0x4e0 udp_connect+0x2c/0x60 inet_dgram_connect+0x88/0xd0 __sys_connect_file+0x56/0x90 __sys_connect+0xa8/0xe0 __x64_sys_connect+0x18/0x30 x64_sys_call+0xfb9/0x26e0 do_syscall_64+0xd3/0x1510 entry_SYSCALL_64_after_hwframe+0x76/0x7e Reproduction: ip -6 nexthop add id 100 dev lo ip route add 172.20.20.0/24 nhid 100 ping -c1 172.20.20.1 # kernel crash Problem Description When a standalone IPv6 nexthop object is created with a loopback device, fib6_nh_init() misclassifies it as a reject route. Nexthop objects have no destination prefix (fc_dst=::), so fib6_is_reject() always matches any loopback nexthop. The reject path skips fib_nh_common_init(), leaving nhc_pcpu_rth_output unallocated. When an IPv4 route later references this nexthop and triggers a route lookup, __mkroute_output() calls raw_cpu_ptr(nhc->nhc_pcpu_rth_output) on a NULL pointer, causing a page fault. The reject classification was designed for regular IPv6 routes to prevent kernel routing loops, but nexthop objects should not be subject to this check since they carry no destination information. Loop prevention is handled separately when the route itself is created. [1] https://syzkaller.appspot.com/bug?extid=334190e097a98a1b81bb ==================== Link: https://patch.msgid.link/20260304113817.294966-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Many ethernet drivers report xdp Rx queue frag size as being the same as DMA write size. However, the only user of this field, namely bpf_xdp_frags_increase_tail(), clearly expects a truesize. Such difference leads to unspecific memory corruption issues under certain circumstances, e.g. in ixgbevf maximum DMA write size is 3 KB, so when running xskxceiver's XDP_ADJUST_TAIL_GROW_MULTI_BUFF, 6K packet fully uses all DMA-writable space in 2 buffers. This would be fine, if only rxq->frag_size was properly set to 4K, but value of 3K results in a negative tailroom, because there is a non-zero page offset. We are supposed to return -EINVAL and be done with it in such case, but due to tailroom being stored as an unsigned int, it is reported to be somewhere near UINT_MAX, resulting in a tail being grown, even if the requested offset is too much (it is around 2K in the abovementioned test). This later leads to all kinds of unspecific calltraces. [ 7340.337579] xskxceiver[1440]: segfault at 1da718 ip 00007f4161aeac9d sp 00007f41615a6a00 error 6 [ 7340.338040] xskxceiver[1441]: segfault at 7f410000000b ip 00000000004042b5 sp 00007f415bffecf0 error 4 [ 7340.338179] in libc.so.6[61c9d,7f4161aaf000+160000] [ 7340.339230] in xskxceiver[42b5,400000+69000] [ 7340.340300] likely on CPU 6 (core 0, socket 6) [ 7340.340302] Code: ff ff 01 e9 f4 fe ff ff 0f 1f 44 00 00 4c 39 f0 74 73 31 c0 ba 01 00 00 00 f0 0f b1 17 0f 85 ba 00 00 00 49 8b 87 88 00 00 00 <4c> 89 70 08 eb cc 0f 1f 44 00 00 48 8d bd f0 fe ff ff 89 85 ec fe [ 7340.340888] likely on CPU 3 (core 0, socket 3) [ 7340.345088] Code: 00 00 00 ba 00 00 00 00 be 00 00 00 00 89 c7 e8 31 ca ff ff 89 45 ec 8b 45 ec 85 c0 78 07 b8 00 00 00 00 eb 46 e8 0b c8 ff ff <8b> 00 83 f8 69 74 24 e8 ff c7 ff ff 8b 00 83 f8 0b 74 18 e8 f3 c7 [ 7340.404334] Oops: general protection fault, probably for non-canonical address 0x6d255010bdffc: 0000 [#1] SMP NOPTI [ 7340.405972] CPU: 7 UID: 0 PID: 1439 Comm: xskxceiver Not tainted 6.19.0-rc1+ #21 PREEMPT(lazy) [ 7340.408006] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-5.fc42 04/01/2014 [ 7340.409716] RIP: 0010:lookup_swap_cgroup_id+0x44/0x80 [ 7340.410455] Code: 83 f8 1c 73 39 48 ba ff ff ff ff ff ff ff 03 48 8b 04 c5 20 55 fa bd 48 21 d1 48 89 ca 83 e1 01 48 d1 ea c1 e1 04 48 8d 04 90 <8b> 00 48 83 c4 10 d3 e8 c3 cc cc cc cc 31 c0 e9 98 b7 dd 00 48 89 [ 7340.412787] RSP: 0018:ffffcc5c04f7f6d0 EFLAGS: 00010202 [ 7340.413494] RAX: 0006d255010bdffc RBX: ffff891f477895a8 RCX: 0000000000000010 [ 7340.414431] RDX: 0001c17e3fffffff RSI: 00fa070000000000 RDI: 000382fc7fffffff [ 7340.415354] RBP: 00fa070000000000 R08: ffffcc5c04f7f8f8 R09: ffffcc5c04f7f7d0 [ 7340.416283] R10: ffff891f4c1a7000 R11: ffffcc5c04f7f9c8 R12: ffffcc5c04f7f7d0 [ 7340.417218] R13: 03ffffffffffffff R14: 00fa06fffffffe00 R15: ffff891f47789500 [ 7340.418229] FS: 0000000000000000(0000) GS:ffff891ffdfaa000(0000) knlGS:0000000000000000 [ 7340.419489] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7340.420286] CR2: 00007f415bfffd58 CR3: 0000000103f03002 CR4: 0000000000772ef0 [ 7340.421237] PKRU: 55555554 [ 7340.421623] Call Trace: [ 7340.421987] <TASK> [ 7340.422309] ? softleaf_from_pte+0x77/0xa0 [ 7340.422855] swap_pte_batch+0xa7/0x290 [ 7340.423363] zap_nonpresent_ptes.constprop.0.isra.0+0xd1/0x270 [ 7340.424102] zap_pte_range+0x281/0x580 [ 7340.424607] zap_pmd_range.isra.0+0xc9/0x240 [ 7340.425177] unmap_page_range+0x24d/0x420 [ 7340.425714] unmap_vmas+0xa1/0x180 [ 7340.426185] exit_mmap+0xe1/0x3b0 [ 7340.426644] __mmput+0x41/0x150 [ 7340.427098] exit_mm+0xb1/0x110 [ 7340.427539] do_exit+0x1b2/0x460 [ 7340.427992] do_group_exit+0x2d/0xc0 [ 7340.428477] get_signal+0x79d/0x7e0 [ 7340.428957] arch_do_signal_or_restart+0x34/0x100 [ 7340.429571] exit_to_user_mode_loop+0x8e/0x4c0 [ 7340.430159] do_syscall_64+0x188/0x6b0 [ 7340.430672] ? __do_sys_clone3+0xd9/0x120 [ 7340.431212] ? switch_fpu_return+0x4e/0xd0 [ 7340.431761] ? arch_exit_to_user_mode_prepare.isra.0+0xa1/0xc0 [ 7340.432498] ? do_syscall_64+0xbb/0x6b0 [ 7340.433015] ? __handle_mm_fault+0x445/0x690 [ 7340.433582] ? count_memcg_events+0xd6/0x210 [ 7340.434151] ? handle_mm_fault+0x212/0x340 [ 7340.434697] ? do_user_addr_fault+0x2b4/0x7b0 [ 7340.435271] ? clear_bhb_loop+0x30/0x80 [ 7340.435788] ? clear_bhb_loop+0x30/0x80 [ 7340.436299] ? clear_bhb_loop+0x30/0x80 [ 7340.436812] ? clear_bhb_loop+0x30/0x80 [ 7340.437323] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 7340.437973] RIP: 0033:0x7f4161b14169 [ 7340.438468] Code: Unable to access opcode bytes at 0x7f4161b1413f. [ 7340.439242] RSP: 002b:00007ffc6ebfa770 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca [ 7340.440173] RAX: fffffffffffffe00 RBX: 00000000000005a1 RCX: 00007f4161b14169 [ 7340.441061] RDX: 00000000000005a1 RSI: 0000000000000109 RDI: 00007f415bfff990 [ 7340.441943] RBP: 00007ffc6ebfa7a0 R08: 0000000000000000 R09: 00000000ffffffff [ 7340.442824] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 7340.443707] R13: 0000000000000000 R14: 00007f415bfff990 R15: 00007f415bfff6c0 [ 7340.444586] </TASK> [ 7340.444922] Modules linked in: rfkill intel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel vfat fat kvm snd_pcm irqbypass rapl iTCO_wdt snd_timer intel_pmc_bxt iTCO_vendor_support snd ixgbevf virtio_net soundcore i2c_i801 pcspkr libeth_xdp net_failover i2c_smbus lpc_ich failover libeth virtio_balloon joydev 9p fuse loop zram lz4hc_compress lz4_compress 9pnet_virtio 9pnet netfs ghash_clmulni_intel serio_raw qemu_fw_cfg [ 7340.449650] ---[ end trace 0000000000000000 ]--- The issue can be fixed in all in-tree drivers, but we cannot just trust OOT drivers to not do this. Therefore, make tailroom a signed int and produce a warning when it is negative to prevent such mistakes in the future. Fixes: bf25146 ("bpf: add frags support to the bpf_xdp_adjust_tail() API") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20260305111253.2317394-10-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Larysa Zaremba says: ==================== Address XDP frags having negative tailroom Aside from the issue described below, tailroom calculation does not account for pages being split between frags, e.g. in i40e, enetc and AF_XDP ZC with smaller chunks. These series address the problem by calculating modulo (skb_frag_off() % rxq->frag_size) in order to get data offset within a smaller block of memory. Please note, xskxceiver tail grow test passes without modulo e.g. in xdpdrv mode on i40e, because there is not enough descriptors to get to flipped buffers. Many ethernet drivers report xdp Rx queue frag size as being the same as DMA write size. However, the only user of this field, namely bpf_xdp_frags_increase_tail(), clearly expects a truesize. Such difference leads to unspecific memory corruption issues under certain circumstances, e.g. in ixgbevf maximum DMA write size is 3 KB, so when running xskxceiver's XDP_ADJUST_TAIL_GROW_MULTI_BUFF, 6K packet fully uses all DMA-writable space in 2 buffers. This would be fine, if only rxq->frag_size was properly set to 4K, but value of 3K results in a negative tailroom, because there is a non-zero page offset. We are supposed to return -EINVAL and be done with it in such case, but due to tailroom being stored as an unsigned int, it is reported to be somewhere near UINT_MAX, resulting in a tail being grown, even if the requested offset is too much(it is around 2K in the abovementioned test). This later leads to all kinds of unspecific calltraces. [ 7340.337579] xskxceiver[1440]: segfault at 1da718 ip 00007f4161aeac9d sp 00007f41615a6a00 error 6 [ 7340.338040] xskxceiver[1441]: segfault at 7f410000000b ip 00000000004042b5 sp 00007f415bffecf0 error 4 [ 7340.338179] in libc.so.6[61c9d,7f4161aaf000+160000] [ 7340.339230] in xskxceiver[42b5,400000+69000] [ 7340.340300] likely on CPU 6 (core 0, socket 6) [ 7340.340302] Code: ff ff 01 e9 f4 fe ff ff 0f 1f 44 00 00 4c 39 f0 74 73 31 c0 ba 01 00 00 00 f0 0f b1 17 0f 85 ba 00 00 00 49 8b 87 88 00 00 00 <4c> 89 70 08 eb cc 0f 1f 44 00 00 48 8d bd f0 fe ff ff 89 85 ec fe [ 7340.340888] likely on CPU 3 (core 0, socket 3) [ 7340.345088] Code: 00 00 00 ba 00 00 00 00 be 00 00 00 00 89 c7 e8 31 ca ff ff 89 45 ec 8b 45 ec 85 c0 78 07 b8 00 00 00 00 eb 46 e8 0b c8 ff ff <8b> 00 83 f8 69 74 24 e8 ff c7 ff ff 8b 00 83 f8 0b 74 18 e8 f3 c7 [ 7340.404334] Oops: general protection fault, probably for non-canonical address 0x6d255010bdffc: 0000 [#1] SMP NOPTI [ 7340.405972] CPU: 7 UID: 0 PID: 1439 Comm: xskxceiver Not tainted 6.19.0-rc1+ #21 PREEMPT(lazy) [ 7340.408006] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-5.fc42 04/01/2014 [ 7340.409716] RIP: 0010:lookup_swap_cgroup_id+0x44/0x80 [ 7340.410455] Code: 83 f8 1c 73 39 48 ba ff ff ff ff ff ff ff 03 48 8b 04 c5 20 55 fa bd 48 21 d1 48 89 ca 83 e1 01 48 d1 ea c1 e1 04 48 8d 04 90 <8b> 00 48 83 c4 10 d3 e8 c3 cc cc cc cc 31 c0 e9 98 b7 dd 00 48 89 [ 7340.412787] RSP: 0018:ffffcc5c04f7f6d0 EFLAGS: 00010202 [ 7340.413494] RAX: 0006d255010bdffc RBX: ffff891f477895a8 RCX: 0000000000000010 [ 7340.414431] RDX: 0001c17e3fffffff RSI: 00fa070000000000 RDI: 000382fc7fffffff [ 7340.415354] RBP: 00fa070000000000 R08: ffffcc5c04f7f8f8 R09: ffffcc5c04f7f7d0 [ 7340.416283] R10: ffff891f4c1a7000 R11: ffffcc5c04f7f9c8 R12: ffffcc5c04f7f7d0 [ 7340.417218] R13: 03ffffffffffffff R14: 00fa06fffffffe00 R15: ffff891f47789500 [ 7340.418229] FS: 0000000000000000(0000) GS:ffff891ffdfaa000(0000) knlGS:0000000000000000 [ 7340.419489] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7340.420286] CR2: 00007f415bfffd58 CR3: 0000000103f03002 CR4: 0000000000772ef0 [ 7340.421237] PKRU: 55555554 [ 7340.421623] Call Trace: [ 7340.421987] <TASK> [ 7340.422309] ? softleaf_from_pte+0x77/0xa0 [ 7340.422855] swap_pte_batch+0xa7/0x290 [ 7340.423363] zap_nonpresent_ptes.constprop.0.isra.0+0xd1/0x270 [ 7340.424102] zap_pte_range+0x281/0x580 [ 7340.424607] zap_pmd_range.isra.0+0xc9/0x240 [ 7340.425177] unmap_page_range+0x24d/0x420 [ 7340.425714] unmap_vmas+0xa1/0x180 [ 7340.426185] exit_mmap+0xe1/0x3b0 [ 7340.426644] __mmput+0x41/0x150 [ 7340.427098] exit_mm+0xb1/0x110 [ 7340.427539] do_exit+0x1b2/0x460 [ 7340.427992] do_group_exit+0x2d/0xc0 [ 7340.428477] get_signal+0x79d/0x7e0 [ 7340.428957] arch_do_signal_or_restart+0x34/0x100 [ 7340.429571] exit_to_user_mode_loop+0x8e/0x4c0 [ 7340.430159] do_syscall_64+0x188/0x6b0 [ 7340.430672] ? __do_sys_clone3+0xd9/0x120 [ 7340.431212] ? switch_fpu_return+0x4e/0xd0 [ 7340.431761] ? arch_exit_to_user_mode_prepare.isra.0+0xa1/0xc0 [ 7340.432498] ? do_syscall_64+0xbb/0x6b0 [ 7340.433015] ? __handle_mm_fault+0x445/0x690 [ 7340.433582] ? count_memcg_events+0xd6/0x210 [ 7340.434151] ? handle_mm_fault+0x212/0x340 [ 7340.434697] ? do_user_addr_fault+0x2b4/0x7b0 [ 7340.435271] ? clear_bhb_loop+0x30/0x80 [ 7340.435788] ? clear_bhb_loop+0x30/0x80 [ 7340.436299] ? clear_bhb_loop+0x30/0x80 [ 7340.436812] ? clear_bhb_loop+0x30/0x80 [ 7340.437323] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 7340.437973] RIP: 0033:0x7f4161b14169 [ 7340.438468] Code: Unable to access opcode bytes at 0x7f4161b1413f. [ 7340.439242] RSP: 002b:00007ffc6ebfa770 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca [ 7340.440173] RAX: fffffffffffffe00 RBX: 00000000000005a1 RCX: 00007f4161b14169 [ 7340.441061] RDX: 00000000000005a1 RSI: 0000000000000109 RDI: 00007f415bfff990 [ 7340.441943] RBP: 00007ffc6ebfa7a0 R08: 0000000000000000 R09: 00000000ffffffff [ 7340.442824] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 7340.443707] R13: 0000000000000000 R14: 00007f415bfff990 R15: 00007f415bfff6c0 [ 7340.444586] </TASK> [ 7340.444922] Modules linked in: rfkill intel_rapl_msr intel_rapl_common intel_uncore_frequency_common skx_edac_common nfit libnvdimm kvm_intel vfat fat kvm snd_pcm irqbypass rapl iTCO_wdt snd_timer intel_pmc_bxt iTCO_vendor_support snd ixgbevf virtio_net soundcore i2c_i801 pcspkr libeth_xdp net_failover i2c_smbus lpc_ich failover libeth virtio_balloon joydev 9p fuse loop zram lz4hc_compress lz4_compress 9pnet_virtio 9pnet netfs ghash_clmulni_intel serio_raw qemu_fw_cfg [ 7340.449650] ---[ end trace 0000000000000000 ]--- The issue can be fixed in all in-tree drivers, but we cannot just trust OOT drivers to not do this. Therefore, make tailroom a signed int and produce a warning when it is negative to prevent such mistakes in the future. The issue can also be easily reproduced with ice driver, by applying the following diff to xskxceiver and enjoying a kernel panic in xdpdrv mode: diff --git a/tools/testing/selftests/bpf/prog_tests/test_xsk.c b/tools/testing/selftests/bpf/prog_tests/test_xsk.c index 5af28f3..042d587fa7ef 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_xsk.c +++ b/tools/testing/selftests/bpf/prog_tests/test_xsk.c @@ -2541,8 +2541,8 @@ int testapp_adjust_tail_grow_mb(struct test_spec *test) { test->mtu = MAX_ETH_JUMBO_SIZE; /* Grow by (frag_size - last_frag_Size) - 1 to stay inside the last fragment */ - return testapp_adjust_tail(test, (XSK_UMEM__MAX_FRAME_SIZE / 2) - 1, - XSK_UMEM__LARGE_FRAME_SIZE * 2); + return testapp_adjust_tail(test, XSK_UMEM__MAX_FRAME_SIZE * 100, + 6912); } int testapp_tx_queue_consumer(struct test_spec *test) If we print out the values involved in the tailroom calculation: tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag); 4294967040 = 3456 - 3456 - 256 I personally reproduced and verified the issue in ice and i40e, aside from WiP ixgbevf implementation. ==================== Link: https://patch.msgid.link/20260305111253.2317394-1-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

rafaeljw and others added 10 commits February 19, 2024 12:13

Merge branch 'pm-runtime' into linux-next

3c28b08

* pm-runtime: PM: runtime: Add pm_runtime_put_autosuspend() replacement PM: runtime: Simplify pm_runtime_get_if_active() usage

Merge branch 'acpi-misc' into linux-next

4cb5c33

* acpi-misc: ACPI: use %pe for better readability of errors while printing

dependabot bot added the dependencies Pull requests that update a dependency file label Apr 12, 2024

rafaeljw force-pushed the linux-next branch from 4cb5c33 to af08929 Compare February 26, 2026 19:40

rafaeljw force-pushed the linux-next branch from af08929 to 96ce7d6 Compare February 27, 2026 13:04

rafaeljw force-pushed the linux-next branch from 96ce7d6 to 6645722 Compare February 28, 2026 15:56

rafaeljw force-pushed the linux-next branch from 6645722 to 868cf44 Compare March 2, 2026 21:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

build(deps): bump idna from 3.4 to 3.7 in /drivers/gpu/drm/ci/xfails#1

build(deps): bump idna from 3.4 to 3.7 in /drivers/gpu/drm/ci/xfails#1
dependabot[bot] wants to merge 10 commits intolinux-nextfrom
dependabot/pip/drivers/gpu/drm/ci/xfails/idna-3.7

dependabot bot commented on behalf of github Apr 12, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dependabot bot commented on behalf of github Apr 12, 2024

v3.7

What's Changed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant