Skip to content

scst_usr_cleanup thread causes soft lockup and scst_usr_released threads in D state #279

@ChengyuanLiCY

Description

@ChengyuanLiCY

We noticed that the scst_usr_cleanup kernel thread causes the CPU in the soft lockup state, and the scst_usr_released threads are in the D state in our storage nodes. This situation often occurs when a scst user space program exits unexpectedly, i.e. the all opened fds on /dev/scst_user are released at the same time when the user space program exits.

The issues are seen on both scst v3.6 and v3.8, and the OS is Ubuntu 22.04/Kernel 5.15.0-26.

The error logs:

root     226734  0.0  0.0      0     0 ?        D    05:36   0:00 [scst_usr_releas]
root     226736  0.0  0.0      0     0 ?        D    05:36   0:00 [scst_usr_releas]
root     226775  0.0  0.0      0     0 ?        D    05:36   0:00 [scst_usr_releas]
root     226776  0.0  0.0      0     0 ?        D    05:36   0:00 [scst_usr_releas]
root     226777  0.0  0.0      0     0 ?        D    05:36   0:00 [scst_usr_releas]
root     226778  0.0  0.0      0     0 ?        D    05:36   0:00 [scst_usr_releas]

# cat /proc/226776/stack
[<0>] msleep+0x2d/0x40
[<0>] scst_acg_del_lun+0xa5/0x1e0 [scst]
[<0>] scst_unregister_virtual_device+0x11c/0x2b0 [scst]
[<0>] dev_user_exit_dev.isra.0+0x101/0x1e0 [scst_user]
[<0>] __dev_user_release+0x13/0x30 [scst_user]
[<0>] kthread+0x127/0x150
[<0>] ret_from_fork+0x1f/0x30
[Wed Jan 22 05:37:21 2025] rcu: INFO: rcu_sched self-detected stall on CPU
[Wed Jan 22 05:37:21 2025] rcu:         31-....: (14998 ticks this GP) idle=a83/1/0x4000000000000000 softirq=753634/753634 fqs=6302
[Wed Jan 22 05:37:21 2025]      (t=15000 jiffies g=2826113 q=405490)
[Wed Jan 22 05:37:21 2025] NMI backtrace for cpu 31
[Wed Jan 22 05:37:21 2025] CPU: 31 PID: 4156 Comm: scst_usr_cleanu Kdump: loaded Tainted: P        W  OEL    5.15.0-26-generic #26
[Wed Jan 22 05:37:21 2025] Hardware name: QCT D52B-1U/D52B-1U, BIOS 3B17.EBY05 02/08/2022
[Wed Jan 22 05:37:21 2025] Call Trace:
[Wed Jan 22 05:37:21 2025]  <IRQ>
[Wed Jan 22 05:37:21 2025]  show_stack+0x52/0x58
[Wed Jan 22 05:37:21 2025]  dump_stack_lvl+0x4a/0x5f
[Wed Jan 22 05:37:21 2025]  dump_stack+0x10/0x12
[Wed Jan 22 05:37:21 2025]  nmi_cpu_backtrace.cold+0x4d/0x93
[Wed Jan 22 05:37:21 2025]  ? lapic_can_unplug_cpu+0x80/0x80
[Wed Jan 22 05:37:21 2025]  nmi_trigger_cpumask_backtrace+0xe4/0xf0
[Wed Jan 22 05:37:21 2025]  arch_trigger_cpumask_backtrace+0x19/0x20
[Wed Jan 22 05:37:21 2025]  trigger_single_cpu_backtrace+0x44/0x4b
[Wed Jan 22 05:37:21 2025]  rcu_dump_cpu_stacks+0x102/0x145
[Wed Jan 22 05:37:21 2025]  print_cpu_stall.cold+0x2f/0xe2
[Wed Jan 22 05:37:21 2025]  check_cpu_stall+0x105/0x270
[Wed Jan 22 05:37:21 2025]  rcu_sched_clock_irq+0x9a/0x240
[Wed Jan 22 05:37:21 2025]  update_process_times+0x94/0xd0
[Wed Jan 22 05:37:21 2025]  ? tick_nohz_handler+0xb0/0xb0
[Wed Jan 22 05:37:21 2025]  tick_sched_handle+0x29/0x60
[Wed Jan 22 05:37:21 2025]  ? tick_nohz_handler+0xb0/0xb0
[Wed Jan 22 05:37:21 2025]  tick_sched_timer+0x88/0xb0
[Wed Jan 22 05:37:21 2025]  __hrtimer_run_queues+0x106/0x210
[Wed Jan 22 05:37:21 2025]  ? clockevents_program_event+0xa6/0x120
[Wed Jan 22 05:37:21 2025]  hrtimer_interrupt+0x101/0x220
[Wed Jan 22 05:37:21 2025]  __sysvec_apic_timer_interrupt+0x5e/0xd0
[Wed Jan 22 05:37:21 2025]  sysvec_apic_timer_interrupt+0x77/0x90
[Wed Jan 22 05:37:21 2025]  </IRQ>
[Wed Jan 22 05:37:21 2025]  <TASK>
[Wed Jan 22 05:37:21 2025]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[Wed Jan 22 05:37:21 2025] RIP: 0010:_raw_spin_lock_bh+0x1b/0x30
[Wed Jan 22 05:37:21 2025] Code: e5 e8 39 ee 3f ff 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 65 81 05 90 f3 8e 6d 00 02 00 00 31 c0 ba 01 00 00 00 f0 0f b1 17 <75> 01 c3 55 89 c6 48 89 e5 e8 37 e9 3f ff 66 90 5d c3 0f 1f 00 0f
[Wed Jan 22 05:37:21 2025] RSP: 0018:ffff9b915fd73e20 EFLAGS: 00000246
[Wed Jan 22 05:37:21 2025] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000122
[Wed Jan 22 05:37:21 2025] RDX: 0000000000000001 RSI: ffff8f383a2c5640 RDI: ffff8f383a2c56b8
[Wed Jan 22 05:37:21 2025] RBP: ffff9b915fd73e68 R08: 0000000000000000 R09: 0000000000000101
[Wed Jan 22 05:37:21 2025] R10: 0000000000000068 R11: 0000000000000000 R12: 0000000000000009
[Wed Jan 22 05:37:21 2025] R13: ffff8f383a2c56d0 R14: ffff8f383a2c56b8 R15: 0000000000000130
[Wed Jan 22 05:37:21 2025]  ? sgv_pool_flush+0x44/0x180 [scst]
[Wed Jan 22 05:37:21 2025]  dev_user_cleanup_thread+0x150/0x5a0 [scst_user]
[Wed Jan 22 05:37:21 2025]  ? wait_woken+0x70/0x70
[Wed Jan 22 05:37:21 2025]  ? dev_user_get_cmd_to_user+0x1f0/0x1f0 [scst_user]
[Wed Jan 22 05:37:21 2025]  kthread+0x127/0x150
[Wed Jan 22 05:37:21 2025]  ? set_kthread_struct+0x50/0x50
[Wed Jan 22 05:37:21 2025]  ret_from_fork+0x1f/0x30
[Wed Jan 22 05:37:21 2025]  </TASK>
[Wed Jan 22 05:37:31 2025] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 31-... } 15047 jiffies s: 17481 root: 0x4/.
[Wed Jan 22 05:37:31 2025] rcu: blocking rcu_node structures (internal RCU debug): l=1:30-44:0x2/.
[Wed Jan 22 05:37:31 2025] Task dump for CPU 31:
[Wed Jan 22 05:37:31 2025] task:scst_usr_cleanu state:R  running task     stack:    0 pid: 4156 ppid:     2 flags:0x00004008
[Wed Jan 22 05:37:31 2025] Call Trace:
[Wed Jan 22 05:37:31 2025]  <TASK>
[Wed Jan 22 05:37:31 2025]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[Wed Jan 22 05:37:31 2025]  ? sysvec_apic_timer_interrupt+0x4e/0x90
[Wed Jan 22 05:37:31 2025]  ? dev_user_get_next_cmd+0x54/0x3d0 [scst_user]
[Wed Jan 22 05:37:31 2025]  ? _raw_spin_unlock_bh+0x1e/0x20
[Wed Jan 22 05:37:31 2025]  ? sgv_pool_flush+0x117/0x180 [scst]
[Wed Jan 22 05:37:31 2025]  ? dev_user_cleanup_thread+0x150/0x5a0 [scst_user]
[Wed Jan 22 05:37:31 2025]  ? wait_woken+0x70/0x70
[Wed Jan 22 05:37:31 2025]  ? dev_user_get_cmd_to_user+0x1f0/0x1f0 [scst_user]
[Wed Jan 22 05:37:31 2025]  ? kthread+0x127/0x150
[Wed Jan 22 05:37:31 2025]  ? set_kthread_struct+0x50/0x50
[Wed Jan 22 05:37:31 2025]  ? ret_from_fork+0x1f/0x30
[Wed Jan 22 05:37:31 2025]  </TASK>
[Wed Jan 22 05:37:46 2025] watchdog: BUG: soft lockup - CPU#31 stuck for 82s! [scst_usr_cleanu:4156]

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions