From 09ac4bce8ff9f44e3e01fd652044a2ecfdb7d4c7 Mon Sep 17 00:00:00 2001
From: manish1
Date: Wed, 13 May 2026 10:04:11 +0000
Subject: [PATCH] Backport upstream fix for khungtaskd panic on stalled coredumps

On Linux 6.1, coredump_task_exit() parks sibling threads in
TASK_UNINTERRUPTIBLE|TASK_FREEZABLE while one thread of the group writes
the core file. Under sustained memory pressure the dump can take longer
than kernel.hung_task_timeout_secs, at which point khungtaskd flags the
parked siblings and (with hung_task_panic=1) panics the box.

Backport mainline v6.12 commit b8e753128ed0 ("exit: Sleep at TASK_IDLE
when waiting for application core dump") which switches that wait to
TASK_IDLE|TASK_FREEZABLE so the watchdog skips it.

Signed-off-by: manish1
---
 ...d-panic-on-stalled-coredump-upstream.patch | 55 +++++++++++++++++++
 patch/series                                  |  4 ++
 2 files changed, 59 insertions(+)
 create mode 100644 patch/fix-khungtaskd-panic-on-stalled-coredump-upstream.patch

diff --git a/patch/fix-khungtaskd-panic-on-stalled-coredump-upstream.patch b/patch/fix-khungtaskd-panic-on-stalled-coredump-upstream.patch
new file mode 100644
index 000000000..d484c9abd
--- /dev/null
+++ b/patch/fix-khungtaskd-panic-on-stalled-coredump-upstream.patch
@@ -0,0 +1,55 @@
+From b8e753128ed074fcb48e9ceded940752f6b1c19f Mon Sep 17 00:00:00 2001
+From: "Paul E. McKenney"
+Date: Wed, 24 Jul 2024 16:51:52 -0700
+Subject: [PATCH] exit: Sleep at TASK_IDLE when waiting for application core
+ dump
+
+[ Upstream commit b8e753128ed074fcb48e9ceded940752f6b1c19f ]
+
+Currently, the coredump_task_exit() function sets the task state
+to TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well.
+But a combination of large memory and slow (and/or highly contended)
+mass storage can cause application core dumps to take more than
+two minutes, which can cause check_hung_task(), which is invoked by
+check_hung_uninterruptible_tasks(), to produce task-blocked splats.
+There does not seem to be any reasonable benefit to getting these splats.
+
+Furthermore, as Oleg Nesterov points out, TASK_UNINTERRUPTIBLE could
+be misleading because the task sleeping in coredump_task_exit() really
+is killable, albeit indirectly. See the check of signal->core_state
+in prepare_signal() and the check of fatal_signal_pending()
+in dump_interrupted(), which bypass the normal unkillability of
+TASK_UNINTERRUPTIBLE, resulting in coredump_finish() invoking
+wake_up_process() on any threads sleeping in coredump_task_exit().
+
+Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.
+
+Reported-by: Anhad Jai Singh
+Signed-off-by: Paul E. McKenney
+Acked-by: Oleg Nesterov
+[manish1: backport from mainline v6.12 to 6.1.123 - applies cleanly,
+ surrounding code in coredump_task_exit() is identical between v6.1
+ and v6.12; no functional adaptation required. Fixes recurring
+ hung_task panics on switches running SONiC 202505 when orchagent
+ crashes under sustained memory pressure and the coredump writer
+ cannot complete within kernel.hung_task_timeout_secs.]
+Signed-off-by: manish1
+---
+ kernel/exit.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/kernel/exit.c b/kernel/exit.c
+index 7430852a8571..0d62a53605df 100644
+--- a/kernel/exit.c
++++ b/kernel/exit.c
+@@ -428,7 +428,7 @@ static void coredump_task_exit(struct task_struct *tsk)
+ 	complete(&core_state->startup);
+ 
+ 	for (;;) {
+-		set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
++		set_current_state(TASK_IDLE|TASK_FREEZABLE);
+ 		if (!self.task) /* see coredump_finish() */
+ 			break;
+ 		schedule();
+-- 
+2.39.0
diff --git a/patch/series b/patch/series
index ebc3574e6..9aed86a23 100755
--- a/patch/series
+++ b/patch/series
@@ -237,6 +237,10 @@ cisco-npu-disable-other-bars.patch
 # https://github.com/sonic-net/sonic-buildimage/issues/20901
 PCI-ASPM-Fix-link-state-exit-during-switch-upstream.patch
 
+# Fix to stop khungtaskd from panicking on long application core dumps.
+# Backport of mainline v6.12 commit b8e753128ed0
+fix-khungtaskd-panic-on-stalled-coredump-upstream.patch
+
 #
 #
 ############################################################
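
Reviewer note (not part of the patch): the commit message says the watchdog skips TASK_IDLE sleepers. The userspace sketch below models why that is plausible. It rests on two assumptions that should be checked against the 6.1 tree: that the state-bit values copied here match include/linux/sched.h, and that check_hung_uninterruptible_tasks() in kernel/hung_task.c filters on TASK_UNINTERRUPTIBLE, TASK_WAKEKILL and TASK_NOLOAD as written. The helper name watchdog_would_check() is hypothetical and exists only for this illustration.

```c
#include <stdbool.h>

/* Assumed copies of the 6.1 task-state bits (include/linux/sched.h). */
#define TASK_UNINTERRUPTIBLE 0x0002u
#define TASK_WAKEKILL        0x0100u
#define TASK_NOLOAD          0x0400u
#define TASK_FREEZABLE       0x2000u
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/*
 * Hypothetical model of the filter khungtaskd is assumed to apply before
 * calling check_hung_task(): only uninterruptible sleepers that are
 * neither killable (TASK_WAKEKILL) nor idle (TASK_NOLOAD) are examined.
 */
static bool watchdog_would_check(unsigned int state)
{
	return (state & TASK_UNINTERRUPTIBLE) &&
	       !(state & TASK_WAKEKILL) &&
	       !(state & TASK_NOLOAD);
}
```

Under these assumptions, watchdog_would_check(TASK_UNINTERRUPTIBLE | TASK_FREEZABLE) is true (the old wait is examined and can trigger the panic), while watchdog_would_check(TASK_IDLE | TASK_FREEZABLE) is false because TASK_IDLE carries TASK_NOLOAD. TASK_FREEZABLE does not take part in the filter, so keeping it is neutral here.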