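Soft lockup detection is one of the kernel's most common diagnostic mechanisms. Knowing how it works tells you where to start looking when this error shows up. This article answers two questions:
- What a soft lockup is
- How soft lockup detection works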
What is a soft lockup
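A soft lockup is a condition in which a CPU stays in kernel mode for too long without rescheduling (by default, more than 20 seconds), so no other task on that CPU gets a chance to run.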
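A minimal way to reproduce one is a busy loop like the following, running in kernel context (for example, inside a kernel module) with preemption disabled, or on a kernel built without preemption, so the loop never gives up the CPU:

```c
while (1) {
	do_something();	/* never sleeps and never calls schedule()/cond_resched() */
}
```

The same loop in an ordinary user-space program does not trigger a soft lockup, because the scheduler can always preempt user code.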
How soft lockup detection works
1. Define the timer (kernel/watchdog.c):
```c
static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer);
```
Note that every CPU has its own instance of this timer, so each CPU is checked independently.
2. Start the timer (kernel/watchdog.c):
```c
static void watchdog_enable(unsigned int cpu)
{
	struct hrtimer *hrtimer = this_cpu_ptr(&watchdog_hrtimer);

	/* Bind the callback; relative, monotonic clock, expires in hard-irq context */
	hrtimer_setup(hrtimer, watchdog_timer_fn, CLOCK_MONOTONIC,
		      HRTIMER_MODE_REL_HARD);
	/* First expiry after sample_period ns, pinned to this CPU */
	hrtimer_start(hrtimer, ns_to_ktime(sample_period),
		      HRTIMER_MODE_REL_PINNED_HARD);
}
```
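3. Run the callback (kernel/watchdog.c). Every `sample_period`, `watchdog_timer_fn()` re-arms the timer, reads this CPU's last touch timestamp (`watchdog_touch_ts`) and last report timestamp (`watchdog_report_ts`), and calls `is_softlockup()` to check whether the threshold has been exceeded. `watchdog_touch_ts` is refreshed only when the CPU actually gets to run the watchdog's highest-priority work (in recent kernels, `softlockup_fn()` queued on the per-CPU stopper thread), so a CPU spinning in the kernel leaves it stale and the computed stall `duration` keeps growing: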
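```c
static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
	unsigned long touch_ts, period_ts, now;
	int duration;

	/* ... other work omitted in this excerpt ... */

	/* Re-arm the timer for the next sample period */
	hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

	now = get_timestamp();
	/* When a lockup was last reported (or the CPU last touched) */
	period_ts = READ_ONCE(*this_cpu_ptr(&watchdog_report_ts));
	/* When this CPU last proved it could schedule */
	touch_ts = __this_cpu_read(watchdog_touch_ts);

	/* Non-zero (seconds stuck) once the threshold has been exceeded */
	duration = is_softlockup(touch_ts, period_ts, now);
	if (unlikely(duration)) {
		/* Reset the report timestamp so the warning is rate-limited */
		update_report_ts();
		/* current is the task that was running when the timer fired */
		pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
			 smp_processor_id(), duration,
			 current->comm, task_pid_nr(current));
	}

	return HRTIMER_RESTART;
}
```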