1. References
2. Practice
2.1 Kernel configuration
2.2 Locating the kernel crash
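1. References
Over the past few days I hit a truly baffling kernel bug that could not be pinned down by reading the code alone. I also could not make sense of the kernel dump log, so I am keeping brief notes here.
Reference: 定位内核模块crash的方法 (a method for locating kernel module crashes)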
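2. Practice
2.1 Kernel configuration
First, the kernel must be built so that it can be debugged. For the relevant configuration, see my earlier article 用VSCode + QEMU跑起来能够可视化Debug的NOVA文件系统 (running the NOVA file system under VSCode + QEMU with visual debugging). The part that matters here is the kernel debug option setup: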
# Next, set the kernel debug options directly from the command line
# In the command below, -e means enable and -d means disable
./scripts/config -e DEBUG_INFO -e GDB_SCRIPTS -e CONFIG_DEBUG_SECTION_MISMATCH -e CONFIG_FRAME_POINTER -d CONFIG_RANDOMIZE_BASE
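Because scripts/config only edits the .config file, it can be worth confirming afterwards that the options really ended up in the intended state. The following is a minimal sketch of such a check, not part of the original workflow. The helper names `parse_config` and `check_config` are hypothetical, and the script assumes the usual .config conventions: an enabled option appears as `CONFIG_X=y`, and a disabled one is either absent or written as `# CONFIG_X is not set`.

```python
import re

# Desired state of each option from the command above:
# True = should be enabled (=y), False = should be disabled.
# Names carry the CONFIG_ prefix, as they appear in .config.
EXPECTED = {
    "CONFIG_DEBUG_INFO": True,
    "CONFIG_GDB_SCRIPTS": True,
    "CONFIG_DEBUG_SECTION_MISMATCH": True,
    "CONFIG_FRAME_POINTER": True,
    "CONFIG_RANDOMIZE_BASE": False,
}


def parse_config(text):
    """Return {option: value} for every 'CONFIG_X=value' line in text."""
    options = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def check_config(text, expected=EXPECTED):
    """Return the names of options whose state differs from expected.

    Lines such as '# CONFIG_X is not set' do not match the pattern in
    parse_config, so those options count as disabled.
    """
    options = parse_config(text)
    mismatches = []
    for name, want_enabled in expected.items():
        is_enabled = options.get(name) == "y"
        if is_enabled != want_enabled:
            mismatches.append(name)
    return mismatches
```

To use it, read the .config file in the kernel source tree and pass its contents to `check_config`. An empty list means all five options are in the expected state.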
2.2 Locating the kernel crash
When the kernel crashes, you will see a stack dump similar to the following:
[ 370.075682] hunter: hk_readdir: ino 0, size 0, pos 0x0
[ 370.076316] BUG: unable to handle kernel NULL pointer dereference at 000000000000000a
[ 370.076557] #PF error: [normal kernel read fault]
[ 370.076748] PGD 2365bd067 P4D 2365bd067 PUD 23750b067 PMD 0
[ 370.077071] Oops: 0000 [#1] SMP NOPTI
[ 370.077350] CPU: 0 PID: 1126 Comm: ls Not tainted 5.1.0-nova+ #94
[ 370.077537] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
[ 370.077933] RIP: 0010:hk_readdir+0x203/0x30c
[ 370.078217] Code: 49 8b 04 24 49 8b 4c 24 08 41 83 e1 0f 4c 89 e7 e8 d1 93 ad 00 85 c0 75 40 48 8b 1b 48 85 db 74 52 48 85 db 74 49
[ 370.078766] RSP: 0018:ffffc90000d4be30 EFLAGS: 00000282
[ 370.078941] RAX: 000000000000001f RBX: ffff888237241020 RCX: 0000000000000000
[ 370.079179] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000001d32688
[ 370.079397] RBP: ffffc90000d4be80 R08: 0000000000000001 R09: 0000000000000008
[ 370.079599] R10: 206f6e69203a7269 R11: 20657a6973202c30 R12: ffffc90000d4bed0
[ 370.079800] R13: ffff8882366f4800 R14: 000000000000001f R15: ffff88823733c280
[ 370.080046] FS: 0000000001d313c0(0000) GS:ffff888238a00000(0000) knlGS:0000000000000000
[ 370.080269] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 370.080431] CR2: 000000000000000a CR3: 00000002367d0000 CR4: 00000000000006f0
[ 370.080674] Call Trace:
[ 370.081378] iterate_dir+0x8c/0x190
[ 370.081514] ksys_getdents64+0x97/0x130
[ 370.081625] ? iterate_dir+0x190/0x190
[ 370.081733] __x64_sys_getdents64+0x11/0x20
[ 370.081869] do_syscall_64+0x43/0xf0
[ 370.081974] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 370.082251] RIP: 0033:0x4e760b
[ 370.082354] Code: 04 48 81 ec 80 00 00 00 e8 b2 e4 f5 ff 48 81 c4 80 00 00 00 5b c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 d9 08
[ 370.082854] RSP: 002b:00007ffd0daf4888 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
[ 370.083074] RAX: ffffffffffffffda RBX: 0000000001d32610 RCX: 00000000004e760b
[ 370.083270] RDX: 0000000000008000 RSI: 0000000001d32640 RDI: 0000000000000003
[ 370.083460] RBP: 0000000001d32640 R08: 0000000000000003 R09: 0000000000888940
[ 370.083651] R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffe0
[ 370.083844] R13: 0000000000000000 R14: 0000000000400628 R15: 0000000000000001
[ 370.084084] Modules linked in:
[ 370.084278] CR2: 000000000000000a
...
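The line to focus on is RIP: 0010:hk_readdir+0x203/0x30c. It gives the crash location: offset 0x203 into the function hk_readdir, whose total size is 0x30c bytes. The next task is to find the source code that corresponds to that offset. To do so, run gdb on your Linux kernel binary (vmlinux) and set a breakpoint at that address: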
gdb vmlinux
b *hk_readdir+0x203
You will see output similar to the following:
Breakpoint 1 at 0xffffffff81327b24: file fs/hunter/dir.c, line 56.
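From there, open the reported file and line (here fs/hunter/dir.c, line 56) and inspect the code at that location.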
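The manual steps above (read the RIP line, then ask gdb about that offset) can also be scripted. The sketch below is one possible way to do it, under a few assumptions. It uses the `symbol+offset/size` format seen in the dump above, a vmlinux built with debug info, and gdb's `list *ADDRESS` command instead of a breakpoint, since both resolve the address to a source line. The helper names `parse_rip` and `gdb_command` are hypothetical.

```python
import re

# Matches the kernel-mode RIP line of an oops, for example:
#   RIP: 0010:hk_readdir+0x203/0x30c
# A user-mode line such as "RIP: 0033:0x4e760b" has no symbol+offset/size
# part and therefore does not match.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<symbol>[\w.]+)"
    r"\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_rip(log):
    """Return (symbol, offset, size) from the first kernel RIP line, or None."""
    match = RIP_RE.search(log)
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )


def gdb_command(symbol, offset, vmlinux="vmlinux"):
    """Build a non-interactive gdb invocation that lists the source at symbol+offset."""
    location = "*({}+{:#x})".format(symbol, offset)
    return ["gdb", "-batch", "-ex", "list " + location, vmlinux]
```

The list returned by `gdb_command` can be passed to `subprocess.run` from the directory that holds vmlinux. Building the command as an argument list rather than a single string avoids shell quoting problems with the `*` and parentheses.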