So how does one debug a kernel panic? The easiest way is to compile your kernel with CONFIG_DEBUG_INFO (if it's not available, just add -g to CFLAGS) and run the resulting vmlinux through gdb. Once you're in gdb you can do funky stuff like disassemble functions and show source listings, just from the hex numbers in the oops.

Consider an oops where a nasty error occurred at EIP:0010:[<c012f8da>]. This happened to me back when I was writing code for my MSc. Running the oops through ksymoops (only needed on kernel 2.4; kernels 2.6 and above already show function names in the panic message) gave me the following output, which showed the name of the offending function:
Code; c012f8da <do_mmap_pgoff+2a/550> <=====
The first part of the EIP value is 0x0010, the value of the code segment register, which you can safely ignore (unless you're messing with the GDT). In the ksymoops output, 2a is the offset into the offending function and 550 is the function's size, both in hex. To find out which line of the function sits at offset 0x2a, you just do this:
(gdb) list *do_mmap_pgoff+0x2a
0xc012f8da is in do_mmap_pgoff (mmap.c:404).
399 unsigned int vm_flags;
400 int correct_wcount = 0;
401 int error;
402 rb_node_t ** rb_link, * rb_parent;
403
404 if (file && (!file->f_op || !file->f_op->mmap))
405 return -ENODEV;
406
407 if (!len)
408 return addr;
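The step from the ksymoops line to the gdb command is mechanical, so it can be scripted. The following is only a minimal sketch in Python: the regular expression and the helper names parse_ksymoops_line and gdb_list_command are my own inventions for illustration, and the pattern assumes the "address <symbol+offset/size>" layout shown in the ksymoops output above.

```python
import re

# Matches the symbolic part of a ksymoops line such as
#   Code; c012f8da <do_mmap_pgoff+2a/550> <=====
# Address, offset and size are all hexadecimal without a 0x prefix.
KSYMOOPS_RE = re.compile(
    r"(?P<addr>[0-9a-fA-F]{8})\s+"
    r"<(?P<sym>[A-Za-z_]\w*)\+(?P<off>[0-9a-fA-F]+)/(?P<size>[0-9a-fA-F]+)>"
)


def parse_ksymoops_line(line):
    """Split a ksymoops line into address, symbol, offset and size (hypothetical helper)."""
    m = KSYMOOPS_RE.search(line)
    if m is None:
        raise ValueError(f"no symbol+offset/size found in: {line!r}")
    return {
        "address": int(m.group("addr"), 16),
        "symbol": m.group("sym"),
        "offset": int(m.group("off"), 16),
        "size": int(m.group("size"), 16),
    }


def gdb_list_command(info):
    """Build the gdb command that maps symbol+offset to a source line."""
    return f"list *{info['symbol']}+{info['offset']:#x}"


if __name__ == "__main__":
    info = parse_ksymoops_line("Code; c012f8da <do_mmap_pgoff+2a/550> <=====")
    print(gdb_list_command(info))
    # Subtracting the offset from the faulting address gives the function's start.
    print(f"function starts at {info['address'] - info['offset']:#x}")
```

Pasting the printed command into a gdb session started on vmlinux gives the same listing as above.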
Ta da! Line 404 is the culprit. Inspecting the disassembled code shows which register is being dereferenced (in this syntax, the operand on the left is the source and the one on the right is the destination):
mov 0x10(%edx),%eax
If you look at the register values in the oops:
eax: c171e3e4 ebx: 00000000 ecx: fffffffe edx: fffffffe
esi: 00001812 edi: cf1dbf10 ebp: cf2cbc14 esp: cf2cbbb4
ds: 0018 es: 0018 ss: 0018
It's obvious that edx holds a bogus value (0xfffffffe, which is -2 as a signed 32-bit integer). A bogus dereference means something naughty happened with a pointer, and the only pointer you can see in line 404 is file. So file was incorrectly set to -2 and, if you look at the original source, file is actually a parameter passed to the function. Therefore, you'll need to trace the function that called it by looking at the call trace. You can find out which functions the hex numbers in the call trace represent by typing disassemble value in gdb, where value is a hex number starting with 0x.

Thanks to Zwane, Jeff and Alex for helping me learn how to do this many moons ago.
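For the register-reading step, here is a small Python sketch of the arithmetic involved. It reads each value in the register dump as a signed 32-bit integer and works out which address mov 0x10(%edx),%eax actually touches. The helper names (as_signed32, effective_address, parse_registers) are hypothetical, and the sketch assumes a 32-bit x86 oops laid out like the one in this post.

```python
# Register dump copied from the oops discussed in this post.
OOPS_REGISTERS = """\
eax: c171e3e4 ebx: 00000000 ecx: fffffffe edx: fffffffe
esi: 00001812 edi: cf1dbf10 ebp: cf2cbc14 esp: cf2cbbb4
"""


def as_signed32(value):
    """Interpret an unsigned 32-bit register value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def effective_address(base, displacement):
    """Address accessed by disp(%reg): base + displacement, wrapped to 32 bits."""
    return (base + displacement) & 0xFFFFFFFF


def parse_registers(dump):
    """Parse alternating 'name:' / hex-value tokens into a dict of integers."""
    tokens = dump.split()
    return {
        name.rstrip(":"): int(value, 16)
        for name, value in zip(tokens[0::2], tokens[1::2])
    }


if __name__ == "__main__":
    regs = parse_registers(OOPS_REGISTERS)
    print(as_signed32(regs["edx"]))                        # edx as a signed number
    print(f"{effective_address(regs['edx'], 0x10):#010x}")  # address the mov reads from
```

With edx at 0xfffffffe, the load wraps around to address 0x0000000e, nowhere near the 0xc0000000-and-up values that the pointer-like registers in this dump (eax, edi, ebp, esp) hold.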
2 comments:
This will come handy if I ever make kernel-hacking as a hobby. Nice, obi.
Very very useful, thanks!