So how does one debug a kernel panic? The easiest way is to compile your kernel with CONFIG_DEBUG_INFO (if that option is not available, just add -g to CFLAGS) and run the resulting vmlinux through gdb. Once you're in gdb you can do funky stuff like disassemble functions, show source listings, etc., just from the hex numbers in the oops.
Consider an oops where a nasty error occurred at EIP:0010:[<c012f8da>]. This happened to me back when I was writing code for my MSc. Running the oops through ksymoops (this applies only to kernel 2.4; the step is no longer required for kernel 2.6 and above, because the newer kernels show function names in the panic message) gave me the following output, which showed the name of the offending function:
Code; c012f8da <do_mmap_pgoff+2a/550> <=====
The first part of the EIP value, 0x0010, is the value of the segment register, which you can safely ignore (unless you're messing with the GDT). Then, to find out which line of the offending function sits at offset 2a (the 550 is apparently the function size), you just do this:
(gdb) list *do_mmap_pgoff+0x2a
0xc012f8da is in do_mmap_pgoff (mmap.c:404).
399 unsigned int vm_flags;
400 int correct_wcount = 0;
401 int error;
402 rb_node_t ** rb_link, * rb_parent;
404 if (file && (!file->f_op || !file->f_op->mmap))
405 return -ENODEV;
407 if (!len)
408 return addr;
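The step from the ksymoops annotation to that gdb command is mechanical, so it can be scripted if you have many oopses to go through. Below is a minimal Python sketch, not part of ksymoops or gdb; the helper names parse_symbol_ref and gdb_list_command are made up for illustration, and it assumes the annotation always has the <symbol+offset/size> shape shown above, with hex offset and size.

```python
import re


def parse_symbol_ref(text):
    """Extract (symbol, offset, size) from a '<symbol+offset/size>' annotation.

    Assumes offset and size are hexadecimal, as in the ksymoops line above.
    """
    match = re.search(r"<(\w+)\+([0-9a-f]+)/([0-9a-f]+)>", text)
    if match is None:
        raise ValueError(f"no <symbol+offset/size> annotation in {text!r}")
    symbol, offset, size = match.groups()
    return symbol, int(offset, 16), int(size, 16)


def gdb_list_command(symbol, offset):
    """Build the gdb 'list' command for an address given as symbol+offset."""
    return f"list *{symbol}+{offset:#x}"


# The annotated line from the ksymoops output above.
line = "Code;  c012f8da <do_mmap_pgoff+2a/550>   <====="
symbol, offset, size = parse_symbol_ref(line)

print(gdb_list_command(symbol, offset))                    # list *do_mmap_pgoff+0x2a
print(f"function starts at {0xc012f8da - offset:#x}")      # function starts at 0xc012f8b0
```

Subtracting the offset from the faulting address gives the start of the function, which is a handy sanity check against System.map.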
Ta da! Line 404 is the culprit. Inspecting the disassembled code shows which register is being used for the dereference (the operand on the left is the source, the one on the right is the destination).
If you look at the register values in the oops:
eax: c171e3e4 ebx: 00000000 ecx: fffffffe edx: fffffffe
esi: 00001812 edi: cf1dbf10 ebp: cf2cbc14 esp: cf2cbbb4
ds: 0018 es: 0018 ss: 0018
edx has some bogus value (in this case, -2). A bogus dereference means something naughty happened with a pointer, and the only pointer you can see in the line is file. The value of file was set incorrectly to -2, and (if you look at the original source) file is actually a parameter passed to the function. Therefore, you'll need to find the function that called it by looking at the call trace. You can look up in gdb which functions the hex numbers in the call trace represent; the value you give it should be a hex number starting with 0x.
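The -2 above comes from reading fffffffe as a signed 32-bit number. Here is a rough Python sketch of that check applied to the whole register dump. The helper names and the 4095 cut-off for "small negative number" are assumptions of this sketch, not something the kernel or gdb prints for you.

```python
import re

# General-purpose register lines copied from the oops above.
OOPS_REGISTERS = """\
eax: c171e3e4 ebx: 00000000 ecx: fffffffe edx: fffffffe
esi: 00001812 edi: cf1dbf10 ebp: cf2cbc14 esp: cf2cbbb4
"""


def parse_registers(text):
    """Return {register name: unsigned 32-bit value} from 'name: hex' pairs."""
    pairs = re.findall(r"\b([a-z]{2,3}):\s*([0-9a-f]{8})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


def looks_like_errno(value, max_errno=4095):
    """Heuristic: a small negative value is more likely an error code than a
    valid pointer. The max_errno threshold is an assumption of this sketch."""
    return -max_errno <= as_signed32(value) < 0


regs = parse_registers(OOPS_REGISTERS)
for name, value in regs.items():
    if looks_like_errno(value):
        print(f"{name} = {value:08x} reads as {as_signed32(value)}, not a pointer")
```

On this dump the sketch flags both ecx and edx, since both hold fffffffe. Which of them is actually dereferenced still has to come from the disassembly.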
Thanks to Zwane, Jeff and Alex for helping me learn how to do this many moons ago.