Gdb on kernel dumps

So, how can we really make gdb open a dump?

The solution that gdb upstream may accept (they are unwilling to carry kernel-specific code inside gdb) is to extend gdb's Python bindings so that we can write our own gdb target in Python. Then there is a new library called libkdumpfile, which should let us open potentially any dump format (from any architecture). Finally, you need the glue: the target itself, which gives you access to the data.

Currently it is split into three projects:

https://github.com/ptesarik/libkdumpfile.git
Library accessing the contents of (potentially) any kernel dump.
https://github.com/jeffmahoney/gdb-python.git
A fork of gdb that adds only a small extension of the Python bindings on top of a recent gdb version.
https://github.com/jeffmahoney/crash-python.git
A gdb target using these bindings, accessing the dump through the libkdumpfile library.

Please be aware that at this moment it is all a work in progress with no guarantees!

One possible way to get it running looks like this:

export MYLOCAL=/tmp/mylocal
git clone -b python-working-target https://github.com/jeffmahoney/gdb-python.git
pushd gdb-python/
./configure --prefix=$MYLOCAL '--enable-targets=x86_64-pc-linux,s390x-linux,s390-linux,ppc64-linux'
make
make install
popd

git clone https://github.com/ptesarik/libkdumpfile.git
pushd libkdumpfile
autoreconf -fi
./configure --prefix=$MYLOCAL --with-python
make
make install
popd

export PYTHONPATH=$MYLOCAL/lib/python2.7/site-packages/:$MYLOCAL/lib64/python2.7/site-packages/
export LD_LIBRARY_PATH=$MYLOCAL/lib64

git clone https://github.com/jeffmahoney/crash-python.git
pushd crash-python
python setup.py install --prefix $MYLOCAL
popd

OK, we have it installed; now how do we use it? Suppose /path/to/my/ contains the debuginfo and a vmcore:

# export PYTHONPATH=$MYLOCAL/lib/python2.7/site-packages/:$MYLOCAL/lib64/python2.7/site-packages/
# export LD_LIBRARY_PATH=$MYLOCAL/lib64
# $MYLOCAL/bin/gdb /path/to/my/vmlinux-3.16.7-29-desktop.debug
GNU gdb (GDB) 7.10.50.20151210-cvs
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
...
(gdb) python from crash.kdump import target
(gdb) python target.Target("/path/to/my/vmcore")
(gdb) info threads 
  Id   Target Id         Frame 
  1    pid 1 "systemd"   0xffffffff8161f172 in context_switch (next=<optimized out>, prev=<optimized out>, rq=<optimized out>) at ../kernel/sched/core.c:2334
  2    pid 2 "kthreadd"  0xffffffff8161f172 in context_switch (next=<optimized out>, prev=<optimized out>, rq=<optimized out>) at ../kernel/sched/core.c:2334
  3    pid 3 "ksoftirqd/0" 0xffffffff8161f172 in context_switch (next=<optimized out>, prev=<optimized out>, rq=<optimized out>) at ../kernel/sched/core.c:2334
  4    pid 4 "kworker/0:0" 0xffffffff8161f172 in context_switch (next=<optimized out>, prev=<optimized out>, rq=<optimized out>) at ../kernel/sched/core.c:2334
...
(gdb) thread 1
(gdb) bt f
#0  0xffffffff8161f172 in context_switch (next=<optimized out>, prev=<optimized out>, rq=<optimized out>) at ../kernel/sched/core.c:2334
        mm = 0x0 <irq_stack_union>
        oldmm = 0xffff880439fb6b20
#1  __schedule () at ../kernel/sched/core.c:2795
        prev = <unavailable>
        switch_count = <optimized out>
        rq = 0xffff88013a6b4010
...
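
Because the dump is now just another gdb target, the same threads should also be reachable from gdb's Python API, which is what makes mass-processing scriptable. The following is a minimal sketch, assuming the crash-python target exposes its threads through the standard gdb.selected_inferior().threads() interface (I have not verified this against the fork). The helper name threads_by_function is hypothetical and only used for illustration.

```python
from collections import Counter

try:
    import gdb  # only available inside gdb's embedded Python interpreter
except ImportError:
    gdb = None  # allows the helper below to be imported outside gdb


def threads_by_function(threads, newest_frame):
    """Count how many threads are stopped in each innermost function.

    threads      -- iterable of gdb.InferiorThread-like objects
    newest_frame -- callable returning the innermost frame of the
                    currently selected thread (gdb.newest_frame in gdb)
    """
    counts = Counter()
    for thread in threads:
        thread.switch()                # select this thread
        frame = newest_frame()         # innermost frame of that thread
        counts[frame.name() or "??"] += 1   # name() may return None
    return counts


if gdb is not None:
    original = gdb.selected_thread()   # remember the user's selection
    counts = threads_by_function(gdb.selected_inferior().threads(),
                                 gdb.newest_frame)
    for name, number in counts.most_common():
        print("%6d  %s" % (number, name))
    if original is not None:
        original.switch()              # restore the original selection
```

To try it, save the block to a file and load it with gdb's source command after the target has been created. In a dump like the one above, most sleeping tasks would be expected to show up under context_switch.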

Where to continue?

Note that all the kernel-specific functionality is concentrated in one tiny file (in my installation, it's $MYLOCAL/lib/python2.7/site-packages/crash-0.1-py2.7.egg/crash/kdump/target.py). This is expected to grow - see Jeff's work-in-progress branch "crash-wip" of crash-python, or my tiny target accessing s390 dumps.

What are Linux kernel dumps?

The Linux kernel can be configured to reserve an area of memory for a crash kernel. Once the original kernel panics (i.e. dies), instead of just rebooting, it kexecs into this crash kernel, which has access to the original kernel's memory through /proc/vmcore. The crash kernel runs a kdump tool that saves this memory to a dump - either to a file on disk or somewhere over the network. The makedumpfile tool can be set to save only the "interesting" pages - for example, it can omit the (usually space-consuming and otherwise unimportant) userspace pages - and to compress the saved ones, which is possible only with some file formats.
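
To make the page filtering concrete: makedumpfile selects the page types to leave out with its -d (dump level) option, which is a bitmask. The sketch below decodes such a level using the bit values from the makedumpfile man page; the function name excluded_page_types is made up for this illustration.

```python
# Bit values of makedumpfile's -d (dump level) option, as listed in its
# man page. Each set bit excludes one type of page from the dump.
EXCLUDABLE_PAGES = {
    1: "zero page",
    2: "non-private cache",
    4: "private cache",
    8: "user data",
    16: "free page",
}


def excluded_page_types(dump_level):
    """Return the page types that the given dump level leaves out."""
    if not 0 <= dump_level <= 31:
        raise ValueError("dump level must be between 0 and 31")
    return [name for bit, name in sorted(EXCLUDABLE_PAGES.items())
            if dump_level & bit]


if __name__ == "__main__":
    for level in (0, 8, 31):
        print("-d %d excludes: %s"
              % (level, ", ".join(excluded_page_types(level)) or "nothing"))
```

A level of 31 therefore drops everything that can be dropped, while 0 keeps all pages.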

There are also other ways - like taking the dump from the hypervisor's side (xm dump-core), or just taking, for instance, VMware's VMSS file.

Dumps can be essential for analyzing the cause of a panic (and eventually finding the bug in the kernel), because in contrast to just the Oops message, a dump contains e.g. the contents of the failing process's stack, so we can see what the arguments of the relevant functions were.

For more information about obtaining dumps, see the kernel documentation.

What do we do with the dumps?

For inspecting dumps, there's currently only one tool available - crash. It's very useful, has many commands for inspecting certain kernel structures/subsystems (network, files, memory, devices, runqueues, ...), and understands many kernel versions and many architectures. However, it has its downsides; these are the ones that concern me most:

raw stacktraces - although the information about local variables' locations, inlined functions etc. is included in the debuginfo, crash doesn't use it, so you have to find it yourself (by looking into the assembly)
poor extensibility - if you want to do some mass-processing on the dump (which happens often - e.g. you want to check all the records allocated from a given SLAB), you have to do it using plenty of pipes, awk scripts and so on
no working multiarch support - when you have a dump from s390, you need an s390 machine with crash to open it

...by contrast, gdb (which is itself embedded in crash, btw) can do all that.

Why cannot gdb open kernel dumps?

It can, but only ELF ones - which are not useful for today's machines. Furthermore, it doesn't understand virtual memory and, last but not least, it knows nothing about the Linux kernel.
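
To give an idea of what "understanding virtual memory" involves: a dump stores physical pages, so every kernel virtual address has to be translated by walking the dumped page tables. The sketch below shows only the first step, splitting an address into table indices. It assumes x86_64 with 4-level paging and 4 KiB pages (other architectures and page sizes differ), and split_virtual_address is a hypothetical helper name.

```python
# Assumed layout: x86_64, 4-level paging, 4 KiB pages.
PAGE_SHIFT = 12                  # low 12 bits are the offset within a page
INDEX_BITS = 9                   # each table level is indexed by 9 bits
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_virtual_address(vaddr):
    """Split a virtual address into page-table indices and page offset."""
    return {
        "pgd": (vaddr >> 39) & INDEX_MASK,   # top-level table index
        "pud": (vaddr >> 30) & INDEX_MASK,
        "pmd": (vaddr >> 21) & INDEX_MASK,
        "pte": (vaddr >> 12) & INDEX_MASK,   # lowest-level table index
        "offset": vaddr & ((1 << PAGE_SHIFT) - 1),
    }


if __name__ == "__main__":
    # The context_switch address from the backtrace shown earlier.
    parts = split_virtual_address(0xffffffff8161f172)
    for level in ("pgd", "pud", "pmd", "pte", "offset"):
        print("%-6s %d" % (level, parts[level]))
```

A real translation would then read the entry at each index from the dump, follow the physical address stored in it to the next table, and handle large pages along the way. That is the kind of work the author of a target otherwise has to do by hand.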