Interpreting kernel output at process crash

When a user-mode process crashes, the Linux kernel usually prints a few lines to the kernel log. These can be looked up with dmesg or journalctl.

This is an example:

foobar[20400]: segfault at 0 ip 00000000f79ffb5b sp 00000000fffd3828 error 4 in libc-2.29.so[f78d2000+145000]
Code: 66 0f 6f 25 37 7c 03 00 66 0f 6f 2d 3f 7c 03 00 66 0f 6f 35 47 7c 03 00 83 f9 30 0f 87 8e 00 00 00 83 f8 30 0f 87 85 00 00 00 <f3> 0f 6f 0f f3 0f 6f 16 66 0f 6f f9 66 44 0f 6f c5 66 44 0f 6f ca

Both lines are printed by the kernel's page-fault handling code.

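The first line contains the following information: the name and PID of the process (foobar[20400]), the address whose access faulted (at 0), the instruction pointer (ip), the stack pointer (sp), the page-fault error code (error 4), and the mapping that contains the instruction pointer together with its start address and size (libc-2.29.so[f78d2000+145000]).

The error code 4 can be decoded by writing it in binary notation (0b00000100) and comparing the bits with the enum x86_pf_error_code in traps.h: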

 *   bit 0 ==    0: no page found       1: protection fault
 *   bit 1 ==    0: read access         1: write access
 *   bit 2 ==    0: kernel-mode access  1: user-mode access
 *   bit 3 ==                           1: use of reserved bit detected
 *   bit 4 ==                           1: fault was an instruction fetch
 *   bit 5 ==                           1: protection keys block access

Therefore, in this case we have no page found, a read access, and a user-mode access.
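As an illustration, the fields of the first line and the error bits can be decoded with a short Python script. This is only a sketch: the regular expression is an assumption derived from the example line above (other kernel versions or architectures may format the line differently), the names parse_segfault_line and decode_pf_error are made up for this page, and the error value is assumed to be printed in hexadecimal like the other numbers on the line.

```python
import re

# Bit meanings copied from the table above: (meaning if 0, meaning if 1).
PF_BITS = [
    ("no page found", "protection fault"),
    ("read access", "write access"),
    ("kernel-mode access", "user-mode access"),
    (None, "use of reserved bit detected"),
    (None, "fault was an instruction fetch"),
    (None, "protection keys block access"),
]

# Assumed layout of the first line, taken from the example on this page.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<mapping>.+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_pf_error(code):
    """Return the meaning of each known bit of an x86 page-fault error code."""
    meanings = []
    for bit, (if_clear, if_set) in enumerate(PF_BITS):
        text = if_set if code & (1 << bit) else if_clear
        if text is not None:
            meanings.append(text)
    return meanings


def parse_segfault_line(line):
    """Split a 'segfault at ...' line into its fields, or return None."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # All numbers except the PID are treated as hexadecimal (assumption).
    for key in ("addr", "ip", "sp", "error", "base", "size"):
        if fields[key] is not None:
            fields[key] = int(fields[key], 16)
    fields["pid"] = int(fields["pid"])
    return fields


if __name__ == "__main__":
    example = ("foobar[20400]: segfault at 0 ip 00000000f79ffb5b "
               "sp 00000000fffd3828 error 4 in libc-2.29.so[f78d2000+145000]")
    info = parse_segfault_line(example)
    print(info["comm"], info["pid"], info["mapping"])
    print(decode_pf_error(info["error"]))
    # Offset of the instruction pointer from the start of the mapping.
    print(hex(info["ip"] - info["base"]))
```

For the example line this reports no page found, read access and user-mode access, and an offset of 0x12db5b into the mapping. That offset is relative to the start of the mapping, which is not necessarily the start of the file, so it should not be passed to other tools as a file offset without checking.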

The second line

The second line shows the bytes surrounding the instruction that caused the exception; the first byte of that instruction is marked with <..>. This sequence can be used to find the source line, and therefore the function, causing the exception.

If the dmesg output was captured on a different system, make sure that exactly the same versions of the executable and shared libraries are installed on the examining system. Additionally, install the matching dbgsym packages as described in Installing the debugging symbols.

The following command translates the byte sequence into a find command that can be used inside gdb:

echo -n "find /b ..., ..., 0x" && \
echo "66 0f 6f 25 37 7c 03 00 66 0f 6f 2d 3f 7c 03 00 66 0f 6f 35 47 7c 03 00 83 f9 30 0f 87 8e 00 00 00 83 f8 30 0f 87 85 00 00 00 <f3> 0f 6f 0f f3 0f 6f 16 66 0f 6f f9 66 44 0f 6f c5 66 44 0f 6f ca" \
 | sed 's/[<>]//g' | sed 's/ /, 0x/g'

This is the result; only the start and end addresses are still missing.

find /b ..., ..., 0x66, 0x0f, 0x6f, 0x25, 0x37, 0x7c, 0x03, 0x00, 0x66, 0x0f, 0x6f, 0x2d, 0x3f, 0x7c, 0x03, 0x00, 0x66, 0x0f, 0x6f, 0x35, 0x47, 0x7c, 0x03, 0x00, 0x83, 0xf9, 0x30, 0x0f, 0x87, 0x8e, 0x00, 0x00, 0x00, 0x83, 0xf8, 0x30, 0x0f, 0x87, 0x85, 0x00, 0x00, 0x00, 0xf3, 0x0f, 0x6f, 0x0f, 0xf3, 0x0f, 0x6f, 0x16, 0x66, 0x0f, 0x6f, 0xf9, 0x66, 0x44, 0x0f, 0x6f, 0xc5, 0x66, 0x44, 0x0f, 0x6f, 0xca
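The same translation can be sketched in Python, which also makes it easy to report how many bytes precede the byte marked with <..>. The function name code_line_to_find and its parameters are hypothetical and only serve this illustration; the start and end addresses are left as placeholders, as in the shell variant.

```python
EXAMPLE_CODE = (
    "66 0f 6f 25 37 7c 03 00 66 0f 6f 2d 3f 7c 03 00 66 0f 6f 35 47 7c 03 00 "
    "83 f9 30 0f 87 8e 00 00 00 83 f8 30 0f 87 85 00 00 00 <f3> 0f 6f 0f "
    "f3 0f 6f 16 66 0f 6f f9 66 44 0f 6f c5 66 44 0f 6f ca"
)


def code_line_to_find(code_bytes, start="...", end="..."):
    """Build a gdb 'find /b' command from the bytes of a 'Code:' line.

    Returns the command and the index of the byte marked with <..>
    (None if no byte is marked).
    """
    tokens = code_bytes.split()
    # Accept the line with or without its leading "Code:" label.
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    marked = None
    values = []
    for index, token in enumerate(tokens):
        if token.startswith("<") and token.endswith(">"):
            marked = index
            token = token[1:-1]
        # int() raises ValueError if the line does not consist of hex bytes.
        values.append("0x%02x" % int(token, 16))
    command = "find /b %s, %s, %s" % (start, end, ", ".join(values))
    return command, marked


if __name__ == "__main__":
    command, marked = code_line_to_find(EXAMPLE_CODE)
    print(command)
    print("bytes before the marked byte:", marked)
```

For the example sequence the marked byte is at index 42, so the faulting instruction starts 42 bytes after the beginning of the pattern.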

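Then start a gdb session as follows. Fill in the start and end addresses in the find command, either from the unnamed .text line if the crash happened in the executable, or from the .text line of the shared library. Because 42 bytes precede the marked byte in the sequence, the faulting instruction is located 42 bytes after the address that find reports: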

$ gdb -q 
(gdb) set width 0
(gdb) set pagination off
(gdb) file /sbin/foobar
(gdb) b main
(gdb) run 
...
(gdb) dele 1
(gdb) info target
...
        0x5655c400 - 0x56579381 is .text
...
        0xf7967320 - 0xf7aaa4db is .text in /lib/x86_64-linux-gnux32/libc.so.6
...
(gdb) find /b 0xf7967320, 0xf7aaa4db, 0x66, 0x0f, 0x6f, 0x25, 0x37, 0x7c, 0x03, 0x00, 0x66, 0x0f, 0x6f, 0x2d, 0x3f, 0x7c, 0x03, 0x00, 0x66, 0x0f, 0x6f, 0x35, 0x47, 0x7c, 0x03, 0x00, 0x83, 0xf9, 0x30, 0x0f, 0x87, 0x8e, 0x00, 0x00, 0x00, 0x83, 0xf8, 0x30, 0x0f, 0x87, 0x85, 0x00, 0x00, 0x00, 0xf3, 0x0f, 0x6f, 0x0f, 0xf3, 0x0f, 0x6f, 0x16, 0x66, 0x0f, 0x6f, 0xf9, 0x66, 0x44, 0x0f, 0x6f, 0xc5, 0x66, 0x44, 0x0f, 0x6f, 0xca
0xf7a94b31 <__strncasecmp_l_sse42+50>
1 pattern found.
(gdb) b * (0xf7a94b31 + 42)
Breakpoint 2 at 0xf7a94b5b: file ../sysdeps/x86_64/multiarch/strcmp-sse42.S, line 199.
(gdb) info b
Num     Type           Disp Enb Address    What
2       breakpoint     keep y   0xf7a94b5b ../sysdeps/x86_64/multiarch/strcmp-sse42.S:199
(gdb) disassemble /r 0xf7a94b31, 0xf7a94b31 + 62
Dump of assembler code from 0xf7a94b31 to 0xf7a94b6f:
   0xf7a94b31 <__strncasecmp_l_sse42+50>:       66 0f 6f 25 37 7c 03 00     movdqa 0x37c37(%rip),%xmm4        # 0xf7acc770
   0xf7a94b39 <__strncasecmp_l_sse42+58>:       66 0f 6f 2d 3f 7c 03 00     movdqa 0x37c3f(%rip),%xmm5        # 0xf7acc780
   0xf7a94b41 <__strncasecmp_l_sse42+66>:       66 0f 6f 35 47 7c 03 00     movdqa 0x37c47(%rip),%xmm6        # 0xf7acc790
   0xf7a94b49 <__strncasecmp_l_sse42+74>:       83 f9 30                    cmp    $0x30,%ecx
   0xf7a94b4c <__strncasecmp_l_sse42+77>:       0f 87 8e 00 00 00           ja     0xf7a94be0 <__strncasecmp_l_sse42+225>
   0xf7a94b52 <__strncasecmp_l_sse42+83>:       83 f8 30                    cmp    $0x30,%eax
   0xf7a94b55 <__strncasecmp_l_sse42+86>:       0f 87 85 00 00 00           ja     0xf7a94be0 <__strncasecmp_l_sse42+225>
   0xf7a94b5b <__strncasecmp_l_sse42+92>:       f3 0f 6f 0f                 movdqu (%rdi),%xmm1
   0xf7a94b5f <__strncasecmp_l_sse42+96>:       f3 0f 6f 16                 movdqu (%rsi),%xmm2
   0xf7a94b63 <__strncasecmp_l_sse42+100>:      66 0f 6f f9                 movdqa %xmm1,%xmm7
   0xf7a94b67 <__strncasecmp_l_sse42+104>:      66 44 0f 6f c5              movdqa %xmm5,%xmm8
   0xf7a94b6c <__strncasecmp_l_sse42+109>:      66 44 0f 6f ca              movdqa %xmm2,%xmm9
End of assembler dump.

In this case the crash happened in __strncasecmp_l_sse42. Unfortunately, functions in glibc are used by most applications and libraries, so this is not a good example to start with. This kind of analysis makes more sense when the crash is inside the executable or a rarely used shared library.
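A quick plausibility check for a match is to compare the offset within the page: mappings are placed on page boundaries, so the low bits of the ip value from the kernel message should equal the low bits of the address found in gdb, even if the library was loaded at a different base. The following Python sketch assumes 4 KiB pages; the helper name same_page_offset is made up for this page.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on x86


def same_page_offset(crash_ip, candidate, page_size=PAGE_SIZE):
    """Return True if both addresses have the same offset within their page."""
    return crash_ip % page_size == candidate % page_size


crash_ip = 0xf79ffb5b      # ip value from the kernel message
match = 0xf7a94b31         # address reported by find in the gdb session
candidate = match + 42     # 42 bytes precede the marked byte

if __name__ == "__main__":
    print(hex(candidate), same_page_offset(crash_ip, candidate))
```

A matching page offset does not prove that the right instruction was found, but a mismatch is a strong hint that the binaries on the two systems differ.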

addr2line

Before ASLR was enabled by default, the ip value could simply be given to addr2line:

$ addr2line --addresses --functions --pretty-print --exe=foobar 0000000000401020
0x00401020: main at /home/benutzer/crash.c:2

But this is not going to work with an address from an ASLR-enabled process. ASLR can still be controlled system-wide, e.g. through /proc/sys/kernel/randomize_va_space. For a single process it can be controlled with set disable-randomization in gdb (on by default, so gdb disables ASLR for the programs it starts; set it to off to keep randomization) or with export LLDB_LAUNCH_FLAG_DISABLE_ASLR=1 for lldb.
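To see which system-wide setting is active, the value of randomize_va_space can be read and interpreted. The sketch below is in Python; the function names are hypothetical, and the descriptions paraphrase the three values documented for this sysctl (0, 1 and 2).

```python
from pathlib import Path

# Documented values of kernel.randomize_va_space (descriptions paraphrased).
ASLR_MODES = {
    0: "disabled",
    1: "enabled (stack, mmap base and vDSO are randomized)",
    2: "enabled (additionally the heap is randomized)",
}


def describe_aslr(value):
    """Return a short description for a randomize_va_space value."""
    return ASLR_MODES.get(value, "unknown value %r" % (value,))


def read_aslr_setting(path="/proc/sys/kernel/randomize_va_space"):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    setting = read_aslr_setting()
    if setting is None:
        print("randomize_va_space could not be read")
    else:
        print(setting, describe_aslr(setting))
```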

