2019-05-19 15:08:55 +03:00
|
|
|
// SPDX-License-Identifier: GPL-2.0-only
|
2005-04-17 02:20:36 +04:00
|
|
|
/*
|
|
|
|
* kallsyms.c: in-kernel printing of symbolic oopses and stack traces.
|
|
|
|
*
|
|
|
|
* Rewritten and vastly simplified by Rusty Russell for in-kernel
|
|
|
|
* module loader:
|
|
|
|
* Copyright 2002 Rusty Russell <rusty@rustcorp.com.au> IBM Corporation
|
|
|
|
*
|
|
|
|
* ChangeLog:
|
|
|
|
*
|
|
|
|
* (25/Aug/2004) Paulo Marques <pmarques@grupopie.com>
|
|
|
|
* Changed the compression method from stem compression to "table lookup"
|
|
|
|
* compression (see scripts/kallsyms.c for a more complete description)
|
|
|
|
*/
|
|
|
|
#include <linux/kallsyms.h>
|
|
|
|
#include <linux/init.h>
|
|
|
|
#include <linux/seq_file.h>
|
|
|
|
#include <linux/fs.h>
|
2010-05-21 06:04:21 +04:00
|
|
|
#include <linux/kdb.h>
|
2005-04-17 02:20:36 +04:00
|
|
|
#include <linux/err.h>
|
|
|
|
#include <linux/proc_fs.h>
|
2005-10-31 02:03:48 +03:00
|
|
|
#include <linux/sched.h> /* for cond_resched */
|
2006-12-07 07:35:30 +03:00
|
|
|
#include <linux/ctype.h>
|
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 11:04:11 +03:00
|
|
|
#include <linux/slab.h>
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
#include <linux/filter.h>
|
2017-09-01 15:35:38 +03:00
|
|
|
#include <linux/ftrace.h>
|
2020-05-28 11:00:58 +03:00
|
|
|
#include <linux/kprobes.h>
|
2021-07-08 04:09:20 +03:00
|
|
|
#include <linux/build_bug.h>
|
2014-04-08 02:39:20 +04:00
|
|
|
#include <linux/compiler.h>
|
2021-07-08 04:09:20 +03:00
|
|
|
#include <linux/module.h>
|
|
|
|
#include <linux/kernel.h>
|
2022-05-10 15:26:13 +03:00
|
|
|
#include <linux/bsearch.h>
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* These will be re-linked against their real values
|
|
|
|
* during the second link stage.
|
|
|
|
*/
|
2014-04-08 02:39:20 +04:00
|
|
|
extern const unsigned long kallsyms_addresses[] __weak;
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
extern const int kallsyms_offsets[] __weak;
|
2014-04-08 02:39:20 +04:00
|
|
|
extern const u8 kallsyms_names[] __weak;
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* Tell the compiler that the count isn't in the small data section if the arch
|
|
|
|
* has one (eg: FRV).
|
2007-11-29 03:22:04 +03:00
|
|
|
*/
|
2018-09-03 15:09:34 +03:00
|
|
|
extern const unsigned int kallsyms_num_syms
|
2020-10-22 05:36:07 +03:00
|
|
|
__section(".rodata") __attribute__((weak));
|
2007-11-29 03:22:04 +03:00
|
|
|
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
extern const unsigned long kallsyms_relative_base
|
2020-10-22 05:36:07 +03:00
|
|
|
__section(".rodata") __attribute__((weak));
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
|
2020-02-02 08:09:22 +03:00
|
|
|
extern const char kallsyms_token_table[] __weak;
|
2014-04-08 02:39:20 +04:00
|
|
|
extern const u16 kallsyms_token_index[] __weak;
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2018-09-03 15:09:34 +03:00
|
|
|
extern const unsigned int kallsyms_markers[] __weak;
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* Expand a compressed symbol data into the resulting uncompressed string,
|
2013-04-15 09:34:43 +04:00
|
|
|
* if uncompressed string is too long (>= maxlen), it will be truncated,
|
2009-05-13 00:43:35 +04:00
|
|
|
* given the offset to where the symbol is in the compressed stream.
|
|
|
|
*/
|
2013-04-15 09:34:43 +04:00
|
|
|
static unsigned int kallsyms_expand_symbol(unsigned int off,
|
|
|
|
char *result, size_t maxlen)
|
2005-04-17 02:20:36 +04:00
|
|
|
{
|
|
|
|
int len, skipped_first = 0;
|
2020-02-02 08:09:22 +03:00
|
|
|
const char *tptr;
|
|
|
|
const u8 *data;
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/* Get the compressed symbol length from the first symbol byte. */
|
2005-04-17 02:20:36 +04:00
|
|
|
data = &kallsyms_names[off];
|
|
|
|
len = *data;
|
|
|
|
data++;
|
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* Update the offset to return the offset for the next symbol on
|
|
|
|
* the compressed stream.
|
|
|
|
*/
|
2005-04-17 02:20:36 +04:00
|
|
|
off += len + 1;
|
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* For every byte on the compressed symbol data, copy the table
|
|
|
|
* entry for that byte.
|
|
|
|
*/
|
|
|
|
while (len) {
|
|
|
|
tptr = &kallsyms_token_table[kallsyms_token_index[*data]];
|
2005-04-17 02:20:36 +04:00
|
|
|
data++;
|
|
|
|
len--;
|
|
|
|
|
|
|
|
while (*tptr) {
|
2009-05-13 00:43:35 +04:00
|
|
|
if (skipped_first) {
|
2013-04-15 09:34:43 +04:00
|
|
|
if (maxlen <= 1)
|
|
|
|
goto tail;
|
2005-04-17 02:20:36 +04:00
|
|
|
*result = *tptr;
|
|
|
|
result++;
|
2013-04-15 09:34:43 +04:00
|
|
|
maxlen--;
|
2005-04-17 02:20:36 +04:00
|
|
|
} else
|
|
|
|
skipped_first = 1;
|
|
|
|
tptr++;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-04-15 09:34:43 +04:00
|
|
|
tail:
|
|
|
|
if (maxlen)
|
|
|
|
*result = '\0';
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/* Return to offset to the next symbol. */
|
2005-04-17 02:20:36 +04:00
|
|
|
return off;
|
|
|
|
}
|
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* Get symbol type information. This is encoded as a single char at the
|
|
|
|
* beginning of the symbol name.
|
|
|
|
*/
|
2005-04-17 02:20:36 +04:00
|
|
|
static char kallsyms_get_symbol_type(unsigned int off)
|
|
|
|
{
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* Get just the first code, look it up in the token table,
|
|
|
|
* and return the first char from this token.
|
|
|
|
*/
|
|
|
|
return kallsyms_token_table[kallsyms_token_index[kallsyms_names[off + 1]]];
|
2005-04-17 02:20:36 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* Find the offset on the compressed stream given and index in the
|
|
|
|
* kallsyms array.
|
|
|
|
*/
|
2005-04-17 02:20:36 +04:00
|
|
|
static unsigned int get_symbol_offset(unsigned long pos)
|
|
|
|
{
|
2006-12-08 13:35:57 +03:00
|
|
|
const u8 *name;
|
2005-04-17 02:20:36 +04:00
|
|
|
int i;
|
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* Use the closest marker we have. We have markers every 256 positions,
|
|
|
|
* so that should be close enough.
|
|
|
|
*/
|
|
|
|
name = &kallsyms_names[kallsyms_markers[pos >> 8]];
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* Sequentially scan all the symbols up to the point we're searching
|
|
|
|
* for. Every symbol is stored in a [<len>][<len> bytes of data] format,
|
|
|
|
* so we just need to add the len to the current pointer for every
|
|
|
|
* symbol we wish to skip.
|
|
|
|
*/
|
|
|
|
for (i = 0; i < (pos & 0xFF); i++)
|
2005-04-17 02:20:36 +04:00
|
|
|
name = name + (*name) + 1;
|
|
|
|
|
|
|
|
return name - kallsyms_names;
|
|
|
|
}
|
|
|
|
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
static unsigned long kallsyms_sym_address(int idx)
|
|
|
|
{
|
|
|
|
if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
|
|
|
|
return kallsyms_addresses[idx];
|
|
|
|
|
|
|
|
/* values are unsigned offsets if --absolute-percpu is not in effect */
|
|
|
|
if (!IS_ENABLED(CONFIG_KALLSYMS_ABSOLUTE_PERCPU))
|
|
|
|
return kallsyms_relative_base + (u32)kallsyms_offsets[idx];
|
|
|
|
|
|
|
|
/* ...otherwise, positive offsets are absolute values */
|
|
|
|
if (kallsyms_offsets[idx] >= 0)
|
|
|
|
return kallsyms_offsets[idx];
|
|
|
|
|
|
|
|
/* ...and negative offsets are relative to kallsyms_relative_base - 1 */
|
|
|
|
return kallsyms_relative_base - 1 - kallsyms_offsets[idx];
|
|
|
|
}
|
|
|
|
|
2021-10-04 19:29:33 +03:00
|
|
|
static bool cleanup_symbol_name(char *s)
|
2021-04-08 21:28:32 +03:00
|
|
|
{
|
|
|
|
char *res;
|
|
|
|
|
2021-10-04 19:29:33 +03:00
|
|
|
if (!IS_ENABLED(CONFIG_LTO_CLANG))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* LLVM appends various suffixes for local functions and variables that
|
|
|
|
* must be promoted to global scope as part of LTO. This can break
|
|
|
|
* hooking of static functions with kprobes. '.' is not a valid
|
|
|
|
* character in an identifier in C. Suffixes observed:
|
|
|
|
* - foo.llvm.[0-9a-f]+
|
|
|
|
* - foo.[0-9a-f]+
|
|
|
|
* - foo.[0-9a-f]+.cfi_jt
|
|
|
|
*/
|
|
|
|
res = strchr(s, '.');
|
|
|
|
if (res) {
|
|
|
|
*res = '\0';
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!IS_ENABLED(CONFIG_CFI_CLANG) ||
|
|
|
|
!IS_ENABLED(CONFIG_LTO_CLANG_THIN) ||
|
|
|
|
CONFIG_CLANG_VERSION >= 130000)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Prior to LLVM 13, the following suffixes were observed when thinLTO
|
|
|
|
* and CFI are both enabled:
|
|
|
|
* - foo$[0-9]+
|
|
|
|
*/
|
2021-04-08 21:28:32 +03:00
|
|
|
res = strrchr(s, '$');
|
2021-10-04 19:29:33 +03:00
|
|
|
if (res) {
|
2021-04-08 21:28:32 +03:00
|
|
|
*res = '\0';
|
2021-10-04 19:29:33 +03:00
|
|
|
return true;
|
|
|
|
}
|
2021-04-08 21:28:32 +03:00
|
|
|
|
2021-10-04 19:29:33 +03:00
|
|
|
return false;
|
2021-04-08 21:28:32 +03:00
|
|
|
}
|
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
/* Lookup the address for this symbol. Returns 0 if not found. */
|
|
|
|
unsigned long kallsyms_lookup_name(const char *name)
|
|
|
|
{
|
2007-07-17 15:03:51 +04:00
|
|
|
char namebuf[KSYM_NAME_LEN];
|
2005-04-17 02:20:36 +04:00
|
|
|
unsigned long i;
|
|
|
|
unsigned int off;
|
|
|
|
|
2022-03-16 15:24:08 +03:00
|
|
|
/* Skip the search for empty string. */
|
|
|
|
if (!*name)
|
|
|
|
return 0;
|
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
|
2013-04-15 09:34:43 +04:00
|
|
|
off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
|
2005-04-17 02:20:36 +04:00
|
|
|
|
|
|
|
if (strcmp(namebuf, name) == 0)
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
return kallsyms_sym_address(i);
|
2021-04-08 21:28:32 +03:00
|
|
|
|
|
|
|
if (cleanup_symbol_name(namebuf) && strcmp(namebuf, name) == 0)
|
|
|
|
return kallsyms_sym_address(i);
|
2005-04-17 02:20:36 +04:00
|
|
|
}
|
|
|
|
return module_kallsyms_lookup_name(name);
|
|
|
|
}
|
|
|
|
|
2021-02-02 15:13:26 +03:00
|
|
|
/*
|
|
|
|
* Iterate over all symbols in vmlinux. For symbols from modules use
|
|
|
|
* module_kallsyms_on_each_symbol instead.
|
|
|
|
*/
|
2008-12-06 03:03:58 +03:00
|
|
|
int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
|
|
|
|
unsigned long),
|
|
|
|
void *data)
|
|
|
|
{
|
|
|
|
char namebuf[KSYM_NAME_LEN];
|
|
|
|
unsigned long i;
|
|
|
|
unsigned int off;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
for (i = 0, off = 0; i < kallsyms_num_syms; i++) {
|
2013-04-15 09:34:43 +04:00
|
|
|
off = kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf));
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
ret = fn(data, namebuf, NULL, kallsyms_sym_address(i));
|
2008-12-06 03:03:58 +03:00
|
|
|
if (ret != 0)
|
|
|
|
return ret;
|
livepatch: Avoid CPU hogging with cond_resched
When initializing a 'struct klp_object' in klp_init_object_loaded(), and
performing relocations in klp_resolve_symbols(), klp_find_object_symbol()
is invoked to look up the address of a symbol in an already-loaded module
(or vmlinux). This, in turn, calls kallsyms_on_each_symbol() or
module_kallsyms_on_each_symbol() to find the address of the symbol that is
being patched.
It turns out that symbol lookups often take up the most CPU time when
enabling and disabling a patch, and may hog the CPU and cause other tasks
on that CPU's runqueue to starve -- even in paths where interrupts are
enabled. For example, under certain workloads, enabling a KLP patch with
many objects or functions may cause ksoftirqd to be starved, and thus for
interrupts to be backlogged and delayed. This may end up causing TCP
retransmits on the host where the KLP patch is being applied, and in
general, may cause any interrupts serviced by softirqd to be delayed while
the patch is being applied.
So as to ensure that kallsyms_on_each_symbol() does not end up hogging the
CPU, this patch adds a call to cond_resched() in kallsyms_on_each_symbol()
and module_kallsyms_on_each_symbol(), which are invoked when doing a symbol
lookup in vmlinux and a module respectively. Without this patch, if a
live-patch is applied on a 36-core Intel host with heavy TCP traffic, a
~10x spike is observed in TCP retransmits while the patch is being applied.
Additionally, collecting sched events with perf indicates that ksoftirqd is
awakened ~1.3 seconds before it's eventually scheduled. With the patch, no
increase in TCP retransmit events is observed, and ksoftirqd is scheduled
shortly after it's awakened.
Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211229215646.830451-1-void@manifault.com
2021-12-30 00:56:47 +03:00
|
|
|
cond_resched();
|
2008-12-06 03:03:58 +03:00
|
|
|
}
|
2021-02-02 15:13:26 +03:00
|
|
|
return 0;
|
2008-12-06 03:03:58 +03:00
|
|
|
}
|
|
|
|
|
2006-10-03 12:13:48 +04:00
|
|
|
static unsigned long get_symbol_pos(unsigned long addr,
|
|
|
|
unsigned long *symbolsize,
|
|
|
|
unsigned long *offset)
|
|
|
|
{
|
|
|
|
unsigned long symbol_start = 0, symbol_end = 0;
|
|
|
|
unsigned long i, low, high, mid;
|
|
|
|
|
2009-01-14 23:38:20 +03:00
|
|
|
/* This kernel should never had been booted. */
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE))
|
|
|
|
BUG_ON(!kallsyms_addresses);
|
|
|
|
else
|
|
|
|
BUG_ON(!kallsyms_offsets);
|
2009-01-14 23:38:20 +03:00
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/* Do a binary search on the sorted kallsyms_addresses array. */
|
2006-10-03 12:13:48 +04:00
|
|
|
low = 0;
|
|
|
|
high = kallsyms_num_syms;
|
|
|
|
|
|
|
|
while (high - low > 1) {
|
2008-07-25 12:45:34 +04:00
|
|
|
mid = low + (high - low) / 2;
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
if (kallsyms_sym_address(mid) <= addr)
|
2006-10-03 12:13:48 +04:00
|
|
|
low = mid;
|
|
|
|
else
|
|
|
|
high = mid;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
2009-05-13 00:43:35 +04:00
|
|
|
* Search for the first aliased symbol. Aliased
|
|
|
|
* symbols are symbols with the same address.
|
2006-10-03 12:13:48 +04:00
|
|
|
*/
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
while (low && kallsyms_sym_address(low-1) == kallsyms_sym_address(low))
|
2006-10-03 12:13:48 +04:00
|
|
|
--low;
|
|
|
|
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
symbol_start = kallsyms_sym_address(low);
|
2006-10-03 12:13:48 +04:00
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/* Search for next non-aliased symbol. */
|
2006-10-03 12:13:48 +04:00
|
|
|
for (i = low + 1; i < kallsyms_num_syms; i++) {
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
if (kallsyms_sym_address(i) > symbol_start) {
|
|
|
|
symbol_end = kallsyms_sym_address(i);
|
2006-10-03 12:13:48 +04:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/* If we found no next symbol, we use the end of the section. */
|
2006-10-03 12:13:48 +04:00
|
|
|
if (!symbol_end) {
|
|
|
|
if (is_kernel_inittext(addr))
|
|
|
|
symbol_end = (unsigned long)_einittext;
|
2017-07-11 01:51:20 +03:00
|
|
|
else if (IS_ENABLED(CONFIG_KALLSYMS_ALL))
|
2006-10-03 12:13:48 +04:00
|
|
|
symbol_end = (unsigned long)_end;
|
|
|
|
else
|
|
|
|
symbol_end = (unsigned long)_etext;
|
|
|
|
}
|
|
|
|
|
2007-05-08 11:28:41 +04:00
|
|
|
if (symbolsize)
|
|
|
|
*symbolsize = symbol_end - symbol_start;
|
|
|
|
if (offset)
|
|
|
|
*offset = addr - symbol_start;
|
2006-10-03 12:13:48 +04:00
|
|
|
|
|
|
|
return low;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Lookup an address but don't bother to find any names.
|
|
|
|
*/
|
|
|
|
int kallsyms_lookup_size_offset(unsigned long addr, unsigned long *symbolsize,
|
|
|
|
unsigned long *offset)
|
|
|
|
{
|
2008-01-30 01:13:22 +03:00
|
|
|
char namebuf[KSYM_NAME_LEN];
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
|
2019-08-24 16:12:31 +03:00
|
|
|
if (is_ksym_addr(addr)) {
|
|
|
|
get_symbol_pos(addr, symbolsize, offset);
|
|
|
|
return 1;
|
|
|
|
}
|
2021-07-08 04:09:20 +03:00
|
|
|
return !!module_address_lookup(addr, symbolsize, offset, NULL, NULL, namebuf) ||
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
!!__bpf_address_lookup(addr, symbolsize, offset, namebuf);
|
2006-10-03 12:13:48 +04:00
|
|
|
}
|
|
|
|
|
2021-07-08 04:09:20 +03:00
|
|
|
static const char *kallsyms_lookup_buildid(unsigned long addr,
|
|
|
|
unsigned long *symbolsize,
|
|
|
|
unsigned long *offset, char **modname,
|
|
|
|
const unsigned char **modbuildid, char *namebuf)
|
2005-04-17 02:20:36 +04:00
|
|
|
{
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
const char *ret;
|
|
|
|
|
2007-07-17 15:03:51 +04:00
|
|
|
namebuf[KSYM_NAME_LEN - 1] = 0;
|
2005-04-17 02:20:36 +04:00
|
|
|
namebuf[0] = 0;
|
|
|
|
|
2006-10-03 12:13:48 +04:00
|
|
|
if (is_ksym_addr(addr)) {
|
|
|
|
unsigned long pos;
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2006-10-03 12:13:48 +04:00
|
|
|
pos = get_symbol_pos(addr, symbolsize, offset);
|
2005-04-17 02:20:36 +04:00
|
|
|
/* Grab name */
|
2013-04-15 09:34:43 +04:00
|
|
|
kallsyms_expand_symbol(get_symbol_offset(pos),
|
|
|
|
namebuf, KSYM_NAME_LEN);
|
2007-05-30 10:43:16 +04:00
|
|
|
if (modname)
|
|
|
|
*modname = NULL;
|
2021-07-08 04:09:20 +03:00
|
|
|
if (modbuildid)
|
|
|
|
*modbuildid = NULL;
|
2021-04-08 21:28:32 +03:00
|
|
|
|
|
|
|
ret = namebuf;
|
|
|
|
goto found;
|
2005-04-17 02:20:36 +04:00
|
|
|
}
|
|
|
|
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
/* See if it's in a module or a BPF JITed image. */
|
|
|
|
ret = module_address_lookup(addr, symbolsize, offset,
|
2021-07-08 04:09:20 +03:00
|
|
|
modname, modbuildid, namebuf);
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
if (!ret)
|
|
|
|
ret = bpf_address_lookup(addr, symbolsize,
|
|
|
|
offset, modname, namebuf);
|
2017-09-01 15:35:38 +03:00
|
|
|
|
|
|
|
if (!ret)
|
|
|
|
ret = ftrace_mod_address_lookup(addr, symbolsize,
|
|
|
|
offset, modname, namebuf);
|
2021-04-08 21:28:32 +03:00
|
|
|
|
|
|
|
found:
|
|
|
|
cleanup_symbol_name(namebuf);
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
return ret;
|
2005-04-17 02:20:36 +04:00
|
|
|
}
|
|
|
|
|
2021-07-08 04:09:20 +03:00
|
|
|
/*
|
|
|
|
* Lookup an address
|
|
|
|
* - modname is set to NULL if it's in the kernel.
|
|
|
|
* - We guarantee that the returned name is valid until we reschedule even if.
|
|
|
|
* It resides in a module.
|
|
|
|
* - We also guarantee that modname will be valid until rescheduled.
|
|
|
|
*/
|
|
|
|
const char *kallsyms_lookup(unsigned long addr,
|
|
|
|
unsigned long *symbolsize,
|
|
|
|
unsigned long *offset,
|
|
|
|
char **modname, char *namebuf)
|
|
|
|
{
|
|
|
|
return kallsyms_lookup_buildid(addr, symbolsize, offset, modname,
|
|
|
|
NULL, namebuf);
|
|
|
|
}
|
|
|
|
|
2007-05-08 11:28:43 +04:00
|
|
|
int lookup_symbol_name(unsigned long addr, char *symname)
|
|
|
|
{
|
2021-04-08 21:28:32 +03:00
|
|
|
int res;
|
|
|
|
|
2007-05-08 11:28:43 +04:00
|
|
|
symname[0] = '\0';
|
2007-07-17 15:03:51 +04:00
|
|
|
symname[KSYM_NAME_LEN - 1] = '\0';
|
2007-05-08 11:28:43 +04:00
|
|
|
|
|
|
|
if (is_ksym_addr(addr)) {
|
|
|
|
unsigned long pos;
|
|
|
|
|
|
|
|
pos = get_symbol_pos(addr, NULL, NULL);
|
|
|
|
/* Grab name */
|
2013-04-15 09:34:43 +04:00
|
|
|
kallsyms_expand_symbol(get_symbol_offset(pos),
|
|
|
|
symname, KSYM_NAME_LEN);
|
2021-04-08 21:28:32 +03:00
|
|
|
goto found;
|
2007-05-08 11:28:43 +04:00
|
|
|
}
|
2009-05-13 00:43:35 +04:00
|
|
|
/* See if it's in a module. */
|
2021-04-08 21:28:32 +03:00
|
|
|
res = lookup_module_symbol_name(addr, symname);
|
|
|
|
if (res)
|
|
|
|
return res;
|
|
|
|
|
|
|
|
found:
|
|
|
|
cleanup_symbol_name(symname);
|
|
|
|
return 0;
|
2007-05-08 11:28:43 +04:00
|
|
|
}
|
|
|
|
|
2007-05-08 11:28:47 +04:00
|
|
|
int lookup_symbol_attrs(unsigned long addr, unsigned long *size,
|
|
|
|
unsigned long *offset, char *modname, char *name)
|
|
|
|
{
|
2021-04-08 21:28:32 +03:00
|
|
|
int res;
|
|
|
|
|
2007-05-08 11:28:47 +04:00
|
|
|
name[0] = '\0';
|
2007-07-17 15:03:51 +04:00
|
|
|
name[KSYM_NAME_LEN - 1] = '\0';
|
2007-05-08 11:28:47 +04:00
|
|
|
|
|
|
|
if (is_ksym_addr(addr)) {
|
|
|
|
unsigned long pos;
|
|
|
|
|
|
|
|
pos = get_symbol_pos(addr, size, offset);
|
|
|
|
/* Grab name */
|
2013-04-15 09:34:43 +04:00
|
|
|
kallsyms_expand_symbol(get_symbol_offset(pos),
|
|
|
|
name, KSYM_NAME_LEN);
|
2007-05-08 11:28:47 +04:00
|
|
|
modname[0] = '\0';
|
2021-04-08 21:28:32 +03:00
|
|
|
goto found;
|
2007-05-08 11:28:47 +04:00
|
|
|
}
|
2009-05-13 00:43:35 +04:00
|
|
|
/* See if it's in a module. */
|
2021-04-08 21:28:32 +03:00
|
|
|
res = lookup_module_symbol_attrs(addr, size, offset, modname, name);
|
|
|
|
if (res)
|
|
|
|
return res;
|
|
|
|
|
|
|
|
found:
|
|
|
|
cleanup_symbol_name(name);
|
|
|
|
return 0;
|
2007-05-08 11:28:47 +04:00
|
|
|
}
|
|
|
|
|
2007-05-01 02:09:48 +04:00
|
|
|
/* Look up a kernel symbol and return it in a text buffer. */
|
2011-03-24 05:42:29 +03:00
|
|
|
static int __sprint_symbol(char *buffer, unsigned long address,
|
2021-07-08 04:09:20 +03:00
|
|
|
int symbol_offset, int add_offset, int add_buildid)
|
2005-04-17 02:20:36 +04:00
|
|
|
{
|
|
|
|
char *modname;
|
2021-07-08 04:09:20 +03:00
|
|
|
const unsigned char *buildid;
|
2005-04-17 02:20:36 +04:00
|
|
|
const char *name;
|
|
|
|
unsigned long offset, size;
|
2008-11-20 02:36:36 +03:00
|
|
|
int len;
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2011-03-24 05:42:29 +03:00
|
|
|
address += symbol_offset;
|
2021-07-08 04:09:20 +03:00
|
|
|
name = kallsyms_lookup_buildid(address, &size, &offset, &modname, &buildid,
|
|
|
|
buffer);
|
2005-04-17 02:20:36 +04:00
|
|
|
if (!name)
|
2014-08-09 01:19:41 +04:00
|
|
|
return sprintf(buffer, "0x%lx", address - symbol_offset);
|
2007-07-16 10:41:24 +04:00
|
|
|
|
2008-11-20 02:36:36 +03:00
|
|
|
if (name != buffer)
|
|
|
|
strcpy(buffer, name);
|
|
|
|
len = strlen(buffer);
|
2011-03-24 05:42:29 +03:00
|
|
|
offset -= symbol_offset;
|
2008-11-20 02:36:36 +03:00
|
|
|
|
2012-05-30 02:07:33 +04:00
|
|
|
if (add_offset)
|
|
|
|
len += sprintf(buffer + len, "+%#lx/%#lx", offset, size);
|
|
|
|
|
2021-07-08 04:09:20 +03:00
|
|
|
if (modname) {
|
|
|
|
len += sprintf(buffer + len, " [%s", modname);
|
|
|
|
#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID)
|
|
|
|
if (add_buildid && buildid) {
|
|
|
|
/* build ID should match length of sprintf */
|
|
|
|
#if IS_ENABLED(CONFIG_MODULES)
|
|
|
|
static_assert(sizeof(typeof_member(struct module, build_id)) == 20);
|
|
|
|
#endif
|
|
|
|
len += sprintf(buffer + len, " %20phN", buildid);
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
len += sprintf(buffer + len, "]");
|
|
|
|
}
|
2008-11-20 02:36:36 +03:00
|
|
|
|
|
|
|
return len;
|
2007-05-01 02:09:48 +04:00
|
|
|
}
|
2011-03-24 05:42:29 +03:00
|
|
|
|
|
|
|
/**
|
|
|
|
* sprint_symbol - Look up a kernel symbol and return it in a text buffer
|
|
|
|
* @buffer: buffer to be stored
|
|
|
|
* @address: address to lookup
|
|
|
|
*
|
|
|
|
* This function looks up a kernel symbol with @address and stores its name,
|
|
|
|
* offset, size and module name to @buffer if possible. If no symbol was found,
|
|
|
|
* just saves its @address as is.
|
|
|
|
*
|
|
|
|
* This function returns the number of bytes stored in @buffer.
|
|
|
|
*/
|
|
|
|
int sprint_symbol(char *buffer, unsigned long address)
|
|
|
|
{
|
2021-07-08 04:09:20 +03:00
|
|
|
return __sprint_symbol(buffer, address, 0, 1, 0);
|
2011-03-24 05:42:29 +03:00
|
|
|
}
|
2009-05-13 00:43:35 +04:00
|
|
|
EXPORT_SYMBOL_GPL(sprint_symbol);
|
2007-05-01 02:09:48 +04:00
|
|
|
|
2021-07-08 04:09:20 +03:00
|
|
|
/**
|
|
|
|
* sprint_symbol_build_id - Look up a kernel symbol and return it in a text buffer
|
|
|
|
* @buffer: buffer to be stored
|
|
|
|
* @address: address to lookup
|
|
|
|
*
|
|
|
|
* This function looks up a kernel symbol with @address and stores its name,
|
|
|
|
* offset, size, module name and module build ID to @buffer if possible. If no
|
|
|
|
* symbol was found, just saves its @address as is.
|
|
|
|
*
|
|
|
|
* This function returns the number of bytes stored in @buffer.
|
|
|
|
*/
|
|
|
|
int sprint_symbol_build_id(char *buffer, unsigned long address)
|
|
|
|
{
|
|
|
|
return __sprint_symbol(buffer, address, 0, 1, 1);
|
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(sprint_symbol_build_id);
|
|
|
|
|
2012-05-30 02:07:33 +04:00
|
|
|
/**
|
|
|
|
* sprint_symbol_no_offset - Look up a kernel symbol and return it in a text buffer
|
|
|
|
* @buffer: buffer to be stored
|
|
|
|
* @address: address to lookup
|
|
|
|
*
|
|
|
|
* This function looks up a kernel symbol with @address and stores its name
|
|
|
|
* and module name to @buffer if possible. If no symbol was found, just saves
|
|
|
|
* its @address as is.
|
|
|
|
*
|
|
|
|
* This function returns the number of bytes stored in @buffer.
|
|
|
|
*/
|
|
|
|
int sprint_symbol_no_offset(char *buffer, unsigned long address)
|
|
|
|
{
|
2021-07-08 04:09:20 +03:00
|
|
|
return __sprint_symbol(buffer, address, 0, 0, 0);
|
2012-05-30 02:07:33 +04:00
|
|
|
}
|
|
|
|
EXPORT_SYMBOL_GPL(sprint_symbol_no_offset);
|
|
|
|
|
2011-03-24 05:42:29 +03:00
|
|
|
/**
|
|
|
|
* sprint_backtrace - Look up a backtrace symbol and return it in a text buffer
|
|
|
|
* @buffer: buffer to be stored
|
|
|
|
* @address: address to lookup
|
|
|
|
*
|
|
|
|
* This function is for stack backtrace and does the same thing as
|
|
|
|
* sprint_symbol() but with modified/decreased @address. If there is a
|
|
|
|
* tail-call to the function marked "noreturn", gcc optimized out code after
|
|
|
|
* the call so that the stack-saved return address could point outside of the
|
|
|
|
* caller. This function ensures that kallsyms will find the original caller
|
|
|
|
* by decreasing @address.
|
|
|
|
*
|
|
|
|
* This function returns the number of bytes stored in @buffer.
|
|
|
|
*/
|
|
|
|
int sprint_backtrace(char *buffer, unsigned long address)
|
|
|
|
{
|
2021-07-08 04:09:20 +03:00
|
|
|
return __sprint_symbol(buffer, address, -1, 1, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* sprint_backtrace_build_id - Look up a backtrace symbol and return it in a text buffer
|
|
|
|
* @buffer: buffer to be stored
|
|
|
|
* @address: address to lookup
|
|
|
|
*
|
|
|
|
* This function is for stack backtrace and does the same thing as
|
|
|
|
* sprint_symbol() but with modified/decreased @address. If there is a
|
|
|
|
* tail-call to the function marked "noreturn", gcc optimized out code after
|
|
|
|
* the call so that the stack-saved return address could point outside of the
|
|
|
|
* caller. This function ensures that kallsyms will find the original caller
|
|
|
|
* by decreasing @address. This function also appends the module build ID to
|
|
|
|
* the @buffer if @address is within a kernel module.
|
|
|
|
*
|
|
|
|
* This function returns the number of bytes stored in @buffer.
|
|
|
|
*/
|
|
|
|
int sprint_backtrace_build_id(char *buffer, unsigned long address)
|
|
|
|
{
|
|
|
|
return __sprint_symbol(buffer, address, -1, 1, 1);
|
2011-03-24 05:42:29 +03:00
|
|
|
}
|
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
/* To avoid using get_symbol_offset for every symbol, we carry prefix along. */
|
2009-05-13 00:43:35 +04:00
|
|
|
struct kallsym_iter {
|
2005-04-17 02:20:36 +04:00
|
|
|
loff_t pos;
|
2018-06-06 15:54:10 +03:00
|
|
|
loff_t pos_arch_end;
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
loff_t pos_mod_end;
|
2017-09-06 15:40:41 +03:00
|
|
|
loff_t pos_ftrace_mod_end;
|
2020-05-28 11:00:58 +03:00
|
|
|
loff_t pos_bpf_end;
|
2005-04-17 02:20:36 +04:00
|
|
|
unsigned long value;
|
2009-05-13 00:43:35 +04:00
|
|
|
unsigned int nameoff; /* If iterating in core kernel symbols. */
|
2005-04-17 02:20:36 +04:00
|
|
|
char type;
|
2007-07-17 15:03:51 +04:00
|
|
|
char name[KSYM_NAME_LEN];
|
|
|
|
char module_name[MODULE_NAME_LEN];
|
2007-05-08 11:28:39 +04:00
|
|
|
int exported;
|
2017-11-08 23:51:04 +03:00
|
|
|
int show_value;
|
2005-04-17 02:20:36 +04:00
|
|
|
};
|
|
|
|
|
2018-06-06 15:54:10 +03:00
|
|
|
int __weak arch_get_kallsym(unsigned int symnum, unsigned long *value,
|
|
|
|
char *type, char *name)
|
|
|
|
{
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int get_ksymbol_arch(struct kallsym_iter *iter)
|
|
|
|
{
|
|
|
|
int ret = arch_get_kallsym(iter->pos - kallsyms_num_syms,
|
|
|
|
&iter->value, &iter->type,
|
|
|
|
iter->name);
|
|
|
|
|
|
|
|
if (ret < 0) {
|
|
|
|
iter->pos_arch_end = iter->pos;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
static int get_ksymbol_mod(struct kallsym_iter *iter)
|
|
|
|
{
|
2018-06-06 15:54:10 +03:00
|
|
|
int ret = module_get_kallsym(iter->pos - iter->pos_arch_end,
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
&iter->value, &iter->type,
|
|
|
|
iter->name, iter->module_name,
|
|
|
|
&iter->exported);
|
|
|
|
if (ret < 0) {
|
|
|
|
iter->pos_mod_end = iter->pos;
|
2005-04-17 02:20:36 +04:00
|
|
|
return 0;
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
}
|
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2020-05-12 15:19:13 +03:00
|
|
|
/*
|
|
|
|
* ftrace_mod_get_kallsym() may also get symbols for pages allocated for ftrace
|
|
|
|
* purposes. In that case "__builtin__ftrace" is used as a module name, even
|
|
|
|
* though "__builtin__ftrace" is not a module.
|
|
|
|
*/
|
2017-09-06 15:40:41 +03:00
|
|
|
static int get_ksymbol_ftrace_mod(struct kallsym_iter *iter)
|
|
|
|
{
|
|
|
|
int ret = ftrace_mod_get_kallsym(iter->pos - iter->pos_mod_end,
|
|
|
|
&iter->value, &iter->type,
|
|
|
|
iter->name, iter->module_name,
|
|
|
|
&iter->exported);
|
|
|
|
if (ret < 0) {
|
|
|
|
iter->pos_ftrace_mod_end = iter->pos;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
static int get_ksymbol_bpf(struct kallsym_iter *iter)
|
|
|
|
{
|
2020-05-28 11:00:58 +03:00
|
|
|
int ret;
|
|
|
|
|
2019-01-17 19:15:21 +03:00
|
|
|
strlcpy(iter->module_name, "bpf", MODULE_NAME_LEN);
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
iter->exported = 0;
|
2020-05-28 11:00:58 +03:00
|
|
|
ret = bpf_get_kallsym(iter->pos - iter->pos_ftrace_mod_end,
|
|
|
|
&iter->value, &iter->type,
|
|
|
|
iter->name);
|
|
|
|
if (ret < 0) {
|
|
|
|
iter->pos_bpf_end = iter->pos;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This uses "__builtin__kprobes" as a module name for symbols for pages
|
|
|
|
* allocated for kprobes' purposes, even though "__builtin__kprobes" is not a
|
|
|
|
* module.
|
|
|
|
*/
|
|
|
|
static int get_ksymbol_kprobe(struct kallsym_iter *iter)
|
|
|
|
{
|
|
|
|
strlcpy(iter->module_name, "__builtin__kprobes", MODULE_NAME_LEN);
|
|
|
|
iter->exported = 0;
|
|
|
|
return kprobe_get_kallsym(iter->pos - iter->pos_bpf_end,
|
|
|
|
&iter->value, &iter->type,
|
|
|
|
iter->name) < 0 ? 0 : 1;
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
}
|
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
/* Returns space to next name. */
|
|
|
|
static unsigned long get_ksymbol_core(struct kallsym_iter *iter)
|
|
|
|
{
|
|
|
|
unsigned off = iter->nameoff;
|
|
|
|
|
2007-05-08 11:28:39 +04:00
|
|
|
iter->module_name[0] = '\0';
|
kallsyms: add support for relative offsets in kallsyms address table
Similar to how relative extables are implemented, it is possible to emit
the kallsyms table in such a way that it contains offsets relative to
some anchor point in the kernel image rather than absolute addresses.
On 64-bit architectures, it cuts the size of the kallsyms address table
in half, since offsets between kernel symbols can typically be expressed
in 32 bits. This saves several hundreds of kilobytes of permanent
.rodata on average. In addition, the kallsyms address table is no
longer subject to dynamic relocation when CONFIG_RELOCATABLE is in
effect, so the relocation work done after decompression now doesn't have
to do relocation updates for all these values. This saves up to 24
bytes (i.e., the size of a ELF64 RELA relocation table entry) per value,
which easily adds up to a couple of megabytes of uncompressed __init
data on ppc64 or arm64. Even if these relocation entries typically
compress well, the combined size reduction of 2.8 MB uncompressed for a
ppc64_defconfig build (of which 2.4 MB is __init data) results in a ~500
KB space saving in the compressed image.
Since it is useful for some architectures (like x86) to retain the
ability to emit absolute values as well, this patch also adds support
for capturing both absolute and relative values when
KALLSYMS_ABSOLUTE_PERCPU is in effect, by emitting absolute per-cpu
addresses as positive 32-bit values, and addresses relative to the
lowest encountered relative symbol as negative values, which are
subtracted from the runtime address of this base symbol to produce the
actual address.
Support for the above is enabled by default for all architectures except
IA-64 and Tile-GX, whose symbols are too far apart to capture in this
manner.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-16 00:58:19 +03:00
|
|
|
iter->value = kallsyms_sym_address(iter->pos);
|
2005-04-17 02:20:36 +04:00
|
|
|
|
|
|
|
iter->type = kallsyms_get_symbol_type(off);
|
|
|
|
|
2013-04-15 09:34:43 +04:00
|
|
|
off = kallsyms_expand_symbol(off, iter->name, ARRAY_SIZE(iter->name));
|
2005-04-17 02:20:36 +04:00
|
|
|
|
|
|
|
return off - iter->nameoff;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void reset_iter(struct kallsym_iter *iter, loff_t new_pos)
|
|
|
|
{
|
|
|
|
iter->name[0] = '\0';
|
|
|
|
iter->nameoff = get_symbol_offset(new_pos);
|
|
|
|
iter->pos = new_pos;
|
2017-09-06 15:40:41 +03:00
|
|
|
if (new_pos == 0) {
|
2018-06-06 15:54:10 +03:00
|
|
|
iter->pos_arch_end = 0;
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
iter->pos_mod_end = 0;
|
2017-09-06 15:40:41 +03:00
|
|
|
iter->pos_ftrace_mod_end = 0;
|
2020-05-28 11:00:58 +03:00
|
|
|
iter->pos_bpf_end = 0;
|
2017-09-06 15:40:41 +03:00
|
|
|
}
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
}
|
|
|
|
|
2018-06-06 15:54:09 +03:00
|
|
|
/*
|
|
|
|
* The end position (last + 1) of each additional kallsyms section is recorded
|
|
|
|
* in iter->pos_..._end as each section is added, and so can be used to
|
|
|
|
* determine which get_ksymbol_...() function to call next.
|
|
|
|
*/
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
static int update_iter_mod(struct kallsym_iter *iter, loff_t pos)
|
|
|
|
{
|
|
|
|
iter->pos = pos;
|
|
|
|
|
2018-06-06 15:54:10 +03:00
|
|
|
if ((!iter->pos_arch_end || iter->pos_arch_end > pos) &&
|
|
|
|
get_ksymbol_arch(iter))
|
|
|
|
return 1;
|
|
|
|
|
2018-06-06 15:54:09 +03:00
|
|
|
if ((!iter->pos_mod_end || iter->pos_mod_end > pos) &&
|
|
|
|
get_ksymbol_mod(iter))
|
2017-09-06 15:40:41 +03:00
|
|
|
return 1;
|
|
|
|
|
2018-06-06 15:54:09 +03:00
|
|
|
if ((!iter->pos_ftrace_mod_end || iter->pos_ftrace_mod_end > pos) &&
|
|
|
|
get_ksymbol_ftrace_mod(iter))
|
|
|
|
return 1;
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
|
2020-05-28 11:00:58 +03:00
|
|
|
if ((!iter->pos_bpf_end || iter->pos_bpf_end > pos) &&
|
|
|
|
get_ksymbol_bpf(iter))
|
|
|
|
return 1;
|
|
|
|
|
|
|
|
return get_ksymbol_kprobe(iter);
|
2005-04-17 02:20:36 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Returns false if pos at or past end of file. */
|
|
|
|
static int update_iter(struct kallsym_iter *iter, loff_t pos)
|
|
|
|
{
|
|
|
|
/* Module symbols can be accessed randomly. */
|
bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 00:24:50 +03:00
|
|
|
if (pos >= kallsyms_num_syms)
|
|
|
|
return update_iter_mod(iter, pos);
|
2009-05-13 00:43:35 +04:00
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
/* If we're not on the desired position, reset to new position. */
|
|
|
|
if (pos != iter->pos)
|
|
|
|
reset_iter(iter, pos);
|
|
|
|
|
|
|
|
iter->nameoff += get_ksymbol_core(iter);
|
|
|
|
iter->pos++;
|
|
|
|
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void *s_next(struct seq_file *m, void *p, loff_t *pos)
|
|
|
|
{
|
|
|
|
(*pos)++;
|
|
|
|
|
|
|
|
if (!update_iter(m->private, *pos))
|
|
|
|
return NULL;
|
|
|
|
return p;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void *s_start(struct seq_file *m, loff_t *pos)
|
|
|
|
{
|
|
|
|
if (!update_iter(m->private, *pos))
|
|
|
|
return NULL;
|
|
|
|
return m->private;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void s_stop(struct seq_file *m, void *p)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
static int s_show(struct seq_file *m, void *p)
|
|
|
|
{
|
2017-11-29 21:30:13 +03:00
|
|
|
void *value;
|
2005-04-17 02:20:36 +04:00
|
|
|
struct kallsym_iter *iter = m->private;
|
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/* Some debugging symbols have no name. Ignore them. */
|
2005-04-17 02:20:36 +04:00
|
|
|
if (!iter->name[0])
|
|
|
|
return 0;
|
|
|
|
|
2017-11-29 21:30:13 +03:00
|
|
|
value = iter->show_value ? (void *)iter->value : NULL;
|
2017-11-08 23:51:04 +03:00
|
|
|
|
2007-05-08 11:28:39 +04:00
|
|
|
if (iter->module_name[0]) {
|
|
|
|
char type;
|
|
|
|
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* Label it "global" if it is exported,
|
|
|
|
* "local" if not exported.
|
|
|
|
*/
|
2007-05-08 11:28:39 +04:00
|
|
|
type = iter->exported ? toupper(iter->type) :
|
|
|
|
tolower(iter->type);
|
2017-11-29 21:30:13 +03:00
|
|
|
seq_printf(m, "%px %c %s\t[%s]\n", value,
|
2011-03-23 02:34:22 +03:00
|
|
|
type, iter->name, iter->module_name);
|
2007-05-08 11:28:39 +04:00
|
|
|
} else
|
2017-11-29 21:30:13 +03:00
|
|
|
seq_printf(m, "%px %c %s\n", value,
|
2011-03-23 02:34:22 +03:00
|
|
|
iter->type, iter->name);
|
2005-04-17 02:20:36 +04:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2006-12-07 07:40:36 +03:00
|
|
|
static const struct seq_operations kallsyms_op = {
|
2005-04-17 02:20:36 +04:00
|
|
|
.start = s_start,
|
|
|
|
.next = s_next,
|
|
|
|
.stop = s_stop,
|
|
|
|
.show = s_show
|
|
|
|
};
|
|
|
|
|
2017-11-08 23:51:04 +03:00
|
|
|
static inline int kallsyms_for_perf(void)
|
|
|
|
{
|
|
|
|
#ifdef CONFIG_PERF_EVENTS
|
|
|
|
extern int sysctl_perf_event_paranoid;
|
|
|
|
if (sysctl_perf_event_paranoid <= 1)
|
|
|
|
return 1;
|
|
|
|
#endif
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* We show kallsyms information even to normal users if we've enabled
|
|
|
|
* kernel profiling and are explicitly not paranoid (so kptr_restrict
|
|
|
|
* is clear, and sysctl_perf_event_paranoid isn't set).
|
|
|
|
*
|
|
|
|
* Otherwise, require CAP_SYSLOG (assuming kptr_restrict isn't set to
|
|
|
|
* block even that).
|
|
|
|
*/
|
2020-07-02 21:49:23 +03:00
|
|
|
bool kallsyms_show_value(const struct cred *cred)
|
2017-11-08 23:51:04 +03:00
|
|
|
{
|
|
|
|
switch (kptr_restrict) {
|
|
|
|
case 0:
|
|
|
|
if (kallsyms_for_perf())
|
2020-07-02 21:49:23 +03:00
|
|
|
return true;
|
2020-08-24 01:36:59 +03:00
|
|
|
fallthrough;
|
2017-11-08 23:51:04 +03:00
|
|
|
case 1:
|
2020-07-02 21:49:23 +03:00
|
|
|
if (security_capable(cred, &init_user_ns, CAP_SYSLOG,
|
|
|
|
CAP_OPT_NOAUDIT) == 0)
|
|
|
|
return true;
|
2020-08-24 01:36:59 +03:00
|
|
|
fallthrough;
|
2017-11-08 23:51:04 +03:00
|
|
|
default:
|
2020-07-02 21:49:23 +03:00
|
|
|
return false;
|
2017-11-08 23:51:04 +03:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
static int kallsyms_open(struct inode *inode, struct file *file)
|
|
|
|
{
|
2009-05-13 00:43:35 +04:00
|
|
|
/*
|
|
|
|
* We keep iterator in m->private, since normal case is to
|
2005-04-17 02:20:36 +04:00
|
|
|
* s_start from where we left off, so we avoid doing
|
2009-05-13 00:43:35 +04:00
|
|
|
* using get_symbol_offset for every symbol.
|
|
|
|
*/
|
2005-04-17 02:20:36 +04:00
|
|
|
struct kallsym_iter *iter;
|
2014-10-14 02:52:10 +04:00
|
|
|
iter = __seq_open_private(file, &kallsyms_op, sizeof(*iter));
|
2005-04-17 02:20:36 +04:00
|
|
|
if (!iter)
|
|
|
|
return -ENOMEM;
|
|
|
|
reset_iter(iter, 0);
|
|
|
|
|
2020-07-02 21:49:23 +03:00
|
|
|
/*
|
|
|
|
* Instead of checking this on every s_show() call, cache
|
|
|
|
* the result here at open time.
|
|
|
|
*/
|
|
|
|
iter->show_value = kallsyms_show_value(file->f_cred);
|
2014-10-14 02:52:10 +04:00
|
|
|
return 0;
|
2005-04-17 02:20:36 +04:00
|
|
|
}
|
|
|
|
|
2010-05-21 06:04:21 +04:00
|
|
|
#ifdef CONFIG_KGDB_KDB
|
|
|
|
const char *kdb_walk_kallsyms(loff_t *pos)
|
|
|
|
{
|
|
|
|
static struct kallsym_iter kdb_walk_kallsyms_iter;
|
|
|
|
if (*pos == 0) {
|
|
|
|
memset(&kdb_walk_kallsyms_iter, 0,
|
|
|
|
sizeof(kdb_walk_kallsyms_iter));
|
|
|
|
reset_iter(&kdb_walk_kallsyms_iter, 0);
|
|
|
|
}
|
|
|
|
while (1) {
|
|
|
|
if (!update_iter(&kdb_walk_kallsyms_iter, *pos))
|
|
|
|
return NULL;
|
|
|
|
++*pos;
|
|
|
|
/* Some debugging symbols have no name. Ignore them. */
|
|
|
|
if (kdb_walk_kallsyms_iter.name[0])
|
|
|
|
return kdb_walk_kallsyms_iter.name;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#endif /* CONFIG_KGDB_KDB */
|
|
|
|
|
2020-02-04 04:37:17 +03:00
|
|
|
static const struct proc_ops kallsyms_proc_ops = {
|
|
|
|
.proc_open = kallsyms_open,
|
|
|
|
.proc_read = seq_read,
|
|
|
|
.proc_lseek = seq_lseek,
|
|
|
|
.proc_release = seq_release_private,
|
2005-04-17 02:20:36 +04:00
|
|
|
};
|
|
|
|
|
|
|
|
static int __init kallsyms_init(void)
|
|
|
|
{
|
2020-02-04 04:37:17 +03:00
|
|
|
proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
|
2005-04-17 02:20:36 +04:00
|
|
|
return 0;
|
|
|
|
}
|
2009-05-13 00:43:35 +04:00
|
|
|
device_initcall(kallsyms_init);
|