# 1. Introduction

This tutorial illustrates how eBPF works and in particular how the eBPF verifier works on Windows,
starting from authoring a new eBPF program in C.

To try out this tutorial yourself, you should first install the [Prerequisites](GettingStarted.md#Prerequisites).
We'll start by understanding the basic structure of eBPF programs and then walk through how to
apply them in a real use case.

# 2. Authoring a simple eBPF Program

Note: This walkthrough is based on the one at [eBPF assembly with LLVM (qmonnet.github.io)](https://qmonnet.github.io/whirl-offload/2020/04/12/llvm-ebpf-asm/),
and in fact the same steps should work on both Windows and Linux, including
in WSL on Windows. (The only exception is that the llvm-objdump utility will
fail if you have an old LLVM version in WSL, i.e., if `llvm-objdump -version`
shows only LLVM version 3.8.0, it is too old and needs to be upgraded first.)
However, we'll do this walkthrough assuming one is only using Windows.

**Step 1)** Author a new file by putting some content into a file, say bpf.c:

```
int func()
{
    return 0;
}
```

For this example, that's all the content that's needed, no #includes or
anything.

**Step 2)** Compile optimized code with clang as follows:

```
> clang -target bpf -Wall -O2 -c bpf.c -o bpf.o
```

This will compile bpf.c (into bpf.o in this example) using bpf as the assembly format,
since eBPF has its own [instruction set architecture (ISA)](https://github.com/iovisor/bpf-docs/blob/master/eBPF.md).

To see what clang did, we can generate disassembly as follows:

```
> llvm-objdump --triple=bpf -S bpf.o

bpf.o: file format ELF64-BPF

Disassembly of section .text:
func:
       0:       b7 00 00 00 00 00 00 00         r0 = 0
       1:       95 00 00 00 00 00 00 00         exit
```

You can see that all the program does is set register 0 (the register used
for return values in the eBPF ISA) to 0, and exit.
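
The hex bytes in the disassembly follow the fixed 8-byte eBPF instruction layout: one opcode byte, one byte holding the destination register (low 4 bits) and source register (high 4 bits), a 16-bit little-endian offset, and a 32-bit little-endian immediate. As a minimal sketch (the type and function names here are illustrative, not part of any eBPF library), the bytes can be decoded by hand like this:

```c
#include <stdint.h>

/* One 8-byte eBPF instruction slot, split into its fields.
 * The names in this struct are hypothetical, chosen for this sketch. */
typedef struct {
    uint8_t opcode;  /* byte 0 */
    uint8_t dst_reg; /* low 4 bits of byte 1 */
    uint8_t src_reg; /* high 4 bits of byte 1 */
    int16_t offset;  /* bytes 2-3, little-endian */
    int32_t imm;     /* bytes 4-7, little-endian */
} decoded_insn_t;

/* Decode one instruction from the raw bytes printed by llvm-objdump. */
static decoded_insn_t decode_insn(const uint8_t b[8])
{
    decoded_insn_t d;
    d.opcode = b[0];
    d.dst_reg = (uint8_t)(b[1] & 0x0f);
    d.src_reg = (uint8_t)(b[1] >> 4);
    d.offset = (int16_t)(uint16_t)(b[2] | (b[3] << 8));
    d.imm = (int32_t)((uint32_t)b[4] | ((uint32_t)b[5] << 8) |
                      ((uint32_t)b[6] << 16) | ((uint32_t)b[7] << 24));
    return d;
}
```

Feeding it `b7 00 00 00 00 00 00 00` yields opcode 0xb7 (a 64-bit move of an immediate) with destination register 0 and immediate 0, i.e., `r0 = 0`; `95 00 ...` yields opcode 0x95, which is `exit`.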

Since we compiled the program optimized, and without debug info, that's
all we can get.

**Step 3)** Repeat the above exercise but enable debugging using `-g`.
For this walkthrough we will put the result into a separate .o file,
bpf-d.o in this example:

```
> clang -target bpf -Wall -g -O2 -c bpf.c -o bpf-d.o
```

The `llvm-objdump -S` command from step 2 will now be able to show the
source lines as well:

```
> llvm-objdump --triple=bpf -S bpf-d.o

bpf-d.o: file format ELF64-BPF

Disassembly of section .text:
func:
; {
       0:       b7 00 00 00 00 00 00 00         r0 = 0
; return 0;
       1:       95 00 00 00 00 00 00 00         exit
```

**Step 4)** Learn how sections work

In steps 2 and 3, the code is placed into a section called ".text" as can be
seen from the header in the middle of the disassembly output. One can list
all sections in the object file using -h as follows:

```
> llvm-objdump --triple=bpf -h bpf.o

bpf.o: file format ELF64-BPF

Sections:
Idx Name          Size     Address          Type
  0               00000000 0000000000000000
  1 .strtab       0000003e 0000000000000000
  2 .text         00000010 0000000000000000 TEXT
  3 .BTF          00000019 0000000000000000
  4 .BTF.ext      00000020 0000000000000000
  5 .llvm_addrsig 00000000 0000000000000000
  6 .symtab       00000048 0000000000000000
```

Notice that the only section with actual code in it (i.e., with the "TEXT"
label after it) is section 2, named ".text". And for comparison, the
debug-enabled object file also contains various debugging info:

```
> llvm-objdump --triple=bpf -h bpf-d.o

bpf-d.o: file format ELF64-BPF

Sections:
Idx Name             Size     Address          Type
  0                  00000000 0000000000000000
  1 .strtab          0000009b 0000000000000000
  2 .text            00000010 0000000000000000 TEXT
  3 .debug_str       00000052 0000000000000000
  4 .debug_abbrev    00000034 0000000000000000
  5 .debug_info      0000004b 0000000000000000
  6 .rel.debug_info  00000090 0000000000000000
  7 .debug_macinfo   00000001 0000000000000000
  8 .BTF             0000007a 0000000000000000
  9 .BTF.ext         00000048 0000000000000000
 10 .rel.BTF.ext     00000020 0000000000000000
 11 .debug_frame     00000028 0000000000000000
 12 .rel.debug_frame 00000020 0000000000000000
 13 .debug_line      0000003c 0000000000000000
 14 .rel.debug_line  00000010 0000000000000000
 15 .llvm_addrsig    00000000 0000000000000000
 16 .symtab          00000120 0000000000000000
```

The static verifier that checks the safety of eBPF programs also supports multiple TEXT sections, with custom
names, so let's also try using a custom name instead, say "myprog". We
can do this by adding a pragma, where any functions following that pragma
will be put into a section with a specified name, until another such
pragma is encountered with a different name, or the end of the file is
reached. In this way, there can even be multiple sections per source file.

Author a new file, say in "bpf2.c" this time, with another function and a
pragma above each one:

```
#pragma clang section text="myprog"

int func()
{
    return 0;
}

#pragma clang section text="another"

int anotherfunc()
{
    return 1;
}
```

If we now compile the above code as before we can see the new list of sections.

```
> clang -target bpf -Wall -O2 -c bpf2.c -o bpf2.o

> llvm-objdump --triple=bpf -h bpf2.o

bpf2.o: file format ELF64-BPF

Sections:
Idx Name          Size     Address          Type
  0               00000000 0000000000000000
  1 .strtab       00000055 0000000000000000
  2 .text         00000000 0000000000000000 TEXT
  3 myprog        00000010 0000000000000000 TEXT
  4 another       00000010 0000000000000000 TEXT
  5 .BTF          00000019 0000000000000000
  6 .BTF.ext      00000020 0000000000000000
  7 .llvm_addrsig 00000000 0000000000000000
  8 .symtab       00000060 0000000000000000
```

Notice that there is still the .text section, but it has a size of 0,
because all the code is either in the "myprog" section or the "another"
section.

To dump a specific section (e.g., myprog), use the following:

```
> llvm-objdump --triple=bpf -S --section=myprog bpf2.o
```

# 3. Compiling eBPF for Windows

**Step 1)** Get the source code:

```
> git clone --recurse-submodules https://github.com/microsoft/ebpf-for-windows.git

> cd ebpf-for-windows
```

**Step 2)** Generate a solution:

eBPF for Windows uses the [Prevail eBPF verifier](https://github.com/vbpf/ebpf-verifier) as a submodule
which uses a cmake-based build, so first we need to generate the Visual Studio project for it:

```
> cmake -S external\ebpf-verifier -B external\ebpf-verifier\build
```

This will result in a Visual Studio solution and projects getting generated
in the specified subdirectory ("external\ebpf-verifier\build").

**Step 3)** Build the solution:

This can be done either from the command line or from within the Visual Studio UI.

To use the command line:

```
> msbuild /m /p:Configuration=Debug /p:Platform=x64 ebpf-for-windows.sln
```

Or, to use the Visual Studio UI, open the solution in Visual Studio:

```
> ebpf-for-windows.sln
```

Next, right click on the solution in the Solution Explorer and select "Restore NuGet Packages".
Then set the configuration to Debug and the platform to x64 (if not already set), and
compile it with "Build->Build Solution".

Building the solution may generate some compiler warnings, but should still
compile successfully.

# 4. Installing the eBPF netsh helper on Windows

Now we're ready to learn how to use eBPF on Windows. For this tutorial, we only need to install the `netsh` helper.
From an Admin command shell, do the following from your ebpf-for-windows directory:

```
> copy x64\Debug\*.dll %windir%\system32
> netsh add helper %windir%\system32\ebpfnetsh.dll
```

# 5. Verifying eBPF programs on Windows

Normally verification happens at the time an eBPF program is submitted to be loaded. That can be done,
but in this tutorial, we'll just do verification _without_ needing to load the program. This allows this
tutorial to be done on any machine, not just one with the eBPF driver installed into the kernel.

**Step 1)** Enumerate sections

In step 4 of part 2, we saw how to use `llvm-objdump -h` to list all sections
in an object file. We'll now do the same with `netsh`. Do the following from
the directory containing the object files you built in part 2:

```
> netsh ebpf show sections bpf.o

             Section    Type  # Maps    Size
====================  ======  ======  ======
               .text       1       0       2

> netsh ebpf show sections bpf-d.o

             Section    Type  # Maps    Size
====================  ======  ======  ======
               .text       1       0       2

> netsh ebpf show sections bpf2.o

             Section    Type  # Maps    Size
====================  ======  ======  ======
              myprog       1       0       2
             another       1       0       2
```

Notice that it only lists non-empty TEXT sections, whereas `llvm-objdump -h`
showed all sections. That's because netsh is just looking for eBPF
programs, which are always in non-empty TEXT sections.

`netsh` allows all keywords to be abbreviated, so we could have done
`netsh ebpf sh sec bpf.o` instead. Throughout this tutorial, we'll always spell
things out for readability, but feel free to abbreviate to save typing.

**Step 2)** Run the verifier on our sample program

```
> netsh ebpf show verification bpf.o

Verification succeeded
```

The verification command succeeded because there was only one
non-empty TEXT section in bpf.o, so the verifier found it and used that
as the eBPF program to verify. If we try the same on an object file with
multiple such sections, we get this:

```
> netsh ebpf show verification bpf2.o

Verification succeeded
```

This is because the verifier ran on the *first* eBPF program it found,
which was "myprog" in the section listing. We can explicitly
specify the section to use as follows:

```
> netsh ebpf show verification bpf2.o myprog

Verification succeeded

> netsh ebpf show verification bpf2.o another

Verification succeeded
```

**Step 3)** View disassembly

In step 2 of part 2, we saw how to use "llvm-objdump -S" to view disassembly.
We'll now do the same with `netsh`:

```
> netsh ebpf show disassembly bpf.o
       0:       r0 = 0
       1:       exit
```

You can see that the two instructions match the two seen back in step 2 of
part 2. Again for bpf2.o we
can specify which section to use, since there is more than one:

```
> netsh ebpf show disassembly bpf2.o myprog
       0:       r0 = 0
       1:       exit

> netsh ebpf show disassembly bpf2.o another
       0:       r0 = 1
       1:       exit
```

**Step 4)** View program stats

One can view various stats about the program, without running the
verification process, using the "level=verbose" option to "show section":

```
> netsh ebpf show section bpf.o .text verbose

Section      : .text
Program Type : 1
# Maps       : 0
Size         : 2 instructions
adjust_head  : 0
arith        : 0
arith32      : 0
arith64      : 1
assign       : 1
basic_blocks : 2
call_1       : 0
call_mem     : 0
call_nomem   : 0
joins        : 0
jumps        : 0
load         : 0
load_store   : 0
map_in_map   : 0
other        : 2
packet_access: 0
store        : 0
```

So for our tiny bpf.c program that just does `return 0;`, we can see that
it has 2 instructions, in 2 basic blocks, with 1 assign and no jumps or
joins.

**Step 5)** View verifier verbose output

We can view verbose output to see what the verifier is actually doing,
using the "level=verbose" option to "show verification":

```
> netsh ebpf show verification bpf.o level=verbose

Preconditions : {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[0]
}
Stack: Numbers -> {}
entry:
  goto 0;

Postconditions: {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[1]
}
Stack: Numbers -> {}

Preconditions : {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[1]
}
Stack: Numbers -> {}
0:
  r0 = 0;
  goto 1;

Postconditions: {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[3], r0.value=[0], r0.type=number
}
Stack: Numbers -> {}

Preconditions : {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[3], r0.value=[0], r0.type=number
}
Stack: Numbers -> {}
1:
  assert r0 is number;
  exit;
  goto exit;

Postconditions: {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[6], r0.value=[0], r0.type=number
}
Stack: Numbers -> {}

Preconditions : {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[6], r0.value=[0], r0.type=number
}
Stack: Numbers -> {}
exit:


Postconditions: {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[7], r0.value=[0], r0.type=number
}
Stack: Numbers -> {}


0 errors
Verification succeeded
```

Normally we wouldn't need to do this, but it is illustrative to see how the
verifier works.

Each instruction is shown as before, but is preceded by its preconditions
(or inputs), and followed by its postconditions (or outputs).

In verifier output, "oo" means infinity, and "r0" through "r10" are registers
(r10 is the stack pointer, r0 is used for return values, r1-5 are used to pass
args to other functions, and on entry r1 holds the 'ctx' pointer, as the
`r1.type=ctx_pointer` precondition above shows, etc.).

"meta_offset" is the number of bytes of packet metadata preceding (i.e.,
with negative offset from) the start of the packet buffer.
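
To make the precondition/postcondition idea concrete, here is a toy model in C of how an abstract state could be pushed through the two instructions of bpf.c. This is only a sketch under simplifying assumptions: every name below is made up for illustration, only a type and a value range are tracked per register, and the real PREVAIL verifier is considerably more elaborate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical abstract types a register can have in this toy model. */
typedef enum { T_UNINIT, T_NUMBER, T_CTX_POINTER, T_STACK_POINTER } reg_type_t;

/* Abstract value of one register: a type plus an inclusive value range. */
typedef struct {
    reg_type_t type;
    int64_t lo;
    int64_t hi;
} abs_reg_t;

/* Abstract machine state covering r0..r10. */
typedef struct {
    abs_reg_t r[11];
} abs_state_t;

/* Preconditions at entry, loosely following the verbose output above:
 * r1 is the ctx pointer, r10 is the stack pointer, the rest are unset. */
static abs_state_t entry_state(void)
{
    abs_state_t s = {0}; /* T_UNINIT is 0, so all registers start unset */
    s.r[1] = (abs_reg_t){T_CTX_POINTER, 1, 2147418112};
    s.r[10] = (abs_reg_t){T_STACK_POINTER, 512, 2147418112};
    return s;
}

/* Transfer function for "rN = imm": the postcondition differs from the
 * precondition only in that rN is now a number with range [imm, imm]. */
static abs_state_t apply_mov_imm(abs_state_t pre, int dst, int64_t imm)
{
    pre.r[dst] = (abs_reg_t){T_NUMBER, imm, imm};
    return pre;
}

/* The check made before "exit", shown above as "assert r0 is number". */
static bool exit_is_safe(const abs_state_t* pre)
{
    return pre->r[0].type == T_NUMBER;
}
```

Starting from `entry_state()`, `exit_is_safe()` is false because r0 has not been set; after `apply_mov_imm(state, 0, 0)` it becomes true, which mirrors `r0.value=[0], r0.type=number` appearing in the postconditions of instruction 0.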

# 6. Advanced Topics

## 6.1. Hooks and arguments

Hook points are callouts exposed by the system to which eBPF programs can
attach. By convention, the section name of the eBPF program in an ELF file
is commonly used to designate which hook point the eBPF program is designed
for. Specifically, a set of prefix strings are used to match against the
section name. For example, any section name starting with "xdp" is meant
as an XDP layer program. This is a convenient default, but can be
overridden by an app asking to load an eBPF program, such as when the eBPF program is simply in the
".text" section.
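
The prefix-matching convention can be sketched in a few lines of C. The enum, table, and function names below are hypothetical and exist only for this illustration; the actual lookup lives in the verifier's platform-specific code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical program types for this sketch. */
typedef enum { PROG_TYPE_UNSPECIFIED, PROG_TYPE_XDP } prog_type_t;

/* One entry per section-name prefix. */
typedef struct {
    const char* prefix;
    prog_type_t type;
} section_prefix_t;

static const section_prefix_t g_prefixes[] = {
    {"xdp", PROG_TYPE_XDP},
};

/* Return the program type whose prefix matches the start of the section
 * name, or PROG_TYPE_UNSPECIFIED if no prefix matches (e.g., ".text"). */
static prog_type_t prog_type_from_section(const char* section)
{
    for (size_t i = 0; i < sizeof(g_prefixes) / sizeof(g_prefixes[0]); i++) {
        size_t n = strlen(g_prefixes[i].prefix);
        if (strncmp(section, g_prefixes[i].prefix, n) == 0) {
            return g_prefixes[i].type;
        }
    }
    return PROG_TYPE_UNSPECIFIED;
}
```

With this table, both "xdp" and "xdp_filter" map to the XDP type, while ".text" and "myprog" match nothing, which is the case where the loading app has to specify the program type itself.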

Each hook point has a specified prototype which must be understood by the
verifier. That is, the verifier needs to understand all the hooks for the
specified platform on which the eBPF program will execute. The hook points
are in general different for Linux vs. Windows, as are the prototypes for
hook points that might be similarly named.

Typically the first and only argument of the hook point is a context
structure which contains an arbitrary amount of data. (Tail calls to
programs can have more than one argument, but hooks put all the info in a
hook-specific context structure passed as one argument.)

Let's say that the "xdp" hook point has the following prototype:

```
// xdp.h
#include <stdint.h>

typedef struct _ebpf_xdp_args
{
    void* data;
    void* data_end;
    uint64_t data_meta;
} ebpf_xdp_args_t;

typedef int (*xdp_callout)(ebpf_xdp_args_t* args);
```

A sample eBPF program might look like this:

```
#include "xdp.h"

// Put "xdp" in the section name to specify XDP as the hook.
// The __attribute__ below has the same effect as the
// clang pragma used in section 2 of this tutorial.
__attribute__((section("xdp"), used))
int my_xdp_parser(ebpf_xdp_args_t* args)
{
    int length = (char *)args->data_end - (char *)args->data;

    if (length > 1) {
        return 1; // allow
    }
    return 0; // block
}
```

The verifier needs to be enlightened with the same prototype or all
programs written for that hook will fail verification. For Windows,
this info is in the [windows_platform.cpp](../src/ebpf/libs/api/windows_platform.cpp) file,
which for the above prototype might have:

```
const EbpfContextDescriptor g_xdp_context_descriptor = {
    24, // Size of ctx struct.
    0,  // Offset into ctx struct of pointer to data, or -1 if none.
    8,  // Offset into ctx struct of pointer to end of data, or -1 if none.
    16, // Offset into ctx struct of pointer to metadata, or -1 if none.
};

const EbpfProgramType windows_xdp_program_type =
    PTYPE("xdp", // Just for printing messages to users.
          &g_xdp_context_descriptor,
          EBPF_PROG_TYPE_XDP,
          {"xdp"}); // Set of section name prefixes for matching.
```

Let's look at the code above in more detail. The EbpfContextDescriptor
info (i.e., `g_xdp_context_descriptor`) tells the verifier about the format
of the context structure (i.e., `ebpf_xdp_args_t`). The struct is
24 bytes long and describes packet data: the pointer to the start of the
data is at offset 0, the pointer to the end of the data is at offset 8,
and the `data_meta` field is at offset 16.
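
One way to keep such numbers from drifting out of sync with the header is to compute them with `sizeof` and `offsetof`. The sketch below uses a hypothetical `ctx_descriptor_t` as a stand-in for the verifier's descriptor type (it is not the real `EbpfContextDescriptor` definition), and repeats the struct from xdp.h so that it is self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the ebpf_xdp_args_t shown in xdp.h above. */
typedef struct _ebpf_xdp_args
{
    void* data;
    void* data_end;
    uint64_t data_meta;
} ebpf_xdp_args_t;

/* Hypothetical stand-in for the verifier's context descriptor. */
typedef struct {
    int size; /* Size of ctx struct. */
    int data; /* Offset of pointer to data. */
    int end;  /* Offset of pointer to end of data. */
    int meta; /* Offset of metadata field. */
} ctx_descriptor_t;

/* Every value is derived from the struct definition, so reordering or
 * adding fields in the header updates the descriptor automatically. */
static const ctx_descriptor_t g_xdp_descriptor = {
    (int)sizeof(ebpf_xdp_args_t),
    (int)offsetof(ebpf_xdp_args_t, data),
    (int)offsetof(ebpf_xdp_args_t, data_end),
    (int)offsetof(ebpf_xdp_args_t, data_meta),
};
```

On a target with 8-byte pointers, such as the bpf target, these expressions evaluate to 24, 0, 8, and 16, matching the hardcoded values above.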

With the above, our sample program will pass verification:

```
> netsh ebpf show verification myxdp.o

Verification succeeded
```

What would have happened had the prototype not matched? Let's say the
verifier is the same as above but xdp.h instead had a different struct
definition:

```
typedef struct _ebpf_xdp_args
{
    uint64_t more;
    uint64_t stuff;
    uint64_t here;
    void* data;
    void* data_end;
    uint64_t data_meta;
} ebpf_xdp_args_t;
```

Our sample program that checks the length would now be looking for
the data starting at offset 24, which is past the end of what the verifier
thinks the context structure size is, and the verifier fails the program:

```
> netsh ebpf show verification myxdp.o
error: Verification failed

Verification report:

0: r2 = *(u64 *)(r1 + 24)
  assertion failed: Upper bound must be at most 24 (valid_access(r1, 24:8))
  Code is unreachable after 0

1 errors
```

Notice that the verifier is complaining about access to memory pointed to
by r1 (since the first argument is in register R1) past the end of the
valid buffer of size 24. This illustrates why ideally the same header
file (xdp.h in the above example) should be included by the eBPF program,
the component exposing the hook, and the verifier itself, e.g., so that
the size of the context struct could be `sizeof(ebpf_xdp_args_t)`
rather than hardcoding the number 24 in the above example.
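
The failed assertion boils down to a simple range check on the context access. A minimal sketch of that check follows; the function name is made up for illustration, and the real verifier reasons over value ranges rather than single constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if reading `width` bytes at `offset` from the ctx pointer stays
 * inside a context structure of `ctx_size` bytes. */
static bool ctx_access_is_valid(int64_t offset, int64_t width, int64_t ctx_size)
{
    return offset >= 0 && width > 0 && offset + width <= ctx_size;
}
```

For the failing instruction, the access is 8 bytes at offset 24, so its upper bound is 32, which exceeds the context size of 24. With the original struct, the same load would be at offset 0 and the check would pass.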

## 6.2. Helper functions and arguments

Now that we've seen how hooks work, let's look at how calls from an eBPF
program into helper functions exposed by the system are verified.
As with hook prototypes, the set of helper functions and their prototypes
can vary by platform. For comparison, helpers for Linux are documented in the
[IOVisor bpf helpers documentation](https://github.com/iovisor/bpf-docs/blob/master/bpf_helpers.rst).

Let's say the following helper function prototype is exposed by Windows:

```
// helpers.h
#include <stdint.h>
struct ebpf_map;
static int (*ebpf_map_update_elem)(struct ebpf_map* map, const void* key, const void* value, uint64_t flags) = (void*) 2;
```

We'll cover in section 6.3 what this function does, but for now we only care about the prototype.
We can create a sample (but, as we will see, invalid) program like so:

```
#include "helpers.h"

int func()
{
    int result = ebpf_map_update_elem((struct ebpf_map*)0, (uint32_t*)0, (uint32_t*)0, 0);
    return result;
}
```

Let's compile it and see what it looks like. Here we compile with `-g`
to include source line info:

```
> clang -target bpf -Wall -g -O2 -c helpers.c -o helpers.o

> llvm-objdump --triple bpf -S helpers.o

helpers.o: file format ELF64-BPF

Disassembly of section .text:
0000000000000000 func:
; {
       0:       b7 01 00 00 00 00 00 00         r1 = 0
; int result = ebpf_map_update_elem((struct ebpf_map*)0, (uint32_t*)0, (uint32_t*)0, 0);
       1:       b7 02 00 00 00 00 00 00         r2 = 0
       2:       b7 03 00 00 00 00 00 00         r3 = 0
       3:       b7 04 00 00 00 00 00 00         r4 = 0
       4:       85 00 00 00 02 00 00 00         call 2
; return result;
       5:       95 00 00 00 00 00 00 00         exit
```

Now let's see how the verifier deals with this. The verifier needs to
know the prototype in order to verify that the eBPF program passes arguments
correctly, and handles the results correctly (e.g., not passing an invalid
value in a pointer argument).

The verifier calls into a `get_helper_prototype(2)` API exposed by
platform-specific code to query the prototype for a given helper function.
The platform-specific code ([windows_helpers.cpp](../src/ebpf/libs/api/windows_helpers.cpp)) will return an entry like this one:

```
    {// long ebpf_map_update_elem(struct ebpf_map *map, const void *key, const
     // void *value, uint64_t flags);
     .name = "ebpf_map_update_elem",
     .return_type = EbpfHelperReturnType::INTEGER,
     .argument_type =
         {
             EbpfHelperArgumentType::PTR_TO_MAP,
             EbpfHelperArgumentType::PTR_TO_MAP_KEY,
             EbpfHelperArgumentType::PTR_TO_MAP_VALUE,
             EbpfHelperArgumentType::ANYTHING,
             EbpfHelperArgumentType::DONTCARE,
         }},
```

The above helps the verifier know the type and semantics of the arguments
and the return value.

```
> netsh ebpf show disassembly helpers.o
       0:       r1 = 0
       1:       r2 = 0
       2:       r3 = 0
       3:       r4 = 0
       4:       r0 = ebpf_map_update_elem:2(r1:FD, r2:K, r3:V, r4)
       5:       exit

> netsh ebpf show verification helpers.o

error: Verification failed

Verification report:

4: r0 = ebpf_map_update_elem:2(r1:FD, r2:K, r3:V, r4)
  assertion failed: r1 is map_fd
  Code is unreachable after 4

1 errors
```

As shown above, the verifier understands the function name and prototype,
and knows that the program is invalid because it is passing null instead
of a valid value. We'll come back to this in section 6.3 to see how to
use the helper correctly.
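
The kind of check behind "assertion failed: r1 is map_fd" can be sketched as comparing each argument register against the type the prototype expects. The following is a deliberately simplified toy model: all names are hypothetical, and it assumes key and value pointers must point into the stack, whereas the real verifier tracks more kinds of pointers.

```c
/* Hypothetical argument types, mirroring the prototype entry above. */
typedef enum {
    ARG_DONTCARE,
    ARG_ANYTHING,
    ARG_PTR_TO_MAP,
    ARG_PTR_TO_MAP_KEY,
    ARG_PTR_TO_MAP_VALUE
} arg_type_t;

/* What the toy verifier knows about the contents of a register. */
typedef enum { REG_NUMBER, REG_MAP_FD, REG_STACK_POINTER } reg_kind_t;

typedef struct {
    const char* name;
    arg_type_t args[5]; /* expected types for r1..r5 */
} helper_proto_t;

static const helper_proto_t g_update_elem = {
    "ebpf_map_update_elem",
    {ARG_PTR_TO_MAP, ARG_PTR_TO_MAP_KEY, ARG_PTR_TO_MAP_VALUE, ARG_ANYTHING, ARG_DONTCARE},
};

/* Does a register of the given kind satisfy the expected argument type? */
static int arg_ok(arg_type_t expected, reg_kind_t actual)
{
    switch (expected) {
    case ARG_PTR_TO_MAP:
        return actual == REG_MAP_FD;
    case ARG_PTR_TO_MAP_KEY:
    case ARG_PTR_TO_MAP_VALUE:
        return actual == REG_STACK_POINTER; /* simplifying assumption */
    default:
        return 1; /* ANYTHING and DONTCARE accept any register contents */
    }
}

/* Return 0 if all arguments are acceptable, otherwise the number (1-5)
 * of the first argument register that fails its check. */
static int first_bad_argument(const helper_proto_t* proto, const reg_kind_t regs[5])
{
    for (int i = 0; i < 5; i++) {
        if (!arg_ok(proto->args[i], regs[i])) {
            return i + 1;
        }
    }
    return 0;
}
```

For helpers.c, r1 through r4 all hold the number 0, so the first failing argument is r1, just as in the verification report.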

### 6.2.1. Why -O2?

This section is a slight digression, so skip ahead if you prefer. It's
important that we compiled with `-O2` throughout this tutorial. What
happens if we didn't compile with `-O2`? The disassembly looks instead
like this:

```
> clang -target bpf -Wall -g -c helpers.c -o helpers.o

> llvm-objdump --triple bpf -S helpers.o

helpers.o: file format ELF64-BPF

Disassembly of section .text:
0000000000000000 func:
; {
       0:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00         r1 = 0 ll
; int result = ebpf_map_update_elem((struct ebpf_map*)0, (uint32_t*)0, (uint32_t*)0, 0);
       2:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
       3:       b7 02 00 00 00 00 00 00         r2 = 0
       4:       7b 1a f0 ff 00 00 00 00         *(u64 *)(r10 - 16) = r1
       5:       bf 21 00 00 00 00 00 00         r1 = r2
       6:       7b 2a e8 ff 00 00 00 00         *(u64 *)(r10 - 24) = r2
       7:       79 a3 e8 ff 00 00 00 00         r3 = *(u64 *)(r10 - 24)
       8:       79 a4 e8 ff 00 00 00 00         r4 = *(u64 *)(r10 - 24)
       9:       79 a5 f0 ff 00 00 00 00         r5 = *(u64 *)(r10 - 16)
      10:       8d 00 00 00 05 00 00 00         callx 5
      11:       63 0a fc ff 00 00 00 00         *(u32 *)(r10 - 4) = r0
; return result;
      12:       61 a0 fc ff 00 00 00 00         r0 = *(u32 *)(r10 - 4)
      13:       95 00 00 00 00 00 00 00         exit
```

The helper function is called at instruction 10 via the `callx` instruction
(0x8d), but importantly that instruction *is not listed in the
[eBPF spec](https://github.com/iovisor/bpf-docs/blob/master/eBPF.md)*!
Furthermore, the PREVAIL verifier's ELF parser also has problems with it.
Let's see why. Unlike the optimized disassembly where the helper id is encoded in
the instruction, here the helper id (the value 2) is encoded in the data section:

```
> llvm-objdump --triple bpf -s helpers.o --section .data

helpers.o: file format ELF64-BPF

Contents of section .data:
 0000 02000000 00000000
```

An entry also appears in the relocation section, which we can see as follows.
Since we compiled with `-g`, there are also relocation sections for debug
symbols so we use `--section` to specify the code (i.e., text) section only,
where without it llvm-objdump will dump all of them.

```
> llvm-objdump --triple bpf --section .rel.text -r helpers.o

helpers.o: file format ELF64-BPF

RELOCATION RECORDS FOR [.rel.text]:
0000000000000000 R_BPF_64_64 .data
```

However the verifier's ELF parser only handles relocation records for
maps (which we'll cover next), not helper functions, since in "correct" eBPF bytecode (i.e.,
bytecode conforming to the eBPF spec), relocation records are always for
maps. So if you forget to compile with `-O2`, it will fail ELF parsing even
before trying to verify the bytecode.
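
The difference between the two call forms is visible in the opcode byte itself: the low 3 bits give the instruction class, bit 3 says whether the operand is an immediate or a register, and the high 4 bits give the operation. A small sketch that tells `call` (0x85) apart from `callx` (0x8d) follows; the macro and function names are chosen for this example only.

```c
#include <stdbool.h>
#include <stdint.h>

#define INSN_CLASS_MASK 0x07
#define INSN_CLASS_JMP  0x05 /* jump class, which includes call and exit */
#define INSN_SRC_REG    0x08 /* set: operand is a register; clear: immediate */
#define INSN_OP_MASK    0xf0
#define INSN_OP_CALL    0x80

/* True for any call instruction, regardless of operand form. */
static bool is_call(uint8_t opcode)
{
    return (opcode & INSN_CLASS_MASK) == INSN_CLASS_JMP &&
           (opcode & INSN_OP_MASK) == INSN_OP_CALL;
}

/* "call N": the helper id is the immediate, as produced with -O2. */
static bool is_call_by_id(uint8_t opcode)
{
    return is_call(opcode) && (opcode & INSN_SRC_REG) == 0;
}

/* "callx": the call target comes from a register, as produced without -O2. */
static bool is_call_by_register(uint8_t opcode)
{
    return is_call(opcode) && (opcode & INSN_SRC_REG) != 0;
}
```

A tool that wants to reject unoptimized objects early could, under these assumptions, scan the code section for any opcode where `is_call_by_register()` is true.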

## 6.3. Maps

Now that we've seen how helpers work, let's move on to
[maps](https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md#maps),
which are memory structures that can be shared between eBPF programs and/or
applications. They are typically used to store state between invocations
of eBPF programs, or to expose information (e.g., statistics) to applications.

To see how maps are exposed to eBPF programs, let's first start from a
plain eBPF program:

```
__attribute__((section("myprog"), used))
int func()
{
    return 0;
}
```

We can add a reference to a map to which the program will have access
by creating a `maps` section as follows. We'll use a "per-CPU array"
in this example so that there are no race conditions or corrupted data
if multiple instances of our program are simultaneously running on different
CPUs.

```
#include <stdint.h>

struct ebpf_map {
    uint32_t size;
    uint32_t type;
    uint32_t key_size;
    uint32_t value_size;
    uint32_t max_entries;
};
#define BPF_MAP_TYPE_PERCPU_ARRAY 1

__attribute__((section("maps"), used))
struct ebpf_map map =
    {sizeof(struct ebpf_map), BPF_MAP_TYPE_PERCPU_ARRAY, 2, 4, 512};

__attribute__((section("myprog"), used))
int func()
{
    return 0;
}
```

So far the program doesn't actually use the map, but the presence of
the maps section means that when the program is loaded, the system
will look for the given map and create one if it doesn't already exist,
using the map parameters specified. We can see the fields encoded
into the `maps` section as follows:

```
> clang -target bpf -Wall -g -O2 -c maponly.c -o maponly.o
> llvm-objdump -s -section maps maponly.o

maponly.o: file format ELF64-BPF

Contents of section maps:
 0000 14000000 01000000 02000000 04000000  ................
 0010 00020000                             ....
```
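
The 20 bytes in the dump are simply the five `uint32_t` fields of `struct ebpf_map` in little-endian order. As a sketch of how a loader might read them back (the `map_def_t` type and both function names are invented for this example):

```c
#include <stdint.h>

/* Mirror of the five fields of struct ebpf_map from the example above. */
typedef struct {
    uint32_t size;
    uint32_t type;
    uint32_t key_size;
    uint32_t value_size;
    uint32_t max_entries;
} map_def_t;

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t read_le32(const uint8_t* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the raw contents of the maps section into its fields. */
static map_def_t parse_map_def(const uint8_t bytes[20])
{
    map_def_t m;
    m.size = read_le32(bytes + 0);
    m.type = read_le32(bytes + 4);
    m.key_size = read_le32(bytes + 8);
    m.value_size = read_le32(bytes + 12);
    m.max_entries = read_le32(bytes + 16);
    return m;
}
```

Applied to the bytes shown above, this gives size 20 (0x14), type 1, key_size 2, value_size 4, and max_entries 512 (0x200), matching the initializer in the source.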

Now to make use of the map, we have to use helper functions to access it:

```
void *ebpf_map_lookup_elem(struct ebpf_map* map, const void* key);
int ebpf_map_update_elem(struct ebpf_map* map, const void* key, const void* value, uint64_t flags);
int ebpf_map_delete_elem(struct ebpf_map* map, const void* key);
```

Let's update the program to write the value "42" to the map section for the
current CPU, by changing the "myprog" section to the following:

```
static void* (*ebpf_map_lookup_elem)(struct ebpf_map* map, const void* key) = (void*) 0;
static int (*ebpf_map_update_elem)(struct ebpf_map *map, const void *key, const void *value, uint64_t flags) = (void*) 1;

__attribute__((section("myprog"), used))
int func1()
{
    uint32_t key = 0;
    uint32_t value = 42;
    int result = ebpf_map_update_elem(&map, &key, &value, 0);
    return result;
}
```

This program results in the following disassembly:

```
> llvm-objdump -S -section=myprog map.o

map.o: file format ELF64-BPF

Disassembly of section myprog:
func1:
; {
       0:       b7 01 00 00 00 00 00 00         r1 = 0
; uint32_t key = 0;
       1:       63 1a fc ff 00 00 00 00         *(u32 *)(r10 - 4) = r1
       2:       b7 01 00 00 2a 00 00 00         r1 = 42
; uint32_t value = 42;
       3:       63 1a f8 ff 00 00 00 00         *(u32 *)(r10 - 8) = r1
       4:       bf a2 00 00 00 00 00 00         r2 = r10
; uint32_t key = 0;
       5:       07 02 00 00 fc ff ff ff         r2 += -4
       6:       bf a3 00 00 00 00 00 00         r3 = r10
       7:       07 03 00 00 f8 ff ff ff         r3 += -8
; int result = ebpf_map_update_elem(&map, &key, &value, 0);
       8:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00         r1 = 0 ll
      10:       b7 04 00 00 00 00 00 00         r4 = 0
      11:       85 00 00 00 01 00 00 00         call 1
; return result;
      12:       95 00 00 00 00 00 00 00         exit
```

The above shows "call 1", but `netsh` shows more details:

```
> netsh ebpf show disassembly map.o
       0:       r1 = 0
       1:       *(u32 *)(r10 - 4) = r1
       2:       r1 = 42
       3:       *(u32 *)(r10 - 8) = r1
       4:       r2 = r10
       5:       r2 += -4
       6:       r3 = r10
       7:       r3 += -8
       8:       r1 = map_fd 1026
      10:       r4 = 0
      11:       r0 = ebpf_map_update_elem:1(r1:FD, r2:K, r3:V, r4)
      12:       exit
```

Notice from instruction 11 that `netsh` understands that
`ebpf_map_update_elem()` expects
a file descriptor (FD) in R1, a key in R2, a value in R3, and R4 can be
anything.

R1 was set in instruction 8 to a map FD value of 1026. Where did that value
come from, since the llvm-objdump disassembly didn't have it? The
create_map_crab() function in the Prevail verifier creates a dummy value
based on (value_size * 256) + key_size. Since we passed
value_size = 4 and key_size = 2, this gives us 1026. When installed,
this value gets replaced with a real map address. Let's see how that happens.
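
The dummy FD arithmetic is easy to reproduce. The helpers below are a sketch based only on the formula quoted above (they are not the verifier's actual code, and the names are invented), and the reverse direction assumes key_size is less than 256 so that the two sizes do not overlap.

```c
#include <stdint.h>

/* Pseudo FD as described above: (value_size * 256) + key_size. */
static uint32_t pseudo_map_fd(uint32_t key_size, uint32_t value_size)
{
    return value_size * 256 + key_size;
}

/* Recover the sizes from a pseudo FD, assuming key_size < 256. */
static uint32_t key_size_from_pseudo_fd(uint32_t fd)
{
    return fd % 256;
}

static uint32_t value_size_from_pseudo_fd(uint32_t fd)
{
    return fd / 256;
}
```

For our map, `pseudo_map_fd(2, 4)` is 4 * 256 + 2 = 1026, the value `netsh` printed at instruction 8.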

Now that we're actually using the map, rather than just defining it,
the relocation section is also populated. The relocation section for
a program is in a section with the ".rel" prefix followed by the
program section name ("myprog" in this example):

```
> llvm-objdump --triple bpf -section=.relmyprog -r map.o

map.o: file format ELF64-BPF

RELOCATION RECORDS FOR [.relmyprog]:
0000000000000040 R_BPF_64_64 map
```

This record means that the actual address of `map` should be inserted at
offset 0x40, but where is that? llvm-objdump and `netsh` both gave us
instruction numbers not offsets, but we can see the raw bytes as follows:

```
> llvm-objdump -s -section=myprog map.o

map.o: file format ELF64-BPF

Contents of section myprog:
 0000 b7010000 00000000 631afcff 00000000  ........c.......
 0010 b7010000 2a000000 631af8ff 00000000  ....*...c.......
 0020 bfa20000 00000000 07020000 fcffffff  ................
 0030 bfa30000 00000000 07030000 f8ffffff  ................
 0040 18010000 00000000 00000000 00000000  ................
 0050 b7040000 00000000 85000000 01000000  ................
 0060 95000000 00000000                    ........
```

We see that offset 0x40 has "18010000 00000000 00000000 00000000".
Looking back at the llvm-objdump disassembly above, we see that
is indeed instruction 8.

So, to summarize, the verifier operates on pseudo FDs, not actual
FDs or addresses. When the program is actually installed, the relocation
section will be used to insert the actual map address into the executable
code.
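
To tie the pieces together: each instruction slot is 8 bytes, so byte offset 0x40 is instruction 0x40 / 8 = 8, and the instruction there (opcode 0x18) is a 16-byte load of a 64-bit immediate whose low 32 bits live in the first slot and high 32 bits in the second. The sketch below shows how a loader could apply such a relocation. It is an illustration only: the function names are invented, and exactly what value the real loader writes is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>

#define EBPF_INSN_SIZE 8
#define OPCODE_LDDW 0x18 /* 64-bit immediate load; occupies two slots */

/* Convert a byte offset within a code section to an instruction number. */
static size_t insn_index_from_offset(size_t byte_offset)
{
    return byte_offset / EBPF_INSN_SIZE;
}

/* Store a 32-bit value in little-endian byte order. */
static void write_le32(uint8_t* p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Patch the 64-bit immediate of the instruction at byte_offset with
 * map_address. Returns 0 on success, or -1 if the offset is misaligned,
 * out of bounds, or does not point at a 64-bit immediate load. */
static int apply_map_relocation(uint8_t* code, size_t code_size, size_t byte_offset, uint64_t map_address)
{
    if (byte_offset % EBPF_INSN_SIZE != 0 || code_size < 2 * EBPF_INSN_SIZE ||
        byte_offset > code_size - 2 * EBPF_INSN_SIZE) {
        return -1;
    }
    if (code[byte_offset] != OPCODE_LDDW) {
        return -1;
    }
    /* Low half goes in the immediate of the first slot (bytes 4-7)... */
    write_le32(code + byte_offset + 4, (uint32_t)(map_address & 0xffffffffu));
    /* ...and the high half in the immediate of the second slot. */
    write_le32(code + byte_offset + EBPF_INSN_SIZE + 4, (uint32_t)(map_address >> 32));
    return 0;
}
```

The two-slot encoding is also why the disassembly jumps from instruction 8 directly to instruction 10.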