ebpf-for-windows/docs/tutorial.md

954 строки
31 KiB
Markdown

# 1. Introduction
This tutorial illustrates how eBPF works and in particular how the eBPF verifier works on Windows,
starting from authoring a new eBPF program in C.
To try out this tutorial yourself, you should first install the [Prerequisites](GettingStarted.md#Prerequisites).
We'll start by understanding the basic structure of eBPF programs and then walk through how to
apply them in a real use case.
# 2. Authoring a simple eBPF Program
Note: This walkthrough is based on the one at [eBPF assembly with LLVM (qmonnet.github.io)](http://releases.llvm.org/7.0.1/LLVM-7.0.1-win64.exe),
and in fact the same steps should work on both Windows and Linux, including
in WSL on Windows. (The only exception is that the llvm-objdump utility will
fail if you have an old LLVM version in WSL, i.e., if `llvm-objdump -version`
shows only LLVM version 3.8.0, it is too old and needs to be upgraded first.)
However, we'll do this walkthrough assuming one is only using Windows.
**Step 1)** Author a new file by putting some content into a file, say bpf.c:
```
int func()
{
return 0;
}
```
For this example, that's all the content that's needed, no #includes or
anything.
**Step 2)** Compile optimized code with clang as follows:
```
> clang -target bpf -Wall -O2 -c bpf.c -o bpf.o
```
This will compile bpf.c (into bpf.o in this example) using bpf as the assembly format,
since eBPF has its own [instruction set architecture (ISA)](https://github.com/iovisor/bpf-docs/blob/master/eBPF.md).
To see what clang did, we can generate disassembly as follows:
```
> llvm-objdump --triple=bpf -S bpf.o
bpf.o: file format ELF64-BPF
Disassembly of section .text:
func:
0: b7 00 00 00 00 00 00 00 r0 = 0
1: 95 00 00 00 00 00 00 00 exit
```
You can see that all the program does is set register 0 (the register used
for return values in the eBPF ISA) to 0, and exit.
Since we compiled the program optimized, and without debug info, that's
all we can get.
**Step 3)** Repeat the above exercise but enable debugging using `-g` and
for this walkthrough we will put the result into a separate .o file,
bpf-d.o in this example:
```
> clang -target bpf -Wall -g -O2 -c bpf.c -o bpf-d.o
```
The `llvm-objdump -S` command from step 2 will now be able to show the
source lines as well:
```
> llvm-objdump --triple=bpf -S bpf-d.o
bpf.o: file format ELF64-BPF
Disassembly of section .text:
func:
; {
0: b7 00 00 00 00 00 00 00 r0 = 0
; return 0;
1: 95 00 00 00 00 00 00 00 exit
```
**Step 4)** Learn how sections work
In steps 2 and 3, the code is placed into a section called ".text" as can be
seen from the header in the middle of the disassembly output. One can list
all sections in the object file using -h as follows:
```
> llvm-objdump --triple=bpf -h bpf.o
bpf.o: file format ELF64-BPF
Sections:
Idx Name Size Address Type
0 00000000 0000000000000000
1 .strtab 0000003e 0000000000000000
2 .text 00000010 0000000000000000 TEXT
3 .BTF 00000019 0000000000000000
4 .BTF.ext 00000020 0000000000000000
5 .llvm_addrsig 00000000 0000000000000000
6 .symtab 00000048 0000000000000000
```
Notice that the only section with actual code in it (i.e., with the "TEXT"
label after it) is section 2, named ".text". And for comparison, the
debug-enabled object file also contains various debugging info:
```
> llvm-objdump --triple=bpf -h bpf-d.o
bpf-d.o: file format ELF64-BPF
Sections:
Idx Name Size Address Type
0 00000000 0000000000000000
1 .strtab 0000009b 0000000000000000
2 .text 00000010 0000000000000000 TEXT
3 .debug_str 00000052 0000000000000000
4 .debug_abbrev 00000034 0000000000000000
5 .debug_info 0000004b 0000000000000000
6 .rel.debug_info 00000090 0000000000000000
7 .debug_macinfo 00000001 0000000000000000
8 .BTF 0000007a 0000000000000000
9 .BTF.ext 00000048 0000000000000000
10 .rel.BTF.ext 00000020 0000000000000000
11 .debug_frame 00000028 0000000000000000
12 .rel.debug_frame 00000020 0000000000000000
13 .debug_line 0000003c 0000000000000000
14 .rel.debug_line 00000010 0000000000000000
15 .llvm_addrsig 00000000 0000000000000000
16 .symtab 00000120 0000000000000000
```
The static verifier that checks the safety of eBPF programs also supports multiple TEXT sections, with custom
names, so let's also try using a custom name instead, say "myprog". We
can do this by adding a pragma, where any functions following that pragma
will be put into a section with a specified name, until another such
pragma is encountered with a different name, or the end of the file is
reached. In this way, there can even be multiple sections per source file.
Author a new file, say in "bpf2.c" this time, with another function and a
pragma above each one:
```
#pragma clang section text="myprog"
int func()
{
return 0;
}
#pragma clang section text="another"
int anotherfunc()
{
return 1;
}
```
If we now compile the above code as before we can see the new list of sections.
```
> clang -target bpf -Wall -O2 -c bpf2.c -o bpf2.o
> llvm-objdump --triple=bpf -h bpf2.o
bpf2.o: file format ELF64-BPF
Sections:
Idx Name Size Address Type
0 00000000 0000000000000000
1 .strtab 00000055 0000000000000000
2 .text 00000000 0000000000000000 TEXT
3 myprog 00000010 0000000000000000 TEXT
4 another 00000010 0000000000000000 TEXT
5 .BTF 00000019 0000000000000000
6 .BTF.ext 00000020 0000000000000000
7 .llvm_addrsig 00000000 0000000000000000
8 .symtab 00000060 0000000000000000
```
Notice that there is still the .text section, but it has a size of 0,
because all the code is either in the "myprog" section or the "another"
section.
To dump a specific section (e.g., myprog), use the following:
```
> llvm-objdump --triple=bpf -S --section=myprog bpf2.o
```
# 3. Compiling eBPF for Windows
**Step 1)** Get the source code:
```
> git clone --recurse-submodules https://github.com/microsoft/ebpf-for-windows.git
> cd ebpf-for-windows
```
**Step 2)** Generate a solution:
eBPF for Windows uses the [Prevail eBPF verifier](https://github.com/vbpf/ebpf-verifier) as a submodule
which uses a cmake-based build, so first we need to generate the Visual Studio project for it:
```
> cmake -S external\ebpf-verifier -B external\ebpf-verifier\build
```
This will result in a Visual Studio solution and projects getting generated
in the specified subdirectory ("external\ebpf-verifier\build").
**Step 3)** Build the solution:
This can be done either from the command line or from within the Visual Studio UI.
To use the command line:
```
> msbuild /m /p:Configuration=Debug /p:Platform=x64 ebpf-for-windows.sln
```
Or, to use the Visual Studio UI, open the solution in Visual Studio:
```
> ebpf-for-windows.sln
```
Next, right click on the solution in the Solution Explorer and select "Restore NuGet Packages".
Then set the configuration to Debug and the platform to x64 (if not already set), and
compile it with "Build->Build Solution".
Building the solution may generate some compiler warnings, but should still
compile successfully.
# 4. Installing the eBPF netsh helper on Windows
Now we're ready to learn how to use eBPF on Windows. For this tutorial, we only need to install the `netsh` helper.
From an Admin command shell, do the following from your ebpf-for-windows directory:
```
> copy x64\Debug\*.dll %windir%\system32
> netsh add helper %windir%\system32\ebpfnetsh.dll
```
# 5. Verifying eBPF programs on Windows
Normally verification happens at the time an eBPF program is submitted to be loaded. That can be done,
but in this tutorial, we'll just do verification _without_ needing to load the program. This allows this
tutorial to be done on any machine, not just one with the eBPF driver installed into the kernel.
**Step 1)** Enumerate sections
In step 4 of part 2, we saw how to use `llvm-objdump -h` to list all sections
in an object file. We'll now do the same with `netsh`. Do the following from
the directory you used for part 1 (replace "Release" in the path with "Debug" if you only
built a Debug version in step 3):
```
> netsh ebpf show sections bpf.o
Section Type # Maps Size
==================== ====== ====== ======
.text 1 0 2
> netsh ebpf show sections bpf-d.o
Section Type # Maps Size
==================== ====== ====== ======
.text 1 0 2
> netsh ebpf show sections bpf2.o
Section Type # Maps Size
==================== ====== ====== ======
myprog 1 0 2
another 1 0 2
```
Notice that it only lists non-empty TEXT sections, whereas `llvm-objdump -h`
showed all sections. That's because netsh is just looking for eBPF
programs, which are always in non-empty TEXT sections.
`netsh` allows all keywords to be abbreviated, so we could have done
`netsh ebpf sh sec bpf.o` instead. Throughout this tutorial, we'll always spell
things out for readability, but feel free to abbreviate to save typing.
**Step 2)** Run the verifier on our sample program
```
> netsh ebpf show verification bpf.o
Verification succeeded
```
The verification command succeeded because there was only one
non-empty TEXT section in bpf.o, so the verifier found it and used that
as the eBPF program to verify. If we try the same on an object file with
multiple such sections, we get this:
```
> netsh ebpf show verification bpf2.o
Verification succeeded
```
This is because the verifier ran on the *first* eBPF program it found,
which was "myprog" in the section listing. We can explicitly
specify the section to use as follows:
```
> netsh ebpf show verification bpf2.o myprog
Verification succeeded
> netsh ebpf show verification bpf2.o another
Verification succeeded
```
**Step 2)** View disassembly
In step 2 of part 2, we saw how to use "llvm-objdump -S" to view disassembly.
We'll now do the same with `netsh`:
```
> netsh ebpf show disassembly bpf.o
0: r0 = 0
1: exit
```
You can see that the two instructions match the two seen back in step 2 of
part 2. Again for bpf2.o we
can specify which section to use, since there is more than one:
```
> netsh ebpf show disassembly bpf2.o myprog
0: r0 = 0
1: exit
> netsh ebpf show disassembly bpf2.o another
0: r0 = 1
1: exit
```
**Step 3)** View program stats
One can view various stats about the program, without running the
verification process, using the "level=verbose" option to "show section":
```
> netsh ebpf show section bpf.o .text verbose
Section : .text
Program Type : 1
# Maps : 0
Size : 2 instructions
adjust_head : 0
arith : 0
arith32 : 0
arith64 : 1
assign : 1
basic_blocks : 2
call_1 : 0
call_mem : 0
call_nomem : 0
joins : 0
jumps : 0
load : 0
load_store : 0
map_in_map : 0
other : 2
packet_access: 0
store : 0
```
So for our tiny bpf.c program that just does `return 0;`, we can see that
it has 2 instructions, in 2 basic blocks, with 1 assign and no jumps or
joins.
**Step 4)** View verifier verbose output
We can view verbose output to see what the verifier is actually doing,
using the "level=verbose" option to "show verification":
```
> netsh ebpf show verification bpf.o level=verbose
Preconditions : {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[0]
}
Stack: Numbers -> {}
entry:
goto 0;
Postconditions: {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[1]
}
Stack: Numbers -> {}
Preconditions : {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[1]
}
Stack: Numbers -> {}
0:
r0 = 0;
goto 1;
Postconditions: {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[3], r0.value=[0], r0.type=number
}
Stack: Numbers -> {}
Preconditions : {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[3], r0.value=[0], r0.type=number
}
Stack: Numbers -> {}
1:
assert r0 is number;
exit;
goto exit;
Postconditions: {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[6], r0.value=[0], r0.type=number
}
Stack: Numbers -> {}
Preconditions : {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[6], r0.value=[0], r0.type=number
}
Stack: Numbers -> {}
exit:
Postconditions: {r10.value=[512, 2147418112], r10.offset=[512], r10.type=stack_pointer, r1.value=[1, 2147418112], r1.offset=[0], r1.type=ctx_pointer, packet_size=[0, 65534], meta_offset=[-4098, 0], instruction_count=[7], r0.value=[0], r0.type=number
}
Stack: Numbers -> {}
0 errors
Verification succeeded
```
Normally we wouldn't need to do this, but it is illustrative to see how the
verifier works.
Each instruction is shown as before, but is preceded by its preconditions
(or inputs), and followed by its postconditions (or outputs).
"oo" means infinity, "r0" through "r10" are registers (r10 is the stack
pointer, r0 is used for return values, r1-5 are used to pass args to other
functions, r6 is the 'ctx' pointer, etc.
"meta_offset" is the number of bytes of packet metadata preceding (i.e.,
with negative offset from) the start of the packet buffer.
# 6. Advanced Topics
## 6.1. Hooks and arguments
Hook points are callouts exposed by the system to which eBPF programs can
attach. By convention, the section name of the eBPF program in an ELF file
is commonly used to designate which hook point the eBPF program is designed
for. Specifically, a set of prefix strings are used to match against the
section name. For example, any section name starting with "xdp" is meant
as an XDP layer program. This is a convenient default, but can be
overridden by an app asking to load an eBPF program, such as when the eBPF program is simply in the
".text" section.
Each hook point has a specified prototype which must be understood by the
verifier. That is, the verifier needs to understand all the hooks for the
specified platform on which the eBPF program will execute. The hook points
are in general different for Linux vs. Windows, as are the prototypes for
hook points that might be similarly named.
Typically the first and only argument of the hook point is a context
structure which contains an arbitrary amount of data. (Tail calls to
programs can have more than one argument, but hooks put all the info in a
hook-specific context structure passed as one argument.)
Let's say that the "xdp" hook point has the following prototype:
```
// xdp.h
#include <stdint.h>
typedef struct _ebpf_xdp_args
{
void* data;
void* data_end;
uint64_t data_meta;
} ebpf_xdp_args_t;
typedef int (*xdp_callout)(ebpf_xdp_args_t* args);
```
A sample eBPF program might look like this:
```
#include "xdp.h"
// Put "xdp" in the section name to specify XDP as the hook.
// The __attribute__ below has the same effect as the
// clang pragma used in section 2 of this tutorial.
__attribute__((section("xdp"), used))
int my_xdp_parser(ebpf_xdp_args_t* args)
{
int length = (char *)args->data_end - (char *)args->data;
if (length > 1) {
return 1; // allow
}
return 0; // block
}
```
The verifier needs to be enlightened with the same prototype or all
programs written for that hook will fail verification. For Windows,
this info is in the [windows_platform.cpp](../src/ebpf/libs/api/windows_platform.cpp) file,
which for the above prototype might have:
```
const EbpfContextDescriptor g_xdp_context_descriptor = {
24, // Size of ctx struct.
0, // Offset into ctx struct of pointer to data, or -1 if none.
8, // Offset into ctx struct of pointer to end of data, or -1 if none.
16, // Offset into ctx struct of pointer to metadata, or -1 if none.
};
const EbpfProgramType windows_xdp_program_type =
PTYPE("xdp", // Just for printing messages to users.
&g_xdp_context_descriptor,
EBPF_PROG_TYPE_XDP,
{"xdp"}); // Set of section name prefixes for matching.
```
Let's look at the code above in more detail. The EbpfContextDescriptor
info (i.e., `g_xdp_context_descriptor`) tells the verifier about the format
of the context structure (i.e., `struct ebpf_xdp_args`). The struct is
24 bytes long, includes packet data, and so the scalar fields that
are safe to access start at offset 16.
With the above, our sample program will pass verification:
```
> netsh ebpf show verification myxdp.o
Verification succeeded
```
What would have happened had the prototype not matched? Let's say the
verifier is the same as above but xdp.h instead had a different struct
definition:
```
typedef struct _ebpf_xdp_args
{
uint64_t more;
uint64_t stuff;
uint64_t here;
void* data;
void* data_end;
uint64_t data_meta;
} ebpf_xdp_args_t;
```
Now our sample program that checks the length would now be looking for
the data starting at offset 24, which is past the end of what the verifier
thinks the context structure size is, and the verifier fails the program:
```
> netsh ebpf show verification myxdp.o
error: Verification failed
Verification report:
0: r2 = *(u64 *)(r1 + 24)
assertion failed: Upper bound must be at most 24 (valid_access(r1, 24:8))
Code is unreachable after 0
1 errors
```
Notice that the verifier is complaining about access to memory pointed to
by r1 (since the first argument is in register R1) past the end of the
valid buffer of size 24. This illustrates why ideally the same header
file (xdp.h in the above example) should be included by the ebpf program,
the component exposing the hook, and the verifier itself, e.g., so that
the size of the context struct could be `sizeof(ebpf_xdp_args_t)`
rather than hardcoding the number 24 in the above example.
## 6.2. Helper functions and arguments
Now that we've seen how hooks work, let's look at how calls from an eBPF
program into helper functions exposed by the system are verified.
As with hook prototypes, the set of helper functions and their prototypes
can vary by platform. For comparison, helpers for Linux are documented in the
[IOVisor bpf helpers documentation](https://github.com/iovisor/bpf-docs/blob/master/bpf_helpers.rst).
Let's say the following helper function prototype is exposed by Windows:
```
// helpers.h
#include <stdint.h>
struct ebpf_map;
static int (*ebpf_map_update_elem)(struct ebpf_map* map, const void* key, const void* value, uint64_t flags) = (void*) 2;
```
We'll cover in section 6.3 what this function does, but for now we only care about the prototype.
We can create a sample (but, as we will see, invalid) program like so:
```
#include "helpers.h"
int func()
{
int result = ebpf_map_update_elem((struct ebpf_map*)0, (uint32_t*)0, (uint32_t*)0, 0);
return result;
}
```
Let's compile it and see what it looks like. Here we compile with `-g`
to include source line info:
```
> clang -target bpf -Wall -g -O2 -c helpers.c -o helpers.o
> llvm-objdump --triple bpf -S helpers.o
helpers.o: file format ELF64-BPF
Disassembly of section .text:
0000000000000000 func:
; {
0: b7 01 00 00 00 00 00 00 r1 = 0
; int result = ebpf_map_update_elem((struct ebpf_map*)0, (uint32_t*)0, (uint32_t*)0, 0);
1: b7 02 00 00 00 00 00 00 r2 = 0
2: b7 03 00 00 00 00 00 00 r3 = 0
3: b7 04 00 00 00 00 00 00 r4 = 0
4: 85 00 00 00 02 00 00 00 call 2
; return result;
5: 95 00 00 00 00 00 00 00 exit
```
Now let's see how the verifier deals with this. The verifier needs to
know the prototype in order to verify that the eBPF program passes arguments
correctly, and handles the results correct (e.g., not passing an invalid
value in a pointer argument).
The verifier calls into a `get_helper_prototype(2)` API exposed by
platform-specific code to query the prototype for a given helper function.
The platform-specific code ([windows_helpers.cpp](../src/ebpf/libs/api/windows_helpers.cpp)) will return an entry like this one:
```
{// long ebpf_map_update_elem(struct ebpf_map *map, const void *key, const
// void *value, uint64_t flags);
.name = "ebpf_map_update_elem",
.return_type = EbpfHelperReturnType::INTEGER,
.argument_type =
{
EbpfHelperArgumentType::PTR_TO_MAP,
EbpfHelperArgumentType::PTR_TO_MAP_KEY,
EbpfHelperArgumentType::PTR_TO_MAP_VALUE,
EbpfHelperArgumentType::ANYTHING,
EbpfHelperArgumentType::DONTCARE,
}},
```
The above helps the verifier know the type and semantics of the arguments
and the return value.
```
> netsh ebpf show disassembly helpers.o
0: r1 = 0
1: r2 = 0
2: r3 = 0
3: r4 = 0
4: r0 = ebpf_map_update_elem:2(r1:FD, r2:K, r3:V, r4)
5: exit
> netsh ebpf show verification helpers.o
error: Verification failed
Verification report:
4: r0 = ebpf_map_update_elem:2(r1:FD, r2:K, r3:V, r4)
assertion failed: r1 is map_fd
Code is unreachable after 4
1 errors
```
As shown above, the verifier understands the function name and prototype,
and knows that the program is invalid because it is passing null instead
of a valid value. We'll come back to this in section 6.3 to see how to
use the helper correctly.
### 6.2.1. Why -O2?
This section is a slight digression, so skip ahead if you prefer. It's
important that we compiled with `-O2` throughout this tutorial. What
happens if we didn't compile with `-O2`? The disassembly looks instead
like this:
```
> clang -target bpf -Wall -g -c helpers.c -o helpers.o
> llvm-objdump --triple bpf -S helpers.o
helpers.o: file format ELF64-BPF
Disassembly of section .text:
0000000000000000 func:
; {
0: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
; int result = ebpf_map_update_elem((struct ebpf_map*)0, (uint32_t*)0, (uint32_t*)0, 0);
2: 79 11 00 00 00 00 00 00 r1 = *(u64 *)(r1 + 0)
3: b7 02 00 00 00 00 00 00 r2 = 0
4: 7b 1a f0 ff 00 00 00 00 *(u64 *)(r10 - 16) = r1
5: bf 21 00 00 00 00 00 00 r1 = r2
6: 7b 2a e8 ff 00 00 00 00 *(u64 *)(r10 - 24) = r2
7: 79 a3 e8 ff 00 00 00 00 r3 = *(u64 *)(r10 - 24)
8: 79 a4 e8 ff 00 00 00 00 r4 = *(u64 *)(r10 - 24)
9: 79 a5 f0 ff 00 00 00 00 r5 = *(u64 *)(r10 - 16)
10: 8d 00 00 00 05 00 00 00 callx 5
11: 63 0a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r0
; return result;
12: 61 a0 fc ff 00 00 00 00 r0 = *(u32 *)(r10 - 4)
13: 95 00 00 00 00 00 00 00 exit
```
The helper function is called in line 10 via the `callx` instruction
(0x8d), but importantly that instruction *is not listed in the
[eBPF spec](https://github.com/iovisor/bpf-docs/blob/master/eBPF.md)*!
Furthermore, the PREVAIL verifier's ELF parser also has problems with it.
Let's see
why. Unlike the optimized disassembly where the helper id is encoded in
the instruction, here the value 32 (0x20) is encoded in the data section:
```
> llvm-objdump --triple bpf -s helpers.o --section .data
helpers.o: file format ELF64-BPF
Contents of section .data:
0000 02000000 00000000
```
An entry also appears in the relocation section, which we can see as follows.
Since we compiled with `-g`, there are also relocation sections for debug
symbols so we use `-section` to specify the code (i.e., text) section only,
where without it llvm-objdump will dump all of them.
```
> llvm-objdump --triple bpf --section .rel.text -r helpers.o
helpers.o: file format ELF64-BPF
RELOCATION RECORDS FOR [.rel.text]:
0000000000000000 R_BPF_64_64 .data
```
However the verifier's ELF parser only handles relocation records for
maps (which we'll cover next), not helper functions, since in "correct" eBPF bytecode (i.e.,
bytecode conforming to the eBPF spec), relocation records are always for
maps. So if you forget to compile with -O2, it will fail elf parsing even
before trying to verify the bytecode.
## 6.3. Maps
Now that we've seen how helpers work, let's move on to
[maps](https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md#maps),
which are memory structures that can be shared between eBPF programs and/or
applications. They are typically used to store state between invocations
of eBPF programs, or to expose information (e.g., statistics) to applications.
To see how maps are exposed to eBPF programs, let's first start from a
plain eBPF program:
```
__attribute__((section("myprog"), used))
int func()
{
return 0;
}
```
We can add a reference to a map to which the program will have access
by creating a `maps` section as follows. We'll use a "per-CPU array"
in this example so that there are no race conditions or corrupted data
if multiple instances of our program are simultaneously running on different
CPUs.
```
#include <stdint.h>
struct ebpf_map {
uint32_t size;
uint32_t type;
uint32_t key_size;
uint32_t value_size;
uint32_t max_entries;
};
#define BPF_MAP_TYPE_PERCPU_ARRAY 1
__attribute__((section("maps"), used))
struct ebpf_map map =
{sizeof(struct ebpf_map), BPF_MAP_TYPE_PERCPU_ARRAY, 2, 4, 512};
__attribute__((section("myprog"), used))
int func()
{
return 0;
}
```
So far the program doesn't actually use the map, but the presence of
the maps section means that when the program is loaded, the system
will look for the given map and create one if it doesn't already exist,
using the map parameters specified. We can see the fields encoded
into the `maps` section as follows:
```
> clang -target bpf -Wall -g -O2 -c maponly.c -o maponly.o
> llvm-objdump -s -section maps maponly.o
maponly.o: file format ELF64-BPF
Contents of section maps:
0000 14000000 01000000 02000000 04000000 ................
0010 00020000 ....
```
Now to make use of the map, we have to use helper functions to access it:
```
void *ebpf_map_lookup_elem(struct ebpf_map* map, const void* key);
int ebpf_map_update_elem(struct ebpf_map* map, const void* key, const void* value, uint64_t flags);
int ebpf_map_delete_elem(struct ebpf_map* map, const void* key);
```
Let's update the program to write the value "42" to the map section for the
current CPU, by changing the "myprog" section to the following:
```
static void* (*ebpf_map_lookup_elem)(struct ebpf_map* map, const void* key) = (void*) 0;
static int (*ebpf_map_update_elem)(struct ebpf_map *map, const void *key, const void *value, uint64_t flags) = (void*) 1;
__attribute__((section("myprog"), used))
int func1()
{
uint32_t key = 0;
uint32_t value = 42;
int result = ebpf_map_update_elem(&map, &key, &value, 0);
return result;
}
```
This program results in the following disassembly:
```
> llvm-objdump -S -section=myprog map.o
map.o: file format ELF64-BPF
Disassembly of section myprog:
func1:
; {
0: b7 01 00 00 00 00 00 00 r1 = 0
; uint32_t key = 0;
1: 63 1a fc ff 00 00 00 00 *(u32 *)(r10 - 4) = r1
2: b7 01 00 00 2a 00 00 00 r1 = 42
; uint32_t value = 42;
3: 63 1a f8 ff 00 00 00 00 *(u32 *)(r10 - 8) = r1
4: bf a2 00 00 00 00 00 00 r2 = r10
; uint32_t key = 0;
5: 07 02 00 00 fc ff ff ff r2 += -4
6: bf a3 00 00 00 00 00 00 r3 = r10
7: 07 03 00 00 f8 ff ff ff r3 += -8
; int result = ebpf_map_update_elem(&map, &key, &value, 0);
8: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll
10: b7 04 00 00 00 00 00 00 r4 = 0
11: 85 00 00 00 01 00 00 00 call 1
; return result;
12: 95 00 00 00 00 00 00 00 exit
```
Above shows "call 1", but `netsh` shows more details
```
> netsh ebpf show disassembly map.o
0: r1 = 0
1: *(u32 *)(r10 - 4) = r1
2: r1 = 42
3: *(u32 *)(r10 - 8) = r1
4: r2 = r10
5: r2 += -4
6: r3 = r10
7: r3 += -8
8: r1 = map_fd 1026
10: r4 = 0
11: r0 = ebpf_map_update_elem:1(r1:FD, r2:K, r3:V, r4)
12: exit
````
Notice from instruction 11 that `netsh` understands that
`ebpf_map_update_elem()` expects
a file descriptor (FD) in R1, a key in R2, a value in R3, and R4 can be
anything.
R1 was set in instruction 8 to a map FD value of 1026. Where did that value
come from, since the llvm-objdump disassembly didn't have it? The
create_map_crab() function in the Prevail verifier creates a dummy value
based on (value_size * 256) + key_size. Since we passed
value_size = 4 and key_size = 2, this gives us 1026. When installed,
this value gets replaced with a real map address. Let's see how that happens.
Now that we're actually using the map, rather than just defining it,
the relocation section is also populated. The relocation section for
a program is in a section with the ".rel" prefix followed by the
program section name ("myprog" in this example):
```
> llvm-objdump --triple bpf -section=.relmyprog -r map.o
map.o: file format ELF64-BPF
RELOCATION RECORDS FOR [.relmyprog]:
0000000000000040 R_BPF_64_64 map
```
This record means that the actual address of `map` should be inserted at
offset 0x40, but where is that? llvm-objdump and check both gave us
instruction numbers not offsets, but we can see the raw bytes as follows:
```
> llvm-objdump -s -section=myprog map.o
map.o: file format ELF64-BPF
Contents of section myprog:
0000 b7010000 00000000 631afcff 00000000 ........c.......
0010 b7010000 2a000000 631af8ff 00000000 ....*...c.......
0020 bfa20000 00000000 07020000 fcffffff ................
0030 bfa30000 00000000 07030000 f8ffffff ................
0040 18010000 00000000 00000000 00000000 ................
0050 b7040000 00000000 85000000 01000000 ................
0060 95000000 00000000 ........
```
We see that offset 0x40 has "18010000 00000000 00000000 00000000".
Looking back at the llvm-objdump disassembly above, we see that
is indeed instruction 8.
So, to summarize, the verifier operates on pseudo FDs, not actual
FDs or addresses. When the program is actually installed, the relocation
section will be used to insert the actual map address into the executable
code.