Граф коммитов

1173387 Коммитов

Автор SHA1 Сообщение Дата
Eduard Zingerman f323a81806 selftests/bpf: verifier/spin_lock converted to inline assembly
Test verifier/spin_lock automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-21-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:25:31 -07:00
Eduard Zingerman 426fc0e3fc selftests/bpf: verifier/sock converted to inline assembly
Test verifier/sock automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-20-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:25:19 -07:00
Eduard Zingerman 034d9ad25d selftests/bpf: verifier/search_pruning converted to inline assembly
Test verifier/search_pruning automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-19-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:25:07 -07:00
Eduard Zingerman 65222842ca selftests/bpf: verifier/runtime_jit converted to inline assembly
Test verifier/runtime_jit automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-18-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:24:41 -07:00
Eduard Zingerman 16a42573c2 selftests/bpf: verifier/regalloc converted to inline assembly
Test verifier/regalloc automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-17-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:23:40 -07:00
Eduard Zingerman 8be6327959 selftests/bpf: verifier/ref_tracking converted to inline assembly
Test verifier/ref_tracking automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-16-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:23:13 -07:00
Eduard Zingerman aee1779f0d selftests/bpf: verifier/map_ptr_mixing converted to inline assembly
Test verifier/map_ptr_mixing automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-13-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:20:38 -07:00
Eduard Zingerman 4a400ef9ba selftests/bpf: verifier/map_in_map converted to inline assembly
Test verifier/map_in_map automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-12-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:20:26 -07:00
Eduard Zingerman b427ca576f selftests/bpf: verifier/lwt converted to inline assembly
Test verifier/lwt automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-11-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:19:20 -07:00
Eduard Zingerman a6fc14dc5e selftests/bpf: verifier/loops1 converted to inline assembly
Test verifier/loops1 automatically converted to use inline assembly.

There are a few modifications for the converted tests.
"tracepoint" programs do not support test execution, change program
type to "xdp" (which supports test execution) for the following tests
that have __retval tags:
- bounded loop, count to 4
- bonded loop containing forward jump

Also, remove the __retval tag for test:
- bounded loop, count from positive unknown to 4

As it's return value is a random number.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-10-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:19:07 -07:00
Eduard Zingerman a5828e3154 selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly
Test verifier/jeq_infer_not_null automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-9-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:18:55 -07:00
Eduard Zingerman 0a372c9c08 selftests/bpf: verifier/direct_packet_access converted to inline assembly
Test verifier/direct_packet_access automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:18:44 -07:00
Eduard Zingerman 6080280243 selftests/bpf: verifier/d_path converted to inline assembly
Test verifier/d_path automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:18:16 -07:00
Eduard Zingerman fcd36964f2 selftests/bpf: verifier/ctx converted to inline assembly
Test verifier/ctx automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:18:03 -07:00
Eduard Zingerman 37467c79e1 selftests/bpf: verifier/btf_ctx_access converted to inline assembly
Test verifier/btf_ctx_access automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:17:51 -07:00
Eduard Zingerman 965a3f913e selftests/bpf: verifier/bpf_get_stack converted to inline assembly
Test verifier/bpf_get_stack automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:17:39 -07:00
Eduard Zingerman c92336559a selftests/bpf: verifier/bounds converted to inline assembly
Test verifier/bounds automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:17:14 -07:00
Eduard Zingerman 63bb645b9d selftests/bpf: Add notion of auxiliary programs for test_loader
In order to express test cases that use bpf_tail_call() intrinsic it
is necessary to have several programs to be loaded at a time.
This commit adds __auxiliary annotation to the set of annotations
supported by test_loader.c. Programs marked as auxiliary are always
loaded but are not treated as a separate test.

For example:

    void dummy_prog1(void);

    struct {
            __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
            __uint(max_entries, 4);
            __uint(key_size, sizeof(int));
            __array(values, void (void));
    } prog_map SEC(".maps") = {
            .values = {
                    [0] = (void *) &dummy_prog1,
            },
    };

    SEC("tc")
    __auxiliary
    __naked void dummy_prog1(void) {
            asm volatile ("r0 = 42; exit;");
    }

    SEC("tc")
    __description("reference tracking: check reference or tail call")
    __success __retval(0)
    __naked void check_reference_or_tail_call(void)
    {
            asm volatile (
            "r2 = %[prog_map] ll;"
            "r3 = 0;"
            "call %[bpf_tail_call];"
            "r0 = 0;"
            "exit;"
            :: __imm(bpf_tail_call),
            :  __clobber_all);
    }

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 12:16:56 -07:00
Alexei Starovoitov d7a799ec78 Merge branch 'bpf: add netfilter program type'
Florian Westphal says:

====================
Changes since last version:
- rework test case in last patch wrt. ctx->skb dereference etc (Alexei)
- pacify bpf ci tests, netfilter program type missed string translation
  in libbpf helper.

This still uses runtime btf walk rather than extending
the btf trace array as Alexei suggested, I would do this later (or someone else can).

v1 cover letter:

Add minimal support to hook bpf programs to netfilter hooks, e.g.
PREROUTING or FORWARD.

For this the most relevant parts for registering a netfilter
hook via the in-kernel api are exposed to userspace via bpf_link.

The new program type is 'tracing style', i.e. there is no context
access rewrite done by verifier, the function argument (struct bpf_nf_ctx)
isn't stable.
There is no support for direct packet access, dynptr api should be used
instead.

With this its possible to build a small test program such as:

 #include "vmlinux.h"
extern int bpf_dynptr_from_skb(struct __sk_buff *skb, __u64 flags,
                               struct bpf_dynptr *ptr__uninit) __ksym;
extern void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, uint32_t offset,
                                   void *buffer, uint32_t buffer__sz) __ksym;
SEC("netfilter")
int nf_test(struct bpf_nf_ctx *ctx)
{
	struct nf_hook_state *state = ctx->state;
	struct sk_buff *skb = ctx->skb;
	const struct iphdr *iph, _iph;
	const struct tcphdr *th, _th;
	struct bpf_dynptr ptr;

	if (bpf_dynptr_from_skb(skb, 0, &ptr))
		return NF_DROP;

	iph = bpf_dynptr_slice(&ptr, 0, &_iph, sizeof(_iph));
	if (!iph)
		return NF_DROP;

	th = bpf_dynptr_slice(&ptr, iph->ihl << 2, &_th, sizeof(_th));
	if (!th)
		return NF_DROP;

	bpf_printk("accept %x:%d->%x:%d, hook %d ifin %d\n",
		   iph->saddr, bpf_ntohs(th->source), iph->daddr,
		   bpf_ntohs(th->dest), state->hook, state->in->ifindex);
        return NF_ACCEPT;
}

Then, tail /sys/kernel/tracing/trace_pipe.

Changes since v3:
- uapi: remove 'reserved' struct member, s/prio/priority (Alexei)
- add ctx access test cases (Alexei, see last patch)
- some arm32 can only handle cmpxchg on u32 (build bot)
- Fix kdoc annotations (Simon Horman)
- bpftool: prefer p_err, not fprintf (Quentin)
- add test cases in separate patch

Changes since v2:
1. don't WARN when user calls 'bpftool loink detach' twice
   restrict attachment to ip+ip6 families, lets relax this
   later in case arp/bridge/netdev are needed too.
2. show netfilter links in 'bpftool net' output as well.

Changes since v1:
1. Don't fail to link when CONFIG_NETFILTER=n (build bot)
2. Use test_progs instead of test_verifier (Alexei)

Changes since last RFC version:
1. extend 'bpftool link show' to print prio/hooknum etc
2. extend 'nft list hooks' so it can print the bpf program id
3. Add an extra patch to artificially restrict bpf progs with
   same priority.  Its fine from a technical pov but it will
   cause ordering issues (most recent one comes first).
   Can be removed later.
4. Add test_run support for netfilter prog type and a small
   extension to verifier tests to make sure we can't return
   verdicts like NF_STOLEN.
5. Alter the netfilter part of the bpf_link uapi struct:
   - add flags/reserved members.
  Not used here except returning errors when they are nonzero.
  Plan is to allow the bpf_link users to enable netfilter
  defrag or conntrack engine by setting feature flags at
  link create time in the future.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 11:37:41 -07:00
Florian Westphal 006c0e44ed selftests/bpf: add missing netfilter return value and ctx access tests
Extend prog_tests with two test cases:

 # ./test_progs --allow=verifier_netfilter_retcode
 #278/1   verifier_netfilter_retcode/bpf_exit with invalid return code. test1:OK
 #278/2   verifier_netfilter_retcode/bpf_exit with valid return code. test2:OK
 #278/3   verifier_netfilter_retcode/bpf_exit with valid return code. test3:OK
 #278/4   verifier_netfilter_retcode/bpf_exit with invalid return code. test4:OK
 #278     verifier_netfilter_retcode:OK

This checks that only accept and drop (0,1) are permitted.

NF_QUEUE could be implemented later if we can guarantee that attachment
of such programs can be rejected if they get attached to a pf/hook that
doesn't support async reinjection.

NF_STOLEN could be implemented via trusted helpers that can guarantee
that the skb will eventually be free'd.

v4: test case for bpf_nf_ctx access checks, requested by Alexei Starovoitov.
v5: also check ctx->{state,skb} can be dereferenced (Alexei).

 # ./test_progs --allow=verifier_netfilter_ctx
 #281/1   verifier_netfilter_ctx/netfilter invalid context access, size too short:OK
 #281/2   verifier_netfilter_ctx/netfilter invalid context access, size too short:OK
 #281/3   verifier_netfilter_ctx/netfilter invalid context access, past end of ctx:OK
 #281/4   verifier_netfilter_ctx/netfilter invalid context, write:OK
 #281/5   verifier_netfilter_ctx/netfilter valid context read and invalid write:OK
 #281/6   verifier_netfilter_ctx/netfilter test prog with skb and state read access:OK
 #281/7   verifier_netfilter_ctx/netfilter test prog with skb and state read access @unpriv:OK
 #281     verifier_netfilter_ctx:OK
Summary: 1/7 PASSED, 0 SKIPPED, 0 FAILED

This checks:
1/2: partial reads of ctx->{skb,state} are rejected
3. read access past sizeof(ctx) is rejected
4. write to ctx content, e.g. 'ctx->skb = NULL;' is rejected
5. ctx->state content cannot be altered
6. ctx->state and ctx->skb can be dereferenced
7. ... same program fails for unpriv (CAP_NET_ADMIN needed).

Link: https://lore.kernel.org/bpf/20230419021152.sjq4gttphzzy6b5f@dhcp-172-26-102-232.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20230420201655.77kkgi3dh7fesoll@MacBook-Pro-6.local/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-8-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 11:34:50 -07:00
Florian Westphal 2b99ef22e0 bpf: add test_run support for netfilter program type
add glue code so a bpf program can be run using userspace-provided
netfilter state and packet/skb.

Default is to use ipv4:output hook point, but this can be overridden by
userspace.  Userspace provided netfilter state is restricted, only hook and
protocol families can be overridden and only to ipv4/ipv6.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-7-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 11:34:50 -07:00
Florian Westphal d0fe92fb5e tools: bpftool: print netfilter link info
Dump protocol family, hook and priority value:
$ bpftool link
2: netfilter  prog 14
        ip input prio -128
        pids install(3264)
5: netfilter  prog 14
        ip6 forward prio 21
        pids a.out(3387)
9: netfilter  prog 14
        ip prerouting prio 123
        pids a.out(5700)
10: netfilter  prog 14
        ip input prio 21
        pids test2(5701)

v2: Quentin Monnet suggested to also add 'bpftool net' support:

$ bpftool net
xdp:

tc:

flow_dissector:

netfilter:

        ip prerouting prio 21 prog_id 14
        ip input prio -128 prog_id 14
        ip input prio 21 prog_id 14
        ip forward prio 21 prog_id 14
        ip output prio 21 prog_id 14
        ip postrouting prio 21 prog_id 14

'bpftool net' only dumps netfilter link type, links are sorted by protocol
family, hook and priority.

v5: fix bpf ci failure: libbpf needs small update to prog_type_name[]
    and probe_prog_load helper.
v4: don't fail with -EOPNOTSUPP in libbpf probe_prog_load, update
    prog_type_name[] with "netfilter" entry (bpf ci)
v3: fix bpf.h copy, 'reserved' member was removed (Alexei)
    use p_err, not fprintf (Quentin)

Suggested-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/eeeaac99-9053-90c2-aa33-cc1ecb1ae9ca@isovalent.com/
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-6-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 11:34:49 -07:00
Florian Westphal 0bdc6da88f netfilter: disallow bpf hook attachment at same priority
This is just to avoid ordering issues between multiple bpf programs,
this could be removed later in case it turns out to be too cautious.

bpf prog could still be shared with non-bpf hook, otherwise we'd have to
make conntrack hook registration fail just because a bpf program has
same priority.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-5-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 11:34:14 -07:00
Florian Westphal 506a74db7e netfilter: nfnetlink hook: dump bpf prog id
This allows userspace ("nft list hooks") to show which bpf program
is attached to which hook.

Without this, user only knows bpf prog is attached at prio
x, y, z at INPUT and FORWARD, but can't tell which program is where.

v4: kdoc fixups (Simon Horman)

Link: https://lore.kernel.org/bpf/ZEELzpNCnYJuZyod@corigine.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-4-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 11:34:14 -07:00
Florian Westphal fd9c663b9a bpf: minimal support for programs hooked into netfilter framework
This adds minimal support for BPF_PROG_TYPE_NETFILTER bpf programs
that will be invoked via the NF_HOOK() points in the ip stack.

Invocation incurs an indirect call.  This is not a necessity: Its
possible to add 'DEFINE_BPF_DISPATCHER(nf_progs)' and handle the
program invocation with the same method already done for xdp progs.

This isn't done here to keep the size of this chunk down.

Verifier restricts verdicts to either DROP or ACCEPT.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-3-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 11:34:14 -07:00
Florian Westphal 84601d6ee6 bpf: add bpf_link support for BPF_NETFILTER programs
Add bpf_link support skeleton.  To keep this reviewable, no bpf program
can be invoked yet, if a program is attached only a c-stub is called and
not the actual bpf program.

Defaults to 'y' if both netfilter and bpf syscall are enabled in kconfig.

Uapi example usage:
	union bpf_attr attr = { };

	attr.link_create.prog_fd = progfd;
	attr.link_create.attach_type = 0; /* unused */
	attr.link_create.netfilter.pf = PF_INET;
	attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN;
	attr.link_create.netfilter.priority = -128;

	err = bpf(BPF_LINK_CREATE, &attr, sizeof(attr));

... this would attach progfd to ipv4:input hook.

Such hook gets removed automatically if the calling program exits.

BPF_NETFILTER program invocation is added in followup change.

NF_HOOK_OP_BPF enum will eventually be read from nfnetlink_hook, it
allows to tell userspace which program is attached at the given hook
when user runs 'nft hook list' command rather than just the priority
and not-very-helpful 'this hook runs a bpf prog but I can't tell which
one'.

Will also be used to disallow registration of two bpf programs with
same priority in a followup patch.

v4: arm32 cmpxchg only supports 32bit operand
    s/prio/priority/
v3: restrict prog attachment to ip/ip6 for now, lets lift restrictions if
    more use cases pop up (arptables, ebtables, netdev ingress/egress etc).

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-2-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 11:34:14 -07:00
Kui-Feng Lee 45cea721ea bpftool: Update doc to explain struct_ops register subcommand.
The "struct_ops register" subcommand now allows for an optional *LINK_DIR*
to be included. This specifies the directory path where bpftool will pin
struct_ops links with the same name as their corresponding map names.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/r/20230420002822.345222-2-kuifeng@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 11:10:10 -07:00
Kui-Feng Lee 0232b78897 bpftool: Register struct_ops with a link.
You can include an optional path after specifying the object name for the
'struct_ops register' subcommand.

Since the commit 226bc6ae64 ("Merge branch 'Transit between BPF TCP
congestion controls.'") has been accepted, it is now possible to create a
link for a struct_ops. This can be done by defining a struct_ops in
SEC(".struct_ops.link") to make libbpf returns a real link. If we don't pin
the links before leaving bpftool, they will disappear. To instruct bpftool
to pin the links in a directory with the names of the maps, we need to
provide the path of that directory.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/r/20230420002822.345222-1-kuifeng@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21 11:10:10 -07:00
Stanislav Fomichev 833d67ecdc selftests/bpf: Verify optval=NULL case
Make sure we get optlen exported instead of getting EFAULT.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230418225343.553806-3-sdf@google.com
2023-04-21 17:10:34 +02:00
Stanislav Fomichev 00e74ae086 bpf: Don't EFAULT for getsockopt with optval=NULL
Some socket options do getsockopt with optval=NULL to estimate the size
of the final buffer (which is returned via optlen). This breaks BPF
getsockopt assumptions about permitted optval buffer size. Let's enforce
these assumptions only when non-NULL optval is provided.

Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
Reported-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/ZD7Js4fj5YyI2oLd@google.com/T/#mb68daf700f87a9244a15d01d00c3f0e5b08f49f7
Link: https://lore.kernel.org/bpf/20230418225343.553806-2-sdf@google.com
2023-04-21 17:09:53 +02:00
Jakub Kicinski ca28896580 wireless-next patches for v6.4
Most likely the last -next pull request for v6.4. We have changes all
 over. rtw88 now supports SDIO bus and iwlwifi continues to work on
 Wi-Fi 7 support. Not much stack changes this time.
 
 Major changes:
 
 cfg80211/mac80211
 
 * fix some Fine Time Measurement (FTM) frames not being bufferable
 
 * flush frames before key removal to avoid potential unencrypted
   transmission depending on the hardware design
 
 iwlwifi
 
 * preparation for Wi-Fi 7 EHT and multi-link support
 
 rtw88
 
 * SDIO bus support
 
 * RTL8822BS, RTL8822CS and RTL8821CS SDIO chipset support
 
 rtw89
 
 * framework firmware backwards compatibility
 
 brcmfmac
 
 * Cypress 43439 SDIO support
 
 mt76
 
 * mt7921 P2P support
 
 * mt7996 mesh A-MSDU support
 
 * mt7996 EHT support
 
 * mt7996 coredump support
 
 wcn36xx
 
 * support for pronto v3 hardware
 
 ath11k
 
 * PCIe DeviceTree bindings
 
 * WCN6750: enable SAR support
 
 ath10k
 
 * convert DeviceTree bindings to YAML
 -----BEGIN PGP SIGNATURE-----
 
 iQFFBAABCgAvFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmRCaTURHGt2YWxvQGtl
 cm5lbC5vcmcACgkQbhckVSbrbZvcRwf+NcLS4HbmqGZhBxl2LZVZ6AFCBM4ijDlO
 pxdMiC4UxT+UApY1/9YXo0VS97M7paDJH+R/g1HcTvvKURHCmsdhYHm+R1MH+/uD
 r8RfvJg4VtNnlUpsJh9jxt+e697KP15M7DF0sFlQzdIoTUl13Hp7YhI76zunAbAN
 u1FBcVVJiCcJWbLolMzqAeBMUWUEG+GtHF6Zn5kChVU/p1nmwJMPUG3Qvb61a7Yc
 BM1pQX8jQ8PBj+VrGPGvqX0BOdbxq0evauYScq2oTOhQ1fzTNWOsI1yI7AwApptR
 itwQ2t1UK/C/EWpvWIBSd0nit1uwSx0Zsu/nSZlbKbrvIFwd5XnfwQ==
 =Irrd
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:

====================
wireless-next patches for v6.4

Most likely the last -next pull request for v6.4. We have changes all
over. rtw88 now supports SDIO bus and iwlwifi continues to work on
Wi-Fi 7 support. Not much stack changes this time.

Major changes:

cfg80211/mac80211
 - fix some Fine Time Measurement (FTM) frames not being bufferable
 - flush frames before key removal to avoid potential unencrypted
   transmission depending on the hardware design

iwlwifi
 - preparation for Wi-Fi 7 EHT and multi-link support

rtw88
 - SDIO bus support
 - RTL8822BS, RTL8822CS and RTL8821CS SDIO chipset support

rtw89
 - framework firmware backwards compatibility

brcmfmac
 - Cypress 43439 SDIO support

mt76
 - mt7921 P2P support
 - mt7996 mesh A-MSDU support
 - mt7996 EHT support
 - mt7996 coredump support

wcn36xx
 - support for pronto v3 hardware

ath11k
 - PCIe DeviceTree bindings
 - WCN6750: enable SAR support

ath10k
 - convert DeviceTree bindings to YAML

* tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (261 commits)
  wifi: rtw88: Update spelling in main.h
  wifi: airo: remove ISA_DMA_API dependency
  wifi: rtl8xxxu: Simplify setting the initial gain
  wifi: rtl8xxxu: Add rtl8xxxu_write{8,16,32}_{set,clear}
  wifi: rtl8xxxu: Don't print the vendor/product/serial
  wifi: rtw88: Fix memory leak in rtw88_usb
  wifi: rtw88: call rtw8821c_switch_rf_set() according to chip variant
  wifi: rtw88: set pkg_type correctly for specific rtw8821c variants
  wifi: rtw88: rtw8821c: Fix rfe_option field width
  wifi: rtw88: usb: fix priority queue to endpoint mapping
  wifi: rtw88: 8822c: add iface combination
  wifi: rtw88: handle station mode concurrent scan with AP mode
  wifi: rtw88: prevent scan abort with other VIFs
  wifi: rtw88: refine reserved page flow for AP mode
  wifi: rtw88: disallow PS during AP mode
  wifi: rtw88: 8822c: extend reserved page number
  wifi: rtw88: add port switch for AP mode
  wifi: rtw88: add bitmap for dynamic port settings
  wifi: rtw89: mac: use regular int as return type of DLE buffer request
  wifi: mac80211: remove return value check of debugfs_create_dir()
  ...
====================

Link: https://lore.kernel.org/r/20230421104726.800BCC433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-21 07:35:51 -07:00
Magnus Karlsson 02e93e0475 selftests/xsk: Put MAP_HUGE_2MB in correct argument
Put the flag MAP_HUGE_2MB in the correct flags argument instead of the
wrong offset argument.

Fixes: 2ddade3229 ("selftests/xsk: Fix munmap for hugepage allocated umem")
Reported-by: Kal Cutter Conley <kal.conley@dectris.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230421062208.3772-1-magnus.karlsson@gmail.com
2023-04-21 16:35:10 +02:00
Dave Marchevsky 4ab07209d5 bpf: Fix bpf_refcount_acquire's refcount_t address calculation
When calculating the address of the refcount_t struct within a local
kptr, bpf_refcount_acquire_impl should add refcount_off bytes to the
address of the local kptr. Due to some missing parens, the function is
incorrectly adding sizeof(refcount_t) * refcount_off bytes. This patch
fixes the calculation.

Due to the incorrect calculation, bpf_refcount_acquire_impl was trying
to refcount_inc some memory well past the end of local kptrs, resulting
in kasan and refcount complaints, as reported in [0]. In that thread,
Florian and Eduard discovered that bpf selftests written in the new
style - with __success and an expected __retval, specifically - were
not actually being run. As a result, selftests added in bpf_refcount
series weren't really exercising this behavior, and thus didn't unearth
the bug.

With this fixed behavior it's safe to revert commit 7c4b96c000
("selftests/bpf: disable program test run for progs/refcounted_kptr.c"),
this patch does so.

  [0] https://lore.kernel.org/bpf/ZEEp+j22imoN6rn9@strlen.de/

Fixes: 7c50b1cb76 ("bpf: Add bpf_refcount_acquire kfunc")
Reported-by: Florian Westphal <fw@strlen.de>
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20230421074431.3548349-1-davemarchevsky@fb.com
2023-04-21 16:31:37 +02:00
Alexei Starovoitov acf1c3d68e bpf: Fix race between btf_put and btf_idr walk.
Florian and Eduard reported hard dead lock:
[   58.433327]  _raw_spin_lock_irqsave+0x40/0x50
[   58.433334]  btf_put+0x43/0x90
[   58.433338]  bpf_find_btf_id+0x157/0x240
[   58.433353]  btf_parse_fields+0x921/0x11c0

This happens since btf->refcount can be 1 at the time of btf_put() and
btf_put() will call btf_free_id() which will try to grab btf_idr_lock
and will dead lock.
Avoid the issue by doing btf_put() without locking.

Fixes: 3d78417b60 ("bpf: Add bpf_btf_find_by_name_kind() helper.")
Fixes: 1e89106da2 ("bpf: Add bpf_core_add_cands() and wire it into bpf_core_apply_relo_insn().")
Reported-by: Florian Westphal <fw@strlen.de>
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20230421014901.70908-1-alexei.starovoitov@gmail.com
2023-04-21 16:27:56 +02:00
Jianfeng Tan dfc39d4026 net/packet: support mergeable feature of virtio
Packet sockets, like tap, can be used as the backend for kernel vhost.
In packet sockets, virtio net header size is currently hardcoded to be
the size of struct virtio_net_hdr, which is 10 bytes; however, it is not
always the case: some virtio features, such as mrg_rxbuf, need virtio
net header to be 12-byte long.

Mergeable buffers, as a virtio feature, is worthy of supporting: packets
that are larger than one-mbuf size will be dropped in vhost worker's
handle_rx if mrg_rxbuf feature is not used, but large packets
cannot be avoided and increasing mbuf's size is not economical.

With this virtio feature enabled by virtio-user, packet sockets with
hardcoded 10-byte virtio net header will parse mac head incorrectly in
packet_snd by taking the last two bytes of virtio net header as part of
mac header.
This incorrect mac header parsing will cause packet to be dropped due to
invalid ether head checking in later under-layer device packet receiving.

By adding extra field vnet_hdr_sz with utilizing holes in struct
packet_sock to record currently used virtio net header size and supporting
extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet
sockets can know the exact length of virtio net header that virtio user
gives.
In packet_snd, tpacket_snd and packet_recvmsg, instead of using
hardcoded virtio net header size, it can get the exact vnet_hdr_sz from
corresponding packet_sock, and parse mac header correctly based on this
information to avoid the packets being mistakenly dropped.

Signed-off-by: Jianfeng Tan <henry.tjf@antgroup.com>
Co-developed-by: Anqi Shen <amy.saq@antgroup.com>
Signed-off-by: Anqi Shen <amy.saq@antgroup.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 12:01:58 +01:00
David S. Miller 156c93986d Merge branch 'mlx5-ipsec-fixes'
Leon Romanovsky says:

====================
Fixes to mlx5 IPsec implementation

This small patchset includes various fixes and one refactoring patch
which I collected for the features sent in this cycle, with one exception -
first patch.

First patch fixes code which was introduced in previous cycle, however I
was able to trigger FW error only in custom debug code, so don't see a
need to send it to net-rc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 11:49:47 +01:00
Leon Romanovsky 45fd01f2fb net/mlx5e: Refactor duplicated code in mlx5e_ipsec_init_macs
ARP discovery code has same logic for RX and TX flows, but with
different source and destination fields. Instead of duplicating
same code in mlx5e_ipsec_init_macs, let's refactor.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 11:49:47 +01:00
Leon Romanovsky 94edec4484 net/mlx5e: Properly release work data structure
There are some flows in which work structure is not allocated at all
and it is needed to be checked prior release of data structure.

 general protection fault, probably for non-canonical address 0xdffffc000000000a: 0000 [#1] SMP KASAN
 KASAN: null-ptr-deref in range [0x0000000000000050-0x0000000000000057]
 CPU: 6 PID: 3486 Comm: kworker/6:0 Not tainted 6.3.0-rc5_for_upstream_debug_2023_04_06_11_01 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Workqueue: events xfrm_state_gc_task
 RIP: 0010:mlx5e_xfrm_free_state+0x177/0x260 [mlx5_core]
 Code: c1 ea 03 80 3c 02 00 0f 85 f5 00 00 00 4c 8b a5 08 01 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 50 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b7 00 00 00 49 8b 7c 24 50 e8 85 7c 09 e0 4c 89
 RSP: 0018:ffff888137a8fc50 EFLAGS: 00010206
 RAX: dffffc0000000000 RBX: ffff888180398000 RCX: 0000000000000000
 RDX: 000000000000000a RSI: ffffffffa1878227 RDI: 0000000000000050
 RBP: ffff88812a0c8000 R08: ffff888137a8fb60 R09: 0000000000000000
 R10: fffffbfff09aba0c R11: 0000000000000001 R12: 0000000000000000
 R13: ffff88812a0c8108 R14: ffffffff84c63480 R15: ffff8881acb63118
 FS:  0000000000000000(0000) GS:ffff88881eb00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f667e8bc000 CR3: 0000000004693006 CR4: 0000000000370ea0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:

  ___xfrm_state_destroy+0x3c8/0x5e0
  xfrm_state_gc_task+0xf6/0x140
  ? ___xfrm_state_destroy+0x5e0/0x5e0
  process_one_work+0x7c2/0x1340
  ? lockdep_hardirqs_on_prepare+0x3f0/0x3f0
  ? pwq_dec_nr_in_flight+0x230/0x230
  ? spin_bug+0x1d0/0x1d0
  worker_thread+0x59d/0xec0
  ? __kthread_parkme+0xd9/0x1d0
  ? process_one_work+0x1340/0x1340
  kthread+0x28f/0x330
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30

 Modules linked in: sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm ib_uverbs ib_core vfio_pci vfio_pci_core vfio_iommu_type1 vfio cuse overlay zram zsmalloc fuse [last unloaded: mlx5_core]
 ---[ end trace 0000000000000000 ]---

Fixes: 4562116f8a ("net/mlx5e: Generalize IPsec work structs")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 11:49:47 +01:00
Leon Romanovsky 3198ae7d42 net/mlx5e: Compare all fields in IPv6 address
Fix size argument in memcmp to compare whole IPv6 address.

Fixes: b3beba1fb4 ("net/mlx5e: Allow policies with reqid 0, to support IKE policy holes")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Emeel Hakim <ehakim@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 11:49:47 +01:00
Leon Romanovsky 697b3518eb net/mlx5e: Don't overwrite extack message returned from IPsec SA validator
Addition of new err_xfrm label caused to error messages be overwritten.
Fix it by using proper NL_SET_ERR_MSG_WEAK_MOD macro together with change
in a default message.

Fixes: aa8bd0c951 ("net/mlx5e: Support IPsec acquire default SA")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 11:49:47 +01:00
Leon Romanovsky e239e31ae8 net/mlx5e: Fix FW error while setting IPsec policy block action
When trying to set IPsec policy block action the following error is
generated:

 mlx5_cmd_out_err:803:(pid 3426): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed,
	status bad parameter(0x3), syndrome (0x8708c3), err(-22)

This error means that drop action is not allowed when modify action is
set, so update the code to skip modify header for XFRM_POLICY_BLOCK action.

Fixes: 6721239672 ("net/mlx5e: Skip IPsec encryption for TX path without matching policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 11:49:47 +01:00
Yan Wang 35226750f7 net: stmmac:fix system hang when setting up tag_8021q VLAN for DSA ports
The system hang because of dsa_tag_8021q_port_setup()->
				stmmac_vlan_rx_add_vid().

I found in stmmac_drv_probe() that cailing pm_runtime_put()
disabled the clock.

First, when the kernel is compiled with CONFIG_PM=y,The stmmac's
resume/suspend is active.

Secondly,stmmac as DSA master,the dsa_tag_8021q_port_setup() function
will callback stmmac_vlan_rx_add_vid when DSA dirver starts. However,
The system is hanged for the stmmac_vlan_rx_add_vid() accesses its
registers after stmmac's clock is closed.

I would suggest adding the pm_runtime_resume_and_get() to the
stmmac_vlan_rx_add_vid().This guarantees that resuming clock output
while in use.

Fixes: b3dcb31277 ("net: stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid()")
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Yan Wang <rk.code@outlook.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 11:44:35 +01:00
David S. Miller d8bb382419 Merge branch 'pds_core'
Shannon Nelson says:

====================
pds_core driver

Summary:
--------
This patchset implements a new driver for use with the AMD/Pensando
Distributed Services Card (DSC), intended to provide core configuration
services through the auxiliary_bus and through a couple of EXPORTed
functions for use initially in VFio and vDPA feature specific drivers.

To keep this patchset to a manageable size, the pds_vdpa and pds_vfio
drivers have been split out into their own patchsets to be reviewed
separately.

Detail:
-------
AMD/Pensando is making available a new set of devices for supporting vDPA,
VFio, and potentially other features in the Distributed Services Card
(DSC).  These features are implemented through a PF that serves as a Core
device for controlling and configuring its VF devices.  These VF devices
have separate drivers that use the auxiliary_bus to work through the Core
device as the control path.

Currently, the DSC supports standard ethernet operations using the
ionic driver.  This is not replaced by the Core-based devices - these
new devices are in addition to the existing Ethernet device.  Typical DSC
configurations will include both PDS devices and Ionic Eth devices.
However, there is a potential future path for ethernet services to come
through this device as well.

The Core device is a new PCI PF/VF device managed by a new driver
'pds_core'.  The PF device has access to an admin queue for configuring
the services used by the VFs, and sets up auxiliary_bus devices for each
vDPA VF for communicating with the drivers for the vDPA devices.  The VFs
may be for VFio or vDPA, and other services in the future; these VF types
are selected as part of the DSC internal FW configurations, which is out
of the scope of this patchset.

When the vDPA support set is enabled in the core PF through its devlink
param, auxiliary_bus devices are created for each VF that supports the
feature.  The vDPA driver then connects to and uses this auxiliary_device
to do control path configuration through the PF device.  This can then be
used with the vdpa kernel module to provide devices for virtio_vdpa kernel
module for host interfaces, or vhost_vdpa kernel module for interfaces
exported into your favorite VM.

A cheap ASCII diagram of a vDPA instance looks something like this:

                                ,----------.
                                |   vdpa   |
                                '----------'
                                  |     ||
                                 ctl   data
                                  |     ||
                          .----------.  ||
                          | pds_vdpa |  ||
                          '----------'  ||
                               |        ||
                       pds_core.vDPA.1  ||
                               |        ||
                    .---------------.   ||
                    |   pds_core    |   ||
                    '---------------'   ||
                        ||         ||   ||
                      09:00.0      09:00.1
        == PCI ============================================
                        ||            ||
                   .----------.   .----------.
            ,------|    PF    |---|    VF    |-------,
            |      '----------'   '----------'       |
            |                  DSC                   |
            |                                        |
            ------------------------------------------

Changes:
  v11:
 - change strncpy to strscpy
     Reported-by: kernel test robot <lkp@intel.com>
     Link: https://lore.kernel.org/oe-kbuild-all/202304181137.WaZTYyAa-lkp@intel.com/

  v10:
Link: https://lore.kernel.org/netdev/20230418003228.28234-1-shannon.nelson@amd.com/
 - remove CONFIG_DEBUG_FS guard static inline stuff
 - remove unnecessary 0 and null initializations
 - verify in driver load that PDS_CORE_DRV_NAME matches KBUILD_MODNAME
 - remove debugfs irqs_show(), redundant with /proc
 - return -ENOMEM if intr_info = kcalloc() fails
 - move the status code enum into pds_core_if.h as part of API definition
 - fix up one place in pdsc_devcmd_wait() we're using the status codes where we could use the errno
 - remove redundant calls to flush_workqueue()
 - grab config_lock before testing state bits in pdsc_fw_reporter_diagnose()
 - change pdsc_color_match() to return bool
 - remove useless VIF setup loop and just setup vDPA services for now
 - remove pf pointer from struct padev and have clients use pci_physfn()
 - drop use of "vf" in auxdev.c function names, make more generic
 - remove last of client ops struct and simply export the functions
 - drop drivers@pensando.io from MAINTAINERS and add new include dir
 - include dynamic_debug.h in adminq.c to protect dynamic_hex_dump()
 - fixed fw_slot type from u8 to int for handling error returns
 - fixed comment spelling
 - changed void arg in pdsc_adminq_post() to struct pdsc *

  v9:
Link: https://lore.kernel.org/netdev/20230406234143.11318-1-shannon.nelson@amd.com/
 - change pdsc field name id to uid to clarify the unique id used for aux device
 - remove unnecessary pf->state and other checks in aux device creation
 - hardcode fw slotnames for devlink info, don't use strings from FW
 - handle errors from PDS_CORE_CMD_INIT devcmd call
 - tighten up health thread use of config_lock
 - remove pdsc_queue_health_check() layer over queuing health check
 - start pds_core.rst file in first patch, add to it incrementally
 - give more user interaction info in commit messages
 - removed a few more extraneous includes

  v8:
Link: https://lore.kernel.org/netdev/20230330234628.14627-1-shannon.nelson@amd.com/
 - fixed deadlock problem, use devl_health_reporter_destroy() when devlink is locked
 - don't clear client_id until after auxiliary_device_uninit()

  v7:
Link: https://lore.kernel.org/netdev/20230330192313.62018-1-shannon.nelson@amd.com/
 - use explicit devlink locking and devl_* APIs
 - move some of devlink setup logic into probe and remove
 - use debugfs_create_u{type}() for state and queue head and tail
 - add include for linux/vmalloc.h
     Reported-by: kernel test robot <lkp@intel.com>
     Link: https://lore.kernel.org/oe-kbuild-all/202303260420.Tgq0qobF-lkp@intel.com/

  v6:
Link: https://lore.kernel.org/netdev/20230324190243.27722-1-shannon.nelson@amd.com/
 - removed version.h include noticed by kernel test robot's version check
     Reported-by: kernel test robot <lkp@intel.com>
     Link: https://lore.kernel.org/oe-kbuild-all/202303230742.pX3ply0t-lkp@intel.com/
 - fixed up the more egregious checkpatch line length complaints
 - make sure pdsc_auxbus_dev_register() checks padev pointer errcode

  v5:
Link: https://lore.kernel.org/netdev/20230322185626.38758-1-shannon.nelson@amd.com/
 - added devlink health reporter for FW issues
 - removed asic_type, asic_rev, serial_num, fw_version from debugfs as
   they are available through other means
 - trimed OS info in pdsc_identify(), we don't need to send that much info to the FW
 - removed reg/unreg from auxbus client API, they are now in the core when VF
   is started
 - removed need for pdsc definition in client by simplifying the padev to only carry
   struct pci_dev pointers rather than full struct pdsc to the pf and vf
 - removed the unused pdsc argument in pdsc_notify()
 - moved include/linux/pds/pds_core.h to driver/../pds_core/core.h
 - restored a few pds_core_if.h interface values and structs that are shared
   with FW source
 - moved final config_lock unlock to before tear down of timer and workqueue
   to be sure there are no deadlocks while waiting for any stragglers
 - changed use of PAGE_SIZE to local PDS_PAGE_SIZE to keep with FW layout needs
   without regard to kernel PAGE_SIZE configuration
 - removed the redundant *adminqcq argument from pdsc_adminq_post()

  v4:
Link: https://lore.kernel.org/netdev/20230308051310.12544-1-shannon.nelson@amd.com/
 - reworked to attach to both Core PF and vDPA VF PCI devices
 - now creates auxiliary_device as part of each VF PCI probe, removes them on PCI remove
 - auxiliary devices now use simple unique id rather than PCI address for identifier
 - replaced home-grown event publishing with kernel-based notifier service
 - dropped live_migration parameter, not needed when not creating aux device for it
 - replaced devm_* functions with traditional interfaces
 - added MAINTAINERS entry
 - removed lingering traces of set/get_vf attribute adminq commands
 - trimmed some include lists
 - cleaned a kernel test robot complaint about a stray unused variable
        Link: https://lore.kernel.org/oe-kbuild-all/202302181049.yeUQMeWY-lkp@intel.com/

  v3:
Link: https://lore.kernel.org/netdev/20230217225558.19837-1-shannon.nelson@amd.com/
 - changed names from "pensando" to "amd" and updated copyright strings
 - dropped the DEVLINK_PARAM_GENERIC_ID_FW_BANK for future development
 - changed the auxiliary device creation to be triggered by the
   PCI bus event BOUND_DRIVER, and torn down at UNBIND_DRIVER in order
   to properly handle users using the sysfs bind/unbind functions
 - dropped some noisy log messages
 - rebased to current net-next

  RFC to v2:
Link: https://lore.kernel.org/netdev/20221207004443.33779-1-shannon.nelson@amd.com/
 - added separate devlink param patches for DEVLINK_PARAM_GENERIC_ID_ENABLE_MIGRATION
   and DEVLINK_PARAM_GENERIC_ID_FW_BANK, and dropped the driver specific implementations
 - updated descriptions for the new devlink parameters
 - dropped netdev support
 - dropped vDPA patches, will followup later
 - separated fw update and fw bank select into their own patches

  RFC:
Link: https://lore.kernel.org/netdev/20221118225656.48309-1-snelson@pensando.io/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 08:29:14 +01:00
Shannon Nelson ddbcb22055 pds_core: Kconfig and pds_core.rst
Remaining documentation and Kconfig hook for building the driver.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 08:29:14 +01:00
Shannon Nelson d24c28278a pds_core: publish events to the clients
When the Core device gets an event from the device, or notices
the device FW to be up or down, it needs to send those events
on to the clients that have an event handler.  Add the code to
pass along the events to the clients.

The entry points pdsc_register_notify() and pdsc_unregister_notify()
are EXPORTed for other drivers that want to listen for these events.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 08:29:13 +01:00
Shannon Nelson 10659034c6 pds_core: add the aux client API
Add the client API operations for running adminq commands.
The core registers the client with the FW, then the client
has a context for requesting adminq services.  We expect
to add additional operations for other clients, including
requesting additional private adminqs and IRQs, but don't have
the need yet.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 08:29:13 +01:00
Shannon Nelson 40ced89445 pds_core: devlink params for enabling VIF support
Add the devlink parameter switches so the user can enable
the features supported by the VFs.  The only feature supported
at the moment is vDPA.

Example:
    devlink dev param set pci/0000:2b:00.0 \
	    name enable_vnet cmode runtime value true

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 08:29:13 +01:00
Shannon Nelson 4569cce43b pds_core: add auxiliary_bus devices
An auxiliary_bus device is created for each vDPA type VF at VF
probe and destroyed at VF remove.  The aux device name comes
from the driver name + VIF type + the unique id assigned at PCI
probe.  The VFs are always removed on PF remove, so there should
be no issues with VFs trying to access missing PF structures.

The auxiliary_device names will look like "pds_core.vDPA.nn"
where 'nn' is the VF's uid.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 08:29:13 +01:00
Shannon Nelson f53d93110a pds_core: add initial VF device handling
This is the initial VF PCI driver framework for the new
pds_vdpa VF device, which will work in conjunction with an
auxiliary_bus client of the pds_core driver.  This does the
very basics of registering for the new VF device, setting
up debugfs entries, and registering with devlink.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 08:29:13 +01:00
Shannon Nelson 65e0185ad7 pds_core: set up the VIF definitions and defaults
The Virtual Interfaces (VIFs) supported by the DSC's
configuration (vDPA, Eth, RDMA, etc) are reported in the
dev_ident struct and made visible in debugfs.  At this point
only vDPA is supported in this driver so we only setup
devices for that feature.

Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21 08:29:13 +01:00