Граф коммитов

105866 Коммитов

Автор SHA1 Сообщение Дата
Yonghong Song ba64e7d852 bpf: btf: support proper non-jit func info
Commit 838e96904f ("bpf: Introduce bpf_func_info")
added bpf func info support. The userspace is able
to get better ksym's for bpf programs with jit, and
is able to print out func prototypes.

For a program containing func-to-func calls, the existing
implementation returns user specified number of function
calls and BTF types if jit is enabled. If the jit is not
enabled, it only returns the type for the main function.

This is undesirable. Interpreter may still be used
and we should keep feature identical regardless of
whether jit is enabled or not.
This patch fixed this discrepancy.

Fixes: 838e96904f ("bpf: Introduce bpf_func_info")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-26 17:57:10 -08:00
David S. Miller 4afe60a97b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-11-26

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Extend BTF to support function call types and improve the BPF
   symbol handling with this info for kallsyms and bpftool program
   dump to make debugging easier, from Martin and Yonghong.

2) Optimize LPM lookups by making longest_prefix_match() handle
   multiple bytes at a time, from Eric.

3) Adds support for loading and attaching flow dissector BPF progs
   from bpftool, from Stanislav.

4) Extend the sk_lookup() helper to be supported from XDP, from Nitin.

5) Enable verifier to support narrow context loads with offset > 0
   to adapt to LLVM code generation (currently only offset of 0 was
   supported). Add test cases as well, from Andrey.

6) Simplify passing device functions for offloaded BPF progs by
   adding callbacks to bpf_prog_offload_ops instead of ndo_bpf.
   Also convert nfp and netdevsim to make use of them, from Quentin.

7) Add support for sock_ops based BPF programs to send events to
   the perf ring-buffer through perf_event_output helper, from
   Sowmini and Daniel.

8) Add read / write support for skb->tstamp from tc BPF and cg BPF
   programs to allow for supporting rate-limiting in EDT qdiscs
   like fq from BPF side, from Vlad.

9) Extend libbpf API to support map in map types and add test cases
   for it as well to BPF kselftests, from Nikita.

10) Account the maximum packet offset accessed by a BPF program in
    the verifier and use it for optimizing nfp JIT, from Jiong.

11) Fix error handling regarding kprobe_events in BPF sample loader,
    from Daniel T.

12) Add support for queue and stack map type in bpftool, from David.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26 13:08:17 -08:00
Eric Dumazet 4bffc669d6 net: remove unsafe skb_insert()
I do not see how one can effectively use skb_insert() without holding
some kind of lock. Otherwise other cpus could have changed the list
right before we have a chance of acquiring list->lock.

Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this
one probably meant to use __skb_insert() since it appears nesqp->pau_list
is protected by nesqp->pau_lock. This looks like nesqp->pau_lock
could be removed, since nesqp->pau_list.lock could be used instead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma <linux-rdma@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25 10:36:19 -08:00
Heiner Kallweit 620344c43e net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
Similar to netdev_sent_queue add helper __netdev_sent_queue as variant
of __netdev_tx_sent_queue.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-25 10:18:13 -08:00
Alexey Dobriyan 2183435c25 net: fixup type in netdev_start_xmit()
Return code should be formally "netdev_tx_t".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-24 17:33:21 -08:00
David S. Miller b1bf78bfb2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-24 17:01:43 -08:00
Linus Torvalds 857fa628bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter.

 2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann.

 3) Fix socket wmem accounting in SCTP, from Xin Long.

 4) Fix failed resume crash in ena driver, from Arthur Kiyanovski.

 5) qed driver passes bytes instead of bits into second arg of
    bitmap_weight(). From Denis Bolotin.

 6) Fix reset deadlock in ibmvnic, from Juliet Kim.

 7) skb_scrube_packet() needs to scrub the fwd marks too, from Petr
    Machata.

 8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK
    compression during this period, from Eric Dumazet.

 9) Add atomicity to SMC protocol cursor handling, from Ursula Braun.

10) Don't leave dangling error pointer if bpf_prog_add() fails in
    thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO
    headers, set sq->tso_hdrs to NULL.

11) Fix race condition over state variables in act_police, from Davide
    Caratti.

12) Disable guest csum in the presence of XDP in virtio_net, from Jason
    Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
  net: gemini: Fix copy/paste error
  net: phy: mscc: fix deadlock in vsc85xx_default_config
  dt-bindings: dsa: Fix typo in "probed"
  net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
  net: amd: add missing of_node_put()
  team: no need to do team_notify_peers or team_mcast_rejoin when disabling port
  virtio-net: fail XDP set if guest csum is negotiated
  virtio-net: disable guest csum during XDP set
  net/sched: act_police: add missing spinlock initialization
  net: don't keep lonely packets forever in the gro hash
  net/ipv6: re-do dad when interface has IFF_NOARP flag change
  packet: copy user buffers before orphan or clone
  ibmvnic: Update driver queues after change in ring size support
  ibmvnic: Fix RX queue buffer cleanup
  net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
  net/dim: Update DIM start sample after each DIM iteration
  net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts
  net/smc: use after free fix in smc_wr_tx_put_slot()
  net/smc: atomic SMCD cursor handling
  net/smc: add SMC-D shutdown signal
  ...
2018-11-24 09:19:38 -08:00
Petr Machata d17d9f5e51 switchdev: Replace port obj add/del SDO with a notification
Drop switchdev_ops.switchdev_port_obj_add and _del. Drop the uses of
this field from all clients, which were migrated to use switchdev
notification in the previous patches.

Add a new function switchdev_port_obj_notify() that sends the switchdev
notifications SWITCHDEV_PORT_OBJ_ADD and _DEL.

Update switchdev_port_obj_del_now() to dispatch to this new function.
Drop __switchdev_port_obj_add() and update switchdev_port_obj_add()
likewise.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23 18:02:24 -08:00
Petr Machata f30f0601eb switchdev: Add helpers to aid traversal through lower devices
After the transition from switchdev operations to notifier chain (which
will take place in following patches), the onus is on the driver to find
its own devices below possible layer of LAG or other uppers.

The logic to do so is fairly repetitive: each driver is looking for its
own devices among the lowers of the notified device. For those that it
finds, it calls a handler. To indicate that the event was handled,
struct switchdev_notifier_port_obj_info.handled is set. The differences
lie only in what constitutes an "own" device and what handler to call.

Therefore abstract this logic into two helpers,
switchdev_handle_port_obj_add() and switchdev_handle_port_obj_del(). If
a driver only supports physical ports under a bridge device, it will
simply avoid this layer of indirection.

One area where this helper diverges from the current switchdev behavior
is the case of mixed lowers, some of which are switchdev ports and some
of which are not. Previously, such scenario would fail with -EOPNOTSUPP.
The helper could do that for lowers for which the passed-in predicate
doesn't hold. That would however break the case that switchdev ports
from several different drivers are stashed under one master, a scenario
that switchdev currently happily supports. Therefore tolerate any and
all unknown netdevices, whether they are backed by a switchdev driver
or not.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23 18:02:24 -08:00
Petr Machata aa4efe2139 switchdev: Add SWITCHDEV_PORT_OBJ_ADD, SWITCHDEV_PORT_OBJ_DEL
An offloading driver may need to have access to switchdev events on
ports that aren't directly under its control. An example is a VXLAN port
attached to a bridge offloaded by a driver. The driver needs to know
about VLANs configured on the VXLAN device. However the VXLAN device
isn't stashed between the bridge and a front-panel-port device (such as
is the case e.g. for LAG devices), so the usual switchdev ops don't
reach the driver.

VXLAN is likely not the only device type like this: in theory any L2
tunnel device that needs offloading will prompt requirement of this
sort. This falsifies the assumption that only the lower devices of a
front panel port need to be notified to achieve flawless offloading.

A way to fix this is to give up the notion of port object addition /
deletion as a switchdev operation, which assumes somewhat tight coupling
between the message producer and consumer. And instead send the message
over a notifier chain.

To that end, introduce two new switchdev notifier types,
SWITCHDEV_PORT_OBJ_ADD and SWITCHDEV_PORT_OBJ_DEL. These notifier types
communicate the same event as the corresponding switchdev op, except in
a form of a notification. struct switchdev_notifier_port_obj_info was
added to carry the fields that the switchdev op carries. An additional
field, handled, will be used to communicate back to switchdev that the
event has reached an interested party, which will be important for the
two-phase commit.

The two switchdev operations themselves are kept in place. Following
patches first convert individual clients to the notifier protocol, and
only then are the operations removed.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23 18:02:23 -08:00
Petr Machata a93e3b1722 switchdev: Add a blocking notifier chain
In general one can't assume that a switchdev notifier is called in a
non-atomic context, and correspondingly, the switchdev notifier chain is
an atomic one.

However, port object addition and deletion messages are delivered from a
process context. Even the MDB addition messages, whose delivery is
scheduled from atomic context, are queued and the delivery itself takes
place in blocking context. For VLAN messages in particular, keeping the
blocking nature is important for error reporting.

Therefore introduce a blocking notifier chain and related service
functions to distribute the notifications for which a blocking context
can be assumed.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23 18:02:23 -08:00
Petr Machata ec394af5ea switchdev: SWITCHDEV_OBJ_PORT_{VLAN, MDB}(): Sanitize
The two macros SWITCHDEV_OBJ_PORT_VLAN() and SWITCHDEV_OBJ_PORT_MDB()
expand to a container_of() call, yielding an appropriate container of
their sole argument. However, due to a name collision, the first
argument, i.e. the contained object pointer, is not the only one to get
expanded. The third argument, which is a structure member name, and
should be kept literal, gets expanded as well. The only safe way to use
these two macros is therefore to name the local variable passed to them
"obj".

To fix this, rename the sole argument of the two macros from
"obj" (which collides with the member name) to "OBJ". Additionally,
instead of passing "OBJ" to container_of() verbatim, parenthesize it, so
that a comma in the passed-in expression doesn't pollute the
container_of() invocation.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23 18:02:23 -08:00
Madalin Bucur 5c664ace8c soc/qman: add return value to interrupt coalesce changing APIs
Check that the values received by the portal interrupt coalesce
change APIs are in range.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: Roy Pledge <roy.pledge@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23 11:17:06 -08:00
Willem de Bruijn 5cd8d46ea1 packet: copy user buffers before orphan or clone
tpacket_snd sends packets with user pages linked into skb frags. It
notifies that pages can be reused when the skb is released by setting
skb->destructor to tpacket_destruct_skb.

This can cause data corruption if the skb is orphaned (e.g., on
transmit through veth) or cloned (e.g., on mirror to another psock).

Create a kernel-private copy of data in these cases, same as tun/tap
zerocopy transmission. Reuse that infrastructure: mark the skb as
SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).

Unlike other zerocopy packets, do not set shinfo destructor_arg to
struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
when the original skb is released and a timestamp is recorded. Do not
change this timestamp behavior. The ubuf_info->callback is not needed
anyway, as no zerocopy notification is expected.

Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
resulting value is not a valid ubuf_info pointer, nor a valid
tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.

The fix relies on features introduced in commit 52267790ef ("sock:
add MSG_ZEROCOPY"), so can be backported as is only to 4.14.

Tested with from `./in_netns.sh ./txring_overwrite` from
http://github.com/wdebruij/kerneltools/tests

Fixes: 69e3c75f4d ("net: TX_RING and packet mmap")
Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-23 11:08:03 -08:00
Vlad Dumitrescu f11216b242 bpf: add skb->tstamp r/w access from tc clsact and cg skb progs
This could be used to rate limit egress traffic in concert with a qdisc
which supports Earliest Departure Time, such as FQ.

Write access from cg skb progs only with CAP_SYS_ADMIN, since the value
will be used by downstream qdiscs. It might make sense to relax this.

Changes v1 -> v2:
  - allow access from cg skb, write only with CAP_SYS_ADMIN

Signed-off-by: Vlad Dumitrescu <vladum@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-22 15:47:28 -08:00
Daniel Jurgens e45678973d {net, IB}/mlx4: Initialize CQ buffers in the driver when possible
Perform CQ initialization in the driver when the capability is supported
by the FW.  When passing the CQ to HW indicate that the CQ buffer has
been pre-initialized.

Doing so decreases CQ creation time.  Testing on P8 showed a single 2048
entry CQ creation time was reduced from ~395us to ~170us, which is
2.3x faster.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22 11:38:36 -08:00
Tal Gilboa 0211dda68a net/dim: Update DIM start sample after each DIM iteration
On every iteration of net_dim, the algorithm may choose to
check for the system state by comparing current data sample
with previous data sample. After each of these comparison,
regardless of the action taken, the sample used as baseline
is needed to be updated.

This patch fixes a bug that causes DIM to take wrong decisions,
due to never updating the baseline sample for comparison between
iterations. This way, DIM always compares current sample with
zeros.

Although this is a functional fix, it also improves and stabilizes
performance as the algorithm works properly now.

Performance:
Tested single UDP TX stream with pktgen:
samples/pktgen/pktgen_sample03_burst_single_flow.sh -i p4p2 -d 1.1.1.1
-m 24:8a:07:88:26:8b -f 3 -b 128

ConnectX-5 100GbE packet rate improved from 15-19Mpps to 19-20Mpps.
Also, toggling between profiles is less frequent with the fix.

Fixes: 8115b750db ("net/dim: use struct net_dim_sample as arg to net_dim")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-22 11:36:54 -08:00
Linus Torvalds 4cd731953d USB fixes for 4.20-rc4
Here are a number of small USB fixes for 4.20-rc4.
 
 There's the usual xhci and dwc2/3 fixes as well as a few minor other
 issues resolved for problems that have been reported.  Full details are
 in the shortlog.
 
 All have been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW/ZsOg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymPlACeO1ii05UG6Mb47AC0sZbgMk5diWwAnRK3A0Pe
 fSupZy5v6u1HBEormFco
 =5k0S
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a number of small USB fixes for 4.20-rc4.

  There's the usual xhci and dwc2/3 fixes as well as a few minor other
  issues resolved for problems that have been reported. Full details are
  in the shortlog.

  All have been in linux-next for a while with no reported issues"

* tag 'usb-4.20-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: cdc-acm: add entry for Hiro (Conexant) modem
  usb: xhci: Prevent bus suspend if a port connect change or polling state is detected
  usb: core: Fix hub port connection events lost
  usb: dwc3: gadget: fix ISOC TRB type on unaligned transfers
  Revert "usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers"
  usb: dwc2: pci: Fix an error code in probe
  usb: dwc3: Fix NULL pointer exception in dwc3_pci_remove()
  xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc
  usb: xhci: fix timeout for transition from RExit to U0
  usb: xhci: fix uninitialized completion when USB3 port got wrong status
  xhci: Add check for invalid byte size error when UAS devices are connected.
  xhci: handle port status events for removed USB3 hcd
  xhci: Fix leaking USB3 shared_hcd at xhci removal
  USB: misc: appledisplay: add 20" Apple Cinema Display
  USB: quirks: Add no-lpm quirk for Raydium touchscreens
  usb: quirks: Add delay-init quirk for Corsair K70 LUX RGB
  USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub
  usb: dwc3: gadget: Properly check last unaligned/zero chain TRB
  usb: dwc3: core: Clean up ULPI device
2018-11-22 08:39:29 -08:00
Ido Schimmel 085ddc87d0 bridge: Allow querying bridge port flags
Allow querying bridge port flags so that drivers capable of performing
VxLAN learning will update the bridge driver only if learning is enabled
on its bridge port corresponding to the VxLAN device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 17:10:31 -08:00
Petr Machata 5728ae0d17 vxlan: Add hardware FDB learning
In order to allow devices to signal learning events to VXLAN, introduce
two new switchdev messages: SWITCHDEV_VXLAN_FDB_ADD_TO_BRIDGE and
SWITCHDEV_VXLAN_FDB_DEL_TO_BRIDGE.

Listen to these notifications in the vxlan driver. The FDB entries
learned this way have an NTF_EXT_LEARNED flag, and only entries marked
as such can be unlearned by the _DEL_ event. They are also immediately
marked as offloaded. This is the same behavior that the bridge driver
observes.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 17:10:31 -08:00
Petr Machata 45598c1cee vxlan: Mark user-added FDB entries
The VXLAN driver needs to differentiate between FDB entries learned by
the VXLAN driver, and those added by the user. The latter ones shouldn't
be taken over by external learning events. This is in accordance with
bridge behavior.

Therefore, extend the flags bitfield to 16 bits and add a new private
NTF flag to mark the user-added entries.

This seems preferable to adding a dedicated boolean, because passing the
flag, unlike passing e.g. a true, makes it clear what the meaning of the
bit is.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 17:10:30 -08:00
Eric Dumazet 86de5921a3 tcp: defer SACK compression after DupThresh
Jean-Louis reported a TCP regression and bisected to recent SACK
compression.

After a loss episode (receiver not able to keep up and dropping
packets because its backlog is full), linux TCP stack is sending
a single SACK (DUPACK).

Sender waits a full RTO timer before recovering losses.

While RFC 6675 says in section 5, "Algorithm Details",

   (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true --
       indicating at least three segments have arrived above the current
       cumulative acknowledgment point, which is taken to indicate loss
       -- go to step (4).
...
   (4) Invoke fast retransmit and enter loss recovery as follows:

there are old TCP stacks not implementing this strategy, and
still counting the dupacks before starting fast retransmit.

While these stacks probably perform poorly when receivers implement
LRO/GRO, we should be a little more gentle to them.

This patch makes sure we do not enable SACK compression unless
3 dupacks have been sent since last rcv_nxt update.

Ideally we should even rearm the timer to send one or two
more DUPACK if no more packets are coming, but that will
be work aiming for linux-4.21.

Many thanks to Jean-Louis for bisecting the issue, providing
packet captures and testing this patch.

Fixes: 5d9f4262b7 ("tcp: add SACK compression")
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Tested-by: Jean-Louis Dupond <jean-louis@dupond.be>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 15:49:52 -08:00
Michał Mirosław a2e768b861 net/vlan: introduce skb_vlan_tag_get_cfi() helper
Abstract CFI/DEI bit access consistently with other VLAN tag fields.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 15:41:30 -08:00
Yonghong Song f6161a8f30 bpf: fix a compilation error when CONFIG_BPF_SYSCALL is not defined
Kernel test robot (lkp@intel.com) reports a compilation error at
  https://www.spinics.net/lists/netdev/msg534913.html
introduced by commit 838e96904f ("bpf: Introduce bpf_func_info").

If CONFIG_BPF is defined and CONFIG_BPF_SYSCALL is not defined,
the following error will appear:
  kernel/bpf/core.c:414: undefined reference to `btf_type_by_id'
  kernel/bpf/core.c:415: undefined reference to `btf_name_by_offset'

When CONFIG_BPF_SYSCALL is not defined,
let us define stub inline functions for btf_type_by_id()
and btf_name_by_offset() in include/linux/btf.h.
This way, the compilation failure can be avoided.

Fixes: 838e96904f ("bpf: Introduce bpf_func_info")
Reported-by: kbuild test robot <lkp@intel.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20 15:21:45 -08:00
Yonghong Song 838e96904f bpf: Introduce bpf_func_info
This patch added interface to load a program with the following
additional information:
   . prog_btf_fd
   . func_info, func_info_rec_size and func_info_cnt
where func_info will provide function range and type_id
corresponding to each function.

The func_info_rec_size is introduced in the UAPI to specify
struct bpf_func_info size passed from user space. This
intends to make bpf_func_info structure growable in the future.
If the kernel gets a different bpf_func_info size from userspace,
it will try to handle user request with part of bpf_func_info
it can understand. In this patch, kernel can understand
  struct bpf_func_info {
       __u32   insn_offset;
       __u32   type_id;
  };
If user passed a bpf func_info record size of 16 bytes, the
kernel can still handle part of records with the above definition.

If verifier agrees with function range provided by the user,
the bpf_prog ksym for each function will use the func name
provided in the type_id, which is supposed to provide better
encoding as it is not limited by 16 bytes program name
limitation and this is better for bpf program which contains
multiple subprograms.

The bpf_prog_info interface is also extended to
return btf_id, func_info, func_info_rec_size and func_info_cnt
to userspace, so userspace can print out the function prototype
for each xlated function. The insn_offset in the returned
func_info corresponds to the insn offset for xlated functions.
With other jit related fields in bpf_prog_info, userspace can also
print out function prototypes for each jited function.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20 10:54:39 -08:00
Martin KaFai Lau 2667a2626f bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
This patch adds BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO
to support the function debug info.

BTF_KIND_FUNC_PROTO must not have a name (i.e. !t->name_off)
and it is followed by >= 0 'struct bpf_param' objects to
describe the function arguments.

The BTF_KIND_FUNC must have a valid name and it must
refer back to a BTF_KIND_FUNC_PROTO.

The above is the conclusion after the discussion between
Edward Cree, Alexei, Daniel, Yonghong and Martin.

By combining BTF_KIND_FUNC and BTF_LIND_FUNC_PROTO,
a complete function signature can be obtained.  It will be
used in the later patches to learn the function signature of
a running bpf program.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20 10:54:38 -08:00
Linus Torvalds 06e68fed32 media fixes for v4.20-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb8+gTAAoJEAhfPr2O5OEVjskP/3jkRvloWfEeufwbIZUAn3Ax
 9fIYIUcka3T9N6fWj/wDvzapXq/onfCE6+XVCH2E72E/2qiwLIupTvFgYiF0SHus
 rENmKSCviT3S6jTlnHM7I6AlRQWU61dztNoS7172kVdgnjRiar145sN2L8X/sSFJ
 Q2M+fyo1JYqyPISb+zSMl/6pWuzWaUpyGld8A9xAJ4rmZx8H9+R5y8WdrKLDGLsd
 99Mq3jFXsVULLRjG6mn+0Mm8uL0yGWiWsgWU/RFiTi35yEbZVJ78bqMphCtaJE5s
 h57ntU0Jg5IxgjaRlTR9aYVLLdbyQWt0nu+eg/dSUZJ6zFca45lCQto/rTboTzr/
 8o5q2jXibpWmoqDMYSuNPzQ2uJYmnxCuDJFPcA3IhC+smzRhSRf/tUK/0JoNAhIA
 mf3vp5DMheq2NAM1H7TfKEEPvASokFhzOpdFDDiLdSM6G+ETx0rMux2MLvAfHe/i
 oVMlfIgEAU81OBSvfglhBT/51DdfUp+LrIzLHbZFfNrfI+mofOiTYYdQtUu/Z2J4
 GeWHrLLC5yY7Qvzd6QUoKzK9DMW5FR65Cpb0zevdyJjdscAfoe9Y8PpKPnVcTgG0
 tqFggw8SzhAUOWIxegR2PRvO9vmgJdS3f4qSsPfppGDUx1H6kJOhEoxML3sJlOLj
 qF1Ft6Je0MKs7nAC3MOt
 =OlYm
 -----END PGP SIGNATURE-----

Merge tag 'media/v4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:

 - add a missing include at v4l2-controls uAPI header

 - minor kAPI update for the request API

 - some fixes at CEC core

 - use a lower minimum height for the virtual codec driver

 - cleanup a gcc warning due to the lack of a fall though markup

 - tc358743: Remove unnecessary self assignment

 - fix the V4L event subscription logic

 - docs: Document metadata format in struct v4l2_format

 - omap3isp and ipu3-cio2: fix unbinding logic

* tag 'media/v4.20-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  media: ipu3-cio2: Use cio2_queues_exit
  media: ipu3-cio2: Unregister device nodes first, then release resources
  media: omap3isp: Unregister media device as first
  media: docs: Document metadata format in struct v4l2_format
  media: v4l: event: Add subscription to list before calling "add" operation
  media: dm365_ipipeif: better annotate a fall though
  media: Rename vb2_m2m_request_queue -> v4l2_m2m_request_queue
  media: cec: increase debug level for 'queue full'
  media: cec: check for non-OK/NACK conditions while claiming a LA
  media: vicodec: lower minimum height to 360
  media: tc358743: Remove unnecessary self assignment
  media: v4l: fix uapi mpeg slice params definition
  v4l2-controls: add a missing include
2018-11-20 07:37:15 -08:00
Jakub Kicinski 068ceb3555 net: sched: cls_u32: add res to offload information
In case of egress offloads the class/flowid assigned by the filter
may be very important for offloaded Qdisc selection.  Provide this
info to drivers.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 18:53:46 -08:00
Jakub Kicinski e49efd5288 net: sched: gred: support reporting stats from offloads
Allow drivers which offload GRED to report back statistics.  Since
A lot of GRED stats is fairly ad hoc in nature pass to drivers the
standard struct gnet_stats_basic/gnet_stats_queue pairs, and
untangle the values in the core.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 18:53:46 -08:00
Jakub Kicinski 890d8d23ec net: sched: gred: add basic Qdisc offload
Add basic offload for the GRED Qdisc.  Inform the drivers any
time Qdisc or virtual queue configuration changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 18:53:45 -08:00
Lorenz Bauer 2f1833607a bpf: move BPF_F_QUERY_EFFECTIVE after map flags
BPF_F_QUERY_EFFECTIVE is in the middle of the flags valid
for BPF_MAP_CREATE. Move it to its own section to reduce confusion.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20 00:53:39 +01:00
Lorenz Bauer 96b3b6c909 bpf: allow zero-initializing hash map seed
Add a new flag BPF_F_ZERO_SEED, which forces a hash map
to initialize the seed to zero. This is useful when doing
performance analysis both on individual BPF programs, as
well as the kernel's hash table implementation.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20 00:53:39 +01:00
Xin Long 69fec325a6 Revert "sctp: remove sctp_transport_pmtu_check"
This reverts commit 22d7be267e.

The dst's mtu in transport can be updated by a non sctp place like
in xfrm where the MTU information didn't get synced between asoc,
transport and dst, so it is still needed to do the pmtu check
in sctp_packet_config.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 12:42:47 -08:00
Xin Long 480ba9c18a sctp: add sockopt SCTP_EVENT
This patch adds sockopt SCTP_EVENT described in rfc6525#section-6.2.
With this sockopt users can subscribe to an event from a specified
asoc.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 12:25:43 -08:00
Xin Long 88ee48c1f3 sctp: rename enum sctp_event to sctp_event_type
sctp_event is a structure name defined in RFC for sockopt
SCTP_EVENT. To avoid the conflict, rename it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 12:25:43 -08:00
Xin Long a1e3a0590f sctp: add subscribe per asoc
The member subscribe should be per asoc, so that sockopt SCTP_EVENT
in the next patch can subscribe a event from one asoc only.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 12:25:43 -08:00
Xin Long 2cc0eeb676 sctp: define subscribe in sctp_sock as __u16
The member subscribe in sctp_sock is used to indicate to which of
the events it is subscribed, more like a group of flags. So it's
better to be defined as __u16 (2 bytpes), instead of struct
sctp_event_subscribe (13 bytes).

Note that sctp_event_subscribe is an UAPI struct, used on sockopt
calls, and thus it will not be removed. This patch only changes
the internal storage of the flags.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 12:25:43 -08:00
David S. Miller f2be6d710d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-19 10:55:00 -08:00
Linus Torvalds f2ce1065e7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix some potentially uninitialized variables and use-after-free in
    kvaser_usb can drier, from Jimmy Assarsson.

 2) Fix leaks in qed driver, from Denis Bolotin.

 3) Socket leak in l2tp, from Xin Long.

 4) RSS context allocation fix in bnxt_en from Michael Chan.

 5) Fix cxgb4 build errors, from Ganesh Goudar.

 6) Route leaks in ipv6 when removing exceptions, from Xin Long.

 7) Memory leak in IDR allocation handling of act_pedit, from Davide
    Caratti.

 8) Use-after-free of bridge vlan stats, from Nikolay Aleksandrov.

 9) When MTU is locked, do not force DF bit on ipv4 tunnels. From
    Sabrina Dubroca.

10) When NAPI cached skb is reused, we must set it to the proper initial
    state which includes skb->pkt_type. From Eric Dumazet.

11) Lockdep and non-linear SKB handling fix in tipc from Jon Maloy.

12) Set RX queue properly in various tuntap receive paths, from Matthew
    Cover.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  tuntap: fix multiqueue rx
  ipv6: Fix PMTU updates for UDP/raw sockets in presence of VRF
  tipc: don't assume linear buffer when reading ancillary data
  tipc: fix lockdep warning when reinitilaizing sockets
  net-gro: reset skb->pkt_type in napi_reuse_skb()
  tc-testing: tdc.py: Guard against lack of returncode in executed command
  tc-testing: tdc.py: ignore errors when decoding stdout/stderr
  ip_tunnel: don't force DF when MTU is locked
  MAINTAINERS: Add entry for CAKE qdisc
  net: bridge: fix vlan stats use-after-free on destruction
  socket: do a generic_file_splice_read when proto_ops has no splice_read
  net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs
  Revert "net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs"
  net: phy: mdio-gpio: Fix working over slow can_sleep GPIOs
  net/sched: act_pedit: fix memory leak when IDR allocation fails
  net: lantiq: Fix returned value in case of error in 'xrx200_probe()'
  ipv6: fix a dst leak when removing its exception
  net: mvneta: Don't advertise 2.5G modes
  drivers/net/ethernet/qlogic/qed/qed_rdma.h: fix typo
  net/mlx4: Fix UBSAN warning of signed integer overflow
  ...
2018-11-19 09:24:04 -08:00
Linus Torvalds 743a4863fd Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "Misc fixes: two warning splat fixes, a leak fix and persistent memory
  allocation fixes for ARM"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: Permit calling efi_mem_reserve_persistent() from atomic context
  efi/arm: Defer persistent reservations until after paging_init()
  efi/arm/libstub: Pack FDT after populating it
  efi/arm: Revert deferred unmap of early memmap mapping
  efi: Fix debugobjects warning on 'efi_rts_work'
2018-11-18 10:52:26 -08:00
Eric Dumazet 001c96db01 net: align gnet_stats_basic_cpu struct
This structure is small (12 or 16 bytes depending on 64bit
or 32bit kernels), but we do not want it spanning two cache lines.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17 21:37:29 -08:00
Eric Dumazet 9a5ee46230 net: align pcpu_sw_netstats and pcpu_lstats structs
Do not risk spanning these small structures on two cache lines,
it is absolutely not worth it.

For 32bit arches, the hint might not be enough, but we do not
really care anymore.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17 21:37:13 -08:00
Samuel Mendoza-Jonas 8d951a75d0 net/ncsi: Configure multi-package, multi-channel modes with failover
This patch extends the ncsi-netlink interface with two new commands and
three new attributes to configure multiple packages and/or channels at
once, and configure specific failover modes.

NCSI_CMD_SET_PACKAGE mask and NCSI_CMD_SET_CHANNEL_MASK set a whitelist
of packages or channels allowed to be configured with the
NCSI_ATTR_PACKAGE_MASK and NCSI_ATTR_CHANNEL_MASK attributes
respectively. If one of these whitelists is set only packages or
channels matching the whitelist are considered for the channel queue in
ncsi_choose_active_channel().

These commands may also use the NCSI_ATTR_MULTI_FLAG to signal that
multiple packages or channels may be configured simultaneously. NCSI
hardware arbitration (HWA) must be available in order to enable
multi-package mode. Multi-channel mode is always available.

If the NCSI_ATTR_CHANNEL_ID attribute is present in the
NCSI_CMD_SET_CHANNEL_MASK command the it sets the preferred channel as
with the NCSI_CMD_SET_INTERFACE command. The combination of preferred
channel and channel whitelist defines a primary channel and the allowed
failover channels.
If the NCSI_ATTR_MULTI_FLAG attribute is also present then the preferred
channel is configured for Tx/Rx and the other channels are enabled only
for Rx.

Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17 21:09:49 -08:00
Yousuk Seung e8bd8fca67 tcp: add SRTT to SCM_TIMESTAMPING_OPT_STATS
Add TCP_NLA_SRTT to SCM_TIMESTAMPING_OPT_STATS that reports the smoothed
round trip time in microseconds (tcp_sock.srtt_us >> 3).

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17 20:34:36 -08:00
Stephen Hemminger 54e8cb7861 uapi/ethtool: fix spelling errors
Trivial spelling errors found by codespell.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-17 20:33:32 -08:00
Jakub Kicinski 7211101502 net: sched: gred: allow manipulating per-DP RED flags
Allow users to set and dump RED flags (ECN enabled and harddrop)
on per-virtual queue basis.  Validation of attributes is split
from changes to make sure we won't have to undo previous operations
when we find out configuration is invalid.

The objective is to allow changing per-Qdisc parameters without
overwriting the per-vq configured flags.

Old user space will not pass the TCA_GRED_VQ_FLAGS attribute and
per-Qdisc flags will always get propagated to the virtual queues.

New user space which wants to make use of per-vq flags should set
per-Qdisc flags to 0 and then configure per-vq flags as it
sees fit.  Once per-vq flags are set per-Qdisc flags can't be
changed to non-zero.  Vice versa - if the per-Qdisc flags are
non-zero the TCA_GRED_VQ_FLAGS attribute has to either be omitted
or set to the same value as per-Qdisc flags.

Update per-Qdisc parameters:
per-Qdisc | per-VQ | result
        0 |      0 | all vq flags updated
	0 |  non-0 | error (vq flags in use)
    non-0 |      0 | -- impossible --
    non-0 |  non-0 | all vq flags updated

Update per-VQ state (flags parameter not specified):
   no change to flags

Update per-VQ state (flags parameter set):
per-Qdisc | per-VQ | result
        0 |   any  | per-vq flags updated
    non-0 |      0 | -- impossible --
    non-0 |  non-0 | error (per-Qdisc flags in use)

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 23:08:51 -08:00
Jakub Kicinski 80e22e961d net: sched: gred: provide a better structured dump and expose stats
Currently all GRED's virtual queue data is dumped in a single
array in a single attribute.  This makes it pretty much impossible
to add new fields.  In order to expose more detailed stats add a
new set of attributes.  We can now expose the 64 bit value of bytesin
and all the mark stats which were not part of the original design.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 23:08:51 -08:00
Yafang Shao 213d7767af tcp: clean up STATE_TRACE
Currently we can use bpf or tcp tracepoint to conveniently trace the tcp
state transition at the run time.
So we don't need to do this stuff at the compile time anymore.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 20:28:00 -08:00
Cong Wang 7f600f14df net: remove unused skb_send_sock()
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 19:32:33 -08:00
Michał Mirosław 0c4b2d3705 net: remove VLAN_TAG_PRESENT
Replace VLAN_TAG_PRESENT with single bit flag and free up
VLAN.CFI overload. Now VLAN.CFI is visible in networking stack
and can be passed around intact.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16 19:25:29 -08:00