Граф коммитов

903539 Коммитов

Автор SHA1 Сообщение Дата
Romain Bellan 7c6b412162 netfilter: ctnetlink: be more strict when NF_CONNTRACK_MARK is not set
When CONFIG_NF_CONNTRACK_MARK is not set, any CTA_MARK or CTA_MARK_MASK
in netlink message are not supported. We should return an error when one
of them is set, not both

Fixes: 9306425b70 ("netfilter: ctnetlink: must check mark attributes vs NULL")
Signed-off-by: Romain Bellan <romain.bellan@wifirst.fr>
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-30 02:05:36 +02:00
Florian Westphal 28f715b9e6 netfilter: nf_queue: prefer nf_queue_entry_free
Instead of dropping refs+kfree, use the helper added in previous patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29 16:28:29 +02:00
Florian Westphal af370ab36f netfilter: nf_queue: do not release refcouts until nf_reinject is done
nf_queue is problematic when another NF_QUEUE invocation happens
from nf_reinject().

1. nf_queue is invoked, increments state->sk refcount.
2. skb is queued, waiting for verdict.
3. sk is closed/released.
3. verdict comes back, nf_reinject is called.
4. nf_reinject drops the reference -- refcount can now drop to 0

Instead of get_ref/release_ref pattern, we need to nest the get_ref calls:
    get_ref
       get_ref
       release_ref
     release_ref

So that when we invoke the next processing stage (another netfilter
or the okfn()), we hold at least one reference count on the
devices/socket.

After previous patch, it is now safe to put the entry even after okfn()
has potentially free'd the skb.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29 16:28:29 +02:00
Florian Westphal 119e52e664 netfilter: nf_queue: place bridge physports into queue_entry struct
The refcount is done via entry->skb, which does work fine.
Major problem: When putting the refcount of the bridge ports, we
must always put the references while the skb is still around.

However, we will need to put the references after okfn() to avoid
a possible 1 -> 0 -> 1 refcount transition, so we cannot use the
skb pointer anymore.

Place the physports in the queue entry structure instead to allow
for refcounting changes in the next patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29 16:28:29 +02:00
Florian Westphal dd3cc111f2 netfilter: nf_queue: make nf_queue_entry_release_refs static
This is a preparation patch, no logical changes.
Move free_entry into core and rename it to something more sensible.

Will ease followup patches which will complicate the refcount handling.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-29 16:28:29 +02:00
Paul Blakey 7da182a998 netfilter: flowtable: Use work entry per offload command
To allow offload commands to execute in parallel, create workqueue
for flow table offload, and use a work entry per offload command.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:42:21 +01:00
Paul Blakey 422c032afc netfilter: flowtable: Use rw sem as flow block lock
Currently flow offload threads are synchronized by the flow block mutex.
Use rw lock instead to increase flow insertion (read) concurrency.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:42:20 +01:00
Qian Cai 0a6a9515fe netfilter: nf_tables: silence a RCU-list warning in nft_table_lookup()
It is safe to traverse &net->nft.tables with &net->nft.commit_mutex
held using list_for_each_entry_rcu(). Silence the PROVE_RCU_LIST false
positive,

WARNING: suspicious RCU usage
net/netfilter/nf_tables_api.c:523 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by iptables/1384:
 #0: ffffffff9745c4a8 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x25/0x60 [nf_tables]

Call Trace:
 dump_stack+0xa1/0xea
 lockdep_rcu_suspicious+0x103/0x10d
 nft_table_lookup.part.0+0x116/0x120 [nf_tables]
 nf_tables_newtable+0x12c/0x7d0 [nf_tables]
 nfnetlink_rcv_batch+0x559/0x1190 [nfnetlink]
 nfnetlink_rcv+0x1da/0x210 [nfnetlink]
 netlink_unicast+0x306/0x460
 netlink_sendmsg+0x44b/0x770
 ____sys_sendmsg+0x46b/0x4a0
 ___sys_sendmsg+0x138/0x1a0
 __sys_sendmsg+0xb6/0x130
 __x64_sys_sendmsg+0x48/0x50
 do_syscall_64+0x69/0xf4
 entry_SYSCALL_64_after_hwframe+0x49/0xb3

Signed-off-by: Qian Cai <cai@lca.pw>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:42:20 +01:00
wenxu 133a2fe594 netfilter: flowtable: Fix incorrect tc_setup_type type
The indirect block setup should use TC_SETUP_FT as the type instead of
TC_SETUP_BLOCK. Adjust existing users of the indirect flow block
infrastructure.

Fixes: b5140a36da ("netfilter: flowtable: add indr block setup support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:41:52 +01:00
Pablo Neira Ayuso 53c2b2899a netfilter: flowtable: add counter support
Add a new flag to turn on flowtable counters which are stored in the
conntrack entry.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:32:37 +01:00
Pablo Neira Ayuso cfbd1125fc netfilter: nf_tables: add enum nft_flowtable_flags to uapi
Expose the NFT_FLOWTABLE_HW_OFFLOAD flag through uapi.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:32:36 +01:00
Pablo Neira Ayuso 8ac2bd3577 netfilter: conntrack: export nf_ct_acct_update()
This function allows you to update the conntrack counters.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:32:36 +01:00
Haishuang Yan 73348fed35 ipvs: optimize tunnel dumps for icmp errors
After strip GRE/UDP tunnel header for icmp errors, it's better to show
"GRE/UDP" instead of "IPIP" in debug message.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:31:01 +01:00
Jules Irenge 6b36d4829c netfilter: conntrack: Add missing annotations for nf_conntrack_all_lock() and nf_conntrack_all_unlock()
Sparse reports warnings at nf_conntrack_all_lock()
	and nf_conntrack_all_unlock()

warning: context imbalance in nf_conntrack_all_lock()
	- wrong count at exit
warning: context imbalance in nf_conntrack_all_unlock()
	- unexpected unlock

Add the missing __acquires(&nf_conntrack_locks_all_lock)
Add missing __releases(&nf_conntrack_locks_all_lock)

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:22:06 +01:00
Jules Irenge 19f8f717f6 netfilter: ctnetlink: Add missing annotation for ctnetlink_parse_nat_setup()
Sparse reports a warning at ctnetlink_parse_nat_setup()

warning: context imbalance in ctnetlink_parse_nat_setup()
	- unexpected unlock

The root cause is the missing annotation at ctnetlink_parse_nat_setup()
Add the missing __must_hold(RCU) annotation

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-27 18:21:09 +01:00
wenxu dc264f1f7e netfilter: flowtable: fix NULL pointer dereference in tunnel offload support
The tc ct action does not cache the route in the flowtable entry.

Fixes: 88bf6e4114 ("netfilter: flowtable: add tunnel encap/decap action offload support")
Fixes: cfab6dbd0e ("netfilter: flowtable: add tunnel match offload support")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19 21:06:17 +01:00
Pablo Neira Ayuso 475beb9c8d netfilter: nf_tables: add nft_set_elem_expr_destroy() and use it
This patch adds nft_set_elem_expr_destroy() to destroy stateful
expressions in set elements.

This patch also updates the commit path to call this function to invoke
expr->ops->destroy_clone when required.

This is implicitly fixing up a module reference counter leak and
a memory leak in expressions that allocated internal state, e.g.
nft_counter.

Fixes: 4094445229 ("netfilter: nf_tables: add elements with stateful expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19 11:37:32 +01:00
Pablo Neira Ayuso 772f4e82b3 netfilter: nf_tables: fix double-free on set expression from the error path
After copying the expression to the set element extension, release the
expression and reset the pointer to avoid a double-free from the error
path.

Fixes: 4094445229 ("netfilter: nf_tables: add elements with stateful expressions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19 11:37:31 +01:00
Pablo Neira Ayuso 65038428b2 netfilter: nf_tables: allow to specify stateful expression in set definition
This patch allows users to specify the stateful expression for the
elements in this set via NFTA_SET_EXPR. This new feature allows you to
turn on counters for all of the elements in this set.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19 11:37:31 +01:00
Pablo Neira Ayuso 0c2a85edd1 netfilter: nf_tables: pass context to nft_set_destroy()
The patch that adds support for stateful expressions in set definitions
require this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19 11:37:31 +01:00
Pablo Neira Ayuso c604cc691c netfilter: nf_tables: move nft_expr_clone() to nf_tables_api.c
Move the nft_expr_clone() helper function to the core.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-19 11:37:31 +01:00
David S. Miller 79e28519ac mlx5-updates-2020-03-17
1) Compiler warnings and cleanup for the connection tracking series
 2) Bug fixes for the connection tracking series
 3) Fix devlink port register sequence
 4) Last five patches in the series, By Eli cohen
    Add the support for forwarding traffic between two eswitch uplink
    representors (Hairpin for eswitch), using mlx5 termination tables
    to change the direction of a packet in hw from RX to TX pipeline.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl5ximgACgkQSD+KveBX
 +j4iGwf9FrTxtGjgVXuwmc5LmSU6tak5SjK+dW5PdCw4mNorN2hSJeV/f9evLrf7
 7Cxfm4OH8/ivOSpVQz6XZEF0aYTq9T3JakGzAWESWwo/s+i7iwA+lVPYKhcvHXeg
 C9ImWbnyDCZkPZM6jz4KNpSMRWkyB7sEtQ51hYF0bdiSzcLSDaLCoKPEljp7sNKb
 f1456/yDuOIZ3sb6rYPH6e8EqqfUMiyYAyY3bBu09sl3deXopyueYVqPSPgjOoC7
 SfM5K+9nnuQJdvSqwUJLexxDZo1Z7fizz73LwUp0SBLk5zvZdn2bhxbt4wPS/xH/
 CSjsWFAs1eu2rDqRH48G3jKqTZeq2w==
 =LkML
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2020-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-03-17

1) Compiler warnings and cleanup for the connection tracking series
2) Bug fixes for the connection tracking series
3) Fix devlink port register sequence
4) Last five patches in the series, By Eli cohen
   Add the support for forwarding traffic between two eswitch uplink
   representors (Hairpin for eswitch), using mlx5 termination tables
   to change the direction of a packet in hw from RX to TX pipeline.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 19:13:37 -07:00
Heiner Kallweit d445dff2df net: phy: realtek: read actual speed to detect downshift
At least some integrated PHY's in RTL8168/RTL8125 chip versions support
downshift, and the actual link speed can be read from a vendor-specific
register. Info about this register was provided by Realtek.
More details about downshift configuration (e.g. number of attempts)
aren't available, therefore the downshift tunable is not implemented.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 17:05:34 -07:00
Petr Machata 2c4b58dc75 net: sched: Fix hw_stats_type setting in pedit loop
In the commit referenced below, hw_stats_type of an entry is set for every
entry that corresponds to a pedit action. However, the assignment is only
done after the entry pointer is bumped, and therefore could overwrite
memory outside of the entries array.

The reason for this positioning may have been that the current entry's
hw_stats_type is already set above, before the action-type dispatch.
However, if there are no more actions, the assignment is wrong. And if
there are, the next round of the for_each_action loop will make the
assignment before the action-type dispatch anyway.

Therefore fix this issue by simply reordering the two lines.

Fixes: 74522e7baa ("net: sched: set the hw_stats_type in pedit loop")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:52:04 -07:00
David S. Miller dd13f4dfc0 Merge branch 'mlxsw-spectrum_cnt-Expose-counter-resources'
Ido Schimmel says:

====================
mlxsw: spectrum_cnt: Expose counter resources

Jiri says:

Capacity and utilization of existing flow and RIF counters are currently
unavailable to be seen by the user. Use the existing devlink resources
API to expose the information:

$ sudo devlink resource show pci/0000:00:10.0 -v
pci/0000:00:10.0:
  name kvd resource_path /kvd size 524288 unit entry dpipe_tables none
  name span_agents resource_path /span_agents size 8 occ 0 unit entry dpipe_tables none
  name counters resource_path /counters size 79872 occ 44 unit entry dpipe_tables none
    resources:
      name flow resource_path /counters/flow size 61440 occ 4 unit entry dpipe_tables none
      name rif resource_path /counters/rif size 18432 occ 40 unit entry dpipe_tables none
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:20 -07:00
Jiri Pirko ee4848ac1a selftests: mlxsw: Add tc action hw_stats tests
Add tests for mlxsw hw_stats types.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko 4e145fc6eb mlxsw: spectrum_cnt: Expose devlink resource occupancy for counters
Implement occupancy counting for counters and expose over devlink
resource API.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko 53d9636694 mlxsw: spectrum_cnt: Consolidate subpools initialization
Put all init operations related to subpools into
mlxsw_sp_counter_sub_pools_init().

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko ab8c4cc604 mlxsw: spectrum_cnt: Move config validation along with resource register
Move the validation of subpools configuration, to avoid possible over
commitment to resource registration. Add WARN_ON to indicate bug
in the code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko d53cdbb889 mlxsw: spectrum_cnt: Expose subpool sizes over devlink resources
Implement devlink resources support for counter pools. Move the subpool
sizes calculations into the new resources register function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko b2d3e33c77 mlxsw: spectrum_cnt: Add entry_size_res_id for each subpool and use it to query entry size
Add new field to subpool struct that would indicate which
resource id should be used to query the entry size for
the subpool from the device.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko c33fbe949f mlxsw: spectrum_cnt: Move sub_pools under per-instance pool struct
Currently, the global static array of subpools is used. Make it
per-instance as multiple instances of the mlxsw driver can have
different values.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko 4d21ed2e3d selftests: spectrum-2: Adjust tc_flower_scale limit according to current counter count
With the change that made the code to query counter bank size from device
instead of using hard-coded value, the number of available counters
changed for Spectrum-2. Adjust the limit in the selftests.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko ac5de9a20f mlxsw: spectrum_cnt: Query bank size from FW resources
The bank size is different between Spectrum versions. Also it is
a resource that can be queried. So instead of hard coding the value in
code, query it from the firmware.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Rahul Lakkireddy 8d174351f2 cxgb4: rework TC filter rule insertion across regions
Chelsio NICs have 3 filter regions, in following order of priority:
1. High Priority (HPFILTER) region (Highest Priority).
2. HASH region.
3. Normal FILTER region (Lowest Priority).

Currently, there's a 1-to-1 mapping between the prio value passed
by TC and the filter region index. However, it's possible to have
multiple TC rules with the same prio value. In this case, if a region
is exhausted, no attempt is made to try inserting the rule in the
next available region.

So, rework and remove the 1-to-1 mapping. Instead, dynamically select
the region to insert the filter rule, as long as the new rule's prio
value doesn't conflict with existing rules across all the 3 regions.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:39:03 -07:00
Daniel Borkmann 357b6cc583 netfilter: revert introduction of egress hook
This reverts the following commits:

  8537f78647 ("netfilter: Introduce egress hook")
  5418d3881e ("netfilter: Generalize ingress hook")
  b030f194ae ("netfilter: Rename ingress hook include file")

>From the discussion in [0], the author's main motivation to add a hook
in fast path is for an out of tree kernel module, which is a red flag
to begin with. Other mentioned potential use cases like NAT{64,46}
is on future extensions w/o concrete code in the tree yet. Revert as
suggested [1] given the weak justification to add more hooks to critical
fast-path.

  [0] https://lore.kernel.org/netdev/cover.1583927267.git.lukas@wunner.de/
  [1] https://lore.kernel.org/netdev/20200318.011152.72770718915606186.davem@davemloft.net/

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Miller <davem@davemloft.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Nacked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:35:48 -07:00
David S. Miller ce7964bdc4 Merge branch 's390-qeth-next'
Julian Wiedmann says:

====================
s390/qeth: updates 2020-03-18

please apply the following patch series for qeth to netdev's net-next
tree.

This consists of three parts:
1) support for __GFP_MEMALLOC,
2) several ethtool enhancements (.set_channels, SW Timestamping),
3) the usual cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:36 -07:00
Julian Wiedmann cd652be52c s390/qeth: use dev->reg_state
To check whether a netdevice has already been registered, look at
NETREG_REGISTERED to replace some hacks I added a while ago.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:35 -07:00
Julian Wiedmann 5bcd8ad976 s390/qeth: remove gratuitous NULL checks
qeth_do_ioctl() is only reached through our own net_device_ops, so we
can trust that dev->ml_priv still contains what we put there earlier.

qeth_bridgeport_an_set() is an internal function that doesn't require
such sanity checks.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:35 -07:00
Julian Wiedmann 86e7a4e4af s390/qeth: add phys_to_virt() translation for AOB
Data addresses in the AOB are absolute, and need to be translated before
being fed into kmem_cache_free(). Currently this phys_to_virt() is a no-op.
Also see commit 2db01da8d2 ("s390/qdio: fill SBALEs with absolute addresses").

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:35 -07:00
Julian Wiedmann 54e73b9c0a s390/qeth: don't report hard-coded driver version
Versions are meaningless for an in-kernel driver.
Instead use the UTS_RELEASE that is set by ethtool_get_drvinfo().

Cc: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:35 -07:00
Julian Wiedmann 8d145da294 s390/qeth: add SW timestamping support for IQD devices
This adds support for SOF_TIMESTAMPING_TX_SOFTWARE.
No support for non-IQD devices, since they orphan the skb in their xmit
path.

To play nice with TX bulking, set the timestamp when the buffer that
contains the skb(s) is actually flushed out to HW.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:35 -07:00
Julian Wiedmann 5d8ce41c6a s390/qeth: balance the TX queue selection for IQD devices
For ucast traffic, qeth_iqd_select_queue() falls back to
netdev_pick_tx(). This will potentially use skb_tx_hash() to distribute
the flow over all active TX queues - so txq 0 is a valid selection, and
qeth_iqd_select_queue() needs to check for this and put it on some other
queue. As a result, the distribution for ucast flows is unbalanced and
hits QETH_IQD_MIN_UCAST_TXQ heavier than the other queues.

Open-coding a custom variant of skb_tx_hash() isn't an option, since
netdev_pick_tx() also gives us eg. access to XPS. But we can pull a
little trick: add a single TC class that excludes the mcast txq, and
thus encourage skb_tx_hash() to not pick the mcast txq.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:35 -07:00
Julian Wiedmann 66cddf1019 s390/qeth: allow configuration of TX queues for IQD devices
Similar to the support for z/VM NICs, but we need to take extra care
about the dedicated mcast queue:

1. netdev_pick_tx() is unaware of this limitation and might select the
   mcast txq. Catch this.
2. require at least _two_ TX queues - one for ucast, one for mcast.
3. when reducing the number of TX queues, there's a potential race
   where netdev_cap_txqueue() over-rules the selected txq index and
   falls back to index 0. This would place ucast traffic on the mcast
   queue, and result in TX errors.
   So for IQD, reject a reduction while the interface is running.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:35 -07:00
Julian Wiedmann fcc2df8b87 s390/qeth: allow configuration of TX queues for z/VM NICs
Add support for ETHTOOL_SCHANNELS to change the count of active
TX queues.

Since all TX queue structs are pre-allocated and -registered, we just
need to trivially adjust dev->real_num_tx_queues.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:35 -07:00
Julian Wiedmann 1c103cf819 s390/qeth: remove prio-queueing support for z/VM NICs
z/VM NICs don't offer HW QoS for TX rings. So just use netdev_pick_tx()
to distribute the connections equally over all enabled TX queues.

We start with just 1 enabled TX queue (this matches the typical
configuration without prio-queueing). A follow-on patch will allow users
to enable additional TX queues.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:35 -07:00
Julian Wiedmann b413ff8a18 s390/qeth: use memory reserves in TX slow path
When falling back to an allocation from the HW header cache, check if
the skb is eligible for using memory reserves.
This only makes a difference if the cache is empty and needs to be
refilled.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:35 -07:00
Julian Wiedmann 714c910885 s390/qeth: use memory reserves to back RX buffers
Use dev_alloc_page() for backing the RX buffers with pages. This way we
pick up __GFP_MEMALLOC.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:33:35 -07:00
David S. Miller a58741ef1e Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Use nf_flow_offload_tuple() to fetch flow stats, from Paul Blakey.

2) Add new xt_IDLETIMER hard mode, from Manoj Basapathi.
   Follow up patch to clean up this new mode, from Dan Carpenter.

3) Add support for geneve tunnel options, from Xin Long.

4) Make sets built-in and remove modular infrastructure for sets,
   from Florian Westphal.

5) Remove unused TEMPLATE_NULLS_VAL, from Li RongQing.

6) Statify nft_pipapo_get, from Chen Wandun.

7) Use C99 flexible-array member, from Gustavo A. R. Silva.

8) More descriptive variable names for bitwise, from Jeremy Sowden.

9) Four patches to add tunnel device hardware offload to the flowtable
   infrastructure, from wenxu.

10) pipapo set supports for 8-bit grouping, from Stefano Brivio.

11) pipapo can switch between nibble and byte grouping, also from
    Stefano.

12) Add AVX2 vectorized version of pipapo, from Stefano Brivio.

13) Update pipapo to be use it for single ranges, from Stefano.

14) Add stateful expression support to elements via control plane,
    eg. counter per element.

15) Re-visit sysctls in unprivileged namespaces, from Florian Westphal.

15) Add new egress hook, from Lukas Wunner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 23:51:31 -07:00
Paolo Abeni 7f20d5fc70 mptcp: move msk state update to subflow_syn_recv_sock()
After commit 58b0991962 ("mptcp: create msk early"), the
msk socket is already available at subflow_syn_recv_sock()
time. Let's move there the state update, to mirror more
closely the first subflow state.

The above will also help multiple subflow supports.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 22:52:24 -07:00