The variables, msg and data, have the same value. This patch removes
the extra one.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than use an on-stack array to copy a broadcast address, use
the generic eth_broadcast_addr function to save a trivial amount of
object code.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai says:
====================
net_rwsem fixes
there is wext_netdev_notifier_call()->wireless_nlevent_flush()
netdevice notifier, which takes net_rwsem, so we can't take
net_rwsem in {,un}register_netdevice_notifier().
Since {,un}register_netdevice_notifier() is executed under
pernet_ops_rwsem, net_namespace_list can't change, while we
holding it, so there is no need net_rwsem in these functions [1/2].
The same is in [2/2]. We make callers of __rtnl_link_unregister()
take pernet_ops_rwsem, and close the race with setup_net()
and cleanup_net(), so __rtnl_link_unregister() does not need it.
This also fixes the problem of that __rtnl_link_unregister() does
not see initializing and exiting nets.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This function calls call_netdevice_notifier(), which also
may take net_rwsem. So, we can't use net_rwsem here.
This patch makes callers of this functions take pernet_ops_rwsem,
like register_netdevice_notifier() does. This will protect
the modifications of net_namespace_list, and allows notifiers
to take it (they won't have to care about context).
Since __rtnl_link_unregister() is used on module load
and unload (which are not frequent operations), this looks
for me better, than make all call_netdevice_notifier()
always executing in "protected net_namespace_list" context.
Also, this fixes the problem we had a deal in 328fbe747a
"Close race between {un, }register_netdevice_notifier and ...",
and guarantees __rtnl_link_unregister() does not skip
exitting net.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These functions take net_rwsem, while wireless_nlevent_flush()
also takes it. But down_read() can't be taken recursive,
because of rw_semaphore design, which prevents it to be occupied
by only readers forever.
Since we take pernet_ops_rwsem in {,un}register_netdevice_notifier(),
net list can't change, so these down_read()/up_read() can be removed.
Fixes: f0b07bb151 "net: Introduce net_rwsem to protect net_namespace_list"
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need for explicit calls of devm_kfree(), as the allocated
memory will be freed during driver's detach.
The driver core clears the driver data to NULL after device_release.
Thus, it is not needed to manually clear the device driver data to NULL.
So remove the unnecessary pci_set_drvdata() and devm_kfree().
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change nsim_devlink_setup to return any error back to the caller and
update nsim_init to handle it.
Requested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy says:
====================
tipc: slim down name table
We clean up and improve the name binding table:
- Replace the memory consuming 'sub_sequence/service range' array with
an RB tree.
- Introduce support for overlapping service sequences/ranges
v2: #1: Fixed a missing initialization reported by David Miller
#4: Obsoleted and replaced a few more macros to get a consistent
terminology in the API.
#5: Added new commit to fix a potential string overflow bug (it
is still only in net-next) reported by Arnd Bergmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc points out that the combined length of the fixed-length inputs to
l->name is larger than the destination buffer size:
net/tipc/link.c: In function 'tipc_link_create':
net/tipc/link.c:465:26: error: '%s' directive writing up to 32 bytes
into a region of size between 26 and 58 [-Werror=format-overflow=]
sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
net/tipc/link.c:465:2: note: 'sprintf' output 11 or more bytes
(assuming 75) into a destination of size 60
sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
A detailed analysis reveals that the theoretical maximum length of
a link name is:
max self_str + 1 + max if_name + 1 + max peer_str + 1 + max if_name =
16 + 1 + 15 + 1 + 16 + 1 + 15 = 65
Since we also need space for a trailing zero we now set MAX_LINK_NAME
to 68.
Just to be on the safe side we also replace the sprintf() call with
snprintf().
Fixes: 25b0b9c4e8 ("tipc: handle collisions of 32-bit node address
hash values")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The three address type structs in the user API have names that in
reality reflect the specific, non-Linux environment where they were
originally created.
We now give them more intuitive names, in accordance with how TIPC is
described in the current documentation.
struct tipc_portid -> struct tipc_socket_addr
struct tipc_name -> struct tipc_service_addr
struct tipc_name_seq -> struct tipc_service_range
To avoid confusion, we also update some commmets and macro names to
match the new terminology.
For compatibility, we add macros that map all old names to the new ones.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the new RB tree structure for service ranges it becomes possible to
solve an old problem; - we can now allow overlapping service ranges in
the table.
When inserting a new service range to the tree, we use 'lower' as primary
key, and when necessary 'upper' as secondary key.
Since there may now be multiple service ranges matching an indicated
'lower' value, we must also add the 'upper' value to the functions
used for removing publications, so that the correct, corresponding
range item can be found.
These changes guarantee that a well-formed publication/withdrawal item
from a peer node never will be rejected, and make it possible to
eliminate the problematic backlog functionality we currently have for
handling such cases.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function tipc_nametbl_translate() function is ugly and hard to
follow. This can be improved somewhat by introducing a stack variable
for holding the publication list to be used and re-ordering the if-
clauses for selection of algorithm.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current design of the binding table has an unnecessary memory
consuming and complex data structure. It aggregates the service range
items into an array, which is expanded by a factor two every time it
becomes too small to hold a new item. Furthermore, the arrays never
shrink when the number of ranges diminishes.
We now replace this array with an RB tree that is holding the range
items as tree nodes, each range directly holding a list of bindings.
This, along with a few name changes, improves both readability and
volume of the code, as well as reducing memory consumption and hopefully
improving cache hit rate.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov says:
====================
net: bridge: MTU handling changes
As previously discussed the recent changes break some setups and could lead
to packet drops. Thus the first patch reverts the behaviour for the bridge
to follow the minimum MTU but also keeps the ability to set the MTU to the
maximum (out of all ports) if vlan filtering is enabled. Patch 02 is the
bigger change in behaviour - we've always had trouble when configuring
bridges and their MTU which is auto tuning on port events
(add/del/changemtu), which means config software needs to chase it and fix
it after each such event, after patch 02 we allow the user to configure any
MTU (ETH_MIN/MAX limited) but once that is done the bridge stops auto
tuning and relies on the user to keep the MTU correct.
This should be compatible with cases that don't touch the MTU (or set it
to the same value), while allowing to configure the MTU and not worry
about it changing afterwards.
The patches are intentionally split like this, so that if they get accepted
and there are any complaints patch 02 can be reverted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As Roopa noted today the biggest source of problems when configuring
bridge and ports is that the bridge MTU keeps changing automatically on
port events (add/del/changemtu). That leads to inconsistent behaviour
and network config software needs to chase the MTU and fix it on each
such event. Let's improve on that situation and allow for the user to
set any MTU within ETH_MIN/MAX limits, but once manually configured it
is the user's responsibility to keep it correct afterwards.
In case the MTU isn't manually set - the behaviour reverts to the
previous and the bridge follows the minimum MTU.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently the bridge was changed to automatically set maximum MTU on port
events (add/del/changemtu) when vlan filtering is enabled, but that
actually changes behaviour in a way which breaks some setups and can lead
to packet drops. In order to still allow that maximum to be set while being
compatible, we add the ability for the user to tune the bridge MTU up to
the maximum when vlan filtering is enabled, but that has to be done
explicitly and all port events (add/del/changemtu) lead to resetting that
MTU to the minimum as before.
Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Lomovtsev says:
====================
net: thunderx: implement DMAC filtering support
By default CN88XX BGX accepts all incoming multicast and broadcast
packets and filtering is disabled. The nic driver doesn't provide
an ability to change such behaviour.
This series is to implement DMAC filtering management for CN88XX
nic driver allowing user to enable/disable filtering and configure
specific MAC addresses to filter traffic.
Changes from v1:
build issues:
- update code in order to address compiler warnings;
checkpatch.pl reported issues:
- update code in order to fit 80 symbols length;
- update commit descriptions in order to fit 80 symbols length;
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The ndo_set_rx_mode() is called from atomic context which causes
messages response timeouts while VF to PF communication via MSIx.
To get rid of that we're copy passed mc list, parse flags and queue
handling of kernel request to ordered workqueue.
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel calls ndo_set_rx_mode() callback from atomic context which
causes messaging timeouts between VF and PF (as they’re implemented via
MSIx). So in order to handle ndo_set_rx_mode() we need to get rid of it.
This commit implements necessary workqueue related structures to let VF
queue kernel request processing in non-atomic context later.
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit is to add message handling for ndo_set_rx_mode()
callback at PF side.
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel calls ndo_set_rx_mode() callback supplying it will all necessary
info, such as device state flags, multicast mac addresses list and so on.
Since we have only 128 bits to communicate with PF we need to initiate
several requests to PF with small/short operation each based on input data.
So this commit implements following PF messages codes along with new
data structures for them:
NIC_MBOX_MSG_RESET_XCAST to flush all filters configured for this
particular network interface (VF)
NIC_MBOX_MSG_ADD_MCAST to add new MAC address to DMAC filter registers
for this particular network interface (VF)
NIC_MBOX_MSG_SET_XCAST to apply filtering configuration to filter control
register
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ThunderX NIC could be partitioned to up to 128 VFs and thus
represented to system. Each VF is mapped to pair BGX:LMAC, and each of VF
is configured by kernel individually. Eventually the bunch of VFs could be
mapped onto same pair BGX:LMAC and thus could cause several multicast
filtering configuration requests to LMAC with the same MAC addresses.
This commit is to add ThunderX NIC BGX filtering manipulation routines.
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ThunderX NIC has two Ethernet Interfaces (BGX) each of them could has
up to four Logical MACs configured. Each of BGX has 32 filters to be
configured for filtering ingress packets. The number of filters available
to particular LMAC is from 8 (if we have four LMACs configured per BGX)
up to 32 (in case of only one LMAC is configured per BGX).
At the same time the NIC could present up to 128 VFs to OS as network
interfaces, each of them kernel will configure with set of MAC addresses
for filtering. So to prevent dupes in BGX filter registers from different
network interfaces it is required to cache and track all filter
configuration requests prior to applying them onto BGX filter registers.
This commit is to update LMAC structures with control fields to
allocate/releasing filters tracking list along with implementing
dmac array allocate/release per LMAC.
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ThunderX NIC has set of registers which allows to configure
filter policy for ingress packets. There are three possible regimes
of filtering multicasts, broadcasts and unicasts: accept all, reject all
and accept filter allowed only.
Current implementation has enum with all of them and two generic macro
for enabling filtering et all (CAM_ACCEPT) and enabling/disabling
broadcast packets, which also should be corrected in order to represent
register bits properly. All these values are private for driver and
there is no need to ‘publish’ them via header file.
This commit is to move filtering register manipulation values from
header file into source with explicit assignment of exact register
values to them to be used while register configuring.
Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Blumenstingl says:
====================
Meson8m2 support for dwmac-meson8b
The Meson8m2 SoC is an updated version of the Meson8 SoC. Some of the
peripherals are shared with Meson8b (for example the watchdog registers
and the internal temperature sensor calibration procedure).
Meson8m2 also seems to include the same Gigabit MAC register layout as
Meson8b.
The registers in the Amlogic dwmac "glue" seem identical between Meson8b
and Meson8m2. Manual testing seems to confirm this.
To be extra-safe a new compatible string is added because there's no
(public) documentation on the Meson8m2 SoC. This will allow us to
implement any SoC-specific variations later on (if needed).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The Meson8m2 SoC uses a similar (potentially even identical) register
layout as the Meson8b and GXBB SoCs for the dwmac glue.
Add a new compatible string and update the module description to
indicate support for these SoCs.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Meson8m2 SoC uses a similar (potentially even identical) register
layout for the dwmac glue as Meson8b and GXBB. Unfortunately there is no
documentation available.
Testing shows that both, RMII and RGMII PHYs are working if they are
configured as on Meson8b. Add a new compatible string to the
documentation so differences (if there are any) between Meson8m2 and the
other SoCs can be taken care of within the driver.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using the -i feature to generate random ID numbers for test
cases in tdc, the function that writes the JSON to file doesn't
add a newline character to the end of the file, so we have to
add our own.
Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit has fix for RX traffic issues when we stress test the driver
with continuous ifconfig up/down under very high traffic conditions.
Reason for the issue is that, in existing liquidio_stop function NAPI is
disabled even before actual FW/HW interface is brought down via
send_rx_ctrl_cmd(lio, 0). Between time frame of NAPI disable and actual
interface down in firmware, firmware continuously enqueues rx traffic to
host. When interrupt happens for new packets, host irq handler fails in
scheduling NAPI as the NAPI is already disabled.
After "ifconfig <iface> up", Host re-enables NAPI but cannot schedule it
until it receives another Rx interrupt. Host never receives Rx interrupt as
it never cleared the Rx interrupt it received during interface down
operation. NIC Rx interrupt gets cleared only when Host processes queue and
clears the queue counts. Above anomaly leads to other issues like packet
overflow in FW/HW queues, backpressure.
Fix:
This commit fixes this issue by disabling NAPI only after informing
firmware to stop queueing packets to host via send_rx_ctrl_cmd(lio, 0).
send_rx_ctrl_cmd is not visible in the patch as it is already there in the
code. The DOWN command also waits for any pending packets to be processed
by NAPI so that the deadlock will not occur.
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Acked-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Schmidt says:
====================
pull-request: ieee802154-next 2018-03-29
An update from ieee802154 for *net-next*
Colin fixed a unused variable in the new mcr20a driver.
Harry fixed an unitialised data read in the debugfs interface of the
ca8210 driver.
If there are any issues or you think these are to late for -rc1 (both can also
go into -rc2 as they are simple fixes) let me know.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The freescale.com address will no longer be available.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new compatible string for the RZ/G1C (R8A77470) SoC.
Signed-off-by: Biju Das <biju.das@bp.renesas.com>
Reviewed-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu says:
====================
Fix TX Timeout and implement Safety Features
Fix the TX Timeout handler to correctly reconfigure the whole system and
start implementing features for DWMAC5 cores, specifically the Safety
Features.
Changes since v1:
- Display error stats in ethtool
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds initial suport for DWMAC5 and implements the Automotive Safety
Package which is available from core version 5.10.
The Automotive Safety Pacakge (also called Safety Features) offers us
with error protection in the core by implementing ECC Protection in
memories, on-chip data path parity protection, FSM parity and timeout
protection and Application/CSR interface timeout protection.
In case of an uncorrectable error we call stmmac_global_err() and
reconfigure the whole core.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently TX Timeout handler does not behaves as expected and leads to
an unrecoverable state. Rework current implementation of TX Timeout
handling to actually perform a complete reset of the driver state and IP.
We use deferred work to init a task which will be responsible for
resetting the system.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The style of the rx/tx queue's *_coal member assignment is:
static void foo_coal_set(...)
{
set the coal in hw;
update queue's foo_coal member; [1]
}
In other place, we call foo_coal_set(pp, queue->foo_coal), so the above [1]
is duplicated and could be removed.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi says:
====================
do not allow adding routes if disable_ipv6 is enabled
Do not allow userspace to add static ipv6 routes if disable_ipv6 is enabled.
Update disable_ipv6 documentation according to that change
Changes since v1:
- added an extack message telling the user that IPv6 is disabled on the nexthop
device
- rebased on-top of net-next
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Clarify that when disable_ipv6 is enabled even the ipv6 routes
are deleted for the selected interface and from now it will not
be possible to add addresses/routes to that interface
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not allow setting ipv6 routes from userspace if disable_ipv6 has been
enabled. The issue can be triggered using the following reproducer:
- sysctl net.ipv6.conf.all.disable_ipv6=1
- ip -6 route add a🅱️c:d::/64 dev em1
- ip -6 route show
a🅱️c:d::/64 dev em1 metric 1024 pref medium
Fix it checking disable_ipv6 value in ip6_route_info_create routine
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree. This batch comes with more input sanitization for xtables to
address bug reports from fuzzers, preparation works to the flowtable
infrastructure and assorted updates. In no particular order, they are:
1) Make sure userspace provides a valid standard target verdict, from
Florian Westphal.
2) Sanitize error target size, also from Florian.
3) Validate that last rule in basechain matches underflow/policy since
userspace assumes this when decoding the ruleset blob that comes
from the kernel, from Florian.
4) Consolidate hook entry checks through xt_check_table_hooks(),
patch from Florian.
5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject
very large compat offset arrays, so we have a reasonable upper limit
and fuzzers don't exercise the oom-killer. Patches from Florian.
6) Several WARN_ON checks on xtables mutex helper, from Florian.
7) xt_rateest now has a hashtable per net, from Cong Wang.
8) Consolidate counter allocation in xt_counters_alloc(), from Florian.
9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch
from Xin Long.
10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from
Felix Fietkau.
11) Consolidate code through flow_offload_fill_dir(), also from Felix.
12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward()
to remove a dependency with flowtable and ipv6.ko, from Felix.
13) Cache mtu size in flow_offload_tuple object, this is safe for
forwarding as f87c10a8aa describes, from Felix.
14) Rename nf_flow_table.c to nf_flow_table_core.o, to simplify too
modular infrastructure, from Felix.
15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from
Ahmed Abdelsalam.
16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei.
17) Support for counting only to nf_conncount infrastructure, patch
from Yi-Hung Wei.
18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes
to nft_ct.
19) Use boolean as return value from ipt_ah and from IPVS too, patch
from Gustavo A. R. Silva.
20) Remove useless parameters in nfnl_acct_overquota() and
nf_conntrack_broadcast_help(), from Taehee Yoo.
21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo.
22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu.
23) Fix typo in xt_limit, from Geert Uytterhoeven.
24) Do no use VLAs in Netfilter code, again from Gustavo.
25) Use ADD_COUNTER from ebtables, from Taehee Yoo.
26) Bitshift support for CONNMARK and MARK targets, from Jack Ma.
27) Use pr_*() and add pr_fmt(), from Arushi Singhal.
28) Add synproxy support to ctnetlink.
29) ICMP type and IGMP matching support for ebtables, patches from
Matthias Schiffer.
30) Support for the revision infrastructure to ebtables, from
Bernie Harris.
31) String match support for ebtables, also from Bernie.
32) Documentation for the new flowtable infrastructure.
33) Use generic comparison functions in ebt_stp, from Joe Perches.
34) Demodularize filter chains in nftables.
35) Register conntrack hooks in case nftables NAT chain is added.
36) Merge assignments with return in a couple of spots in the
Netfilter codebase, also from Arushi.
37) Document that xtables percpu counters are stored in the same
memory area, from Ben Hutchings.
38) Revert mark_source_chains() sanity checks that break existing
rulesets, from Florian Westphal.
39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai says:
====================
Close race between {un, }register_netdevice_notifier and pernet_operations
the problem is {,un}register_netdevice_notifier() do not take
pernet_ops_rwsem, and they don't see network namespaces, being
initialized in setup_net() and cleanup_net(), since at this
time net is not hashed to net_namespace_list.
This may lead to imbalance, when a notifier is called at time of
setup_net()/net is alive, but it's not called at time of cleanup_net(),
for the devices, hashed to the net, and vise versa. See (3/3) for
the scheme of imbalance.
This patchset fixes the problem by acquiring pernet_ops_rwsem
at the time of {,un}register_netdevice_notifier() (3/3).
(1-2/3) are preparations in xfrm and netfilter subsystems.
The problem was introduced a long ago, but backporting won't be easy,
since every previous kernel version may have changes in netdevice
notifiers, and they all need review and testing. Otherwise, there
may be more pernet_operations, which register or unregister
netdevice notifiers, and that leads to deadlock (which is was fixed
in 1-2/3). This patchset is for net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
{un,}register_netdevice_notifier() iterate over all net namespaces
hashed to net_namespace_list. But pernet_operations register and
unregister netdevices in unhashed net namespace, and they are not
seen for netdevice notifiers. This results in asymmetry:
1)Race with register_netdevice_notifier()
pernet_operations::init(net) ...
register_netdevice() ...
call_netdevice_notifiers() ...
... nb is not called ...
... register_netdevice_notifier(nb) -> net skipped
... ...
list_add_tail(&net->list, ..) ...
Then, userspace stops using net, and it's destructed:
pernet_operations::exit(net)
unregister_netdevice()
call_netdevice_notifiers()
... nb is called ...
This always happens with net::loopback_dev, but it may be not the only device.
2)Race with unregister_netdevice_notifier()
pernet_operations::init(net)
register_netdevice()
call_netdevice_notifiers()
... nb is called ...
Then, userspace stops using net, and it's destructed:
list_del_rcu(&net->list) ...
pernet_operations::exit(net) unregister_netdevice_notifier(nb) -> net skipped
dev_change_net_namespace() ...
call_netdevice_notifiers()
... nb is not called ...
unregister_netdevice()
call_netdevice_notifiers()
... nb is not called ...
This race is more danger, since dev_change_net_namespace() moves real
network devices, which use not trivial netdevice notifiers, and if this
will happen, the system will be left in unpredictable state.
The patch closes the race. During the testing I found two places,
where register_netdevice_notifier() is called from pernet init/exit
methods (which led to deadlock) and fixed them (see previous patches).
The review moved me to one more unusual registration place:
raw_init() (can driver). It may be a reason of problems,
if someone creates in-kernel CAN_RAW sockets, since they
will be destroyed in exit method and raw_release()
will call unregister_netdevice_notifier(). But grep over
kernel tree does not show, someone creates such sockets
from kernel space.
Theoretically, there can be more places like this, and which are
hidden from review, but we found them on the first bumping there
(since there is no a race, it will be 100% reproducible).
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register netdevice notifier for every iptable entry
is not good, since this breaks modularity, and
the hidden synchronization is based on rtnl_lock().
This patch reworks the synchronization via new lock,
while the rest of logic remains as it was before.
This is required for the next patch.
Tested via:
while :; do
unshare -n iptables -t mangle -A OUTPUT -j TEE --gateway 1.1.1.2 --oif lo;
done
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, driver registers it from pernet_operations::init method,
and this breaks modularity, because initialization of net namespace
and netdevice notifiers are orthogonal actions. We don't have
per-namespace netdevice notifiers; all of them are global for all
devices in all namespaces.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Looijmans says:
====================
of_net: Implement of_get_nvmem_mac_address helper
Posted this as a small set now, with an (optional) second patch that shows
how the changes work and what I've used to test the code on a Topic Miami board.
I've taken the liberty to add appropriate "Acked" and "Review" tags.
v4: Replaced "6" with ETH_ALEN
v3: Add patch that implements mac in nvmem for the Cadence MACB controller
Remove the integrated of_get_mac_address call
v2: Use of_nvmem_cell_get to avoid needing the assiciated device
Use void* instead of char*
Add devicetree binding doc
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Call of_get_nvmem_mac_address() to fetch the MAC address from an nvmem
cell, if one is provided in the device tree. This allows the address to
be stored in an I2C EEPROM device for example.
Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's common practice to store MAC addresses for network interfaces into
nvmem devices. However the code to actually do this in the kernel lacks,
so this patch adds of_get_nvmem_mac_address() for drivers to obtain the
address from an nvmem cell provider.
This is particulary useful on devices where the ethernet interface cannot
be configured by the bootloader, for example because it's in an FPGA.
Signed-off-by: Mike Looijmans <mike.looijmans@topic.nl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: flower: handle MTU changes
This set improves MTU handling for flower offload. The max MTU is
correctly capped and physical port MTU is communicated to the FW
(and indirectly HW).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Trigger a port mod message to request an MTU change on the NIC when any
physical port representor is assigned a new MTU value. The driver waits
10 msec for an ack that the FW has set the MTU. If no ack is received the
request is rejected and an appropriate warning flagged.
Rather than maintain an MTU queue per repr, one is maintained per app.
Because the MTU ndo is protected by the rtnl lock, there can never be
contention here. Portmod messages from the NIC are also protected by
rtnl so we first check if the portmod is an ack and, if so, handle outside
rtnl and the cmsg work queue.
Acks are detected by the marking of a bit in a portmod response. They are
then verfied by checking the port number and MTU value expected by the
app. If the expected MTU is 0 then no acks are currently expected.
Also, ensure that the packet headroom reserved by the flower firmware is
considered when accepting an MTU change on any repr.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>