ugeth_quiesce/activate are used to halt the controller when there is a
link change that requires to reconfigure the mac.
The previous implementation called netif_device_detach(). This however
causes the initial activation of the netdevice to fail precisely because
it's detached. For details, see [1].
A possible workaround was the revert of commit
net: linkwatch: add check for netdevice being present to linkwatch_do_dev
However, the check introduced in the above commit is correct and shall be
kept.
The netif_device_detach() is thus replaced with
netif_tx_stop_all_queues() that prevents any tranmission. This allows to
perform mac config change required by the link change, without detaching
the corresponding netdevice and thus not preventing its initial
activation.
[1] https://lists.openwall.net/netdev/2020/01/08/201
Signed-off-by: Valentin Longchamp <valentin@longchamp.me>
Acked-by: Matteo Ghidoni <matteo.ghidoni@ch.abb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit bdf6fa52f0 ("sctp: handle association restarts when the
socket is closed.") starts shutdown when an association is restarted,
if in SHUTDOWN-PENDING state and the socket is closed. However, the
rationale stated in that commit applies also when in SHUTDOWN-SENT
state - we don't want to move an association to ESTABLISHED state when
the socket has been closed, because that results in an association
that is unreachable from user space.
The problem scenario:
1. Client crashes and/or restarts.
2. Server (using one-to-one socket) calls close(). SHUTDOWN is lost.
3. Client reconnects using the same addresses and ports.
4. Server's association is restarted. The association and the socket
move to ESTABLISHED state, even though the server process has
closed its descriptor.
Also, after step 4 when the server process exits, some resources are
leaked in an attempt to release the underlying inet sock structure in
ESTABLISHED state:
IPv4: Attempt to release TCP socket in state 1 00000000377288c7
Fix by acting the same way as in SHUTDOWN-PENDING state. That is, if
an association is restarted in SHUTDOWN-SENT state and the socket is
closed, then start shutdown and don't move the association or the
socket to ESTABLISHED state.
Fixes: bdf6fa52f0 ("sctp: handle association restarts when the socket is closed.")
Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When rxhash is enabled on any ethernet port except the first in each CP
block, traffic flow is prevented. The analysis is below:
I've been investigating this afternoon, and what I've found, comparing
a kernel without 895586d5dc and with 895586d5dc applied is:
- The table programmed into the hardware via mvpp22_rss_fill_table()
appears to be identical with or without the commit.
- When rxhash is enabled on eth2, mvpp2_rss_port_c2_enable() reports
that c2.attr[0] and c2.attr[2] are written back containing:
- with 895586d5dc, failing: 00200000 40000000
- without 895586d5dc, working: 04000000 40000000
- When disabling rxhash, c2.attr[0] and c2.attr[2] are written back as:
04000000 00000000
The second value represents the MVPP22_CLS_C2_ATTR2_RSS_EN bit, the
first value is the queue number, which comprises two fields. The high
5 bits are 24:29 and the low three are 21:23 inclusive. This comes
from:
c2.attr[0] = MVPP22_CLS_C2_ATTR0_QHIGH(qh) |
MVPP22_CLS_C2_ATTR0_QLOW(ql);
So, the working case gives eth2 a queue id of 4.0, or 32 as per
port->first_rxq, and the non-working case a queue id of 0.1, or 1.
The allocation of queue IDs seems to be in mvpp2_port_probe():
if (priv->hw_version == MVPP21)
port->first_rxq = port->id * port->nrxqs;
else
port->first_rxq = port->id * priv->max_port_rxqs;
Where:
if (priv->hw_version == MVPP21)
priv->max_port_rxqs = 8;
else
priv->max_port_rxqs = 32;
Making the port 0 (eth0 / eth1) have port->first_rxq = 0, and port 1
(eth2) be 32. It seems the idea is that the first 32 queues belong to
port 0, the second 32 queues belong to port 1, etc.
mvpp2_rss_port_c2_enable() gets the queue number from it's parameter,
'ctx', which comes from mvpp22_rss_ctx(port, 0). This returns
port->rss_ctx[0].
mvpp22_rss_context_create() is responsible for allocating that, which
it does by looking for an unallocated priv->rss_tables[] pointer. This
table is shared amongst all ports on the CP silicon.
When we write the tables in mvpp22_rss_fill_table(), the RSS table
entry is defined by:
u32 sel = MVPP22_RSS_INDEX_TABLE(rss_ctx) |
MVPP22_RSS_INDEX_TABLE_ENTRY(i);
where rss_ctx is the context ID (queue number) and i is the index in
the table.
If we look at what is written:
- The first table to be written has "sel" values of 00000000..0000001f,
containing values 0..3. This appears to be for eth1. This is table 0,
RX queue number 0.
- The second table has "sel" values of 00000100..0000011f, and appears
to be for eth2. These contain values 0x20..0x23. This is table 1,
RX queue number 0.
- The third table has "sel" values of 00000200..0000021f, and appears
to be for eth3. These contain values 0x40..0x43. This is table 2,
RX queue number 0.
How do queue numbers translate to the RSS table? There is another
table - the RXQ2RSS table, indexed by the MVPP22_RSS_INDEX_QUEUE field
of MVPP22_RSS_INDEX and accessed through the MVPP22_RXQ2RSS_TABLE
register. Before 895586d5dc, it was:
mvpp2_write(priv, MVPP22_RSS_INDEX,
MVPP22_RSS_INDEX_QUEUE(port->first_rxq));
mvpp2_write(priv, MVPP22_RXQ2RSS_TABLE,
MVPP22_RSS_TABLE_POINTER(port->id));
and after:
mvpp2_write(priv, MVPP22_RSS_INDEX, MVPP22_RSS_INDEX_QUEUE(ctx));
mvpp2_write(priv, MVPP22_RXQ2RSS_TABLE, MVPP22_RSS_TABLE_POINTER(ctx));
Before the commit, for eth2, that would've contained '32' for the
index and '1' for the table pointer - mapping queue 32 to table 1.
Remember that this is queue-high.queue-low of 4.0.
After the commit, we appear to map queue 1 to table 1. That again
looks fine on the face of it.
Section 9.3.1 of the A8040 manual seems indicate the reason that the
queue number is separated. queue-low seems to always come from the
classifier, whereas queue-high can be from the ingress physical port
number or the classifier depending on the MVPP2_CLS_SWFWD_PCTRL_REG.
We set the port bit in MVPP2_CLS_SWFWD_PCTRL_REG, meaning that queue-high
comes from the MVPP2_CLS_SWFWD_P2HQ_REG() register... and this seems to
be where our bug comes from.
mvpp2_cls_oversize_rxq_set() sets this up as:
mvpp2_write(port->priv, MVPP2_CLS_SWFWD_P2HQ_REG(port->id),
(port->first_rxq >> MVPP2_CLS_OVERSIZE_RXQ_LOW_BITS));
val = mvpp2_read(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG);
val |= MVPP2_CLS_SWFWD_PCTRL_MASK(port->id);
mvpp2_write(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG, val);
Setting the MVPP2_CLS_SWFWD_PCTRL_MASK bit means that the queue-high
for eth2 is _always_ 4, so only queues 32 through 39 inclusive are
available to eth2. Yet, we're trying to tell the classifier to set
queue-high, which will be ignored, to zero. Hence, the queue-high
field (MVPP22_CLS_C2_ATTR0_QHIGH()) from the classifier will be
ignored.
This means we end up directing traffic from eth2 not to queue 1, but
to queue 33, and then we tell it to look up queue 33 in the RSS table.
However, RSS table has not been programmed for queue 33, and so it ends
up (presumably) dropping the packets.
It seems that mvpp22_rss_context_create() doesn't take account of the
fact that the upper 5 bits of the queue ID can't actually be changed
due to the settings in mvpp2_cls_oversize_rxq_set(), _or_ it seems that
mvpp2_cls_oversize_rxq_set() has been missed in this commit. Either
way, these two functions mutually disagree with what queue number
should be used.
Looking deeper into what mvpp2_cls_oversize_rxq_set() and the MTU
validation is doing, it seems that MVPP2_CLS_SWFWD_P2HQ_REG() is used
for over-sized packets attempting to egress through this port. With
the classifier having had RSS enabled and directing eth2 traffic to
queue 1, we may still have packets appearing on queue 32 for this port.
However, the only way we may end up with over-sized packets attempting
to egress through eth2 - is if the A8040 forwards frames between its
ports. From what I can see, we don't support that feature, and the
kernel restricts the egress packet size to the MTU. In any case, if we
were to attempt to transmit an oversized packet, we have no support in
the kernel to deal with that appearing in the port's receive queue.
So, this patch attempts to solve the issue by clearing the
MVPP2_CLS_SWFWD_PCTRL_MASK() bit, allowing MVPP22_CLS_C2_ATTR0_QHIGH()
from the classifier to define the queue-high field of the queue number.
My testing seems to confirm my findings above - clearing this bit
means that if I enable rxhash on eth2, the interface can then pass
traffic, as we are now directing traffic to RX queue 1 rather than
queue 33. Traffic still seems to work with rxhash off as well.
Reported-by: Matteo Croce <mcroce@redhat.com>
Tested-by: Matteo Croce <mcroce@redhat.com>
Fixes: 895586d5dc ("net: mvpp2: cls: Use RSS contexts to handle RSS tables")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2020-05-22
The following pull-request contains BPF updates for your *net* tree.
We've added 3 non-merge commits during the last 3 day(s) which contain
a total of 5 files changed, 69 insertions(+), 11 deletions(-).
The main changes are:
1) Fix to reject mmap()'ing read-only array maps as writable since BPF verifier
relies on such map content to be frozen, from Andrii Nakryiko.
2) Fix breaking audit from secid_to_secctx() LSM hook by avoiding to use
call_int_hook() since this hook is not stackable, from KP Singh.
3) Fix BPF flow dissector program ref leak on netns cleanup, from Jakub Sitnicki.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The caller of devm_ioremap_resource(), either accidentally
or by wrong assumption, is writing back derived resource data
to global static resource initialization tables that should
have been constant. Meaning that after it computes the final
physical start address it saves the address for no reason
in the static tables. This doesn't affect the first driver
probing after reboot, but it breaks consecutive driver reloads
(i.e. driver unbind & bind) because the initialization tables
no longer have the correct initial values. So the next probe()
will map the device registers to wrong physical addresses,
causing ARM SError async exceptions.
This patch fixes all of the above.
Fixes: 5605194877 ("net: dsa: ocelot: add driver for Felix switch family")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is some ambiguity in the RFC as to whether the ADD_ADDR HMAC is
the rightmost 64 bits of the entire hash or of the leftmost 160 bits
of the hash. The intention, as clarified with the author of the RFC,
is the entire hash.
This change returns the entire hash from
mptcp_crypto_hmac_sha (instead of only the first 160 bits), and moves
any truncation/selection operation on the hash to the caller.
Fixes: 12555a2d97 ("mptcp: use rightmost 64 bits in ADD_ADDR HMAC")
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Todd Malsbary <todd.malsbary@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl7IByoQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpigmEACWJoK7zk5OK2RhavpzOsb2SDu8nz/YAbUe
+R6tXAjwe4Z7lVVa+FW/fmN9/mQcjyYRIbbG564IFs5fe6hPUoOjUzHqvGTOFLHd
Fw8mjVKgWjWAE5GdoX6ATauLhVwwjnImej1PNfO/J5y29o0SQksP8MbM0eGuuNx1
piqxBj0/3h3YyPn1GeJmqxwwcsFhzHDqk7fbkfbQokZk+7SPiKpqWgJBa7AKSlNC
N0WTluT4UOummQZw1RFynPfA4cCuX6XHVgWAa9h7vrJHXigvuMWqLaHG+MBFqeKu
xD6PPnaCnMwcLRe4T2sJvtjxmNSdyr15Q2kGkIi/RhohSIn4u/y8jEA6wTprCP48
rDi30dn1o2LwUj2S1NO3YCOV8jIKWUguztEvKiAXmjf4KDZIDd4/OwrFsJdb4vg9
EuK86SEwXbvFHf9nu1M7pHlGThKfQi0CiK6C6M7Qb/kOthio72wwZ46gGkwLDk5z
DZWHymHBhQw/z1c20loX7pBvFIzLzbuYUThf23UegPzXVqqQfBkqs4BGFcOGuqy6
yfEYF/MAX/O/TQgm2dDQHrhl05AevLu/UQXMXZ8Ha6OrmlC4C2qu3Te/iZO8FUew
YIx5H5XmBh93McjpmJ8VCn7CjE+y/ufNTMdvm8WzCyAIfH40gfcyLangpre26QoJ
CCAARffXrQ==
=ZYUy
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.7-2020-05-22' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A small collection of small fixes that should go into this release:
- Two fixes for async request preparation (Pavel)
- Busy clear fix for SQPOLL (Xiaoguang)
- Don't use kiocb->private for O_DIRECT buf index, some file systems
use it (Bijan)
- Kill dead check in io_splice()
- Ensure sqo_wait is initialized early
- Cancel task_work if we fail adding to original process
- Only add (IO)pollable requests to iopoll list, fixing a regression
in this merge window"
* tag 'io_uring-5.7-2020-05-22' of git://git.kernel.dk/linux-block:
io_uring: reset -EBUSY error when io sq thread is waken up
io_uring: don't add non-IO requests to iopoll pending list
io_uring: don't use kiocb.private to store buf_index
io_uring: cancel work if task_work_add() fails
io_uring: remove dead check in io_splice()
io_uring: fix FORCE_ASYNC req preparation
io_uring: don't prepare DRAIN reqs twice
io_uring: initialize ctx->sqo_wait earlier
This tag contains two fixes:
* Another !MMU build fix that was a straggler from last week.
* A fix to use the "register" keyword for the GP global register variable.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl7H9tYTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiewuEACejtJD1KbV4MoHSKw98WwtCPp7xceF
pm73DR5drm03T+dBIt0IQe9MnlAN6oMuPw1NvwHhJWX1ezvs1o2ZegZw7ZA5uczl
QFqHFCajzzRyF7C0KCukONYhGV67xrrapVfFMfytvdCTuhTjYUHQAjXHuciI2BiP
1VeGZzECj4nRAYFis9GVO652D3LRA+LZ0Op10IqsBHcV3Iia75GREj9IK34+Uwzq
Xe+pVh9GVBINz6WVX8rBcK67z76pMNnTX2hMoHWbunHobXeYnh/uYZ79bcOrGFGc
g2KTTDpYqU7FWRBqmj/P6rqNrUFFStMsH79YGabr3kOT/yyYLrGf17oZtrp5oy92
68EMuM5DSO/2jRD8sHtW6gg1k7Kv1oo49TQlBb26ifTiOEGhCE8bD8rI4ADsx4N+
6Oj5P7NTfqgc++e8NMRES59MiQNUg40nsA0AP1omwD163LN8bZNmI5kZZnvazH4k
ViIGgwrPUKL8S1MxQ2S4maJhGdQ2UOsbPi99zQ+o6NxR2Dxf7OoqsrJYzDnVS3f6
vmczgYshlXti6qhNbLmK8HIC4g0FzDg3CUpFvrezyWJJhmebyKRWGhUsaMLm9YDO
vwX7UnOv9VMKzvOVoS9iDoIXSFtt1dt4SxZ1rQCPJwCb0fqBc/QqVL1pBtSh4iM8
pL3DDCnOAgvr0Q==
=L8xn
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
"Two fixes:
- Another !MMU build fix that was a straggler from last week
- A fix to use the "register" keyword for the GP global register
variable"
* tag 'riscv-for-linus-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: gp_in_global needs register keyword
riscv: Fix print_vm_layout build error if NOMMU
- fix EFI framebuffer earlycon for wide fonts
- avoid filling screen_info with garbage if the EFI framebuffer is not
available
- fix a potential host tool build error due to a symbol clash on x86
- work around a EFI firmware bug regarding the binary format of the TPM
final events table
- fix a missing memory free by reworking the E820 table sizing routine to
not do the allocation in the first place
- add CPER parsing for firmware errors
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl7H3HIACgkQwjcgfpV0
+n1pEAgAjJfwDJmBcYhJzjX8WLnXPJiUmUH9d9tF1t3TlhF6c1G8auXU+Fyia4uI
ejRNw/N4+SXzM9yL+Z19PKBpQsPzQXgm2r9WTPVN5jTelUUI+jFZCH+pKC+TKRp1
/Tx/XIMifCw18gNXsjj6WJEeAyLoh4tb+6bwn7DlPO5cPrxX49LvPuQNMXybk2yi
KimdNKUry1wYpo/WpHqEdFq5//CLAWNkrL9UXlkANvQ6BJNIMI0kRIUC0MVsTMnE
BoCkBO93PdvqxOcnV3WTRvSFetb7qA59Jay62jLc26Myqc4t4pgVWojVm6RHLfZg
17btYACxICgF2mNTZYlKemEEqKPpzQ==
=mY5f
-----END PGP SIGNATURE-----
Merge tag 'efi-fixes-for-v5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/urgent
Pull EFI fixes from Ard Biesheuvel:
"- fix EFI framebuffer earlycon for wide fonts
- avoid filling screen_info with garbage if the EFI framebuffer is not
available
- fix a potential host tool build error due to a symbol clash on x86
- work around a EFI firmware bug regarding the binary format of the TPM
final events table
- fix a missing memory free by reworking the E820 table sizing routine to
not do the allocation in the first place
- add CPER parsing for firmware errors"
Normally, show_trace_log_lvl() scans the stack, looking for text
addresses to print. In parallel, it unwinds the stack with
unwind_next_frame(). If the stack address matches the pointer returned
by unwind_get_return_address_ptr() for the current frame, the text
address is printed normally without a question mark. Otherwise it's
considered a breadcrumb (potentially from a previous call path) and it's
printed with a question mark to indicate that the address is unreliable
and typically can be ignored.
Since the following commit:
f1d9a2abff ("x86/unwind/orc: Don't skip the first frame for inactive tasks")
... for inactive tasks, show_trace_log_lvl() prints *only* unreliable
addresses (prepended with '?').
That happens because, for the first frame of an inactive task,
unwind_get_return_address_ptr() returns the wrong return address
pointer: one word *below* the task stack pointer. show_trace_log_lvl()
starts scanning at the stack pointer itself, so it never finds the first
'reliable' address, causing only guesses to being printed.
The first frame of an inactive task isn't a normal stack frame. It's
actually just an instance of 'struct inactive_task_frame' which is left
behind by __switch_to_asm(). Now that this inactive frame is actually
exposed to callers, fix unwind_get_return_address_ptr() to interpret it
properly.
Fixes: f1d9a2abff ("x86/unwind/orc: Don't skip the first frame for inactive tasks")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200522135435.vbxs7umku5pyrdbk@treble
- Annotate variable assignment in get_user() with the type to avoid
sparse warnings.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl7H+w0ACgkQa9axLQDI
XvEiVQ//azhBez3lTHeEgh2p45d75Na8HrrHjGGvQ9lhvtMBisrh/N7fp41Fl1vY
2nG3YJqUY7SZJG2EZkF7OQOMHUjpGE3kHJX3YAn6dCk2RzNDe1TCXKrP0r7xkbLh
OW3MpBsUbGksR8THYJsZ+GiTLZ/BrnZJjpJxcMHNwxnNZeVKT1OY4nJSebUo5Rdf
4lm1jMuSiKuHHfkucp1UsGe3QQxsWYpFahjMgU/iCm3BhlzWvocV8NNG6na10SIb
unT4+sXmFM/0pzVFBSdObmeEik4wSibkKh76A8z0SROMkPCtigu2n9RdDtii22ia
uiixQ6CkEYaXaGBXFqjE50dKkMCH0EYpZiEPp22QGHkLRSKI6ZZi0wvQmTvYNLMI
3a6Ptw0u9Ym/MH17HmjCiBuEfWOQeNhOM8YRcMqhHoguTz6Xj/mglbX8DDC37YAw
MnGNH0a/uhBy35eujgW92b2wvNMWcYnuRSiSX8/PSsG3Hxe36nWTI8y20dQ88wAR
wTqzNExmWDQqbTmaSAM0gGfDPkaSiiY+WDAO2FdB/jtovmefB/RBXVmwqbjBsigX
OtXuyF/rVitZ6S3SXal5YBT4xey/MI4NRbnwCm4edXyz+Hz7cDM2iQwDBaiaVZj8
4QHn+TqJTUKWeHRfLuMzg2a5UbCnEZGVBNEPtVxx9bSGqjQ2o+Q=
=HdPI
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Bring the PTRACE_SYSEMU semantics in line with the man page.
- Annotate variable assignment in get_user() with the type to avoid
sparse warnings.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Add get_user() type annotation on the !access_ok() path
arm64: Fix PTRACE_SYSEMU semantics
Just a few small fixes: the only significant one is a slight
improvement for PCM running position update with no-period-elapsed
case while the rest are HD-audio fixups and ice1712 model quirk.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAl7H5JwOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/yUQ//VnGdWfwTLteuJ1Anwym1HAgjihycdRBymGJV
YA0hx+wBRn9QAjBlBh/qlQSyWMLFqrh0luQS5WuctDvgXDsF0LsMy9IcZqOr1dS5
wJA4deEoUgU99obgGGyjc+6mDwINKdt2WaTT9q82C+2AHjhOZl+XBvwUuODTTr+l
gGgRQA1J8YmIIWrsuZy7LaarvqwdPqk4RLH/fVGDW9b5Q3lVaUyEQ7Aw42iZTLIq
A+wGqZJ7EkylbGJM/t/C98fku4kW3OYob1tzm9n65N5JGdyq83Gvz1/VcwWmtUTz
LMfPutFGHHHxW5Kkn9M7KEyMzoKr4XPUfijrxbFzZq6WvWRO/AHcUc6FWyBS3Zgf
x+Xo2C02LDO7p8sv9mFjDQhYNp6M6IAFN+ATg7hZBO/MJL9Ed6tCnGqvAG+8X2PB
2FCoIk/btCxCFv/1BKGWJy6WjRqQ865WONSSsFpym+JHlkfmi36tw/8RPvSwninM
32O6KG7DXi2khbH1b7NEOmmLjw+PMWvNy2CJIKu1l7DaGaUNkJZO9uXO5Q7y45nb
otKTX9TOmlus4mESZqYeBQcIlHN4yKChGazTaIu6bxLIBWdBSuoNs/PXmIQ3on6S
qr9OQ/DM/i/Ce/hON3cNII6bVmcbjk+CqHmyYKVtMxOiUXxoi7Nd5pFvefqL44UF
0ZeM7EM=
=8gyV
-----END PGP SIGNATURE-----
Merge tag 'sound-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few small fixes: the only significant one is a slight
improvement for PCM running position update with no-period-elapsed
case while the rest are HD-audio fixups and ice1712 model quirk"
* tag 'sound-5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Add more fixup entries for Clevo machines
ALSA: iec1712: Initialize STDSP24 properly when using the model=staudio option
ALSA: hda/realtek - Fix silent output on Gigabyte X570 Aorus Xtreme
ALSA: pcm: fix incorrect hw_base increase
Sparse reports "Using plain integer as NULL pointer" when the arm64
__get_user_error() assigns 0 to a pointer type. Use proper type
annotation.
Signed-of-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: kbuild test robot <lkp@intel.com>
Link: http://lkml.kernel.org/r/20200522142321.GP23230@ZenIV.linux.org.uk
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
A revert of a recent change to the PTE bits for 32-bit BookS, which broke swap.
And a "fix" to disable STRICT_KERNEL_RWX for 64-bit in Kconfig, as it's causing
crashes for some people.
Thanks to:
Christophe Leroy, Rui Salvaterra.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7H1aMTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgG3sD/9L72cw3cI2TcDTw+OEv2S2TyXrZZc3
09WyUeerEv8mK8pk1eH8jVk6BqQK9bVZaq/3zYxLr5vnL1m+CZ4Qr8eGy5AcV3AC
HXiGKhLbEh4btuoN3NwJ6fvEzA85dMTWsowGpgW8JgX1o7rtJmro0XW9EndhZGd2
WWMBDsWo+RaKODej0c0Bz3TAOVgvxalE1SSLq63Q1sRoPhAZAJ0l8K3ED/EgC+tb
v/VUi3fQNJngIzlMBc0sNOPp7NgcnDXoozAkW5c2Bp7YURbzeU0oXmsMAxQnyzee
MP4MY1fAHI3CYdQ7QVRRDpQsTc84bAXVD+te+zhUJejaNm3mWLojRVieYT98eZXi
iCi4Q0aSuAh3H8rxaYgi9ZemUkSKn+5pLu4kIAyMkBtnTB50E1YqUXVxfPcqk48N
Y3Fkd6AyZ2/HyxS3bBVAubT/+GxK8HgQNGUBaF7iS50QKd6fl8EKjEBK1tVbYrTj
xH7lXJpBnLCIj2ygZE1mBLxG8UTLGTfdnpxVNfVkNsLZK4tdsMaQ/llOzVA1uBOY
twaRAhJkC0RHKHak1KNIQ8gh6HPjqwfg+P6SXHvT347YlTbsKgZei9wHtnZy4lsD
CAnSImfgJMbzXCoULSoQbgXW0PloRZ1Zz1+WdfxmNjcNsRSqBNoaS1CaPKr7f8to
a5JEWrUY1D49YQ==
=yBu+
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- a revert of a recent change to the PTE bits for 32-bit BookS, which
broke swap.
- a "fix" to disable STRICT_KERNEL_RWX for 64-bit in Kconfig, as it's
causing crashes for some people.
Thanks to Christophe Leroy and Rui Salvaterra.
* tag 'powerpc-5.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Disable STRICT_KERNEL_RWX
Revert "powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits."
DMA transfers to and from the SD card stall for 10 seconds and run into
timeout on RTS5260 card readers after ASPM was enabled.
Adding a short msleep after disabling ASPM fixes the issue on several
Dell Precision 7530/7540 systems I tested.
This function is only called when waking up after the chip went into
power-save after not transferring data for a few seconds. The added
msleep does therefore not change anything in data transfer speed or
induce any excessive waiting while data transfers are running, or the
chip is sleeping. Only the transition from sleep to active is affected.
Signed-off-by: Klaus Doth <kdlnx@doth.eu>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/4434eaa7-2ee3-a560-faee-6cee63ebd6d4@doth.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When attaching a flow dissector program to a network namespace with
bpf(BPF_PROG_ATTACH, ...) we grab a reference to bpf_prog.
If netns gets destroyed while a flow dissector is still attached, and there
are no other references to the prog, we leak the reference and the program
remains loaded.
Leak can be reproduced by running flow dissector tests from selftests/bpf:
# bpftool prog list
# ./test_flow_dissector.sh
...
selftests: test_flow_dissector [PASS]
# bpftool prog list
4: flow_dissector name _dissect tag e314084d332a5338 gpl
loaded_at 2020-05-20T18:50:53+0200 uid 0
xlated 552B jited 355B memlock 4096B map_ids 3,4
btf_id 4
#
Fix it by detaching the flow dissector program when netns is going away.
Fixes: d58e468b11 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20200521083435.560256-1-jakub@cloudflare.com
In the function devm_platform_ioremap_resource(), if get resource
failed, the return value is ERR_PTR() not NULL. Thus it must be
replaced by IS_ERR(), or else it may result in crashes if a critical
error path is encountered.
Fixes: 0ce5ebd24d ("mfd: ioc3: Add driver for SGI IOC3 chip")
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case we can't find a ->dumpit callback for the requested
(family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're
in the same situation as if userspace had requested a PF_UNSPEC
dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all
the registered RTM_GETROUTE handlers.
The requested table id may or may not exist for all of those
families. commit ae677bbb44 ("net: Don't return invalid table id
error when dumping all families") fixed the problem when userspace
explicitly requests a PF_UNSPEC dump, but missed the fallback case.
For example, when we pass ipv6.disable=1 to a kernel with
CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y,
the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in
rtnl_dump_all, and listing IPv6 routes will unexpectedly print:
# ip -6 r
Error: ipv4: MR table does not exist.
Dump terminated
commit ae677bbb44 introduced the dump_all_families variable, which
gets set when userspace requests a PF_UNSPEC dump. However, we can't
simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the
fallback case to get dump_all_families == true, because some messages
types (for example RTM_GETRULE and RTM_GETNEIGH) only register the
PF_UNSPEC handler and use the family to filter in the kernel what is
dumped to userspace. We would then export more entries, that userspace
would have to filter. iproute does that, but other programs may not.
Instead, this patch removes dump_all_families and updates the
RTM_GETROUTE handlers to check if the family that is being dumped is
their own. When it's not, which covers both the intentional PF_UNSPEC
dumps (as dump_all_families did) and the fallback case, ignore the
missing table id error.
Fixes: cb167893f4 ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error with MPLS support the code is misusing AF_INET
instead of AF_MPLS.
Fixes: 1b69e7e6c4 ("ipip: support MPLS over IPv4")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vadim Fedorenko says:
====================
net/tls: fix encryption error path
The problem with data stream corruption was found in KTLS
transmit path with small socket send buffers and large
amount of data. bpf_exec_tx_verdict() frees open record
on any type of error including EAGAIN, ENOMEM and ENOSPC
while callers are able to recover this transient errors.
Also wrong error code was returned to user space in that
case. This patchset fixes the problems.
====================
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We cannot free record on any transient error because it leads to
losing previos data. Check socket error to know whether record must
be freed or not.
Fixes: d10523d0b3 ("net/tls: free the record on encryption error")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
bpf_exec_tx_verdict() can return negative value for copied
variable. In that case this value will be pushed back to caller
and the real error code will be lost. Fix it using signed type and
checking for positive value.
Fixes: d10523d0b3 ("net/tls: free the record on encryption error")
Fixes: d3b18ad31f ("tls: add bpf support to sk_msg handling")
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun says:
====================
net: ethernet: ti: fix some return value check
This patchset convert cpsw_ale_create() to return PTR_ERR() only, and
changed all the caller to check IS_ERR() instead of NULL.
Since v2:
1) rebased on net.git, as Jakub's suggest
2) split am65-cpsw-nuss.c changes, as Grygorii's suggest
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert to using IS_ERR() instead of NULL test for cpsw_ale_create()
error handling. Also fix to return negative error code from this error
handling case instead of 0 in.
Fixes: 93a7653031 ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cpsw_ale_create() can return both NULL and PTR_ERR(), but all of
the caller only check NULL for error handling. This patch convert
it to only return PTR_ERR() in all error cases, and the caller using
IS_ERR() instead of NULL test.
Fixes: 4b41d34367 ("net: ethernet: ti: cpsw: allow untagged traffic on host port")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once the traversal of the list is completed with list_for_each_entry(),
the iterator (node) will point to an invalid object. So passing this to
qrtr_local_enqueue() which is outside of the iterator block is erroneous
eventhough the object is not used.
So fix this by passing NULL to qrtr_local_enqueue().
Fixes: bdabad3e36 ("net: Add Qualcomm IPC router")
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As ethnl_request_ops::reply_size handlers do not include common header
size into calculated/estimated reply size, it needs to be added in
ethnl_default_doit() and ethnl_default_notify() before allocating the
message. On the other hand, strset_reply_size() should not add common
header size.
Fixes: 728480f124 ("ethtool: default handlers for GET requests")
Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Ubuntu and Fedora release new version used kernel version equal to or
higher than v5.4, They started to support kernel exfat filesystem.
Linus reported a mount error with new version of exfat on Fedora:
exfat: Unknown parameter 'namecase'
This is because there is a difference in mount option between old
staging/exfat and new exfat. And utf8, debug, and codepage options as
well as namecase have been removed from new exfat.
This patch add the dummy mount options as deprecated option to be
backward compatible with old one.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the implementation of aa_audit_rule_init(), when aa_label_parse()
fails the allocated memory for rule is released using
aa_audit_rule_free(). But after this release, the return statement
tries to access the label field of the rule which results in
use-after-free. Before releasing the rule, copy errNo and return it
after release.
Fixes: 52e8c38001 ("apparmor: Fix memory leak of rule on error exit path")
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
policy_update() invokes begin_current_label_crit_section(), which
returns a reference of the updated aa_label object to "label" with
increased refcount.
When policy_update() returns, "label" becomes invalid, so the refcount
should be decreased to keep refcount balanced.
The reference counting issue happens in one exception handling path of
policy_update(). When aa_may_manage_policy() returns not NULL, the
refcnt increased by begin_current_label_crit_section() is not decreased,
causing a refcnt leak.
Fix this issue by jumping to "end_section" label when
aa_may_manage_policy() returns not NULL.
Fixes: 5ac8c355ae ("apparmor: allow introspecting the loaded policy pre internal transform")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
aa_change_profile() invokes aa_get_current_label(), which returns
a reference of the current task's label.
According to the comment of aa_get_current_label(), the returned
reference must be put with aa_put_label().
However, when the original object pointed by "label" becomes
unreachable because aa_change_profile() returns or a new object
is assigned to "label", reference count increased by
aa_get_current_label() is not decreased, causing a refcnt leak.
Fix this by calling aa_put_label() before aa_change_profile() return
and dropping unnecessary aa_get_current_label().
Fixes: 9fcf78cca1 ("apparmor: update domain transitions that are subsets of confinement at nnp")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
The Intel kernel build robot recently pointed out that I missed the
register keyword on this one when I refactored the code to remove local
register variables (which aren't supported by LLVM). GCC's manual
indicates that global register variables must have the register keyword,
As far as I can tell lacking the register keyword causes GCC to ignore
the __asm__ and treat this as a regular variable, but I'm not sure how
that didn't show up as some sort of failure.
Fixes: 52e7c52d2d ("RISC-V: Stop relying on GCC's register allocator's hueristics")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Fix a couple of build warnings.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl6+tzkPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpV5QH/jx6Jj16Hzwy6YV9caV4QeySWgZrI3y8fWTK
YlKdzmBE3YNJDwdV6EM5lT6hmJNGf392cF8akGk339IemiYJaHPFLt409ubLvfhZ
ejo0zY7NStOd2DZJfPQdissME7bgiLRpNDvaXRofJwZ87yK7nSNbPWVYp0Jz0Rie
BFnx5XOSqyTkOovylHZajHfodl5eHtdAOYI1+6SZH6gA1YKrhdDqB0gdyejXg4EQ
Ijg0oiDovU/bLfvaF+8jZZJvNsy8mouFidF5NJhCzBewQwx49tl2tLVNOQP/PwGF
Yf8DN7zH8yw+hsUbruj5lFKILvY7Rn2RXhp7ikTaUPYnSKUcYgo=
=uvKy
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Fix a couple of build warnings"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: missing __user tags
vdpasim: remove unused variable 'ret'
Zoned block device specification do not define the behavior of
discard/trim command as this command is generally replaced by the reset
write pointer (zone reset) command. Emulate this in null_blk by making
zoned and discard options mutually exclusive.
Suggested-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Several strange crashes have been eventually traced back to
STRICT_KERNEL_RWX and its interaction with code patching.
Various paths in our ftrace, kprobes and other patching code need to
be hardened against patching failures, otherwise we can end up running
with partially/incorrectly patched ftrace paths, kprobes or jump
labels, which can then cause strange crashes.
Although fixes for those are in development, they're not -rc material.
There also seem to be problems with the underlying strict RWX logic,
which needs further debugging.
So for now disable STRICT_KERNEL_RWX on 64-bit to prevent people from
enabling the option and tripping over the bugs.
Fixes: 1e0fc9d1eb ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200520133605.972649-1-mpe@ellerman.id.au
In the function kobject_cleanup(), kobject_del(kobj) is
called before the kobj->release(). That makes it possible to
release the parent of the kobject before the kobject itself.
To fix that, adding function __kboject_del() that does
everything that kobject_del() does except release the parent
reference. kobject_cleanup() then calls __kobject_del()
instead of kobject_del(), and separately decrements the
reference count of the parent kobject after kobj->release()
has been called.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: 7589238a8c ("Revert "software node: Simplify software_node_release() function"")
Suggested-by: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20200513151840.36400-1-heikki.krogerus@linux.intel.com
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 21c27f0658 ("driver core: Fix SYNC_STATE_ONLY device link
implementation") didn't completely fix STATELESS + SYNC_STATE_ONLY
handling.
What looks like an optimization in that commit is actually a bug that
causes an if condition to always take the else path. This prevents
reordering of devices in the dpm_list when a DL_FLAG_STATELESS device
link is create on top of an existing DL_FLAG_SYNC_STATE_ONLY device
link.
Fixes: 21c27f0658 ("driver core: Fix SYNC_STATE_ONLY device link implementation")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20200520043626.181820-1-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes data remnant seen when we fail to reserve space for a
nexthop group during a larger dump.
If we fail the reservation, we goto nla_put_failure and
cancel the message.
Reproduce with the following iproute2 commands:
=====================
ip link add dummy1 type dummy
ip link add dummy2 type dummy
ip link add dummy3 type dummy
ip link add dummy4 type dummy
ip link add dummy5 type dummy
ip link add dummy6 type dummy
ip link add dummy7 type dummy
ip link add dummy8 type dummy
ip link add dummy9 type dummy
ip link add dummy10 type dummy
ip link add dummy11 type dummy
ip link add dummy12 type dummy
ip link add dummy13 type dummy
ip link add dummy14 type dummy
ip link add dummy15 type dummy
ip link add dummy16 type dummy
ip link add dummy17 type dummy
ip link add dummy18 type dummy
ip link add dummy19 type dummy
ip link add dummy20 type dummy
ip link add dummy21 type dummy
ip link add dummy22 type dummy
ip link add dummy23 type dummy
ip link add dummy24 type dummy
ip link add dummy25 type dummy
ip link add dummy26 type dummy
ip link add dummy27 type dummy
ip link add dummy28 type dummy
ip link add dummy29 type dummy
ip link add dummy30 type dummy
ip link add dummy31 type dummy
ip link add dummy32 type dummy
ip link set dummy1 up
ip link set dummy2 up
ip link set dummy3 up
ip link set dummy4 up
ip link set dummy5 up
ip link set dummy6 up
ip link set dummy7 up
ip link set dummy8 up
ip link set dummy9 up
ip link set dummy10 up
ip link set dummy11 up
ip link set dummy12 up
ip link set dummy13 up
ip link set dummy14 up
ip link set dummy15 up
ip link set dummy16 up
ip link set dummy17 up
ip link set dummy18 up
ip link set dummy19 up
ip link set dummy20 up
ip link set dummy21 up
ip link set dummy22 up
ip link set dummy23 up
ip link set dummy24 up
ip link set dummy25 up
ip link set dummy26 up
ip link set dummy27 up
ip link set dummy28 up
ip link set dummy29 up
ip link set dummy30 up
ip link set dummy31 up
ip link set dummy32 up
ip link set dummy33 up
ip link set dummy34 up
ip link set vrf-red up
ip link set vrf-blue up
ip link set dummyVRFred up
ip link set dummyVRFblue up
ip ro add 1.1.1.1/32 dev dummy1
ip ro add 1.1.1.2/32 dev dummy2
ip ro add 1.1.1.3/32 dev dummy3
ip ro add 1.1.1.4/32 dev dummy4
ip ro add 1.1.1.5/32 dev dummy5
ip ro add 1.1.1.6/32 dev dummy6
ip ro add 1.1.1.7/32 dev dummy7
ip ro add 1.1.1.8/32 dev dummy8
ip ro add 1.1.1.9/32 dev dummy9
ip ro add 1.1.1.10/32 dev dummy10
ip ro add 1.1.1.11/32 dev dummy11
ip ro add 1.1.1.12/32 dev dummy12
ip ro add 1.1.1.13/32 dev dummy13
ip ro add 1.1.1.14/32 dev dummy14
ip ro add 1.1.1.15/32 dev dummy15
ip ro add 1.1.1.16/32 dev dummy16
ip ro add 1.1.1.17/32 dev dummy17
ip ro add 1.1.1.18/32 dev dummy18
ip ro add 1.1.1.19/32 dev dummy19
ip ro add 1.1.1.20/32 dev dummy20
ip ro add 1.1.1.21/32 dev dummy21
ip ro add 1.1.1.22/32 dev dummy22
ip ro add 1.1.1.23/32 dev dummy23
ip ro add 1.1.1.24/32 dev dummy24
ip ro add 1.1.1.25/32 dev dummy25
ip ro add 1.1.1.26/32 dev dummy26
ip ro add 1.1.1.27/32 dev dummy27
ip ro add 1.1.1.28/32 dev dummy28
ip ro add 1.1.1.29/32 dev dummy29
ip ro add 1.1.1.30/32 dev dummy30
ip ro add 1.1.1.31/32 dev dummy31
ip ro add 1.1.1.32/32 dev dummy32
ip next add id 1 via 1.1.1.1 dev dummy1
ip next add id 2 via 1.1.1.2 dev dummy2
ip next add id 3 via 1.1.1.3 dev dummy3
ip next add id 4 via 1.1.1.4 dev dummy4
ip next add id 5 via 1.1.1.5 dev dummy5
ip next add id 6 via 1.1.1.6 dev dummy6
ip next add id 7 via 1.1.1.7 dev dummy7
ip next add id 8 via 1.1.1.8 dev dummy8
ip next add id 9 via 1.1.1.9 dev dummy9
ip next add id 10 via 1.1.1.10 dev dummy10
ip next add id 11 via 1.1.1.11 dev dummy11
ip next add id 12 via 1.1.1.12 dev dummy12
ip next add id 13 via 1.1.1.13 dev dummy13
ip next add id 14 via 1.1.1.14 dev dummy14
ip next add id 15 via 1.1.1.15 dev dummy15
ip next add id 16 via 1.1.1.16 dev dummy16
ip next add id 17 via 1.1.1.17 dev dummy17
ip next add id 18 via 1.1.1.18 dev dummy18
ip next add id 19 via 1.1.1.19 dev dummy19
ip next add id 20 via 1.1.1.20 dev dummy20
ip next add id 21 via 1.1.1.21 dev dummy21
ip next add id 22 via 1.1.1.22 dev dummy22
ip next add id 23 via 1.1.1.23 dev dummy23
ip next add id 24 via 1.1.1.24 dev dummy24
ip next add id 25 via 1.1.1.25 dev dummy25
ip next add id 26 via 1.1.1.26 dev dummy26
ip next add id 27 via 1.1.1.27 dev dummy27
ip next add id 28 via 1.1.1.28 dev dummy28
ip next add id 29 via 1.1.1.29 dev dummy29
ip next add id 30 via 1.1.1.30 dev dummy30
ip next add id 31 via 1.1.1.31 dev dummy31
ip next add id 32 via 1.1.1.32 dev dummy32
i=100
while [ $i -le 200 ]
do
ip next add id $i group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
echo $i
((i++))
done
ip next add id 999 group 1/2/3/4/5/6
ip next ls
========================
Fixes: ab84be7e54 ("net: Initial nexthop code")
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason A. Donenfeld says:
====================
wireguard fixes for 5.7-rc7
Hopefully these are the last fixes for 5.7:
1) A trivial bump in the selftest harness to support gcc-10.
build.wireguard.com is still on gcc-9 but I'll probably switch to
gcc-10 in the coming weeks.
2) A concurrency fix regarding userspace modifying the pre-shared key at
the same time as packets are being processed, reported by Matt
Dunwoodie.
3) We were previously clearing skb->hash on egress, which broke
fq_codel, cake, and other things that actually make use of the flow
hash for queueing, reported by Dave Taht and Toke Høiland-Jørgensen.
4) A fix for the increased memory usage caused by (3). This can be
thought of as part of patch (3), but because of the separate
reasoning and breadth of it I thought made it a bit cleaner to put in
a standalone commit.
Fixes (2), (3), and (4) are -stable material.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In "wireguard: queueing: preserve flow hash across packet scrubbing", we
were required to slightly increase the size of the receive replay
counter to something still fairly small, but an increase nonetheless.
It turns out that we can recoup some of the additional memory overhead
by splitting up the prior union type into two distinct types. Before, we
used the same "noise_counter" union for both sending and receiving, with
sending just using a simple atomic64_t, while receiving used the full
replay counter checker. This meant that most of the memory being
allocated for the sending counter was being wasted. Since the old
"noise_counter" type increased in size in the prior commit, now is a
good time to split up that union type into a distinct "noise_replay_
counter" for receiving and a boring atomic64_t for sending, each using
neither more nor less memory than required.
Also, since sometimes the replay counter is accessed without
necessitating additional accesses to the bitmap, we can reduce cache
misses by hoisting the always-necessary lock above the bitmap in the
struct layout. We also change a "noise_replay_counter" stack allocation
to kmalloc in a -DDEBUG selftest so that KASAN doesn't trigger a stack
frame warning.
All and all, removing a bit of abstraction in this commit makes the code
simpler and smaller, in addition to the motivating memory usage
recuperation. For example, passing around raw "noise_symmetric_key"
structs is something that really only makes sense within noise.c, in the
one place where the sending and receiving keys can safely be thought of
as the same type of object; subsequent to that, it's important that we
uniformly access these through keypair->{sending,receiving}, where their
distinct roles are always made explicit. So this patch allows us to draw
that distinction clearly as well.
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's important that we clear most header fields during encapsulation and
decapsulation, because the packet is substantially changed, and we don't
want any info leak or logic bug due to an accidental correlation. But,
for encapsulation, it's wrong to clear skb->hash, since it's used by
fq_codel and flow dissection in general. Without it, classification does
not proceed as usual. This change might make it easier to estimate the
number of innerflows by examining clustering of out of order packets,
but this shouldn't open up anything that can't already be inferred
otherwise (e.g. syn packet size inference), and fq_codel can be disabled
anyway.
Furthermore, it might be the case that the hash isn't used or queried at
all until after wireguard transmits the encrypted UDP packet, which
means skb->hash might still be zero at this point, and thus no hash
taken over the inner packet data. In order to address this situation, we
force a calculation of skb->hash before encrypting packet data.
Of course this means that fq_codel might transmit packets slightly more
out of order than usual. Toke did some testing on beefy machines with
high quantities of parallel flows and found that increasing the
reply-attack counter to 8192 takes care of the most pathological cases
pretty well.
Reported-by: Dave Taht <dave.taht@gmail.com>
Reviewed-and-tested-by: Toke Høiland-Jørgensen <toke@toke.dk>
Fixes: e7096c131e ("net: WireGuard secure network tunnel")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>