Resurrect ubsan overflow checks and ubsan report this warning,
fix it by change the variable [length] type to size_t.
UBSAN: signed-integer-overflow in net/ipv6/ip6_output.c:1489:19
2147479552 + 8567 cannot be represented in type 'int'
CPU: 0 PID: 253 Comm: err Not tainted 5.16.0+ #1
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x214/0x230
show_stack+0x30/0x78
dump_stack_lvl+0xf8/0x118
dump_stack+0x18/0x30
ubsan_epilogue+0x18/0x60
handle_overflow+0xd0/0xf0
__ubsan_handle_add_overflow+0x34/0x44
__ip6_append_data.isra.48+0x1598/0x1688
ip6_append_data+0x128/0x260
udpv6_sendmsg+0x680/0xdd0
inet6_sendmsg+0x54/0x90
sock_sendmsg+0x70/0x88
____sys_sendmsg+0xe8/0x368
___sys_sendmsg+0x98/0xe0
__sys_sendmmsg+0xf4/0x3b8
__arm64_sys_sendmmsg+0x34/0x48
invoke_syscall+0x64/0x160
el0_svc_common.constprop.4+0x124/0x300
do_el0_svc+0x44/0xc8
el0_svc+0x3c/0x1e8
el0t_64_sync_handler+0x88/0xb0
el0t_64_sync+0x16c/0x170
Changes since v1:
-Change the variable [length] type to unsigned, as Eric Dumazet suggested.
Changes since v2:
-Don't change exthdrlen type in ip6_make_skb, as Paolo Abeni suggested.
Changes since v3:
-Don't change ulen type in udpv6_sendmsg and l2tp_ip6_sendmsg, as
Jakub Kicinski suggested.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Link: https://lore.kernel.org/r/20220607120028.845916-1-wangyufen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Similar to the handling of play_deferred in commit 19cfe912c3
("Bluetooth: btusb: Fix memory leak in play_deferred"), we thought
a patch might be needed here as well.
Currently usb_submit_urb is called directly to submit deferred tx
urbs after unanchor them.
So the usb_giveback_urb_bh would failed to unref it in usb_unanchor_urb
and cause memory leak.
Put those urbs in tx_anchor to avoid the leak, and also fix the error
handling.
Signed-off-by: Xiaohui Zhang <xiaohuizhang@ruc.edu.cn>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220607083230.6182-1-xiaohuizhang@ruc.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The transaction buffer is allocated by using the size of the packet buf,
and subtracting two which seem intended to remove the two tags which are
not present in the target structure. This calculation leads to under
counting memory because of differences between the packet contents and the
target structure. The aid_len field is a u8 in the packet, but a u32 in
the structure, resulting in at least 3 bytes always being under counted.
Further, the aid data is a variable length field in the packet, but fixed
in the structure, so if this field is less than the max, the difference is
added to the under counting.
The last validation check for transaction->params_len is also incorrect
since it employs the same accounting error.
To fix, perform validation checks progressively to safely reach the
next field, to determine the size of both buffers and verify both tags.
Once all validation checks pass, allocate the buffer and copy the data.
This eliminates freeing memory on the error path, as those checks are
moved ahead of memory allocation.
Fixes: 26fc6c7f02 ("NFC: st21nfca: Add HCI transaction event support")
Fixes: 4fbcc1a4cb ("nfc: st21nfca: Fix potential buffer overflows in EVT_TRANSACTION")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Faltesek <mfaltesek@google.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The first validation check for EVT_TRANSACTION has two different checks
tied together with logical AND. One is a check for minimum packet length,
and the other is for a valid aid_tag. If either condition is true (fails),
then an error should be triggered. The fix is to change && to ||.
Fixes: 26fc6c7f02 ("NFC: st21nfca: Add HCI transaction event support")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Faltesek <mfaltesek@google.com>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Despite these inline functions having full visibility to the compiler
at compile time, they still strip const from passed pointers.
This change allows for functions in various network drivers to be marked as
const that could not be marked const before.
Signed-off-by: Peter Lafreniere <pjlafren@mtu.edu>
Link: https://lore.kernel.org/r/20220606113458.35953-1-pjlafren@mtu.edu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.
modpost used to detect it, but it has been broken for a decade.
Recently, I fixed modpost so it started to warn it again, then this
showed up in linux-next builds.
There are two ways to fix it:
- Remove __init
- Remove EXPORT_SYMBOL
I chose the latter for this case because the caller (net/ipv6/seg6.c)
and the callee (net/ipv6/seg6_hmac.c) belong to the same module.
It seems an internal function call in ipv6.ko.
Fixes: bf355b8d2c ("ipv6: sr: add core files for SR HMAC support")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.
modpost used to detect it, but it has been broken for a decade.
Recently, I fixed modpost so it started to warn it again, then this
showed up in linux-next builds.
There are two ways to fix it:
- Remove __init
- Remove EXPORT_SYMBOL
I chose the latter for this case because the only in-tree call-site,
net/ipv4/xfrm4_policy.c is never compiled as modular.
(CONFIG_XFRM is boolean)
Fixes: 2f32b51b60 ("xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.
modpost used to detect it, but it has been broken for a decade.
Recently, I fixed modpost so it started to warn it again, then this
showed up in linux-next builds.
There are two ways to fix it:
- Remove __init
- Remove EXPORT_SYMBOL
I chose the latter for this case because the only in-tree call-site,
drivers/net/phy/phy_device.c is never compiled as modular.
(CONFIG_PHYLIB is boolean)
Fixes: 90eff9096c ("net: phy: Allow splitting MDIO bus/device support from PHYs")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Remove kernel.h when it is not needed.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220607125103.487801-1-simon.horman@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Fix TDP MMU performance issue with disabling dirty logging
* Fix 5.14 regression with SVM TSC scaling
* Fix indefinite stall on applying live patches
* Fix unstable selftest
* Fix memory leak from wrong copy-and-paste
* Fix missed PV TLB flush when racing with emulation
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmKglysUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOJDAgArpPcAnJbeT2VQTQcp94e4tp9k1Sf
gmUewajco4zFVB/sldE0fIporETkaX+FYYPiaNDdNgJ2lUw/HUJBN7KoFEYTZ37N
Xx/qXiIXQYFw1bmxTnacLzIQtD3luMCzOs/6/Q7CAFZIBpUtUEjkMlQOBuxoKeG0
B0iLCTJSw0taWcN170aN8G6T+5+bdR3AJW1k2wkgfESfYF9NfJoTUHQj9WTMzM2R
aBRuXvUI/rWKvQY3DfoRmgg9Ig/SirSC+abbKIs4H08vZIEUlPk3WOZSKpsN/Wzh
3XDnVRxgnaRLx6NI/ouI2UYJCmjPKbNcueGCf5IfUcHvngHjAEG/xxe4Qw==
=zQ9u
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
- syzkaller NULL pointer dereference
- TDP MMU performance issue with disabling dirty logging
- 5.14 regression with SVM TSC scaling
- indefinite stall on applying live patches
- unstable selftest
- memory leak from wrong copy-and-paste
- missed PV TLB flush when racing with emulation
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: do not report a vCPU as preempted outside instruction boundaries
KVM: x86: do not set st->preempted when going back to user space
KVM: SVM: fix tsc scaling cache logic
KVM: selftests: Make hyperv_clock selftest more stable
KVM: x86/MMU: Zap non-leaf SPTEs when disabling dirty logging
x86: drop bogus "cc" clobber from __try_cmpxchg_user_asm()
KVM: x86/mmu: Check every prev_roots in __kvm_mmu_free_obsolete_roots()
entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set
KVM: Don't null dereference ops->destroy
soldered to the machine) handling for TPM2 trusted keys.
-----BEGIN PGP SIGNATURE-----
iIgEABYIADAWIQRE6pSOnaBC00OEHEIaerohdGur0gUCYqCENRIcamFya2tvQGtl
cm5lbC5vcmcACgkQGnq6IXRrq9IJXAD/beOZ63XVmzp6c070Ws3KPwvLJElgZn+q
hZ92OsZvblEBAJNloKYxj0XBL5WIzhRq/i8WayGlZ0JnOEw3Xbt0F7cN
=EhPr
-----END PGP SIGNATURE-----
Merge tag 'tpmdd-next-v5.19-rc2-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fix from Jarkko Sakkinen:
"A bug fix for migratable (whether or not a key is tied to the TPM chip
soldered to the machine) handling for TPM2 trusted keys"
* tag 'tpmdd-next-v5.19-rc2-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
KEYS: trusted: tpm2: Fix migratable logic
I've been contributing and reviewing patches for bpftool for some time,
and I'm taking care of its external mirror. On Alexei, KP, and Daniel's
suggestion, I would like to step forwards and become a maintainer for
the tool. This patch adds a dedicated entry to MAINTAINERS.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220608121428.69708-1-quentin@isovalent.com
xdpxceiver run on a AF_XDP ZC enabled driver revealed a problem with XSK
Tx batching API. There is a test that checks how invalid Tx descriptors
are handled by AF_XDP. Each valid descriptor is followed by invalid one
on Tx side whereas the Rx side expects only to receive a set of valid
descriptors.
In current xsk_tx_peek_release_desc_batch() function, the amount of
available descriptors is hidden inside xskq_cons_peek_desc_batch(). This
can be problematic in cases where invalid descriptors are present due to
the fact that xskq_cons_peek_desc_batch() returns only a count of valid
descriptors. This means that it is impossible to properly update XSK
ring state when calling xskq_cons_release_n().
To address this issue, pull out the contents of
xskq_cons_peek_desc_batch() so that callers (currently only
xsk_tx_peek_release_desc_batch()) will always be able to update the
state of ring properly, as total count of entries is now available and
use this value as an argument in xskq_cons_release_n(). By
doing so, xskq_cons_peek_desc_batch() can be dropped altogether.
Fixes: 9349eb3a9d ("xsk: Introduce batched Tx descriptor interfaces")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220607142200.576735-1-maciej.fijalkowski@intel.com
When creating (sealing) a new trusted key, migratable
trusted keys have the FIXED_TPM and FIXED_PARENT attributes
set, and non-migratable keys don't. This is backwards, and
also causes creation to fail when creating a migratable key
under a migratable parent. (The TPM thinks you are trying to
seal a non-migratable blob under a migratable parent.)
The following simple patch fixes the logic, and has been
tested for all four combinations of migratable and non-migratable
trusted keys and parent storage keys. With this logic, you will
get a proper failure if you try to create a non-migratable
trusted key under a migratable parent storage key, and all other
combinations work correctly.
Cc: stable@vger.kernel.org # v5.13+
Fixes: e5fb5d2c5a ("security: keys: trusted: Make sealed key properly interoperable")
Signed-off-by: David Safford <david.safford@gmail.com>
Reviewed-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
If a vCPU is outside guest mode and is scheduled out, it might be in the
process of making a memory access. A problem occurs if another vCPU uses
the PV TLB flush feature during the period when the vCPU is scheduled
out, and a virtual address has already been translated but has not yet
been accessed, because this is equivalent to using a stale TLB entry.
To avoid this, only report a vCPU as preempted if sure that the guest
is at an instruction boundary. A rescheduling request will be delivered
to the host physical CPU as an external interrupt, so for simplicity
consider any vmexit *not* instruction boundary except for external
interrupts.
It would in principle be okay to report the vCPU as preempted also
if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the
vmentry/vmexit overhead unnecessarily, and optimistic spinning is
also unlikely to succeed. However, leave it for later because right
now kvm_vcpu_check_block() is doing memory accesses. Even
though the TLB flush issue only applies to virtual memory address,
it's very much preferrable to be conservative.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Similar to the Xen path, only change the vCPU's reported state if the vCPU
was actually preempted. The reason for KVM's behavior is that for example
optimistic spinning might not be a good idea if the guest is doing repeated
exits to userspace; however, it is confusing and unlikely to make a difference,
because well-tuned guests will hardly ever exit KVM_RUN in the first place.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Every iteration of for_each_available_child_of_node() decrements
the reference count of the previous node.
when breaking early from a for_each_available_child_of_node() loop,
we need to explicitly call of_node_put() on the gphy_fw_np.
Add missing of_node_put() to avoid refcount leak.
Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220605072335.11257-1-linmq006@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix NAT support for NFPROTO_INET without layer 3 address,
from Florian Westphal.
2) Use kfree_rcu(ptr, rcu) variant in nf_tables clean_net path.
3) Use list to collect flowtable hooks to be deleted.
4) Initialize list of hook field in flowtable transaction.
5) Release hooks on error for flowtable updates.
6) Memleak in hardware offload rule commit and abort paths.
7) Early bail out in case device does not support for hardware offload.
This adds a new interface to net/core/flow_offload.c to check if the
flow indirect block list is empty.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: bail out early if hardware offload is not supported
netfilter: nf_tables: memleak flow rule from commit path
netfilter: nf_tables: release new hooks on unsupported flowtable flags
netfilter: nf_tables: always initialize flowtable hook list in transaction
netfilter: nf_tables: delete flowtable hooks via transaction list
netfilter: nf_tables: use kfree_rcu(ptr, rcu) to release hooks in clean_net path
netfilter: nat: really support inet nat without l3 address
====================
Link: https://lore.kernel.org/r/20220606212055.98300-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- proper annotation of USB buffers in bcm5974 touchpad dirver
- a quirk in SOC button driver to handle Lenovo Yoga Tablet2 1051F
- a fix for missing dependency in raspberrypi-ts driver to avoid
compile breakages with random configs.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQST2eWILY88ieB2DOtAj56VGEWXnAUCYp+6iAAKCRBAj56VGEWX
nMKaAP4gVhsLpuIAN3ChejKJTeYPlUeg0ji/9juh4uUQwzl+AQEAjMw2rwIqWC9N
yOtNNyGTMnboL0NgcXQ6xLHqePMYWwE=
=6HM7
-----END PGP SIGNATURE-----
Merge tag 'input-for-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- proper annotation of USB buffers in bcm5974 touchpad dirver
- a quirk in SOC button driver to handle Lenovo Yoga Tablet2 1051F
- a fix for missing dependency in raspberrypi-ts driver to avoid
compile breakages with random configs.
* tag 'input-for-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: soc_button_array - also add Lenovo Yoga Tablet2 1051F to dmi_use_low_level_irq
Input: bcm5974 - set missing URB_NO_TRANSFER_DMA_MAP urb flag
Input: raspberrypi-ts - add missing HAS_IOMEM dependency
Commit 223f61b8c5 ("Input: soc_button_array - add Lenovo Yoga Tablet2
1051L to the dmi_use_low_level_irq list") added the 1051L to this list
already, but the same problem applies to the 1051F. As there are no
further 1051 variants (just the F/L), we can just DMI match 1051.
Tested on a Lenovo Yoga Tablet2 1051F: Without this patch the
home-button stops working after a wakeup from suspend.
Signed-off-by: Marius Hoch <mail@mariushoch.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220603120246.3065-1-mail@mariushoch.de
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
The bcm5974 driver does the allocation and dma mapping of the usb urb
data buffer, but driver does not set the URB_NO_TRANSFER_DMA_MAP flag
to let usb core know the buffer is already mapped.
usb core tries to map the already mapped buffer, causing a warning:
"xhci_hcd 0000:00:14.0: rejecting DMA map of vmalloc memory"
Fix this by setting the URB_NO_TRANSFER_DMA_MAP, letting usb core
know buffer is already mapped by bcm5974 driver
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215890
Link: https://lore.kernel.org/r/20220606113636.588955-1-mathias.nyman@linux.intel.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
When the promiscuous mode is enabled on a VF, the IXGBE_VMOLR_VPE
bit (VLAN Promiscuous Enable) is set. This means that the VF will
receive packets whose VLAN is not the same than the VLAN of the VF.
For instance, in this situation:
┌────────┐ ┌────────┐ ┌────────┐
│ │ │ │ │ │
│ │ │ │ │ │
│ VF0├────┤VF1 VF2├────┤VF3 │
│ │ │ │ │ │
└────────┘ └────────┘ └────────┘
VM1 VM2 VM3
vf 0: vlan 1000
vf 1: vlan 1000
vf 2: vlan 1001
vf 3: vlan 1001
If we tcpdump on VF3, we see all the packets, even those transmitted
on vlan 1000.
This behavior prevents to bridge VF1 and VF2 in VM2, because it will
create a loop: packets transmitted on VF1 will be received by VF2 and
vice-versa, and bridged again through the software bridge.
This patch remove the activation of VLAN Promiscuous when a VF enables
the promiscuous mode. However, the IXGBE_VMOLR_UPE bit (Unicast
Promiscuous) is kept, so that a VF receives all packets that has the
same VLAN, whatever the destination MAC address.
Fixes: 8443c1a4b1 ("ixgbe, ixgbevf: Add new mbox API xcast mode")
Cc: stable@vger.kernel.org
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After a VF requested to remove the promiscuous flag on an interface, the
broadcast packets are not received anymore. This breaks some protocols
like ARP.
In ixgbe_update_vf_xcast_mode(), we should keep the IXGBE_VMOLR_BAM
bit (Broadcast Accept) on promiscuous removal.
This flag is already set by default in ixgbe_set_vmolr() on VF reset.
Fixes: 8443c1a4b1 ("ixgbe, ixgbevf: Add new mbox API xcast mode")
Cc: stable@vger.kernel.org
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add a selftest that calls a global function with a context object parameter
from an freplace function to check that the program context type is
correctly converted to the freplace target when fetching the context type
from the kernel BTF.
v2:
- Trim includes
- Get rid of global function
- Use __noinline
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20220606075253.28422-2-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The verifier allows programs to call global functions as long as their
argument types match, using BTF to check the function arguments. One of the
allowed argument types to such global functions is PTR_TO_CTX; however the
check for this fails on BPF_PROG_TYPE_EXT functions because the verifier
uses the wrong type to fetch the vmlinux BTF ID for the program context
type. This failure is seen when an XDP program is loaded using
libxdp (which loads it as BPF_PROG_TYPE_EXT and attaches it to a global XDP
type program).
Fix the issue by passing in the target program type instead of the
BPF_PROG_TYPE_EXT type to bpf_prog_get_ctx() when checking function
argument compatibility.
The first Fixes tag refers to the latest commit that touched the code in
question, while the second one points to the code that first introduced
the global function call verification.
v2:
- Use resolve_prog_type()
Fixes: 3363bd0cfb ("bpf: Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support")
Fixes: 51c39bb1d5 ("bpf: Introduce function-by-function verification")
Reported-by: Simon Sundberg <simon.sundberg@kau.se>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20220606075253.28422-1-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The kvmalloc_array() function is safer because it has a check for
integer overflows. These sizes come from the user and I was not
able to see any bounds checking so an integer overflow seems like a
realistic concern.
Fixes: 0dcac27254 ("bpf: Add multi kprobe link")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/Yo9VRVMeHbALyjUH@kili
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
syzbot reported an illegal copy_to_user() attempt
from bpf_prog_get_info_by_fd() [1]
There was no repro yet on this bug, but I think
that commit 0aef499f31 ("mm/usercopy: Detect vmalloc overruns")
is exposing a prior bug in bpf arm64.
bpf_prog_get_info_by_fd() looks at prog->jited_len
to determine if the JIT image can be copied out to user space.
My theory is that syzbot managed to get a prog where prog->jited_len
has been set to 43, while prog->bpf_func has ben cleared.
It is not clear why copy_to_user(uinsns, NULL, ulen) is triggering
this particular warning.
I thought find_vma_area(NULL) would not find a vm_struct.
As we do not hold vmap_area_lock spinlock, it might be possible
that the found vm_struct was garbage.
[1]
usercopy: Kernel memory exposure attempt detected from vmalloc (offset 792633534417210172, size 43)!
kernel BUG at mm/usercopy.c:101!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 25002 Comm: syz-executor.1 Not tainted 5.18.0-syzkaller-10139-g8291eaafed36 #0
Hardware name: linux,dummy-virt (DT)
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usercopy_abort+0x90/0x94 mm/usercopy.c:101
lr : usercopy_abort+0x90/0x94 mm/usercopy.c:89
sp : ffff80000b773a20
x29: ffff80000b773a30 x28: faff80000b745000 x27: ffff80000b773b48
x26: 0000000000000000 x25: 000000000000002b x24: 0000000000000000
x23: 00000000000000e0 x22: ffff80000b75db67 x21: 0000000000000001
x20: 000000000000002b x19: ffff80000b75db3c x18: 00000000fffffffd
x17: 2820636f6c6c616d x16: 76206d6f72662064 x15: 6574636574656420
x14: 74706d6574746120 x13: 2129333420657a69 x12: 73202c3237313031
x11: 3237313434333533 x10: 3336323937207465 x9 : 657275736f707865
x8 : ffff80000a30c550 x7 : ffff80000b773830 x6 : ffff80000b773830
x5 : 0000000000000000 x4 : ffff00007fbbaa10 x3 : 0000000000000000
x2 : 0000000000000000 x1 : f7ff000028fc0000 x0 : 0000000000000064
Call trace:
usercopy_abort+0x90/0x94 mm/usercopy.c:89
check_heap_object mm/usercopy.c:186 [inline]
__check_object_size mm/usercopy.c:252 [inline]
__check_object_size+0x198/0x36c mm/usercopy.c:214
check_object_size include/linux/thread_info.h:199 [inline]
check_copy_size include/linux/thread_info.h:235 [inline]
copy_to_user include/linux/uaccess.h:159 [inline]
bpf_prog_get_info_by_fd.isra.0+0xf14/0xfdc kernel/bpf/syscall.c:3993
bpf_obj_get_info_by_fd+0x12c/0x510 kernel/bpf/syscall.c:4253
__sys_bpf+0x900/0x2150 kernel/bpf/syscall.c:4956
__do_sys_bpf kernel/bpf/syscall.c:5021 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5019 [inline]
__arm64_sys_bpf+0x28/0x40 kernel/bpf/syscall.c:5019
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x48/0x114 arch/arm64/kernel/syscall.c:52
el0_svc_common.constprop.0+0x44/0xec arch/arm64/kernel/syscall.c:142
do_el0_svc+0xa0/0xc0 arch/arm64/kernel/syscall.c:206
el0_svc+0x44/0xb0 arch/arm64/kernel/entry-common.c:624
el0t_64_sync_handler+0x1ac/0x1b0 arch/arm64/kernel/entry-common.c:642
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:581
Code: aa0003e3 d00038c0 91248000 97fff65f (d4210000)
Fixes: db496944fd ("bpf: arm64: add JIT support for multi-function programs")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220531215113.1100754-1-eric.dumazet@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make iavf_set_mac synchronous by waiting for a response
from a PF. Without this iavf_set_mac is always returning
success even though set_mac can be rejected by a PF.
This ensures that when set_mac exits netdev MAC is updated.
This is needed for sending ARPs with correct MAC after
changing VF's MAC. This is also needed by bonding module.
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
VFs by default are able to see all tagged traffic regardless of trust
and VLAN filters configured.
Add new private flag vf-vlan-pruning that allows changing of default
VF behavior for tagged traffic. When the flag is turned on
untrusted VF will only be able to receive untagged traffic
or traffic with VLAN tags it has created interfaces for
The flag is off by default and can only be changed if
there are no VFs spawned on the PF. This flag will only be effective
when no PVID is set on VF and VF is not trusted.
Add new function that computes the correct VLAN ID for VF VLAN filters
based on trust, PVID, vf-vlan-prune-disable flag and current VLAN ID.
Testing Hints:
Test 1: vf-vlan-pruning == off
==============================
1. Set the private flag
> ethtool --set-priv-flag eth0 vf-vlan-pruning off (default setting)
2. Use scapy to send any VLAN tagged traffic and make sure the VF
receives all VLAN tagged traffic that matches its destination MAC
filters (unicast, multicast, and broadcast).
Test 2: vf-vlan-pruning == on
==============================
1. Set the private flag
> ethtool --set-priv-flag eth0 vf-vlan-pruning on
2. Use scapy to send any VLAN tagged traffic and make sure the VF does
not receive any VLAN tagged traffic that matches its destination MAC
filters (unicast, multicast, and broadcast).
3. Add a VLAN filter on the VF netdev
> ip link add link eth0v0 name vlan10 type vlan id 10
4. Bring the VLAN netdev up
> ip link set vlan10 up
4. Use scapy to send traffic with VLAN 10, VLAN 11 (anything not VLAN
10), and untagged traffic. Make sure the VF only receives VLAN 10
and untagged traffic when the link partner is sending.
Test 3: vf-vlan-pruning == off && VF is in a port VLAN
==============================
1. Set the private flag
> ethtool --set-priv-flag eth0 vf-vlan-pruning off (default setting)
2. Create a VF
> echo 1 > sriov_numvfs
3. Put the VF in a port VLAN
> ip link set eth0 vf 0 vlan 10
4. Use scapy to send traffic with VLAN 10 and VLAN 11 (anything not VLAN
10) and make sure the VF only receives untagged traffic when the link
partner is sending VLAN 10 tagged traffic as the VLAN tag is expected
to be stripped by HW for port VLANs and not visible to the VF.
Test 4: Change vf-vlan-pruning while VFs are created
==============================
echo 0 > sriov_numvfs
ethtool --set-priv-flag eth0 vf-vlan-pruning off
echo 1 > sriov_numvfs
ethtool --set-priv-flag eth0 vf-vlan-pruning on (expect failure)
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
SVM uses a per-cpu variable to cache the current value of the
tsc scaling multiplier msr on each cpu.
Commit 1ab9287add
("KVM: X86: Add vendor callbacks for writing the TSC multiplier")
broke this caching logic.
Refactor the code so that all TSC scaling multiplier writes go through
a single function which checks and updates the cache.
This fixes the following scenario:
1. A CPU runs a guest with some tsc scaling ratio.
2. New guest with different tsc scaling ratio starts on this CPU
and terminates almost immediately.
This ensures that the short running guest had set the tsc scaling ratio just
once when it was set via KVM_SET_TSC_KHZ. Due to the bug,
the per-cpu cache is not updated.
3. The original guest continues to run, it doesn't restore the msr
value back to its own value, because the cache matches,
and thus continues to run with a wrong tsc scaling ratio.
Fixes: 1ab9287add ("KVM: X86: Add vendor callbacks for writing the TSC multiplier")
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606181149.103072-1-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
hyperv_clock doesn't always give a stable test result, especially with
AMD CPUs. The test compares Hyper-V MSR clocksource (acquired either
with rdmsr() from within the guest or KVM_GET_MSRS from the host)
against rdtsc(). To increase the accuracy, increase the measured delay
(done with nop loop) by two orders of magnitude and take the mean rdtsc()
value before and after rdmsr()/KVM_GET_MSRS.
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220601144322.1968742-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently disabling dirty logging with the TDP MMU is extremely slow.
On a 96 vCPU / 96G VM backed with gigabyte pages, it takes ~200 seconds
to disable dirty logging with the TDP MMU, as opposed to ~4 seconds with
the shadow MMU.
When disabling dirty logging, zap non-leaf parent entries to allow
replacement with huge pages instead of recursing and zapping all of the
child, leaf entries. This reduces the number of TLB flushes required.
and reduces the disable dirty log time with the TDP MMU to ~3 seconds.
Opportunistically add a WARN() to catch GFNs that are mapped at a
higher level than their max level.
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220525230904.1584480-1-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As noted (and fixed) a couple of times in the past, "=@cc<cond>" outputs
and clobbering of "cc" don't work well together. The compiler appears to
mean to reject such, but doesn't - in its upstream form - quite manage
to yet for "cc". Furthermore two similar macros don't clobber "cc", and
clobbering "cc" is pointless in asm()-s for x86 anyway - the compiler
always assumes status flags to be clobbered there.
Fixes: 989b5db215 ("x86/uaccess: Implement macros for CMPXCHG on user addresses")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Message-Id: <485c0c0b-a3a7-0b7c-5264-7d00c01de032@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When freeing obsolete previous roots, check prev_roots as intended, not
the current root.
Signed-off-by: Shaoqin Huang <shaoqin.huang@intel.com>
Fixes: 527d5cd7ee ("KVM: x86/mmu: Zap only obsolete roots if a root shadow page is zapped")
Message-Id: <20220607005905.2933378-1-shaoqin.huang@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A livepatch transition may stall indefinitely when a kvm vCPU is heavily
loaded. To the host, the vCPU task is a user thread which is spending a
very long time in the ioctl(KVM_RUN) syscall. During livepatch
transition, set_notify_signal() will be called on such tasks to
interrupt the syscall so that the task can be transitioned. This
interrupts guest execution, but when xfer_to_guest_mode_work() sees that
TIF_NOTIFY_SIGNAL is set but not TIF_SIGPENDING it concludes that an
exit to user mode is unnecessary, and guest execution is resumed without
transitioning the task for the livepatch.
This handling of TIF_NOTIFY_SIGNAL is incorrect, as set_notify_signal()
is expected to break tasks out of interruptible kernel loops and cause
them to return to userspace. Change xfer_to_guest_mode_work() to handle
TIF_NOTIFY_SIGNAL the same as TIF_SIGPENDING, signaling to the vCPU run
loop that an exit to userpsace is needed. Any pending task_work will be
run when get_signal() is called from exit_to_user_mode_loop(), so there
is no longer any need to run task work from xfer_to_guest_mode_work().
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Message-Id: <20220504180840.2907296-1-sforshee@digitalocean.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A KVM device cleanup happens in either of two callbacks:
1) destroy() which is called when the VM is being destroyed;
2) release() which is called when a device fd is closed.
Most KVM devices use 1) but Book3s's interrupt controller KVM devices
(XICS, XIVE, XIVE-native) use 2) as they need to close and reopen during
the machine execution. The error handling in kvm_ioctl_create_device()
assumes destroy() is always defined which leads to NULL dereference as
discovered by Syzkaller.
This adds a checks for destroy!=NULL and adds a missing release().
This is not changing kvm_destroy_devices() as devices with defined
release() should have been removed from the KVM devices list by then.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
bpf_helpers.h has been moved to tools/lib/bpf since 5.10, so add more
including path.
Fixes: edae34a3ed ("selftests net: add UDP GRO fraglist + bpf self-tests")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Lina Wang <lina.wang@mediatek.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220606064517.8175-1-lina.wang@mediatek.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Menglong Dong says:
====================
reorganize the code of the enum skb_drop_reason
The code of skb_drop_reason is a little wild, let's reorganize them.
Three things and three patches:
1) Move the enum 'skb_drop_reason' and related function to the standalone
header 'dropreason.h', as Jakub Kicinski suggested, as the skb drop
reasons are getting more and more.
2) use auto-generation to generate the source file that convert enum
skb_drop_reason to string.
3) make the comment of skb drop reasons kernel-doc style.
Changes since v3:
3/3: remove some useless comment (Jakub Kicinski)
Changes since v2:
2/3: - add new line in the end of .gitignore
- fix awk warning by make '\;' to ';', as ';' is not need to be
escaped
- export 'drop_reasons' in skbuff.c
Changes since v1:
1/3: move dropreason.h from include/linux/ to include/net/ (Jakub Kicinski)
2/3: generate source file instead of header file for drop reasons string
array (Jakub Kicinski)
3/3: use inline comment (Jakub Kicinski)
====================
Link: https://lore.kernel.org/r/20220606022436.331005-1-imagedong@tencent.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
To make the code clear, reformat the comment in dropreason.h to k-doc
style.
Now, the comment can pass the check of kernel-doc without warnning:
$ ./scripts/kernel-doc -v -none include/linux/dropreason.h
include/linux/dropreason.h:7: info: Scanning doc for enum skb_drop_reason
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
It is annoying to add new skb drop reasons to 'enum skb_drop_reason'
and TRACE_SKB_DROP_REASON in trace/event/skb.h, and it's easy to forget
to add the new reasons we added to TRACE_SKB_DROP_REASON.
TRACE_SKB_DROP_REASON is used to convert drop reason of type number
to string. For now, the string we passed to user space is exactly the
same as the name in 'enum skb_drop_reason' with a 'SKB_DROP_REASON_'
prefix. Therefore, we can use 'auto-generation' to generate these
drop reasons to string at build time.
The new source 'dropreason_str.c' will be auto generated during build
time, which contains the string array
'const char * const drop_reasons[]'.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
As the skb drop reasons are getting more and more, move the enum
'skb_drop_reason' and related function to the standalone header
'dropreason.h', as Jakub Kicinski suggested.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
unix_dgram_poll() calls unix_dgram_peer_wake_me() without `other`'s
lock held and check if its receive queue is full. Here we need to
use unix_recvq_full_lockless() instead of unix_recvq_full(), otherwise
KCSAN will report a data-race.
Fixes: 7d267278a9 ("unix: avoid use-after-free in ep_remove_wait_queue")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20220605232325.11804-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When the managed API is used, there is no need to explicitly call
pci_free_irq_vectors().
This looks to be a left-over from the commit in the Fixes tag. Only the
.remove() function had been updated.
So remove this unused function call and update goto label accordingly.
Fixes: 8accc46775 ("stmmac: intel: use managed PCI function on probe and resume")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Link: https://lore.kernel.org/r/1ac9b6787b0db83b0095711882c55c77c8ea8da0.1654462241.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>