WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
John Fastabend	313ab00480	net/tls: remove sock unlock/lock around strp_done() The tls close() callback currently drops the sock lock to call strp_done(). Split up the RX cleanup into stopping the strparser and releasing most resources, syncing strparser and finally freeing the context. To avoid the need for a strp_done() call on the cleanup path of device offload make sure we don't arm the strparser until we are sure init will be successful. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-07-22 16:04:16 +02:00
John Fastabend	f87e62d45e	net/tls: remove close callback sock unlock/lock around TX work flush The tls close() callback currently drops the sock lock, makes a cancel_delayed_work_sync() call, and then relocks the sock. By restructuring the code we can avoid droping lock and then reclaiming it. To simplify this we do the following, tls_sk_proto_close set_bit(CLOSING) set_bit(SCHEDULE) cancel_delay_work_sync() <- cancel workqueue lock_sock(sk) ... release_sock(sk) strp_done() Setting the CLOSING bit prevents the SCHEDULE bit from being cleared by any workqueue items e.g. if one happens to be scheduled and run between when we set SCHEDULE bit and cancel work. Then because SCHEDULE bit is set now no new work will be scheduled. Tested with net selftests and bpf selftests. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-07-22 16:04:16 +02:00
Jakub Kicinski	ac78fc148d	net/tls: don't call tls_sk_proto_close for hw record offload The deprecated TOE offload doesn't actually do anything in tls_sk_proto_close() - all TLS code is skipped and context not freed. Remove the callback to make it easier to refactor tls_sk_proto_close(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-07-22 16:04:16 +02:00
Jakub Kicinski	318892ac06	net/tls: don't arm strparser immediately in tls_set_sw_offload() In tls_set_device_offload_rx() we prepare the software context for RX fallback and proceed to add the connection to the device. Unfortunately, software context prep includes arming strparser so in case of a later error we have to release the socket lock to call strp_done(). In preparation for not releasing the socket lock half way through callbacks move arming strparser into a separate function. Following patches will make use of that. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2019-07-22 16:04:16 +02:00
Jakub Kicinski	5c4b4608fe	net/tls: fix socket wmem accounting on fallback with netem netem runs skb_orphan_partial() which "disconnects" the skb from normal TCP write memory accounting. We should not adjust sk->sk_wmem_alloc on the fallback path for such skbs. Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-08 20:21:10 -07:00
Jakub Kicinski	ab232e61e7	net/tls: add missing prot info init Turns out TLS_TX in HW offload mode does not initialize tls_prot_info. Since commit `9cd81988cc` ("net/tls: use version from prot") we actually use this field on the datapath. Luckily we always compare it to TLS 1.3, and assume 1.2 otherwise. So since zero is not equal to 1.3, everything worked fine. Fixes: `9cd81988cc` ("net/tls: use version from prot") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-08 20:21:09 -07:00
Dirk van der Merwe	b5d9a834f4	net/tls: don't clear TX resync flag on error Introduce a return code for the tls_dev_resync callback. When the driver TX resync fails, kernel can retry the resync again until it succeeds. This prevents drivers from attempting to offload TLS packets if the connection is known to be out of sync. We don't worry about the RX resync since they will be retried naturally as more encrypted records get received. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-08 20:21:09 -07:00
David S. Miller	af144a9834	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Two cases of overlapping changes, nothing fancy. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-08 19:48:57 -07:00
Jakub Kicinski	13aecb17ac	net/tls: fix poll ignoring partially copied records David reports that RPC applications which use epoll() occasionally get stuck, and that TLS ULP causes the kernel to not wake applications, even though read() will return data. This is indeed true. The ctx->rx_list which holds partially copied records is not consulted when deciding whether socket is readable. Note that SO_RCVLOWAT with epoll() is and has always been broken for kernel TLS. We'd need to parse all records from the TCP layer, instead of just the first one. Fixes: `692d7b5d1f` ("tls: Fix recvmsg() to be able to peek across multiple records") Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-07 14:11:44 -07:00
Jakub Kicinski	acd3e96d53	net/tls: make sure offload also gets the keys wiped Commit `86029d10af` ("tls: zero the crypto information from tls_context before freeing") added memzero_explicit() calls to clear the key material before freeing struct tls_context, but it missed tls_device.c has its own way of freeing this structure. Replace the missing free. Fixes: `86029d10af` ("tls: zero the crypto information from tls_context before freeing") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 19:22:36 -07:00
Jakub Kicinski	618bac4593	net/tls: reject offload of TLS 1.3 Neither drivers nor the tls offload code currently supports TLS version 1.3. Check the TLS version when installing connection state. TLS 1.3 will just fallback to the kernel crypto for now. Fixes: `130b392c6c` ("net: tls: Add tls 1.3 support") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 19:21:31 -07:00
David S. Miller	d96ff269a0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net The new route handling in ip_mc_finish_output() from 'net' overlapped with the new support for returning congestion notifications from BPF programs. In order to handle this I had to take the dev_loopback_xmit() calls out of the switch statement. The aquantia driver conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-27 21:06:39 -07:00
Dirk van der Merwe	9354544cbc	net/tls: fix page double free on TX cleanup With commit `94850257cf` ("tls: Fix tls_device handling of partial records") a new path was introduced to cleanup partial records during sk_proto_close. This path does not handle the SW KTLS tx_list cleanup. This is unnecessary though since the free_resources calls for both SW and offload paths will cleanup a partial record. The visible effect is the following warning, but this bug also causes a page double free. WARNING: CPU: 7 PID: 4000 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110 RIP: 0010:sk_stream_kill_queues+0x103/0x110 RSP: 0018:ffffb6df87e07bd0 EFLAGS: 00010206 RAX: 0000000000000000 RBX: ffff8c21db4971c0 RCX: 0000000000000007 RDX: ffffffffffffffa0 RSI: 000000000000001d RDI: ffff8c21db497270 RBP: ffff8c21db497270 R08: ffff8c29f4748600 R09: 000000010020001a R10: ffffb6df87e07aa0 R11: ffffffff9a445600 R12: 0000000000000007 R13: 0000000000000000 R14: ffff8c21f03f2900 R15: ffff8c21f03b8df0 Call Trace: inet_csk_destroy_sock+0x55/0x100 tcp_close+0x25d/0x400 ? tcp_check_oom+0x120/0x120 tls_sk_proto_close+0x127/0x1c0 inet_release+0x3c/0x60 __sock_release+0x3d/0xb0 sock_close+0x11/0x20 __fput+0xd8/0x210 task_work_run+0x84/0xa0 do_exit+0x2dc/0xb90 ? release_sock+0x43/0x90 do_group_exit+0x3a/0xa0 get_signal+0x295/0x720 do_signal+0x36/0x610 ? SYSC_recvfrom+0x11d/0x130 exit_to_usermode_loop+0x69/0xb0 do_syscall_64+0x173/0x180 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fe9b9abc10d RSP: 002b:00007fe9b19a1d48 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 0000000000000006 RCX: 00007fe9b9abc10d RDX: 0000000000000002 RSI: 0000000000000080 RDI: 00007fe948003430 RBP: 00007fe948003410 R08: 00007fe948003430 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00005603739d9080 R13: 00007fe9b9ab9f90 R14: 00007fe948003430 R15: 0000000000000000 Fixes: `94850257cf` ("tls: Fix tls_device handling of partial records") Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-24 07:20:45 -07:00
David S. Miller	13091aa305	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Honestly all the conflicts were simple overlapping changes, nothing really interesting to report. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-17 20:20:36 -07:00
John Fastabend	648ee6cea7	net: tls, correctly account for copied bytes with multiple sk_msgs tls_sw_do_sendpage needs to return the total number of bytes sent regardless of how many sk_msgs are allocated. Unfortunately, copied (the value we return up the stack) is zero'd before each new sk_msg is allocated so we only return the copied size of the last sk_msg used. The caller (splice, etc.) of sendpage will then believe only part of its data was sent and send the missing chunks again. However, because the data actually was sent the receiver will get multiple copies of the same data. To reproduce this do multiple sendfile calls with a length close to the max record size. This will in turn call splice/sendpage, sendpage may use multiple sk_msg in this case and then returns the incorrect number of bytes. This will cause splice to resend creating duplicate data on the receiver. Andre created a C program that can easily generate this case so we will push a similar selftest for this to bpf-next shortly. The fix is to _not_ zero the copied field so that the total sent bytes is returned. Reported-by: Steinar H. Gunderson <steinar+kernel@gunderson.no> Reported-by: Andre Tomt <andre@tomt.net> Tested-by: Andre Tomt <andre@tomt.net> Fixes: `d829e9c411` ("tls: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-12 11:04:35 -07:00
Jakub Kicinski	5018007409	net/tls: add kernel-driven resync mechanism for TX TLS offload drivers keep track of TCP seq numbers to make sure the packets are fed into the HW in order. When packets get dropped on the way through the stack, the driver will get out of sync and have to use fallback encryption, but unless TCP seq number is resynced it will never match the packets correctly (or even worse - use incorrect record sequence number after TCP seq wraps). Existing drivers (mlx5) feed the entire record on every out-of-order event, allowing FW/HW to always be in sync. This patch adds an alternative, more akin to the RX resync. When driver sees a frame which is past its expected sequence number the stream must have gotten out of order (if the sequence number is smaller than expected its likely a retransmission which doesn't require resync). Driver will ask the stack to perform TX sync before it submits the next full record, and fall back to software crypto until stack has performed the sync. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 12:22:27 -07:00
Jakub Kicinski	eeb2efaf36	net/tls: generalize the resync callback Currently only RX direction is ever resynced, however, TX may also get out of sequence if packets get dropped on the way to the driver. Rename the resync callback and add a direction parameter. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 12:22:27 -07:00
Jakub Kicinski	f953d33ba1	net/tls: add kernel-driven TLS RX resync TLS offload device may lose sync with the TCP stream if packets arrive out of order. Drivers can currently request a resync at a specific TCP sequence number. When a record is found starting at that sequence number kernel will inform the device of the corresponding record number. This requires the device to constantly scan the stream for a known pattern (constant bytes of the header) after sync is lost. This patch adds an alternative approach which is entirely under the control of the kernel. Kernel tracks records it had to fully decrypt, even though TLS socket is in TLS_HW mode. If multiple records did not have any decrypted parts - it's a pretty strong indication that the device is out of sync. We choose the min number of fully encrypted records to be 2, which should hopefully be more than will get retransmitted at a time. After kernel decides the device is out of sync it schedules a resync request. If the TCP socket is empty the resync gets performed immediately. If socket is not empty we leave the record parser to resync when next record comes. Before resync in message parser we peek at the TCP socket and don't attempt the sync if the socket already has some of the next record queued. On resync failure (encrypted data continues to flow in) we retry with exponential backoff, up to once every 128 records (with a 16k record thats at most once every 2M of data). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 12:22:26 -07:00
Jakub Kicinski	fe58a5a02c	net/tls: rename handle_device_resync() handle_device_resync() doesn't describe the function very well. The function checks if resync should be issued upon parsing of a new record. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 12:22:26 -07:00
Jakub Kicinski	89fec474fa	net/tls: pass record number as a byte array TLS offload code casts record number to a u64. The buffer should be aligned to 8 bytes, but its actually a __be64, and the rest of the TLS code treats it as big int. Make the offload callbacks take a byte array, drivers can make the choice to do the ugly cast if they want to. Prepare for copying the record number onto the stack by defining a constant for max size of the byte array. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 12:22:26 -07:00
Jakub Kicinski	4967373959	net/tls: simplify seq calculation in handle_device_resync() We subtract "TLS_HEADER_SIZE - 1" from req_seq, then if they match we add the same constant to seq. Just add it to seq, and we don't have to touch req_seq. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 12:22:26 -07:00
David S. Miller	a6cdeeb16b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Some ISDN files that got removed in net-next had some changes done in mainline, take the removals. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-07 11:00:14 -07:00
Dirk van der Merwe	b9727d7f95	net/tls: export TLS per skb encryption While offloading TLS connections, drivers need to handle the case where out of order packets need to be transmitted. Other drivers obtain the entire TLS record for the specific skb to provide as context to hardware for encryption. However, other designs may also want to keep the hardware state intact and perform the out of order encryption entirely on the host. To achieve this, export the already existing software encryption fallback path so drivers could access this. Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-06 14:13:40 -07:00
Jakub Kicinski	fb0f886fa2	net/tls: don't pass version to tls_advance_record_sn() All callers pass prot->version as the last parameter of tls_advance_record_sn(), yet tls_advance_record_sn() itself needs a pointer to prot. Pass prot from callers. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-04 14:33:50 -07:00
Jakub Kicinski	9cd81988cc	net/tls: use version from prot ctx->prot holds the same information as per-direction contexts. Almost all code gets TLS version from this structure, convert the last two stragglers, this way we can improve the cache utilization by moving the per-direction data into cold cache lines. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-04 14:33:50 -07:00
Jakub Kicinski	1fe275d434	net/tls: don't re-check msg decrypted status in tls_device_decrypted() tls_device_decrypted() is only called from decrypt_skb_update(), when ctx->decrypted == false, there is no need to re-check the bit. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-04 14:33:50 -07:00
Jakub Kicinski	b9d8fec927	net/tls: don't look for decrypted frames on non-offloaded sockets If the RX config of a TLS socket is SW, there is no point iterating over the fragments and checking if frame is decrypted. It will always be fully encrypted. Note that in fully encrypted case the function doesn't actually touch any offload-related state, so it's safe to call for TLS_SW, today. Soon we will introduce code which can only be called for offloaded contexts. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-04 14:33:50 -07:00
Jakub Kicinski	87b11e0638	net/tls: remove false positive warning It's possible that TCP stack will decide to retransmit a packet right when that packet's data gets acked, especially in presence of packet reordering. This means that packets may be in flight, even though tls_device code has already freed their record state. Make fill_sg_in() and in turn tls_sw_fallback() not generate a warning in that case, and quietly proceed to drop such frames. Make the exit path from tls_sw_fallback() drop monitor friendly, for users to be able to troubleshoot dropped retransmissions. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-04 14:33:50 -07:00
Jakub Kicinski	aeb11ff0dc	net/tls: check return values from skb_copy_bits() and skb_store_bits() In light of recent bugs, we should make a better effort of checking return values. In theory none of the functions should fail today. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-04 14:33:50 -07:00
Jakub Kicinski	e52972c11d	net/tls: replace the sleeping lock around RX resync with a bit lock Commit `38030d7cb7` ("net/tls: avoid NULL-deref on resync during device removal") tried to fix a potential NULL-dereference by taking the context rwsem. Unfortunately the RX resync may get called from soft IRQ, so we can't use the rwsem to protect from the device disappearing. Because we are guaranteed there can be only one resync at a time (it's called from strparser) use a bit to indicate resync is busy and make device removal wait for the bit to get cleared. Note that there is a leftover "flags" field in struct tls_context already. Fixes: `4799ac81e5` ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-04 13:34:37 -07:00
Jakub Kicinski	27393f8c6e	Revert "net/tls: avoid NULL-deref on resync during device removal" This reverts commit `38030d7cb7`. Unfortunately the RX resync may get called from soft IRQ, so we can't take the rwsem to protect from the device disappearing. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-04 13:34:37 -07:00
Jakub Kicinski	04b25a5411	net/tls: fix no wakeup on partial reads When tls_sw_recvmsg() partially copies a record it pops that record from ctx->recv_pkt and places it on rx_list. Next iteration of tls_sw_recvmsg() reads from rx_list via process_rx_list() before it enters the decryption loop. If there is no more records to be read tls_wait_data() will put the process on the wait queue and got to sleep. This is incorrect, because some data was already copied in process_rx_list(). In case of RPC connections process may never get woken up, because peer also simply blocks in read(). I think this may also fix a similar issue when BPF is at play, because after __tcp_bpf_recvmsg() returns some data we subtract it from len and use continue to restart the loop, but len could have just reached 0, so again we'd sleep unnecessarily. That's added by: commit `d3b18ad31f` ("tls: add bpf support to sk_msg handling") Fixes: `692d7b5d1f` ("tls: Fix recvmsg() to be able to peek across multiple records") Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Tested-by: David Beckett <david.beckett@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-26 21:47:13 -07:00
Jakub Kicinski	46a1695960	net/tls: fix lowat calculation if some data came from previous record If some of the data came from the previous record, i.e. from the rx_list it had already been decrypted, so it's not counted towards the "decrypted" variable, but the "copied" variable. Take that into account when checking lowat. When calculating lowat target we need to pass the original len. E.g. if lowat is at 80, len is 100 and we had 30 bytes on rx_list target would currently be incorrectly calculated as 70, even though we only need 50 more bytes to make up the 80. Fixes: `692d7b5d1f` ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Tested-by: David Beckett <david.beckett@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-26 21:47:12 -07:00
Jakub Kicinski	c3f4a6c39c	net/tls: don't ignore netdev notifications if no TLS features On device surprise removal path (the notifier) we can't bail just because the features are disabled. They may have been enabled during the lifetime of the device. This bug leads to leaking netdev references and use-after-frees if there are active connections while device features are cleared. Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-22 12:21:44 -07:00
Jakub Kicinski	3686637e50	net/tls: fix state removal with feature flags off TLS offload drivers shouldn't (and currently don't) block the TLS offload feature changes based on whether there are active offloaded connections or not. This seems to be a good idea, because we want the admin to be able to disable the TLS offload at any time, and there is no clean way of disabling it for active connections (TX side is quite problematic). So if features are cleared existing connections will stay offloaded until they close, and new connections will not attempt offload to a given device. However, the offload state removal handling is currently broken if feature flags get cleared while there are active TLS offloads. RX side will completely bail from cleanup, even on normal remove path, leaving device state dangling, potentially causing issues when the 5-tuple is reused. It will also fail to release the netdev reference. Remove the RX-side warning message, in next release cycle it should be printed when features are disabled, rather than when connection dies, but for that we need a more efficient method of finding connection of a given netdev (a'la BPF offload code). Fixes: `4799ac81e5` ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-22 12:21:44 -07:00
Jakub Kicinski	38030d7cb7	net/tls: avoid NULL-deref on resync during device removal When netdev with active kTLS sockets in unregistered notifier callback walks the offloaded sockets and cleans up offload state. RX data may still be processed, however, and if resync was requested prior to device removal we would hit a NULL pointer dereference on ctx->netdev use. Make sure resync is under the device offload lock and NULL-check the netdev pointer. This should be safe, because the pointer is set to NULL either in the netdev notifier (under said lock) or when socket is completely dead and no resync can happen. The other access to ctx->netdev in tls_validate_xmit_skb() does not dereference the pointer, it just checks it against other device pointer, so it should be pretty safe (perhaps we can add a READ_ONCE/WRITE_ONCE there, if paranoid). Fixes: `4799ac81e5` ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-22 12:21:44 -07:00
Thomas Gleixner	ec8f24b7fa	treewide: Add SPDX license identifier - Makefile/Kconfig Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-05-21 10:50:46 +02:00
Jakub Kicinski	b53f4976fb	net/tls: handle errors from padding_length() At the time padding_length() is called the record header is still part of the message. If malicious TLS 1.3 peer sends an all-zero record padding_length() will stop at the record header, and return full length of the data including the tail_size. Subsequent subtraction of prot->overhead_size from rxm->full_len will cause rxm->full_len to turn negative. skb accessors, however, will always catch resulting out-of-bounds operation, so in practice this fix comes down to returning the correct error code. It also fixes a set but not used warning. This code was added by commit `130b392c6c` ("net: tls: Add tls 1.3 support"). CC: Dave Watson <davejwatson@fb.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-09 16:37:39 -07:00
Jakub Kicinski	88c80bee88	net/tls: remove set but not used variables Commit `4504ab0e6e` ("net/tls: Inform user space about send buffer availability") made us report write_space regardless whether partial record push was successful or not. Remove the now unused return value to clean up the following W=1 warning: net/tls/tls_device.c: In function ‘tls_device_write_space’: net/tls/tls_device.c:546:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable] int rc = 0; ^~ CC: Vakul Garg <vakul.garg@nxp.com> CC: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-09 16:37:39 -07:00
Jakub Kicinski	494bc1d281	net/tcp: use deferred jump label for TCP acked data hook User space can flip the clean_acked_data_enabled static branch on and off with TLS offload when CONFIG_TLS_DEVICE is enabled. jump_label.h suggests we use the delayed version in this case. Deferred branches now also don't take the branch mutex on decrement, so we avoid potential locking issues. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-09 11:13:57 -07:00
David S. Miller	ff24e4980a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Three trivial overlapping conflicts. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-02 22:14:21 -04:00
Jakub Kicinski	2dcb003314	net/tls: avoid NULL pointer deref on nskb->sk in fallback update_chksum() accesses nskb->sk before it has been set by complete_skb(), move the init up. Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-05-01 11:37:56 -04:00
Jakub Kicinski	eb3d38d5ad	net/tls: fix copy to fragments in reencrypt Fragments may contain data from other records so we have to account for that when we calculate the destination and max length of copy we can perform. Note that 'offset' is the offset within the message, so it can't be passed as offset within the frag.. Here skb_store_bits() would have realised the call is wrong and simply not copy data. Fixes: `4799ac81e5` ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 20:17:19 -04:00
Jakub Kicinski	97e1caa517	net/tls: don't copy negative amounts of data in reencrypt There is no guarantee the record starts before the skb frags. If we don't check for this condition copy amount will get negative, leading to reads and writes to random memory locations. Familiar hilarity ensues. Fixes: `4799ac81e5` ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 20:17:19 -04:00
Jakub Kicinski	63a1c95f3f	net/tls: byte swap device req TCP seq no upon setting To avoid a sparse warning byteswap the be32 sequence number before it's stored in the atomic value. While at it drop unnecessary brackets and use kernel's u64 type. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 16:52:21 -04:00
Jakub Kicinski	9e9957973c	net/tls: remove old exports of sk_destruct functions tls_device_sk_destruct being set on a socket used to indicate that socket is a kTLS device one. That is no longer true - now we use sk_validate_xmit_skb pointer for that purpose. Remove the export. tls_device_attach() needs to be moved. While at it, remove the dead declaration of tls_sk_destruct(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 16:52:21 -04:00
Jakub Kicinski	e49d268db9	net/tls: don't log errors every time offload can't proceed Currently when CONFIG_TLS_DEVICE is set each time kTLS connection is opened and the offload is not successful (either because the underlying device doesn't support it or e.g. it's tables are full) a rate limited error will be printed to the logs. There is nothing wrong with failing TLS offload. SW path will process the packets just fine, drop the noisy messages. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 16:52:21 -04:00
David S. Miller	8b44836583	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Two easy cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-25 23:52:29 -04:00
Jakub Kicinski	12c7686111	net/tls: don't leak IV and record seq when offload fails When device refuses the offload in tls_set_device_offload_rx() it calls tls_sw_free_resources_rx() to clean up software context state. Unfortunately, tls_sw_free_resources_rx() does not free all the state tls_set_sw_offload() allocated - it leaks IV and sequence number buffers. All other code paths which lead to tls_sw_release_resources_rx() (which tls_sw_free_resources_rx() calls) free those right before the call. Avoid the leak by moving freeing of iv and rec_seq into tls_sw_release_resources_rx(). Fixes: `4799ac81e5` ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-20 20:36:51 -07:00
Jakub Kicinski	62ef81d563	net/tls: avoid potential deadlock in tls_set_device_offload_rx() If device supports offload, but offload fails tls_set_device_offload_rx() will call tls_sw_free_resources_rx() which (unhelpfully) releases and reacquires the socket lock. For a small fix release and reacquire the device_offload_lock. Fixes: `4799ac81e5` ("tls: Add rx inline crypto offload") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-20 20:35:28 -07:00
Jakub Kicinski	9188d5ca45	net/tls: fix refcount adjustment in fallback Unlike atomic_add(), refcount_add() does not deal well with a negative argument. TLS fallback code reallocates the skb and is very likely to shrink the truesize, leading to: [ 189.513254] WARNING: CPU: 5 PID: 0 at lib/refcount.c:81 refcount_add_not_zero_checked+0x15c/0x180 Call Trace: refcount_add_checked+0x6/0x40 tls_enc_skb+0xb93/0x13e0 [tls] Once wmem_allocated count saturates the application can no longer send data on the socket. This is similar to Eric's fixes for GSO, TCP: commit `7ec318feee` ("tcp: gso: avoid refcount_t warning from tcp_gso_segment()") and UDP: commit `575b65bc5b` ("udp: avoid refcount_t saturation in __udp_gso_segment()"). Unlike the GSO case, for TLS fallback it's likely that the skb has shrunk, so the "likely" annotation is the other way around (likely branch being "sub"). Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-18 16:51:03 -07:00
David S. Miller	6b0a7f84ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflict resolution of af_smc.c from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-17 11:26:25 -07:00
Jakub Kicinski	903f1a1877	net/tls: fix build without CONFIG_TLS_DEVICE buildbot noticed that TLS_HW is not defined if CONFIG_TLS_DEVICE=n. Wrap the cleanup branch into an ifdef, tls_device_free_resources_tx() wouldn't be compiled either in this case. Fixes: `35b71a34ad` ("net/tls: don't leak partially sent record in device mode") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-10 17:23:26 -07:00
Jakub Kicinski	35b71a34ad	net/tls: don't leak partially sent record in device mode David reports that tls triggers warnings related to sk->sk_forward_alloc not being zero at destruction time: WARNING: CPU: 5 PID: 6831 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110 WARNING: CPU: 5 PID: 6831 at net/ipv4/af_inet.c:160 inet_sock_destruct+0x15b/0x170 When sender fills up the write buffer and dies from SIGPIPE. This is due to the device implementation not cleaning up the partially_sent_record. This is because commit `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") moved the partial record cleanup to the SW-only path. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-10 13:07:02 -07:00
Jakub Kicinski	5a03bc73ab	net/tls: fix the IV leaks Commit `f66de3ee2c` ("net/tls: Split conf to rx + tx") made freeing of IV and record sequence number conditional to SW path only, but commit `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") also allocates that state for the device offload configuration. Remember to free it. Fixes: `e8f6979981` ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-10 13:07:02 -07:00
David S. Miller	f83f715195	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor comment merge conflict in mlx5. Staging driver has a fixup due to the skb->xmit_more changes in 'net-next', but was removed in 'net'. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-05 14:14:19 -07:00
Jakub Kicinski	c43ac97bac	net: tls: prevent false connection termination with offload Only decrypt_internal() performs zero copy on rx, all paths which don't hit decrypt_internal() must set zc to false, otherwise tls_sw_recvmsg() may return 0 causing the application to believe that that connection got closed. Currently this happens with device offload when new record is first read from. Fixes: `d069b780e3` ("tls: Fix tls_device receive") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reported-by: David Beckett <david.beckett@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-29 13:38:50 -07:00
Vakul Garg	a88c26f671	net/tls: Replace kfree_skb() with consume_skb() To free the skb in normal course of processing, consume_skb() should be used. Only for failure paths, skb_free() is intended to be used. https://www.kernel.org/doc/htmldocs/networking/API-consume-skb.html Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-21 10:14:26 -07:00
Vakul Garg	f295b3ae9f	net/tls: Add support of AES128-CCM based ciphers Added support for AES128-CCM based record encryption. AES128-CCM is similar to AES128-GCM. Both of them have same salt/iv/mac size. The notable difference between the two is that while invoking AES128-CCM operation, the salt\|\|nonce (which is passed as IV) has to be prefixed with a hardcoded value '2'. Further, CCM implementation in kernel requires IV passed in crypto_aead_request() to be full '16' bytes. Therefore, the record structure 'struct tls_rec' has been modified to reserve '16' bytes for IV. This works for both GCM and CCM based cipher. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-20 11:02:05 -07:00
Vakul Garg	4504ab0e6e	net/tls: Inform user space about send buffer availability A previous fix ("tls: Fix write space handling") assumed that user space application gets informed about the socket send buffer availability when tls_push_sg() gets called. Inside tls_push_sg(), in case do_tcp_sendpages() returns 0, the function returns without calling ctx->sk_write_space. Further, the new function tls_sw_write_space() did not invoke ctx->sk_write_space. This leads to situation that user space application encounters a lockup always waiting for socket send buffer to become available. Rather than call ctx->sk_write_space from tls_push_sg(), it should be called from tls_write_space. So whenever tcp stack invokes sk->sk_write_space after freeing socket send buffer, we always declare the same to user space by the way of invoking ctx->sk_write_space. Fixes: `7463d3a2db` ("tls: Fix write space handling") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-13 14:16:44 -07:00
Boris Pismenny	d069b780e3	tls: Fix tls_device receive Currently, the receive function fails to handle records already decrypted by the device due to the commit mentioned below. This commit advances the TLS record sequence number and prepares the context to handle the next record. Fixes: `fedf201e12` ("net: tls: Refactor control message handling on recv") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-03 22:10:16 -08:00
Eran Ben Elisha	7754bd63ed	tls: Fix mixing between async capable and async Today, tls_sw_recvmsg is capable of using asynchronous mode to handle application data TLS records. Moreover, it assumes that if the cipher can be handled asynchronously, then all packets will be processed asynchronously. However, this assumption is not always true. Specifically, for AES-GCM in TLS1.2, it causes data corruption, and breaks user applications. This patch fixes this problem by separating the async capability from the decryption operation result. Fixes: `c0ab4732d4` ("net/tls: Do not use async crypto for non-data records") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-03 22:10:16 -08:00
Boris Pismenny	7463d3a2db	tls: Fix write space handling TLS device cannot use the sw context. This patch returns the original tls device write space handler and moves the sw/device specific portions to the relevant files. Also, we remove the write_space call for the tls_sw flow, because it handles partial records in its delayed tx work handler. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-03 22:10:16 -08:00
Boris Pismenny	94850257cf	tls: Fix tls_device handling of partial records Cleanup the handling of partial records while fixing a bug where the tls_push_pending_closed_record function is using the software tls context instead of the hardware context. The bug resulted in the following crash: [ 88.791229] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 88.793271] #PF error: [normal kernel read fault] [ 88.794449] PGD 800000022a426067 P4D 800000022a426067 PUD 22a156067 PMD 0 [ 88.795958] Oops: 0000 [#1] SMP PTI [ 88.796884] CPU: 2 PID: 4973 Comm: openssl Not tainted 5.0.0-rc4+ #3 [ 88.798314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 88.800067] RIP: 0010:tls_tx_records+0xef/0x1d0 [tls] [ 88.801256] Code: 00 02 48 89 43 08 e8 a0 0b 96 d9 48 89 df e8 48 dd 4d d9 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39 c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00 [ 88.805179] RSP: 0018:ffffbd888186fca8 EFLAGS: 00010213 [ 88.806458] RAX: ffff9af1ed657c98 RBX: ffff9af1e88a1980 RCX: 0000000000000000 [ 88.808050] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9af1e88a1980 [ 88.809724] RBP: ffff9af1e88a1980 R08: 0000000000000017 R09: ffff9af1ebeeb700 [ 88.811294] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 88.812917] R13: ffff9af1e88a1980 R14: ffff9af1ec13f800 R15: 0000000000000000 [ 88.814506] FS: 00007fcad2240740(0000) GS:ffff9af1f7880000(0000) knlGS:0000000000000000 [ 88.816337] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 88.817717] CR2: 0000000000000000 CR3: 0000000228b3e000 CR4: 00000000001406e0 [ 88.819328] Call Trace: [ 88.820123] tls_push_data+0x628/0x6a0 [tls] [ 88.821283] ? remove_wait_queue+0x20/0x60 [ 88.822383] ? n_tty_read+0x683/0x910 [ 88.823363] tls_device_sendmsg+0x53/0xa0 [tls] [ 88.824505] sock_sendmsg+0x36/0x50 [ 88.825492] sock_write_iter+0x87/0x100 [ 88.826521] __vfs_write+0x127/0x1b0 [ 88.827499] vfs_write+0xad/0x1b0 [ 88.828454] ksys_write+0x52/0xc0 [ 88.829378] do_syscall_64+0x5b/0x180 [ 88.830369] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 88.831603] RIP: 0033:0x7fcad1451680 [ 1248.470626] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 1248.472564] #PF error: [normal kernel read fault] [ 1248.473790] PGD 0 P4D 0 [ 1248.474642] Oops: 0000 [#1] SMP PTI [ 1248.475651] CPU: 3 PID: 7197 Comm: openssl Tainted: G OE 5.0.0-rc4+ #3 [ 1248.477426] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 1248.479310] RIP: 0010:tls_tx_records+0x110/0x1f0 [tls] [ 1248.480644] Code: 00 02 48 89 43 08 e8 4f cb 63 d7 48 89 df e8 f7 9c 1b d7 4c 89 f8 4d 8b bf 98 00 00 00 48 05 98 00 00 00 48 89 04 24 49 39 c7 <49> 8b 1f 4d 89 fd 0f 84 af 00 00 00 41 8b 47 10 85 c0 0f 85 8d 00 [ 1248.484825] RSP: 0018:ffffaa0a41543c08 EFLAGS: 00010213 [ 1248.486154] RAX: ffff955a2755dc98 RBX: ffff955a36031980 RCX: 0000000000000006 [ 1248.487855] RDX: 0000000000000000 RSI: 000000000000002b RDI: 0000000000000286 [ 1248.489524] RBP: ffff955a36031980 R08: 0000000000000000 R09: 00000000000002b1 [ 1248.491394] R10: 0000000000000003 R11: 00000000ad55ad55 R12: 0000000000000000 [ 1248.493162] R13: 0000000000000000 R14: ffff955a2abe6c00 R15: 0000000000000000 [ 1248.494923] FS: 0000000000000000(0000) GS:ffff955a378c0000(0000) knlGS:0000000000000000 [ 1248.496847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1248.498357] CR2: 0000000000000000 CR3: 000000020c40e000 CR4: 00000000001406e0 [ 1248.500136] Call Trace: [ 1248.500998] ? tcp_check_oom+0xd0/0xd0 [ 1248.502106] tls_sk_proto_close+0x127/0x1e0 [tls] [ 1248.503411] inet_release+0x3c/0x60 [ 1248.504530] __sock_release+0x3d/0xb0 [ 1248.505611] sock_close+0x11/0x20 [ 1248.506612] __fput+0xb4/0x220 [ 1248.507559] task_work_run+0x88/0xa0 [ 1248.508617] do_exit+0x2cb/0xbc0 [ 1248.509597] ? core_sys_select+0x17a/0x280 [ 1248.510740] do_group_exit+0x39/0xb0 [ 1248.511789] get_signal+0x1d0/0x630 [ 1248.512823] do_signal+0x36/0x620 [ 1248.513822] exit_to_usermode_loop+0x5c/0xc6 [ 1248.515003] do_syscall_64+0x157/0x180 [ 1248.516094] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1248.517456] RIP: 0033:0x7fb398bd3f53 [ 1248.518537] Code: Bad RIP value. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-03 22:10:16 -08:00
Vakul Garg	2b794c4098	tls: Return type of non-data records retrieved using MSG_PEEK in recvmsg The patch enables returning 'type' in msghdr for records that are retrieved with MSG_PEEK in recvmsg. Further it prevents records peeked from socket from getting clubbed with any other record of different type when records are subsequently dequeued from strparser. For each record, we now retain its type in sk_buff's control buffer cb[]. Inside control buffer, record's full length and offset are already stored by strparser in 'struct strp_msg'. We store record type after 'struct strp_msg' inside 'struct tls_msg'. For tls1.2, the type is stored just after record dequeue. For tls1.3, the type is stored after record has been decrypted. Inside process_rx_list(), before processing a non-data record, we check that we must be able to return back the record type to the user application. If not, the decrypted records in tls context's rx_list is left there without consuming any data. Fixes: `692d7b5d1f` ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-24 21:58:38 -08:00
Vakul Garg	4509de1468	net/tls: Move protocol constants from cipher context to tls context Each tls context maintains two cipher contexts (one each for tx and rx directions). For each tls session, the constants such as protocol version, ciphersuite, iv size, associated data size etc are same for both the directions and need to be stored only once per tls context. Hence these are moved from 'struct cipher_context' to 'struct tls_prot_info' and stored only once in 'struct tls_context'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-19 10:40:36 -08:00
Vakul Garg	c0ab4732d4	net/tls: Do not use async crypto for non-data records Addition of tls1.3 support broke tls1.2 handshake when async crypto accelerator is used. This is because the record type for non-data records is not propagated to user application. Also when async decryption happens, the decryption does not stop when two different types of records get dequeued and submitted for decryption. To address it, we decrypt tls1.2 non-data records in synchronous way. We check whether the record we just processed has same type as the previous one before checking for async condition and jumping to dequeue next record. Fixes: `130b392c6c` ("net: tls: Add tls 1.3 support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-12 12:35:14 -05:00
Vakul Garg	8497ded2d1	net/tls: Disable async decrytion for tls1.3 Function tls_sw_recvmsg() dequeues multiple records from stream parser and decrypts them. In case the decryption is done by async accelerator, the records may get submitted for decryption while the previous ones may not have been decryted yet. For tls1.3, the record type is known only after decryption. Therefore, for tls1.3, tls_sw_recvmsg() may submit records for decryption even if it gets 'handshake' records after 'data' records. These intermediate 'handshake' records may do a key updation. By the time new keys are given to ktls by userspace, it is possible that ktls has already submitted some records i(which are encrypted with new keys) for decryption using old keys. This would lead to decrypt failure. Therefore, async decryption of records should be disabled for tls1.3. Fixes: `130b392c6c` ("net: tls: Add tls 1.3 support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-09 09:27:49 -08:00
Dave Watson	5b053e121f	net: tls: Set async_capable for tls zerocopy only if we see EINPROGRESS Currently we don't zerocopy if the crypto framework async bit is set. However some crypto algorithms (such as x86 AESNI) support async, but in the context of sendmsg, will never run asynchronously. Instead, check for actual EINPROGRESS return code before assuming algorithm is async. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:05:07 -08:00
Dave Watson	130b392c6c	net: tls: Add tls 1.3 support TLS 1.3 has minor changes from TLS 1.2 at the record layer. * Header now hardcodes the same version and application content type in the header. * The real content type is appended after the data, before encryption (or after decryption). * The IV is xored with the sequence number, instead of concatinating four bytes of IV with the explicit IV. * Zero-padding: No exlicit length is given, we search backwards from the end of the decrypted data for the first non-zero byte, which is the content type. Currently recv supports reading zero-padding, but there is no way for send to add zero padding. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:00:55 -08:00
Dave Watson	fedf201e12	net: tls: Refactor control message handling on recv For TLS 1.3, the control message is encrypted. Handle control message checks after decryption. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:00:55 -08:00
Dave Watson	a2ef9b6a22	net: tls: Refactor tls aad space size calculation TLS 1.3 has a different AAD size, use a variable in the code to make TLS 1.3 support easy. Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:00:55 -08:00
Dave Watson	fb99bce712	net: tls: Support 256 bit keys Wire up support for 256 bit keys from the setsockopt to the crypto framework Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-01 15:00:55 -08:00
David S. Miller	eaf2a47f40	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2019-01-29 21:18:54 -08:00
Dave Watson	1023121375	net: tls: Fix deadlock in free_resources tx If there are outstanding async tx requests (when crypto returns EINPROGRESS), there is a potential deadlock: the tx work acquires the lock, while we cancel_delayed_work_sync() while holding the lock. Drop the lock while waiting for the work to complete. Fixes: `a42055e8d2` ("Add support for async encryption of records...") Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-28 23:07:08 -08:00
Dave Watson	32eb67b93c	net: tls: Save iv in tls_rec for async crypto requests aead_request_set_crypt takes an iv pointer, and we change the iv soon after setting it. Some async crypto algorithms don't save the iv, so we need to save it in the tls_rec for async requests. Found by hardcoding x64 aesni to use async crypto manager (to test the async codepath), however I don't think this combination can happen in the wild. Presumably other hardware offloads will need this fix, but there have been no user reports. Fixes: `a42055e8d2` ("Add support for async encryption of records...") Signed-off-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-28 23:05:55 -08:00
Atul Gupta	76f7164d02	net/tls: free ctx in sock destruct free tls context in sock destruct. close may not be the last call to free sock but force releasing the ctx in close will result in GPF when ctx referred again in tcp_done [ 515.330477] general protection fault: 0000 [#1] SMP PTI [ 515.330539] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc7+ #10 [ 515.330657] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0b 11/07/2013 [ 515.330844] RIP: 0010:tls_hw_unhash+0xbf/0xd0 [ [ 515.332220] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 515.332340] CR2: 00007fab32c55000 CR3: 000000009261e000 CR4: 00000000000006e0 [ 515.332519] Call Trace: [ 515.332632] <IRQ> [ 515.332793] tcp_set_state+0x5a/0x190 [ 515.332907] ? tcp_update_metrics+0xe3/0x350 [ 515.333023] tcp_done+0x31/0xd0 [ 515.333130] tcp_rcv_state_process+0xc27/0x111a [ 515.333242] ? __lock_is_held+0x4f/0x90 [ 515.333350] ? tcp_v4_do_rcv+0xaf/0x1e0 [ 515.333456] tcp_v4_do_rcv+0xaf/0x1e0 Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-22 11:30:54 -08:00
Atul Gupta	63a6b3fee4	net/tls: build_protos moved to common routine build protos is required for tls_hw_prot also hence moved to 'tls_build_proto' and called as required from tls_init and tls_hw_proto. This is required since build_protos for v4 is moved from tls_register to tls_init in commit <28cb6f1eaffdc5a6a9707cac55f4a43aa3fd7895> Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-22 11:30:54 -08:00
Vakul Garg	692d7b5d1f	tls: Fix recvmsg() to be able to peek across multiple records This fixes recvmsg() to be able to peek across multiple tls records. Without this patch, the tls's selftests test case 'recv_peek_large_buf_mult_recs' fails. Each tls receive context now maintains a 'rx_list' to retain incoming skb carrying tls records. If a tls record needs to be retained e.g. for peek case or for the case when the buffer passed to recvmsg() has a length smaller than decrypted record length, then it is added to 'rx_list'. Additionally, records are added in 'rx_list' if the crypto operation runs in async mode. The records are dequeued from 'rx_list' after the decrypted data is consumed by copying into the buffer passed to recvmsg(). In case, the MSG_PEEK flag is used in recvmsg(), then records are not consumed or removed from the 'rx_list'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 14:20:40 -08:00
YueHaibing	01cb8a1a64	net/tls: Make function tls_sw_do_sendpage static Fixes the following sparse warning: net/tls/tls_sw.c:1023:5: warning: symbol 'tls_sw_do_sendpage' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:45:21 -08:00
YueHaibing	f3de19af0f	net/tls: remove unused function tls_sw_sendpage_locked There are no in-tree callers. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 11:44:58 -08:00
David S. Miller	ce28bb4453	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2018-12-21 15:06:20 -08:00
Vakul Garg	65a10e28ae	tls: Do not call sk_memcopy_from_iter with zero length In some conditions e.g. when tls_clone_plaintext_msg() returns -ENOSPC, the number of bytes to be copied using subsequent function sk_msg_memcopy_from_iter() becomes zero. This causes function sk_msg_memcopy_from_iter() to fail which in turn causes tls_sw_sendmsg() to return failure. To prevent it, do not call sk_msg_memcopy_from_iter() when number of bytes to copy (indicated by 'try_to_copy') is zero. Fixes: `d829e9c411` ("tls: convert to generic sk_msg interface") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-21 10:26:54 -08:00
David S. Miller	339bbff2d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-12-21 The following pull-request contains BPF updates for your net-next tree. There is a merge conflict in test_verifier.c. Result looks as follows: [...] }, { "calls: cross frame pruning", .insns = { [...] .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, .errstr_unpriv = "function calls to other bpf functions are allowed for root only", .result_unpriv = REJECT, .errstr = "!read_ok", .result = REJECT, }, { "jset: functional", .insns = { [...] { "jset: unknown const compare not taken", .insns = { BPF_RAW_INSN(BPF_JMP \| BPF_CALL, 0, 0, 0, BPF_FUNC_get_prandom_u32), BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1), BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0), BPF_EXIT_INSN(), }, .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, .errstr_unpriv = "!read_ok", .result_unpriv = REJECT, .errstr = "!read_ok", .result = REJECT, }, [...] { "jset: range", .insns = { [...] }, .prog_type = BPF_PROG_TYPE_SOCKET_FILTER, .result_unpriv = ACCEPT, .result = ACCEPT, }, The main changes are: 1) Various BTF related improvements in order to get line info working. Meaning, verifier will now annotate the corresponding BPF C code to the error log, from Martin and Yonghong. 2) Implement support for raw BPF tracepoints in modules, from Matt. 3) Add several improvements to verifier state logic, namely speeding up stacksafe check, optimizations for stack state equivalence test and safety checks for liveness analysis, from Alexei. 4) Teach verifier to make use of BPF_JSET instruction, add several test cases to kselftests and remove nfp specific JSET optimization now that verifier has awareness, from Jakub. 5) Improve BPF verifier's slot_type marking logic in order to allow more stack slot sharing, from Jiong. 6) Add sk_msg->size member for context access and add set of fixes and improvements to make sock_map with kTLS usable with openssl based applications, from John. 7) Several cleanups and documentation updates in bpftool as well as auto-mount of tracefs for "bpftool prog tracelog" command, from Quentin. 8) Include sub-program tags from now on in bpf_prog_info in order to have a reliable way for user space to get all tags of the program e.g. needed for kallsyms correlation, from Song. 9) Add BTF annotations for cgroup_local_storage BPF maps and implement bpf fs pretty print support, from Roman. 10) Fix bpftool in order to allow for cross-compilation, from Ivan. 11) Update of bpftool license to GPLv2-only + BSD-2-Clause in order to be compatible with libbfd and allow for Debian packaging, from Jakub. 12) Remove an obsolete prog->aux sanitation in dump and get rid of version check for prog load, from Daniel. 13) Fix a memory leak in libbpf's line info handling, from Prashant. 14) Fix cpumap's frame alignment for build_skb() so that skb_shared_info does not get unaligned, from Jesper. 15) Fix test_progs kselftest to work with older compilers which are less smart in optimizing (and thus throwing build error), from Stanislav. 16) Cleanup and simplify AF_XDP socket teardown, from Björn. 17) Fix sk lookup in BPF kselftest's test_sock_addr with regards to netns_id argument, from Andrey. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 17:31:36 -08:00
John Fastabend	28cb6f1eaf	bpf: tls_sw, init TLS ULP removes BPF proto hooks The existing code did not expect users would initialize the TLS ULP without subsequently calling the TLS TX enabling socket option. If the application tries to send data after the TLS ULP enable op but before the TLS TX enable op the BPF sk_msg verdict program is skipped. This patch resolves this by converting the ipv4 sock ops to be calculated at init time the same way ipv6 ops are done. This pulls in any changes to the sock ops structure that have been made after the socket was created including the changes from adding the socket to a sock{map\|hash}. This was discovered by running OpenSSL master branch which calls the TLS ULP setsockopt early in TLS handshake but only enables the TLS TX path once the handshake has completed. As a result the datapath missed the initial handshake messages. Fixes: `02c558b2d5` ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-12-20 23:47:09 +01:00
John Fastabend	0608c69c9a	bpf: sk_msg, sock{map\|hash} redirect through ULP A sockmap program that redirects through a kTLS ULP enabled socket will not work correctly because the ULP layer is skipped. This fixes the behavior to call through the ULP layer on redirect to ensure any operations required on the data stream at the ULP layer continue to be applied. To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid calling the BPF layer on a redirected message. This is required to avoid calling the BPF layer multiple times (possibly recursively) which is not the current/expected behavior without ULPs. In the future we may add a redirect flag if users _do_ want the policy applied again but this would need to work for both ULP and non-ULP sockets and be opt-in to avoid breaking existing programs. Also to avoid polluting the flag space with an internal flag we reuse the flag space overlapping MSG_SENDPAGE_NOPOLICY with MSG_WAITFORONE. Here WAITFORONE is specific to recv path and SENDPAGE_NOPOLICY is only used for sendpage hooks. The last thing to verify is user space API is masked correctly to ensure the flag can not be set by user. (Note this needs to be true regardless because we have internal flags already in-use that user space should not be able to set). But for completeness we have two UAPI paths into sendpage, sendfile and splice. In the sendfile case the function do_sendfile() zero's flags, ./fs/read_write.c: static ssize_t do_sendfile(int out_fd, int in_fd, loff_t ppos, size_t count, loff_t max) { ... fl = 0; #if 0 / * We need to debate whether we can enable this or not. The * man page documents EAGAIN return for the output at least, * and the application is arguably buggy if it doesn't expect * EAGAIN on a non-blocking file descriptor. / if (in.file->f_flags & O_NONBLOCK) fl = SPLICE_F_NONBLOCK; #endif file_start_write(out.file); retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl); } In the splice case the pipe_to_sendpage "actor" is used which masks flags with SPLICE_F_MORE. ./fs/splice.c: static int pipe_to_sendpage(struct pipe_inode_info pipe, struct pipe_buffer buf, struct splice_desc sd) { ... more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0; ... } Confirming what we expect that internal flags are in fact internal to socket side. Fixes: `d3b18ad31f` ("tls: add bpf support to sk_msg handling") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-12-20 23:47:09 +01:00
David S. Miller	2be09de7d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Lots of conflicts, by happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 11:53:36 -08:00
Ganesh Goudar	c6ec179a00	net/tls: allocate tls context using GFP_ATOMIC create_ctx can be called from atomic context, hence use GFP_ATOMIC instead of GFP_KERNEL. [ 395.962599] BUG: sleeping function called from invalid context at mm/slab.h:421 [ 395.979896] in_atomic(): 1, irqs_disabled(): 0, pid: 16254, name: openssl [ 395.996564] 2 locks held by openssl/16254: [ 396.010492] #0: 00000000347acb52 (sk_lock-AF_INET){+.+.}, at: do_tcp_setsockopt.isra.44+0x13b/0x9a0 [ 396.029838] #1: 000000006c9552b5 (device_spinlock){+...}, at: tls_init+0x1d/0x280 [ 396.047675] CPU: 5 PID: 16254 Comm: openssl Tainted: G O 4.20.0-rc6+ #25 [ 396.066019] Hardware name: Supermicro X10SRA-F/X10SRA-F, BIOS 2.0c 09/25/2017 [ 396.083537] Call Trace: [ 396.096265] dump_stack+0x5e/0x8b [ 396.109876] ___might_sleep+0x216/0x250 [ 396.123940] kmem_cache_alloc_trace+0x1b0/0x240 [ 396.138800] create_ctx+0x1f/0x60 [ 396.152504] tls_init+0xbd/0x280 [ 396.166135] tcp_set_ulp+0x191/0x2d0 [ 396.180035] ? tcp_set_ulp+0x2c/0x2d0 [ 396.193960] do_tcp_setsockopt.isra.44+0x148/0x9a0 [ 396.209013] __sys_setsockopt+0x7c/0xe0 [ 396.223054] __x64_sys_setsockopt+0x20/0x30 [ 396.237378] do_syscall_64+0x4a/0x180 [ 396.251200] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: `df9d4a1780` ("net/tls: sleeping function from invalid context") Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-19 16:28:50 -08:00
Atul Gupta	df9d4a1780	net/tls: sleeping function from invalid context HW unhash within mutex for registered tls devices cause sleep when called from tcp_set_state for TCP_CLOSE. Release lock and re-acquire after function call with ref count incr/dec. defined kref and fp release for tls_device to ensure device is not released outside lock. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:748 in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/7 INFO: lockdep is turned off. CPU: 7 PID: 0 Comm: swapper/7 Tainted: G W O Call Trace: <IRQ> dump_stack+0x5e/0x8b ___might_sleep+0x222/0x260 __mutex_lock+0x5c/0xa50 ? vprintk_emit+0x1f3/0x440 ? kmem_cache_free+0x22d/0x2a0 ? tls_hw_unhash+0x2f/0x80 ? printk+0x52/0x6e ? tls_hw_unhash+0x2f/0x80 tls_hw_unhash+0x2f/0x80 tcp_set_state+0x5f/0x180 tcp_done+0x2e/0xe0 tcp_rcv_state_process+0x92c/0xdd3 ? lock_acquire+0xf5/0x1f0 ? tcp_v4_rcv+0xa7c/0xbe0 ? tcp_v4_do_rcv+0x70/0x1e0 Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-14 13:39:39 -08:00
Atul Gupta	6c0563e442	net/tls: Init routines in create_ctx create_ctx is called from tls_init and tls_hw_prot hence initialize function pointers in common routine. Signed-off-by: Atul Gupta <atul.gupta@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-14 13:39:39 -08:00
John Fastabend	7246d8ed4d	bpf: helper to pop data from messages This adds a BPF SK_MSG program helper so that we can pop data from a msg. We use this to pop metadata from a previous push data call. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-28 22:07:57 +01:00
Linus Torvalds	9931a07d51	Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "AFS series, with some iov_iter bits included" * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) missing bits of "iov_iter: Separate type from direction and use accessor functions" afs: Probe multiple fileservers simultaneously afs: Fix callback handling afs: Eliminate the address pointer from the address list cursor afs: Allow dumping of server cursor on operation failure afs: Implement YFS support in the fs client afs: Expand data structure fields to support YFS afs: Get the target vnode in afs_rmdir() and get a callback on it afs: Calc callback expiry in op reply delivery afs: Fix FS.FetchStatus delivery from updating wrong vnode afs: Implement the YFS cache manager service afs: Remove callback details from afs_callback_break struct afs: Commit the status on a new file/dir/symlink afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS afs: Don't invoke the server to read data beyond EOF afs: Add a couple of tracepoints to log I/O errors afs: Handle EIO from delivery function afs: Fix TTL on VL server and address lists afs: Implement VL server rotation afs: Improve FS server rotation error handling ...	2018-11-01 19:58:52 -07:00
David Howells	aa563d7bca	iov_iter: Separate type from direction and use accessor functions In the iov_iter struct, separate the iterator type from the iterator direction and use accessor functions to access them in most places. Convert a bunch of places to use switch-statements to access them rather then chains of bitwise-AND statements. This makes it easier to add further iterator types. Also, this can be more efficient as to implement a switch of small contiguous integers, the compiler can use ~50% fewer compare instructions than it has to use bitwise-and instructions. Further, cease passing the iterator type into the iterator setup function. The iterator function can set that itself. Only the direction is required. Signed-off-by: David Howells <dhowells@redhat.com>	2018-10-24 00:41:07 +01:00
David Howells	00e2370744	iov_iter: Use accessor function Use accessor functions to access an iterator's type and direction. This allows for the possibility of using some other method of determining the type of iterator than if-chains with bitwise-AND conditions. Signed-off-by: David Howells <dhowells@redhat.com>	2018-10-24 00:40:44 +01:00
Daniel Borkmann	c16ee04c9b	ulp: remove uid and user_visible members They are not used anymore and therefore should be removed. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-10-20 23:13:32 -07:00
John Fastabend	02c558b2d5	bpf: sockmap, support for msg_peek in sk_msg with redirect ingress This adds support for the MSG_PEEK flag when doing redirect to ingress and receiving on the sk_msg psock queue. Previously the flag was being ignored which could confuse applications if they expected the flag to work as normal. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-10-17 02:30:32 +02:00
John Fastabend	d3b18ad31f	tls: add bpf support to sk_msg handling This work adds BPF sk_msg verdict program support to kTLS allowing BPF and kTLS to be combined together. Previously kTLS and sk_msg verdict programs were mutually exclusive in the ULP layer which created challenges for the orchestrator when trying to apply TCP based policy, for example. To resolve this, leveraging the work from previous patches that consolidates the use of sk_msg, we can finally enable BPF sk_msg verdict programs so they continue to run after the kTLS socket is created. No change in behavior when kTLS is not used in combination with BPF, the kselftest suite for kTLS also runs successfully. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-10-15 12:23:19 -07:00
John Fastabend	924ad65ed0	tls: replace poll implementation with read hook Instead of re-implementing poll routine use the poll callback to trigger read from kTLS, we reuse the stream_memory_read callback which is simpler and achieves the same. This helps to align sockmap and kTLS so we can more easily embed BPF in kTLS. Joint work with Daniel. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-10-15 12:23:19 -07:00
Daniel Borkmann	d829e9c411	tls: convert to generic sk_msg interface Convert kTLS over to make use of sk_msg interface for plaintext and encrypted scattergather data, so it reuses all the sk_msg helpers and data structure which later on in a second step enables to glue this to BPF. This also allows to remove quite a bit of open coded helpers which are covered by the sk_msg API. Recent changes in kTLs `80ece6a03a` ("tls: Remove redundant vars from tls record structure") and `4e6d47206c` ("tls: Add support for inplace records encryption") changed the data path handling a bit; while we've kept the latter optimization intact, we had to undo the former change to better fit the sk_msg model, hence the sg_aead_in and sg_aead_out have been brought back and are linked into the sk_msg sgs. Now the kTLS record contains a msg_plaintext and msg_encrypted sk_msg each. In the original code, the zerocopy_from_iter() has been used out of TX but also RX path. For the strparser skb-based RX path, we've left the zerocopy_from_iter() in decrypt_internal() mostly untouched, meaning it has been moved into tls_setup_from_iter() with charging logic removed (as not used from RX). Given RX path is not based on sk_msg objects, we haven't pursued setting up a dummy sk_msg to call into sk_msg_zerocopy_from_iter(), but it could be an option to prusue in a later step. Joint work with John. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-10-15 12:23:19 -07:00
Vakul Garg	4e6d47206c	tls: Add support for inplace records encryption Presently, for non-zero copy case, separate pages are allocated for storing plaintext and encrypted text of records. These pages are stored in sg_plaintext_data and sg_encrypted_data scatterlists inside record structure. Further, sg_plaintext_data & sg_encrypted_data are passed to cryptoapis for record encryption. Allocating separate pages for plaintext and encrypted text is inefficient from both required memory and performance point of view. This patch adds support of inplace encryption of records. For non-zero copy case, we reuse the pages from sg_encrypted_data scatterlist to copy the application's plaintext data. For the movement of pages from sg_encrypted_data to sg_plaintext_data scatterlists, we introduce a new function move_to_plaintext_sg(). This function add pages into sg_plaintext_data from sg_encrypted_data scatterlists. tls_do_encryption() is modified to pass the same scatterlist as both source and destination into aead_request_set_crypt() if inplace crypto has been enabled. A new ariable 'inplace_crypto' has been introduced in record structure to signify whether the same scatterlist can be used. By default, the inplace_crypto is enabled in get_rec(). If zero-copy is used (i.e. plaintext data is not copied), inplace_crypto is set to '0'. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Reviewed-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-02 23:03:47 -07:00
Vakul Garg	80ece6a03a	tls: Remove redundant vars from tls record structure Structure 'tls_rec' contains sg_aead_in and sg_aead_out which point to a aad_space and then chain scatterlists sg_plaintext_data, sg_encrypted_data respectively. Rather than using chained scatterlists for plaintext and encrypted data in aead_req, it is efficient to store aad_space in sg_encrypted_data and sg_plaintext_data itself in the first index and get rid of sg_aead_in, sg_aead_in and further chaining. This requires increasing size of sg_encrypted_data & sg_plaintext_data arrarys by 1 to accommodate entry for aad_space. The code which uses sg_encrypted_data and sg_plaintext_data has been modified to skip first index as it points to aad_space. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-29 11:41:28 -07:00
Wei Yongjun	bf17b67198	net/tls: Make function get_rec() static Fixes the following sparse warning: net/tls/tls_sw.c:655:16: warning: symbol 'get_rec' was not declared. Should it be static? Fixes: `a42055e8d2` ("net/tls: Add support for async encryption of records for performance") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-28 10:25:11 -07:00
Vakul Garg	c774973e91	tls: Fixed a memory leak during socket close During socket close, if there is a open record with tx context, it needs to be be freed apart from freeing up plaintext and encrypted scatter lists. This patch frees up the open record if present in tx context. Also tls_free_both_sg() has been renamed to tls_free_open_rec() to indicate that the free record in tx context is being freed inside the function. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 10:46:59 -07:00
Vakul Garg	b85135b595	tls: Fix socket mem accounting error under async encryption Current async encryption implementation sometimes showed up socket memory accounting error during socket close. This results in kernel warning calltrace. The root cause of the problem is that socket var sk_forward_alloc gets corrupted due to access in sk_mem_charge() and sk_mem_uncharge() being invoked from multiple concurrent contexts in multicore processor. The apis sk_mem_charge() and sk_mem_uncharge() are called from functions alloc_plaintext_sg(), free_sg() etc. It is required that memory accounting apis are called under a socket lock. The plaintext sg data sent for encryption is freed using free_sg() in tls_encryption_done(). It is wrong to call free_sg() from this function. This is because this function may run in irq context. We cannot acquire socket lock in this function. We remove calling of function free_sg() for plaintext data from tls_encryption_done() and defer freeing up of plaintext data to the time when the record is picked up from tx_list and transmitted/freed. When tls_tx_records() gets called, socket is already locked and thus there is no concurrent access problem. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-25 10:43:01 -07:00
Vakul Garg	4128c0cfb1	tls: Fixed uninitialised vars warning In tls_sw_sendmsg() and tls_sw_sendpage(), it is possible that the uninitialised variable 'ret' gets passed to sk_stream_error(). So initialise local variable 'ret' to '0. The warnings were detected by 'smatch' tool. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-24 12:24:52 -07:00
Vakul Garg	9932a29ab1	net/tls: Fixed race condition in async encryption On processors with multi-engine crypto accelerators, it is possible that multiple records get encrypted in parallel and their encryption completion is notified to different cpus in multicore processor. This leads to the situation where tls_encrypt_done() starts executing in parallel on different cores. In current implementation, encrypted records are queued to tx_ready_list in tls_encrypt_done(). This requires addition to linked list 'tx_ready_list' to be protected. As tls_decrypt_done() could be executing in irq content, it is not possible to protect linked list addition operation using a lock. To fix the problem, we remove linked list addition operation from the irq context. We do tx_ready_list addition/removal operation from application context only and get rid of possible multiple access to the linked list. Before starting encryption on the record, we add it to the tail of tx_ready_list. To prevent tls_tx_records() from transmitting it, we mark the record with a new flag 'tx_ready' in 'struct tls_rec'. When record encryption gets completed, tls_encrypt_done() has to only update the 'tx_ready' flag to true & linked list add operation is not required. The changed logic brings some other side benefits. Since the records are always submitted in tls sequence number order for encryption, the tx_ready_list always remains sorted and addition of new records to it does not have to traverse the linked list. Lastly, we renamed tx_ready_list in 'struct tls_sw_context_tx' to 'tx_list'. This is because now, the some of the records at the tail are not ready to transmit. Fixes: `a42055e8d2` ("net/tls: Add support for async encryption") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-24 12:24:24 -07:00
Vakul Garg	a42055e8d2	net/tls: Add support for async encryption of records for performance In current implementation, tls records are encrypted & transmitted serially. Till the time the previously submitted user data is encrypted, the implementation waits and on finish starts transmitting the record. This approach of encrypt-one record at a time is inefficient when asynchronous crypto accelerators are used. For each record, there are overheads of interrupts, driver softIRQ scheduling etc. Also the crypto accelerator sits idle most of time while an encrypted record's pages are handed over to tcp stack for transmission. This patch enables encryption of multiple records in parallel when an async capable crypto accelerator is present in system. This is achieved by allowing the user space application to send more data using sendmsg() even while previously issued data is being processed by crypto accelerator. This requires returning the control back to user space application after submitting encryption request to accelerator. This also means that zero-copy mode of encryption cannot be used with async accelerator as we must be done with user space application buffer before returning from sendmsg(). There can be multiple records in flight to/from the accelerator. Each of the record is represented by 'struct tls_rec'. This is used to store the memory pages for the record. After the records are encrypted, they are added in a linked list called tx_ready_list which contains encrypted tls records sorted as per tls sequence number. The records from tx_ready_list are transmitted using a newly introduced function called tls_tx_records(). The tx_ready_list is polled for any record ready to be transmitted in sendmsg(), sendpage() after initiating encryption of new tls records. This achieves parallel encryption and transmission of records when async accelerator is present. There could be situation when crypto accelerator completes encryption later than polling of tx_ready_list by sendmsg()/sendpage(). Therefore we need a deferred work context to be able to transmit records from tx_ready_list. The deferred work context gets scheduled if applications are not sending much data through the socket. If the applications issue sendmsg()/sendpage() in quick succession, then the scheduling of tx_work_handler gets cancelled as the tx_ready_list would be polled from application's context itself. This saves scheduling overhead of deferred work. The patch also brings some side benefit. We are able to get rid of the concept of CLOSED record. This is because the records once closed are either encrypted and then placed into tx_ready_list or if encryption fails, the socket error is set. This simplifies the kernel tls sendpath. However since tls_device.c is still using macros, accessory functions for CLOSED records have been retained. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-21 19:17:34 -07:00
David S. Miller	e366fa4350	Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net Two new tls tests added in parallel in both net and net-next. Used Stephen Rothwell's linux-next resolution. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-18 09:33:27 -07:00
Daniel Borkmann	50c6b58a81	tls: fix currently broken MSG_PEEK behavior In kTLS MSG_PEEK behavior is currently failing, strace example: [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3 [pid 2430] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4 [pid 2430] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2430] listen(4, 10) = 0 [pid 2430] getsockname(4, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0 [pid 2430] connect(3, {sa_family=AF_INET, sin_port=htons(38855), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2430] setsockopt(3, SOL_TCP, 0x1f /* TCP_??? /, [7564404], 4) = 0 [pid 2430] setsockopt(3, 0x11a / SOL_?? /, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2430] accept(4, {sa_family=AF_INET, sin_port=htons(49636), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5 [pid 2430] setsockopt(5, SOL_TCP, 0x1f / TCP_??? /, [7564404], 4) = 0 [pid 2430] setsockopt(5, 0x11a / SOL_?? /, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2430] close(4) = 0 [pid 2430] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14 [pid 2430] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11 [pid 2430] recvfrom(5, "test_read_peektest_read_peektest"..., 64, MSG_PEEK, NULL, NULL) = 64 As can be seen from strace, there are two TLS records sent, i) 'test_read_peek' and ii) '_mult_recs\0' where we end up peeking 'test_read_peektest_read_peektest'. This is clearly wrong, and what happens is that given peek cannot call into tls_sw_advance_skb() to unpause strparser and proceed with the next skb, we end up looping over the current one, copying the 'test_read_peek' over and over into the user provided buffer. Here, we can only peek into the currently held skb (current, full TLS record) as otherwise we would end up having to hold all the original skb(s) (depending on the peek depth) in a separate queue when unpausing strparser to process next records, minimally intrusive is to return only up to the current record's size (which likely was what `c46234ebb4` ("tls: RX path for ktls") originally intended as well). Thus, after patch we properly peek the first record: [pid 2046] wait4(2075, <unfinished ...> [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3 [pid 2075] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 4 [pid 2075] bind(4, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2075] listen(4, 10) = 0 [pid 2075] getsockname(4, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, [16]) = 0 [pid 2075] connect(3, {sa_family=AF_INET, sin_port=htons(55115), sin_addr=inet_addr("0.0.0.0")}, 16) = 0 [pid 2075] setsockopt(3, SOL_TCP, 0x1f / TCP_??? /, [7564404], 4) = 0 [pid 2075] setsockopt(3, 0x11a / SOL_?? /, 1, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2075] accept(4, {sa_family=AF_INET, sin_port=htons(45732), sin_addr=inet_addr("127.0.0.1")}, [16]) = 5 [pid 2075] setsockopt(5, SOL_TCP, 0x1f / TCP_??? /, [7564404], 4) = 0 [pid 2075] setsockopt(5, 0x11a / SOL_?? */, 2, "\3\0033\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 40) = 0 [pid 2075] close(4) = 0 [pid 2075] sendto(3, "test_read_peek", 14, 0, NULL, 0) = 14 [pid 2075] sendto(3, "_mult_recs\0", 11, 0, NULL, 0) = 11 [pid 2075] recvfrom(5, "test_read_peek", 64, MSG_PEEK, NULL, NULL) = 14 Fixes: `c46234ebb4` ("tls: RX path for ktls") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-17 08:03:09 -07:00
John Fastabend	7a3dd8c897	tls: async support causes out-of-bounds access in crypto APIs When async support was added it needed to access the sk from the async callback to report errors up the stack. The patch tried to use space after the aead request struct by directly setting the reqsize field in aead_request. This is an internal field that should not be used outside the crypto APIs. It is used by the crypto code to define extra space for private structures used in the crypto context. Users of the API then use crypto_aead_reqsize() and add the returned amount of bytes to the end of the request memory allocation before posting the request to encrypt/decrypt APIs. So this breaks (with general protection fault and KASAN error, if enabled) because the request sent to decrypt is shorter than required causing the crypto API out-of-bounds errors. Also it seems unlikely the sk is even valid by the time it gets to the callback because of memset in crypto layer. Anyways, fix this by holding the sk in the skb->sk field when the callback is set up and because the skb is already passed through to the callback handler via void* we can access it in the handler. Then in the handler we need to be careful to NULL the pointer again before kfree_skb. I added comments on both the setup (in tls_do_decryption) and when we clear it from the crypto callback handler tls_decrypt_done(). After this selftests pass again and fixes KASAN errors/warnings. Fixes: `94524d8fc9` ("net/tls: Add support for async decryption of tls records") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Vakul Garg <Vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-17 08:01:36 -07:00
Sabrina Dubroca	c844eb46b7	tls: clear key material from kernel memory when do_tls_setsockopt_conf fails Fixes: `3c4d755915` ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-13 12:03:47 -07:00
Sabrina Dubroca	86029d10af	tls: zero the crypto information from tls_context before freeing This contains key material in crypto_send_aes_gcm_128 and crypto_recv_aes_gcm_128. Introduce union tls_crypto_context, and replace the two identical unions directly embedded in struct tls_context with it. We can then use this union to clean up the memory in the new tls_ctx_free() function. Fixes: `3c4d755915` ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-13 12:03:47 -07:00
Sabrina Dubroca	7cba09c6d5	tls: don't copy the key out of tls12_crypto_info_aes_gcm_128 There's no need to copy the key to an on-stack buffer before calling crypto_aead_setkey(). Fixes: `3c4d755915` ("tls: kernel TLS support") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-13 12:03:47 -07:00
David S. Miller	aaf9253025	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2018-09-12 22:22:42 -07:00
Vakul Garg	150085791a	net/tls: Fixed return value when tls_complete_pending_work() fails In tls_sw_sendmsg() and tls_sw_sendpage(), the variable 'ret' has been set to return value of tls_complete_pending_work(). This allows return of proper error code if tls_complete_pending_work() fails. Fixes: `3c4d755915` ("tls: kernel TLS support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-12 00:04:28 -07:00
Vakul Garg	52ea992cfa	net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC tls_sw_sendmsg() allocates plaintext and encrypted SG entries using function sk_alloc_sg(). In case the number of SG entries hit MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for current SG index to '0'. This leads to calling of function tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes kernel crash. To fix this, set the number of SG elements to the number of elements in plaintext/encrypted SG arrays in case sk_alloc_sg() returns -ENOSPC. Fixes: `3c4d755915` ("tls: kernel TLS support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-09 08:10:01 -07:00
Vakul Garg	94524d8fc9	net/tls: Add support for async decryption of tls records When tls records are decrypted using asynchronous acclerators such as NXP CAAM engine, the crypto apis return -EINPROGRESS. Presently, on getting -EINPROGRESS, the tls record processing stops till the time the crypto accelerator finishes off and returns the result. This incurs a context switch and is not an efficient way of accessing the crypto accelerators. Crypto accelerators work efficient when they are queued with multiple crypto jobs without having to wait for the previous ones to complete. The patch submits multiple crypto requests without having to wait for for previous ones to complete. This has been implemented for records which are decrypted in zero-copy mode. At the end of recvmsg(), we wait for all the asynchronous decryption requests to complete. The references to records which have been sent for async decryption are dropped. For cases where record decryption is not possible in zero-copy mode, asynchronous decryption is not used and we wait for decryption crypto api to complete. For crypto requests executing in async fashion, the memory for aead_request, sglists and skb etc is freed from the decryption completion handler. The decryption completion handler wakesup the sleeping user context when recvmsg() flags that it has done sending all the decryption requests and there are no more decryption requests pending to be completed. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Reviewed-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-09-01 19:52:50 -07:00
Doron Roberts-Kedes	0927f71dbc	net/tls: Calculate nsg for zerocopy path without skb_cow_data. decrypt_skb fails if the number of sg elements required to map it is greater than MAX_SKB_FRAGS. nsg must always be calculated, but skb_cow_data adds unnecessary memcpy's for the zerocopy case. The new function skb_nsg calculates the number of scatterlist elements required to map the skb without the extra overhead of skb_cow_data. This patch reduces memcpy by 50% on my encrypted NBD benchmarks. Reported-by: Vakul Garg <Vakul.garg@nxp.com> Reviewed-by: Vakul Garg <Vakul.garg@nxp.com> Tested-by: Vakul Garg <Vakul.garg@nxp.com> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-29 19:57:04 -07:00
John Fastabend	67db7cd249	tls: possible hang when do_tcp_sendpages hits sndbuf is full case Currently, the lower protocols sk_write_space handler is not called if TLS is sending a scatterlist via tls_push_sg. However, normally tls_push_sg calls do_tcp_sendpage, which may be under memory pressure, that in turn may trigger a wait via sk_wait_event. Typically, this happens when the in-flight bytes exceed the sdnbuf size. In the normal case when enough ACKs are received sk_write_space() will be called and the sk_wait_event will be woken up allowing it to send more data and/or return to the user. But, in the TLS case because the sk_write_space() handler does not wake up the events the above send will wait until the sndtimeo is exceeded. By default this is MAX_SCHEDULE_TIMEOUT so it look like a hang to the user (especially this impatient user). To fix this pass the sk_write_space event to the lower layers sk_write_space event which in the TCP case will wake any pending events. I observed the above while integrating sockmap and ktls. It initially appeared as test_sockmap (modified to use ktls) occasionally hanging. To reliably reproduce this reduce the sndbuf size and stress the tls layer by sending many 1B sends. This results in every byte needing a header and each byte individually being sent to the crypto layer. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-08-22 21:57:14 +02:00
David S. Miller	6e3bf9b04f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2018-08-18 The following pull-request contains BPF updates for your net tree. The main changes are: 1) Fix a BPF selftest failure in test_cgroup_storage due to rlimit restrictions, from Yonghong. 2) Fix a suspicious RCU rcu_dereference_check() warning triggered from removing a device's XDP memory allocator by using the correct rhashtable lookup function, from Tariq. 3) A batch of BPF sockmap and ULP fixes mainly fixing leaks and races as well as enforcing module aliases for ULPs. Another fix for BPF map redirect to make them work again with tail calls, from Daniel. 4) Fix XDP BPF samples to unload their programs upon SIGTERM, from Jesper. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-18 10:02:49 -07:00
Daniel Borkmann	037b0b86ec	tcp, ulp: add alias for all ulp modules Lets not turn the TCP ULP lookup into an arbitrary module loader as we only intend to load ULP modules through this mechanism, not other unrelated kernel modules: [root@bar]# cat foo.c #include <sys/types.h> #include <sys/socket.h> #include <linux/tcp.h> #include <linux/in.h> int main(void) { int sock = socket(PF_INET, SOCK_STREAM, 0); setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp")); return 0; } [root@bar]# gcc foo.c -O2 -Wall [root@bar]# lsmod \| grep sctp [root@bar]# ./a.out [root@bar]# lsmod \| grep sctp sctp 1077248 4 libcrc32c 16384 3 nf_conntrack,nf_nat,sctp [root@bar]# Fix it by adding module alias to TCP ULP modules, so probing module via request_module() will be limited to tcp-ulp-[name]. The existing modules like kTLS will load fine given tcp-ulp-tls alias, but others will fail to load: [root@bar]# lsmod \| grep sctp [root@bar]# ./a.out [root@bar]# lsmod \| grep sctp [root@bar]# Sockmap is not affected from this since it's either built-in or not. Fixes: `734942cc4e` ("tcp: ULP infrastructure") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-08-16 14:58:07 -07:00
Linus Torvalds	dafa5f6577	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Fix dcache flushing crash in skcipher. - Add hash finup self-tests. - Reschedule during speed tests. Algorithms: - Remove insecure vmac and replace it with vmac64. - Add public key verification for DH/ECDH. Drivers: - Decrease priority of sha-mb on x86. - Improve NEON latency/throughput on ARM64. - Add md5/sha384/sha512/des/3des to inside-secure. - Support eip197d in inside-secure. - Only register algorithms supported by the host in virtio. - Add cts and remove incompatible cts1 from ccree. - Add hisilicon SEC security accelerator driver. - Replace msm hwrng driver with qcom pseudo rng driver. Misc: - Centralize CRC polynomials" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (121 commits) crypto: arm64/ghash-ce - implement 4-way aggregation crypto: arm64/ghash-ce - replace NEON yield check with block limit crypto: hisilicon - sec_send_request() can be static lib/mpi: remove redundant variable esign crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable crypto: arm64/aes-ce-gcm - implement 2-way aggregation crypto: arm64/aes-ce-gcm - operate on two input blocks at a time crypto: dh - make crypto_dh_encode_key() make robust crypto: dh - fix calculating encoded key size crypto: ccp - Check for NULL PSP pointer at module unload crypto: arm/chacha20 - always use vrev for 16-bit rotates crypto: ccree - allow bigger than sector XTS op crypto: ccree - zero all of request ctx before use crypto: ccree - remove cipher ivgen left overs crypto: ccree - drop useless type flag during reg crypto: ablkcipher - fix crash flushing dcache in error path crypto: blkcipher - fix crash flushing dcache in error path crypto: skcipher - fix crash flushing dcache in error path crypto: skcipher - remove unnecessary setting of walk->nbytes crypto: scatterwalk - remove scatterwalk_samebuf() ...	2018-08-15 16:01:47 -07:00
Vakul Garg	0b243d004e	net/tls: Combined memory allocation for decryption request For preparing decryption request, several memory chunks are required (aead_req, sgin, sgout, iv, aad). For submitting the decrypt request to an accelerator, it is required that the buffers which are read by the accelerator must be dma-able and not come from stack. The buffers for aad and iv can be separately kmalloced each, but it is inefficient. This patch does a combined allocation for preparing decryption request and then segments into aead_req \|\| sgin \|\| sgout \|\| iv \|\| aad. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-13 08:41:09 -07:00
Vakul Garg	cfb4099fb4	net/tls: Mark the end in scatterlist table Function zerocopy_from_iter() unmarks the 'end' in input sgtable while adding new entries in it. The last entry in sgtable remained unmarked. This results in KASAN error report on using apis like sg_nents(). Before returning, the function needs to mark the 'end' in the last entry it adds. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-05 17:13:58 -07:00
Eric Biggers	8c30fbe63e	crypto: scatterwalk - remove 'chain' argument from scatterwalk_crypto_chain() All callers pass chain=0 to scatterwalk_crypto_chain(). Remove this unneeded parameter. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-08-03 18:06:03 +08:00
zhong jiang	969d509003	net/tls: Use kmemdup to simplify the code Kmemdup is better than kmalloc+memcpy. So replace them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-08-01 09:47:47 -07:00
Vakul Garg	ad13acce8d	net/tls: Use socket data_ready callback on record availability On receipt of a complete tls record, use socket's saved data_ready callback instead of state_change callback. In function tls_queue(), the TLS record is queued in encrypted state. But the decryption happen inline when tls_sw_recvmsg() or tls_sw_splice_read() get invoked. So it should be ok to notify the waiting context about the availability of data as soon as we could collect a full TLS record. For new data availability notification, sk_data_ready callback is more appropriate. It points to sock_def_readable() which wakes up specifically for EPOLLIN event. This is in contrast to the socket callback sk_state_change which points to sock_def_wakeup() which issues a wakeup unconditionally (without event mask). Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-30 09:41:41 -07:00
Doron Roberts-Kedes	2da19ed3e4	tls: Fix improper revert in zerocopy_from_iter The current code is problematic because the iov_iter is reverted and never advanced in the non-error case. This patch skips the revert in the non-error case. This patch also fixes the amount by which the iov_iter is reverted. Currently, iov_iter is reverted by size, which can be greater than the amount by which the iter was actually advanced. Instead, only revert by the amount that the iter was advanced. Fixes: `4718799817` ("tls: Fix zerocopy_from_iter iov handling") Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-28 22:53:31 -07:00
Doron Roberts-Kedes	5a3611efe5	tls: Remove dead code in tls_sw_sendmsg tls_push_record either returns 0 on success or a negative value on failure. This patch removes code that would only be executed if tls_push_record were to return a positive value. Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-28 22:53:31 -07:00
Doron Roberts-Kedes	0a26cf3ff4	tls: Skip zerocopy path for ITER_KVEC The zerocopy path ultimately calls iov_iter_get_pages, which defines the step function for ITER_KVECs as simply, return -EFAULT. Taking the non-zerocopy path for ITER_KVECs avoids the unnecessary fallback. See https://lore.kernel.org/lkml/20150401023311.GL29656@ZenIV.linux.org.uk/T/#u for a discussion of why zerocopy for vmalloc data is not a good idea. Discovered while testing NBD traffic encrypted with ktls. Fixes: `c46234ebb4` ("tls: RX path for ktls") Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-26 14:11:34 -07:00
Vakul Garg	201876b33c	net/tls: Removed redundant checks for non-NULL Removed checks against non-NULL before calling kfree_skb() and crypto_free_aead(). These functions are safe to be called with NULL as an argument. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-26 14:01:55 -07:00
David S. Miller	19725496da	Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net	2018-07-24 19:21:58 -07:00
David S. Miller	c4c5551df1	Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux All conflicts were trivial overlapping changes, so reasonably easy to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-20 21:17:12 -07:00
Doron Roberts-Kedes	fcf4793e27	tls: check RCV_SHUTDOWN in tls_wait_data The current code does not check sk->sk_shutdown & RCV_SHUTDOWN. tls_sw_recvmsg may return a positive value in the case where bytes have already been copied when the socket is shutdown. sk->sk_err has been cleared, causing the tls_wait_data to hang forever on a subsequent invocation. Checking sk->sk_shutdown & RCV_SHUTDOWN, as in tcp_recvmsg, fixes this problem. Fixes: `c46234ebb4` ("tls: RX path for ktls") Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-20 14:38:14 -07:00
Gustavo A. R. Silva	eecd685770	tls: Fix copy-paste error in tls_device_reencrypt It seems that the proper structure to use in this particular case is skb_iter instead of skb. Addresses-Coverity-ID: 1471906 ("Copy-paste error") Fixes: `4799ac81e5` ("tls: Add rx inline crypto offload") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-20 12:12:45 -07:00
Dave Watson	32da12216e	tls: Stricter error checking in zerocopy sendmsg path In the zerocopy sendmsg() path, there are error checks to revert the zerocopy if we get any error code. syzkaller has discovered that tls_push_record can return -ECONNRESET, which is fatal, and happens after the point at which it is safe to revert the iter, as we've already passed the memory to do_tcp_sendpages. Previously this code could return -ENOMEM and we would want to revert the iter, but AFAIK this no longer returns ENOMEM after `a447da7d00` ("tls: fix waitall behavior in tls_sw_recvmsg"), so we fail for all error codes. Reported-by: syzbot+c226690f7b3126c5ee04@syzkaller.appspotmail.com Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com Signed-off-by: Dave Watson <davejwatson@fb.com> Fixes: `3c4d755915` ("tls: kernel TLS support") Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 13:31:31 -07:00
Boris Pismenny	4718799817	tls: Fix zerocopy_from_iter iov handling zerocopy_from_iter iterates over the message, but it doesn't revert the updates made by the iov iteration. This patch fixes it. Now, the iov can be used after calling zerocopy_from_iter. Fixes: `3c4d75591` ("tls: kernel TLS support") Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	4799ac81e5	tls: Add rx inline crypto offload This patch completes the generic infrastructure to offload TLS crypto to a network device. It enables the kernel to skip decryption and authentication of some skbs marked as decrypted by the NIC. In the fast path, all packets received are decrypted by the NIC and the performance is comparable to plain TCP. This infrastructure doesn't require a TCP offload engine. Instead, the NIC only decrypts packets that contain the expected TCP sequence number. Out-Of-Order TCP packets are provided unmodified. As a result, at the worst case a received TLS record consists of both plaintext and ciphertext packets. These partially decrypted records must be reencrypted, only to be decrypted. The notable differences between SW KTLS Rx and this offload are as follows: 1. Partial decryption - Software must handle the case of a TLS record that was only partially decrypted by HW. This can happen due to packet reordering. 2. Resynchronization - tls_read_size calls the device driver to resynchronize HW after HW lost track of TLS record framing in the TCP stream. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	b190a587c6	tls: Fill software context without allocation This patch allows tls_set_sw_offload to fill the context in case it was already allocated previously. We will use it in TLS_DEVICE to fill the RX software context. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	39f56e1a78	tls: Split tls_sw_release_resources_rx This patch splits tls_sw_release_resources_rx into two functions one which releases all inner software tls structures and another that also frees the containing structure. In TLS_DEVICE we will need to release the software structures without freeeing the containing structure, which contains other information. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:11 -07:00
Boris Pismenny	dafb67f3bb	tls: Split decrypt_skb to two functions Previously, decrypt_skb also updated the TLS context. Now, decrypt_skb only decrypts the payload using the current context, while decrypt_skb_update also updates the state. Later, in the tls_device Rx flow, we will use decrypt_skb directly. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:13:10 -07:00
Boris Pismenny	d80a1b9d18	tls: Refactor tls_offload variable names For symmetry, we rename tls_offload_context to tls_offload_context_tx before we add tls_offload_context_rx. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-16 00:12:09 -07:00
Vakul Garg	d2bdd26812	net/tls: Use aead_request_alloc/free for request alloc/free Instead of kzalloc/free for aead_request allocation and free, use functions aead_request_alloc(), aead_request_free(). It ensures that any sensitive crypto material held in crypto transforms is securely erased from memory. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:44:11 -07:00
Doron Roberts-Kedes	52ee6ef36e	tls: fix skb_to_sgvec returning unhandled error. The current code does not inspect the return value of skb_to_sgvec. This can cause a nullptr kernel panic when the malformed sgvec is passed into the crypto request. Checking the return value of skb_to_sgvec and skipping decryption if it is negative fixes this problem. Fixes: `c46234ebb4` ("tls: RX path for ktls") Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-03 23:26:47 +09:00
David S. Miller	5cd3da4ba2	Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net Simple overlapping changes in stmmac driver. Adjust skb_gro_flush_final_remcsum function signature to make GRO list changes in net-next, as per Stephen Rothwell's example merge resolution. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-03 10:29:26 +09:00
Linus Torvalds	a11e1d432b	Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL The poll() changes were not well thought out, and completely unexplained. They also caused a huge performance regression, because "->poll()" was no longer a trivial file operation that just called down to the underlying file operations, but instead did at least two indirect calls. Indirect calls are sadly slow now with the Spectre mitigation, but the performance problem could at least be largely mitigated by changing the "->get_poll_head()" operation to just have a per-file-descriptor pointer to the poll head instead. That gets rid of one of the new indirections. But that doesn't fix the new complexity that is completely unwarranted for the regular case. The (undocumented) reason for the poll() changes was some alleged AIO poll race fixing, but we don't make the common case slower and more complex for some uncommon special case, so this all really needs way more explanations and most likely a fundamental redesign. [ This revert is a revert of about 30 different commits, not reverted individually because that would just be unnecessarily messy - Linus ] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-06-28 10:40:47 -07:00
Kees Cook	3463e51dc3	net/tls: Remove VLA usage on nonce It looks like the prior VLA removal, commit `b16520f749` ("net/tls: Remove VLA usage"), and a new VLA addition, commit `c46234ebb4` ("tls: RX path for ktls"), passed in the night. This removes the newly added VLA, which happens to have its bounds based on the same max value. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-27 10:39:52 +09:00
Vakul Garg	0ef8b4567d	tls: Removed unused variable Removed unused variable 'rxm' from tls_queue(). Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-24 23:54:18 +09:00
Daniel Borkmann	06030dbaf3	tls: fix waitall behavior in tls_sw_recvmsg Current behavior in tls_sw_recvmsg() is to wait for incoming tls messages and copy up to exactly len bytes of data that the user provided. This is problematic in the sense that i) if no packet is currently queued in strparser we keep waiting until one has been processed and pushed into tls receive layer for tls_wait_data() to wake up and push the decrypted bits to user space. Given after tls decryption, we're back at streaming data, use sock_rcvlowat() hint from tcp socket instead. Retain current behavior with MSG_WAITALL flag and otherwise use the hint target for breaking the loop and returning to application. This is done if currently no ctx->recv_pkt is ready, otherwise continue to process it from our strparser backlog. Fixes: `c46234ebb4` ("tls: RX path for ktls") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-15 09:14:30 -07:00
Daniel Borkmann	a447da7d00	tls: fix use-after-free in tls_push_record syzkaller managed to trigger a use-after-free in tls like the following: BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls] Write of size 1 at addr ffff88037aa08000 by task a.out/2317 CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144 Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016 Call Trace: dump_stack+0x71/0xab print_address_description+0x6a/0x280 kasan_report+0x258/0x380 ? tls_push_record.constprop.15+0x6a2/0x810 [tls] tls_push_record.constprop.15+0x6a2/0x810 [tls] tls_sw_push_pending_record+0x2e/0x40 [tls] tls_sk_proto_close+0x3fe/0x710 [tls] ? tcp_check_oom+0x4c0/0x4c0 ? tls_write_space+0x260/0x260 [tls] ? kmem_cache_free+0x88/0x1f0 inet_release+0xd6/0x1b0 __sock_release+0xc0/0x240 sock_close+0x11/0x20 __fput+0x22d/0x660 task_work_run+0x114/0x1a0 do_exit+0x71a/0x2780 ? mm_update_next_owner+0x650/0x650 ? handle_mm_fault+0x2f5/0x5f0 ? __do_page_fault+0x44f/0xa50 ? mm_fault_error+0x2d0/0x2d0 do_group_exit+0xde/0x300 __x64_sys_exit_group+0x3a/0x50 do_syscall_64+0x9a/0x300 ? page_fault+0x8/0x30 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This happened through fault injection where aead_req allocation in tls_do_encryption() eventually failed and we returned -ENOMEM from the function. Turns out that the use-after-free is triggered from tls_sw_sendmsg() in the second tls_push_record(). The error then triggers a jump to waiting for memory in sk_stream_wait_memory() resp. returning immediately in case of MSG_DONTWAIT. What follows is the trim_both_sgl(sk, orig_size), which drops elements from the sg list added via tls_sw_sendmsg(). Now the use-after-free gets triggered when the socket is being closed, where tls_sk_proto_close() callback is invoked. The tls_complete_pending_work() will figure that there's a pending closed tls record to be flushed and thus calls into the tls_push_pending_closed_record() from there. ctx->push_pending_record() is called from the latter, which is the tls_sw_push_pending_record() from sw path. This again calls into tls_push_record(). And here the tls_fill_prepend() will panic since the buffer address has been freed earlier via trim_both_sgl(). One way to fix it is to move the aead request allocation out of tls_do_encryption() early into tls_push_record(). This means we don't prep the tls header and advance state to the TLS_PENDING_CLOSED_RECORD before allocation which could potentially fail happened. That fixes the issue on my side. Fixes: `3c4d755915` ("tls: kernel TLS support") Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-06-15 09:14:30 -07:00

1 2 3 4 5 ...

296 Коммитов