Граф коммитов

665203 Коммитов

Автор SHA1 Сообщение Дата
Jens Axboe 339318080b blk-mq-sched: alloate reserved tags out of normal pool
At least one driver, mtip32xx, has a hard coded dependency on
the value of the reserved tag used for internal commands. While
that should really be fixed up, for now let's ensure that we just
bypass the scheduler tags an allocation marked as reserved. They
are used for house keeping or error handling, so we can safely
ignore them in the scheduler.

Tested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-27 07:45:46 -06:00
Ming Lei a4e84aae81 mtip32xx: use runtime tag to initialize command header
mtip32xx supposes that 'request_idx' passed to .init_request()
is tag of the request, and use that as request's tag to initialize
command header.

After MQ IO scheduler is in, request tag assigned isn't same with
the request index anymore, so cause strange hardware failure on
mtip32xx, even whole system panic is triggered.

This patch fixes the issue by initializing command header via
request's real tag.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-27 07:45:18 -06:00
Borislav Petkov f8d5549df2 EDAC, ghes: Do not enable it by default
Leave it to the user to decide whether to enable this or not. Otherwise,
platform-specific drivers won't initialize (currently, EDAC supports
only a single platform driver loaded).

Signed-off-by: Borislav Petkov <bp@suse.de>
2017-04-27 14:15:38 +02:00
Sudeep Holla cb710ab1d8 mailbox: handle empty message in tx_tick
We already check if the message is empty before calling the client
tx_done callback. Calling completion on a wait event is also invalid
if the message is empty.

This patch moves the existing empty message check earlier.

Fixes: 2b6d83e2b8 ("mailbox: Introduce framework for mailbox")
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2017-04-27 16:20:04 +05:30
Sudeep Holla cc6eeaa302 mailbox: skip complete wait event if timer expired
If a wait_for_completion_timeout() call returns due to a timeout,
complete() can get called after returning from the wait which is
incorrect and can cause subsequent transmissions on a channel to fail.
Since the wait_for_completion_timeout() sees the completion variable
is non-zero caused by the erroneous/spurious complete() call, and
it immediately returns without waiting for the time as expected by the
client.

This patch fixes the issue by skipping complete() call for the timer
expiry.

Fixes: 2b6d83e2b8 ("mailbox: Introduce framework for mailbox")
Reported-by: Alexey Klimov <alexey.klimov@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2017-04-27 16:20:04 +05:30
Sudeep Holla c61b781ee0 mailbox: always wait in mbox_send_message for blocking Tx mode
There exists a race when msg_submit return immediately as there was an
active request being processed which may have completed just before it's
checked again in mbox_send_message. This will result in return to the
caller without waiting in mbox_send_message even when it's blocking Tx.

This patch fixes the issue by waiting for the completion always if Tx
is in blocking mode.

Fixes: 2b6d83e2b8 ("mailbox: Introduce framework for mailbox")
Reported-by: Alexey Klimov <alexey.klimov@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Alexey Klimov <alexey.klimov@arm.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
2017-04-27 16:20:04 +05:30
Sabrina Dubroca cfcf99f987 xfrm: fix GRO for !CONFIG_NETFILTER
In xfrm_input() when called from GRO, async == 0, and we end up
skipping the processing in xfrm4_transport_finish(). GRO path will
always skip the NF_HOOK, so we don't need the special-case for
!NETFILTER during GRO processing.

Fixes: 7785bba299 ("esp: Add a software GRO codepath")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-27 12:20:19 +02:00
Frederic Weisbecker 25e2d8c1b9 sched/cputime: Fix ksoftirqd cputime accounting regression
irq_time_read() returns the irqtime minus the ksoftirqd time. This
is necessary because irq_time_read() is used to substract the IRQ time
from the sum_exec_runtime of a task. If we were to include the softirq
time of ksoftirqd, this task would substract its own CPU time everytime
it updates ksoftirqd->sum_exec_runtime which would therefore never
progress.

But this behaviour got broken by:

  a499a5a14d ("sched/cputime: Increment kcpustat directly on irqtime account")

... which now includes ksoftirqd softirq time in the time returned by
irq_time_read().

This has resulted in wrong ksoftirqd cputime reported to userspace
through /proc/stat and thus "top" not showing ksoftirqd when it should
after intense networking load.

ksoftirqd->stime happens to be correct but it gets scaled down by
sum_exec_runtime through task_cputime_adjusted().

To fix this, just account the strict IRQ time in a separate counter and
use it to report the IRQ time.

Reported-and-tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1493129448-5356-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-27 09:08:26 +02:00
Christoph Hellwig 7569b90a22 nvme-scsi: remove nvme_trans_security_protocol
This function just returns the same error code and sense data as
the default statement in the switch in the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
2017-04-27 08:39:32 +02:00
Dmitry V. Levin 1741937d47 uapi: change the type of struct statx_timestamp.tv_nsec to unsigned
The comment asserting that the value of struct statx_timestamp.tv_nsec
must be negative when statx_timestamp.tv_sec is negative, is wrong, as
could be seen from the following example:

	#define _FILE_OFFSET_BITS 64
	#include <assert.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <asm/unistd.h>
	#include <linux/stat.h>

	int main(void)
	{
		static const struct timespec ts[2] = {
			{ .tv_nsec = UTIME_OMIT },
			{ .tv_sec = -2, .tv_nsec = 42 }
		};
		assert(utimensat(AT_FDCWD, ".", ts, 0) == 0);

		struct stat st;
		assert(stat(".", &st) == 0);
		printf("st_mtim.tv_sec = %lld, st_mtim.tv_nsec = %lu\n",
		       (long long) st.st_mtim.tv_sec,
		       (unsigned long) st.st_mtim.tv_nsec);

		struct statx stx;
		assert(syscall(__NR_statx, AT_FDCWD, ".", 0, 0, &stx) == 0);
		printf("stx_mtime.tv_sec = %lld, stx_mtime.tv_nsec = %lu\n",
		       (long long) stx.stx_mtime.tv_sec,
		       (unsigned long) stx.stx_mtime.tv_nsec);

		return 0;
	}

It expectedly prints:
st_mtim.tv_sec = -2, st_mtim.tv_nsec = 42
stx_mtime.tv_sec = -2, stx_mtime.tv_nsec = 42

The more generic comment asserting that the value of struct
statx_timestamp.tv_nsec might be negative is confusing to say the least.

It contradicts both the struct stat.st_[acm]time_nsec tradition and
struct timespec.tv_nsec requirements in utimensat syscall.
If statx syscall ever returns a stx_[acm]time containing a negative
tv_nsec that cannot be passed unmodified to utimensat syscall,
it will cause an immense confusion.

Fix this source of confusion by changing the type of struct
statx_timestamp.tv_nsec from __s32 to __u32.

Fixes: a528d35e8b ("statx: Add a system call to make enhanced file info available")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-api@vger.kernel.org
cc: mtk.manpages@gmail.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 21:19:05 -04:00
Linus Torvalds f83246089c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
 "I didn't want the release to go out without the statx system call
  properly hooked up"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Update syscall tables.
  sparc64: Fill in rest of HAVE_REGS_AND_STACK_ACCESS_API
2017-04-26 15:10:45 -07:00
David Howells 1e2f82d1e9 statx: Kill fd-with-NULL-path support in favour of AT_EMPTY_PATH
With the new statx() syscall, the following both allow the attributes of
the file attached to a file descriptor to be retrieved:

	statx(dfd, NULL, 0, ...);

and:

	statx(dfd, "", AT_EMPTY_PATH, ...);

Change the code to reject the first option, though this means copying
the path and engaging pathwalk for the fstat() equivalent.  dfd can be a
non-directory provided path is "".

[ The timing of this isn't wonderful, but applying this now before we
  have statx() in any released kernel, before anybody starts using the
  NULL special case.    - Linus ]

Fixes: a528d35e8b ("statx: Add a system call to make enhanced file info available")
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Sandeen <sandeen@sandeen.net>
cc: fstests@vger.kernel.org
cc: linux-api@vger.kernel.org
cc: linux-man@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-26 15:05:47 -07:00
Bart Van Assche 0eebd005dd scsi: Implement blk_mq_ops.show_rq()
Show the SCSI CDB for pending SCSI commands in
/sys/kernel/debug/block/*/mq/*/dispatch and */rq_list. An example
of how SCSI commands are displayed by this code:

ffff8801703245c0 {.op=READ, .cmd_flags=META PRIO, .rq_flags=DONTPREP IO_STAT STATS, .tag=14, .internal_tag=-1, .cmd=Read(10) 28 00 2a 81 1b 30 00 00 08 00}

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: <linux-scsi@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 15:09:04 -06:00
Bart Van Assche 2836ee4b1a blk-mq: Add blk_mq_ops.show_rq()
This new callback function will be used in the next patch to show
more information about SCSI requests.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 15:09:04 -06:00
Bart Van Assche 8658dca8bd blk-mq: Show operation, cmd_flags and rq_flags names
Show the operation name, .cmd_flags and .rq_flags as names instead
of numbers.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 15:09:04 -06:00
Bart Van Assche fd07dc8185 blk-mq: Make blk_flags_show() callers append a newline character
This patch does not change any functionality but makes it possible
to produce a single line of output with multiple flag-to-name
translations.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 15:09:04 -06:00
Bart Van Assche 65ca1ca32c blk-mq: Move the "state" debugfs attribute one level down
Move the "state" attribute from the top level to the "mq" directory
as requested by Omar.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 15:09:04 -06:00
Bart Van Assche e869b5462f blk-mq: Unregister debugfs attributes earlier
We currently call blk_mq_free_queue() from blk_cleanup_queue()
before we unregister the debugfs attributes for that queue in
blk_release_queue(). This leaves a window open during which
accessing most of the mq debugfs attributes would cause a
use-after-free. Additionally, the "state" attribute allows
running the queue, which we should not do after the queue has
entered the "dead" state. Fix both cases by unregistering the
debugfs attributes before freeing queue resources starts.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 15:09:04 -06:00
Bart Van Assche f05d1ba787 blk-mq: Only unregister hctxs for which registration succeeded
Hctx unregistration involves calling kobject_del(). kobject_del()
must not be called if kobject_add() has not been called. Hence in
the error path only unregister hctxs for which registration succeeded.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 15:09:04 -06:00
Bart Van Assche 62d6c9496a blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
Since the blk_mq_debugfs_*register_hctxs() functions register and
unregister all attributes under the "mq" directory, rename these
into blk_mq_debugfs_*register_mq().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 15:09:04 -06:00
Bart Van Assche 4c9e4019f1 blk-mq: Let blk_mq_debugfs_register() look up the queue name
A later patch will move the call of blk_mq_debugfs_register() to
a function to which the queue name is not passed as an argument.
To avoid having to add a 'name' argument to multiple callers, let
blk_mq_debugfs_register() look up the queue name.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 15:09:04 -06:00
Bart Van Assche 2d0364c8c1 blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
A later patch in this series will modify blk_mq_debugfs_register()
such that it uses q->kobj.parent to determine the name of a
request queue. Hence make sure that that pointer is initialized
before blk_mq_debugfs_register() is called. To avoid lock inversion,
protect sysfs / debugfs registration with the queue sysfs_lock
instead of the global mutex all_q_mutex.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 15:09:04 -06:00
Linus Torvalds fc08b197bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) MLX5 bug fixes from Saeed Mahameed et al:
     - released wrong resources when firmware timeout happens
     - fix wrong check for encapsulation size limits
     - UAR memory leak
     - ETHTOOL_GRXCLSRLALL failed to fill in info->data

 2) Don't cache l3mdev on mis-matches local route, causes net devices to
    leak refs. From Robert Shearman.

 3) Handle fragmented SKBs properly in macsec driver, the problem is
    that we were mis-sizing the sgvec table. From Jason A. Donenfeld.

 4) We cannot have checksum offload enabled for inner UDP tunneled
    packet during IPSEC, from Ansis Atteka.

 5) Fix double SKB free in ravb driver, from Dan Carpenter.

 6) Fix CPU port handling in b53 DSA driver, from Florian Dainelli.

 7) Don't use on-stack buffers for usb_control_msg() in CAN usb driver,
    from Maksim Salau.

 8) Fix device leak in macvlan driver, from Herbert Xu. We have to purge
    the broadcast queue properly on port destroy.

 9) Fix tx ring entry limit on EF10 devices in sfc driver. From Bert
    Kenward.

10) Fix memory leaks in team driver, from Pan Bian.

11) Don't setup ipv6_stub before it can be actually used, from Paolo
    Abeni.

12) Fix tipc socket flow control accounting, from Parthasarathy
    Bhuvaragan.

13) Fix crash on module unload in hso driver, from Andreas Kemnade.

14) Fix purging of bridge multicast entries, the problem is that if we
    don't defer it to ndo_uninit it's possible for new entries to get
    added after we purge. Fix from Xin Long.

15) Don't return garbage for PACKET_HDRLEN getsockopt, from Alexander
    Potapenko.

16) Fix autoneg stall properly in PHY layer, and revert micrel driver
    change that was papering over it. From Alexander Kochetkov.

17) Don't dereference an ipv4 route as an ipv6 one in the ip6_tunnnel
    code, from Cong Wang.

18) Clear out the congestion control private of the TCP socket in all of
    the right places, from Wei Wang.

19) rawv6_ioctl measures SKB length incorrectly, fix from Jamie
    Bainbridge.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
  ipv6: check raw payload size correctly in ioctl
  tcp: memset ca_priv data to 0 properly
  ipv6: check skb->protocol before lookup for nexthop
  net: core: Prevent from dereferencing null pointer when releasing SKB
  macsec: dynamically allocate space for sglist
  Revert "phy: micrel: Disable auto negotiation on startup"
  net: phy: fix auto-negotiation stall due to unavailable interrupt
  net/packet: check length in getsockopt() called with PACKET_HDRLEN
  net: ipv6: regenerate host route if moved to gc list
  bridge: move bridge multicast cleanup to ndo_uninit
  ipv6: fix source routing
  qed: Fix error in the dcbx app meta data initialization.
  netvsc: fix calculation of available send sections
  net: hso: fix module unloading
  tipc: fix socket flow control accounting error at tipc_recv_stream
  tipc: fix socket flow control accounting error at tipc_send_stream
  ipv6: move stub initialization after ipv6 setup completion
  team: fix memory leaks
  sfc: tx ring can only have 2048 entries for all EF10 NICs
  macvlan: Fix device ref leak when purging bc_queue
  ...
2017-04-26 13:42:32 -07:00
Jamie Bainbridge 105f5528b9 ipv6: check raw payload size correctly in ioctl
In situations where an skb is paged, the transport header pointer and
tail pointer can be the same because the skb contents are in frags.

This results in ioctl(SIOCINQ/FIONREAD) incorrectly returning a
length of 0 when the length to receive is actually greater than zero.

skb->len is already correctly set in ip6_input_finish() with
pskb_pull(), so use skb->len as it always returns the correct result
for both linear and paged data.

Signed-off-by: Jamie Bainbridge <jbainbri@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26 14:59:35 -04:00
Wei Wang c120144407 tcp: memset ca_priv data to 0 properly
Always zero out ca_priv data in tcp_assign_congestion_control() so that
ca_priv data is cleared out during socket creation.
Also always zero out ca_priv data in tcp_reinit_congestion_control() so
that when cc algorithm is changed, ca_priv data is cleared out as well.
We should still zero out ca_priv data even in TCP_CLOSE state because
user could call connect() on AF_UNSPEC to disconnect the socket and
leave it in TCP_CLOSE state and later call setsockopt() to switch cc
algorithm on this socket.

Fixes: 2b0a8c9ee ("tcp: add CDG congestion control")
Reported-by: Andrey Konovalov  <andreyknvl@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26 14:58:32 -04:00
WANG Cong 199ab00f3c ipv6: check skb->protocol before lookup for nexthop
Andrey reported a out-of-bound access in ip6_tnl_xmit(), this
is because we use an ipv4 dst in ip6_tnl_xmit() and cast an IPv4
neigh key as an IPv6 address:

        neigh = dst_neigh_lookup(skb_dst(skb),
                                 &ipv6_hdr(skb)->daddr);
        if (!neigh)
                goto tx_err_link_failure;

        addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
        addr_type = ipv6_addr_type(addr6);

        if (addr_type == IPV6_ADDR_ANY)
                addr6 = &ipv6_hdr(skb)->daddr;

        memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));

Also the network header of the skb at this point should be still IPv4
for 4in6 tunnels, we shold not just use it as IPv6 header.

This patch fixes it by checking if skb->protocol is ETH_P_IPV6: if it
is, we are safe to do the nexthop lookup using skb_dst() and
ipv6_hdr(skb)->daddr; if not (aka IPv4), we have no clue about which
dest address we can pick here, we have to rely on callers to fill it
from tunnel config, so just fall to ip6_route_output() to make the
decision.

Fixes: ea3dc9601b ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26 14:51:26 -04:00
Myungho Jung 9899886d5e net: core: Prevent from dereferencing null pointer when releasing SKB
Added NULL check to make __dev_kfree_skb_irq consistent with kfree
family of functions.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289

Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26 14:47:14 -04:00
Jason A. Donenfeld 5294b83086 macsec: dynamically allocate space for sglist
We call skb_cow_data, which is good anyway to ensure we can actually
modify the skb as such (another error from prior). Now that we have the
number of fragments required, we can safely allocate exactly that amount
of memory.

Fixes: c09440f7dc ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26 14:41:53 -04:00
David S. Miller b43bd72835 Revert "phy: micrel: Disable auto negotiation on startup"
This reverts commit 99f81afc13.

It was papering over the real problem, which is fixed by commit
f555f34fdc ("net: phy: fix auto-negotiation stall due to unavailable
interrupt")

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26 14:34:07 -04:00
Alexander Kochetkov f555f34fdc net: phy: fix auto-negotiation stall due to unavailable interrupt
The Ethernet link on an interrupt driven PHY was not coming up if the Ethernet
cable was plugged before the Ethernet interface was brought up.

The patch trigger PHY state machine to update link state if PHY was requested to
do auto-negotiation and auto-negotiation complete flag already set.

During power-up cycle the PHY do auto-negotiation, generate interrupt and set
auto-negotiation complete flag. Interrupt is handled by PHY state machine but
doesn't update link state because PHY is in PHY_READY state. After some time
MAC bring up, start and request PHY to do auto-negotiation. If there are no new
settings to advertise genphy_config_aneg() doesn't start PHY auto-negotiation.
PHY continue to stay in auto-negotiation complete state and doesn't fire
interrupt. At the same time PHY state machine expect that PHY started
auto-negotiation and is waiting for interrupt from PHY and it won't get it.

Fixes: 321beec504 ("net: phy: Use interrupts when available in NOLINK state")
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
Cc: stable <stable@vger.kernel.org> # v4.9+
Tested-by: Roger Quadros <rogerq@ti.com>
Tested-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26 14:32:00 -04:00
Linus Torvalds ea3a8596a0 sound fixes for 4.11
Since we got a bonus week, let me try to screw a few pending fixes.
 
 A slightly large fix is the locking fix in ASoC STI driver, but it's
 pretty board-specific, and the risk is fairly low.
 
 All the rest are small / trivial fixes, mostly marked as stable, for
 ALSA sequencer core, ASoC topology, ASoC Intel bytcr and Firewire
 drivers.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCAAsFiEECxfAB4MH3rD5mfB6bDGAVD0pKaQFAlkAczcOHHRpd2FpQHN1
 c2UuZGUACgkQbDGAVD0pKaQP+Q//Z2yK7q80sbEpX6CVtTpX86Y4qNCPdO5v4p5Y
 25w/kR1P1lpGUTl2FYv+CRB6hjjaNc5egTJqPGk2zjKT1ZSOR+Y1KE+woaTJ9iFF
 fATKzQPaLFy+fB5w2egqB4WZnk0jx285QkDlcLPNIOrnaQwSvoc/Hlhxf+HmClhq
 U2PN9ihdd9sX1XeyhmueJx98hm4BQ+MhVX4V1W86qF+qOOXmZ70pYQS/ertSpCwj
 yf8na1wrWB0DChwdT44ZdHDhcnvPMqPI+Wg5R/kpJenPSdPjLht9hVkGz3Faj1TJ
 30aEHzEiJClnbrJJ5vIF+5e2IXbMeNs/otdal3QmDG/u6bKUbeqNRVI0aNvqRL+r
 RpwR01rDrxTuq/aL3Dn0vYoIwK49DvLLq74vovN1lySEvA6Mb0h7kH8PK6R8z9fC
 yxHHDbnujo7eqLuQiwOmPDUjlRmulRn5XdnxdQtvcik4PWWfBxDO8xQNS9baVwFZ
 lRwAsyuv6CAlLMGh5UbYQNnMj69KNd7HHmif4xtay2py64+NS92Q6Aba3ji50ww6
 x3PEaxPta581d/b5czaJWZw1xqH+19k0U0v9ssla0OyOwHs7UFdm4NTY7hv/J6VD
 MfG6XRfNEqUH/+OOoAGqYh2YywkDSj7KoyWgwX1E8RXMNnWuBQ8+I+LJH9fCtS2F
 0Y5xVFs=
 =TQDC
 -----END PGP SIGNATURE-----

Merge tag 'sound-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Since we got a bonus week, let me try to screw a few pending fixes.

  A slightly large fix is the locking fix in ASoC STI driver, but it's
  pretty board-specific, and the risk is fairly low.

  All the rest are small / trivial fixes, mostly marked as stable, for
  ALSA sequencer core, ASoC topology, ASoC Intel bytcr and Firewire
  drivers"

* tag 'sound-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ASoC: intel: Fix PM and non-atomic crash in bytcr drivers
  ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type
  ALSA: seq: Don't break snd_use_lock_sync() loop by timeout
  ASoC: topology: Fix to store enum text values
  ASoC: STI: Fix null ptr deference in IRQ handler
  ALSA: oxfw: fix regression to handle Stanton SCS.1m/1d
2017-04-26 09:30:33 -07:00
Al Viro 2fefc97b21 HAVE_ARCH_HARDENED_USERCOPY is unconditional now
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 12:11:06 -04:00
Al Viro 701cac61d0 CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
all architectures converted

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 12:11:01 -04:00
Al Viro eea86b637a Merge branches 'uaccess.alpha', 'uaccess.arc', 'uaccess.arm', 'uaccess.arm64', 'uaccess.avr32', 'uaccess.bfin', 'uaccess.c6x', 'uaccess.cris', 'uaccess.frv', 'uaccess.h8300', 'uaccess.hexagon', 'uaccess.ia64', 'uaccess.m32r', 'uaccess.m68k', 'uaccess.metag', 'uaccess.microblaze', 'uaccess.mips', 'uaccess.mn10300', 'uaccess.nios2', 'uaccess.openrisc', 'uaccess.parisc', 'uaccess.powerpc', 'uaccess.s390', 'uaccess.score', 'uaccess.sh', 'uaccess.sparc', 'uaccess.tile', 'uaccess.um', 'uaccess.unicore32', 'uaccess.x86' and 'uaccess.xtensa' into work.uaccess 2017-04-26 12:06:59 -04:00
Al Viro 9a677341cd m32r: switch to RAW_COPY_USER
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-26 12:05:56 -04:00
Christoph Hellwig 1608fd1ca3 ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
The caller only looks at the scsi_request result field anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 07:53:35 -06:00
Christoph Hellwig ce210e9223 ide-pm: always pass 0 error to __blk_end_request_all
ide_pm_execute_rq exectures a PM request synchronously, and in the failure
case where it calls __blk_end_request_all it never checks the error field
passed to the end_io callback, so don't bother setting it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 07:53:34 -06:00
Christoph Hellwig d188b90c48 scsi_transport_sas: always pass 0 error to blk_end_request_all
The SAS transport queues are only used by bsg, and bsg always looks at
the scsi_request results and never add the error passed in the end_io
callback.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-26 07:53:29 -06:00
Xin Long 35db069121 xfrm: do the garbage collection after flushing policy
Now xfrm garbage collection can be triggered by 'ip xfrm policy del'.
These is no reason not to do it after flushing policies, especially
considering that 'garbage collection deferred' is only triggered
when it reaches gc_thresh.

It's no good that the policy is gone but the xdst still hold there.
The worse thing is that xdst->route/orig_dst is also hold and can
not be released even if the orig_dst is already expired.

This patch is to do the garbage collection if there is any policy
removed in xfrm_policy_flush.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-04-26 10:34:32 +02:00
Andy Lutomirski dbd68d8e84 x86/mm: Fix flush_tlb_page() on Xen
flush_tlb_page() passes a bogus range to flush_tlb_others() and
expects the latter to fix it up.  native_flush_tlb_others() has the
fixup but Xen's version doesn't.  Move the fixup to
flush_tlb_others().

AFAICS the only real effect is that, without this fix, Xen would
flush everything instead of just the one page on remote vCPUs in
when flush_tlb_page() was called.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: e7b52ffd45 ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range")
Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 10:02:06 +02:00
Andy Lutomirski ce27374fab x86/mm: Make flush_tlb_mm_range() more predictable
I'm about to rewrite the function almost completely, but first I
want to get a functional change out of the way.  Currently, if
flush_tlb_mm_range() does not flush the local TLB at all, it will
never do individual page flushes on remote CPUs.  This seems to be
an accident, and preserving it will be awkward.  Let's change it
first so that any regressions in the rewrite will be easier to
bisect and so that the rewrite can attempt to change no visible
behavior at all.

The fix is simple: we can simply avoid short-circuiting the
calculation of base_pages_to_flush.

As a side effect, this also eliminates a potential corner case: if
tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
could have ended up flushing the entire address space one page at a
time.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 10:02:06 +02:00
Andy Lutomirski 29961b59a5 x86/mm: Remove flush_tlb() and flush_tlb_current_task()
I was trying to figure out what how flush_tlb_current_task() would
possibly work correctly if current->mm != current->active_mm, but I
realized I could spare myself the effort: it has no callers except
the unused flush_tlb() macro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 10:02:06 +02:00
Andy Lutomirski 9ccee2373f x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
mark_screen_rdonly() is the last remaining caller of flush_tlb().
flush_tlb_mm_range() is potentially faster and isn't obsolete.

Compile-tested only because I don't know whether software that uses
this mechanism even exists.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/791a644076fc3577ba7f7b7cafd643cc089baa7d.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 10:02:06 +02:00
Kirill A. Shutemov e6ab9c4d43 x86/mm/64: Fix crash in remove_pagetable()
remove_pagetable() does page walk using p*d_page_vaddr() plus cast.
It's not canonical approach -- we usually use p*d_offset() for that.

It works fine as long as all page table levels are present. We broke the
invariant by introducing folded p4d page table level.

As result, remove_pagetable() interprets PMD as PUD and it leads to
crash:

	BUG: unable to handle kernel paging request at ffff880300000000
	IP: memchr_inv+0x60/0x110
	PGD 317d067
	P4D 317d067
	PUD 3180067
	PMD 33f102067
	PTE 8000000300000060

Let's fix this by using p*d_offset() instead of p*d_page_vaddr() for
page walk.

Reported-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Fixes: f2a6a70501 ("x86: Convert the rest of the code to support p4d_t")
Link: http://lkml.kernel.org/r/20170425092557.21852-1-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 08:26:43 +02:00
Josh Poimboeuf 262fa734a0 x86/unwind: Dump all stacks in unwind_dump()
Currently unwind_dump() dumps only the most recently accessed stack.
But it has a few issues.

In some cases, 'first_sp' can get out of sync with 'stack_info', causing
unwind_dump() to start from the wrong address, flood the printk buffer,
and eventually read a bad address.

In other cases, dumping only the most recently accessed stack doesn't
give enough data to diagnose the error.

Fix both issues by dumping *all* stacks involved in the trace, not just
the last one.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8b5e99f022 ("x86/unwind: Dump stack data on warnings")
Link: http://lkml.kernel.org/r/016d6a9810d7d1bfc87ef8c0e6ee041c6744c909.1493171120.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 08:19:05 +02:00
Josh Poimboeuf b0d50c7b5d x86/unwind: Silence more entry-code related warnings
Borislav Petkov reported the following unwinder warning:

  WARNING: kernel stack regs at ffffc9000024fea8 in udevadm:92 has bad 'bp' value 00007fffc4614d30
  unwind stack type:0 next_sp:          (null) mask:0x6 graph_idx:0
  ffffc9000024fea8: 000055a6100e9b38 (0x55a6100e9b38)
  ffffc9000024feb0: 000055a6100e9b35 (0x55a6100e9b35)
  ffffc9000024feb8: 000055a6100e9f68 (0x55a6100e9f68)
  ffffc9000024fec0: 000055a6100e9f50 (0x55a6100e9f50)
  ffffc9000024fec8: 00007fffc4614d30 (0x7fffc4614d30)
  ffffc9000024fed0: 000055a6100eaf50 (0x55a6100eaf50)
  ffffc9000024fed8: 0000000000000000 ...
  ffffc9000024fee0: 0000000000000100 (0x100)
  ffffc9000024fee8: ffff8801187df488 (0xffff8801187df488)
  ffffc9000024fef0: 00007ffffffff000 (0x7ffffffff000)
  ffffc9000024fef8: 0000000000000000 ...
  ffffc9000024ff10: ffffc9000024fe98 (0xffffc9000024fe98)
  ffffc9000024ff18: 00007fffc4614d00 (0x7fffc4614d00)
  ffffc9000024ff20: ffffffffffffff10 (0xffffffffffffff10)
  ffffc9000024ff28: ffffffff811c6c1f (SyS_newlstat+0xf/0x10)
  ffffc9000024ff30: 0000000000000010 (0x10)
  ffffc9000024ff38: 0000000000000296 (0x296)
  ffffc9000024ff40: ffffc9000024ff50 (0xffffc9000024ff50)
  ffffc9000024ff48: 0000000000000018 (0x18)
  ffffc9000024ff50: ffffffff816b2e6a (entry_SYSCALL_64_fastpath+0x18/0xa8)
  ...

It unwinded from an interrupt which came in right after entry code
called into a C syscall handler, before it had a chance to set up the
frame pointer, so regs->bp still had its user space value.

Add a check to silence warnings in such a case, where an interrupt
has occurred and regs->sp is almost at the end of the stack.

Reported-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: c32c47c68a ("x86/unwind: Warn on bad frame pointer")
Link: http://lkml.kernel.org/r/c695f0d0d4c2cfe6542b90e2d0520e11eb901eb5.1493171120.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-26 08:19:05 +02:00
Linus Torvalds ea839b4174 Last minute fixes for ARC
- Build error in Mellanox nps platform
 
  - addressing lack of saving FPU regs in releavnt configs
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIbBAABAgAGBQJY/7Y9AAoJEGnX8d3iisJerOIP+OIw4iInJEycs2A1lQxMTUAa
 DFny0uv1EuWD5cC7g1O3m/SGIeN14zt6WBo40VHgxiOw/VFDsk4fze/QyfA+FFj0
 Qr7c5UDQKsWCNUNwkAfUYqp5/HWM9cO+WlUr3JbKL5sThh0SOBhb09T+Z9BEh+Ns
 jMu4VKa1Jh8RuXPd0Wb86CEhkRalTySTbY3og6g4F+HYRPvvqHvigdJ3yPI/mh0E
 ToPbZcqCNqgpJ0vyHak+QXb1gSbMjlevmjmokJR9O49Ypb8jEvQjwkZE0I81N793
 ayEftdCL9zVQ3WCcGVrhviXOMuxvDcimNUrAQiDJBqBSGEjq9AVaejEi1DIOyT3X
 InN1SmZPKno64OuZSFVDF2IuwTAumNXjp886of7LNy76lnznov2VvTXiiJSG/2ZE
 QEyhIq++DWFns+1IerJfU7QjzHEmx3StibBgPYur6K1dN49iHsFBTgRGgKzEhd0L
 iUOQQajg0RoYBhrcS/HQXBQgJhu0Fpkx6csDTtmW66hgw95I6ZoPJshk8SvouRTg
 Lif/U/qOd86Bghwj0V5pw/504LQ9i8xfSZzidM+St8XvhEa/DwuTggVuoIzOB8AH
 6qOExsN75jwExjAR71JsI33jNNTdTQ7HaHaqB1DQJ+2y12QmttNXaPy/iLYqQNsN
 iLeZ6MKdCrXKr73t2Js=
 =wA0V
 -----END PGP SIGNATURE-----

Merge tag 'arc-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fix from Vineet Gupta:
 "Last minute fixes for ARC:

   - build error in Mellanox nps platform

   - addressing lack of saving FPU regs in releavnt configs"

* tag 'arc-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARCv2: entry: save Accumulator register pair (r58:59) if present
  ARC: [plat-eznps] Fix build error
2017-04-25 14:07:24 -07:00
J. Bruce Fields 13bf9fbff0 nfsd: stricter decoding of write-like NFSv2/v3 ops
The NFSv2/v3 code does not systematically check whether we decode past
the end of the buffer.  This generally appears to be harmless, but there
are a few places where we do arithmetic on the pointers involved and
don't account for the possibility that a length could be negative.  Add
checks to catch these.

Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25 16:36:23 -04:00
J. Bruce Fields db44bac41b nfsd4: minor NFSv2/v3 write decoding cleanup
Use a couple shortcuts that will simplify a following bugfix.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25 16:36:16 -04:00
J. Bruce Fields e6838a29ec nfsd: check for oversized NFSv2/v3 arguments
A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.

Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply.  This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE).  But a client that sends an incorrectly long reply
can violate those assumptions.  This was observed to cause crashes.

Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.

So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.

As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.

We may also consider rejecting calls that have any extra garbage
appended.  That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.

Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25 16:34:37 -04:00