WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Mike Galbraith	4036ac1567	sched: Fix clock_gettime(CLOCK_[PROCESS/THREAD]_CPUTIME_ID) monotonicity If a task has been dequeued, it has been accounted. Do not project cycles that may or may not ever be accounted to a dequeued task, as that may make clock_gettime() both inaccurate and non-monotonic. Protect update_rq_clock() from slight TSC skew while at it. Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: kosaki.motohiro@jp.fujitsu.com Cc: pjt@google.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1403588980.29711.11.camel@marge.simpson.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-05 11:17:30 +02:00
Ben Segall	c06f04c704	sched: Fix potential near-infinite distribute_cfs_runtime() loop distribute_cfs_runtime() intentionally only hands out enough runtime to bring each cfs_rq to 1 ns of runtime, expecting the cfs_rqs to then take the runtime they need only once they actually get to run. However, if they get to run sufficiently quickly, the period timer is still in distribute_cfs_runtime() and no runtime is available, causing them to throttle. Then distribute has to handle them again, and this can go on until distribute has handed out all of the runtime 1ns at a time, which takes far too long. Instead allow access to the same runtime that distribute is handing out, accepting that corner cases with very low quota may be able to spend the entire cfs_b->runtime during distribute_cfs_runtime, meaning that the runtime directly handed out by distribute_cfs_runtime was over quota. In addition, if a cfs_rq does manage to throttle like this, make sure the existing distribute_cfs_runtime no longer loops over it again. Signed-off-by: Ben Segall <bsegall@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140620222120.13814.21652.stgit@sword-of-the-dawn.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-05 11:17:29 +02:00
Viresh Kumar	541b82644d	sched/core: Fix formatting issues in sched_can_stop_tick() sched_can_stop_tick() is using 7 spaces instead of 8 spaces or a 'tab' at the beginning of few lines. Which doesn't align well with the Coding Guidelines. Also remove local variable 'rq' as it is used at only one place and we can directly use this_rq() instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: fweisbec@gmail.com Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/afb781733e4a9ffbced5eb9fd25cc0aa5c6ffd7a.1403596966.git.viresh.kumar@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-05 11:17:28 +02:00
Peter Zijlstra	a77353e5eb	irq_work: Remove BUG_ON in irq_work_run() Because of a collision with `8d056c48e4` ("CPU hotplug, smp: flush any pending IPI callbacks before CPU offline"), which ends up calling hotplug_cfd()->flush_smp_call_function_queue()->irq_work_run(), which is not from IRQ context. And since that already calls irq_work_run() from the hotplug path, remove our entire hotplug handling. Reported-by: Stephen Warren <swarren@wwwdotorg.org> Tested-by: Stephen Warren <swarren@wwwdotorg.org> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-busatzs2gvz4v62258agipuf@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-05 11:17:26 +02:00
Ingo Molnar	51da9830d7	Merge branch 'timers/nohz' into sched/core Merge these two, because upcoming patches will touch both areas. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-07-05 11:06:10 +02:00
Ingo Molnar	d490b3e2c2	Merge branch 'timers/nohz-irq-work-v7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/nohz Pull nohz updates from Frederic Weisbecker: " This set moves the nohz kick, used to notify a full dynticks CPU when events require tick rescheduling, out of the scheduler tick to a dedicated IPI. This debloats a bit the scheduler IPI from off-topic work that was abusing that scheduler fast path for its convenient asynchronous properties. Now the nohz kick uses irq-work for its own needs. Of course this implied quite some background infrastructure rework, including: * Clean up some irq-work internals * Implement remote irq-work * Implement nohz kick on top of remote irq-work * Move full dynticks timer enqueue notification to new kick * Move multi-task notification to new kick * Remove unecessary barriers on multi-task notification " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-18 18:47:03 +02:00
Hillf Danton	5d5e2b1bcb	sched: Fix CACHE_HOT_BUDY condition When computing cache hot, we should check if the migration dst cpu is idle, instead of the current cpu. Though they are same in normal balancing, that is false nowadays in nohz idle balancing at least. Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140607090452.4696E301D2@webmail.sinamail.sina.com.cn Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-18 18:29:59 +02:00
Rik van Riel	bb97fc3164	sched/numa: Always try to migrate to preferred node at task_numa_placement() time It is possible that at task_numa_placement() time, the task's numa_preferred_nid does not change, but the task is not actually running on the preferred node at the time. In that case, we still want to attempt migration to the preferred node. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: mgorman@suse.de Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140604163315.1dbc7b56@cuia.bos.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-18 18:29:58 +02:00
Rik van Riel	a43455a1d5	sched/numa: Ensure task_numa_migrate() checks the preferred node The first thing task_numa_migrate() does is check to see if there is CPU capacity available on the preferred node, in order to move the task there. However, if the preferred node is all busy, we would skip considering that node for tasks swaps in the subsequent loop. This prevents NUMA convergence of tasks on busy systems. However, swapping locations with a task on our preferred nid, when the preferred nid is busy, is perfectly fine. The fix is to also look for a CPU on our preferred nid when it is totally busy. This changes "perf bench numa mem -p 4 -t 20 -m -0 -P 1000" from not converging in 15 minutes on my 4 node system, to converging in 10-20 seconds. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: mgorman@suse.de Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140604160942.6969b101@cuia.bos.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-06-18 18:29:57 +02:00
Konstantin Khlebnikov	ebe06187bf	epoll: fix use-after-free in eventpoll_release_file This fixes use-after-free of epi->fllink.next inside list loop macro. This loop actually releases elements in the body. The list is rcu-protected but here we cannot hold rcu_read_lock because we need to lock mutex inside. The obvious solution is to use list_for_each_entry_safe(). RCU-ness isn't essential because nobody can change this list under us, it's final fput for this file. The bug was introduced by `ae10b2b4eb` ("epoll: optimize EPOLL_CTL_DEL using rcu") Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Stable <stable@vger.kernel.org> # 3.13+ Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-16 17:21:59 -10:00
Frederic Weisbecker	3882ec6439	nohz: Use IPI implicit full barrier against rq->nr_running r/w A full dynticks CPU is allowed to stop its tick when a single task runs. Meanwhile when a new task gets enqueued, the CPU must be notified so that it can restart its tick to maintain local fairness and other accounting details. This notification is performed by way of an IPI. Then when the target receives the IPI, we expect it to see the new value of rq->nr_running. Hence the following ordering scenario: CPU 0 CPU 1 write rq->running get IPI smp_wmb() smp_rmb() send IPI read rq->nr_running But Paul Mckenney says that nowadays IPIs imply a full barrier on all architectures. So we can safely remove this pair and rely on the implicit barriers that come along IPI send/receive. Lets just comment on this new assumption. Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2014-06-16 16:27:24 +02:00
Frederic Weisbecker	fd2ac4f4a6	nohz: Use nohz own full kick on 2nd task enqueue Now that we have a nohz full remote kick based on irq work, lets use it to notify a CPU that it's exiting single task mode. This unbloats a bit the scheduler IPI that the nohz code was abusing for its cool "callable anywhere/anytime" properties. Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2014-06-16 16:26:55 +02:00
Frederic Weisbecker	53c5fa16b4	nohz: Switch to nohz full remote kick on timer enqueue When a new timer is enqueued on a full dynticks target, that CPU must re-evaluate the next tick to handle the timer correctly. This is currently performed through the scheduler IPI. Meanwhile this happens at the cost of off-topic workarounds in that fast path to make it call irq_exit(). As we plan to remove this hack off the scheduler IPI, lets use the nohz full kick instead. Pretty much any IPI fits for that job as long at it calls irq_exit(). The nohz full kick just happens to be handy and readily available here. If it happens to be too much an overkill in the future, we can still turn that timer kick into an empty IPI. Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2014-06-16 16:26:55 +02:00
Frederic Weisbecker	3d36aebc2e	nohz: Support nohz full remote kick Remotely kicking a full nohz CPU in order to make it re-evaluate its next tick is currently implemented using the scheduler IPI. However this bloats a scheduler fast path with an off-topic feature. The scheduler tick was abused here for its cool "callable anywhere/anytime" properties. But now that the irq work subsystem can queue remote callbacks, it's a perfect fit to safely queue IPIs when interrupts are disabled without worrying about concurrent callers. So lets implement remote kick on top of irq work. This is going to be used when a new event requires the next tick to be recalculated: more than 1 task competing on the CPU, timer armed, ... Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2014-06-16 16:26:54 +02:00
Frederic Weisbecker	4788501606	irq_work: Implement remote queueing irq work currently only supports local callbacks. However its code is mostly ready to run remote callbacks and we have some potential user. The full nohz subsystem currently open codes its own remote irq work on top of the scheduler ipi when it wants a CPU to reevaluate its next tick. However this ad hoc solution bloats the scheduler IPI. Lets just extend the irq work subsystem to support remote queuing on top of the generic SMP IPI to handle this kind of user. This shouldn't add noticeable overhead. Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2014-06-16 16:26:54 +02:00
Frederic Weisbecker	b93e0b8fa8	irq_work: Split raised and lazy lists An irq work can be handled from two places: from the tick if the work carries the "lazy" flag and the tick is periodic, or from a self IPI. We merge all these works in a single list and we use some per cpu latch to avoid raising a self-IPI when one is already pending. Now we could do away with this ugly latch if only the list was only made of non-lazy works. Just enqueueing a work on the empty list would be enough to know if we need to raise an IPI or not. Also we are going to implement remote irq work queuing. Then the per CPU latch will need to become atomic in the global scope. That's too bad because, here as well, just enqueueing a work on an empty list of non-lazy works would be enough to know if we need to raise an IPI or not. So lets take a way out of this: split the works in two distinct lists, one for the works that can be handled by the next tick and another one for those handled by the IPI. Just checking if the latter is empty when we queue a new work is enough to know if we need to raise an IPI. Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2014-06-16 16:26:53 +02:00
Benjamin Herrenschmidt	68986c9f0f	Revert "offb: Add palette hack for little endian" This reverts commit `e1edf18b20`. This patch was a misguided attempt at fixing offb for LE ppc64 kernels on BE qemu but is just wrong ... it breaks real LE/LE setups, LE with real HW, and existing mixed endian systems that did the fight thing with the appropriate device-tree property. Bad reviewing on my part, sorry. The right fix is to either make qemu change its endian when the guest changes endian (working on that) or to use the existing foreign endian support. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.13+] ---	2014-06-16 19:45:45 +10:00
Linus Torvalds	7171511eae	Linux 3.16-rc1	2014-06-15 17:45:28 -10:00
Linus Torvalds	a9be22425e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix checksumming regressions, from Tom Herbert. 2) Undo unintentional permissions changes for SCTP rto_alpha and rto_beta sysfs knobs, from Denial Borkmann. 3) VXLAN, like other IP tunnels, should advertize it's encapsulation size using dev->needed_headroom instead of dev->hard_header_len. From Cong Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: sctp: fix permissions for rto_alpha and rto_beta knobs vxlan: Checksum fixes net: add skb_pop_rcv_encapsulation udp: call __skb_checksum_complete when doing full checksum net: Fix save software checksum complete net: Fix GSO constants to match NETIF flags udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup vxlan: use dev->needed_headroom instead of dev->hard_header_len MAINTAINERS: update cxgb4 maintainer	2014-06-15 16:37:03 -10:00
Linus Torvalds	dd1845af24	This pull request contains the second half the of the clk changes for 3.16. They are simply fixes and code refactoring for the OMAP clock drivers. The sunxi clock driver changes include splitting out the one mega-driver into several smaller pieces and adding support for the A31 SoC clocks. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJTnHqfAAoJEDqPOy9afJhJnI0P/1PvRHx7bmwNAD8b09pAVm2u xTmhiH+zfHcRtKivKCAxFQ4FlkS3v69RB9FC+s6FIgn984K3FjkHRW2zgqe3K2h3 7tj6EoT6XJ6szK4AWDy/GVqekRF9kyADexSiYI4rIRP0rnSswvBKHZ485OR06Fs+ Jls0EMbGOEzMyB/B+pDNnTOznZOSd+lZbBznSh1zG+8QHQEzXwxPRr+G0/jxneO/ rTqUvDRqGC709YIaa+oBCH5ez/wVwrU68u/CpmrLQIPdFfaWl7YhYy/ZicwwJprE Oi1AlQpRoBe1yYIz6oJ//+4D6b9Y/e6cqG4P37VhF6PiD9yDyN+ycEtGMqxNXjIa OMGlairEU6V43ZrP/wDWvX6NLP7LCEqOG/PSo8zjuoZ/G1kw2jo6firRI5TVR/bY uARHkBTUYQGjvwBU3QoLuHf+pOPAeBXfYVsi2n/b+HSueXkPQW+HdH4erktlahPh 2xkVhEDbMfCOeovOGcZhsQ8aDUIDUjZTJE7uU633DjsHY7P96OTRBHF8qirNpuOx 0GkAVOsFBU7wMt8tcO4it00i7z6PEKwqDIZBNQVq2F2DnOS9WTTcop7dmYPz95qp 8qTZIN++ROWaxok0H5SL7ER22GIJlTuGGynwPK5Aa/6v193rUW9pEZPlr7wYSf8u RwP/J6OfN9t/rKxCsFCj =9/Iv -----END PGP SIGNATURE----- Merge tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux Pull more clock framework updates from Mike Turquette: "This contains the second half the of the clk changes for 3.16. They are simply fixes and code refactoring for the OMAP clock drivers. The sunxi clock driver changes include splitting out the one mega-driver into several smaller pieces and adding support for the A31 SoC clocks" * tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux: (25 commits) clk: sunxi: document PRCM clock compatible strings clk: sunxi: add PRCM (Power/Reset/Clock Management) clks support clk: sun6i: Protect SDRAM gating bit clk: sun6i: Protect CPU clock clk: sunxi: Rework clock protection code clk: sunxi: Move the GMAC clock to a file of its own clk: sunxi: Move the 24M oscillator to a file of its own clk: sunxi: Remove calls to clk_put clk: sunxi: document new A31 USB clock compatible clk: sunxi: Implement A31 USB clock ARM: dts: OMAP5/DRA7: use omap5-mpu-dpll-clock capable of dealing with higher frequencies CLK: TI: dpll: support OMAP5 MPU DPLL that need special handling for higher frequencies ARM: OMAP5+: dpll: support Duty Cycle Correction(DCC) CLK: TI: clk-54xx: Set the rate for dpll_abe_m2x2_ck CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic) dt:/bindings: DRA7 ATL (Audio Tracking Logic) clock bindings ARM: dts: dra7xx-clocks: Correct name for atl clkin3 clock CLK: TI: gate: add composite interface clock to OMAP2 only build ARM: OMAP2: clock: add DT boot support for cpufreq_ck CLK: TI: OMAP2: add clock init support ...	2014-06-15 16:02:20 -10:00
Linus Torvalds	b55b390202	Merge git://git.infradead.org/users/willy/linux-nvme Pull NVMe update from Matthew Wilcox: "Mostly bugfixes again for the NVMe driver. I'd like to call out the exported tracepoint in the block layer; I believe Keith has cleared this with Jens. We've had a few reports from people who're really pounding on NVMe devices at scale, hence the timeout changes (and new module parameters), hotplug cpu deadlock, tracepoints, and minor performance tweaks" [ Jens hadn't seen that tracepoint thing, but is ok with it - it will end up going away when mq conversion happens ] * git://git.infradead.org/users/willy/linux-nvme: (22 commits) NVMe: Fix START_STOP_UNIT Scsi->NVMe translation. NVMe: Use Log Page constants in SCSI emulation NVMe: Define Log Page constants NVMe: Fix hot cpu notification dead lock NVMe: Rename io_timeout to nvme_io_timeout NVMe: Use last bytes of f/w rev SCSI Inquiry NVMe: Adhere to request queue block accounting enable/disable NVMe: Fix nvme get/put queue semantics NVMe: Delete NVME_GET_FEAT_TEMP_THRESH NVMe: Make admin timeout a module parameter NVMe: Make iod bio timeout a parameter NVMe: Prevent possible NULL pointer dereference NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD) NVMe: Update data structures for NVMe 1.2 NVMe: Enable BUILD_BUG_ON checks NVMe: Update namespace and controller identify structures to the 1.1a spec NVMe: Flush with data support NVMe: Configure support for block flush NVMe: Add tracepoints NVMe: Protect against badly formatted CQEs ...	2014-06-15 15:58:03 -10:00
Daniel Borkmann	b58537a1f5	net: sctp: fix permissions for rto_alpha and rto_beta knobs Commit `3fd091e73b` ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt. RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4: [...] C3) When a new RTT measurement R' is made, set RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * \|SRTT - R'\| and SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R' Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR. C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue. [...] While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2). We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log. Fixes: `3fd091e73b` ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-15 01:17:32 -07:00
David S. Miller	e4f7ae930a	Merge branch 'csum_fixes' Tom Herbert says: ==================== Fixes related to some recent checksum modifications. - Fix GSO constants to match NETIF flags - Fix logic in saving checksum complete in __skb_checksum_complete - Call __skb_checksum_complete from UDP if we are checksumming over whole packet in order to save checksum. - Fixes to VXLAN to work correctly with checksum complete ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-15 01:00:56 -07:00
Tom Herbert	f79b064c15	vxlan: Checksum fixes Call skb_pop_rcv_encapsulation and postpull_rcsum for the Ethernet header to work properly with checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-15 01:00:50 -07:00
Tom Herbert	e5eb4e30a5	net: add skb_pop_rcv_encapsulation This function is used by UDP encapsulation protocols in RX when crossing encapsulation boundary. If ip_summed is set to CHECKSUM_UNNECESSARY and encapsulation is not set, change to CHECKSUM_NONE since the checksum has not been validated within the encapsulation. Clears csum_valid by the same rationale. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-15 01:00:50 -07:00
Tom Herbert	bbdff225ed	udp: call __skb_checksum_complete when doing full checksum In __udp_lib_checksum_complete check if checksum is being done over all the data (len is equal to skb->len) and if it is call __skb_checksum_complete instead of __skb_checksum_complete_head. This allows checksum to be saved in checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-15 01:00:49 -07:00
Tom Herbert	46fb51eb96	net: Fix save software checksum complete Geert reported issues regarding checksum complete and UDP. The logic introduced in commit `7e3cead517` ("net: Save software checksum complete") is not correct. This patch: 1) Restores code in __skb_checksum_complete_header except for setting CHECKSUM_UNNECESSARY. This function may be calculating checksum on something less than skb->len. 2) Adds saving checksum to __skb_checksum_complete. The full packet checksum 0..skb->len is calculated without adding in pseudo header. This value is saved in skb->csum and then the pseudo header is added to that to derive the checksum for validation. 3) In both __skb_checksum_complete_header and __skb_checksum_complete, set skb->csum_valid to whether checksum of zero was computed. This allows skb_csum_unnecessary to return true without changing to CHECKSUM_UNNECESSARY which was done previously. 4) Copy new csum related bits in __copy_skb_header. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-15 01:00:49 -07:00
Tom Herbert	4b28252cad	net: Fix GSO constants to match NETIF flags Joseph Gasparakis reported that VXLAN GSO offload stopped working with i40e device after recent UDP changes. The problem is that the SKB_GSO_* bits are out of sync with the corresponding NETIF flags. This patch fixes that. Also, we add BUILD_BUG_ONs in net_gso_ok for several GSO constants that were missing to avoid the problem in the future. Reported-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-15 01:00:49 -07:00
Linus Torvalds	abf04af74a	SCSI for-linus on 20140613 This is just a couple of drivers (hpsa and lpfc) that got left out for further testing in linux-next. We also have one fix to a prior submission (qla2xxx sparse). Signed-off-by: James Bottomley <JBottomley@Parallels.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJTm48MAAoJEDeqqVYsXL0M1YEH/iZyEILT4EIZxre/tspqX/LB dxtGlmlF8AEU8/Eze3k/OB5nSuGcnYZ1hN1CgT2zZEv+sih6FekQOQV06qTwzwbo DnWA3dOrPVgMzzSVvXFEjryroIUNhZvMy8TGu+DefE9b6FUs6B3VZlMR3A+TcSgV cgknkG2Q6mWN8rO44pTSVlVDe2JpkvCYsHnqhO8uneQXVHNtsPpV7FfoLMLjBUDX dgsaDiUjyrj0sdR1yOgRjDH68FPewEiEONdtKi63kkI6zWDFASiKDY9yc1eIyjVd /1gbBJxwTRl4dWEdsigr/pOBxs6yjXGBSl/6PPDtuvdpWLFWUg4C2XtDLz0KLfU= =tdDT -----END PGP SIGNATURE----- Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull more SCSI updates from James Bottomley: "This is just a couple of drivers (hpsa and lpfc) that got left out for further testing in linux-next. We also have one fix to a prior submission (qla2xxx sparse)" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (36 commits) qla2xxx: fix sparse warnings introduced by previous target mode t10-dif patch lpfc: Update lpfc version to driver version 10.2.8001.0 lpfc: Fix ExpressLane priority setup lpfc: mark old devices as obsolete lpfc: Fix for initializing RRQ bitmap lpfc: Fix for cleaning up stale ring flag and sp_queue_event entries lpfc: Update lpfc version to driver version 10.2.8000.0 lpfc: Update Copyright on changed files from 8.3.45 patches lpfc: Update Copyright on changed files lpfc: Fixed locking for scsi task management commands lpfc: Convert runtime references to old xlane cfg param to fof cfg param lpfc: Fix FW dump using sysfs lpfc: Fix SLI4 s abort loop to process all FCP rings and under ring_lock lpfc: Fixed kernel panic in lpfc_abort_handler lpfc: Fix locking for postbufq when freeing lpfc: Fix locking for lpfc_hba_down_post lpfc: Fix dynamic transitions of FirstBurst from on to off hpsa: fix handling of hpsa_volume_offline return value hpsa: return -ENOMEM not -1 on kzalloc failure in hpsa_get_device_id hpsa: remove messages about volume status VPD inquiry page not supported ...	2014-06-14 19:49:48 -05:00
Linus Torvalds	16d52ef7c0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull more btrfs updates from Chris Mason: "This has a few fixes since our last pull and a new ioctl for doing btree searches from userland. It's very similar to the existing ioctl, but lets us return larger items back down to the app" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix error handling in create_pending_snapshot btrfs: fix use of uninit "ret" in end_extent_writepage() btrfs: free ulist in qgroup_shared_accounting() error path Btrfs: fix qgroups sanity test crash or hang btrfs: prevent RCU warning when dereferencing radix tree slot Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting btrfs: new ioctl TREE_SEARCH_V2 btrfs: tree_search, search_ioctl: direct copy to userspace btrfs: new function read_extent_buffer_to_user btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer btrfs: tree_search, search_ioctl: accept varying buffer btrfs: tree_search: eliminate redundant nr_items check	2014-06-14 19:48:43 -05:00
Linus Torvalds	a311c48038	Merge git://git.kvack.org/~bcrl/aio-next Pull aio fix and cleanups from Ben LaHaise: "This consists of a couple of code cleanups plus a minor bug fix" * git://git.kvack.org/~bcrl/aio-next: aio: cleanup: flatten kill_ioctx() aio: report error from io_destroy() when threads race in io_destroy() fs/aio.c: Remove ctx parameter in kiocb_cancel	2014-06-14 19:43:27 -05:00
Al Viro	05064084e8	fix __swap_writepage() compile failure on old gcc versions Tetsuo Handa wrote: "Commit `62a8067a7f` ("bio_vec-backed iov_iter") introduced an unnamed union inside a struct which gcc-4.4.7 cannot handle. Name the unnamed union as u in order to fix build failure" Let's do this instead: there is only one place in the entire tree that steps into this breakage. Anon structs and unions work in older gcc versions; as the matter of fact, we have those in the tree - see e.g. struct ieee80211_tx_info in include/net/mac80211.h What doesn't work is handling their initializers: struct { int a; union { int b; char c; }; } x[2] = {{.a = 1, .c = 'a'}, {.a = 0, .b = 1}}; is the obvious syntax for initializer, perfectly fine for C11 and handled correctly by gcc-4.7 or later. Earlier versions, though, break on it - declaration is fine and so's access to fields (i.e. x[0].c = 'a'; would produce the right code), but members of the anon structs and unions are not inserted into the right namespace. Tellingly, those older versions will not barf on struct {int a; struct {int a;};}; - looks like they just have it hacked up somewhere around the handling of . and -> instead of doing the right thing. The easiest way to deal with that crap is to turn initialization of those fields (in the only place where we have such initializer of iov_iter) into plain assignment. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-14 19:30:48 -05:00
Linus Torvalds	4a54e5e517	HSI Fixes for the v3.16 series: * Tighten Dependency between ssi-protocol and omap-ssi to fix build failures with randconfig. * Use normal module refcounting in omap driver to fix build with disabled module support. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABCgAGBQJTm50dAAoJENju1/PIO/qaI54P/ix0jMNYUJTHgEuPa8uifJY8 ZJBvE1jdb9k4keOaQvD5d0B0ExEBzfaBKzmSIGOlfREPcR2o7m20psLNXkkfsSbj 6jquDEp7ObOGgGdQ+3OebXRE+4qZm91H5AmX+8VMPbcxhcjLYvfn73T6dCbl/Wo/ y0VY2gGdUGQ0uQLcQ8WIeVah0mlmQ2lVbpVakG9cfDE+0yVYzb86xvepBvqzeMei 0xGmJo/dXQegLpS//uqSW6S9ds6BFPBvptLJjjQ1wOGdcBxe6ADkcu9VYZFv0FaN XnD10FaKnRZROYTAC+9w7XksT4WsAwuuGRySrn2H12Da5XCxjTrGMCUnNgKc4HhO cERQDdgtBe8+8wPD7kTnhYSzWWqQTBelwucmTuO1jecIa3vC6DA8UuMPKLE7K8Qs g7MelhcT7aw3Clmgbvg11oH7YAfFnis9/fJ3Bq2wgKivfbEik++BjE1P8lVB2uVK UXrLsEgSwEDQV3wLW4bpHO1NO8XtVFmkoBoxCWRKOYouhVlkucyt8HYi1pPwnhcq hjxtXN7pUgf7lnFeeS7CH5xbZSkIkBHjUS3mmTPr5AKgsqYpNlyP8jMt7GTTZGXX LzOS0VDAi73vl08k1yiLRDAhu7iZwKMk8+arTP3iYhzmk7OI/9gMB+pRf+zPGp6B ADWLvkEREuu1zw6Ob4wp =BKDp -----END PGP SIGNATURE----- Merge tag 'hsi-for-3.16-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi Pull HSI build fixes from Sebastian Reichel: - tighten dependency between ssi-protocol and omap-ssi to fix build failures with randconfig. - use normal module refcounting in omap driver to fix build with disabled module support * tag 'hsi-for-3.16-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi: hsi: omap_ssi_port: use normal module refcounting HSI: fix omap ssi driver dependency	2014-06-14 14:51:25 -07:00
Linus Torvalds	1ad96bb0a2	A first GPIO fix for the v3.16 series, this was serious since it blocks the OMAP boot. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTm3A3AAoJEEEQszewGV1zFMoQAKFQR8bkDzG0iyzuQ6h6921Y Lj7KgIlwQvLNqHIz/MznVGprhIkUwtIs9BMeUzkkDhuhjWtXnMU7JgUMRAch4gwL aU2IlNkhYQlXRRJpZbCaOUYZ8veIJR+Ax0qnNvGF+rwqCCYi0sHJGAEKiCuprZNI dQcNG0AtSEmysV9pW+ikVnN6ehblussaHCfTdG8xf+kuzcEW5n0QpbN7YiwIBcFW bmZdELxDEBPA+zWGw2STZ0rCLc4PI3hdVmD84sIClbg0mLuE75piHS+MHM36aEGJ GUTFyI1DZRjam6KH3L1GhaQsYS6NwfTumoArjf/8O/xBe+WTt6roauWWn6wavk+m W64Syj8yibfhcI28ZdH6lvS0uwvqWrUeApgihDagoJsVmb5NOP+eMRXVRZsY3k0n qtO3AzwL77b95CUwuuWMW/4wmBsdiN+KPunCVgHXz41FBsL9wmPv1gHIKh80NJCP WKi++fSjbKeJZ5p5l8xQ2J04CbiYeJic6dr8+4KldI/RtO12rUQ/C2tv1NiXnv1l YvOqSVRuM0JOnbk2SRzT0HIEJ3I36t+QE5DWxQawAv2RIrWlWF1kKlZZ7PfBywxl l2GTo7eHusHU4CVxYOL63r+NFanjDbLUrLgZZ8VsGYc9FWgAoo4eiqku2iNFVMuz V/zxEhDuhKqN/gnAVkNI =6SvI -----END PGP SIGNATURE----- Merge tag 'gpio-v3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fix from Linus Walleij: "A first GPIO fix for the v3.16 series, this was serious since it blocks the OMAP boot. Sending you this vital fix before leaving for a short vacation so it does not sit collecting dust in my tree for no good reason. Apart from this, our v3.16 cycle looks like a good start" * tag 'gpio-v3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: of: Fix handling for deferred probe for -gpio suffix	2014-06-14 14:49:51 -07:00
Linus Torvalds	c728762e06	Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso fixes from Peter Anvin: "Fixes for x86/vdso. One is a simple build fix for bigendian hosts, one is to make "make vdso_install" work again, and the rest is about working around a bug in Google's Go language -- two are documentation patches that improves the sample code that the Go coders took, modified, and broke; the other two implements a workaround that keeps existing Go binaries from segfaulting at least" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Fix vdso_install x86/vdso: Hack to keep 64-bit Go programs working x86/vdso: Add PUT_LE to store little-endian values x86/vdso/doc: Make vDSO examples more portable x86/vdso/doc: Rename vdso_test.c to vdso_standalone_test_x86.c x86, vdso: Remove one final use of htole16()	2014-06-14 14:46:29 -07:00
Linus Torvalds	503698e12d	New driver for Sensirion SHTC1 humidity / temperature sensor Convert ltc4151 and vexpress drivers to use devm functions Drop generic chip detection from lm85 driver Avoid forward declarations in atxp1 driver Fix sign extensions in ina2xx driver -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJTmckcAAoJEMsfJm/On5mBjioP/A0gxpRxfC28kuIj2R2k6iQg F+13HggU6y8AfVg1+ITDVlt29IPsyr4C48OE3Vi7NkASON0uBykE2Yaw3fOxnlDn DHzOXPzB2sZimvrOVeb0ZwYaTfe3nu/MwwFTFOyoIdImZVCvluJ0ZaeEDuboSHty q8grWKxHw6bxJJFUM1c1Q2xZm/yr5LL9WXHBIDFhv4/ZUOcpOhfLPPiXpD/KpPsM dLfumbgyUOCbQraX+5p+8PovcnUa7NS3sQHKvQGm9M+K1pXMJubmtzUn3huga1zx ZrV0zlZpwANhsuOyEVPuceOuhCyD9a9C7mRhL3AMPtJw2fiIzhctxGuSJ5hkpm4i okzpyD2eX7UsKWNJcIAbFsBdsFBaZV1kN7A3ERdnSiHwkvUnxExWUKY3DLTaja+A ilY3+AkRMqy/BfVd2Odjh2EkF3YaWhSUU5YLZipkGkHAJU5aAt9Kde7U+nhJJ5Oe AHt/q8buDkH82X+F5rS7L60KXKARnaGQks3GsqxMopo+ZL3dVGyAgJ4TJ/5RQrPa K6yflRV5u2gi1dBZN8Al3118MCNrsqNhkwWCBxY3xAnRaDWeI5EdZKT2nho1aaNw 06sE8Q5yAaLeFxRtkQuYQew2PVpV8RLKMfDMo+yqQkYKQWE9rRfQNVAdQpFNhpPN SuU1GSjWmAs+cjI1MULZ =mv/m -----END PGP SIGNATURE----- Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: - new driver for Sensirion SHTC1 humidity / temperature sensor - convert ltc4151 and vexpress drivers to use devm functions - drop generic chip detection from lm85 driver - avoid forward declarations in atxp1 driver - fix sign extensions in ina2xx driver * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: vexpress: Use devm helper for hwmon device registration hwmon: (atxp1) Avoid forward declaration hwmon: add support for Sensirion SHTC1 sensor hwmon: (ltc4151) Convert to devm_hwmon_device_register_with_groups hwmon: (lm85) Drop generic detection hwmon: (ina2xx) Cast to s16 on shunt and current regs	2014-06-14 14:43:23 -07:00
Eric Dumazet	63c6f81cdd	udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup Its too easy to add thousand of UDP sockets on a particular bucket, and slow down an innocent multicast receiver. Early demux is supposed to be an optimization, we should avoid spending too much time in it. It is interesting to note __udp4_lib_demux_lookup() only tries to match first socket in the chain. 10 is the threshold we already have in __udp4_lib_lookup() to switch to secondary hash. Fixes: `421b3885bf` ("udp: ipv4: Add udp early demux") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: David Held <drheld@google.com> Cc: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-13 15:39:24 -07:00
Cong Wang	2853af6a2e	vxlan: use dev->needed_headroom instead of dev->hard_header_len When we mirror packets from a vxlan tunnel to other device, the mirror device should see the same packets (that is, without outer header). Because vxlan tunnel sets dev->hard_header_len, tcf_mirred() resets mac header back to outer mac, the mirror device actually sees packets with outer headers Vxlan tunnel should set dev->needed_headroom instead of dev->hard_header_len, like what other ip tunnels do. This fixes the above problem. Cc: "David S. Miller" <davem@davemloft.net> Cc: stephen hemminger <stephen@networkplumber.org> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-13 15:27:59 -07:00
Dimitris Michailidis	56f16c74ca	MAINTAINERS: update cxgb4 maintainer Hari's been doing the patch submissions for a while now and he'll be taking over as maintainer. Signed-off-by: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-13 14:55:43 -07:00
Andy Lutomirski	a934fb5bc9	x86/vdso: Fix vdso_install "make vdso_install" installs unstripped versions of the vdso objects for the benefit of the debugger. This was broken by checkin: `6f121e548f` x86, vdso: Reimplement vdso.so preparation in build-time C The filenames are different now, so update the Makefile to cope. This still installs the 64-bit vdso as vdso64.so. We believe this will be okay, as the only known user is a patched gdb which is known to use build-ids, but if it turns out to be a problem we may have to add a link. Inspired by a patch from Sam Ravnborg. Acked-by: Sam Ravnborg <sam@ravnborg.org> Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Tested-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/b10299edd8ba98d17e07dafcd895b8ecf4d99eff.1402586707.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2014-06-13 10:31:48 -07:00
Dan McLeran	b8e080847a	NVMe: Fix START_STOP_UNIT Scsi->NVMe translation. This patch contains several fixes for Scsi START_STOP_UNIT. The previous code did not account for signed vs. unsigned arithmetic which resulted in an invalid lowest power state caculation when the device only supports 1 power state. The code for Power Condition == 2 (Idle) was not following the spec. The spec calls for setting the device to specific power states, depending upon Power Condition Modifier, without accounting for the number of power states supported by the device. The code for Power Condition == 3 (Standby) was using a hard-coded '0' which is replaced with the macro POWER_STATE_0. Signed-off-by: Dan McLeran <daniel.mcleran@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-13 13:11:00 -04:00
Eric Sandeen	47a306a748	btrfs: fix error handling in create_pending_snapshot `fcebe456` cut and pasted some code to a later point in create_pending_snapshot(), but didn't switch to the appropriate error handling for this stage of the function. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:30 -07:00
Eric Sandeen	3e2426bd0e	btrfs: fix use of uninit "ret" in end_extent_writepage() If this condition in end_extent_writepage() is false: if (tree->ops && tree->ops->writepage_end_io_hook) we will then test an uninitialized "ret" at: ret = ret < 0 ? ret : -EIO; The test for ret is for the case where ->writepage_end_io_hook failed, and we'd choose that ret as the error; but if there is no ->writepage_end_io_hook, nothing sets ret. Initializing ret to 0 should be sufficient; if writepage_end_io_hook wasn't set, (!uptodate) means non-zero err was passed in, so we choose -EIO in that case. Signed-of-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:28 -07:00
Eric Sandeen	d737278091	btrfs: free ulist in qgroup_shared_accounting() error path If tmp = ulist_alloc(GFP_NOFS) fails, we return without freeing the previously allocated qgroups = ulist_alloc(GFP_NOFS) and cause a memory leak. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:26 -07:00
Filipe Manana	b050f9f6dd	Btrfs: fix qgroups sanity test crash or hang Often when running the qgroups sanity test, a crash or a hang happened. This is because the extent buffer the test uses for the root node doesn't have an header level explicitly set, making it have a random level value. This is a problem when it's not zero for the btrfs_search_slot() calls the test ends up doing, resulting in crashes or hangs such as the following: [ 6454.127192] Btrfs loaded, debug=on, assert=on, integrity-checker=on (...) [ 6454.127760] BTRFS: selftest: Running qgroup tests [ 6454.127964] BTRFS: selftest: Running test_test_no_shared_qgroup [ 6454.127966] BTRFS: selftest: Qgroup basic add [ 6480.152005] BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:5383] [ 6480.152005] Modules linked in: btrfs(+) xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc i2c_piix4 i2c_core pcspkr evbug psmouse serio_raw e1000 [last unloaded: btrfs] [ 6480.152005] irq event stamp: 188448 [ 6480.152005] hardirqs last enabled at (188447): [<ffffffff8168ef5c>] restore_args+0x0/0x30 [ 6480.152005] hardirqs last disabled at (188448): [<ffffffff81698e6a>] apic_timer_interrupt+0x6a/0x80 [ 6480.152005] softirqs last enabled at (188446): [<ffffffff810516cf>] __do_softirq+0x1cf/0x450 [ 6480.152005] softirqs last disabled at (188441): [<ffffffff81051c25>] irq_exit+0xb5/0xc0 [ 6480.152005] CPU: 0 PID: 5383 Comm: modprobe Not tainted 3.15.0-rc8-fdm-btrfs-next-33+ #4 [ 6480.152005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 6480.152005] task: ffff8802146125a0 ti: ffff8800d0d00000 task.ti: ffff8800d0d00000 [ 6480.152005] RIP: 0010:[<ffffffff81349a63>] [<ffffffff81349a63>] __write_lock_failed+0x13/0x20 [ 6480.152005] RSP: 0018:ffff8800d0d038e8 EFLAGS: 00000287 [ 6480.152005] RAX: 0000000000000000 RBX: ffffffff8168ef5c RCX: 000005deb8525852 [ 6480.152005] RDX: 0000000000000000 RSI: 0000000000001d45 RDI: ffff8802105000b8 [ 6480.152005] RBP: ffff8800d0d038e8 R08: fffffe12710f63db R09: ffffffffa03196fb [ 6480.152005] R10: ffff8802146125a0 R11: ffff880214612e28 R12: ffff8800d0d03858 [ 6480.152005] R13: 0000000000000000 R14: ffff8800d0d00000 R15: ffff8802146125a0 [ 6480.152005] FS: 00007f14ff804700(0000) GS:ffff880215e00000(0000) knlGS:0000000000000000 [ 6480.152005] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 6480.152005] CR2: 00007fff4df0dac8 CR3: 00000000d1796000 CR4: 00000000000006f0 [ 6480.152005] Stack: [ 6480.152005] ffff8800d0d03908 ffffffff810ae967 0000000000000001 ffff8802105000b8 [ 6480.152005] ffff8800d0d03938 ffffffff8168e57e ffffffffa0319c16 0000000000000007 [ 6480.152005] ffff880210500000 ffff880210500100 ffff8800d0d039b8 ffffffffa0319c16 [ 6480.152005] Call Trace: [ 6480.152005] [<ffffffff810ae967>] do_raw_write_lock+0x47/0xa0 [ 6480.152005] [<ffffffff8168e57e>] _raw_write_lock+0x5e/0x80 [ 6480.152005] [<ffffffffa0319c16>] ? btrfs_tree_lock+0x116/0x270 [btrfs] [ 6480.152005] [<ffffffffa0319c16>] btrfs_tree_lock+0x116/0x270 [btrfs] [ 6480.152005] [<ffffffffa02b2acb>] btrfs_lock_root_node+0x3b/0x50 [btrfs] [ 6480.152005] [<ffffffffa02b81a6>] btrfs_search_slot+0x916/0xa20 [btrfs] [ 6480.152005] [<ffffffff811a727f>] ? create_object+0x23f/0x300 [ 6480.152005] [<ffffffffa02b9958>] btrfs_insert_empty_items+0x78/0xd0 [btrfs] [ 6480.152005] [<ffffffffa036041a>] insert_normal_tree_ref.constprop.4+0xa2/0x19a [btrfs] [ 6480.152005] [<ffffffffa03605c3>] test_no_shared_qgroup+0xb1/0x1ca [btrfs] [ 6480.152005] [<ffffffff8108cad6>] ? local_clock+0x16/0x30 [ 6480.152005] [<ffffffffa035ef8e>] btrfs_test_qgroups+0x1ae/0x1d7 [btrfs] [ 6480.152005] [<ffffffffa03a69d2>] ? ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs] [ 6480.152005] [<ffffffffa03a6a86>] init_btrfs_fs+0xb4/0x153 [btrfs] [ 6480.152005] [<ffffffff81000352>] do_one_initcall+0x102/0x150 [ 6480.152005] [<ffffffff8103d223>] ? set_memory_nx+0x43/0x50 [ 6480.152005] [<ffffffff81682668>] ? set_section_ro_nx+0x6d/0x74 [ 6480.152005] [<ffffffff810d91cc>] load_module+0x1cdc/0x2630 (...) Therefore initialize the extent buffer as an empty leaf (level 0). Issue easy to reproduce when btrfs is built as a module via: $ for ((i = 1; i <= 1000000; i++)); do rmmod btrfs; modprobe btrfs; done Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:24 -07:00
Sasha Levin	f1e3c28949	btrfs: prevent RCU warning when dereferencing radix tree slot Mark the dereference as protected by lock. Not doing so triggers an RCU warning since the radix tree assumed that RCU is in use. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:22 -07:00
Wang Shilong	5fbc7c59fd	Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting Steps to reproduce: # mkfs.btrfs -f /dev/sd[b-f] -m raid5 -d raid5 # mkfs.ext4 /dev/sdc --->corrupt one of btrfs device # mount /dev/sdb /mnt -o degraded # btrfs scrub start -BRd /mnt This is because readahead would skip missing device, this is not true for RAID5/6, because REQ_GET_READ_MIRRORS return 1 for RAID5/6 block mapping. If expected data locates in missing device, readahead thread would not call __readahead_hook() which makes event @rc->elems=0 wait forever. Fix this problem by checking return value of btrfs_map_block(),we can only skip missing device safely if there are several mirrors. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2014-06-13 09:52:21 -07:00
Gerhard Heift	cc68a8a5a4	btrfs: new ioctl TREE_SEARCH_V2 This new ioctl call allows the user to supply a buffer of varying size in which a tree search can store its results. This is much more flexible if you want to receive items which are larger than the current fixed buffer of 3992 bytes or if you want to fetch more items at once. Items larger than this buffer are for example some of the type EXTENT_CSUM. Signed-off-by: Gerhard Heift <Gerhard@Heift.Name> Signed-off-by: Chris Mason <clm@fb.com> Acked-by: David Sterba <dsterba@suse.cz>	2014-06-13 09:52:19 -07:00
Matthew Wilcox	ef351b97de	NVMe: Use Log Page constants in SCSI emulation The nvme-scsi file defined its own Log Page constant. Use the newly-defined one from the header file instead. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-13 10:54:21 -04:00
Matthew Wilcox	3d69bb6e46	NVMe: Define Log Page constants Taken from the 1.1a version of the spec Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-13 10:53:49 -04:00

1 2 3 4 5 ...

454911 Коммитов Все ветки Поиск

454911 Коммитов

Все ветки