Граф коммитов

456380 Коммитов

Автор SHA1 Сообщение Дата
Linus Torvalds bcf44bfe5e Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "A cpufreq lockup fix and a compiler warning fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix compiler warnings
  x86, tsc: Fix cpufreq lockup
2014-07-16 10:11:02 -10:00
Linus Torvalds d14aef3872 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Tooling fixes and an Intel PMU driver fixlet"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Do not allow optimized switch for non-cloned events
  perf/x86/intel: ignore CondChgd bit to avoid false NMI handling
  perf symbols: Get kernel start address by symbol name
  perf tools: Fix segfault in cumulative.callchain report
2014-07-16 10:10:27 -10:00
Linus Torvalds 2da2944740 sound fixes for 3.16-rc6
Things seem to calm down so far, just a small few HD-audio fixes
 (regression fixes and a new codec ID addition) popping up.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJTxlkHAAoJEGwxgFQ9KSmk0EEP/iKDIFbGzlQeH/X6G0pwnk9+
 Cl+j8KBUQfevV49SCGZm+h8NHF+RpnV3OrVHQFH+a1AwZqYRA0pEs4sDjRarcSud
 fwMNqe/AUmTtyazeF2fSvNokbzHLlWaWG/+vLzxsvvsGlu2tNppNPK6WYrQyRp/G
 8ZCruUQz+Cl/GY4VN2Cx4zV7xNkcyUQnbnL3Uo8EM2hZKK4I2YdtQPc8rrD4+DzG
 YPDtbton4Sb0xPOnAcmmmuXQDM3rSh0xMK8ZMwfDeruLapXhY1z25PbugUvet2Xx
 gkROW9JfseEjvJKT55sYbrj+jEFlIPAeom/BSVJBerJmYs+jbDkpCzlbtzU5dT0R
 yUp6v5Z9VYmIkExJZOFmJVq9D2UbCd1b4kKZdwExvCtVPJ8Ce3HhCvrB5MJWl7vo
 nA3Z7pPXeqhBmxBHT1oREREn5OydRR1MfERNEtxuVWrg7jl9ugUGHBc3/F24K2Ia
 Xkbnh7orTmUZy9LeMA9BTUFKGmAKr/phju8RRahv846TFnd98NSK8g1F+Gi1UEfB
 4ZqG9fJJBhwzm2ZIh8ocMOF7fVJsTOPurPkfL3PbosZ8J/7qsh3unVihrHuLDbC+
 0sM3JS7kG6IgMCgyB8+697AtAP+jw1sAz/gDG+bjXx2BA2HtycLlb5mXwB8Jvg0y
 K9XlJ6urPLsQGKsxCTO4
 =Ua+N
 -----END PGP SIGNATURE-----

Merge tag 'sound-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Things seem to calm down so far, just a small few HD-audio fixes
  (regression fixes and a new codec ID addition) popping up"

* tag 'sound-3.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Fix broken PM due to incomplete i915 initialization
  ALSA: hda - Revert stream assignment order for Intel controllers
  ALSA: hda - Add new GPU codec ID 0x10de0070 to snd-hda
  ALSA: hda: Fix build warning
2014-07-16 06:48:08 -10:00
Viresh Kumar 1bf8cc3d01 cpufreq: cpu0: OPPs can be populated at runtime
OPPs can be populated statically, via DT, or added at run time with
dev_pm_opp_add().

While this driver handles the first case correctly, it would fail to populate
OPPs added at runtime. Because call to of_init_opp_table() would fail as there
are no OPPs in DT and probe will return early.

To fix this, remove error checking and call dev_pm_opp_init_cpufreq_table()
unconditionally.

Update bindings as well.

Suggested-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-16 15:12:52 +02:00
Quentin Armitage 2fa1adc070 cpufreq: kirkwood: Reinstate cpufreq driver for ARCH_KIRKWOOD
Commit ff1f0018cf ("drivers: Enable
building of Kirkwood drivers for mach-mvebu") added Kirkwood into
mach-mvebu, adding MACH_KIRKWOOD to ARCH_KIRKWOOD in the KConfig files.

The change for ARM_KIRKWOOD_CPUFREQ replaced ARCH_KIRKWOOD with
MACH_KIRKWOOD, whereas all the other changes were ARCH_KIRKWOOD ||
MACH_KIRKWOOD.

As a consequence of this change, the cpufreq driver is no longer enabled
for ARCH_KIRKWOOD. This patch reinstates ARM_KIRKWOOD_CPUFREQ for
ARCH_KIRKWOOD.

Signed-off-by: Quentin Armitage <quentin@armitage.org.uk>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-16 15:12:03 +02:00
Ingo Molnar 9de8033f1b Merge branch 'liblockdep-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux into locking/urgent
Pull liblockdep fixes from Sasha Levin.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 14:57:27 +02:00
Davidlohr Bueso 5db6c6fefb locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER
Just like with mutexes (CONFIG_MUTEX_SPIN_ON_OWNER),
encapsulate the dependencies for rwsem optimistic spinning.
No logical changes here as it continues to depend on both
SMP and the XADD algorithm variant.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Jason Low <jason.low2@hp.com>
[ Also make it depend on ARCH_SUPPORTS_ATOMIC_RMW. ]
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1405112406-13052-2-git-send-email-davidlohr@hp.com
Cc: aswin@hp.com
Cc: Chris Mason <clm@fb.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 14:57:13 +02:00
Peter Zijlstra 4badad352a locking/mutex: Disable optimistic spinning on some architectures
The optimistic spin code assumes regular stores and cmpxchg() play nice;
this is found to not be true for at least: parisc, sparc32, tile32,
metag-lock1, arc-!llsc and hexagon.

There is further wreckage, but this in particular seemed easy to
trigger, so blacklist this.

Opt in for known good archs.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: John David Anglin <dave.anglin@bell.net>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: stable@vger.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/20140606175316.GV13930@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 14:57:07 +02:00
Jason Low ce069fc920 locking/rwsem: Reduce the size of struct rw_semaphore
Recent optimistic spinning additions to rwsem provide significant performance
benefits on many workloads on large machines. The cost of it was increasing
the size of the rwsem structure by up to 128 bits.

However, now that the previous patches in this series bring the overhead of
struct optimistic_spin_queue to 32 bits, this patch reorders some fields in
struct rw_semaphore such that we can reduce the overhead of the rwsem structure
by 64 bits (on 64 bit systems).

The extra overhead required for rwsem optimistic spinning would now be up
to 8 additional bytes instead of up to 16 bytes. Additionally, the size of
rwsem would now be more in line with mutexes.

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Link: http://lkml.kernel.org/r/1405358872-3732-6-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 14:57:03 +02:00
Peter Zijlstra 13b9a962a2 locking/rwsem: Rename 'activity' to 'count'
There are two definitions of struct rw_semaphore, one in linux/rwsem.h
and one in linux/rwsem-spinlock.h.

For some reason they have different names for the initial field. This
makes it impossible to use C99 named initialization for
__RWSEM_INITIALIZER() -- or we have to duplicate that entire thing
along with the structure definitions.

The simpler patch is renaming the rwsem-spinlock variant to match the
regular rwsem.

This allows us to switch to C99 named initialization.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-bmrZolsbGmautmzrerog27io@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 14:56:55 +02:00
Nicolas Del Piano 7e02168711 cpufreq: imx6q: Select PM_OPP
PM_OPP is a library used by several of the existing cpufreq drivers.
ARM IMX6Q cpufreq driver uses this library for its functionality.
Thus, it should be selected in Kconfig.

Reported-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Nicolas Del Piano <ndel314@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-16 14:45:37 +02:00
Linus Walleij 97d496bf1a cpufreq: sa1110: set memory type for h3600
The Compaq iPAQ h3600 also has the K4S281632b-1H memory type.
Verified by prying apart a broken board.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-16 14:30:17 +02:00
Hans de Goede 4cf465b579 ACPI / video: Add use_native_backlight quirk for HP ProBook 4540s
As reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1025690
This is yet another model which needs this quirk.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1025690
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-16 14:16:30 +02:00
Mateusz Guzik b0ab99e773 sched: Fix possible divide by zero in avg_atom() calculation
proc_sched_show_task() does:

  if (nr_switches)
	do_div(avg_atom, nr_switches);

nr_switches is unsigned long and do_div truncates it to 32 bits, which
means it can test non-zero on e.g. x86-64 and be truncated to zero for
division.

Fix the problem by using div64_ul() instead.

As a side effect calculations of avg_atom for big nr_switches are now correct.

Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1402750809-31991-1-git-send-email-mguzik@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 13:36:07 +02:00
Jason Low 33ecd2083a locking/spinlocks/mcs: Micro-optimize osq_unlock()
In the unlock function of the cancellable MCS spinlock, the first
thing we do is to retrive the current CPU's osq node. However, due to
the changes made in the previous patch, in the common case where the
lock is not contended, we wouldn't need to access the current CPU's
osq node anymore.

This patch optimizes this by only retriving this CPU's osq node
after we attempt the initial cmpxchg to unlock the osq and found
that its contended.

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1405358872-3732-5-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 13:28:06 +02:00
Jason Low 4d9d951e6b locking/spinlocks/mcs: Introduce and use init macro and function for osq locks
Currently, we initialize the osq lock by directly setting the lock's values. It
would be preferable if we use an init macro to do the initialization like we do
with other locks.

This patch introduces and uses a macro and function for initializing the osq lock.

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Link: http://lkml.kernel.org/r/1405358872-3732-4-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 13:28:05 +02:00
Jason Low 90631822c5 locking/spinlocks/mcs: Convert osq lock to atomic_t to reduce overhead
The cancellable MCS spinlock is currently used to queue threads that are
doing optimistic spinning. It uses per-cpu nodes, where a thread obtaining
the lock would access and queue the local node corresponding to the CPU that
it's running on. Currently, the cancellable MCS lock is implemented by using
pointers to these nodes.

In this patch, instead of operating on pointers to the per-cpu nodes, we
store the CPU numbers in which the per-cpu nodes correspond to in atomic_t.
A similar concept is used with the qspinlock.

By operating on the CPU # of the nodes using atomic_t instead of pointers
to those nodes, this can reduce the overhead of the cancellable MCS spinlock
by 32 bits (on 64 bit systems).

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Mason <clm@fb.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Link: http://lkml.kernel.org/r/1405358872-3732-3-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 13:28:04 +02:00
Jason Low 046a619d8e locking/spinlocks/mcs: Rename optimistic_spin_queue() to optimistic_spin_node()
Currently, the per-cpu nodes structure for the cancellable MCS spinlock is
named "optimistic_spin_queue". However, in a follow up patch in the series
we will be introducing a new structure that serves as the new "handle" for
the lock. It would make more sense if that structure is named
"optimistic_spin_queue". Additionally, since the current use of the
"optimistic_spin_queue" structure are  "nodes", it might be better if we
rename them to "node" anyway.

This preparatory patch renames all current "optimistic_spin_queue"
to "optimistic_spin_node".

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Mason <clm@fb.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Link: http://lkml.kernel.org/r/1405358872-3732-2-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 13:28:03 +02:00
Jason Low 37e9562453 locking/rwsem: Allow conservative optimistic spinning when readers have lock
Commit 4fc828e24c ("locking/rwsem: Support optimistic spinning")
introduced a major performance regression for workloads such as
xfs_repair which mix read and write locking of the mmap_sem across
many threads. The result was xfs_repair ran 5x slower on 3.16-rc2
than on 3.15 and using 20x more system CPU time.

Perf profiles indicate in some workloads that significant time can
be spent spinning on !owner. This is because we don't set the lock
owner when readers(s) obtain the rwsem.

In this patch, we'll modify rwsem_can_spin_on_owner() such that we'll
return false if there is no lock owner. The rationale is that if we
just entered the slowpath, yet there is no lock owner, then there is
a possibility that a reader has the lock. To be conservative, we'll
avoid spinning in these situations.

This patch reduced the total run time of the xfs_repair workload from
about 4 minutes 24 seconds down to approximately 1 minute 26 seconds,
back to close to the same performance as on 3.15.

Retesting of AIM7, which were some of the workloads used to test the
original optimistic spinning code, confirmed that we still get big
performance gains with optimistic spinning, even with this additional
regression fix. Davidlohr found that while the 'custom' workload took
a performance hit of ~-14% to throughput for >300 users with this
additional patch, the overall gain with optimistic spinning is
still ~+45%. The 'disk' workload even improved by ~+15% at >1000 users.

Tested-by: Dave Chinner <dchinner@redhat.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1404532172.2572.30.camel@j-VirtualBox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 13:28:02 +02:00
Paul Bolle d3f44fbabe x86: Remove unused variable "polling"
Compile tested. "polling" is unused since commit f80c5b39b8
("sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG").

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Link: http://lkml.kernel.org/r/1404138749.2978.6.camel@x41
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-16 12:58:47 +02:00
Brian Norris 44305ebde2 UBI: fastmap: do not miss bit-flips
The return value from 'ubi_io_read_ec_hdr()' was stored in 'err', not in 'ret'.
This fix makes sure Fastmap-enabled UBI does not miss bit-flip while reading EC
headers, events and scrubs the affected PEBs.

This issue was reported by Coverity Scan.

Artem: improved the commit message.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2014-07-16 08:32:00 +03:00
Linus Torvalds c20ddc6499 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota fix from Jan Kara:
 "Fix locking of dquot shrinker"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: missing lock in dqcache_shrink_scan()
2014-07-15 17:47:42 -10:00
Linus Torvalds d7b6e53e32 A GPIO fix for the v3.16 series - fixes up some merge
confusion from the merge window.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTxXoGAAoJEEEQszewGV1z6ogQAIIe+juP1FUjRdsZkD8S6ewF
 gdtOSMuPfL3HV+hSlRLKchAQbA+4HMxyH9lv0Vd8CHKy91w932w6x84O10B1XQJf
 ethsxxXmMi7OGTMQkIrbuZZDcnOVkcNmw1GuPXknvHUWgNFDUMz6nu0YShQ+ZehQ
 ctIAyfV0wdFilZHuSkO+thj/+cDjBEjlklgAcVVQ6OgnmshPDd1PmELzVbKIueWf
 E+/KgyLYvunubcuYB56dfDWtG4pE48qVekvDzXHZXxVQbuW3Ov4fOzQvyO01s/WQ
 cK7S7T5T3FepYzWIvh5fvGIXKfoD6sjdQWQ4hUBYTZI+qpdoN7Diri3T1Rhe4avj
 lJ5xCZujJ3AnFSEowqn9oM0CQQ5J0Dx3joUYnL+Krkt+va+LayrRwWxdMmoCX57H
 2uw3uAVdAg4qhqHcxJP7HnsgXEocDVKcGXV59++wdE8Ebann3IeyH4MnDCr2hQM+
 u0O1IgjjZlOM0NAfGySCcTsfUxW4mB8znAOex+66PnX1Mvncm0UVc+VrkAxcYp3Q
 3Ek7hz6sIPB/b8RYzPb5X4aT+lXZyD8I8lOE7OruxhWggCC3cnCDQWRJQ+DNvwYJ
 z3GiPLJRgFuvoF7QrYhOeBlUH3fZ9mrqLQietQiAS1tz0oMKYO4TBkM8HDjLMNAx
 3hQEuyp9EHDA9K1gxERL
 =h0US
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fix from Linus Walleij:
 "Fix up some merge confusion from the merge window"

* tag 'gpio-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: mcp23s08: Eliminates redundant checking.
2014-07-15 17:46:51 -10:00
Martin Lau 97b8ee8453 ring-buffer: Fix polling on trace_pipe
ring_buffer_poll_wait() should always put the poll_table to its wait_queue
even there is immediate data available.  Otherwise, the following epoll and
read sequence will eventually hang forever:

1. Put some data to make the trace_pipe ring_buffer read ready first
2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
3. epoll_wait()
4. read(trace_pipe_fd) till EAGAIN
5. Add some more data to the trace_pipe ring_buffer
6. epoll_wait() -> this epoll_wait() will block forever

~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
  ring_buffer_poll_wait() returns immediately without adding poll_table,
  which has poll_table->_qproc pointing to ep_poll_callback(), to its
  wait_queue.
~ During the epoll_wait() call in step 3 and step 6,
  ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
  because the poll_table->_qproc is NULL and it is how epoll works.
~ When there is new data available in step 6, ring_buffer does not know
  it has to call ep_poll_callback() because it is not in its wait queue.
  Hence, block forever.

Other poll implementation seems to call poll_wait() unconditionally as the very
first thing to do.  For example, tcp_poll() in tcp.c.

Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com

Cc: stable@vger.kernel.org # 2.6.27
Fixes: 2a2cc8f7c4 "ftrace: allow the event pipe to be polled"
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Martin Lau <kafai@fb.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15 17:07:47 -04:00
Niu Yawei d68aab6b8f quota: missing lock in dqcache_shrink_scan()
Commit 1ab6c4997e (fs: convert fs shrinkers to new scan/count API)
accidentally removed locking from quota shrinker. Fix it -
dqcache_shrink_scan() should use dq_list_lock to protect the
scan on free_dquots list.

CC: stable@vger.kernel.org
Fixes: 1ab6c4997e
Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2014-07-15 22:36:18 +02:00
Mike Snitzer 048e5a07f2 dm cache metadata: do not allow the data block size to change
The block size for the dm-cache's data device must remained fixed for
the life of the cache.  Disallow any attempt to change the cache's data
block size.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org
2014-07-15 14:07:50 -04:00
Mike Snitzer 9aec8629ec dm thin metadata: do not allow the data block size to change
The block size for the thin-pool's data device must remained fixed for
the life of the thin-pool.  Disallow any attempt to change the
thin-pool's data block size.

It should be noted that attempting to change the data block size via
thin-pool table reload will be ignored as a side-effect of the thin-pool
handover that the thin-pool target does during thin-pool table reload.

Here is an example outcome of attempting to load a thin-pool table that
reduced the thin-pool's data block size from 1024K to 512K.

Before:
kernel: device-mapper: thin: 253:4: growing the data device from 204800 to 409600 blocks

After:
kernel: device-mapper: thin metadata: changing the data block size (from 2048 to 1024) is not supported
kernel: device-mapper: table: 253:4: thin-pool: Error creating metadata object
kernel: device-mapper: ioctl: error adding target to table

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org
2014-07-15 14:05:26 -04:00
zhangwei(Jovi) f0160a5a29 tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs
The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing,
so add it, to be consistent with __trace_printk/__trace_bprintk.
Those functions are all called by the same function: trace_printk().

Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com

Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15 11:58:40 -04:00
Linus Torvalds 0b632204c7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
 "This contains miscellaneous fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: replace count*size kzalloc by kcalloc
  fuse: release temporary page if fuse_writepage_locked() failed
  fuse: restructure ->rename2()
  fuse: avoid scheduling while atomic
  fuse: handle large user and group ID
  fuse: inode: drop cast
  fuse: ignore entry-timeout on LOOKUP_REVAL
  fuse: timeout comparison fix
2014-07-15 08:57:17 -07:00
Linus Torvalds 5615f9f822 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Bluetooth pairing fixes from Johan Hedberg.

 2) ieee80211_send_auth() doesn't allocate enough tail room for the SKB,
    from Max Stepanov.

 3) New iwlwifi chip IDs, from Oren Givon.

 4) bnx2x driver reads wrong PCI config space MSI register, from Yijing
    Wang.

 5) IPV6 MLD Query validation isn't strong enough, from Hangbin Liu.

 6) Fix double SKB free in openvswitch, from Andy Zhou.

 7) Fix sk_dst_set() being racey with UDP sockets, leading to strange
    crashes, from Eric Dumazet.

 8) Interpret the NAPI budget correctly in the new systemport driver,
    from Florian Fainelli.

 9) VLAN code frees percpu stats in the wrong place, leading to crashes
    in the get stats handler.  From Eric Dumazet.

10) TCP sockets doing a repair can crash with a divide by zero, because
    we invoke tcp_push() with an MSS value of zero.  Just skip that part
    of the sendmsg paths in repair mode.  From Christoph Paasch.

11) IRQ affinity bug fixes in mlx4 driver from Amir Vadai.

12) Don't ignore path MTU icmp messages with a zero mtu, machines out
    there still spit them out, and all of our per-protocol handlers for
    PMTU can cope with it just fine.  From Edward Allcutt.

13) Some NETDEV_CHANGE notifier invocations were not passing in the
    correct kind of cookie as the argument, from Loic Prylli.

14) Fix crashes in long multicast/broadcast reassembly, from Jon Paul
    Maloy.

15) ip_tunnel_lookup() doesn't interpret wildcard keys correctly, fix
    from Dmitry Popov.

16) Fix skb->sk assigned without taking a reference to 'sk' in
    appletalk, from Andrey Utkin.

17) Fix some info leaks in ULP event signalling to userspace in SCTP,
    from Daniel Borkmann.

18) Fix deadlocks in HSO driver, from Olivier Sobrie.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits)
  hso: fix deadlock when receiving bursts of data
  hso: remove unused workqueue
  net: ppp: don't call sk_chk_filter twice
  mlx4: mark napi id for gro_skb
  bonding: fix ad_select module param check
  net: pppoe: use correct channel MTU when using Multilink PPP
  neigh: sysctl - simplify address calculation of gc_* variables
  net: sctp: fix information leaks in ulpevent layer
  MAINTAINERS: update r8169 maintainer
  net: bcmgenet: fix RGMII_MODE_EN bit
  tipc: clear 'next'-pointer of message fragments before reassembly
  r8152: fix r8152_csum_workaround function
  be2net: set EQ DB clear-intr bit in be_open()
  GRE: enable offloads for GRE
  farsync: fix invalid memory accesses in fst_add_one() and fst_init_card()
  igb: do a reset on SR-IOV re-init if device is down
  igb: Workaround for i210 Errata 25: Slow System Clock
  usbnet: smsc95xx: add reset_resume function with reset operation
  dp83640: Always decode received status frames
  r8169: disable L23
  ...
2014-07-15 08:42:52 -07:00
Steven Rostedt (Red Hat) 5f8bf2d263 tracing: Fix graph tracer with stack tracer on other archs
Running my ftrace tests on PowerPC, it failed the test that checks
if function_graph tracer is affected by the stack tracer. It was.
Looking into this, I found that the update_function_graph_func()
must be called even if the trampoline function is not changed.
This is because archs like PowerPC do not support ftrace_ops being
passed by assembly and instead uses a helper function (what the
trampoline function points to). Since this function is not changed
even when multiple ftrace_ops are added to the code, the test that
falls out before calling update_function_graph_func() will miss that
the update must still be done.

Call update_function_graph_function() for all calls to
update_ftrace_function()

Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15 11:10:25 -04:00
zhangwei(Jovi) 8abfb8727f tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs
Currently trace option stacktrace is not applicable for
trace_printk with constant string argument, the reason is
in __trace_puts/__trace_bputs ftrace_trace_stack is missing.

In contrast, when using trace_printk with non constant string
argument(will call into __trace_printk/__trace_bprintk), then
trace option stacktrace is workable, this inconstant result
will confuses users a lot.

Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15 11:04:40 -04:00
Takashi Iwai 4da63c6fc4 ALSA: hda - Fix broken PM due to incomplete i915 initialization
When the initialization of Intel HDMI controller fails due to missing
i915 kernel symbols (e.g. HD-audio is built in while i915 is module),
the driver discontinues the probe.  However, since the probe was done
asynchronously, the driver object still remains, thus the relevant PM
ops are still called at suspend/resume. This results in the bad access
to the incomplete audio card object, eventually leads to Oops or stall
at PM.

This patch adds the missing checks of chip->init_failed flag at each
PM callback in order to fix the problem above.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79561
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2014-07-15 15:19:43 +02:00
Zhang Rui 653a3538f8 PM / sleep: fix freeze_ops NULL pointer dereferences
This patch fixes a NULL pointer dereference issue introduced by
commit 1f0b63866f (ACPI / PM: Hold ACPI scan lock over the "freeze"
sleep state).

Fixes: 1f0b63866f (ACPI / PM: Hold ACPI scan lock over the "freeze" sleep state)
Link: http://marc.info/?l=linux-pm&m=140541182017443&w=2
Reported-and-tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-15 14:27:30 +02:00
Takashi Iwai 4320f6b1d9 PM / sleep: Fix request_firmware() error at resume
The commit [247bc037: PM / Sleep: Mitigate race between the freezer
and request_firmware()] introduced the finer state control, but it
also leads to a new bug; for example, a bug report regarding the
firmware loading of intel BT device at suspend/resume:
  https://bugzilla.novell.com/show_bug.cgi?id=873790

The root cause seems to be a small window between the process resume
and the clear of usermodehelper lock.  The request_firmware() function
checks the UMH lock and gives up when it's in UMH_DISABLE state.  This
is for avoiding the invalid  f/w loading during suspend/resume phase.
The problem is, however, that usermodehelper_enable() is called at the
end of thaw_processes().  Thus, a thawed process in between can kick
off the f/w loader code path (in this case, via btusb_setup_intel())
even before the call of usermodehelper_enable().  Then
usermodehelper_read_trylock() returns an error and request_firmware()
spews WARN_ON() in the end.

This oneliner patch fixes the issue just by setting to UMH_FREEZING
state again before restarting tasks, so that the call of
request_firmware() will be blocked until the end of this function
instead of returning an error.

Fixes: 247bc03742 (PM / Sleep: Mitigate race between the freezer and request_firmware())
Link: https://bugzilla.novell.com/show_bug.cgi?id=873790
Cc: 3.4+ <stable@vger.kernel.org> # 3.4+
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-15 14:27:29 +02:00
Dave Airlie 85d9e14cfd Merge branch 'linux-3.16' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
deadlock fix.

* 'linux-3.16' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
  drm/nouveau/therm: fix a potential deadlock in the therm monitoring code
2014-07-15 13:25:58 +10:00
Martin Peres bb78e7a12a drm/nouveau/therm: fix a potential deadlock in the therm monitoring code
Signed-off-by: Martin Peres <martin.peres@free.fr>
Tested-by: Stefan Ringel <mail@stefanringel.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2014-07-15 12:33:00 +10:00
Olivier Sobrie 8f9818af4e hso: fix deadlock when receiving bursts of data
When the module sends bursts of data, sometimes a deadlock happens in
the hso driver when the tty buffer doesn't get the chance to be flushed
quickly enough.

Remove the endless while loop in function put_rxbuf_data() which is
called by the urb completion handler.
If there isn't enough room in the tty buffer, discards all the data
received in the URB.

Cc: David Miller <davem@davemloft.net>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
Cc: Dan Williams <dcbw@redhat.com>
Cc: Jan Dumon <j.dumon@option.com>
Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 19:27:34 -07:00
Olivier Sobrie 5c763edfe4 hso: remove unused workqueue
The workqueue "retry_unthrottle_workqueue" is not scheduled anywhere
in the code. So, remove it.

Signed-off-by: Olivier Sobrie <olivier@sobrie.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 19:27:34 -07:00
Linus Torvalds 1b81e88189 IEEE 1394 (FireWire) subsystem fix: The 1394 drivers cannot and are not supposed
to be built on platforms which don't provide the DMA mapping API.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJTxCGPAAoJEHnzb7JUXXnQ0W8QAKTGboWwSU5kznTcWGveIgEl
 fckaDnyinK9Vd6lOpixxsSc0rxdaKPmMmoC1ta2AGcD4NgRyfM2aD3Lw6dArzlHr
 xgcDRPE4K/41atxf6Z/wZnsXFSLrNEJbr6e8oZncGDzcRp4DKzv7tX2meNzyQRFE
 ns0/sjeN7QJEamd8vUkELiVdBED7NLiQjxXdULQ9AjY/cC3byzWAHzyoII0ff5NY
 +URYGtTIxwJ2KoNAtqD9OAazwPUV8rt1XXWnqkUHS89TnACsKiWmHpaBq+6jTkqG
 YsRxY9h0eZ+sSHoeEBUS2WLiaKWufvsBWBTCOUVoO0eS2iYHJQ1SwSLYU1+Zh0IR
 Fey2N2YaMA1HNthJ0OJJdC+sqSZ/tBmE+ytN7rTVHu0Yw0ZffzSPgJ303jIO0y87
 rYQ57K6dXhn5kDqh/PheDJCZ1OwMqz+zvgJDyv9XaEcm9lynDmmhc2IqmzlqONmz
 0UbiHITiqMrv6bSzZ83+PVeoROw2DBgH/fEu/Xk2vC1hBA7s4NynGSllEht1tXT3
 zLbVW/bBbZfjqc0a1UADBRHVKSllac+BPlxvGHSUxJusqC2VN9S9tgEhboj7+baY
 7VHbGkuCaEqtZeErzfNTN9uGvUsBN0KGDatW4Kp0g2CZ33dFDpJFOt7O5QaephnF
 DhlO4daXNsvHvFVgpBGZ
 =vY8J
 -----END PGP SIGNATURE-----

Merge tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire fix from Stefan Richter:
 "The 1394 drivers cannot and are not supposed to be built on platforms
  which don't provide the DMA mapping API (regression since v3.16-rc1
  with CONFIG_COMPILE_TEST=y on some architectures)"

* tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: IEEE 1394 (FireWire) support should depend on HAS_DMA
2014-07-14 17:17:53 -07:00
Linus Torvalds 8ec8ba8e7d Merge git://git.kvack.org/~bcrl/aio-fixes
Pull another aio fix from Ben LaHaise:
 "put_reqs_available() can now be called from within irq context, which
  means that it (and its sibling function get_reqs_available()) now need
  to be irq-safe, not just preempt-safe"

* git://git.kvack.org/~bcrl/aio-fixes:
  aio: protect reqs_available updates from changes in interrupt handlers
2014-07-14 17:11:50 -07:00
Sasha Levin 3cf521f7dc net/l2tp: don't fall back on UDP [get|set]sockopt
The l2tp [get|set]sockopt() code has fallen back to the UDP functions
for socket option levels != SOL_PPPOL2TP since day one, but that has
never actually worked, since the l2tp socket isn't an inet socket.

As David Miller points out:

  "If we wanted this to work, it'd have to look up the tunnel and then
   use tunnel->sk, but I wonder how useful that would be"

Since this can never have worked so nobody could possibly have depended
on that functionality, just remove the broken code and return -EINVAL.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: James Chapman <jchapman@katalix.com>
Acked-by: David Miller <davem@davemloft.net>
Cc: Phil Turnbull <phil.turnbull@oracle.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-14 17:02:31 -07:00
Christoph Schulz 3916a31927 net: ppp: don't call sk_chk_filter twice
Commit 568f194e8b ("net: ppp: use
sk_unattached_filter api") causes sk_chk_filter() to be called twice when
setting a PPP pass or active filter. This applies to both the generic PPP
subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP
subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The first call is from
within get_filter(). The second one is through the call chain

  ppp_ioctl() or isdn_ppp_ioctl()
  --> sk_unattached_filter_create()
      --> __sk_prepare_filter()
          --> sk_chk_filter()

The first call from within get_filter() should be deleted as get_filter() is
called just before calling sk_unattached_filter_create() later on, which
eventually calls sk_chk_filter() anyway.

For 3.15.x, this proposed change is a bugfix rather than a pure optimization as
in that branch, sk_chk_filter() may replace filter codes by other codes which
are not recognized when executing sk_chk_filter() a second time. So with
3.15.x, if sk_chk_filter() is called twice, the second invocation may yield
EINVAL (this depends on the filter codes found in the filter to be set, but
because the replacement is done for frequently used codes, this is almost
always the case). The net effect is that setting pass and/or active PPP filters
does not work anymore, since sk_unattached_filter_create() always returns
EINVAL due to the second call to sk_chk_filter(), regardless whether the filter
was originally sane or not.

Signed-off-by: Christoph Schulz <develop@kristov.de>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 16:15:12 -07:00
Jason Wang 32b333fe99 mlx4: mark napi id for gro_skb
Napi id was not marked for gro_skb, this will lead rx busy loop won't
work correctly since they stack never try to call low latency receive
method because of a zero socket napi id. Fix this by marking napi id
for gro_skb.

The transaction rate of 1 byte netperf tcp_rr gets about 50% increased
(from 20531.68 to 30610.88).

Cc: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 16:14:16 -07:00
Nikolay Aleksandrov 548d28bd0e bonding: fix ad_select module param check
Obvious copy/paste error when I converted the ad_select to the new
option API. "lacp_rate" there should be "ad_select" so we can get the
proper value.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: David S. Miller <davem@davemloft.net>

Fixes: 9e5f5eebe7 ("bonding: convert ad_select to use the new option
API")
Reported-by: Karim Scheik <karim.scheik@prisma-solutions.at>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 14:36:58 -07:00
Christoph Schulz a8a3e41c67 net: pppoe: use correct channel MTU when using Multilink PPP
The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see
ppp_generic module) tries to determine how big a fragment might be. According
to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the
corresponding comment and code in ppp_mp_explode():

		/*
		 * hdrlen includes the 2-byte PPP protocol field, but the
		 * MTU counts only the payload excluding the protocol field.
		 * (RFC1661 Section 2)
		 */
		mtu = pch->chan->mtu - (hdrlen - 2);

However, the pppoe module *does* include the PPP protocol field in the channel
MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under
certain circumstances (one byte if PPP protocol compression is used, two
otherwise), causing the generated Ethernet packets to be dropped. So the pppoe
module has to subtract two bytes from the channel MTU. This error only
manifests itself when using Multilink PPP, as otherwise the channel MTU is not
used anywhere.

In the following, I will describe how to reproduce this bug. We configure two
pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with
a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is
computed by adding the two link MTUs and subtracting the MP header twice, which
is 4 bytes long.) The necessary pppd statements on both sides are "multilink
mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin
rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server
side, we additionally need to start two pppoe-server instances to be able to
establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU
of the PPP network interface to the MRRU (2976) on both sides of the connection
in order to make use of the higher bandwidth. (If we didn't do that, IP
fragmentation would kick in, which we want to avoid.)

Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to
server over the PPP link. This results in the following network packet:

   2948 (echo payload)
 +    8 (ICMPv4 header)
 +   20 (IPv4 header)
---------------------
   2976 (PPP payload)

These 2976 bytes do not exceed the MTU of the PPP network interface, so the
IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode()
prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger
than the negotiated MRRU. So this packet would have to be divided in three
fragments. But this does not happen as each link MTU is assumed to be two bytes
larger. So this packet is diveded into two fragments only, one of size 1489 and
one of size 1488. Now we have for that bigger fragment:

   1489 (PPP payload)
 +    4 (MP header)
 +    2 (PPP protocol field for the MP payload (0x3d))
 +    6 (PPPoE header)
--------------------------
   1501 (Ethernet payload)

This packet exceeds the link MTU and is discarded.

If one configures the link MTU on the client side to 1501, one can see the
discarded Ethernet frames with tcpdump running on the client. A

ping -s 2948 -c 1 192.168.15.254

leads to the smaller fragment that is correctly received on the server side:

(tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d)
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [end], length 1492

and to the bigger fragment that is not received on the server side:

(tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d)
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1515: PPPoE  [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000,
  Flags [begin], length 1493

With the patch below, we correctly obtain three fragments:

52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [begin], length 1492
52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
  length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
  Flags [none], length 1492
52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
  length 27: PPPoE  [ses 0x1] MLPPP (0x003d), length 7: seq 0x000,
  Flags [end], length 5

And the ICMPv4 echo request is successfully received at the server side:

IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1),
  length 2976)
    192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0,
      length 2956

The bug was introduced in commit c9aa689537
("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies
to 3.10 upwards but the fix can be applied (with minor modifications) to
kernels as old as 2.6.32.

Signed-off-by: Christoph Schulz <develop@kristov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 14:35:46 -07:00
Mathias Krause 9ecf07a1d8 neigh: sysctl - simplify address calculation of gc_* variables
The code in neigh_sysctl_register() relies on a specific layout of
struct neigh_table, namely that the 'gc_*' variables are directly
following the 'parms' member in a specific order. The code, though,
expresses this in the most ugly way.

Get rid of the ugly casts and use the 'tbl' pointer to get a handle to
the table. This way we can refer to the 'gc_*' variables directly.

Similarly seen in the grsecurity patch, written by Brad Spengler.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 14:32:51 -07:00
Dave Chinner 03e01349c6 xfs: null unused quota inodes when quota is on
When quota is on, it is expected that unused quota inodes have a
value of NULLFSINO. The changes to support a separate project quota
in 3.12 broken this rule for non-project quota inode enabled
filesystem, as the code now refuses to write the group quota inode
if neither group or project quotas are enabled. This regression was
introduced by commit d892d58 ("xfs: Start using pquotaino from the
superblock").

In this case, we should be writing NULLFSINO rather than nothing to
ensure that we leave the group quota inode in a valid state while
quotas are enabled.

Failure to do so doesn't cause a current kernel to break - the
separate project quota inodes introduced translation code to always
treat a zero inode as NULLFSINO. This was introduced by commit
0102629 ("xfs: Initialize all quota inodes to be NULLFSINO") with is
also in 3.12 but older kernels do not do this and hence taking a
filesystem back to an older kernel can result in quotas failing
initialisation at mount time. When that happens, we see this in
dmesg:

[ 1649.215390] XFS (sdb): Mounting Filesystem
[ 1649.316894] XFS (sdb): Failed to initialize disk quotas.
[ 1649.316902] XFS (sdb): Ending clean mount

By ensuring that we write NULLFSINO to quota inodes that aren't
active, we avoid this problem. We have to be really careful when
determining if the quota inodes are active or not, because we don't
want to write a NULLFSINO if the quota inodes are active and we
simply aren't updating them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2014-07-15 07:28:41 +10:00
Daniel Borkmann 8f2e5ae40e net: sctp: fix information leaks in ulpevent layer
While working on some other SCTP code, I noticed that some
structures shared with user space are leaking uninitialized
stack or heap buffer. In particular, struct sctp_sndrcvinfo
has a 2 bytes hole between .sinfo_flags and .sinfo_ppid that
remains unfilled by us in sctp_ulpevent_read_sndrcvinfo() when
putting this into cmsg. But also struct sctp_remote_error
contains a 2 bytes hole that we don't fill but place into a skb
through skb_copy_expand() via sctp_ulpevent_make_remote_error().

Both structures are defined by the IETF in RFC6458:

* Section 5.3.2. SCTP Header Information Structure:

  The sctp_sndrcvinfo structure is defined below:

  struct sctp_sndrcvinfo {
    uint16_t sinfo_stream;
    uint16_t sinfo_ssn;
    uint16_t sinfo_flags;
    <-- 2 bytes hole  -->
    uint32_t sinfo_ppid;
    uint32_t sinfo_context;
    uint32_t sinfo_timetolive;
    uint32_t sinfo_tsn;
    uint32_t sinfo_cumtsn;
    sctp_assoc_t sinfo_assoc_id;
  };

* 6.1.3. SCTP_REMOTE_ERROR:

  A remote peer may send an Operation Error message to its peer.
  This message indicates a variety of error conditions on an
  association. The entire ERROR chunk as it appears on the wire
  is included in an SCTP_REMOTE_ERROR event. Please refer to the
  SCTP specification [RFC4960] and any extensions for a list of
  possible error formats. An SCTP error notification has the
  following format:

  struct sctp_remote_error {
    uint16_t sre_type;
    uint16_t sre_flags;
    uint32_t sre_length;
    uint16_t sre_error;
    <-- 2 bytes hole  -->
    sctp_assoc_t sre_assoc_id;
    uint8_t  sre_data[];
  };

Fix this by setting both to 0 before filling them out. We also
have other structures shared between user and kernel space in
SCTP that contains holes (e.g. struct sctp_paddrthlds), but we
copy that buffer over from user space first and thus don't need
to care about it in that cases.

While at it, we can also remove lengthy comments copied from
the draft, instead, we update the comment with the correct RFC
number where one can look it up.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 14:18:56 -07:00
Dave Airlie c693099294 Revert "drm/i915: reverse dp link param selection, prefer fast over wide again"
This reverts commit 38aecea0cc.

This breaks Haswell Thinkpad + Lenovo dock in SST mode with a HDMI monitor attached.

Before this we can 1920x1200 mode, after this we only ever get 1024x768, and
a lot of deferring.

This didn't revert clean, but this should be fine.

bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1117008
Cc: stable@vger.kernel.org # v3.15
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-07-14 23:16:54 +02:00