This patch adds two new functions to enable/disable
the watchdog across all CPUs.
This will be used by the HT PMU bug workaround code to
disable/enable the NMI watchdog across quirk enablement.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: bp@alien8.de
Cc: jolsa@redhat.com
Cc: kan.liang@intel.com
Cc: maria.n.dimakopoulou@gmail.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1416251225-17721-12-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For counters that generate AUX data that is bound to the context of a
running task, such as instruction tracing, the decoder needs to know
exactly which task is running when the event is first scheduled in,
before the first sched_switch. The decoder's need to know this stems
from the fact that instruction flow trace decoding will almost always
require program's object code in order to reconstruct said flow and
for that we need at least its pid/tid in the perf stream.
To single out such instruction tracing pmus, this patch introduces
ITRACE PMU capability. The reason this is not part of RECORD_AUX
record is that not all pmus capable of generating AUX data need this,
and the opposite is *probably* also true.
While sched_switch covers for most cases, there are two problems with it:
the consumer will need to process events out of order (that is, having
found RECORD_AUX, it will have to skip forward to the nearest sched_switch
to figure out which task it was, then go back to the actual trace to
decode it) and it completely misses the case when the tracing is enabled
and disabled before sched_switch, for example, via PERF_EVENT_IOC_DISABLE.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-15-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For pmus that wish to write data to ring buffer's AUX area, provide
perf_aux_output_{begin,end}() calls to initiate/commit data writes,
similarly to perf_output_{begin,end}. These also use the same output
handle structure. Also, similarly to software counterparts, these
will direct inherited events' output to parents' ring buffers.
After the perf_aux_output_begin() returns successfully, handle->size
is set to the maximum amount of data that can be written wrt aux_tail
pointer, so that no data that the user hasn't seen will be overwritten,
therefore this should always be called before hardware writing is
enabled. On success, this will return the pointer to pmu driver's
private structure allocated for this aux area by pmu::setup_aux. Same
pointer can also be retrieved using perf_get_aux() while hardware
writing is enabled.
PMU driver should pass the actual amount of data written as a parameter
to perf_aux_output_end(). All hardware writes should be completed and
visible before this one is called.
Additionally, perf_aux_output_skip() will adjust output handle and
aux_head in case some part of the buffer has to be skipped over to
maintain hardware's alignment constraints.
Nested writers are forbidden and guards are in place to catch such
attempts.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-8-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Usually, pmus that do, for example, instruction tracing, would only ever
be able to have one event per task per cpu (or per perf_event_context). For
such pmus it makes sense to disallow creating conflicting events early on,
so as to provide consistent behavior for the user.
This patch adds a pmu capability that indicates such constraint on event
creation.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1422613866-113186-1-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For pmus that don't support scatter-gather for AUX data in hardware, it
might still make sense to implement software double buffering to avoid
losing data while the user is reading data out. For this purpose, add
a pmu capability that guarantees multiple high-order chunks for AUX buffer,
so that the pmu driver can do switchover tricks.
To make use of this feature, add PERF_PMU_CAP_AUX_SW_DOUBLEBUF to your
pmu's capability mask. This will make the ring buffer AUX allocation code
ensure that the biggest high order allocation for the aux buffer pages is
no bigger than half of the total requested buffer size, thus making sure
that the buffer has at least two high order allocations.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-5-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some pmus (such as BTS or Intel PT without multiple-entry ToPA capability)
don't support scatter-gather and will prefer larger contiguous areas for
their output regions.
This patch adds a new pmu capability to request higher order allocations.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-4-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch introduces "AUX space" in the perf mmap buffer, intended for
exporting high bandwidth data streams to userspace, such as instruction
flow traces.
AUX space is a ring buffer, defined by aux_{offset,size} fields in the
user_page structure, and read/write pointers aux_{head,tail}, which abide
by the same rules as data_* counterparts of the main perf buffer.
In order to allocate/mmap AUX, userspace needs to set up aux_offset to
such an offset that will be greater than data_offset+data_size and
aux_size to be the desired buffer size. Both need to be page aligned.
Then, same aux_offset and aux_size should be passed to mmap() call and
if everything adds up, you should have an AUX buffer as a result.
Pages that are mapped into this buffer also come out of user's mlock
rlimit plus perf_event_mlock_kb allowance.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: markus.t.metzger@intel.com
Cc: mathieu.poirier@linaro.org
Link: http://lkml.kernel.org/r/1421237903-181015-3-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
BPF programs, attached to kprobes, provide a safe way to execute
user-defined BPF byte-code programs without being able to crash or
hang the kernel in any way. The BPF engine makes sure that such
programs have a finite execution time and that they cannot break
out of their sandbox.
The user interface is to attach to a kprobe via the perf syscall:
struct perf_event_attr attr = {
.type = PERF_TYPE_TRACEPOINT,
.config = event_id,
...
};
event_fd = perf_event_open(&attr,...);
ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
'prog_fd' is a file descriptor associated with BPF program
previously loaded.
'event_id' is an ID of the kprobe created.
Closing 'event_fd':
close(event_fd);
... automatically detaches BPF program from it.
BPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- probe_read - wraper of probe_kernel_read() used to access any
kernel data structures
BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
architecture dependent) and return 0 to ignore the event and 1 to store
kprobe event into the ring buffer.
Note, kprobes are a fundamentally _not_ a stable kernel ABI,
so BPF programs attached to kprobes must be recompiled for
every kernel version and user must supply correct LINUX_VERSION_CODE
in attr.kern_version during bpf_prog_load() call.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
add TRACE_EVENT_FL_KPROBE flag to differentiate kprobe type of
tracepoints, since bpf programs can only be attached to kprobe
type of PERF_TYPE_TRACEPOINT perf events.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1427312966-8434-3-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Socket filter code and other subsystems with upcoming eBPF
support should not need to deal with the fact that we have
CONFIG_BPF_SYSCALL defined or not.
Having the bpf syscall as a config option is a nice thing and
I'd expect it to stay that way for expert users (I presume one
day the default setting of it might change, though), but code
making use of it should not care if it's actually enabled or
not.
Instead, hide this via header files and let the rest deal with it.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1427312966-8434-2-git-send-email-ast@plumgrid.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While thinking on the whole clock discussion it occurred to me we have
two distinct uses of time:
1) the tracking of event/ctx/cgroup enabled/running/stopped times
which includes the self-monitoring support in struct
perf_event_mmap_page.
2) the actual timestamps visible in the data records.
And we've been conflating them.
The first is all about tracking time deltas, nobody should really care
in what time base that happens, its all relative information, as long
as its internally consistent it works.
The second however is what people are worried about when having to
merge their data with external sources. And here we have the
discussion on MONOTONIC vs MONOTONIC_RAW etc..
Where MONOTONIC is good for correlating between machines (static
offset), MONOTNIC_RAW is required for correlating against a fixed rate
hardware clock.
This means configurability; now 1) makes that hard because it needs to
be internally consistent across groups of unrelated events; which is
why we had to have a global perf_clock().
However, for 2) it doesn't really matter, perf itself doesn't care
what it writes into the buffer.
The below patch makes the distinction between these two cases by
adding perf_event_clock() which is used for the second case. It
further makes this configurable on a per-event basis, but adds a few
sanity checks such that we cannot combine events with different clocks
in confusing ways.
And since we then have per-event configurability we might as well
retain the 'legacy' behaviour as a default.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
An upcoming patch will depend on tai_ns() and NMI-safe ktime_get_raw_fast(),
so merge timers/core here in a separate topic branch until it's all cooked
and timers/core is merged upstream.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Because it was the only clock for which we didn't have a _ns()
accessor yet.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce tkr_raw and make use of it.
base_raw -> tkr_raw.base
clock->{mult,shift} -> tkr_raw.{mult.shift}
Kill timekeeping_get_ns_raw() in favour of
timekeeping_get_ns(&tkr_raw), this removes all mono_raw special
casing.
Duplicate the updates to tkr_mono.cycle_last into tkr_raw.cycle_last,
both need the same value.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.422589590@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation of adding another tkr field, rename this one to
tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base,
since the structure is not specific to CLOCK_MONOTONIC and the mono
name got added to the tk_read_base instance.
Lots of trivial churn.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dave Chinner reported the following on https://lkml.org/lkml/2015/3/1/226
Across the board the 4.0-rc1 numbers are much slower, and the degradation
is far worse when using the large memory footprint configs. Perf points
straight at the cause - this is from 4.0-rc1 on the "-o bhash=101073" config:
- 56.07% 56.07% [kernel] [k] default_send_IPI_mask_sequence_phys
- default_send_IPI_mask_sequence_phys
- 99.99% physflat_send_IPI_mask
- 99.37% native_send_call_func_ipi
smp_call_function_many
- native_flush_tlb_others
- 99.85% flush_tlb_page
ptep_clear_flush
try_to_unmap_one
rmap_walk
try_to_unmap
migrate_pages
migrate_misplaced_page
- handle_mm_fault
- 99.73% __do_page_fault
trace_do_page_fault
do_async_page_fault
+ async_page_fault
0.63% native_send_call_func_single_ipi
generic_exec_single
smp_call_function_single
This is showing excessive migration activity even though excessive
migrations are meant to get throttled. Normally, the scan rate is tuned
on a per-task basis depending on the locality of faults. However, if
migrations fail for any reason then the PTE scanner may scan faster if
the faults continue to be remote. This means there is higher system CPU
overhead and fault trapping at exactly the time we know that migrations
cannot happen. This patch tracks when migration failures occur and
slows the PTE scanner.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull libata fix from Tejun Heo:
"One patch to fix a regression from the recent switch to blk-mq tag
allocation which can cause oops on SAS-attached SATA drives"
* 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ata: Add a new flag to destinguish sas controller
- fix thin target to always zero-fill reads to unprovisioned blocks
- fix to interlock device destruction's suspend from internal suspends
- fix 2 snapshot exception store handover bugs
- fix dm-io to cope with DISCARD and WRITE_SAME capabilities changing
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVDIJ8AAoJEMUj8QotnQNaynEIAK4Q3mvOeI6PIwuC2iOWRglJ
C0xzTKMR6VDQeM/uMRLeiiqxdPs6b89OdSmJHKJYOrdsusjG0a4IBAO9vDayb6sW
AxOlbnpXdoiH3cqQ4dISJIfS9qM49qGM40CAflcLKYsGfLnstVLt+4l9HaWaRrHj
b2kAC+I6PDDybFlKD5KsMB56sjzhhqg/lnnwkY2omV4vLHf6RrhVhK5gcEPF7VN0
SJ5HKSNBHQnpYSEECHXkxGvZ/ZKTe9n7q0t6Q3nnTSUFTXS6esgTsK1q2AykADDR
3W1LkrAAFP31AWKPkUwCEe3C8Rk3rCsrq2H14woWX1/UxSXNPlMYCU50ak4TVmc=
=Svyk
-----END PGP SIGNATURE-----
Merge tag 'dm-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull devicemapper fixes from Mike Snitzer:
"A handful of stable fixes for DM:
- fix thin target to always zero-fill reads to unprovisioned blocks
- fix to interlock device destruction's suspend from internal
suspends
- fix 2 snapshot exception store handover bugs
- fix dm-io to cope with DISCARD and WRITE_SAME capabilities changing"
* tag 'dm-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm io: deal with wandering queue limits when handling REQ_DISCARD and REQ_WRITE_SAME
dm snapshot: suspend merging snapshot when doing exception handover
dm snapshot: suspend origin when doing exception handover
dm: hold suspend_lock while suspending device during device deletion
dm thin: fix to consistently zero-fill reads to unprovisioned blocks
- Fix up consumer return values on pin control stubs.
- Four patches fixing up the interrupt handling and
sleep context save in the Baytrail driver.
- Make default output directions work properly in the
Cherryview driver.
- Fix interrupt locking in the AT91 driver.
- Fix setting interrupt generating lines as input in
the sunxi driver.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVCo4HAAoJEEEQszewGV1z+HQQAMAzKL9igh3iCDR8p3tmB1sp
ZBTgWl4NHT9MFeA1AmilVEn1JbmUDmDetTN7P/sVC8mxaJiheY8PrFbj4bwBsvzA
JV0mlSBSj7jw8CxNM9rzSWZxRNtdpKr45siLA1SBPdni2x711SRW1H57eK73UQnQ
PEVW9hWzXY4cHU6Q7dX67YDJuteQsu5A1QCy6hBYX4Kyyy5gT8RJs6lAvx2f5k8g
Gsdgs60T/bTmIAiQT3FIf6VUQezW2m1PZn2fhuJeUCZWM17ej2YjVoanQUKu3Bz5
GPvV3wt2FpbKJsum5p4FJQPRD2qPsuq4jg7Msk6QOiVWHOl/QtL30AnS6N/iQ97z
TlblAH3ze2t182rHeI4J8d4FIX8jRfftb9DHlgBhLZFU4k0EanMvqEWN6/8yqgmy
n5nYUB88y6rI5RRLoGAStudlRIHqpz0fFvsU6IW5aZ7wpobIv+JPtMBUbfIKdQBV
Xj39LDJj9W5jtI7Icl2v8q7oTknnEa7rUuH/VYbptMLkXBxndWPKG2JLFwfhQ3Py
KMZvFdLP7E2uAR89KNvqQxbQQuYOK8wx5T2nFV57wVPX6VFv/G9sKUqdS5F7iraS
qg8aBloBN9k79mBBWyIo/XEdluYC/zuKf2KPRvH7UhXep+e9iqCSxWyXqgbgv6F9
MtWvNLoK9kZyxymnqbmJ
=qp0z
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here is a slew of pin control fixes I've accumulated for the v4.0
kernel. Nothing special, just driver fixes (mainly embedded Intel it
seems) and a misunderstanding regarding the stub functions was
reverted:
- Fix up consumer return values on pin control stubs.
- Four patches fixing up the interrupt handling and sleep context
save in the Baytrail driver.
- Make default output directions work properly in the Cherryview
driver.
- Fix interrupt locking in the AT91 driver.
- Fix setting interrupt generating lines as input in the sunxi
driver"
* tag 'pinctrl-v4.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: sun4i: GPIOs configured as irq must be set to input before reading
pinctrl: at91: move lock/unlock_as_irq calls into request/release
pinctrl: update direction_output function of cherryview driver
pinctrl: baytrail: Save pin context over system sleep
pinctrl: baytrail: Rework interrupt handling
pinctrl: baytrail: Clear interrupt triggering from pins that are in GPIO mode
pinctrl: baytrail: Relax GPIO request rules
Revert "pinctrl: consumer: use correct retval for placeholder functions"
Pull networking fixes from David Miller:
1) Fix packet header offset calculation in _decode_session6(), from
Hajime Tazaki.
2) Fix route leak in error paths of xfrm_lookup(), from Huaibin Wang.
3) Be sure to clear state properly when scans fail in iwlwifi mvm code,
from Luciano Coelho.
4) iwlwifi tries to stop scans that aren't actually running, also from
Luciano Coelho.
5) mac80211 should drop mesh frames that are not encrypted, fix from
Bob Copeland.
6) Add new device ID to b43 wireless driver for BCM432228 chips, from
Rafał Miłecki.
7) Fix accidental addition of members after variable sized array in
struct tc_u_hnode, from WANG Cong.
8) Don't re-enable interrupts until after we call napi_complete() in
ibmveth and WIZnet drivers, frm Yongbae Park.
9) Fix regression in vlan tag handling of fec driver, from Fugang Duan.
10) If a network namespace change fails during rtnl_newlink(), we don't
unwind the device registry properly.
11) Fix two TCP regressions, from Neal Cardwell:
- Don't allow snd_cwnd_cnt to accumulate huge values due to missing
test in tcp_cong_avoid_ai().
- Restore CUBIC back to advancing cwnd by 1.5x packets per RTT.
12) Fix performance regression in xne-netback involving push TX
notifications, from David Vrabel.
13) __skb_tstamp_tx() can be called with a NULL sk pointer, do not
dereference blindly. From Willem de Bruijn.
14) Fix potential stack overflow in RDS protocol stack, from Arnd
Bergmann.
15) VXLAN_VID_MASK used incorrectly in new remote checksum offload
support of VXLAN driver. Fix from Alexey Kodanev.
16) Fix too small netlink SKB allocation in inet_diag layer, from Eric
Dumazet.
17) ieee80211_check_combinations() does not count interfaces correctly,
from Andrei Otcheretianski.
18) Hardware feature determination in bxn2x driver references a piece of
software state that actually isn't initialized yet, fix from Michal
Schmidt.
19) inet_csk_wait_for_connect() needs a sched_annotate_sleep()
annoation, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
Revert "net: cx82310_eth: use common match macro"
net/mlx4_en: Set statistics bitmap at port init
IB/mlx4: Saturate RoCE port PMA counters in case of overflow
net/mlx4_en: Fix off-by-one in ethtool statistics display
IB/mlx4: Verify net device validity on port change event
act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome
Revert "smc91x: retrieve IRQ and trigger flags in a modern way"
inet: Clean up inet_csk_wait_for_connect() vs. might_sleep()
ip6_tunnel: fix error code when tunnel exists
netdevice.h: fix ndo_bridge_* comments
bnx2x: fix encapsulation features on 57710/57711
mac80211: ignore CSA to same channel
nl80211: ignore HT/VHT capabilities without QoS/WMM
mac80211: ask for ECSA IE to be considered for beacon parse CRC
mac80211: count interfaces correctly for combination checks
isdn: icn: use strlcpy() when parsing setup options
rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg()
caif: fix MSG_OOB test in caif_seqpkt_recvmsg()
bridge: reset bridge mtu after deleting an interface
can: kvaser_usb: Fix tx queue start/stop race conditions
...
SAS controller has its own tag allocation, which doesn't directly match to ATA
tag, so SAS and SATA have different code path for ata tags. Originally we use
port->scsi_host (98bd4be1) to destinguish SAS controller, but libsas set
->scsi_host too, so we can't use it for the destinguish, we add a new flag for
this purpose.
Without this patch, the following oops can happen because scsi-mq uses
a host-wide tag map shared among all devices with some integer tag
values >= ATA_MAX_QUEUE. These unexpectedly high tag values cause
__ata_qc_from_tag() to return NULL, which is then dereferenced in
ata_qc_new_init().
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff804fd46e>] ata_qc_new_init+0x3e/0x120
PGD 32adf0067 PUD 32adf1067 PMD 0
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi igb
i2c_algo_bit ptp pps_core pm80xx libsas scsi_transport_sas sg coretemp
eeprom w83795 i2c_i801
CPU: 4 PID: 1450 Comm: cydiskbench Not tainted 4.0.0-rc3 #1
Hardware name: Supermicro X8DTH-i/6/iF/6F/X8DTH, BIOS 2.1b 05/04/12
task: ffff8800ba86d500 ti: ffff88032a064000 task.ti: ffff88032a064000
RIP: 0010:[<ffffffff804fd46e>] [<ffffffff804fd46e>] ata_qc_new_init+0x3e/0x120
RSP: 0018:ffff88032a067858 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8800ba0d2230 RCX: 000000000000002a
RDX: ffffffff80505ae0 RSI: 0000000000000020 RDI: ffff8800ba0d2230
RBP: ffff88032a067868 R08: 0000000000000201 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800ba0d0000
R13: ffff8800ba0d2230 R14: ffffffff80505ae0 R15: ffff8800ba0d0000
FS: 0000000041223950(0063) GS:ffff88033e480000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000058 CR3: 000000032a0a3000 CR4: 00000000000006e0
Stack:
ffff880329eee758 ffff880329eee758 ffff88032a0678a8 ffffffff80502dad
ffff8800ba167978 ffff880329eee758 ffff88032bf9c520 ffff8800ba167978
ffff88032bf9c520 ffff88032bf9a290 ffff88032a0678b8 ffffffff80506909
Call Trace:
[<ffffffff80502dad>] ata_scsi_translate+0x3d/0x1b0
[<ffffffff80506909>] ata_sas_queuecmd+0x149/0x2a0
[<ffffffffa0046650>] sas_queuecommand+0xa0/0x1f0 [libsas]
[<ffffffff804ea544>] scsi_dispatch_cmd+0xd4/0x1a0
[<ffffffff804eb50f>] scsi_queue_rq+0x66f/0x7f0
[<ffffffff803e5098>] __blk_mq_run_hw_queue+0x208/0x3f0
[<ffffffff803e54b8>] blk_mq_run_hw_queue+0x88/0xc0
[<ffffffff803e5c74>] blk_mq_insert_request+0xc4/0x130
[<ffffffff803e0b63>] blk_execute_rq_nowait+0x73/0x160
[<ffffffffa0023fca>] sg_common_write+0x3da/0x720 [sg]
[<ffffffffa0025100>] sg_new_write+0x250/0x360 [sg]
[<ffffffffa0025feb>] sg_write+0x13b/0x450 [sg]
[<ffffffff8032ec91>] vfs_write+0xd1/0x1b0
[<ffffffff8032ee54>] SyS_write+0x54/0xc0
[<ffffffff80689932>] system_call_fastpath+0x12/0x17
tj: updated description.
Fixes: 12cb5ce101 ("libata: use blk taging")
Reported-and-tested-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Pull livepatching fix from Jiri Kosina:
- fix for potential race with module loading, from Petr Mladek.
The race is very unlikely to be seen in real world and has been found
by code inspection, but should be fixed for 4.0 anyway.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Fix subtle race with coming and going modules
The argument 'flags' was missing in ndo_bridge_setlink().
ndo_bridge_dellink() was missing.
Fixes: 407af3299e ("bridge: Add netlink interface to configure vlans on bridge ports")
Fixes: add511b382 ("bridge: add flags argument to ndo_bridge_setlink and ndo_bridge_dellink")
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The register offset for REGEN2_CTRL in different for TPS659038 chip as when
compared with other Palmas family PMICs. In the case of TPS659038 the wrong
offset pointed to PLLEN_CTRL and was causing a hang. Correcting the same.
Signed-off-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
There is a notifier that handles live patches for coming and going modules.
It takes klp_mutex lock to avoid races with coming and going patches but
it does not keep the lock all the time. Therefore the following races are
possible:
1. The notifier is called sometime in STATE_MODULE_COMING. The module
is visible by find_module() in this state all the time. It means that
new patch can be registered and enabled even before the notifier is
called. It might create wrong order of stacked patches, see below
for an example.
2. New patch could still see the module in the GOING state even after
the notifier has been called. It will try to initialize the related
object structures but the module could disappear at any time. There
will stay mess in the structures. It might even cause an invalid
memory access.
This patch solves the problem by adding a boolean variable into struct module.
The value is true after the coming and before the going handler is called.
New patches need to be applied when the value is true and they need to ignore
the module when the value is false.
Note that we need to know state of all modules on the system. The races are
related to new patches. Therefore we do not know what modules will get
patched.
Also note that we could not simply ignore going modules. The code from the
module could be called even in the GOING state until mod->exit() finishes.
If we start supporting patches with semantic changes between function
calls, we need to apply new patches to any still usable code.
See below for an example.
Finally note that the patch solves only the situation when a new patch is
registered. There are no such problems when the patch is being removed.
It does not matter who disable the patch first, whether the normal
disable_patch() or the module notifier. There is nothing to do
once the patch is disabled.
Alternative solutions:
======================
+ reject new patches when a patched module is coming or going; this is ugly
+ wait with adding new patch until the module leaves the COMING and GOING
states; this might be dangerous and complicated; we would need to release
kgr_lock in the middle of the patch registration to avoid a deadlock
with the coming and going handlers; also we might need a waitqueue for
each module which seems to be even bigger overhead than the boolean
+ stop modules from entering COMING and GOING states; wait until modules
leave these states when they are already there; looks complicated; we would
need to ignore the module that asked to stop the others to avoid a deadlock;
also it is unclear what to do when two modules asked to stop others and
both are in COMING state (situation when two new patches are applied)
+ always register/enable new patches and fix up the potential mess (registered
patches order) in klp_module_init(); this is nasty and prone to regressions
in the future development
+ add another MODULE_STATE where the kallsyms are visible but the module is not
used yet; this looks too complex; the module states are checked on "many"
locations
Example of patch stacking breakage:
===================================
The notifier could _not_ _simply_ ignore already initialized module objects.
For example, let's have three patches (P1, P2, P3) for functions a() and b()
where a() is from vmcore and b() is from a module M. Something like:
a() b()
P1 a1() b1()
P2 a2() b2()
P3 a3() b3(3)
If you load the module M after all patches are registered and enabled.
The ftrace ops for function a() and b() has listed the functions in this
order:
ops_a->func_stack -> list(a3,a2,a1)
ops_b->func_stack -> list(b3,b2,b1)
, so the pointer to b3() is the first and will be used.
Then you might have the following scenario. Let's start with state when patches
P1 and P2 are registered and enabled but the module M is not loaded. Then ftrace
ops for b() does not exist. Then we get into the following race:
CPU0 CPU1
load_module(M)
complete_formation()
mod->state = MODULE_STATE_COMING;
mutex_unlock(&module_mutex);
klp_register_patch(P3);
klp_enable_patch(P3);
# STATE 1
klp_module_notify(M)
klp_module_notify_coming(P1);
klp_module_notify_coming(P2);
klp_module_notify_coming(P3);
# STATE 2
The ftrace ops for a() and b() then looks:
STATE1:
ops_a->func_stack -> list(a3,a2,a1);
ops_b->func_stack -> list(b3);
STATE2:
ops_a->func_stack -> list(a3,a2,a1);
ops_b->func_stack -> list(b2,b1,b3);
therefore, b2() is used for the module but a3() is used for vmcore
because they were the last added.
Example of the race with going modules:
=======================================
CPU0 CPU1
delete_module() #SYSCALL
try_stop_module()
mod->state = MODULE_STATE_GOING;
mutex_unlock(&module_mutex);
klp_register_patch()
klp_enable_patch()
#save place to switch universe
b() # from module that is going
a() # from core (patched)
mod->exit();
Note that the function b() can be called until we call mod->exit().
If we do not apply patch against b() because it is in MODULE_STATE_GOING,
it will call patched a() with modified semantic and things might get wrong.
[jpoimboe@redhat.com: use one boolean instead of two]
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
driver fixes for new regressions since v3.19. Second are fixes to the
common clock divider type caused by recent changes to how we round clock
rates. This affects many clock drivers that use this common code.
Finally there are fixes for drivers that improperly compared struct clk
pointers (drivers must not deref these pointers). While some of these
drivers have done this for a long time, this did not cause a problem
until we started generating unique struct clk pointers for every
consumer. A new function, clk_is_match was introduced to get these
drivers working again and they are fixed up to no longer deref the
pointers themselves.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVBewEAAoJEKI6nJvDJaTUR2gP/2frIXCm/krwkDyofIGxxQ+F
RwIXFTn9GG9QEUwKLUcxRUegHbWbZMXYp6W19hUcUdYz3pD+uEJSuH0NI8kW1Ohy
32P5/ALuoTq7OVzBBz/9di9jBDdIM1wusLZGJfOWk9DXBLOS3VHuhN55D47dqWS/
GsszeEpp8r1WBKFVmAkuQ5Jc0CqgS5GxvMOndXXVN3kDMhCT+9pBiqtUT0V3YV/J
d5GCvfPlO/Xmjnpjf99MPButkfiW/o6YXt3H0QY6hhskS1Av8alyfabVctk8lqOW
py8SQFY7MdRLZ84Zk87sqKCKUc/vHkTBT9vKWYm65l3yJ5OEFv60NaFMYY61HVlJ
n6qWU6SbFZvkPnQTJn6Ii/v7BQ92bXjYpLNcBK8UY35jIjmHsPS/YXCbkmArtn1N
/yAB4TIfH8uX93smFb3XEmZLSiKFuZAhU8YbjDzYgsGuQ9EwN3aNP9c5mzC7Soou
tYnDqVic0i993qQTD2Db5dplGwxelCRJpazO2kK6NW/EJzE8XJaM6XVy1xIBKiDX
bbWPdp53/eWV7gbUEZ8zzcS06G/DLw5/N45XyaWIx+2ThDjhOAlVHTAFDY+Oa743
42Dkwtr5GZ6yORyXY1wI5HttUGU0gnPN0kM84GQG8O/GbzGureZSW1e6G7tJJljn
JXhROl0w4aIPwUUxry7R
=nc5C
-----END PGP SIGNATURE-----
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clock framework fixes from Michael Turquette:
"The clk fixes for 4.0-rc4 comprise three themes.
First are the usual driver fixes for new regressions since v3.19.
Second are fixes to the common clock divider type caused by recent
changes to how we round clock rates. This affects many clock drivers
that use this common code.
Finally there are fixes for drivers that improperly compared struct
clk pointers (drivers must not deref these pointers). While some of
these drivers have done this for a long time, this did not cause a
problem until we started generating unique struct clk pointers for
every consumer. A new function, clk_is_match was introduced to get
these drivers working again and they are fixed up to no longer deref
the pointers themselves"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
ASoC: kirkwood: fix struct clk pointer comparing
ASoC: fsl_spdif: fix struct clk pointer comparing
ARM: imx: fix struct clk pointer comparing
clk: introduce clk_is_match
clk: don't export static symbol
clk: divider: fix calculation of initial best divider when rounding to closest
clk: divider: fix selection of divider when rounding to closest
clk: divider: fix calculation of maximal parent rate for a given divider
clk: divider: return real rate instead of divider value
clk: qcom: fix platform_no_drv_owner.cocci warnings
clk: qcom: fix platform_no_drv_owner.cocci warnings
clk: qcom: Add PLL4 vote clock
clk: qcom: lcc-msm8960: Fix PLL rate detection
clk: qcom: Fix slimbus n and m val offsets
clk: ti: Fix FAPLL parent enable bit handling
- armada-370-xp
- Chained per-cpu interrupts
- gic{,-v3,v3-its}
- Various fixes for safer operation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJVBO4uAAoJEP45WPkGe8Zn8jAQAL0bbO4wTUf8eIAVMQ8gvpWT
RmmPx806bctxWzCQdZeI+5SLbAoMGH4sGN2nvZjxs+gf1+8TDxcNbqpcJdng88+v
KcYCBVAtNCae3ZScCIiqgbqOWO6EeCi2XoEdxKmmoY6xvw/S+392Gq5Yx6rRhTui
SYuhq6ZEdeugfaFDZoJE4piBPQWsH6KTCZ2GbLeQbrPGYI1ChIz79bwn0vE3KNXH
Ctlv3F9CdRgqXMvX5RbE0bqYlAfzV+qkZciIGLd1iLXxfve68HTQYeO1bfMXd2LT
o3JQBivSU+RMYYXY/+xG3h4+poT9mOlMwArQyEFmYsLCADUkKY0XEkiSfYypcqtl
RBrIxX1bbcjAFWQPGADP9x7MeR1fgHrPbSvfWRaKRM+R7+mPXyfidP4CNvhCdVoN
37TpU+XKy/T+bTuciDPieqOHjIcCDZLFy4EHgs1T8KkcGlMr9/AC6noW5ce46zut
L4CajpAuwsRVAEgERRLU0xdDcRJYHO6GIou4RS4aGyejjaK1bkq1n9/svDt3GmfI
urUCNcjkkLsYqSBlYhmcUE6ijPnhvLCaaACfomUU2Jlk4YRhGhEpzN2l5sFDvbaA
xr4DcPV7/XGqzNTwpk1MHKZaloaRhHF2HDjfyKv+23xMPeNu+nRovabHWP4Il2LI
vowXFSS9ciCxnVOLfEXA
=DqZ5
-----END PGP SIGNATURE-----
Merge tag 'irqchip-fixes-4.0' of git://git.infradead.org/users/jcooper/linux
Pull irqchip fixes from Jason Cooper:
"armada-370-xp:
- Chained per-cpu interrupts
gic{,-v3,v3-its}"
- Various fixes for safer operation"
* tag 'irqchip-fixes-4.0' of git://git.infradead.org/users/jcooper/linux:
irqchip: gicv3-its: Support safe initialization
irqchip: gicv3-its: Define macros for GITS_CTLR fields
irqchip: gicv3-its: Add limitation to page order
irqchip: gicv3-its: Use 64KB page as default granule
irqchip: gicv3-its: Zero itt before handling to hardware
irqchip: gic-v3: Fix out of bounds access to cpu_logical_map
irqchip: gic: Fix unsafe locking reported by lockdep
irqchip: gicv3-its: Fix unsafe locking reported by lockdep
irqchip: gicv3-its: Iterate over PCI aliases to generate ITS configuration
irqchip: gicv3-its: Allocate enough memory for the full range of DeviceID
irqchip: gicv3-its: Fix ITS CPU init
irqchip: armada-370-xp: Fix chained per-cpu interrupts
- Fix for stdout-path option parsing with added unittest
- Fix for stdout-path interaction with earlycon
- Several DT unittest fixes
- Fix Sparc allmodconfig build error on
of_platform_register_reconfig_notifier
- Several DT overlay kconfig and build warning fixes
- Several DT binding documentation updates
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJVAvyzAAoJEMhvYp4jgsXiKSAIALRxbtnjPu13+1vD6C8xcTsN
TsD/GoIOtBjVlEPDFrKXOhRXkxXbgONDSveQYhm0iWr30ECloVoikIxF2NPty2nR
B3xN7WbbmeEBl1ubGVw60xs/M1cF7d11UpjRabjlVqFpMll5LufX0+ZAbLQ+Brsl
5zSGxIonG8pRxFy0yi6++76cyywn3XVYoUTMb+nKaiSzXvOBhGnm5MXruiynVH9m
enVKop8rhizfUdvSHFfxxipFK9L3+EYx0yxaZWW9tvYh6yHhb/GZxQcuz1Rn5KUJ
wY0Y4PJdusLOO0FNprZmLsi3GxEXOIBS0bcPCXQAqD/Kr46waVOETajyIItMYnY=
=nyIQ
-----END PGP SIGNATURE-----
Merge tag 'devicetree-fixes-for-4.0' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree fixes from Rob Herring:
- fix for stdout-path option parsing with added unittest
- fix for stdout-path interaction with earlycon
- several DT unittest fixes
- fix Sparc allmodconfig build error on of_platform_register_reconfig_notifier
- several DT overlay kconfig and build warning fixes
- several DT binding documentation updates
* tag 'devicetree-fixes-for-4.0' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of/platform: Fix sparc:allmodconfig build
of: unittest: Add options string testcase variants
of: fix handling of '/' in options for of_find_node_by_path()
of/unittest: Fix the wrong expected value in of_selftest_property_string
of/unittest: remove the duplicate of_changeset_init
dt: submitting-patches: clarify that DT maintainers are to be cced on bindings
of: unittest: fix I2C dependency
of/overlay: Remove unused variable
Documentation: DT: Renamed of-serial.txt to 8250.txt
of: Fix premature bootconsole disable with 'stdout-path'
serial: add device tree binding documentation for ETRAX FS UART
of/overlay: Directly include idr.h
of: Drop superfluous dependance for OF_OVERLAY
of: Add vendor prefix for Arasan
of: Add prompt for OF_OVERLAY config
Pull gadgetfs fixes from Al Viro:
"Assorted fixes around AIO on gadgetfs: leaks, use-after-free, troubles
caused by ->f_op flipping"
* 'gadget' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
gadgetfs: really get rid of switching ->f_op
gadgetfs: get rid of flipping ->f_op in ep_config()
gadget: switch ep_io_operations to ->read_iter/->write_iter
gadgetfs: use-after-free in ->aio_read()
gadget/function/f_fs.c: switch to ->{read,write}_iter()
gadget/function/f_fs.c: use put iov_iter into io_data
gadget/function/f_fs.c: close leaks
move iov_iter.c from mm/ to lib/
new helper: dup_iter()
sparc:allmodconfig fails to build with:
drivers/built-in.o: In function `platform_bus_init':
(.init.text+0x3684): undefined reference to `of_platform_register_reconfig_notifier'
of_platform_register_reconfig_notifier is only declared if both OF_ADDRESS
and OF_DYNAMIC are configured. Yet, the include file only declares a dummy
function if OF_DYNAMIC is not configured. The sparc architecture does not
configure OF_ADDRESS, but does configure OF_DYNAMIC, causing above error.
Fixes: 801d728c10 ("of/reconfig: Add OF_DYNAMIC notifier for platform_bus_type")
Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rob Herring <robh@kernel.org>
Ingo requested this function be renamed to improve readability,
so I've renamed __clocksource_updatefreq_scale() as well as the
__clocksource_updatefreq_hz/khz() functions to avoid
squishedtogethernames.
This touches some of the sh clocksources, which I've not tested.
The arch/arm/plat-omap change is just a comment change for
consistency.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-13-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A long running project has been to clean up remaining uses
of clocksource_register(), replacing it with the simpler
clocksource_register_khz/hz() functions.
However, there are a few cases where we need to self-define
our mult/shift values, so switch the function to a more
obviously internal __clocksource_register() name, and
consolidate much of the internal logic so we don't have
duplication.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-10-git-send-email-john.stultz@linaro.org
[ Minor cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
include/linux/moduleloader.h is more suitable place for this macro.
Also change alignment to PAGE_SIZE for CONFIG_KASAN=n as such
alignment already assumed in several places.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current approach in handling shadow memory for modules is broken.
Shadow memory could be freed only after memory shadow corresponds it is no
longer used. vfree() called from interrupt context could use memory its
freeing to store 'struct llist_node' in it:
void vfree(const void *addr)
{
...
if (unlikely(in_interrupt())) {
struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred);
if (llist_add((struct llist_node *)addr, &p->list))
schedule_work(&p->wq);
Later this list node used in free_work() which actually frees memory.
Currently module_memfree() called in interrupt context will free shadow
before freeing module's memory which could provoke kernel crash.
So shadow memory should be freed after module's memory. However, such
deallocation order could race with kasan_module_alloc() in module_alloc().
Free shadow right before releasing vm area. At this point vfree()'d
memory is not used anymore and yet not available for other allocations.
New VM_KASAN flag used to indicate that vm area has dynamically allocated
shadow memory so kasan frees shadow only if it was previously allocated.
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to facilitate clocksource validation, add a
'max_cycles' field to the clocksource structure which
will hold the maximum cycle value that can safely be
multiplied without potentially causing an overflow.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1426133800-29329-4-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
John reported that my previous commit added a regression
on his router.
This is because sender_cpu & napi_id share a common location,
so get_xps_queue() can see garbage and perform an out of bound access.
We need to make sure sender_cpu is cleared before doing the transmit,
otherwise any NIC busy poll enabled (skb_mark_napi_id()) can trigger
this bug.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: John <jw@nuclearfallout.net>
Bisected-by: John <jw@nuclearfallout.net>
Fixes: 2bd82484bb ("xps: fix xps for stacked devices")
Signed-off-by: David S. Miller <davem@davemloft.net>
Some drivers compare struct clk pointers as a means of knowing
if the two pointers reference the same clock hardware. This behavior is
dubious (drivers must not dereference struct clk), but did not cause any
regressions until the per-user struct clk patch was merged. Now the test
for matching clk's will always fail with per-user struct clk's.
clk_is_match is introduced to fix the regression and prevent drivers
from comparing the pointers manually.
Fixes: 035a61c314 ("clk: Make clk API return per-user struct clk instances")
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Shawn Guo <shawn.guo@linaro.org>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
[arnd@arndb.de: Fix COMMON_CLK=N && HAS_CLK=Y config]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[sboyd@codeaurora.org: const arguments to clk_is_match() and
remove unnecessary ternary operation]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
A collection of driver specific fixes to which the usual comments about
them being important if you see them mostly apply (except for the
comment fix). The pl022 one is particularly nasty for anyone affected
by it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU+tr0AAoJECTWi3JdVIfQ4HIH/2p3AR8VJ1NzmqKslFUaC7SQ
CT6iIiV+1gT+Q/2CLtTwY04gVJrmbO85pl4aotefxuCsb8YFGPCEo3f0lYU/3XwK
ZQuC/7LFpWCqQCtSxoat9XQBHoFkWMrFDdsesQJLg9F46bCx/vVUuMaPrTXwSPLG
DA6isoNZgEBJeKAxKhOdwT/nJUrVJhNwEX8fa/vuISnde4ckVuX+34O60V0N0/S2
7hEw3LQFZW0IPsnkmEygd5ATonK/+s7BXLwoAZWJGpZeWB1YsBUiHV7fLunj6gVy
DMbKI3Fp1Yy/q0h6J+DzzbLvQxj0WTAX8EUz8PCh2QYRvUNiKeJdbKLbLUjGZgA=
=Lpqm
-----END PGP SIGNATURE-----
Merge tag 'spi-v4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A collection of driver specific fixes to which the usual comments
about them being important if you see them mostly apply (except for
the comment fix). The pl022 one is particularly nasty for anyone
affected by it"
* tag 'spi-v4.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: pl022: Fix race in giveback() leading to driver lock-up
spi: dw-mid: avoid potential NULL dereference
spi: img-spfi: Verify max spfi transfer length
spi: fix a typo in comment.
spi: atmel: Fix interrupt setup for PDC transfers
spi: dw: revisit FIFO size detection again
spi: dw-pci: correct number of chip selects
drivers: spi: ti-qspi: wait for busy bit clear before data write/read
Pull workqueue fix from Tejun Heo:
"One fix patch for a subtle livelock condition which can happen on
PREEMPT_NONE kernels involving two racing cancel_work calls. Whoever
comes in the second has to wait for the previous one to finish. This
was implemented by making the later one block for the same condition
that the former would be (work item completion) and then loop and
retest; unfortunately, depending on the wake up order, the later one
could lock out the former one to finish by busy looping on the cpu.
This is fixed by implementing explicit wait mechanism. Work item
might not belong anywhere at this point and there's remote possibility
of thundering herd problem. I originally tried to use bit_waitqueue
but it didn't work for static work items on modules. It's currently
using single wait queue with filtering wake up function and exclusive
wakeup. If this ever becomes a problem, which is not very likely, we
can try to figure out a way to piggy back on bit_waitqueue"
* 'for-4.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE
Here's a round of USB fixes for 4.0-rc3.
Nothing major, the usual gadget, xhci and usb-serial fixes and a few new
device ids as well.
All have been in linux-next successfully.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlT8Q2oACgkQMUfUDdst+ym9cgCgloBq7GqYw5lnW+zVy6fmyS3U
zHMAoMYPLjpUuO4tHfXt46NxVHIMzGsg
=TMtd
-----END PGP SIGNATURE-----
Merge tag 'usb-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here's a round of USB fixes for 4.0-rc3.
Nothing major, the usual gadget, xhci and usb-serial fixes and a few
new device ids as well.
All have been in linux-next successfully"
* tag 'usb-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (36 commits)
xhci: Workaround for PME stuck issues in Intel xhci
xhci: fix reporting of 0-sized URBs in control endpoint
usb: ftdi_sio: Add jtag quirk support for Cyber Cortex AV boards
USB: ch341: set tty baud speed according to tty struct
USB: serial: cp210x: Adding Seletek device id's
USB: pl2303: disable break on shutdown
USB: mxuport: fix null deref when used as a console
USB: serial: clean up bus probe error handling
USB: serial: fix port attribute-creation race
USB: serial: fix tty-device error handling at probe
USB: serial: fix potential use-after-free after failed probe
USB: console: add dummy __module_get
USB: ftdi_sio: add PIDs for Actisense USB devices
Revert "USB: serial: make bulk_out_size a lower limit"
cdc-acm: Add support for Denso cradle CU-321
usb-storage: support for more than 8 LUNs
uas: Add US_FL_NO_REPORT_OPCODES for JMicron JMS539
USB: usbfs: don't leak kernel data in siginfo
xhci: Clear the host side toggle manually when endpoint is 'soft reset'
xhci: Allocate correct amount of scratchpad buffers
...
Here are some tty and serial driver fixes for 4.0-rc3.
Along with the atime fix that you know about, here are some other serial
driver bugfixes as well. Most notable is a wait_until_sent bugfix that
was traced back to being around since before 2.6.12 that Johan has fixed
up.
All have been in linux-next successfully.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlT8RCYACgkQMUfUDdst+yk62QCgycxS4giC2hyRver3dyvaNR6g
zYYAn2w0uRndW+AqP4Tls54isRz6owpF
=gA2k
-----END PGP SIGNATURE-----
Merge tag 'tty-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some tty and serial driver fixes for 4.0-rc3.
Along with the atime fix that you know about, here are some other
serial driver bugfixes as well. Most notable is a wait_until_sent
bugfix that was traced back to being around since before 2.6.12 that
Johan has fixed up.
All have been in linux-next successfully"
* tag 'tty-4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
TTY: fix tty_wait_until_sent maximum timeout
TTY: fix tty_wait_until_sent on 64-bit machines
USB: serial: fix infinite wait_until_sent timeout
TTY: bfin_jtag_comm: remove incorrect wait_until_sent operation
net: irda: fix wait_until_sent poll timeout
serial: uapi: Declare all userspace-visible io types
serial: core: Fix iotype userspace breakage
serial: sprd: Fix missing spin_unlock in sprd_handle_irq()
console: Fix console name size mismatch
tty: fix up atime/mtime mess, take four
serial: 8250_dw: Fix get_mctrl behaviour
serial:8250:8250_pci: delete unneeded quirk entries
serial:8250:8250_pci: fix redundant entry report for WCH_CH352_2S
Change email address for 8250_pci
serial: 8250: Revert "tty: serial: 8250_core: read only RX if there is something in the FIFO"
Revert "tty/serial: of_serial: add DT alias ID handling"
Define macros for GITS_CTLR fields to avoid using magic numbers.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Yun Wu <wuyun.wu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/1425659870-11832-11-git-send-email-marc.zyngier@arm.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>
The ITS table allocator is only allocating a single page per table.
This works fine for most things, but leads to silent lack of
interrupt delivery if we end-up with a device that has an ID that is
out of the range defined by a single page of memory. Even worse, depending
on the page size, behaviour changes, which is not a very good experience.
A solution is actually to allocate memory for the full range of ID that
the ITS supports. A massive waste memory wise, but at least a safe bet.
Tested on a Phytium SoC.
Tested-by: Chen Baozi <chenbaozi@kylinos.com.cn>
Acked-by: Chen Baozi <chenbaozi@kylinos.com.cn>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/1425659870-11832-3-git-send-email-marc.zyngier@arm.com
Signed-off-by: Jason Cooper <jason@lakedaemon.net>