Граф коммитов

2576 Коммитов

Автор SHA1 Сообщение Дата
K. Y. Srinivasan 90ab9d5510 x86, hyperv: Correctly guard the local APIC calibration code
The code that gets the local APIC timer frequency from the hypervisor
rather depends on there being a local APIC.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1381444224-3303-1-git-send-email-kys@microsoft.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-10 15:21:38 -07:00
K. Y. Srinivasan 9e7827b5ea x86, hyperv: Get the local APIC timer frequency from the hypervisor
Hyper-V supports a mechanism for retrieving the local APIC frequency.
Use this and bypass the calibration code in the kernel . This would
allow us to boot the Linux kernel as a "modern VM" on Hyper-V where
many of the legacy devices (such as PIT) are not emulated.

I would like to thank Olaf Hering <olaf@aepfle.de>, Jan Beulich <JBeulich@suse.com> and
H. Peter Anvin <h.peter.anvin@intel.com> for their help in this effort.

In this version of the patch, I have addressed Jan's comments.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Link: http://lkml.kernel.org/r/1380554932-9888-1-git-send-email-olaf@aepfle.de
Tested-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-10-10 11:44:12 -07:00
Ingo Molnar 37bf06375c Linux 3.12-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSUc9zAAoJEHm+PkMAQRiG9DMH/AtpuAF6LlMRPjrCeuJQ1pyh
 T0IUO+CsLKO6qtM5IyweP8V6zaasNjIuW1+B6IwVIl8aOrM+M7CwRiKvpey26ldM
 I8G2ron7hqSOSQqSQs20jN2yGAqQGpYIbTmpdGLAjQ350NNNvEKthbP5SZR5PAmE
 UuIx5OGEkaOyZXvCZJXU9AZkCxbihlMSt2zFVxybq2pwnGezRUYgCigE81aeyE0I
 QLwzzMVdkCxtZEpkdJMpLILAz22jN4RoVDbXRa2XC7dA9I2PEEXI9CcLzqCsx2Ii
 8eYS+no2K5N2rrpER7JFUB2B/2X8FaVDE+aJBCkfbtwaYTV9UYLq3a/sKVpo1Cs=
 =xSFJ
 -----END PGP SIGNATURE-----

Merge tag 'v3.12-rc4' into sched/core

Merge Linux v3.12-rc4 to fix a conflict and also to refresh the tree
before applying more scheduler patches.

Conflicts:
	arch/avr32/include/asm/Kbuild

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 12:36:13 +02:00
Andi Kleen b7af41a1bc perf/x86: Suppress duplicated abort LBR records
Haswell always give an extra LBR record after every TSX abort.
Suppress the extra record.

This only works when the abort is visible in the LBR
If the original abort has already left the 16 LBR entries
the extra entry will will stay.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379688044-14173-7-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04 10:06:16 +02:00
Andi Kleen a405bad5ad perf/x86: Add Haswell specific transaction flag reporting
In the PEBS handler report the transaction flags using the new
generic transaction flags facility. Most of them come from
the "tsx_tuning" field in PEBSv2, but the abort code is derived
from the RAX register reported in the PEBS record.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379688044-14173-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04 10:06:09 +02:00
Ingo Molnar fafd883f67 Merge branch 'perf/urgent' into perf/core
Pick up the latest fixes before applying new patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04 09:59:13 +02:00
Peter Zijlstra d8b11a0cbd perf/x86: Clean up cap_user_time* setting
Currently the cap_user_time_zero capability has different tests than
cap_user_time; even though they expose the exact same data.

Switch from CONSTANT && NONSTOP to sched_clock_stable to also deal
with multi cabinet machines and drop the tsc_disabled() check.. non of
this will work sanely without tsc anyway.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-nmgn0j0muo1r4c94vlfh23xy@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-04 09:58:55 +02:00
Linus Torvalds 9b565a8051 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A couple of tooling fixlets and a PMU detection printout fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix PMU detection printout when no PMU is detected
  perf symbols: Demangle cloned functions
  perf machine: Fix path unpopulated in machine__create_modules()
  perf tools: Explicitly add libdl dependency
  perf probe: Fix probing symbols with optimization suffix
  perf trace: Add mmap2 handler
  perf kmem: Make it work again on non NUMA machines
2013-09-28 14:21:13 -07:00
Ingo Molnar 8a3da6c7d0 perf/x86: Fix PMU detection printout when no PMU is detected
Ran into this cryptic PMU bootup log recently:

[    0.124047] Performance Events:
[    0.125000] smpboot: ...

Turns out we print this if no PMU is detected. Fall back to
the right condition so that the following is printed:

[    0.122381] Performance Events: no PMU driver, software events only.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-u2fwaUffakjp0qkpRfqljgsn@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-28 15:48:48 +02:00
Linus Torvalds bdc5663fa1 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Assorted standalone fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add model number for Avoton Silvermont
  perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
  perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group()
  perf: Update ABI comment
  tools lib lk: Uninclude linux/magic.h in debugfs.c
  perf tools: Fix old GCC build error in trace-event-parse.c:parse_proc_kallsyms()
  perf probe: Fix finder to find lines of given function
  perf session: Check for SIGINT in more loops
  perf tools: Fix compile with libelf without get_phdrnum
  perf tools: Fix buildid cache handling of kallsyms with kcore
  perf annotate: Fix objdump line parsing offset validation
  perf tools: Fill in new definitions for madvise()/mmap() flags
  perf tools: Sharpen the libaudit dependencies test
2013-09-25 13:28:08 -07:00
Peter Zijlstra c2daa3bed5 sched, x86: Provide a per-cpu preempt_count implementation
Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.

We still need to save/restore the actual preemption count due to
PREEMPT_ACTIVE so we place the per-cpu __preempt_count variable in the
same cache-line as the other hot __switch_to() variables such as
current_task.

NOTE: this save/restore is required even for !PREEMPT kernels as
cond_resched() also relies on preempt_count's PREEMPT_ACTIVE to ignore
task_struct::state.

Also rename thread_info::preempt_count to ensure nobody is
'accidentally' still poking at it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-gzn5rfsf8trgjoqx8hyayy3q@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:07:57 +02:00
Yan, Zheng cf3b425dd8 perf/x86/intel: Add model number for Avoton Silvermont
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1379837953-17755-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-23 10:22:00 +02:00
Peter Zijlstra 92519bbc8a perf/x86/intel: Fix build warning in intel_pmu_drain_pebs_nhm()
Fengguang Wu reported this build warning:

    arch/x86/kernel/cpu/perf_event_intel_ds.c: In function 'intel_pmu_drain_pebs_nhm':
    arch/x86/kernel/cpu/perf_event_intel_ds.c:964:2: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'int'

Because pointer arithmetics result type is bitness dependent there's no natural
type to use here, cast it to long.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-jbpauwxJqtf24luewcsdFith@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 13:15:27 +02:00
Peter Zijlstra fa73158710 perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
Solve the problems around the broken definition of perf_event_mmap_page::
cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
fixed by:

  860f085b74 ("perf: Fix broken union in 'struct perf_event_mmap_page'")

The problem with the fix (merged in v3.12-rc1 and not yet released
officially), noticed by Vince Weaver is that the new behavior is
not detectable by new user-space, and that due to the reuse of the
field names it's easy to mis-compile a binary if old headers are used
on a new kernel or new headers are used on an old kernel.

To solve all that make this change explicit, detectable and self-contained,
by iterating the ABI the following way:

 - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
   confuse old user-space binaries. RDPMC will be marked as unavailable
   to old binaries but that's within the ABI, this is a capability bit.

 - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
   libraries can reliably detect that bit 0 is deprecated and perma-zero
   without having to check the kernel version.

 - Use bits 2, 3, 4 for the newly defined, correct functionality:

	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
	cap_user_time		: 1, /* The time_* fields are used */
	cap_user_time_zero	: 1, /* The time_zero field is used */

 - Rename all the bitfield names in perf_event.h to be different from the
   old names, to make sure it's not possible to mis-compile it
   accidentally with old assumptions.

The 'size' field can then be used in the future to add new fields and it
will act as a natural ABI version indicator as well.

Also adjust tools/perf/ userspace for the new definitions, noticed by
Adrian Hunter.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Also-Fixed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 09:45:11 +02:00
Yan, Zheng 73c4427c6c perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group()
uncore_validate_group() can't call smp_processor_id() because it is
in preemptible context. Pass NUMA_NO_NODE to the allocator instead.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379400493-11505-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 06:54:36 +02:00
Peter Zijlstra eb8417aa70 perf/x86/intel: Remove division from the intel_pmu_drain_pebs_nhm() hot path
Only do the division in case we have to print the result out in a warning.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-43nl31erfbajwpfj254f6zji@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-20 06:53:44 +02:00
Linus Torvalds f42bcf1aa8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/lpss: Add pin control support to Intel low power subsystem
  perf/x86/intel: Mark MEM_LOAD_UOPS_MISS_RETIRED as precise on SNB
  x86: Remove now-unused save_rest()
  x86/smpboot: Fix announce_cpu() to printk() the last "OK" properly
2013-09-18 11:26:17 -05:00
Linus Torvalds 186844b292 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two small fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix UAPI export of PERF_EVENT_IOC_ID
  perf/x86/intel: Fix Silvermont offcore masks
2013-09-18 11:22:53 -05:00
Stephane Eranian 9d8e3f9693 perf/x86/intel: Mark MEM_LOAD_UOPS_MISS_RETIRED as precise on SNB
On Intel SNB (SNB, SNB-EP), the event MEM_LOAD_UOPS_MISS_RETIRED
supports PEBS. It was missing for the SNB PEBS event constraint
table thereby preventing any measurement with PEBS for it.

This patch adds the event to the PEBS table for SNB.

WARNING: it should be noted that this event like a few others
are subject to the erratum BT241 for Xeon E5 (SNB-EP). As such,
the event may undercount when used with PEBS unless the
workaround is implemented. But without this patch and just the
workaround, the kernel would not allow precise sampling on this
event. BT241 is documented in:

  http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20130913201646.GA23981@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-14 08:00:18 +02:00
Linus Torvalds 75acebf242 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Various fixes.

  The -g perf report lockup you reported is only partially addressed,
  patches that fix the excessive runtime are still being worked on"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix uncore PCI fixed counter handling
  uprobes: Fix utask->depth accounting in handle_trampoline()
  perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDING
  perf: Fix up MMAP2 buffer space reservation
  perf tools: Add attr->mmap2 support
  perf kvm: Fix sample_type manipulation
  perf evlist: Fix id pos in perf_evlist__open()
  perf trace: Handle perf.data files with no tracepoints
  perf session: Separate progress bar update when processing events
  perf trace: Check if MAP_32BIT is defined
  perf hists: Fix formatting of long symbol names
  perf evlist: Fix parsing with no sample_id_all bit set
  perf tools: Add test for parsing with no sample_id_all bit
  perf trace: Check control+C more often
2013-09-12 10:44:54 -07:00
Ingo Molnar 7f2ee91f54 perf/x86/intel: Clean up EVENT_ATTR_STR() muck
Make the code a bit more readable by removing stray whitespaces et al.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-lzEnychz1ylqy8zjenxOmeht@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:17:50 +02:00
Peter Zijlstra d2beea4a34 perf/x86/intel: Clean-up/reduce PEBS code
Get rid of some pointless duplication introduced by the Haswell code.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-8q6y4davda9aawwv5yxe7klp@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:37 +02:00
Peter Zijlstra 2b9e344df3 perf/x86/intel: Clean up checkpoint-interrupt bits
Clean up the weird CP interrupt exception code by keeping a CP mask.

Andi suggested this implementation but weirdly didn't actually
implement it himself, do so now because it removes the conditional in
the interrupt handler and avoids the assumption its only on cnt2.

Suggested-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-dvb4q0rydkfp00kqat4p5bah@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:37 +02:00
Andi Kleen 4b2c4f1f1b perf/x86/intel: Add Haswell TSX event aliases
Add TSX event aliases, and export them from the kernel to perf.

These are used by perf stat -T and to allow
more user friendly access to events. The events are designed to
be fairly generic and may also apply to other architectures
implementing HTM.  They all cover common situations that
happens during tuning of transactional code.

For Haswell we have to separate the HLE and RTM events,
as they are separate in the PMU.

This adds the following events:

 tx-start	Count start transaction (used by perf stat -T)
 tx-commit	Count commit of transaction
 tx-abort	Count all aborts
 tx-conflict	Count aborts due to conflict with another CPU.
 tx-capacity	Count capacity aborts (transaction too large)

Then matching el-* events for HLE

 cycles-t	Transactional cycles (used by perf stat -T)
  * also exists on POWER8

 cycles-ct	Transactional cycles commited (used by perf stat -T)
  * according to Michael Ellerman POWER8 has a cycles-transactional-committed,
  * perf stat -T handles both cases

Note for useful abort profiling often precise has to be set,
as Haswell can only report the point inside the transaction
with precise=2.

For some classes of aborts, like conflicts, this is not needed,
as it makes more sense to look at the complete critical section.

This gives a clean set of generalized events to examine transaction
success and aborts. Haswell has additional events for TSX, but those
are more specialized for very specific situations.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378438661-24765-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:36 +02:00
Andi Kleen 748e86aa90 perf/x86: Report TSX transaction abort cost as weight
Use the existing weight reporting facility to report the transaction
abort cost, that is the number of cycles wasted in aborts.
Haswell reports this in the PEBS record.

This was in fact the original user for weight.

This is a very useful sort key to concentrate on the most
costly aborts and a good metric for TSX tuning.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378438661-24765-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:34 +02:00
Andi Kleen 2dbf0116aa perf/x86/intel: Avoid checkpointed counters causing excessive TSX aborts
With checkpointed counters there can be a situation where the counter
is overflowing, aborts the transaction, is set back to a non overflowing
checkpoint, causes interupt. The interrupt doesn't see the overflow
because it has been checkpointed.  This is then a spurious PMI, typically with
a ugly NMI message.  It can also lead to excessive aborts.

Avoid this problem by:

- Using the full counter width for counting counters (earlier patch)

- Forbid sampling for checkpointed counters. It's not too useful anyways,
  checkpointing is mainly for counting. The check is approximate
  (to still handle KVM), but should catch the majority of cases.

- On a PMI always set back checkpointed counters to zero.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378438661-24765-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:13:33 +02:00
Peter Zijlstra 06c939c1f4 perf/x86/intel: Fix Silvermont offcore masks
Fengguang Wu reported:

> sparse warnings: (new ones prefixed by >>)
>
> >> arch/x86/kernel/cpu/perf_event_intel.c:901:9: sparse: constant 0x768005ffff is so big it is long
> >> arch/x86/kernel/cpu/perf_event_intel.c:902:9: sparse: constant 0x768005ffff is so big it is long
>
> vim +901 arch/x86/kernel/cpu/perf_event_intel.c
>
>    895	 },
>    896	};
>    897
>    898	static struct extra_reg intel_slm_extra_regs[] __read_mostly =
>    899	{
>    900		/* must define OFFCORE_RSP_X first, see intel_fixup_er() */
>  > 901		INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x768005ffff, RSP_0),
>  > 902		INTEL_UEVENT_EXTRA_REG(0x02b7, MSR_OFFCORE_RSP_1, 0x768005ffff, RSP_1),
>    903		EVENT_EXTRA_END
>    904	};
>    905

Extend those constants to 64 bits.

Reported-by: fengguang.wu@intel.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130909112636.GQ31370@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 19:12:27 +02:00
Stephane Eranian dbc33f7016 perf/x86: Fix uncore PCI fixed counter handling
There was a bug in the handling of SNB-EP/IVB-EP uncore PCI
fixed counters, e.g., IMC.

It would cause erratic values to be returned for the IMC
clockticks event. This was due to a bogus hwc->config value
which was then written to PCI config space.

The erratic values can be seen via:

  $ perf stat -a -C 0 -e uncore_imc_0/clockticks/ -I 1000 sleep 10

The fixed counter has most fields marked as reserved with
hw reset values of 0. Yet the kernel was defaulting to a
hwc->config = ~0 and that was causing the issues.

This patch sets the hwc->config values for fixed uncore event
to 0. Now, the values of IMC clockticks is correct.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: peterz@infradead.org
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20130909195350.GA17643@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 08:42:37 +02:00
Stephane Eranian 6113af14c8 perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDING
The IvyBridge event CYCLE_ACTIVITY:CYCLES_LDM_PENDING can only
be measured on counters 0-3 when HT is off. When HT is on, you
only have counters 0-3.

If you program it on the eight counters for 1s on a 3GHz
IVB laptop running a noploop, you see:

           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
           2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
       3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING

Clearly the last 4 values are bogus.

Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Cc: dhsharp@google.com
Link: http://lkml.kernel.org/r/20130911152222.GA28761@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-12 07:58:26 +02:00
Dave Hansen 6df46865ff mm: vmstats: track TLB flush stats on UP too
The previous patch doing vmstats for TLB flushes ("mm: vmstats: tlb flush
counters") effectively missed UP since arch/x86/mm/tlb.c is only compiled
for SMP.

UP systems do not do remote TLB flushes, so compile those counters out on
UP.

arch/x86/kernel/cpu/mtrr/generic.c calls __flush_tlb() directly.  This is
probably an optimization since both the mtrr code and __flush_tlb() write
cr4.  It would probably be safe to make that a flush_tlb_all() (and then
get these statistics), but the mtrr code is ancient and I'm hesitant to
touch it other than to just stick in the counters.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:09 -07:00
Linus Torvalds b20c99eb66 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS changes from Ingo Molnar:
 "[ The reason for drivers/ updates is that Boris asked for the
    drivers/edac/ changes to go via x86/ras in this cycle ]

  Main changes:

   - AMD CPUs:
      . Add ECC event decoding support for new F15h models
      . Various erratum fixes
      . Fix single-channel on dual-channel-controllers bug.

   - Intel CPUs:
      . UC uncorrectable memory error parsing fix
      . Add support for CMC (Corrected Machine Check) 'FF' (Firmware
        First) flag in the APEI HEST

   - Various cleanups and fixes"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  amd64_edac: Fix incorrect wraparounds
  amd64_edac: Correct erratum 505 range
  cpc925_edac: Use proper array termination
  x86/mce, acpi/apei: Only disable banks listed in HEST if mce is configured
  amd64_edac: Get rid of boot_cpu_data accesses
  amd64_edac: Add ECC decoding support for newer F15h models
  x86, amd_nb: Clarify F15h, model 30h GART and L3 support
  pci_ids: Add PCI device ID functions 3 and 4 for newer F15h models.
  x38_edac: Make a local function static
  i3200_edac: Make a local function static
  x86/mce: Pay no attention to 'F' bit in MCACOD when parsing 'UC' errors
  APEI/ERST: Fix error message formatting
  amd64_edac: Fix single-channel setups
  EDAC: Replace strict_strtol() with kstrtol()
  mce: acpi/apei: Soft-offline a page on firmware GHES notification
  mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
  mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
2013-09-04 11:07:04 -07:00
Linus Torvalds 05eebfb26b Merge branch 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt changes from Ingo Molnar:
 "Hypervisor signature detection cleanup and fixes - the goal is to make
  KVM guests run better on MS/Hyperv and to generalize and factor out
  the code a bit"

* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Correctly detect hypervisor
  x86, kvm: Switch to use hypervisor_cpuid_base()
  xen: Switch to use hypervisor_cpuid_base()
  x86: Introduce hypervisor_cpuid_base()
2013-09-04 11:05:13 -07:00
Linus Torvalds 2a475501b8 Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/asmlinkage changes from Ingo Molnar:
 "As a preparation for Andi Kleen's LTO patchset (link time
  optimizations using GCC's -flto which build time optimization has
  steadily increased in quality over the past few years and might
  eventually be usable for the kernel too) this tree includes a handful
  of preparatory patches that make function calling convention
  annotations consistent again:

   - Mark every function without arguments (or 64bit only) that is used
     by assembly code with asmlinkage()

   - Mark every function with parameters or variables that is used by
     assembly code as __visible.

  For the vanilla kernel this has documentation, consistency and
  debuggability advantages, for the time being"

* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asmlinkage: Fix warning in xen asmlinkage change
  x86, asmlinkage, vdso: Mark vdso variables __visible
  x86, asmlinkage, power: Make various symbols used by the suspend asm code visible
  x86, asmlinkage: Make dump_stack visible
  x86, asmlinkage: Make 64bit checksum functions visible
  x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops
  x86, asmlinkage, apm: Make APM data structure used from assembler visible
  x86, asmlinkage: Make syscall tables visible
  x86, asmlinkage: Make several variables used from assembler/linker script visible
  x86, asmlinkage: Make kprobes code visible and fix assembler code
  x86, asmlinkage: Make various syscalls asmlinkage
  x86, asmlinkage: Make 32bit/64bit __switch_to visible
  x86, asmlinkage: Make _*_start_kernel visible
  x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible
  x86, asmlinkage: Change dotraplinkage into __visible on 32bit
  x86: Fix sys_call_table type in asm/syscall.h
2013-09-04 08:42:44 -07:00
Joe Perches 7bfb7e6bdd perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()
Use the convenience function instead of __GFP_ZERO.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f58599ae1a8d7b32d37e9cf283e95fba6452f7f6.1377809875.git.joe@perches.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02 08:42:49 +02:00
Yan, Zheng 1fa64180fb perf/x86: Add Silvermont (22nm Atom) support
Compared to old atom, Silvermont has offcore and has more events
that support PEBS.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374138144-17278-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02 08:42:47 +02:00
Yan, Zheng 53ad044720 perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X
Silvermont (22nm Atom) has two offcore response configuration MSRs,
unlike other Intel CPU, its event code for MSR_OFFCORE_RSP_1 is 0x02b7.

To avoid complicating intel_fixup_er(), use INTEL_UEVENT_EXTRA_REG to
define MSR_OFFCORE_RSP_X. So intel_fixup_er() can find the event code
for OFFCORE_RSP_N by x86_pmu.extra_regs[N].event.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1374138144-17278-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-02 08:42:47 +02:00
Ingo Molnar aee2bce3cf Merge branch 'linus' into perf/core
Pick up the latest upstream fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-29 12:02:08 +02:00
Linus Torvalds 7067552dfb Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Two AMD microcode loader fixes and an OLPC firmware support fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Fix early microcode loading
  x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86
  x86: Don't clear olpc_ofw_header when sentinel is detected
2013-08-19 09:18:29 -07:00
Yan, Zheng 77b339bce3 perf/x86/intel/uncore: Enable EV_SEL_EXT bit for PCU
This patch adds support for the SNB-EP PCU uncore PMU extra_sel_bit
(bit 21) which is missing from the documentation in Table-2.75 of
Intel Xeon Processor E5-2600 Product Family Uncore Performance
Monitoring Guide. It is referred to later in Table-2.81. Without
this selection bit explicitly enabled by the kernel, some events
such as COREx_TRANSITION_CYCLES do not count correctly.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1376375382-21350-4-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-16 17:55:50 +02:00
Yan, Zheng fd1ec259ba perf/x86/intel/uncore: Add filter support for QPI boxes
The QPI uncore boxes have two pairs of MATCH/MASK registers that
user to filter packet traffic serviced by QPI link layer. These
registers are in auxiliary PCI devices.

This patch adds the auxiliary PCI devices to snbep_uncore_pci_ids
and adds field definitions for the MATCH/MASK registers.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1375856245-10717-2-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-16 17:55:49 +02:00
Yan, Zheng 899396cf7b perf/x86/intel/uncore: Add auxiliary pci device support
The QPI uncore boxes have two pairs of MATCH/MASK registers that
user to filter packet traffic serviced by QPI link layer. These
registers are in auxiliary PCI devices.

This patch changes the meaning of (struct pci_device_id)->driver_data.
The first 8 bits are device index of the same uncore type, the second
8 bytes are uncore type index. Auxiliary PCI device's type is defined
as UNCORE_EXTRA_PCI_DEV(0xff)

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1375856245-10717-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-16 17:55:48 +02:00
Ingo Molnar c9572f010d Linux 3.11-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSCDSjAAoJEHm+PkMAQRiGDXMIAI7Loae0Oqb1eoeJkvjyZsBS
 OJDeeEcn+k58VbxVHyRdc7hGo4yI4tUZm172SpnOaM8sZ/ehPU7zBrwJK2lzX334
 /jAM3uvVPfxA2nu0I4paNpkED/NQ8NRRsYE1iTE8dzHXOH6dA3mgp5qfco50rQvx
 rvseXpME4KIAJEq4jnyFZF5+nuHiPueM9JftPmSSmJJ3/KY9kY1LESovyWd7ttg1
 jYSVPFal9J0E+tl2UQY5g9H16GqhhjYn+39Iei6Q5P4bL4ZubQgTRQTN9nyDc06Z
 ezQtGoqZ8kEz/2SyRlkda6PzjSEhgXlc8mCL5J7AW+dMhTHHx2IrosjiCA80kG8=
 =c0rK
 -----END PGP SIGNATURE-----

Merge tag 'v3.11-rc5' into perf/core

Merge Linux 3.11-rc5, to sync up with the latest upstream fixes since -rc1.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-15 10:00:09 +02:00
Ingo Molnar 6356bb0ad6 Bit 12 may or may not be set in MCi_STATUS.MCACOD when
an uncorrected error is reported. Ignore it when checking
 error signatures.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJR/912AAoJEKurIx+X31iBz/oP/javBL5Q098ir8qj2GdlkSV0
 sPZkJ0Qd4AmmcvYgiYfE3wVL0F8mL2R3gFLcppN7wBTNqAjFOEOJb2wofwByiv+T
 0eOM9OWxPZTsoxdiCHd5pvoNuw6lkegUfmxbb9vr9Xs0JaJuhjQY/bbbUbi2ue2V
 WoghLZvcSyZBGAaFJ3tWNdPYroHp4OulbvVdgbb4+iLgaLz73zzpv1HQvBnw6CLW
 x+P5guI5FawZnS6FjPwfjIs1mKqU5fdBxGl4vHB55IPyexWOVY6i+zc9VG0EuzDE
 za+FW2v8ohJzzxNP3/u4W3ousJoXkZSrwlZvDvuEkozrM8izibmDr2JglYqnQ2sK
 5WMXL1aXHyTISWpHYiH0/218hljUhxaC5+TfNVeiZYytFULSyZOES8Em3fENx0gN
 HohTZkyBH0jNd2OZytSqS69/dvPgDIFD7qGR6KB2CaBAU+c/qH9M6g8vfaXt5Y/6
 2a0oKrZ+WXiXYgEq3PynnTHTiaHYDo2rjBN+yQDrauomopZv0qpEMEicS66G0lE3
 7+zh3CQOiv6WL9pYQxYjeIiP46H7tb+BSpbsDYDQ23++nLj61by1YXrLBJ2MsGep
 2xEDkoVE7jW5+vskcSIUfavmp7pNNnIpRsU2cb7bem44iTc2DkuHYXZtHstvQIDN
 qIwoXyi3JE2siMTs4icc
 =LbVP
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-f-bit' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE-uncorrected-error fix from Tony Luck:

 "Bit 12 may or may not be set in MCi_STATUS.MCACOD when
  an uncorrected error is reported. Ignore it when checking
  error signatures."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12 19:51:43 +02:00
Torsten Kaiser 8c6b79bb12 x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86
cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().

If early microcode loading is enabled its collect_cpu_info_amd_early()
will fill ->x86 and so the fallback to boot_cpu_data is not used. But
->x86_vendor was not filled and is still X86_VENDOR_INTEL resulting in
no errata fixes getting applied and my system hangs on boot.

Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only
caller init_amd() will have a struct cpuinfo_x86 as parameter and the
set_cpu_bug() that is controlled by cpu_has_amd_erratum() also only uses
that struct.

So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum()
and the broken fallback can be dropped.

[ Boris: Drop WARN_ON() since we're called only from init_amd() ]

Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2013-08-12 18:25:00 +02:00
Ingo Molnar 0237d7f355 Merge branch 'x86/mce' into x86/ras
Pursue a single RAS/MCE topic branch on x86.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12 17:54:05 +02:00
Andi Kleen 0499bd867b perf/x86: Add Haswell ULT model number used in Macbook Air and other systems
This one was missed earlier.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1376007983-31616-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-12 12:19:58 +02:00
Andi Kleen 277d5b40b7 x86, asmlinkage: Make several variables used from assembler/linker script visible
Plus one function, load_gs_index().

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-10-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-06 14:20:13 -07:00
Jason Wang 9df56f19a5 x86: Correctly detect hypervisor
We try to handle the hypervisor compatibility mode by detecting hypervisor
through a specific order. This is not robust, since hypervisors may implement
each others features.

This patch tries to handle this situation by always choosing the last one in the
CPUID leaves. This is done by letting .detect() return a priority instead of
true/false and just re-using the CPUID leaf where the signature were found as
the priority (or 1 if it was found by DMI). Then we can just pick hypervisor who
has the highest priority. Other sophisticated detection method could also be
implemented on top.

Suggested by H. Peter Anvin and Paolo Bonzini.

Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Doug Covelli <dcovelli@vmware.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-08-05 06:35:33 -07:00
Vince Weaver c9601247f8 perf/x86: Fix intel QPI uncore event definitions
John McCalpin reports that the "drs_data" and "ncb_data" QPI
uncore events are missing the "extra bit" and always return zero
values unless the bit is properly set.

More details from him:

 According to the Xeon E5-2600 Product Family Uncore Performance
 Monitoring Guide, Table 2-94, about 1/2 of the QPI Link Layer events
 (including the ones that "perf" calls "drs_data" and "ncb_data") require
 that the "extra bit" be set.

 This was confusing for a while -- a note at the bottom of page 94 says
 that the "extra bit" is bit 16 of the control register.
 Unfortunately, Table 2-86 clearly says that bit 16 is reserved and must
 be zero.  Looking around a bit, I found that bit 21 appears to be the
 correct "extra bit", and further investigation shows that "perf" actually
 agrees with me:
	[root@c560-003.stampede]# cat /sys/bus/event_source/devices/uncore_qpi_0/format/event
	config:0-7,21

 So the command
	# perf -e "uncore_qpi_0/event=drs_data/"
 Is the same as
	# perf -e "uncore_qpi_0/event=0x02,umask=0x08/"
 While it should be
	# perf -e "uncore_qpi_0/event=0x102,umask=0x08/"

 I confirmed that this last version gives results that agree with the
 amount of data that I expected the STREAM benchmark to move across the QPI
 link in the second (cross-chip) test of the original script.

Reported-by: John McCalpin <mccalpin@tacc.utexas.edu>
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: zheng.z.yan@intel.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1308021037280.26119@vincent-weaver-1.um.maine.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-08-05 11:32:08 +02:00
Tony Luck 1a7f0e3c4f x86/mce: Fix mce regression from recent cleanup
In commit 33d7885b59
   x86/mce: Update MCE severity condition check

We simplified the rules to recognise each classification of recoverable
machine check combining the instruction and data fetch rules into a
single entry based on clarifications in the June 2013 SDM that all
recoverable events would be reported on the unaffected processor with
MCG_STATUS.EIPV=0 and MCG_STATUS.RIPV=1.  Unfortunately the simplified
rule has a couple of bugs.  Fix them here.

Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-29 11:23:27 -07:00
Adrian Hunter c73deb6aec perf/x86: Add ability to calculate TSC from perf sample timestamps
For modern CPUs, perf clock is directly related to TSC.  TSC
can be calculated from perf clock and vice versa using a simple
calculation.  Two of the three componenets of that calculation
are already exported in struct perf_event_mmap_page.  This patch
exports the third.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1372425741-1676-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-23 12:17:45 +02:00
Paul Gortmaker 148f9bb877 x86: delete __cpuinit usage from all x86 files
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit  -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings.  In any case, they are temporary and harmless.

This removes all the arch/x86 uses of the __cpuinit macros from
all C files.  x86 only had the one __CPUINIT used in assembly files,
and it wasn't paired off with a .previous or a __FINIT, so we can
delete it directly w/o any corresponding additional change there.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14 19:36:56 -04:00
Linus Torvalds 8cbd0eefca Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management updates from Zhang Rui:
 "There are not too many changes this time, except two new platform
  thermal drivers, ti-soc-thermal driver and x86_pkg_temp_thermal
  driver, and a couple of small fixes.

  Highlights:

   - move the ti-soc-thermal driver out of the staging tree to the
     thermal tree.

   - introduce the x86_pkg_temp_thermal driver.  This driver registers
     CPU digital temperature package level sensor as a thermal zone.

   - small fixes/cleanups including removing redundant use of
     platform_set_drvdata() and of_match_ptr for all platform thermal
     drivers"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (34 commits)
  thermal: cpu_cooling: fix stub function
  thermal: ti-soc-thermal: use standard GPIO DT bindings
  thermal: MAINTAINERS: Add git tree path for SoC specific updates
  thermal: fix x86_pkg_temp_thermal.c build and Kconfig
  Thermal: Documentation for x86 package temperature thermal driver
  Thermal: CPU Package temperature thermal
  thermal: consider emul_temperature while computing trend
  thermal: ti-soc-thermal: add DT example for DRA752 chip
  thermal: ti-soc-thermal: add dra752 chip to device table
  thermal: ti-soc-thermal: add thermal data for DRA752 chips
  thermal: ti-soc-thermal: remove usage of IS_ERR_OR_NULL
  thermal: ti-soc-thermal: freeze FSM while computing trend
  thermal: ti-soc-thermal: remove external heat while extrapolating hotspot
  thermal: ti-soc-thermal: update DT reference for OMAP5430
  x86, mcheck, therm_throt: Process package thresholds
  thermal: cpu_cooling: fix 'descend' check in get_property()
  Thermal: spear: Remove redundant use of of_match_ptr
  Thermal: kirkwood: Remove redundant use of of_match_ptr
  Thermal: dove: Remove redundant use of of_match_ptr
  Thermal: armada: Remove redundant use of of_match_ptr
  ...
2013-07-11 12:26:08 -07:00
Linus Torvalds a9642fa351 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two small fixlets"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix interrupt handler timing harness
  perf/x86/amd: Do not print an error when the device is not present
2013-07-10 16:04:38 -07:00
Linus Torvalds 2e17c5a97e Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "Okay this is the big one, I was stalled on the fbdev pull req as I
  stupidly let fbdev guys merge a patch I required to fix a warning with
  some patches I had, they ended up merging the patch from the wrong
  place, but the warning should be fixed.  In future I'll just take the
  patch myself!

  Outside drm:

  There are some snd changes for the HDMI audio interactions on haswell,
  they've been acked for inclusion via my tree.  This relies on the
  wound/wait tree from Ingo which is already merged.

  Major changes:

  AMD finally released the dynamic power management code for all their
  GPUs from r600->present day, this is great, off by default for now but
  also a huge amount of code, in fact it is most of this pull request.

  Since it landed there has been a lot of community testing and Alex has
  sent a lot of fixes for any bugs found so far.  I suspect radeon might
  now be the biggest kernel driver ever :-P p.s.  radeon.dpm=1 to enable
  dynamic powermanagement for anyone.

  New drivers:

  Renesas r-car display unit.

  Other highlights:

   - core: GEM CMA prime support, use new w/w mutexs for TTM
     reservations, cursor hotspot, doc updates
   - dvo chips: chrontel 7010B support
   - i915: Haswell (fbc, ips, vecs, watermarks, audio powerwell),
     Valleyview (enabled by default, rc6), lots of pll reworking, 30bpp
     support (this time for sure)
   - nouveau: async buffer object deletion, context/register init
     updates, kernel vp2 engine support, GF117 support, GK110 accel
     support (with external nvidia ucode), context cleanups.
   - exynos: memory leak fixes, Add S3C64XX SoC series support, device
     tree updates, common clock framework support,
   - qxl: cursor hotspot support, multi-monitor support, suspend/resume
     support
   - mgag200: hw cursor support, g200 mode limiting
   - shmobile: prime support
   - tegra: fixes mostly

  I've been banging on this quite a lot due to the size of it, and it
  seems to okay on everything I've tested it on."

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (811 commits)
  drm/radeon/dpm: implement vblank_too_short callback for si
  drm/radeon/dpm: implement vblank_too_short callback for cayman
  drm/radeon/dpm: implement vblank_too_short callback for btc
  drm/radeon/dpm: implement vblank_too_short callback for evergreen
  drm/radeon/dpm: implement vblank_too_short callback for 7xx
  drm/radeon/dpm: add checks against vblank time
  drm/radeon/dpm: add helper to calculate vblank time
  drm/radeon: remove stray line in old pm code
  drm/radeon/dpm: fix display_gap programming on rv7xx
  drm/nvc0/gr: fix gpc firmware regression
  drm/nouveau: fix minor thinko causing bo moves to not be async on kepler
  drm/radeon/dpm: implement force performance level for TN
  drm/radeon/dpm: implement force performance level for ON/LN
  drm/radeon/dpm: implement force performance level for SI
  drm/radeon/dpm: implement force performance level for cayman
  drm/radeon/dpm: implement force performance levels for 7xx/eg/btc
  drm/radeon/dpm: add infrastructure to force performance levels
  drm/radeon: fix surface setup on r1xx
  drm/radeon: add support for 3d perf states on older asics
  drm/radeon: set default clocks for SI when DPM is disabled
  ...
2013-07-09 16:04:31 -07:00
Naveen N. Rao c3d1fb567a mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
The Corrected Machine Check structure (CMC) in HEST has a flag which can be
set by the firmware to indicate to the OS that it prefers to process the
corrected error events first. In this scenario, the OS is expected to not
monitor for corrected errors (through CMCI/polling). Instead, the firmware
notifies the OS on corrected error events through GHES.

Linux already has support for GHES. This patch adds support for parsing CMC
structure and to disable CMCI/polling if the firmware first flag is set.

Further, the list of machine check bank structures at the end of CMC is used
to determine which MCA banks function in FF mode, so that we continue to
monitor error events on the other banks.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-07-08 11:53:01 -07:00
Peter Zijlstra 100ac53315 perf/x86/amd: Do not print an error when the device is not present
As Linus said its not an error to not have an AMD IOMMU; esp.
when you're not even running on an AMD platform.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Link: http://lkml.kernel.org/r/20130703075542.GF23916@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-07-05 08:27:15 +02:00
Linus Torvalds 80cc38b163 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  treewide: relase -> release
  Documentation/cgroups/memory.txt: fix stat file documentation
  sysctl/net.txt: delete reference to obsolete 2.4.x kernel
  spinlock_api_smp.h: fix preprocessor comments
  treewide: Fix typo in printk
  doc: device tree: clarify stuff in usage-model.txt.
  open firmware: "/aliasas" -> "/aliases"
  md: bcache: Fixed a typo with the word 'arithmetic'
  irq/generic-chip: fix a few kernel-doc entries
  frv: Convert use of typedef ctl_table to struct ctl_table
  sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
  doc: clk: Fix incorrect wording
  Documentation/arm/IXP4xx fix a typo
  Documentation/networking/ieee802154 fix a typo
  Documentation/DocBook/media/v4l fix a typo
  Documentation/video4linux/si476x.txt fix a typo
  Documentation/virtual/kvm/api.txt fix a typo
  Documentation/early-userspace/README fix a typo
  Documentation/video4linux/soc-camera.txt fix a typo
  lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
  ...
2013-07-04 11:40:58 -07:00
Linus Torvalds 7f0ef0267e Merge branch 'akpm' (updates from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
 - various misc bits
 - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
   distracted.  There has been quite a bit of activity.
 - About half the MM queue
 - Some backlight bits
 - Various lib/ updates
 - checkpatch updates
 - zillions more little rtc patches
 - ptrace
 - signals
 - exec
 - procfs
 - rapidio
 - nbd
 - aoe
 - pps
 - memstick
 - tools/testing/selftests updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
  tools/testing/selftests: don't assume the x bit is set on scripts
  selftests: add .gitignore for kcmp
  selftests: fix clean target in kcmp Makefile
  selftests: add .gitignore for vm
  selftests: add hugetlbfstest
  self-test: fix make clean
  selftests: exit 1 on failure
  kernel/resource.c: remove the unneeded assignment in function __find_resource
  aio: fix wrong comment in aio_complete()
  drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
  drivers/memstick/host/r592.c: convert to module_pci_driver
  drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
  pps-gpio: add device-tree binding and support
  drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
  drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
  drivers/parport/share.c: use kzalloc
  Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
  aoe: update internal version number to v83
  aoe: update copyright date
  aoe: perform I/O completions in parallel
  ...
2013-07-03 17:12:13 -07:00
Jiang Liu 46a841329a mm/x86: prepare for removing num_physpages and simplify mem_init()
Prepare for removing num_physpages and simplify mem_init().

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:38 -07:00
Linus Torvalds a9f4a7005f Thermal limit warnings are too scary and cause unnecessary concern
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRzhAdAAoJEKurIx+X31iByQEP/1eAEHNiSCrrKUm/ovGCiVBZ
 w+ZRJkcKYBj+/3cQUeMds7OevmM8fSU5t+4zGpikobQ75xCjz7vJ6nQNLrSylqQG
 Z4weVme6a529szRwv6H9an+psj5wP0yWAbCgBSw4/O3cogDS0eMD8T6ZhEi8mWJ9
 PRSdqOLdxl7AZpJpJpP3GF7VQLJLDbVv56FcKgAZ3mtFM6qr7HFiTtnTc+Jl33V5
 bM7v23HXyoyRk+JOUMtO+4BlrkKzbtL4Zh8aGw4s71QX3zZbP0BIw9sKffNMD+2k
 oYXOcJjTwbtxQqXKoSKOtvLUQzf/SDqzlFf7yU38Spw+nRECPhVVRO0aAXKOmFKD
 74qyNuW6ZXMw7xH4M7iG6MuM/cZIvueDtp0RSZVc6Gmb5CGb1aXz4GeotvKF5Bev
 nVL65UVo6XwfB4m77ayPYqfD6KX0BsWHdCzWABNl8WbgyCY5dXRJJs/c9iLCl05I
 EXRgWLJK/H/wOY9mj3NGcvRicYJigZP2luJEdwn8mVlpZl+3wstDcgq9Ijxr0LGi
 bj2H9ro3vSaXpAnJw3FR9QbXgH/8Nbb8sV7yBddy9zntrnfEfvrYtrhOKR+fPrQL
 C5zXeaXGEnqCEcYNESSjyhDCa1az49BlJC5ebEHHqdviHzRpVVi/EG920GViWw+D
 z+z25N5hi4yoUF/+1vc/
 =Dm06
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull thermal power-limit update from Tony Luck:
 "Thermal limit warnings are too scary and cause unnecessary concern"

* tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  x86 thermal: Disable power limit notification interrupt by default
  x86 thermal: Delete power-limit-notification console messages
2013-07-03 11:16:09 -07:00
Linus Torvalds 96a3d998fb Merge branch 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tracing updates from Ingo Molnar:
 "This tree adds IRQ vector tracepoints that are named after the handler
  and which output the vector #, based on a zero-overhead approach that
  relies on changing the IDT entries, by Seiji Aguchi.

  The new tracepoints look like this:

   # perf list | grep -i irq_vector
    irq_vectors:local_timer_entry                      [Tracepoint event]
    irq_vectors:local_timer_exit                       [Tracepoint event]
    irq_vectors:reschedule_entry                       [Tracepoint event]
    irq_vectors:reschedule_exit                        [Tracepoint event]
    irq_vectors:spurious_apic_entry                    [Tracepoint event]
    irq_vectors:spurious_apic_exit                     [Tracepoint event]
    irq_vectors:error_apic_entry                       [Tracepoint event]
    irq_vectors:error_apic_exit                        [Tracepoint event]
   [...]"

* 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tracing: Add config option checking to the definitions of mce handlers
  trace,x86: Do not call local_irq_save() in load_current_idt()
  trace,x86: Move creation of irq tracepoints from apic.c to irq.c
  x86, trace: Add irq vector tracepoints
  x86: Rename variables for debugging
  x86, trace: Introduce entering/exiting_irq()
  tracing: Add DEFINE_EVENT_FN() macro
2013-07-02 16:31:49 -07:00
Linus Torvalds 3045f94a20 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The changes in this tree are:

   - ACPI APEI (ACPI Platform Error Interface) improvements, by Chen
     Gong
   - misc MCE fixes/cleanups"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Update MCE severity condition check
  mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
  ACPI/APEI: Update einj documentation for param1/param2
  ACPI/APEI: Add parameter check before error injection
  ACPI, APEI, EINJ: Fix error return code in einj_init()
  x86, mce: Fix "braodcast" typo
2013-07-02 16:30:46 -07:00
Linus Torvalds 1982269a5c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "Misc improvements:

   - Fix /proc/mtrr reporting
   - Fix ioremap printout
   - Remove the unused pvclock fixmap entry on 32-bit
   - misc cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioremap: Correct function name output
  x86: Fix /proc/mtrr with base/size more than 44bits
  ix86: Don't waste fixmap entries
  x86/mm: Drop unneeded include <asm/*pgtable, page*_types.h>
  x86_64: Correct phys_addr in cleanup_highmap comment
2013-07-02 16:29:05 -07:00
Linus Torvalds d652df0b2f Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FPU changes from Ingo Molnar:
 "There are two bigger changes in this tree:

   - Add an [early-use-]safe static_cpu_has() variant and other
     robustness improvements, including the new X86_DEBUG_STATIC_CPU_HAS
     configurable debugging facility, motivated by recent obscure FPU
     code bugs, by Borislav Petkov

   - Reimplement FPU detection code in C and drop the old asm code, by
     Peter Anvin."

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, fpu: Use static_cpu_has_safe before alternatives
  x86: Add a static_cpu_has_safe variant
  x86: Sanity-check static_cpu_has usage
  x86, cpu: Add a synthetic, always true, cpu feature
  x86: Get rid of ->hard_math and all the FPU asm fu
2013-07-02 16:26:44 -07:00
Linus Torvalds 35c23d5d79 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
 "Two changes:

   - Extend 32-bit double fault debugging aid to 64-bit
   - Fix a build warning"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel/cacheinfo: Shut up last long-standing warning
  x86: Extend #DF debugging aid to 64-bit
2013-07-02 16:24:24 -07:00
Linus Torvalds 002e44bfb5 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull asm/x86 changes from Ingo Molnar:
 "Misc changes, with a bigger processor-flags cleanup/reorganization by
  Peter Anvin"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, asm, cleanup: Replace open-coded control register values with symbolic
  x86, processor-flags: Fix the datatypes and add bit number defines
  x86: Rename X86_CR4_RDWRGSFS to X86_CR4_FSGSBASE
  x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED
  linux/const.h: Add _BITUL() and _BITULL()
  x86/vdso: Convert use of typedef ctl_table to struct ctl_table
  x86: __force_order doesn't need to be an actual variable
2013-07-02 16:21:45 -07:00
Linus Torvalds f0bb4c0ab0 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel improvements:

   - watchdog driver improvements by Li Zefan
   - Power7 CPI stack events related improvements by Sukadev Bhattiprolu
   - event multiplexing via hrtimers and other improvements by Stephane
     Eranian
   - kernel stack use optimization by Andrew Hunter
   - AMD IOMMU uncore PMU support by Suravee Suthikulpanit
   - NMI handling rate-limits by Dave Hansen
   - various hw_breakpoint fixes by Oleg Nesterov
   - hw_breakpoint overflow period sampling and related signal handling
     fixes by Jiri Olsa
   - Intel Haswell PMU support by Andi Kleen

  Tooling improvements:

   - Reset SIGTERM handler in workload child process, fix from David
     Ahern.
   - Makefile reorganization, prep work for Kconfig patches, from Jiri
     Olsa.
   - Add automated make test suite, from Jiri Olsa.
   - Add --percent-limit option to 'top' and 'report', from Namhyung
     Kim.
   - Sorting improvements, from Namhyung Kim.
   - Expand definition of sysfs format attribute, from Michael Ellerman.

  Tooling fixes:

   - 'perf tests' fixes from Jiri Olsa.
   - Make Power7 CPI stack events available in sysfs, from Sukadev
     Bhattiprolu.
   - Handle death by SIGTERM in 'perf record', fix from David Ahern.
   - Fix printing of perf_event_paranoid message, from David Ahern.
   - Handle realloc failures in 'perf kvm', from David Ahern.
   - Fix divide by 0 in variance, from David Ahern.
   - Save parent pid in thread struct, from David Ahern.
   - Handle JITed code in shared memory, from Andi Kleen.
   - Fixes for 'perf diff', from Jiri Olsa.
   - Remove some unused struct members, from Jiri Olsa.
   - Add missing liblk.a dependency for python/perf.so, fix from Jiri
     Olsa.
   - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
   - No need to do locking when adding hists in perf report, only 'top'
     needs that, from Namhyung Kim.
   - Fix alignment of symbol column in in the hists browser (top,
     report) when -v is given, from NAmhyung Kim.
   - Fix 'perf top' -E option behavior, from Namhyung Kim.
   - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
   - Fix compile errors in bp_signal 'perf test', from Sukadev
     Bhattiprolu.

  ... and more things"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
  perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
  perf/x86: Fix shared register mutual exclusion enforcement
  perf/x86/intel: Support full width counting
  x86: Add NMI duration tracepoints
  perf: Drop sample rate when sampling is too slow
  x86: Warn when NMI handlers take large amounts of time
  hw_breakpoint: Introduce "struct bp_cpuinfo"
  hw_breakpoint: Simplify *register_wide_hw_breakpoint()
  hw_breakpoint: Introduce cpumask_of_bp()
  hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
  hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
  perf/x86/intel: Add mem-loads/stores support for Haswell
  perf/x86/intel: Support Haswell/v4 LBR format
  perf/x86/intel: Move NMI clearing to end of PMI handler
  perf/x86/intel: Add Haswell PEBS support
  perf/x86/intel: Add simple Haswell PMU support
  perf/x86/intel: Add Haswell PEBS record support
  perf/x86/intel: Fix sparse warning
  perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
  perf/x86/amd: Add IOMMU Performance Counter resource management
  ...
2013-07-02 16:15:23 -07:00
Ingo Molnar fb476cffd5 Changes to simplify the SDM mean that we can also simplify
the code for SRAR (software recoverable action required) errors.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRzKq7AAoJEKurIx+X31iBviQP+gJ6b9BemXj8/TLjWN3P3mAB
 bQHU9imlSXmggfauC1C4/b60k4G/D8d1hN1Lboak7TbFvY69ONcSodY9w6O9Lknb
 r83xT0oUIe5IYWqC42+godt8P59ed10lUa8F5eaA9DhhzcsCWO2GG/ITqCmyjEom
 zJGKlLytq6ViJf+6ig3gTmWgHU4tvlJvOPh/O0WNWXH9kbOtnsLT8ImY9GMO/R1B
 LdItg2nxbDZmhjLOsKamaj2YpPbZ9Vtvam9CnQwirbUAkdGZfmRsq+K9y8O5MjHJ
 GRqOKwQ3BdKBUIEu36BxgIGHSg8hWYSefGD8nz8IP+QHwy5fx7SREdgx5M+AMSFt
 8DewSUij2wMwH/vuP4KrdYWva6LNMBtR+NItCkzpdY5ykcdoSzo06nCotGcUrxVN
 iEmYywZzO3CTYJj0lWqdrhhiFln/9ZrhXaWUbRoX5m3HAQK7XeTPT+3O7Vs5fWDN
 3Fb/0EzyJjmv+Mysl8/PvR9umJs3ELK5MiVLIAIkuozh8+L33byk/WX+VOfqTsY4
 KXYDm6fck3tRfn+PJGrAMSwIEyhgVJArRJC39PydEUIwjlel6DRU0Rs26wQXtFh8
 DHxTIiEubmvctvi3DnHOztOI3d3EkAayU6Dk4cNSy46FwujJC0v+07EQav6BLA3e
 x1HK+3PVVSQYuqFIeyWa
 =PJXS
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE cleanup from Tony Luck:

 "Changes to simplify the SDM means that we can also simplify
  the code for SRAR (software recoverable action required) errors."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-28 11:09:26 +02:00
Chen Gong 33d7885b59 x86/mce: Update MCE severity condition check
Update some SRAR severity conditions check to make it clearer,
according to latest Intel SDM Vol 3(June 2013), table 15-20.

Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-27 13:45:52 -07:00
Dave Airlie 4300a0f8bd Linux 3.10-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEbBAABAgAGBQJRxf9cAAoJEHm+PkMAQRiGMWkH911xM4gRmFgE7SqVW4F4AWBm
 ngcqMqNy9IdqKfibORUUDvVfEa5gjD5ai2quIKpfQiaukbpQJ696H90ijuAkajLn
 DQBrN243s0pzhhc/quWINnWxsFQ613JjdUMUMaD7e9A1aKjYzWrPGt/tSjrFXGCP
 tArTupVzc/iOmnEQDKiROI/Nokq44QJ36aTGPM7n08xMtpKmkCXM+9/UosBteB0O
 HVI33dmjwz7i55fI53XAWyuZCE+gSEnA4z8spJ9LfXso2W14V+roc+GuL6OyeeTI
 pCn/+4niVPb4B0ROZlpyVmdZjbPPcMMEK5o+BSJI68SH6LHZTQh2iVuqYfpSyA==
 =uUH5
 -----END PGP SIGNATURE-----

Merge tag 'v3.10-rc7' into drm-next

Linux 3.10-rc7

The sdvo lvds fix in this -fixes pull

commit c3456fb3e4
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon Jun 10 09:47:58 2013 +0200

    drm/i915: prefer VBT modes for SVDO-LVDS over EDID

has a silent functional conflict with

commit 990256aec2
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Fri May 31 12:17:07 2013 +0000

    drm: Add probed modes in probe order

in drm-next. W simply need to add the vbt modes before edid modes, i.e. the
other way round than now.

Conflicts:
	drivers/gpu/drm/drm_prime.c
	drivers/gpu/drm/i915/intel_sdvo.c
2013-06-27 20:40:44 +10:00
Stephane Eranian 983433b581 perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
Make sure intel_pmu_pebs_disable() and intel_pmu_pebs_enable()
are symmetrical w.r.t. PEBS-LL and precise store.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371824448-7306-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 21:58:51 +02:00
Stephane Eranian 2f7f73a520 perf/x86: Fix shared register mutual exclusion enforcement
This patch fixes a problem with the shared registers mutual
exclusion code and incremental event scheduling by the
generic perf_event code.

There was a bug whereby the mutual exclusion on the shared
registers was not enforced because of incremental scheduling
abort due to event constraints. As an example on Intel
Nehalem, consider the following events:

group1= L1D_CACHE_LD:E_STATE,OFFCORE_RESPONSE_0:PF_RFO,L1D_CACHE_LD:I_STATE
group2= L1D_CACHE_LD:I_STATE

The L1D_CACHE_LD event can only be measured by 2 counters. Yet, there
are 3 instances here. The first group can be scheduled and is committed.
Then, the generic code tries to schedule group2 and this fails (because
there is no more counter to support the 3rd instance of L1D_CACHE_LD).
But in x86_schedule_events() error path, put_event_contraints() is invoked
on ALL the events and not just the ones that just failed. That causes the
"lock" on the shared offcore_response MSR to be released. Yet the first group
is actually scheduled and is exposed to reprogramming of that shared msr by
the sibling HT thread. In other words, there is no guarantee on what is
measured.

This patch fixes the problem by tagging committed events with the
PERF_X86_EVENT_COMMITTED tag. In the error path of x86_schedule_events(),
only the events NOT tagged have their constraint released. The tag
is eventually removed when the event in descheduled.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130620164254.GA3556@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 21:58:49 +02:00
Andi Kleen 069e0c3c40 perf/x86/intel: Support full width counting
Recent Intel CPUs like Haswell and IvyBridge have a new
alternative MSR range for perfctrs that allows writing the full
counter width. Enable this range if the hardware reports it
using a new capability bit.

Currently the perf code queries CPUID to get the counter width,
and sign extends the counter values as needed. The traditional
PERFCTR MSRs always limit to 32bit, even though the counter
internally is larger (usually 48 bits on recent CPUs)

When the new capability is set use the alternative range which
do not have these restrictions.

This lowers the overhead of perf stat slightly because it has to
do less interrupts to accumulate the counter value. On Haswell
it also avoids some problems with TSX aborting when the end of
the counter range is reached.

( See the patch "perf/x86/intel: Avoid checkpointed counters
  causing excessive TSX aborts" for more details. )

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372173153-20215-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 11:59:25 +02:00
Ingo Molnar ca02c21674 Better comments so we understand our existing machine check
bank bitmaps - prelude to adding another bitmap soon.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRygQkAAoJEKurIx+X31iB5RsP/R6w/X4wbKlXD8K8e2qhSXU2
 oYVtaMN46M+xLRdNDdE1Y+clVKDQTuapfrlOtBLZVHIkNfqF5jAtu2O1wTERpx+F
 7lDHuWuFbddbqV2vqE6RJ6ZqsUTsrlzrqLSLWGSoHn36DlnkSm+L5vDDG75QzV+w
 fPpGpJTM3kRxPUHnmwvfniJxTiBjsli6FDZ1vGq4Ingu+xxOJDU6+FdWts08hn4Y
 +KLjs1JMJSdo3di05T6t4ARKBgW3B1lJnrIS6fGYMmax+kHQqO/zvzHs3A86LZyN
 7L0toEoZHa8VRMN6C1mw0XsFwPmyMOsrHrRpBlFgHpC7QLAzx6LkxchyDBqPL0mo
 W4bnoyu1QZUvblq1mrsnJa9yeyMjSHAeq2XTBj8Pbv20AKzmCbNl3aJ5tCDhPj4Y
 A+e8vk/tq8zWCjdf3SV/D+wjgEtkeWALZZj70maufu9AKtKXy5repa6DZCKEhyIC
 yh72c3gZ25IkxjwcHmJh+CYaKll9MgdW5toZkftBin2iOdWSaHS9QX2r4I5Awvbi
 m0Iwq5Fg8DSs3AhBLxiJi8L7N+RNa+7GoTH0z6UDtfAY3ExvepfwjPzLxAfTbPw6
 oK2wLfuPiNizl2DX1yNlizGD2YSRiWKoap3Iog84A3IMTIe1nJ+97GHvsVcGna8I
 zXjnZDNQThAgBswtNY1s
 =U/Kg
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-mce-bitmap-comment' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras

Pull MCE updates from Tony Luck:

 "Better comments so we understand our existing machine check
  bank bitmaps - prelude to adding another bitmap soon."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-26 10:53:45 +02:00
H. Peter Anvin a3d7b7dddc x86, asm, cleanup: Replace open-coded control register values with symbolic
Clean up an unnecessary open-coded control register values.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-um7za1nzf6brb17o0h4om6e3@git.kernel.org
2013-06-25 16:26:06 -07:00
Naveen N. Rao 0644414e62 mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem
There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned'
per-cpu bitmaps.  Provide comments so that we all know exactly what these
are used for, and why.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-06-25 13:53:27 -07:00
Yinghai Lu d5c78673b1 x86: Fix /proc/mtrr with base/size more than 44bits
On one sytem that mtrr range is more then 44bits, in dmesg we have
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-DFFFF write-through
[    0.000000]   E0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable
[    0.000000]   1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable
[    0.000000]   2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through
[    0.000000]   5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through
[    0.000000]   6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   8 disabled
[    0.000000]   9 disabled

but /proc/mtrr report wrong:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x81ffa000000 (8519584MB), size=   32MB, count=1: write-through
reg05: base=0x81ffc000000 (8519616MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

so bit 44 and bit 45 get cut off.

We have problems in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr().
1. for base, we miss cast base_lo to 64bit before shifting.
Fix that by adding u64 casting.

2. for size, it only can handle 44 bits aka 32bits + page_shift
Fix that with 64bit mask instead of 32bit mask_lo, then range could be
more than 44bits.
At the same time, we need to update size_or_mask for old cpus that does
support cpuid 0x80000008 to get phys_addr. Need to set high 32bits
to all 1s, otherwise will not get correct size for them.

Also fix mtrr_add_page: it should check base and (base + size - 1)
instead of base and size, as base and size could be small but
base + size could bigger enough to be out of boundary. We can
use boot_cpu_data.x86_phys_bits directly to avoid size_or_mask.

So When are we going to have size more than 44bits? that is 16TiB.

after patch we have right ouput:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x381ffa000000 (58851232MB), size=   32MB, count=1: write-through
reg05: base=0x381ffc000000 (58851264MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

-v2: simply checking in mtrr_add_page according to hpa.

[ hpa: This probably wants to go into -stable only after having sat in
  mainline for a bit.  It is not a regression. ]

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-25 13:08:10 -07:00
Dave Hansen 14c63f17b1 perf: Drop sample rate when sampling is too slow
This patch keeps track of how long perf's NMI handler is taking,
and also calculates how many samples perf can take a second.  If
the sample length times the expected max number of samples
exceeds a configurable threshold, it drops the sample rate.

This way, we don't have a runaway sampling process eating up the
CPU.

This patch can tend to drop the sample rate down to level where
perf doesn't work very well.  *BUT* the alternative is that my
system hangs because it spends all of its time handling NMIs.

I'll take a busted performance tool over an entire system that's
busted and undebuggable any day.

BTW, my suspicion is that there's still an underlying bug here.
Using the HPET instead of the TSC is definitely a contributing
factor, but I suspect there are some other things going on.
But, I can't go dig down on a bug like that with my machine
hanging all the time.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
[ Prettified it a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-23 11:52:57 +02:00
Linus Torvalds f71194a7d4 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "This series fixes a couple of build failures, and fixes MTRR cleanup
  and memory setup on very specific memory maps.

  Finally, it fixes triggering backtraces on all CPUs, which was
  inadvertently disabled on x86."

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Fix dummy variable buffer allocation
  x86: Fix trigger_all_cpu_backtrace() implementation
  x86: Fix section mismatch on load_ucode_ap
  x86: fix build error and kconfig for ia32_emulation and binfmt
  range: Do not add new blank slot with add_range_with_merge
  x86, mtrr: Fix original mtrr range get for mtrr_cleanup
2013-06-21 06:33:48 -10:00
Seiji Aguchi cf910e83ae x86, trace: Add irq vector tracepoints
[Purpose of this patch]

As Vaibhav explained in the thread below, tracepoints for irq vectors
are useful.

http://www.spinics.net/lists/mm-commits/msg85707.html

<snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit
provide when an interrupt is handled.  They provide good data about when
the system has switched to kernel space and how it affects the currently
running processes.

There are some IRQ vectors which trigger the system into kernel space,
which are not handled in generic IRQ handlers.  Tracing such events gives
us the information about IRQ interaction with other system events.

The trace also tells where the system is spending its time.  We want to
know which cores are handling interrupts and how they are affecting other
processes in the system.  Also, the trace provides information about when
the cores are idle and which interrupts are changing that state.
<snip>

On the other hand, my usecase is tracing just local timer event and
getting a value of instruction pointer.

I suggested to add an argument local timer event to get instruction pointer before.
But there is another way to get it with external module like systemtap.
So, I don't need to add any argument to irq vector tracepoints now.

[Patch Description]

Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
But there is an above use case to trace specific irq_vector rather than tracing all events.
In this case, we are concerned about overhead due to unwanted events.

So, add following tracepoints instead of introducing irq_vector_entry/exit.
so that we can enable them independently.
   - local_timer_vector
   - reschedule_vector
   - call_function_vector
   - call_function_single_vector
   - irq_work_entry_vector
   - error_apic_vector
   - thermal_apic_vector
   - threshold_apic_vector
   - spurious_apic_vector
   - x86_platform_ipi_vector

Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty
makes a zero when tracepoints are disabled. Detailed explanations are as follows.
 - Create trace irq handlers with entering_irq()/exiting_irq().
 - Create a new IDT, trace_idt_table, at boot time by adding a logic to
   _set_gate(). It is just a copy of original idt table.
 - Register the new handlers for tracpoints to the new IDT by introducing
   macros to alloc_intr_gate() called at registering time of irq_vector handlers.
 - Add checking, whether irq vector tracing is on/off, into load_current_idt().
   This has to be done below debug checking for these reasons.
   - Switching to debug IDT may be kicked while tracing is enabled.
   - On the other hands, switching to trace IDT is kicked only when debugging
     is disabled.

In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
used for other purposes.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:34 -07:00
Seiji Aguchi 629f4f9d59 x86: Rename variables for debugging
Rename variables for debugging to describe meaning of them precisely.

Also, introduce a generic way to switch IDT by checking a current state,
debug on/off.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323A8.7050905@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:13 -07:00
Seiji Aguchi eddc0e922a x86, trace: Introduce entering/exiting_irq()
When implementing tracepoints in interrupt handers, if the tracepoints are
simply added in the performance sensitive path of interrupt handers,
it may cause potential performance problem due to the time penalty.

To solve the problem, an idea is to prepare non-trace/trace irq handers and
switch their IDTs at the enabling/disabling time.

So, let's introduce entering_irq()/exiting_irq() for pre/post-
processing of each irq handler.

A way to use them is as follows.

Non-trace irq handler:
smp_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	__smp_irq_handler();	/*
				 * common logic between non-trace and trace handlers
				 * in a vector.
				 */
	exiting_irq();		/* post-processing of this handler */

}

Trace irq_handler:
smp_trace_irq_handler()
{
	entering_irq();		/* pre-processing of this handler */
	trace_irq_entry();	/* tracepoint for irq entry */
	__smp_irq_handler();	/*
				 * common logic between non-trace and trace handlers
				 * in a vector.
				 */
	trace_irq_exit();	/* tracepoint for irq exit */
	exiting_irq();		/* post-processing of this handler */

}

If tracepoints can place outside entering_irq()/exiting_irq() as follows,
it looks cleaner.

smp_trace_irq_handler()
{
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
}

But it doesn't work.
The problem is with irq_enter/exit() being called. They must be called before
trace_irq_enter/exit(),  because of the rcu_irq_enter() must be called before
any tracepoints are used, as tracepoints use  rcu to synchronize.

As a possible alternative, we may be able to call irq_enter() first as follows
if irq_enter() can nest.

smp_trace_irq_hander()
{
	irq_entry();
	trace_irq_entry();
	smp_irq_handler();
	trace_irq_exit();
	irq_exit();
}

But it doesn't work, either.
If irq_enter() is nested, it may have a time penalty because it has to check if it
was already called or not. The time penalty is not desired in performance sensitive
paths even if it is tiny.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
2013-06-20 22:25:01 -07:00
Borislav Petkov 4a90a99c4f x86: Add a static_cpu_has_safe variant
We want to use this in early code where alternatives might not have run
yet and for that case we fall back to the dynamic boot_cpu_has.

For that, force a 5-byte jump since the compiler could be generating
differently sized jumps for each label.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-5-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:38:14 -07:00
Borislav Petkov 5700f743b5 x86: Sanity-check static_cpu_has usage
static_cpu_has may be used only after alternatives have run. Before that
it always returns false if constant folding with __builtin_constant_p()
doesn't happen. And you don't want that.

This patch is the result of me debugging an issue where I overzealously
put static_cpu_has in code which executed before alternatives have run
and had to spend some time with scratching head and cursing at the
monitor.

So add a jump to a warning which screams loudly when we use this
function too early. The alternatives patch that check away in
conjunction with patching the rest of the kernel image.

[ hpa: factored this into its own configuration option.  If we want to
  have an overarching option, it should be an option which selects
  other options, not as a group option in the source code. ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:37:19 -07:00
Borislav Petkov c3b83598c1 x86, cpu: Add a synthetic, always true, cpu feature
This will be used in alternatives later as an always-replace flag.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-20 17:06:07 -07:00
Borislav Petkov 719038de98 x86/intel/cacheinfo: Shut up last long-standing warning
arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘init_intel_cacheinfo’:
arch/x86/kernel/cpu/intel_cacheinfo.c:642:28: warning: ‘this_leaf.size’ may be used uninitialized in this function [-Wmaybe-uninitialized] arch/x86/kernel/cpu/intel_cacheinfo.c:643:29: warning: ‘this_leaf.eax.split.num_threads_sharing’ may be used uninitialized in this function [-Wmaybe-uninitialized]

This keeps on happening during randbuilds and the compiler is
wrong here:

In the case where cpuid4_cache_lookup_regs() returns 0, both
this_leaf.size and this_leaf.eax get initialized. In the case
where the CPUID leaf doesn't contain valid cache info, we error
out which init_intel_cacheinfo() handles correctly without
touching the abovementioned fields.

So shut up the warning by clearing out the struct which we hand
down.

While at it, reverse error handling and gain one indentation
level.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370710095-20547-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-20 12:27:41 +02:00
Andi Kleen f9134f36ae perf/x86/intel: Add mem-loads/stores support for Haswell
mem-loads is basically the same as Sandy Bridge,
but we use a separate string for changes later.

Haswell doesn't support the full precise store mode,
so we emulate it using the "DataLA" facility.
This allows to do everything, but for data sources we
can only detect L1 hit or not.

There is no explicit enable bit anymore, so we have
to tie it to a perf internal only flag.

The address is supported for all memory related PEBS
events with DataLA. Instead of only logging for the
load and store events we allow logging it for all
(it will be simply 0 if the current event does not
support it)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-7-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:35 +02:00
Andi Kleen 135c5612c4 perf/x86/intel: Support Haswell/v4 LBR format
Haswell has two additional LBR from flags for TSX: in_tx and
abort_tx, implemented as a new "v4" version of the LBR format.

Handle those in and adjust the sign extension code to still
correctly extend. The flags are exported similarly in the LBR
record to the existing misprediction flag

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-6-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:35 +02:00
Andi Kleen 72db559646 perf/x86/intel: Move NMI clearing to end of PMI handler
This avoids some problems with spurious PMIs on Haswell.
Haswell seems to behave more like P4 in this regard. Do
the same thing as the P4 perf handler by unmasking
the NMI only at the end. Shouldn't make any difference
for earlier family 6 cores.

(Tested on Haswell, IvyBridge, Westmere, Saltwell (Atom).)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:34 +02:00
Andi Kleen 3044318f1f perf/x86/intel: Add Haswell PEBS support
Add simple PEBS support for Haswell.

The constraints are similar to SandyBridge with a few new
events.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:33 +02:00
Andi Kleen 3a632cb229 perf/x86/intel: Add simple Haswell PMU support
Similar to SandyBridge, but has a few new events and two
new counter bits.

There are some new counter flags that need to be prevented
from being set on fixed counters, and allowed to be set
for generic counters.

Also we add support for the counter 2 constraint to handle
all raw events.

(Contains fixes from Stephane Eranian.)

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:33 +02:00
Andi Kleen 130768b8c9 perf/x86/intel: Add Haswell PEBS record support
Add support for the Haswell extended (fmt2) PEBS format.

It has a superset of the nhm (fmt1) PEBS fields, but has a
longer record so we need to adjust the code paths.

The main advantage is the new "EventingRip" support which
directly gives the instruction, not off-by-one instruction. So
with precise == 2 we use that directly and don't try to use LBRs
and walking basic blocks. This lowers the overhead of using
precise significantly.

Some other features are added in later patches.

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andi Kleen <ak@linux.jf.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1371515812-9646-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 14:43:32 +02:00
Yan, Zheng b2fa344d0c perf/x86/intel: Fix sparse warning
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370421025-10986-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:04:55 +02:00
Suravee Suthikulpanit 7be6296fdd perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
Implement a perf PMU to handle IOMMU performance counters and events.
The PMU only supports counting mode (e.g. perf stat). Since the counters
are shared across all cores, the PMU is implemented as "system-wide" mode.

To invoke the AMD IOMMU PMU, issue a perf tool command such as:

  ./perf stat -a -e amd_iommu/<events>/ <command>

or:

  ./perf stat -a -e amd_iommu/config=<config-data>,config1=<config1-data>/ <command>

For example:

  ./perf stat -a -e amd_iommu/mem_trans_total/ <command>

The resulting count will be how many IOMMU total peripheral memory
operations were performed during the command execution window.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1370466709-3212-3-git-send-email-suravee.suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 13:04:53 +02:00
Dave Hansen ae0def05ed perf/x86: Only print PMU state when also WARN()'ing
intel_pmu_handle_irq() has a warning in it if it does too many
loops.  It is a WARN_ONCE(), but the perf_event_print_debug()
call beneath it is unconditional. For the first warning, you get
a nice backtrace and message, but subsequent ones just dump the
PMU state with no leading messages.  I doubt this is what was
intended.

This patch will only print the PMU state when paired with the
WARN_ON() text.  It effectively open-codes WARN_ONCE()'s
one-time-only logic.

My suspicion is that the code really just wants to make sure we
do not sit in the loop and spit out a warning for every loop
iteration after the 100th.  From what I've seen, this is very
unlikely to happen since we also clear the PMU state.

After this patch, instead of seeing the PMU state dumped each
time, you will just see:

	[57494.894540] perf_event_intel: clearing PMU state on CPU#129
	[57579.539668] perf_event_intel: clearing PMU state on CPU#10
	[57587.137762] perf_event_intel: clearing PMU state on CPU#134
	[57623.039912] perf_event_intel: clearing PMU state on CPU#114
	[57644.559943] perf_event_intel: clearing PMU state on CPU#118
	...

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130530174559.0DB049F4@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:47 +02:00
Andrew Hunter 43b4578071 perf/x86: Reduce stack usage of x86_schedule_events()
x86_schedule_events() caches event constraints on the stack during
scheduling.  Given the number of possible events, this is 512 bytes of
stack; since it can be invoked under schedule() under god-knows-what,
this is causing stack blowouts.

Trade some space usage for stack safety: add a place to cache the
constraint pointer to struct perf_event.  For 8 bytes per event (1% of
its size) we can save the giant stack frame.

This shouldn't change any aspect of scheduling whatsoever and while in
theory the locality's a tiny bit worse, I doubt we'll see any
performance impact either.

Tested: `perf stat whatever` does not blow up and produces
results that aren't hugely obviously wrong.  I'm not sure how to run
particularly good tests of perf code, but this should not produce any
functional change whatsoever.

Signed-off-by: Andrew Hunter <ahh@google.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369332423-4400-1-git-send-email-ahh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:50:44 +02:00
Ingo Molnar eff2108f02 Merge branch 'perf/urgent' into perf/core
Merge in the latest fixes, to avoid conflicts with ongoing work.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:41 +02:00
Stephane Eranian f1a527899e perf/x86: Fix broken PEBS-LL support on SNB-EP/IVB-EP
This patch fixes broken support of PEBS-LL on SNB-EP/IVB-EP.
For some reason, the LDLAT extra reg definition for snb_ep
showed up as duplicate in the snb table.

This patch moves the definition of LDLAT back into the
snb_ep table.

Thanks to Don Zickus for tracking this one down.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130607212210.GA11849@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-06-19 12:44:16 +02:00
Yinghai Lu d8d386c106 x86, mtrr: Fix original mtrr range get for mtrr_cleanup
Joshua reported: Commit cd7b304dfa (x86, range: fix missing merge
during add range) broke mtrr cleanup on his setup in 3.9.5.
corresponding commit in upstream is fbe06b7bae.

  *BAD*gran_size: 64K chunk_size: 16M num_reg: 6 lose cover RAM: -0G

https://bugzilla.kernel.org/show_bug.cgi?id=59491

So it rejects new var mtrr layout.

It turns out we have some problem with initial mtrr range retrieval.
The current sequence is:
	x86_get_mtrr_mem_range
		==> bunchs of add_range_with_merge
		==> bunchs of subract_range
		==> clean_sort_range
	add_range_with_merge for [0,1M)
	sort_range()

add_range_with_merge could have blank slots, so we can not just
sort only, that will have final result have extra blank slot in head.

So move that calling add_range_with_merge for [0,1M), with that we
could avoid extra clean_sort_range calling.

Reported-by: Joshua Covington <joshuacov@googlemail.com>
Tested-by: Joshua Covington <joshuacov@googlemail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371154622-8929-2-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org> v3.9
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-06-18 11:32:02 -05:00