Граф коммитов

2870 Коммитов

Автор SHA1 Сообщение Дата
Jiri Olsa af91568e76 perf/x86/intel/uncore: Make sure only uncore events are collected
The uncore_collect_events functions assumes that event group
might contain only uncore events which is wrong, because it
might contain any type of events.

This bug leads to uncore framework touching 'not' uncore events,
which could end up all sorts of bugs.

One was triggered by Vince's perf fuzzer, when the uncore code
touched breakpoint event private event space as if it was uncore
event and caused BUG:

   BUG: unable to handle kernel paging request at ffffffff82822068
   IP: [<ffffffff81020338>] uncore_assign_events+0x188/0x250
   ...

The code in uncore_assign_events() function was looking for
event->hw.idx data while the event was initialized as a
breakpoint with different members in event->hw union.

This patch forces uncore_collect_events() to collect only uncore
events.

Reported-by: Vince Weaver <vince@deater.net>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1418243031-20367-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-11 11:24:14 +01:00
Linus Torvalds 92a578b064 ACPI and power management updates for 3.19-rc1
This time we have some more new material than we used to have during
 the last couple of development cycles.
 
 The most important part of it to me is the introduction of a unified
 interface for accessing device properties provided by platform
 firmware.  It works with Device Trees and ACPI in a uniform way and
 drivers using it need not worry about where the properties come
 from as long as the platform firmware (either DT or ACPI) makes
 them available.  It covers both devices and "bare" device node
 objects without struct device representation as that turns out to
 be necessary in some cases.  This has been in the works for quite
 a few months (and development cycles) and has been approved by
 all of the relevant maintainers.
 
 On top of that, some drivers are switched over to the new interface
 (at25, leds-gpio, gpio_keys_polled) and some additional changes are
 made to the core GPIO subsystem to allow device drivers to manipulate
 GPIOs in the "canonical" way on platforms that provide GPIO information
 in their ACPI tables, but don't assign names to GPIO lines (in which
 case the driver needs to do that on the basis of what it knows about
 the device in question).  That also has been approved by the GPIO
 core maintainers and the rfkill driver is now going to use it.
 
 Second is support for hardware P-states in the intel_pstate driver.
 It uses CPUID to detect whether or not the feature is supported by
 the processor in which case it will be enabled by default.  However,
 it can be disabled entirely from the kernel command line if necessary.
 
 Next is support for a platform firmware interface based on ACPI
 operation regions used by the PMIC (Power Management Integrated
 Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
 That interface is used for manipulating power resources and for
 thermal management: sensor temperature reporting, trip point setting
 and so on.
 
 Also the ACPI core is now going to support the _DEP configuration
 information in a limited way.  Basically, _DEP it supposed to reflect
 off-the-hierarchy dependencies between devices which may be very
 indirect, like when AML for one device accesses locations in an
 operation region handled by another device's driver (usually, the
 device depended on this way is a serial bus or GPIO controller).
 The support added this time is sufficient to make the ACPI battery
 driver work on Asus T100A, but it is general enough to be able to
 cover some other use cases in the future.
 
 Finally, we have a new cpufreq driver for the Loongson1B processor.
 
 In addition to the above, there are fixes and cleanups all over the
 place as usual and a traditional ACPICA update to a recent upstream
 release.
 
 As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver
 for Intel platforms should be able to handle power management of
 the DMA engine correctly, the cpufreq-dt driver should interact
 with the thermal subsystem in a better way and the ACPI backlight
 driver should handle some more corner cases, among other things.
 
 On top of the ACPICA update there are fixes for race conditions
 in the ACPICA's interrupt handling code which might lead to some
 random and strange looking failures on some systems.
 
 In the cleanups department the most visible part is the series
 of commits targeted at getting rid of the CONFIG_PM_RUNTIME
 configuration option.  That was triggered by a discussion
 regarding the generic power domains code during which we realized
 that trying to support certain combinations of PM config options
 was painful and not really worth it, because nobody would use them
 in production anyway.  For this reason, we decided to make
 CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and that lead to the
 conclusion that the latter became redundant and CONFIG_PM could
 be used instead of it.  The material here makes that replacement
 in a major part of the tree, but there will be at least one more
 batch of that in the second part of the merge window.
 
 Specifics:
 
  - Support for retrieving device properties information from ACPI
    _DSD device configuration objects and a unified device properties
    interface for device drivers (and subsystems) on top of that.
    As stated above, this works with Device Trees and ACPI and allows
    device drivers to be written in a platform firmware (DT or ACPI)
    agnostic way.  The at25, leds-gpio and gpio_keys_polled drivers
    are now going to use this new interface and the GPIO subsystem
    is additionally modified to allow device drivers to assign names
    to GPIO resources returned by ACPI _CRS objects (in case _DSD is
    not present or does not provide the expected data).  The changes
    in this set are mostly from Mika Westerberg, Rafael J Wysocki,
    Aaron Lu, and Darren Hart with some fixes from others (Fabio Estevam,
    Geert Uytterhoeven).
 
  - Support for Hardware Managed Performance States (HWP) as described
    in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
    driver.  CPUID is used to detect whether or not the feature is
    supported by the processor.  If supported, it will be enabled
    automatically unless the intel_pstate=no_hwp switch is present in
    the kernel command line.  From Dirk Brandewie.
 
  - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).
 
  - Support for firmware interface based on ACPI operation regions
    used by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
    platforms for power resource control and thermal management
    (Aaron Lu).
 
  - Limited support for retrieving off-the-hierarchy dependencies
    between devices from ACPI _DEP device configuration objects
    and deferred probing support for the ACPI battery driver based
    on the _DEP information to make that driver work on Asus T100A
    (Lan Tianyu).
 
  - New cpufreq driver for the Loongson1B processor (Kelvin Cheung).
 
  - ACPICA update to upstream revision 20141107 which only affects
    tools (Bob Moore).
 
  - Fixes for race conditions in the ACPICA's interrupt handling
    code and in the ACPI code related to system suspend and resume
    (Lv Zheng and Rafael J Wysocki).
 
  - ACPI core fix for an RCU-related issue in the ioremap() regions
    management code that slowed down significantly after CPUs had
    been allowed to enter idle states even if they'd had RCU callbakcs
    queued and triggered some problems in certain proprietary graphics
    driver (and elsewhere).  The fix replaces synchronize_rcu() in
    that code with synchronize_rcu_expedited() which makes the issue
    go away.  From Konstantin Khlebnikov.
 
  - ACPI LPSS (Low-Power Subsystem) driver fix to handle power
    management of the DMA engine included into the LPSS correctly.
    The problem is that the DMA engine doesn't have ACPI PM support
    of its own and it simply is turned off when the last LPSS device
    having ACPI PM support goes into D3cold.  To work around that,
    the PM domain used by the ACPI LPSS driver is redesigned so at
    least one device with ACPI PM support will be on as long as the
    DMA engine is in use.  From Andy Shevchenko.
 
  - ACPI backlight driver fix to avoid using it on "Win8-compatible"
    systems where it doesn't work and where it was used by default by
    mistake (Aaron Lu).
 
  - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
    Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and
    Ashwin Chaugule (mostly related to the upcoming ARM64 support).
 
  - Intel RAPL (Running Average Power Limit) power capping driver
    fixes and improvements including new processor IDs (Jacob Pan).
 
  - Generic power domains modification to power up domains after
    attaching devices to them to meet the expectations of device
    drivers and bus types assuming devices to be accessible at
    probe time (Ulf Hansson).
 
  - Preliminary support for controlling device clocks from the
    generic power domains core code and modifications of the
    ARM/shmobile platform to use that feature (Ulf Hansson).
 
  - Assorted minor fixes and cleanups of the generic power
    domains core code (Ulf Hansson, Geert Uytterhoeven).
 
  - Assorted minor fixes and cleanups of the device clocks control
    code in the PM core (Geert Uytterhoeven, Grygorii Strashko).
 
  - Consolidation of device power management Kconfig options by making
    CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
    which is now redundant (Rafael J Wysocki and Kevin Hilman).  That
    is the first batch of the changes needed for this purpose.
 
  - Core device runtime power management support code cleanup related
    to the execution of callbacks (Andrzej Hajda).
 
  - cpuidle ARM support improvements (Lorenzo Pieralisi).
 
  - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and
    a new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
    Bartlomiej Zolnierkiewicz).
 
  - New cpufreq driver callback (->ready) to be executed when the
    cpufreq core is ready to use a given policy object and cpufreq-dt
    driver modification to use that callback for cooling device
    registration (Viresh Kumar).
 
  - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu,
    James Geboski, Tomeu Vizoso).
 
  - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
    cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
    Stefan Wahren, Petr Cvek).
 
  - OPP (Operating Performance Points) framework modification to
    allow OPPs to be removed too and update of a few cpufreq drivers
    (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
    during initialization) on driver removal (Viresh Kumar).
 
  - Hibernation core fixes and cleanups (Tina Ruchandani and
    Markus Elfring).
 
  - PM Kconfig fix related to CPU power management (Pankaj Dubey).
 
  - cpupower tool fix (Prarit Bhargava).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJUhj6JAAoJEILEb/54YlRxTM4P/j5g5SfqvY0QKsn7sR7MGZ6v
 nsgCBhJAqTw3ocNC7EAs8z9h2GWy1KbKpakKYWAh9Fs1yZoey7tFSlcv/Rgjlp70
 uU5sDQHtpE9mHKiymdsowiQuWgpl962L4k+k8hUslhlvgk1PvVbpajR6OqG8G+pD
 asuIW9eh1APNkLyXmRJ3ZPomzs0VmRdZJ0NEs0lKX9mJskqEvxPIwdaxq3iaJq9B
 Fo0J345zUDcJnxWblDRdHlOigCimglElfN5qJwaC4KpwUKuBvLRKbp4f69+wfT0c
 kYFiR29X5KjJ2kLfP/wKsLyuDCYYXRq3tCia5M1tAqOjZ+UA89H/GDftx/5lntmv
 qUlBa35VfdS1SX4HyApZitOHiLgo+It/hl8Z9bJnhyVw66NxmMQ8JYN2imb8Lhqh
 XCLR7BxLTah82AapLJuQ0ZDHPzZqMPG2veC2vAzRMYzVijict/p4Y2+qBqONltER
 4rs9uRVn+hamX33lCLg8BEN8zqlnT3rJFIgGaKjq/wXHAU/zpE9CjOrKMQcAg9+s
 t51XMNPwypHMAYyGVhEL89ImjXnXxBkLRuquhlmEpvQchIhR+mR3dLsarGn7da44
 WPIQJXzcsojXczcwwfqsJCR4I1FTFyQIW+UNh02GkDRgRovQqo+Jk762U7vQwqH+
 LBdhvVaS1VW4v+FWXEoZ
 =5dox
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management updates from Rafael Wysocki:
 "This time we have some more new material than we used to have during
  the last couple of development cycles.

  The most important part of it to me is the introduction of a unified
  interface for accessing device properties provided by platform
  firmware.  It works with Device Trees and ACPI in a uniform way and
  drivers using it need not worry about where the properties come from
  as long as the platform firmware (either DT or ACPI) makes them
  available.  It covers both devices and "bare" device node objects
  without struct device representation as that turns out to be necessary
  in some cases.  This has been in the works for quite a few months (and
  development cycles) and has been approved by all of the relevant
  maintainers.

  On top of that, some drivers are switched over to the new interface
  (at25, leds-gpio, gpio_keys_polled) and some additional changes are
  made to the core GPIO subsystem to allow device drivers to manipulate
  GPIOs in the "canonical" way on platforms that provide GPIO
  information in their ACPI tables, but don't assign names to GPIO lines
  (in which case the driver needs to do that on the basis of what it
  knows about the device in question).  That also has been approved by
  the GPIO core maintainers and the rfkill driver is now going to use
  it.

  Second is support for hardware P-states in the intel_pstate driver.
  It uses CPUID to detect whether or not the feature is supported by the
  processor in which case it will be enabled by default.  However, it
  can be disabled entirely from the kernel command line if necessary.

  Next is support for a platform firmware interface based on ACPI
  operation regions used by the PMIC (Power Management Integrated
  Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
  That interface is used for manipulating power resources and for
  thermal management: sensor temperature reporting, trip point setting
  and so on.

  Also the ACPI core is now going to support the _DEP configuration
  information in a limited way.  Basically, _DEP it supposed to reflect
  off-the-hierarchy dependencies between devices which may be very
  indirect, like when AML for one device accesses locations in an
  operation region handled by another device's driver (usually, the
  device depended on this way is a serial bus or GPIO controller).  The
  support added this time is sufficient to make the ACPI battery driver
  work on Asus T100A, but it is general enough to be able to cover some
  other use cases in the future.

  Finally, we have a new cpufreq driver for the Loongson1B processor.

  In addition to the above, there are fixes and cleanups all over the
  place as usual and a traditional ACPICA update to a recent upstream
  release.

  As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for
  Intel platforms should be able to handle power management of the DMA
  engine correctly, the cpufreq-dt driver should interact with the
  thermal subsystem in a better way and the ACPI backlight driver should
  handle some more corner cases, among other things.

  On top of the ACPICA update there are fixes for race conditions in the
  ACPICA's interrupt handling code which might lead to some random and
  strange looking failures on some systems.

  In the cleanups department the most visible part is the series of
  commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration
  option.  That was triggered by a discussion regarding the generic
  power domains code during which we realized that trying to support
  certain combinations of PM config options was painful and not really
  worth it, because nobody would use them in production anyway.  For
  this reason, we decided to make CONFIG_PM_SLEEP select
  CONFIG_PM_RUNTIME and that lead to the conclusion that the latter
  became redundant and CONFIG_PM could be used instead of it.  The
  material here makes that replacement in a major part of the tree, but
  there will be at least one more batch of that in the second part of
  the merge window.

  Specifics:

   - Support for retrieving device properties information from ACPI _DSD
     device configuration objects and a unified device properties
     interface for device drivers (and subsystems) on top of that.  As
     stated above, this works with Device Trees and ACPI and allows
     device drivers to be written in a platform firmware (DT or ACPI)
     agnostic way.  The at25, leds-gpio and gpio_keys_polled drivers are
     now going to use this new interface and the GPIO subsystem is
     additionally modified to allow device drivers to assign names to
     GPIO resources returned by ACPI _CRS objects (in case _DSD is not
     present or does not provide the expected data).  The changes in
     this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron
     Lu, and Darren Hart with some fixes from others (Fabio Estevam,
     Geert Uytterhoeven).

   - Support for Hardware Managed Performance States (HWP) as described
     in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
     driver.  CPUID is used to detect whether or not the feature is
     supported by the processor.  If supported, it will be enabled
     automatically unless the intel_pstate=no_hwp switch is present in
     the kernel command line.  From Dirk Brandewie.

   - New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).

   - Support for firmware interface based on ACPI operation regions used
     by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
     platforms for power resource control and thermal management (Aaron
     Lu).

   - Limited support for retrieving off-the-hierarchy dependencies
     between devices from ACPI _DEP device configuration objects and
     deferred probing support for the ACPI battery driver based on the
     _DEP information to make that driver work on Asus T100A (Lan
     Tianyu).

   - New cpufreq driver for the Loongson1B processor (Kelvin Cheung).

   - ACPICA update to upstream revision 20141107 which only affects
     tools (Bob Moore).

   - Fixes for race conditions in the ACPICA's interrupt handling code
     and in the ACPI code related to system suspend and resume (Lv Zheng
     and Rafael J Wysocki).

   - ACPI core fix for an RCU-related issue in the ioremap() regions
     management code that slowed down significantly after CPUs had been
     allowed to enter idle states even if they'd had RCU callbakcs
     queued and triggered some problems in certain proprietary graphics
     driver (and elsewhere).  The fix replaces synchronize_rcu() in that
     code with synchronize_rcu_expedited() which makes the issue go
     away.  From Konstantin Khlebnikov.

   - ACPI LPSS (Low-Power Subsystem) driver fix to handle power
     management of the DMA engine included into the LPSS correctly.  The
     problem is that the DMA engine doesn't have ACPI PM support of its
     own and it simply is turned off when the last LPSS device having
     ACPI PM support goes into D3cold.  To work around that, the PM
     domain used by the ACPI LPSS driver is redesigned so at least one
     device with ACPI PM support will be on as long as the DMA engine is
     in use.  From Andy Shevchenko.

   - ACPI backlight driver fix to avoid using it on "Win8-compatible"
     systems where it doesn't work and where it was used by default by
     mistake (Aaron Lu).

   - Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
     Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin
     Chaugule (mostly related to the upcoming ARM64 support).

   - Intel RAPL (Running Average Power Limit) power capping driver fixes
     and improvements including new processor IDs (Jacob Pan).

   - Generic power domains modification to power up domains after
     attaching devices to them to meet the expectations of device
     drivers and bus types assuming devices to be accessible at probe
     time (Ulf Hansson).

   - Preliminary support for controlling device clocks from the generic
     power domains core code and modifications of the ARM/shmobile
     platform to use that feature (Ulf Hansson).

   - Assorted minor fixes and cleanups of the generic power domains core
     code (Ulf Hansson, Geert Uytterhoeven).

   - Assorted minor fixes and cleanups of the device clocks control code
     in the PM core (Geert Uytterhoeven, Grygorii Strashko).

   - Consolidation of device power management Kconfig options by making
     CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
     which is now redundant (Rafael J Wysocki and Kevin Hilman).  That
     is the first batch of the changes needed for this purpose.

   - Core device runtime power management support code cleanup related
     to the execution of callbacks (Andrzej Hajda).

   - cpuidle ARM support improvements (Lorenzo Pieralisi).

   - cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a
     new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
     Bartlomiej Zolnierkiewicz).

   - New cpufreq driver callback (->ready) to be executed when the
     cpufreq core is ready to use a given policy object and cpufreq-dt
     driver modification to use that callback for cooling device
     registration (Viresh Kumar).

   - cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James
     Geboski, Tomeu Vizoso).

   - Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
     cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
     Stefan Wahren, Petr Cvek).

   - OPP (Operating Performance Points) framework modification to allow
     OPPs to be removed too and update of a few cpufreq drivers
     (cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
     during initialization) on driver removal (Viresh Kumar).

   - Hibernation core fixes and cleanups (Tina Ruchandani and Markus
     Elfring).

   - PM Kconfig fix related to CPU power management (Pankaj Dubey).

   - cpupower tool fix (Prarit Bhargava)"

* tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (120 commits)
  i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c
  dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  tools: cpupower: fix return checks for sysfs_get_idlestate_count()
  drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME
  MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  leds: leds-gpio: Fix multiple instances registration without 'label' property
  iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  hwrandom / exynos / PM: Use CONFIG_PM in #ifdef
  block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
  USB / PM: Drop CONFIG_PM_RUNTIME from the USB core
  PM: Merge the SET*_RUNTIME_PM_OPS() macros
  ...
2014-12-10 21:17:00 -08:00
Linus Torvalds 3a5dc1fafb Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Reload microcode when resuming and the case when only the early
     loader has been utilized.  (Borislav Petkov)

   - Also, do not load the driver on paravirt guests.  (Boris
     Ostrovsky)"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/intel: Fish out the stashed microcode for the BSP
  x86, microcode: Reload microcode on resume
  x86, microcode: Don't initialize microcode code on paravirt
  x86, microcode, intel: Drop unused parameter
  x86, microcode, AMD: Do not use smp_processor_id() in preemtible context
2014-12-10 15:01:43 -08:00
Linus Torvalds 3100e448e7 Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 vdso updates from Ingo Molnar:
 "Various vDSO updates from Andy Lutomirski, mostly cleanups and
  reorganization to improve maintainability, but also some
  micro-optimizations and robustization changes"

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64/vsyscall: Restore orig_ax after vsyscall seccomp
  x86_64: Add a comment explaining the TASK_SIZE_MAX guard page
  x86_64,vsyscall: Make vsyscall emulation configurable
  x86_64, vsyscall: Rewrite comment and clean up headers in vsyscall code
  x86_64, vsyscall: Turn vsyscalls all the way off when vsyscall==none
  x86,vdso: Use LSL unconditionally for vgetcpu
  x86: vdso: Fix build with older gcc
  x86_64/vdso: Clean up vgetcpu init and merge the vdso initcalls
  x86_64/vdso: Remove jiffies from the vvar page
  x86/vdso: Make the PER_CPU segment 32 bits
  x86/vdso: Make the PER_CPU segment start out accessed
  x86/vdso: Change the PER_CPU segment to use struct desc_struct
  x86_64/vdso: Move getcpu code from vsyscall_64.c to vdso/vma.c
  x86_64/vsyscall: Move all of the gate_area code to vsyscall_64.c
2014-12-10 14:24:20 -08:00
Linus Torvalds c9f861c772 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The biggest change in this cycle is better support for UCNA
  (UnCorrected No Action) events:

    "Handle all uncorrected error reports in the same way (soft
     offline the page). We used to only do that for SRAO
     (software recoverable action optional) machine checks, but
     it makes sense to also do it for UCNA (UnCorrected No
     Action) logs found by CMCI or polling."

  plus various x86 MCE handling updates and fixes"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Spell "panicked" correctly
  x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll
  x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
  x86, MCE, AMD: Assign interrupt handler only when bank supports it
  x86, MCE, AMD: Drop software-defined bank in error thresholding
  x86, MCE, AMD: Move invariant code out from loop body
  x86, MCE, AMD: Correct thresholding error logging
  x86, MCE, AMD: Use macros to compute bank MSRs
  RAS, HWPOISON: Fix wrong error recovery status
  GHES: Make ghes_estatus_caches static
  APEI, GHES: Cleanup unnecessary function for lockless list
2014-12-10 14:20:10 -08:00
Linus Torvalds 206f18f2ca Merge branches 'x86-build-for-linus', 'x86-cleanups-for-linus' and 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build, cleanup and defconfig updates from Ingo Molnar:
 "A single minor build change to suppress a repetitive build messages,
  misc cleanups and a defconfig update"

* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/purgatory, build: Suppress kexec-purgatory.c is up to date message

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, CPU, AMD: Move K8 TLB flush filter workaround to K8 code
  x86, espfix: Remove stale ptemask
  x86, msr: Use seek definitions instead of hard-coded values
  x86, msr: Convert printk to pr_foo()
  x86, msr: Use PTR_ERR_OR_ZERO
  x86/simplefb: Use PTR_ERR_OR_ZERO
  x86/sysfb: Use PTR_ERR_OR_ZERO
  x86, cpuid: Use PTR_ERR_OR_ZERO

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kconfig/defconfig: Enable CONFIG_FHANDLE=y
2014-12-10 12:35:46 -08:00
Linus Torvalds 9d0cf6f564 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "Misc changes:

   - context switch micro-optimization
   - debug printout micro-optimization
   - comment enhancements and typo fix"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Replace seq_printf() with seq_puts()
  x86/asm: Fix typo in arch/x86/kernel/asm_offset_64.c
  sched/x86: Add a comment clarifying LDT context switching
  sched/x86_64: Don't save flags on context switch
2014-12-10 12:09:26 -08:00
Linus Torvalds 3eb5b893eb Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MPX support from Thomas Gleixner:
 "This enables support for x86 MPX.

  MPX is a new debug feature for bound checking in user space.  It
  requires kernel support to handle the bound tables and decode the
  bound violating instruction in the trap handler"

* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  asm-generic: Remove asm-generic arch_bprm_mm_init()
  mm: Make arch_unmap()/bprm_mm_init() available to all architectures
  x86: Cleanly separate use of asm-generic/mm_hooks.h
  x86 mpx: Change return type of get_reg_offset()
  fs: Do not include mpx.h in exec.c
  x86, mpx: Add documentation on Intel MPX
  x86, mpx: Cleanup unused bound tables
  x86, mpx: On-demand kernel allocation of bounds tables
  x86, mpx: Decode MPX instruction to get bound violation information
  x86, mpx: Add MPX-specific mmap interface
  x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: Add MPX to disabled features
  ia64: Sync struct siginfo with general version
  mips: Sync struct siginfo with general version
  mpx: Extend siginfo structure to include bound violation information
  x86, mpx: Rename cfg_reg_u and status_reg
  x86: mpx: Give bndX registers actual names
  x86: Remove arbitrary instruction size limit in instruction decoder
2014-12-10 09:34:43 -08:00
Borislav Petkov 25cdb9c868 x86/microcode/intel: Fish out the stashed microcode for the BSP
I'm such a moron! The simple solution of saving the BSP patch
for use on resume was too simple (and wrong!), hint:
sizeof(struct microcode_intel).

What needs to be done instead is to fish out the microcode patch
we have stashed previously and apply that on the BSP in case the
late loader hasn't been utilized.

So do that instead.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141208110820.GB20057@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-10 11:36:28 +01:00
Linus Torvalds 5706ffd045 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events update from Ingo Molnar:
 "On the kernel side there's few changes, the one that stands out is
  PEBS machine state sampling support on x86, by Stephane Eranian.

  On the tooling side:

  User visible tooling changes:

   - Don't open the DWARF info multiple times, keeping instead a dwfl
     handle in struct dso, greatly speeding up 'perf report' on powerpc.
     (Sukadev Bhattiprolu)

   - Introduce PARSE_OPT_DISABLED option flag and use it to avoid
     showing undersired options in tools that provides frontends to
     'perf record', like sched, kvm, etc (Namhyung Kim)

   - Fallback to kallsyms when using the minimal 'ELF' loader (Arnaldo
     Carvalho de Melo)

   - Fix annotation with kcore (Adrian Hunter)

   - Support source line numbers in annotate using a hotkey (Andi Kleen)

   - Callchain improvements including:
     * Enable printing the srcline in the history
     * Make get_srcline fall back to sym+offset (Andi Kleen)

   - TUI hist_entry browser fixes, including showing missing overhead
     value for first level callchain.  Detected comparing the output of
     --stdio/--gui (that matched) with --tui, that had this problem.
     (Namhyung Kim)

   - Support handling complete branch stacks as histograms (Andi Kleen)

  Tooling infrastructure changes:

   - Prep work for supporting per-pkg and snapshot counters in 'perf
     stat' (Jiri Olsa)

   - 'perf stat' refactorings, moving stuff from it to evsel.c to use in
     per-pkg/snapshot format changes (Jiri Olsa)

   - Add per-pkg format file parsing (Matt Fleming)

   - Clean up libelf feature support code (Namhyung Kim)

   - Add gzip decompression support for kernel modules (Namhyung Kim)

   - More prep patches for Intel PT, including a a thread stack and more
     stuff made available via the database export mechanism (Adrian
     Hunter)

   - More Intel PT work, including a facility to export sample data
     (comms, threads, symbol names, etc) in a database friendly way,
     with an script to use this to create a postgresql database.
     (Adrian Hunter)

   - Make sure that thread->mg->machine points to the machine where the
     thread exists (it was being set only for the kmaps kernel modules
     case, do it as well for the mmaps) and use it to shorten function
     signatures (Arnaldo Carvalho de Melo)

  ... and lots of other fixes and smaller improvements"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits)
  perf report: In branch stack mode use address history sorting
  perf report: Add --branch-history option
  perf callchain: Support handling complete branch stacks as histograms
  perf stat: Add support for snapshot counters
  perf stat: Add support for per-pkg counters
  perf tools: Remove perf_evsel__read interface
  perf stat: Use read_counter in read_counter_aggr
  perf stat: Make read_counter work over the thread dimension
  perf stat: Use perf_evsel__read_cb in read_counter
  perf tools: Add snapshot format file parsing
  perf tools: Add per-pkg format file parsing
  perf evsel: Introduce perf_evsel__read_cb function
  perf evsel: Introduce perf_counts_values__scale function
  perf evsel: Introduce perf_evsel__compute_deltas function
  perf tools: Allow to force redirect pr_debug to stderr.
  perf tools: Fix segfault due to invalid kernel dso access
  perf callchain: Make get_srcline fall back to sym+offset
  perf symbols: Move bfd_demangle stubbing to its only user
  perf callchain: Enable printing the srcline in the history
  perf tools: Collapse first level callchain entry if it has sibling
  ...
2014-12-09 20:55:37 -08:00
Rafael J. Wysocki cfc75ed68b Merge branch 'pm-cpufreq'
* pm-cpufreq: (21 commits)
  intel_pstate: skip this driver if Sun server has _PPC method
  cpufreq: arm_big_little: free OPP table created during ->init()
  imx6q: free OPP table created during ->init()
  exynos5440: free OPP table created during ->init()
  cpufreq-dt: free OPP table created during ->init()
  cpufreq-dt: register cooling device from ->ready() callback
  cpufreq: Introduce ->ready() callback for cpufreq drivers
  cpufreq-dt: pass 'policy->related_cpus' to of_cpufreq_cooling_register()
  cpufreq: Fix formatting issues in 'struct cpufreq_driver'
  cpufreq: pxa2xx: Add Kconfig entry
  cpufreq: Ref the policy object sooner
  cpufreq: Kconfig: Remove architecture specific menu entries
  cpufreq: pcc: Enable autoload of pcc-cpufreq for ACPI processors
  intel_pstate: Add CPUID for BDW-H CPU
  intel_pstate: Add support for HWP
  x86: Add support for Intel HWP feature detection.
  cpufreq: respect the min/max settings from user space
  cpufreq: cpufreq-dt: Handle regulator_get_voltage() failure
  cpufreq: cpufreq-dt: Improve debug about matching OPP
  cpufreq: Loongson1: Add cpufreq driver for Loongson1B
  ...
2014-12-08 20:00:15 +01:00
Ingo Molnar 2a2662bf88 Merge branch 'perf/core-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into perf/hw_breakpoints
Pull AMD range breakpoints support from Frederic Weisbecker:

" - Extend breakpoint tools and core to support address range through perf
    event with initial backend support for AMD extended breakpoints.

    Syntax is:

           perf record -e mem:addr/len:type

    For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512)

           perf record -e mem:0x1000/512:w

 - Clean up a bit breakpoint code validation

 It has been acked by Jiri and Oleg. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 11:50:24 +01:00
Rasmus Villemoes 3736708f03 x86: Replace seq_printf() with seq_puts()
seq_puts is a lot cheaper than seq_printf, so use that to print
literal strings.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: http://lkml.kernel.org/r/1417208622-12264-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 11:48:15 +01:00
Borislav Petkov c7c9b3929b x86/mce: Spell "panicked" correctly
We need the additional "k" to make it a hard-c:

  https://en.wiktionary.org/wiki/panicked

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417642605-15730-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-12-08 11:12:46 +01:00
Borislav Petkov fbae4ba8c4 x86, microcode: Reload microcode on resume
Normally, we do reapply microcode on resume. However, in the cases where
that microcode comes from the early loader and the late loader hasn't
been utilized yet, there's no easy way for us to go and apply the patch
applied during boot by the early loader.

Thus, reuse the patch stashed by the early loader for the BSP.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-06 13:03:03 +01:00
Boris Ostrovsky a18a0f6850 x86, microcode: Don't initialize microcode code on paravirt
Paravirtual guests are not expected to load microcode into processors
and therefore it is not necessary to initialize microcode loading
logic.

In fact, under certain circumstances initializing this logic may cause
the guest to crash. Specifically, 32-bit kernels use __pa_nodebug()
macro which does not work in Xen (the code path that leads to this macro
happens during resume when we call mc_bp_resume()->load_ucode_ap()
->check_loader_disabled_ap())

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1417469264-31470-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-06 12:59:03 +01:00
Borislav Petkov 47768626c6 x86, microcode, intel: Drop unused parameter
apply_microcode_early() doesn't use mc_saved_data, kill it.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-06 12:58:56 +01:00
Jacob Shin d6d55f0b9d perf/x86/amd: AMD support for bp_len > HW_BREAKPOINT_LEN_8
Implement hardware breakpoint address mask for AMD Family 16h and
above processors. CPUID feature bit indicates hardware support for
DRn_ADDR_MASK MSRs. These masks further qualify DRn/DR7 hardware
breakpoint addresses to allow matching of larger addresses ranges.

Valuable advice and pseudo code from Oleg Nesterov <oleg@redhat.com>

Signed-off-by: Jacob Shin <jacob.w.shin@gmail.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: xiakaixu <xiakaixu@huawei.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-12-03 15:14:26 +01:00
Borislav Petkov 2ef84b3bb9 x86, microcode, AMD: Do not use smp_processor_id() in preemtible context
Hand down the cpu number instead, otherwise lockdep screams when doing

echo 1 > /sys/devices/system/cpu/microcode/reload.

BUG: using smp_processor_id() in preemptible [00000000] code: amd64-microcode/2470
caller is debug_smp_processor_id+0x12/0x20
CPU: 1 PID: 2470 Comm: amd64-microcode Not tainted 3.18.0-rc6+ #26
...

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417428741-4501-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-01 11:51:05 +01:00
Borislav Petkov 02ecc41abc x86, microcode: Limit the microcode reloading to 64-bit for now
First, there was this: https://bugzilla.kernel.org/show_bug.cgi?id=88001

The problem there was that microcode patches are not being reapplied
after suspend-to-ram. It was important to reapply them, though, because
of for example Haswell's TSX erratum which disabled TSX instructions
with a microcode patch.

A simple fix was fb86b97300 ("x86, microcode: Update BSPs microcode
on resume") but, as it is often the case, simple fixes are too
simple. This one causes 32-bit resume to fail:

https://bugzilla.kernel.org/show_bug.cgi?id=88391

Properly fixing this would require more involved changes for which it
is too late now, right before the merge window. Thus, limit this to
64-bit only temporarily.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1417353999-32236-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-01 10:55:08 +01:00
Rafael J. Wysocki 8a497cfdc0 Merge back earlier cpufreq material for 3.19-rc1. 2014-12-01 02:46:24 +01:00
Linus Torvalds c6c9161d06 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "Misc fixes:
   - gold linker build fix
   - noxsave command line parsing fix
   - bugfix for NX setup
   - microcode resume path bug fix
   - _TIF_NOHZ versus TIF_NOHZ bugfix as discussed in the mysterious
     lockup thread"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, syscall: Fix _TIF_NOHZ handling in syscall_trace_enter_phase1
  x86, kaslr: Handle Gold linker for finding bss/brk
  x86, mm: Set NX across entire PMD at boot
  x86, microcode: Update BSPs microcode on resume
  x86: Require exact match for 'noxsave' command line option
2014-11-21 15:46:17 -08:00
Linus Torvalds 13f5004c94 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes: two Intel uncore driver fixes, a CPU-hotplug fix and a
  build dependencies fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP
  perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP
  perf: Fix corruption of sibling list with hotplug
  perf/x86: Fix embarrasing typo
2014-11-21 15:44:07 -08:00
Chen Yucong fa92c58694 x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll
Uncorrected no action required (UCNA) - is a uncorrected recoverable
machine check error that is not signaled via a machine check exception
and, instead, is reported to system software as a corrected machine
check error. UCNA errors indicate that some data in the system is
corrupted, but the data has not been consumed and the processor state
is valid and you may continue execution on this processor. UCNA errors
require no action from system software to continue execution. Note that
UCNA errors are supported by the processor only when IA32_MCG_CAP[24]
(MCG_SER_P) is set.
                                               -- Intel SDM Volume 3B

Deferred errors are errors that cannot be corrected by hardware, but
do not cause an immediate interruption in program flow, loss of data
integrity, or corruption of processor state. These errors indicate
that data has been corrupted but not consumed. Hardware writes information
to the status and address registers in the corresponding bank that
identifies the source of the error if deferred errors are enabled for
logging. Deferred errors are not reported via machine check exceptions;
they can be seen by polling the MCi_STATUS registers.
                                                -- AMD64 APM Volume 2

Above two items, both UCNA and Deferred errors belong to detected
errors, but they can't be corrected by hardware, and this is very
similar to Software Recoverable Action Optional (SRAO) errors.
Therefore, we can take some actions that have been used for handling
SRAO errors to handle UCNA and Deferred errors.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-11-19 10:56:51 -08:00
Chen Yucong e3480271f5 x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error
Until now, the mce_severity mechanism can only identify the severity
of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter
out DEFERRED error for AMD platform.

This patch extends the mce_severity mechanism for handling
UCNA/DEFERRED error. In order to do this, the patch introduces a new
severity level - MCE_UCNA/DEFERRED_SEVERITY.

In addition, mce_severity is specific to machine check exception,
and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity
mechanism in non-exception context, the patch also introduces a new
argument (is_excp) for mce_severity. `is_excp' is used to explicitly
specify the calling context of mce_severity.

Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2014-11-19 10:55:43 -08:00
Borislav Petkov fb86b97300 x86, microcode: Update BSPs microcode on resume
In the situation when we apply early microcode but do *not* apply late
microcode, we fail to update the BSP's microcode on resume because we
haven't initialized the uci->mc microcode pointer. So, in order to
alleviate that, we go and dig out the stashed microcode patch during
early boot. It is basically the same thing that is done on the APs early
during boot so do that too here.

Tested-by: alex.schnaidt@gmail.com
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88001
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20141118094657.GA6635@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 18:32:24 +01:00
Rafael J. Wysocki bd2a0f6754 Merge back cpufreq material for 3.19-rc1. 2014-11-18 01:22:29 +01:00
Dave Hansen 6ba48ff46f x86: Remove arbitrary instruction size limit in instruction decoder
The current x86 instruction decoder steps along through the
instruction stream but always ensures that it never steps farther
than the largest possible instruction size (MAX_INSN_SIZE).

The MPX code is now going to be doing some decoding of userspace
instructions.  We copy those from userspace in to the kernel and
they're obviously completely untrusted coming from userspace.  In
addition to the constraint that instructions can only be so long,
we also have to be aware of how long the buffer is that came in
from userspace.  This _looks_ to be similar to what the perf and
kprobes is doing, but it's unclear to me whether they are
affected.

The whole reason we need this is that it is perfectly valid to be
executing an instruction within MAX_INSN_SIZE bytes of an
unreadable page. We should be able to gracefully handle short
reads in those cases.

This adds support to the decoder to record how long the buffer
being decoded is and to refuse to "validate" the instruction if
we would have gone over the end of the buffer to decode it.

The kprobes code probably needs to be looked at here a bit more
carefully.  This patch still respects the MAX_INSN_SIZE limit
there but the kprobes code does look like it might be able to
be a bit more strict than it currently is.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: x86@kernel.org
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20141114153957.E6B01535@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:52 +01:00
Dave Hansen 2cd3949f70 x86: Require exact match for 'noxsave' command line option
We have some very similarly named command-line options:

arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup);
arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup);

__setup() is designed to match options that take arguments, like
"foo=bar" where you would have:

	__setup("foo", x86_foo_func...);

The problem is that "noxsave" actually _matches_ "noxsaves" in
the same way that "foo" matches "foo=bar".  If you boot an old
kernel that does not know about "noxsaves" with "noxsaves" on the
command line, it will interpret the argument as "noxsave", which
is not what you want at all.

This makes the "noxsave" handler only return success when it finds
an *exact* match.

[ tglx: We really need to make __setup() more robust. ]

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-16 12:13:16 +01:00
Stephane Eranian aea48559ac perf/x86: Add support for sampling PEBS machine state registers
PEBS can capture machine state regs at retiremnt of the sampled
instructions. When precise sampling is enabled on an event, PEBS
is used, so substitute the interrupted state with the PEBS state.
Note that not all registers are captured by PEBS. Those missing
are replaced by the interrupt state counter-parts.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411559322-16548-3-git-send-email-eranian@google.com
Cc: cebbert.lkml@gmail.com
Cc: jolsa@redhat.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:58 +01:00
Andi Kleen af4bdcf675 perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events
Disallow setting inv/cmask/etc. flags for all PEBS events
on these CPUs, except for the UOPS_RETIRED.* events on Nehalem/Westmere,
which are needed for cycles:p. This avoids an undefined situation
strongly discouraged by the Intle SDM. The PLD_* events were already
covered. This follows the earlier changes for Sandy Bridge and alter.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411569288-5627-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:56 +01:00
Andi Kleen 0dbc94796d perf/x86/intel: Use INTEL_FLAGS_UEVENT_CONSTRAINT for PRECDIST
My earlier commit:

  86a04461a9 ("perf/x86: Revamp PEBS event selection")

made nearly all PEBS on Sandy/IvyBridge/Haswell to reject non zero flags.

However this wasn't done for the INST_RETIRED.PREC_DIST event
because no suitable macro existed. Now that we have
INTEL_FLAGS_UEVENT_CONSTRAINT enforce zero flags for
INST_RETIRED.PREC_DIST too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411569288-5627-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:55 +01:00
Andi Kleen 7550ddffe4 perf/x86: Add INTEL_FLAGS_UEVENT_CONSTRAINT
Add a FLAGS_UEVENT_CONSTRAINT macro that allows us to
match on event+umask, and in additional all flags.

This is needed to ensure the INV and CMASK fields
are zero for specific events, as this can cause undefined
behavior.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Mark Davies <junk@eslaf.co.uk>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1411569288-5627-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:54 +01:00
Andi Kleen c0737ce453 perf/x86/intel/uncore: Add scaling units to the EP iMC events
Add scaling to MB/s to the memory controller read/write
events for Sandy/IvyBridge/Haswell-EP similar to how the client
does. This makes the events easier to use from the
standard perf tool.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:41:52 +01:00
Ingo Molnar f108c898dd Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 11:31:17 +01:00
Andi Kleen 68055915c1 perf/x86/intel/uncore: Fix boot crash on SBOX PMU on Haswell-EP
There were several reports that on some systems writing the SBOX0 PMU
initialization MSR would #GP at boot. This did not happen on all
systems -- my two test systems booted fine.

Writing the three initialization bits bit-by-bit seems to avoid the
problem. So add a special callback to do just that.

This replaces an earlier patch that disabled the SBOX.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-and-Tested-by: Patrick Lu <patrick.lu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-4-git-send-email-andi@firstfloor.org
[ Fixed a whitespace error and added attribution tags that were left out inexplicably. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 09:53:36 +01:00
Andi Kleen 41a134a583 perf/x86/intel/uncore: Fix IRP uncore register offsets on Haswell EP
The counter register offsets for the IRP box PMU for Haswell-EP
were incorrect. The offsets actually changed over IvyBridge EP.

Fix them to the correct values. For this we need to fork the read
function from the IVB and use an own counter array.

Tested-by: patrick.lu@intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415062828-19759-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 09:45:47 +01:00
Aravind Gopalakrishnan 904cb3677f perf/x86/amd/ibs: Update IBS MSRs and feature definitions
New Fam15h models carry extra feature bits and extend
the MSR register space for IBS ops. Adding them here.

While at it, add functionality to read IbsBrTarget and
OpData4 depending on their availability if user wants a
PERF_SAMPLE_RAW.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <paulus@samba.org>
Cc: <acme@kernel.org>
Link: http://lkml.kernel.org/r/1415651066-13523-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-12 15:12:32 +01:00
Dirk Brandewie 7787388772 x86: Add support for Intel HWP feature detection.
Add support of Hardware Managed Performance States (HWP) described in Volume 3
section 14.4 of the SDM.

One bit CPUID.06H:EAX[bit 7] expresses the presence of the HWP feature on
the processor. The remaining bits CPUID.06H:EAX[bit 8-11] denote the
presense of various HWP features.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-12 00:04:37 +01:00
Borislav Petkov 6f9b63a0ae x86, CPU, AMD: Move K8 TLB flush filter workaround to K8 code
This belongs with the rest of the code in init_amd_k8() which gets
executed on family 0xf.

Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-11 17:58:20 +01:00
Borislav Petkov c0a717f23d x86, microcode, AMD: Fix ucode patch stashing on 32-bit
Save the patch while we're running on the BSP instead of later, before
the initrd has been jettisoned. More importantly, on 32-bit we need to
access the physical address instead of the virtual.

This way we actually do find it on the APs instead of having to go
through the initrd each time.

Tested-by: Richard Hendershot <rshendershot@mchsi.com>
Fixes: 5335ba5cf4 ("x86, microcode, AMD: Fix early ucode loading")
Cc: <stable@vger.kernel.org> # v3.13+
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-10 13:50:55 +01:00
Sudeep Holla 5aaba36318 cpumask: factor out show_cpumap into separate helper function
Many sysfs *_show function use cpu{list,mask}_scnprintf to copy cpumap
to the buffer aligned to PAGE_SIZE, append '\n' and '\0' to return null
terminated buffer with newline.

This patch creates a new helper function cpumap_print_to_pagebuf in
cpumask.h using newly added bitmap_print_to_pagebuf and consolidates
most of those sysfs functions using the new helper function.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Suggested-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: x86@kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-07 11:45:00 -08:00
Borislav Petkov 85be07c324 x86, microcode: Fix accessing dis_ucode_ldr on 32-bit
We should be accessing it through a pointer, like on the BSP.

Tested-by: Richard Hendershot <rshendershot@mchsi.com>
Fixes: 65cef1311d ("x86, microcode: Add a disable chicken bit")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-05 17:28:06 +01:00
Andy Lutomirski e76b027e64 x86,vdso: Use LSL unconditionally for vgetcpu
LSL is faster than RDTSCP and works everywhere; there's no need to
switch between them depending on CPU.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Andi Kleen <andi@firstfloor.org>
Link: http://lkml.kernel.org/r/72f73d5ec4514e02bba345b9759177ef03742efb.1414706021.git.luto@amacapital.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-03 13:41:53 +01:00
Borislav Petkov 4750a0d112 x86, microcode, AMD: Fix early ucode loading on 32-bit
Konrad triggered the following splat below in a 32-bit guest on an AMD
box. As it turns out, in save_microcode_in_initrd_amd() we're using the
*physical* address of the container *after* we have enabled paging and
thus we #PF in load_microcode_amd() when trying to access the microcode
container in the ramdisk range.

Because the ramdisk is exactly there:

[    0.000000] RAMDISK: [mem 0x35e04000-0x36ef9fff]

and we fault at 0x35e04304.

And since this guest doesn't relocate the ramdisk, we don't do the
computation which will give us the correct virtual address and we end up
with the PA.

So, we should actually be using virtual addresses on 32-bit too by the
time we're freeing the initrd. Do that then!

Unpacking initramfs...
BUG: unable to handle kernel paging request at 35d4e304
IP: [<c042e905>] load_microcode_amd+0x25/0x4a0
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.1-302.fc21.i686 #1
Hardware name: Xen HVM domU, BIOS 4.4.1 10/01/2014
task: f5098000 ti: f50d0000 task.ti: f50d0000
EIP: 0060:[<c042e905>] EFLAGS: 00010246 CPU: 0
EIP is at load_microcode_amd+0x25/0x4a0
EAX: 00000000 EBX: f6e9ec4c ECX: 00001ec4 EDX: 00000000
ESI: f5d4e000 EDI: 35d4e2fc EBP: f50d1ed0 ESP: f50d1e94
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 35d4e304 CR3: 00e33000 CR4: 000406d0
Stack:
 00000000 00000000 f50d1ebc f50d1ec4 f5d4e000 c0d7735a f50d1ed0 15a3d17f
 f50d1ec4 00600f20 00001ec4 bfb83203 f6e9ec4c f5d4e000 c0d7735a f50d1ed8
 c0d80861 f50d1ee0 c0d80429 f50d1ef0 c0d889a9 f5d4e000 c0000000 f50d1f04
Call Trace:
? unpack_to_rootfs
? unpack_to_rootfs
save_microcode_in_initrd_amd
save_microcode_in_initrd
free_initrd_mem
populate_rootfs
? unpack_to_rootfs
do_one_initcall
? unpack_to_rootfs
? repair_env_string
? proc_mkdir
kernel_init_freeable
kernel_init
ret_from_kernel_thread
? rest_init

Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
References: https://bugzilla.redhat.com/show_bug.cgi?id=1158204
Fixes: 75a1ba5b2c ("x86, microcode, AMD: Unify valid container checks")
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v3.14+
Link: http://lkml.kernel.org/r/20141101100100.GA4462@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-01 20:24:21 +01:00
Chen Yucong 8dcf32ea22 x86, MCE, AMD: Assign interrupt handler only when bank supports it
There are some AMD CPU models which have thresholding banks but which
cannot generate a thresholding interrupt. This is denoted by the bit
MCi_MISC[IntP]. Make sure to check that bit before assigning the
thresholding interrupt handler.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
[ Boris: save an indentation level and rewrite commit message. ]
Link: http://lkml.kernel.org/r/1412662128.28440.18.camel@debian
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-11-01 11:28:23 +01:00
Linus Torvalds 19e0d5f16a Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Fixes from all around the place:

   - hyper-V 32-bit PAE guest kernel fix
   - two IRQ allocation fixes on certain x86 boards
   - intel-mid boot crash fix
   - intel-quark quirk
   - /proc/interrupts duplicate irq chip name fix
   - cma boot crash fix
   - syscall audit fix
   - boot crash fix with certain TSC configurations (seen on Qemu)
   - smpboot.c build warning fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
  ACPI, irq, x86: Return IRQ instead of GSI in mp_register_gsi()
  x86, intel-mid: Create IRQs for APB timers and RTC timers
  x86: Don't enable F00F workaround on Intel Quark processors
  x86/irq: Fix XT-PIC-XT-PIC in /proc/interrupts
  x86, cma: Reserve DMA contiguous area after initmem_init()
  i386/audit: stop scribbling on the stack frame
  x86, apic: Handle a bad TSC more gracefully
  x86: ACPI: Do not translate GSI number if IOAPIC is disabled
  x86/smpboot: Move data structure to its primary usage scope
2014-10-31 14:30:16 -07:00
Ingo Molnar 1776b10627 perf/x86/intel: Revert incomplete and undocumented Broadwell client support
These patches:

  86a349a28b ("perf/x86/intel: Add Broadwell core support")
  c46e665f03 ("perf/x86: Add INST_RETIRED.ALL workarounds")
  fdda3c4aac ("perf/x86/intel: Use Broadwell cache event list for Haswell")

introduced magic constants and unexplained changes:

  https://lkml.org/lkml/2014/10/28/1128
  https://lkml.org/lkml/2014/10/27/325
  https://lkml.org/lkml/2014/8/27/546
  https://lkml.org/lkml/2014/10/28/546

Peter Zijlstra has attempted to help out, to clean up the mess:

  https://lkml.org/lkml/2014/10/28/543

But has not received helpful and constructive replies which makes
me doubt wether it can all be finished in time until v3.18 is
released.

Despite various review feedback the author (Andi Kleen) has answered
only few of the review questions and has generally been uncooperative,
only giving replies when prompted repeatedly, and only giving minimal
answers instead of constructively explaining and helping along the effort.

That kind of behavior is not acceptable.

There's also a boot crash on Intel E5-1630 v3 CPUs reported for another
commit from Andi Kleen:

  e735b9db12 ("perf/x86/intel/uncore: Add Haswell-EP uncore support")

  https://lkml.org/lkml/2014/10/22/730

Which is not yet resolved. The uncore driver is independent in theory,
but the crash makes me worry about how well all these patches were
tested and makes me uneasy about the level of interminging that the
Broadwell and Haswell code has received by the commits above.

As a first step to resolve the mess revert the Broadwell client commits
back to the v3.17 version, before we run out of time and problematic
code hits a stable upstream kernel.

( If the Haswell-EP crash is not resolved via a simple fix then we'll have
  to revert the Haswell-EP uncore driver as well. )

The Broadwell client series has to be submitted in a clean fashion, with
single, well documented changes per patch. If they are submitted in time
and are accepted during review then they can possibly go into v3.19 but
will need additional scrutiny due to the rocky history of this patch set.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-29 11:07:58 +01:00
Dave Jones d4e1a0af1d x86: Don't enable F00F workaround on Intel Quark processors
The Intel Quark processor is a part of family 5, but does not have the
F00F bug present in Pentiums of the same family.

Pentiums were models 0 through 8, Quark is model 9.

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Link: http://lkml.kernel.org/r/20141028175753.GA12743@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-29 08:52:09 +01:00
Peter Zijlstra 7fb0f1de49 perf/x86: Fix compile warnings for intel_uncore
The uncore drivers require PCI and generate compile time warnings when
!CONFIG_PCI.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:51:03 +01:00
Peter Zijlstra (Intel) 65d71fe137 perf: Fix bogus kernel printk
Andy spotted the fail in what was intended as a conditional printk level.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Fixes: cc6cd47e73 ("perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141007124757.GH19379@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-28 10:51:01 +01:00
Borislav Petkov a3a529d104 x86, MCE, AMD: Drop software-defined bank in error thresholding
Aravind had the good question about why we're assigning a
software-defined bank when reporting error thresholding errors instead
of simply using the bank which reports the last error causing the
overflow.

Digging through git history, it pointed to

9526866439 ("[PATCH] x86_64: mce_amd support for family 0x10 processors")

which added that functionality. The problem with this, however, is that
tools don't know about software-defined banks and get puzzled. So drop
that K8_MCE_THRESHOLD_BASE and simply use the hw bank reporting the
thresholding interrupt.

Save us a couple of MSR reads while at it.

Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: https://lkml.kernel.org/r/5435B206.60402@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-21 22:28:48 +02:00
Chen Yucong 69b9575835 x86, MCE, AMD: Move invariant code out from loop body
Assigning to mce_threshold_vector is loop-invariant code in
mce_amd_feature_init(). So do it only once, out of loop body.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Link: http://lkml.kernel.org/r/1412263212.8085.6.camel@debian
[ Boris: commit message corrections. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-21 22:12:56 +02:00
Chen Yucong 44612a3ac6 x86, MCE, AMD: Correct thresholding error logging
mce_setup() does not gather the content of IA32_MCG_STATUS, so it
should be read explicitly. Moreover, we need to clear IA32_MCx_STATUS
to avoid that mce_log() logs the processed threshold event again
at next time.

But we do the logging ourselves and machine_check_poll() is completely
useless there. So kill it.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-21 22:12:22 +02:00
Chen Yucong 4b737d78a8 x86, MCE, AMD: Use macros to compute bank MSRs
Avoid open coded calculations for bank MSRs by hiding the index
of higher bank MSRs in well-defined macros.

No semantic changes.

Signed-off-by: Chen Yucong <slaoub@gmail.com>
Link: http://lkml.kernel.org/r/1411438561-24319-1-git-send-email-slaoub@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-10-21 22:07:24 +02:00
Linus Torvalds 0429fbc0bd Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
 "Way back, before the current percpu allocator was implemented, static
  and dynamic percpu memory areas were allocated and handled separately
  and had their own accessors.  The distinction has been gone for many
  years now; however, the now duplicate two sets of accessors remained
  with the pointer based ones - this_cpu_*() - evolving various other
  operations over time.  During the process, we also accumulated other
  inconsistent operations.

  This pull request contains Christoph's patches to clean up the
  duplicate accessor situation.  __get_cpu_var() uses are replaced with
  with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

  Unfortunately, the former sometimes is tricky thanks to C being a bit
  messy with the distinction between lvalues and pointers, which led to
  a rather ugly solution for cpumask_var_t involving the introduction of
  this_cpu_cpumask_var_ptr().

  This converts most of the uses but not all.  Christoph will follow up
  with the remaining conversions in this merge window and hopefully
  remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
  irqchip: Properly fetch the per cpu offset
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
  ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
  Revert "powerpc: Replace __get_cpu_var uses"
  percpu: Remove __this_cpu_ptr
  clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
  sparc: Replace __get_cpu_var uses
  avr32: Replace __get_cpu_var with __this_cpu_write
  blackfin: Replace __get_cpu_var uses
  tile: Use this_cpu_ptr() for hardware counters
  tile: Replace __get_cpu_var uses
  powerpc: Replace __get_cpu_var uses
  alpha: Replace __get_cpu_var
  ia64: Replace __get_cpu_var uses
  s390: cio driver &__get_cpu_var replacements
  s390: Replace __get_cpu_var uses
  mips: Replace __get_cpu_var uses
  MIPS: Replace __get_cpu_var uses in FPU emulator.
  arm: Replace __this_cpu_ptr with raw_cpu_ptr
  ...
2014-10-15 07:48:18 +02:00
Linus Torvalds dfe2c6dcc8 Merge branch 'akpm' (patches from Andrew Morton)
Merge second patch-bomb from Andrew Morton:
 - a few hotfixes
 - drivers/dma updates
 - MAINTAINERS updates
 - Quite a lot of lib/ updates
 - checkpatch updates
 - binfmt updates
 - autofs4
 - drivers/rtc/
 - various small tweaks to less used filesystems
 - ipc/ updates
 - kernel/watchdog.c changes

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
  mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
  kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h>
  ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[]
  frv: remove unused declarations of __start___ex_table and __stop___ex_table
  kvm: ensure hard lockup detection is disabled by default
  kernel/watchdog.c: control hard lockup detection default
  staging: rtl8192u: use %*pEn to escape buffer
  staging: rtl8192e: use %*pEn to escape buffer
  staging: wlan-ng: use %*pEhp to print SN
  lib80211: remove unused print_ssid()
  wireless: hostap: proc: print properly escaped SSID
  wireless: ipw2x00: print SSID via %*pE
  wireless: libertas: print esaped string via %*pE
  lib/vsprintf: add %*pE[achnops] format specifier
  lib / string_helpers: introduce string_escape_mem()
  lib / string_helpers: refactoring the test suite
  lib / string_helpers: move documentation to c-file
  include/linux: remove strict_strto* definitions
  arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable
  fs: check bh blocknr earlier when searching lru
  ...
2014-10-14 03:54:50 +02:00
Linus Torvalds 77654908ff Merge branches 'x86-ras-for-linus', 'x86-uv-for-linus' and 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 ras, uv and vdso fixlets from Ingo Molnar:
 "ras: tone down a kernel message to only occur during initial bootup,
    not during suspend/resume cycles.

  uv: a cleanup commit

  vdso: a fix to error checking"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Avoid showing repetitive message from intel_init_thermal()

* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/uv: Remove unnecessary #ifdef

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Fix vdso2c's special_pages[] error checking
2014-10-14 02:31:22 +02:00
Linus Torvalds 2fd7476de9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc smaller fixes that missed the v3.17 cycle"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/build: Add arch/x86/purgatory/ make generated files to gitignore
  x86: Fix section conflict for numachip
  x86: Reject x32 executables if x32 ABI not supported
  x86_64, entry: Filter RFLAGS.NT on entry from userspace
  x86, boot, kaslr: Fix nuisance warning on 32-bit builds
2014-10-14 02:28:16 +02:00
Linus Torvalds f1bfbd984b Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "The main changes in this tree are:

   - fix and update Intel Quark [Galileo] SoC platform support

   - update IOSF chipset side band interface and make it available via
     debugfs

   - enable HPETs on Soekris net6501 and other e6xx based systems"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache()
  x86: Quark: Comment setup_arch() to document TLB/PGE bug
  x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
  x86/platform/intel/iosf: Add debugfs config option for IOSF
  x86/platform/intel/iosf: Add better description of IOSF driver in config
  x86/platform/intel/iosf: Add Braswell PCI ID
  x86/platform/pmc_atom: Fix warning when CONFIG_DEBUG_FS=n
  x86: HPET force enable for e6xx based systems
  x86/iosf: Add debugfs support
  x86/iosf: Add Kconfig prompt for IOSF_MBI selection
2014-10-14 02:23:55 +02:00
Linus Torvalds e3438330f5 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Ingo Molnar:
 "Misc smaller cleanups"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, intel: Fix total_size computation
  x86, microcode, intel: Rename apply_microcode and declare it static
  x86, microcode, intel: Fix typos
  x86, microcode, intel: Add missing static declarations
  x86, microcode, amd: Fix missing static declaration
2014-10-14 02:21:51 +02:00
Linus Torvalds 708d0b41a2 Merge branch 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature updates from Ingo Molnar:
 "This tree includes the following changes:

   - Introduce DISABLED_MASK to list disabled CPU features, to simplify
     CPU feature handling and avoid excessive #ifdefs

   - Remove the lightly used cpu_has_pae() primitive"

* 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Add more disabled features
  x86: Introduce disabled-features
  x86: Axe the lightly-used cpu_has_pae
2014-10-14 02:19:47 +02:00
Andrew Morton e48510f451 arch/x86/kernel/cpu/common.c: fix unused symbol warning
x86_64 allnoconfig:

arch/x86/kernel/cpu/common.c:968: warning: 'syscall32_cpu_init' defined but not used

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:23 +02:00
Linus Torvalds 19e00d593e Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 bootup updates from Ingo Molnar:
 "The changes in this cycle were:

   - Fix rare SMP-boot hang (mostly in virtual environments)

   - Fix build warning with certain (rare) toolchains"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/relocs: Make per_cpu_load_addr static
  x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
2014-10-13 18:16:32 +02:00
Linus Torvalds 9d9420f120 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side updates:

   - Fix and enhance poll support (Jiri Olsa)

   - Re-enable inheritance optimization (Jiri Olsa)

   - Enhance Intel memory events support (Stephane Eranian)

   - Refactor the Intel uncore driver to be more maintainable (Zheng
     Yan)

   - Enhance and fix Intel CPU and uncore PMU drivers (Peter Zijlstra,
     Andi Kleen)

   - [ plus various smaller fixes/cleanups ]

  User visible tooling updates:

   - Add +field argument support for --field option, so that one can add
     fields to the default list of fields to show, ie now one can just
     do:

         perf report --fields +pid

     And the pid will appear in addition to the default fields (Jiri
     Olsa)

   - Add +field argument support for --sort option (Jiri Olsa)

   - Honour -w in the report tools (report, top), allowing to specify
     the widths for the histogram entries columns (Namhyung Kim)

   - Properly show submicrosecond times in 'perf kvm stat' (Christian
     Borntraeger)

   - Add beautifier for mremap flags param in 'trace' (Alex Snast)

   - perf script: Allow callchains if any event samples them

   - Don't truncate Intel style addresses in 'annotate' (Alex Converse)

   - Allow profiling when kptr_restrict == 1 for non root users, kernel
     samples will just remain unresolved (Andi Kleen)

   - Allow configuring default options for callchains in config file
     (Namhyung Kim)

   - Support operations for shared futexes.  (Davidlohr Bueso)

   - "perf kvm stat report" improvements by Alexander Yarygin:
       -  Save pid string in opts.target.pid
       -  Enable the target.system_wide flag
       -  Unify the title bar output

   - [ plus lots of other fixes and small improvements.  ]

  Tooling infrastructure changes:

   - Refactor unit and scale function parameters for PMU parsing
     routines (Matt Fleming)

   - Improve DSO long names lookup with rbtree, resulting in great
     speedup for workloads with lots of DSOs (Waiman Long)

   - We were not handling POLLHUP notifications for event file
     descriptors

     Fix it by filtering entries in the events file descriptor array
     after poll() returns, refcounting mmaps so that when the last fd
     pointing to a perf mmap goes away we do the unmap (Arnaldo Carvalho
     de Melo)

   - Intel PT prep work, from Adrian Hunter, including:
       - Let a user specify a PMU event without any config terms
       - Add perf-with-kcore script
       - Let default config be defined for a PMU
       - Add perf_pmu__scan_file()
       - Add a 'perf test' for tracking with sched_switch
       - Add 'flush' callback to scripting API

   - Use ring buffer consume method to look like other tools (Arnaldo
     Carvalho de Melo)

   - hists browser (used in top and report) refactorings, getting rid of
     unused variables and reducing source code size by handling similar
     cases in a fewer functions (Namhyung Kim).

   - Replace thread unsafe strerror() with strerror_r() accross the
     whole tools/perf/ tree (Masami Hiramatsu)

   - Rename ordered_samples to ordered_events and allow setting a queue
     size for ordering events (Jiri Olsa)

   - [ plus lots of fixes, cleanups and other improvements ]"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (198 commits)
  perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment
  perf/x86/intel/uncore: Fix minor race in box set up
  perf record: Fix error message for --filter option not coming after tracepoint
  perf tools: Fix build breakage on arm64 targets
  perf symbols: Improve DSO long names lookup speed with rbtree
  perf symbols: Encapsulate dsos list head into struct dsos
  perf bench futex: Sanitize -q option in requeue
  perf bench futex: Support operations for shared futexes
  perf trace: Fix mmap return address truncation to 32-bit
  perf tools: Refactor unit and scale function parameters
  perf tools: Fix line number in the config file error message
  perf tools: Convert {record,top}.call-graph option to call-graph.record-mode
  perf tools: Introduce perf_callchain_config()
  perf callchain: Move some parser functions to callchain.c
  perf tools: Move callchain config from record_opts to callchain_param
  perf hists browser: Fix callchain print bug on TUI
  perf tools: Use ACCESS_ONCE() instead of volatile cast
  perf tools: Modify error code for when perf_session__new() fails
  perf tools: Fix perf record as non root with kptr_restrict == 1
  perf stat: Fix --per-core on multi socket systems
  ...
2014-10-13 15:58:15 +02:00
Linus Torvalds e4e65676f2 Fixes and features for 3.18.
Apart from the usual cleanups, here is the summary of new features:
 
 - s390 moves closer towards host large page support
 
 - PowerPC has improved support for debugging (both inside the guest and
   via gdbstub) and support for e6500 processors
 
 - ARM/ARM64 support read-only memory (which is necessary to put firmware
   in emulated NOR flash)
 
 - x86 has the usual emulator fixes and nested virtualization improvements
   (including improved Windows support on Intel and Jailhouse hypervisor
   support on AMD), adaptive PLE which helps overcommitting of huge guests.
   Also included are some patches that make KVM more friendly to memory
   hot-unplug, and fixes for rare caching bugs.
 
 Two patches have trivial mm/ parts that were acked by Rik and Andrew.
 
 Note: I will soon switch to a subkey for signing purposes.  To verify
 future signed pull requests from me, please update my key with
 "gpg --recv-keys 9B4D86F2".  You should see 3 new subkeys---the
 one for signing will be a 2048-bit RSA key, 4E6B09D7.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJUL5sPAAoJEBvWZb6bTYbyfkEP/3MNhSyn6HCjPjtjLNPAl9KL
 WpExZSUFL2+4CztpdGIsek1BeJYHmqv3+c5S+WvaWVA1aqh2R7FT1D1ErBLjgLQq
 lq23IOr+XxmC3dXQUEEk+TlD+283UzypzEG4l4UD3JYg79fE3UrXAz82SeyewJDY
 x7aPYhkZG3RHu+wAyMPasG6E3zS5LySdUtGWbiPwz5BejrhBJoJdeb2WIL/RwnUK
 7ppSLB5EoFj/uMkuyeAAdAbdfSrhHA6faDZxNdxS9k9wGutrhhfUoQ49ONrKG4dV
 sFo1tSPTVgRs8QFYUZ2fJUPBAmUVddsgqh2K9d0NftGTq7b8YszaCsfFrs2/Y4MU
 YxssWEhxsfszerCu12bbAJrv6JBZYQ7TwGvI9L7P0iFU6IVw/djmukU4AkM9/e91
 YS/cue/PN+9Pn2ccXzL9J7xRtZb8FsOuRsCXTCmbOwDkLmrKPDBN2t3RUbeF+Eam
 ABrpWnLKX13kZSo4LKU+/niarzmPMp7odQfHVdr8ea0fiYLp4iN8puA20WaSPIgd
 CLvm+RAvXe5Lm91L4mpFotJ2uFyK6QlIYJV4FsgeWv/0D0qppWQi0Utb/aCNHCgy
 z8MyUMD48y7EpoQrFYr/7cddXIu0/NegnM8I1coVjIPEk4NfeebGUlCJ/V3D8wMG
 BgEfS2x6jRc5zB3hjwDr
 =iEVi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Fixes and features for 3.18.

  Apart from the usual cleanups, here is the summary of new features:

   - s390 moves closer towards host large page support

   - PowerPC has improved support for debugging (both inside the guest
     and via gdbstub) and support for e6500 processors

   - ARM/ARM64 support read-only memory (which is necessary to put
     firmware in emulated NOR flash)

   - x86 has the usual emulator fixes and nested virtualization
     improvements (including improved Windows support on Intel and
     Jailhouse hypervisor support on AMD), adaptive PLE which helps
     overcommitting of huge guests.  Also included are some patches that
     make KVM more friendly to memory hot-unplug, and fixes for rare
     caching bugs.

  Two patches have trivial mm/ parts that were acked by Rik and Andrew.

  Note: I will soon switch to a subkey for signing purposes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits)
  kvm: do not handle APIC access page if in-kernel irqchip is not in use
  KVM: s390: count vcpu wakeups in stat.halt_wakeup
  KVM: s390/facilities: allow TOD-CLOCK steering facility bit
  KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
  arm/arm64: KVM: Report correct FSC for unsupported fault types
  arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
  kvm: Fix kvm_get_page_retry_io __gup retval check
  arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
  kvm: x86: Unpin and remove kvm_arch->apic_access_page
  kvm: vmx: Implement set_apic_access_page_addr
  kvm: x86: Add request bit to reload APIC access page address
  kvm: Add arch specific mmu notifier for page invalidation
  kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static
  kvm: Fix page ageing bugs
  kvm/x86/mmu: Pass gfn and level to rmapp callback.
  x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
  kvm: x86: use macros to compute bank MSRs
  KVM: x86: Remove debug assertion of non-PAE reserved bits
  kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
  kvm: Faults which trigger IO release the mmap_sem
  ...
2014-10-08 05:27:39 -04:00
Bryan O'Donoghue aece118e48 x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache()
Intel processors which don't report cache information via cpuid(2)
or cpuid(4) need quirk code in the legacy_cache_size callback to
report this data. For Intel that callback is is intel_size_cache().

This patch enables calling of cpu_detect_cache_sizes() inside of
init_intel() and hence the calling of the legacy_cache callback in
intel_size_cache(). Adding this call will ensure that PIII Tualatin
currently in intel_size_cache() and Quark SoC X1000 being added to
intel_size_cache() in this patch will report their respective cache
sizes.

This model of calling cpu_detect_cache_sizes() is consistent with
AMD/Via/Cirix/Transmeta and Centaur.

Also added is a string to idenitfy the Quark as Quark SoC X1000
giving better and more descriptive output via /proc/cpuinfo

Adding cpu_detect_cache_sizes to init_intel() will enable calling
of intel_size_cache() on Intel processors which currently no code
can reach. Therefore this patch will also re-enable reporting
of PIII Tualatin cache size information as well as add
Quark SoC X1000 support.

Comment text and cache flow logic suggested by Thomas Gleixner

Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: davej@redhat.com
Cc: hmh@hmh.eng.br
Link: http://lkml.kernel.org/r/1412641189-12415-3-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-08 10:07:46 +02:00
Andy Lutomirski 8c7aa698ba x86_64, entry: Filter RFLAGS.NT on entry from userspace
The NT flag doesn't do anything in long mode other than causing IRET
to #GP.  Oddly, CPL3 code can still set NT using popf.

Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.

If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble.  For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen?  Apparently some games on Wine sometimes do
this (!), and, if an IRET return happens, they will segfault.  That
segfault cannot be handled, because signal delivery fails, too.

This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
in software on entry via SYSENTER.

To save a few cycles, this borrows a trick from Jan Beulich in Xen:
it checks whether NT is set before trying to clear it.  As a result,
it seems to have very little effect on SYSENTER performance on my
machine.

There's another minor bug fix in here: it looks like the CFI
annotations were wrong if CONFIG_AUDITSYSCALL=n.

Testers beware: on Xen, SYSENTER with NT set turns into a GPF.

I haven't touched anything on 32-bit kernels.

The syscall mask change comes from a variant of this patch by Anish
Bhatt.

Note to stable maintainers: there is no known security issue here.
A misguided program can set NT and cause the kernel to try and fail
to deliver SIGSEGV, crashing the program.  This patch fixes Far Cry
on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275

Cc: <stable@vger.kernel.org>
Reported-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-10-06 10:53:26 -07:00
Wei Huang cc6cd47e73 perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment
PMU checking can fail due to various reasons. On native machine, this
is mostly caused by faulty hardware and it is reasonable to use
KERN_ERR in reporting. However, when kernel is running on virtualized
environment, this checking can fail if virtual PMU is not supported
(e.g. KVM on AMD host). It is annoying to see an error message on
splash screen, even though we know such failure is benign on
virtualized environment.

This patch checks if the kernel is running in a virtualized environment.
If so, it will use KERN_INFO in reporting, which reduces the syslog
priority of them. This patch was tested successfully on KVM.

Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411617314-24659-1-git-send-email-wei@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03 06:04:41 +02:00
Andi Kleen 4f971248bc perf/x86/intel/uncore: Fix minor race in box set up
I was looking for the trinity oops cause in the uncore driver.
(so far didn't found it)

However I found this tiny race: when a box is set up two threads on the
same CPU, they may be setting up the box in parallel (e.g. with kernel
preemption). This could lead to the reference count being increasing
too much. Always recheck there is no existing cpu reference inside the lock.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411424826-15629-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03 06:02:49 +02:00
Bryan O'Donoghue ee1b5b165c x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
Quark x1000 advertises PGE via the standard CPUID method
PGE bits exist in Quark X1000's PTEs. In order to flush
an individual PTE it is necessary to reload CR3 irrespective
of the PTE.PGE bit.

See Quark Core_DevMan_001.pdf section 6.4.11

This bug was fixed in Galileo kernels, unfixed vanilla kernels are expected to
crash and burn on this platform.

Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1411514784-14885-1-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 15:06:15 +02:00
Stephane Eranian 521e8bac67 perf/x86/intel/uncore: Update support for client uncore IMC PMU
This patch restructures the memory controller (IMC) uncore PMU support
for client SNB/IVB/HSW processors. The main change is that it can now
cope with more than one PCI device ID per processor model. There are
many flavors of memory controllers for each processor. They have
different PCI device ID, yet they behave the same w.r.t. the memory
controller PMU that we are interested in.

The patch now supports two distinct memory controllers for IVB
processors: one for mobile, one for desktop.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140917090616.GA11281@quad
Cc: ak@linux.intel.com
Cc: kan.liang@intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:25 +02:00
Andi Kleen b10fc1c3e3 perf/x86/intel/uncore: Fix PCU filter setup for Sandy/Ivy/Haswell EP
The PCU frequency band filters use 8 bit each in a register.
When setting up the value the shift value was not correctly
scaled, which resulted in all filters except for band 0 to
be zero. Fix the scaling.

This allows to correctly monitor multiple uncore frequency bands.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409872109-31645-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:24 +02:00
Andi Kleen 7e96ae1a89 perf/x86/intel/uncore: Add missing cbox filter flags on IvyBridge-EP uncore driver
The IvyBridge-EP uncore driver was missing three filter flags:
NC, ISOC, C6 which are useful in some cases. Support them in the same way
as the Haswell EP driver, by allowing to set them and exposing
them in the sysfs formats.

Also fix a typo in a define.

Relies on the Haswell EP driver to be applied earlier.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1409872109-31645-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:23 +02:00
Yan, Zheng 513d793e5f perf/x86/intel/uncore: Register the PMU only if the uncore pci device exists
Current code registers PMUs for all possible uncore pci devices.
This is not good because, on some machines, one or more uncore pci
devices can be missing. The missing pci device make corresponding
PMU unusable. Register the PMU only if the uncore device exists.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409872109-31645-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:22 +02:00
Yan, Zheng e735b9db12 perf/x86/intel/uncore: Add Haswell-EP uncore support
The uncore subsystem in Haswell-EP is similar to Sandy/Ivy
Bridge-EP. There are some differences in config register
encoding and pci device IDs. The Haswell-EP uncore also
supports a few new events. Add the Haswell-EP driver to
the snbep split driver.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
[ Add missing break. Add imc events. Add cbox nc/isoc/c6. ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409872109-31645-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:21 +02:00
Andi Kleen fdda3c4aac perf/x86/intel: Use Broadwell cache event list for Haswell
Use the newly added Broadwell cache event list for Haswell too.
All Haswell and Broadwell events and offcore masks used in these lists
are identical.

However Haswell is very different from the Sandy Bridge
list that was used previously. That fixes a wide range of mis-counting
cache events.

The node events are now only for retired memory events, so prefetching
and speculative memory accesses are not included. They are PEBS
capable now, which makes it much easier to sample for them, plus it's
possible to create address maps with -d.

The prefetch events are gone now. They way the hardware counts
them is very misleading (some prefetches included, others not), so
it seemed best to leave them out.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409683455-29168-5-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:20 +02:00
Andi Kleen c46e665f03 perf/x86: Add INST_RETIRED.ALL workarounds
On Broadwell INST_RETIRED.ALL cannot be used with any period
that doesn't have the lowest 6 bits cleared. And the period
should not be smaller than 128.

Add a new callback to enforce this, and set it for Broadwell.

This is erratum BDM57 and BDM11.

How does this handle the case when an app requests a specific
period with some of the bottom bits set

The apps thinks it is sampling at X occurences per sample, when it is
in fact at X - 63 (worst case).

Short answer:

Any useful instruction sampling period needs to be 4-6 orders
of magnitude larger than 128, as an PMI every 128 instructions
would instantly overwhelm the system and be throttled.
So the +-64 error from this is really small compared to the
period, much smaller than normal system jitter.

Long answer:

<write up by Peter:>

IFF we guarantee perf_event_attr::sample_period >= 128.

Suppose we start out with sample_period=192; then we'll set period_left
to 192, we'll end up with left = 128 (we truncate the lower bits). We
get an interrupt, find that period_left = 64 (>0 so we return 0 and
don't get an overflow handler), up that to 128. Then we trigger again,
at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
an overflow). We increment with sample_period so we get left = 128. We
fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
overflow). And on and on.

So while the individual interrupts are 'wrong' we get then with
interval=256,128 in exactly the right ratio to average out at 192. And
this works for everything >=128.

So the num_samples*fixed_period thing is still entirely correct +- 127,
which is good enough I'd say, as you already have that error anyhow.

So no need to 'fix' the tools, al we need to do is refuse to create
INST_RETIRED:ALL events with sample_period < 128.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Mark Davies <junk@eslaf.co.uk>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1409683455-29168-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:19 +02:00
Andi Kleen 86a349a28b perf/x86/intel: Add Broadwell core support
Add Broadwell support for Broadwell Client to perf.  This is very
similar to Haswell.  It uses a new cache event table, because there
were various changes there.

The constraint list has one new event that needs to be handled over
Haswell.

The PEBS event list is the same, so we reuse Haswell's.

[fengguang.wu: make intel_bdw_event_constraints[] static]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:18 +02:00
Andi Kleen d86c8eaf95 perf/x86/intel: Document all Haswell models
Add names for each Haswell model as requested by Peter.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1409683455-29168-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:16 +02:00
Andi Kleen b76146851e perf/x86/intel: Remove incorrect model number from Haswell perf
71 is a Broadwell, not a Haswell. The model number was added
by mistake earlier.

Remove it for now, until it can be re-added later with
real Broadwell support.

In practice it does not cause a lot of issues because the Broadwell
PMU is very similar to Haswell, but some details were wrong,
and it's better to handle it correctly.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1409683455-29168-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24 14:48:15 +02:00
Paolo Bonzini c1118b3602 x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA.
In that case, KVM will fail to patch VMCALL instructions to VMMCALL
as required on AMD processors.

The failure mode is currently a divide-by-zero exception, which obviously
is a KVM bug that has to be fixed.  However, picking the right instruction
between VMCALL and VMMCALL will be faster and will help if you cannot upgrade
the hypervisor.

Reported-by: Chris Webb <chris@arachsys.com>
Tested-by: Chris Webb <chris@arachsys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24 14:07:57 +02:00
Rakib Mullick d286c3af48 x86/mce: Avoid showing repetitive message from intel_init_thermal()
intel_init_thermal() is called from a) at the time of system initializing
and b) at the time of system resume to initialize thermal
monitoring.

In case when thermal monitoring is handled by SMI, we get to know it via
printk(). Currently it gives the message at both cases, but its okay if
we get it only once and no need to get the same message at every time
system resumes.

So, limit showing this message only at system boot time by avoid showing
at system resume and reduce abusing kernel log buffer.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1411068135.5121.10.camel@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19 12:56:05 +02:00
Igor Mammedov ce4b1b1650 x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
Hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It reproducible
more often if host is over-committed).

It happens because master CPU gives up waiting on
secondary CPU and allows it to run wild. As result
AP causes locking or crashing system. For example
as described here:

  https://lkml.org/lkml/2014/3/6/257

If master CPU have sent STARTUP IPI successfully,
and AP signalled to master CPU that it's ready
to start initialization, make master CPU wait
indefinitely till AP is onlined.

To ensure that AP won't ever run wild, make it
wait at early startup till master CPU confirms its
intention to wait for AP. If AP doesn't respond in 10
seconds, the master CPU will timeout and cancel
AP onlining.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1403266991-12233-1-git-send-email-imammedo@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-16 11:11:32 +02:00
Dave Hansen 9298b815ef x86: Add more disabled features
The original motivation for these patches was for an Intel CPU
feature called MPX.  The patch to add a disabled feature for it
will go in with the other parts of the support.

But, in the meantime, there are a few other features than MPX
that we can make assumptions about at compile-time based on
compile options.  Add them to disabled-features.h and check them
with cpu_feature_enabled().

Note that this gets rid of the last things that needed an #ifdef
CONFIG_X86_64 in cpufeature.h.  Yay!

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.com
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-11 14:30:17 -07:00
Andi Kleen a08b6769d4 perf/x86: Fix section mismatch in split uncore driver
The new split Intel uncore driver code that recently went
into tip added a section mismatch, which the build process
complains about.

uncore_pmu_register() can be called from uncore_pci_probe,()
which is not __init and can be called from pci driver ->probe.
I'm not fully sure if it's actually possible to call the probe
function later, but it seems safer to mark uncore_pmu_register
not __init.

This also fixes the warning.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1409332858-29039-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-09 06:53:08 +02:00
Mathias Krause 066ce64c7e perf/x86/intel: Mark initialization code as such
A few of the initialization functions are missing the __init annotation.
Fix this and thereby allow ~680 additional bytes of code to be released
after initialization.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1409071785-26015-1-git-send-email-minipli@googlemail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-09 06:53:06 +02:00
Christoph Lameter 89cbc76768 x86: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and less registers
are used when code is generated.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y)

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y)
	__get_cpu_var(y) = x;

   Converts to

	__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++

   Converts to

	__this_cpu_inc(y)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:49 -04:00
Ingo Molnar 83bc90e115 Merge branch 'linus' into perf/core, to fix conflicts
Conflicts:
	arch/x86/kernel/cpu/perf_event_intel_uncore*.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-24 22:32:24 +02:00
Ingo Molnar 44afe60294 A bunch of cleanups from Henrique.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJT9z1RAAoJEBLB8Bhh3lVKA0UP/0sCiH8JtSaP+qdvuU5xtldM
 L1O76MrQ1De6BO1XqjucsxClbVPPxBE3nBGzxfSoIMdil0A8PmHmJj6/5GILAosd
 kX7LxoFVOFOQg+hekXkcd4t4lmiTPcmfgzywOnLykB9Iv4Bn12RpPhGjkgywHhGC
 6Rvh9cawqJb2rAui2zseK9RAHHhTasXqo7vWAm5kQPFE98fV6gp4Pcf/iJb8Rd09
 7Rp5f5vSvWkmCKx3nCzNGGkZgvwVD+sJb9jQJ6QFaoPYmNa0Te3mJ5nl32aNrsct
 5s6BPEimHJc4lt5IUMc1MRhg2DNrdmcUxYaonOraeG59XRtSxH9OvOSCS291qd/m
 lZOADMjxD/d0mvpY937UOOiZ2yC0RDNx2OQDnQTEWYtGJnzqq1myVVEAo6bkPYhd
 2JrqtElUY9o2iO2yaFge9/Po4rAtvHlgzKcdldSUOk6xLkj1FpWZK+PqO5vqKets
 KltJQupwUMgXCQ9jCF+Y4mdyuncRm/GhhCYsiWr16Xjp/znX6hS6i1DK9j70uim3
 7Uu09fdlP7V3dFiX2cuKwH2tQ+jcAIOoZjtQKwhSLOrva/BWqSOFbX9W3UxQH0X9
 5JVxTn6Rutr6pn7rm9zG82Xq+UueQMEzE1kklwjmmiNI2044PCvuYtWvfTY4pJ71
 oW09NvqBAlMMaWDJ92ps
 =brfb
 -----END PGP SIGNATURE-----

Merge tag 'microcode_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/microcode

Pull x86/microcode updates from Borislav Petkov:

   "A bunch of cleanups from Henrique."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-24 11:27:42 +02:00
Josh Triplett 9def39be4e x86: Support compiling out human-friendly processor feature names
The table mapping CPUID bits to human-readable strings takes up a
non-trivial amount of space, and only exists to support /proc/cpuinfo
and a couple of kernel messages.  Since programs depend on the format of
/proc/cpuinfo, force inclusion of the table when building with /proc
support; otherwise, support omitting that table to save space, in which
case the kernel messages will print features numerically instead.

In addition to saving 1408 bytes out of vmlinux, this also saves 1373
bytes out of the uncompressed setup code, which contributes directly to
the size of bzImage.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
2014-08-17 15:54:00 -07:00
Josh Triplett 39f838e06f x86: Drop support for /proc files when !CONFIG_PROC_FS
arch/x86/kernel/cpu/proc.c only exists to support files in /proc; omit that
file when compiling without CONFIG_PROC_FS.

Saves 645 additional bytes on 32-bit x86 when !CONFIG_PROC_FS:

add/remove: 0/5 grow/shrink: 0/0 up/down: 0/-645 (-645)
function                                     old     new   delta
c_stop                                         1       -      -1
c_next                                        11       -     -11
cpuinfo_op                                    16       -     -16
c_start                                       24       -     -24
show_cpuinfo                                 593       -    -593

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
2014-08-17 15:20:37 -07:00
Linus Torvalds a11c5c9ef6 PCI changes for the v3.17 merge window (part 2):
Miscellaneous
     - Remove DEFINE_PCI_DEVICE_TABLE macro use (Benoit Taine)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJT7PyAAAoJEFmIoMA60/r8kjQQALr/8oEfZoVcjgCb7waWOr25
 hUTnrI6GBIAh/50hoBiPq0ouPCAKVv66+CUhuhFkLP7oJz+rMU0B9hfUvdLfmCpH
 7ppaallkllT9nPFIr7h5RUWLXsoQyuHmCYmSrUCcnlT2LPgU0dN72YWElLisEM6Z
 Pldg3933xyIQaCWviHjGEjWb7NvC+JY4pTkV5iyqGgU8Ale/eFYtLLSfdBEjIbGv
 VDirYZmKELYeuncZPrTAsp4IENRMZn702wwDakMSODVMEWtJB5h4yrBawqQDlFP5
 9ztIX6n9p9zkdVKbYZlx/Xwv6SYEnYXLxauVQMSO3Nck7Z10R5Ud+5uuCg/6mWH8
 AQI4UV5bbJcg7zHgocTG9XLFLFPoPtD2JT6k6UT1LeUAiAOqcSzhRO+/qJBmJOWZ
 Zv+EHXPlxBrl0zNifut6ZQrY17teuItVtmha70a/9W3PjnIx3KecqLcTwdTvDsOY
 IAyH8WMZrBKpPpsczSmfE93i2Z1QRS91HEAOeSMxl/98dcDTdllYZS7spjoDll2f
 xmpGDbpriLSCu2XsGHfTC9RbqA7CyuFlHggJSQDkT/5Esli0sCs7eweTuK3RVvPu
 t6bUHK3yElb6x9qMZhb5q6l72wSMlGMishTdaxEHmqrEA8PtaIFodmVX2T/Zel5n
 GHN6bysPqDItNR2v/3JX
 =jJGu
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull DEFINE_PCI_DEVICE_TABLE removal from Bjorn Helgaas:
 "Part two of the PCI changes for v3.17:

    - Remove DEFINE_PCI_DEVICE_TABLE macro use (Benoit Taine)

  It's a mechanical change that removes uses of the
  DEFINE_PCI_DEVICE_TABLE macro.  I waited until later in the merge
  window to reduce conflicts, but it's possible you'll still see a few"

* tag 'pci-v3.17-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use
2014-08-14 18:10:33 -06:00
Linus Torvalds 7453f33b2e Merge branch 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/xsave changes from Peter Anvin:
 "This is a patchset to support the XSAVES instruction required to
  support context switch of supervisor-only features in upcoming
  silicon.

  This patchset missed the 3.16 merge window, which is why it is based
  on 3.15-rc7"

* 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, xsave: Add forgotten inline annotation
  x86/xsaves: Clean up code in xstate offsets computation in xsave area
  x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi)
  Define kernel API to get address of each state in xsave area
  x86/xsaves: Enable xsaves/xrstors
  x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf
  x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time
  x86/xsaves: Add xsaves and xrstors support for booting time
  x86/xsaves: Clear reserved bits in xsave header
  x86/xsaves: Use xsave/xrstor for saving and restoring user space context
  x86/xsaves: Use xsaves/xrstors for context switch
  x86/xsaves: Use xsaves/xrstors to save and restore xsave area
  x86/xsaves: Define a macro for handling xsave/xrstor instruction fault
  x86/xsaves: Define macros for xsave instructions
  x86/xsaves: Change compacted format xsave area header
  x86/alternative: Add alternative_input_2 to support alternative with two features and input
  x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors
2014-08-13 18:20:04 -06:00
Peter Zijlstra ddcd0973fe perf/x86/uncore: Rename IvyTown to IvyBridge-EP
Keeping track of all the various CPU names is hard enough; adding extra
silly names for no reason is just not helping. If we know the base arch
name (IvyBridge) then we can do the client/server parts with the well
known {,EP,EX} postfixes, no need to remember endless amounts of
unrelated and pointless names for this.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-8559jke61dsyr7d0i74iutli@git.kernel.org
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-13 07:51:18 +02:00
Stephane Eranian 85a16ef66c perf/x86/uncore: Export basic memory events for IVT IMC PMU
This patch exposes two basic events for Ivytown IMC uncore PMU:

- cas_count_read: number of full-cache line reads to memory controller
- cas_count_write: number of full-cache line writes to memory controller

Those events use the same encoding as for SNB-EP, so reuse the same
event table. See specification in:

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/xeon-e5-2600-v2-uncore-manual.pdf

By aggregating all the read and write events from all the memory controllers
of each processor socket, one can determine the total memory bandwidth utilization.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140812060031.GA25239@quad
Cc: zheng.z.yan@intel.com
Cc: ak@linux.intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-13 07:51:17 +02:00
Stephane Eranian c8aab2e04a perf/x86: Clean up __intel_pmu_pebs_event() code
This patch makes the code more readable. It also renames
precise_store_data_hsw() to precise_datala_hsw() because
the function is called for both loads and stores on HSW.
The patch also gets rid of the hardcoded store events
codes in that same function.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1407785233-32193-5-git-send-email-eranian@google.com
Cc: ak@linux.intel.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-13 07:51:16 +02:00
Stephane Eranian 770eee1fd3 perf/x86: Fix data source encoding issues for load latency/precise store
This patch fixes issues introuduce by Andi's previous patch 'Revamp PEBS'
series.

This patch fixes the following:

 - precise_store_data_hsw() encode the mem op type whenever we can
 - precise_store_data_hsw set the default data source correctly

 - 0 is not a valid init value for data source. Define PERF_MEM_NA as the
   default value

This bug was actually introduced by

    commit 722e76e60f
    Author: Stephane Eranian <eranian@google.com>
    Date:   Thu May 15 17:56:44 2014 +0200

        fix Haswell precise store data source encoding

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1407785233-32193-4-git-send-email-eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: ak@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-13 07:51:15 +02:00
Andi Kleen f3908b8cfb perf/x86: Don't mark DataLA addresses as store
Haswell supports reporting the data address for a range
of PEBS events, including:

	UOPS_RETIRED.ALL
	MEM_UOPS_RETIRED.STLB_MISS_LOADS
	MEM_UOPS_RETIRED.STLB_MISS_STORES
	MEM_UOPS_RETIRED.LOCK_LOADS
	MEM_UOPS_RETIRED.SPLIT_LOADS
	MEM_UOPS_RETIRED.SPLIT_STORES
	MEM_UOPS_RETIRED.ALL_LOADS
	MEM_UOPS_RETIRED.ALL_STORES
	MEM_LOAD_UOPS_RETIRED.L1_HIT
	MEM_LOAD_UOPS_RETIRED.L2_HIT
	MEM_LOAD_UOPS_RETIRED.L3_HIT
	MEM_LOAD_UOPS_RETIRED.L1_MISS
	MEM_LOAD_UOPS_RETIRED.L2_MISS
	MEM_LOAD_UOPS_RETIRED.L3_MISS
	MEM_LOAD_UOPS_RETIRED.HIT_LFB
	MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS
	MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT
	MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM
	MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_NONE
	MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM

This facility was already enabled earlier with the original Haswell
perf changes.

However these addresses were always reports as stores by perf, which is wrong,
as they could be loads too.  The hardware does not distinguish loads and stores
for these instructions, so there's no (cheap) way for the profiler
to find out.

Change the type to PERF_MEM_OP_NA instead.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1407785233-32193-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-13 07:51:14 +02:00
Andi Kleen 86a04461a9 perf/x86: Revamp PEBS event selection
The basic idea is that it does not make sense to list all PEBS
events individually. The list is very long, sometimes outdated
and the hardware doesn't need it. If an event does not support
PEBS it will just not count, there is no security issue.

We need to only list events that something special, like
supporting load or store addresses.

This vastly simplifies the PEBS event selection. It also
speeds up the scheduling because the scheduler doesn't
have to walk as many constraints.

Bugs fixed:

 - We do not allow setting forbidden flags with PEBS anymore
   (SDM 18.9.4), except for the special cycle event.
   This is done using a new constraint macro that also
   matches on the event flags.

 - Correct DataLA and load/store/na flags reporting on Haswell
   [Requires a followon patch]

 - We did not allow all PEBS events on Haswell:
   We were missing some valid subevents in d1-d2 (MEM_LOAD_UOPS_RETIRED.*,
   MEM_LOAD_UOPS_RETIRED_L3_HIT_RETIRED.*)

This includes the changes proposed by Stephane earlier and obsoletes
his patchkit (except for some changes on pre Sandy Bridge/Silvermont
CPUs)

I only did Sandy Bridge and Silvermont and later so far, mostly because these
are the parts I could directly confirm the hardware behavior with hardware
architects. Also I do not believe the older CPUs have any
missing events in their PEBS list, so there's no pressing
need to change them.

I did not implement the flag proposed by Peter to allow
setting forbidden flags. If really needed this could
be implemented on to of this patch.

v2: Fix broken store events on SNB/IVB (Stephane Eranian)
v3: More fixes. Rename some arguments (Stephane Eranian)
v4: List most Haswell events individually again to report
memory operation type correctly.
Add new flags to describe load/store/na for datala.
Update description.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1407785233-32193-2-git-send-email-eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
Cc: Mark Davies <junk@eslaf.co.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-13 07:51:13 +02:00