Граф коммитов

71641 Коммитов

Автор SHA1 Сообщение Дата
Michael S. Tsirkin 9c6ab1931f if_tun: drop broken IFF_VNET_LE
Everyone should use TUNSETVNETLE/TUNGETVNETLE instead.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-16 11:19:42 -05:00
Michael S. Tsirkin 5eea84f478 if_tun: add TUNSETVNETLE/TUNGETVNETLE
ifreq flags field is only 16 bit wide, so setting IFF_VNET_LE there has
no effect:
doesn't fit in two bytes.

The tests passed apparently because they have an even number of bugs,
all cancelling out.

Luckily we didn't release a kernel with this flag, so it's
not too late to fix this.

Add TUNSETVNETLE/TUNGETVNETLE to really achieve the purpose
of IFF_VNET_LE.

This has an added benefit that if we ever want a BE flag,
we won't have to deal with weird configurations like
setting both LE and BE at the same time.

IFF_VNET_LE will be dropped in a follow-up patch.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-16 11:19:41 -05:00
Jiang Liu e89900c9ad x86, irq: Introduce helper to check whether an IOAPIC has been registered
Introduce acpi_ioapic_registered() to check whether an IOAPIC has already
been registered, it will be used when enabling IOAPIC hotplug.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1414387308-27148-18-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:15 +01:00
Jiang Liu cffe0a2b5a x86, irq: Keep balance of IOAPIC pin reference count
To keep balance of IOAPIC pin reference count, we need to protect
pirq_enable_irq(), acpi_pci_irq_enable() and intel_mid_pci_irq_enable()
from reentrance. There are two cases which will cause reentrance.

The first case is caused by suspend/hibernation. If pcibios_disable_irq
is called during suspending/hibernating, we don't release the assigned
IRQ number, otherwise it may break the suspend/hibernation. So late when
pcibios_enable_irq is called during resume, we shouldn't allocate IRQ
number again.

The second case is that function acpi_pci_irq_enable() may be called
twice for PCI devices present at boot time as below:
1) pci_acpi_init()
	--> acpi_pci_irq_enable() if pci_routeirq is true
2) pci_enable_device()
	--> pcibios_enable_device()
		--> acpi_pci_irq_enable()
We can't kill kernel parameter pci_routeirq yet because it's still
needed for debugging purpose.

So flag irq_managed is introduced to track whether IRQ number is
assigned by OS and to protect pirq_enable_irq(), acpi_pci_irq_enable()
and intel_mid_pci_irq_enable() from reentrance.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/1414387308-27148-13-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 14:08:15 +01:00
Thomas Gleixner 8ab7913675 Merge branch 'x86/vt-d' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu into x86/apic-picks
Required to apply Jiangs x86 irq handling rework without creating a
nightmare of conflicts.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-12-16 12:25:38 +01:00
Linus Torvalds 2dbfca5a18 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds
Pull LED subsystem update from Bryan Wu:
 "We got some cleanup and driver for LP8860 as well as some patches for
  LED Flash Class"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
  leds: lp8860: Fix module dependency
  leds: lp8860: Introduce TI lp8860 4 channel LED driver
  leds: Add support for setting brightness in a synchronous way
  leds: implement sysfs interface locking mechanism
  leds: syscon: handle multiple syscon instances
  leds: delete copy/paste mistake
  leds: regulator: Convert to devm_regulator_get_exclusive
2014-12-15 18:28:25 -08:00
Haggai Eran 7bdf65d411 IB/mlx5: Handle page faults
This patch implement a page fault handler (leaving the pages pinned as
of time being).  The page fault handler handles initiator and responder
page faults for UD/RC transports, for send/receive operations, as well
as RDMA read/write initiator support.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:03 -08:00
Haggai Eran 6aec21f6a8 IB/mlx5: Page faults handling infrastructure
* Refactor MR registration and cleanup, and fix reg_pages accounting.
* Create a work queue to handle page fault events in a kthread context.
* Register a fault handler to get events from the core for each QP.

The registered fault handler is empty in this patch, and only a later
patch implements it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:03 -08:00
Haggai Eran 832a6b06ab IB/mlx5: Add mlx5_ib_update_mtt to update page tables after creation
The new function allows updating the page tables of a memory region
after it was created. This can be used to handle page faults and page
invalidations.

Since mlx5_ib_update_mtt will need to work from within page invalidation,
so it must not block on memory allocation. It employs an atomic memory
allocation mechanism that is used as a fallback when kmalloc(GFP_ATOMIC) fails.

In order to reuse code from mlx5_ib_populate_pas, the patch splits
this function and add the needed parameters.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:02 -08:00
Haggai Eran cc149f751b IB/mlx5: Changes in memory region creation to support on-demand paging
This patch wraps together several changes needed for on-demand paging support
in the mlx5_ib_populate_pas function, and when registering memory regions.

* Instead of accepting a UMR bit telling the function to enable all
  access flags, the function now accepts the access flags themselves.
* For on-demand paging memory regions, fill the memory tables from the
  correct list, and enable/disable the access flags per-page according
  to whether the page is present.
* A new bit is set to enable writing of access flags when using the
  firmware create_mkey command.
* Disable contig pages when on-demand paging is enabled.

In addition the patch changes the UMR code to use PTR_ALIGN instead of
our own macro.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:19:02 -08:00
Haggai Eran e420f0c0f3 mlx5_core: Add support for page faults events and low level handling
* Add a handler function pointer in the mlx5_core_qp struct for page
  fault events. Handle page fault events by calling the handler
  function, if not NULL.
* Add on-demand paging capability query command.
* Export command for resuming QPs after page faults.
* Add various constants related to paging support.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:18:59 -08:00
Roland Dreier 6cb7ff3dcf mlx5_core: Re-add MLX5_DEV_CAP_FLAG_ON_DMND_PG flag
In commit 0c7aac854f ("net/mlx5_core: Remove unused dev cap enum
fields"), the flag MLX5_DEV_CAP_FLAG_ON_DMND_PG was removed.
Unfortunately the on-demand paging changes actually use it, so re-add
the missing flag.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:18:58 -08:00
Haggai Eran 882214e2b1 IB/core: Implement support for MMU notifiers regarding on demand paging regions
* Add an interval tree implementation for ODP umems. Create an
  interval tree for each ucontext (including a count of the number of
  ODP MRs in this context, semaphore, etc.), and register ODP umems in
  the interval tree.
* Add MMU notifiers handling functions, using the interval tree to
  notify only the relevant umems and underlying MRs.
* Register to receive MMU notifier events from the MM subsystem upon
  ODP MR registration (and unregister accordingly).
* Add a completion object to synchronize the destruction of ODP umems.
* Add mechanism to abort page faults when there's a concurrent invalidation.

The way we synchronize between concurrent invalidations and page
faults is by keeping a counter of currently running invalidations, and
a sequence number that is incremented whenever an invalidation is
caught. The page fault code checks the counter and also verifies that
the sequence number hasn't progressed before it updates the umem's
page tables. This is similar to what the kvm module does.

In order to prevent the case where we register a umem in the middle of
an ongoing notifier, we also keep a per ucontext counter of the total
number of active mmu notifiers. We only enable new umems when all the
running notifiers complete.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yuval Dagan <yuvalda@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:36 -08:00
Shachar Raindel 8ada2c1c0c IB/core: Add support for on demand paging regions
* Extend the umem struct to keep the ODP related data.
* Allocate and initialize the ODP related information in the umem
  (page_list, dma_list) and freeing as needed in the end of the run.
* Store a reference to the process PID struct in the ucontext.  Used to
  safely obtain the task_struct and the mm during fault handling,
  without preventing the task destruction if needed.
* Add 2 helper functions: ib_umem_odp_map_dma_pages and
  ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses
  of specific pages of the umem (and, currently, pin them).
* Support for page faults only - IB core will keep the reference on
  the pages used and call put_page when freeing an ODP umem
  area. Invalidations support will be added in a later patch.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:36 -08:00
Sagi Grimberg 860f10a799 IB/core: Add flags for on demand paging support
* Add a configuration option for enable on-demand paging support in
  the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a
  later patch, this configuration option will select the MMU_NOTIFIER
  configuration option to enable mmu notifiers.
* Add a flag for on demand paging (ODP) support in the IB device capabilities.
* Add a flag to request ODP MR in the access flags to reg_mr.
* Fail registrations done with the ODP flag when the low-level driver
  doesn't support this.
* Change the conditions in which an MR will be writable to explicitly
  specify the access flags.  This is to avoid making an MR writable just
  because it is an ODP MR.
* Add a ODP capabilities to the extended query device verb.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Eli Cohen 5a77abf9a9 IB/core: Add support for extended query device caps
Add extensible query device capabilities verb to allow adding new features.
ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to
copy capability fields to be used by both ib_uverbs_query_device and
ib_uverbs_ex_query_device.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Haggai Eran c1395a2a8c IB/mlx5: Add function to read WQE from user-space
Add a helper function mlx5_ib_read_user_wqe to read information from
user-space owned work queues.  The function will be used in a later
patch by the page-fault handling code in mlx5_ib.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>

[ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n
  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Haggai Eran c5d76f130b IB/core: Add umem function to read data from user-space
In some drivers there's a need to read data from a user space area
that was pinned using ib_umem when running from a different process
context.

The ib_umem_copy_from function allows reading data from the physical
pages pinned in the ib_umem struct.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Haggai Eran 406f9e5fa9 IB/core: Replace ib_umem's offset field with a full address
In order to allow umems that do not pin memory, we need the umem to
keep track of its region's address.

This makes the offset field redundant, and so this patch removes it.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Haggai Eran 968e78dd96 IB/mlx5: Enhance UMR support to allow partial page table update
The current UMR interface doesn't allow partial updates to a memory
region's page tables. This patch changes the interface to allow that.

It also changes the way the UMR operation validates the memory
region's state.  When set, IB_SEND_UMR_FAIL_IF_FREE will cause the UMR
operation to fail if the MKEY is in the free state. When it is
unchecked the operation will check that it isn't in the free state.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:13:35 -08:00
Linus Torvalds dab363f938 Staging patches for 3.19-rc1
Here's the big staging tree pull request for 3.19-rc1.
 
 We continued to delete more lines than were added, always a good thing,
 but not at a huge rate this release, only about 70k lines removed
 overall mostly from removing the horrid bcm driver.
 
 Lots of normal staging driver cleanups and fixes all over the place,
 well over a thousand of them, the shortlog shows all the horrid details.
 
 The "contentious" thing here is the movement of the Android binder code
 out of staging into the "real" part of the kernel.  This is code that
 has been stable for a few years now and is working as-is in the tens of
 millions of devices with no issues.  Yes, the code is horrid, and the
 userspace api leaves a lot to be desired, but it's not going to change
 due to legacy issues that we have no control over.  Because so many
 devices and companies rely on this, and the code is stable, might as
 well promote it out of staging.
 
 This was all discussed at the Linux Plumbers conference, and everyone
 participating agreed that this was the best way forward.
 
 There is work happening to replace the binder code with something new
 that is happening right now, but I don't expect to see the results of
 that work for another year at the earliest.  If that ever happens, and
 Android switches over to it, I'll gladly remove this version.
 
 As for maintainers, I'll be glad to maintain this code, I've been doing
 it for the past few years with no problems.  I'll send a MAINTAINERS
 entry for it before 3.19-final is out, still need to talk to the Google
 developers about if they are willing to help with it or not, last I
 checked they were, which was good.
 
 All of these patches have been in linux-next for a while with no
 reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSPICkACgkQMUfUDdst+yksdwCfSLE9VUy1o2sAPDRe+J3bQced
 EWEAoL3RtnejKbo5tHS2IT69pLrwiIDS
 =YXyM
 -----END PGP SIGNATURE-----

Merge tag 'staging-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver updates from Greg KH:
 "Here's the big staging tree pull request for 3.19-rc1.

  We continued to delete more lines than were added, always a good
  thing, but not at a huge rate this release, only about 70k lines
  removed overall mostly from removing the horrid bcm driver.

  Lots of normal staging driver cleanups and fixes all over the place,
  well over a thousand of them, the shortlog shows all the horrid
  details.

  The "contentious" thing here is the movement of the Android binder
  code out of staging into the "real" part of the kernel.  This is code
  that has been stable for a few years now and is working as-is in the
  tens of millions of devices with no issues.  Yes, the code is horrid,
  and the userspace api leaves a lot to be desired, but it's not going
  to change due to legacy issues that we have no control over.  Because
  so many devices and companies rely on this, and the code is stable,
  might as well promote it out of staging.

  This was all discussed at the Linux Plumbers conference, and everyone
  participating agreed that this was the best way forward.

  There is work happening to replace the binder code with something new
  that is happening right now, but I don't expect to see the results of
  that work for another year at the earliest.  If that ever happens, and
  Android switches over to it, I'll gladly remove this version.

  As for maintainers, I'll be glad to maintain this code, I've been
  doing it for the past few years with no problems.  I'll send a
  MAINTAINERS entry for it before 3.19-final is out, still need to talk
  to the Google developers about if they are willing to help with it or
  not, last I checked they were, which was good.

  All of these patches have been in linux-next for a while with no
  reported issues"

* tag 'staging-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1382 commits)
  Staging: slicoss: Fix long line issues in slicoss.c
  staging: rtl8712: remove unnecessary else after return
  staging: comedi: change some printk calls to pr_err
  staging: rtl8723au: hal: Removed the extra semicolon
  lustre: Deletion of unnecessary checks before three function calls
  staging: lustre: fix sparse warnings: static function declaration
  staging: lustre: fixed sparse warnings related to static declarations
  staging: unisys: remove duplicate header
  staging: unisys: remove unneeded structure
  staging: ft1000 : replace __attribute ((__packed__) with __packed
  drivers: staging: rtl8192e: Include "asm/unaligned.h" instead of "access_ok.h" in "rtl819x_BAProc.c"
  Drivers:staging:rtl8192e: Fixed checkpatch warning
  Drivers:staging:clocking-wizard: Added a newline
  staging: clocking-wizard: check for a valid clk_name pointer
  staging: rtl8723au: Hal_InitPGData() avoid unnecessary typecasts
  staging: rtl8723au: _DisableAnalog(): Avoid zero-init variables unnecessarily
  staging: rtl8723au: Remove unnecessary wrapper _ResetDigitalProcedure1()
  staging: rtl8723au: _ResetDigitalProcedure1_92C() reduce code obfuscation
  staging: rtl8723au: Remove unnecessary wrapper _DisableRFAFEAndResetBB()
  staging: rtl8723au: _DisableRFAFEAndResetBB8192C(): Reduce code obfuscation
  ...
2014-12-15 18:06:13 -08:00
Linus Torvalds 60d7ef3fd3 Merge branch 'irq-irqdomain-arm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq domain ARM updates from Thomas Gleixner:
 "This set of changes make use of hierarchical irqdomains to provide:

   - MSI/ITS support for GICv3
   - MSI support for GICv2m
   - Interrupt polarity extender for GICv1

  Marc has come more cleanups for the existing extension hooks of GIC in
  the pipeline, but they are going to be 3.20 material"

* 'irq-irqdomain-arm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  irqchip: gicv3-its: Fix ITT allocation
  irqchip: gicv3-its: Move some alloc/free code to activate/deactivate
  irqchip: gicv3-its: Fix domain free in multi-MSI case
  irqchip: gic: Remove warning by including linux/irqdomain.h
  irqchip: gic-v2m: Add DT bindings for GICv2m
  irqchip: gic-v2m: Add support for ARM GICv2m MSI(-X) doorbell
  irqchip: mtk-sysirq: dt-bindings: Add bindings for mediatek sysirq
  irqchip: mtk-sysirq: Add sysirq interrupt polarity support
  irqchip: gic: Support hierarchy irq domain.
  irqchip: GICv3: Binding updates for ITS
  irqchip: GICv3: ITS: enable compilation of the ITS driver
  irqchip: GICv3: ITS: plug ITS init into main GICv3 code
  irqchip: GICv3: ITS: DT probing and initialization
  irqchip: GICv3: ITS: MSI support
  irqchip: GICv3: ITS: device allocation and configuration
  irqchip: GICv3: ITS: tables allocators
  irqchip: GICv3: ITS: LPI allocator
  irqchip: GICv3: ITS: irqchip implementation
  irqchip: GICv3: ITS command queue
  irqchip: GICv3: rework redistributor structure
  ...
2014-12-15 17:30:09 -08:00
Tero Kristo 6f8e853d18 ARM: OMAP2+: clock: fix DPLL code to use new determine rate APIs
While the change for determine_rate clock operation was merged,
the OMAP counterpart using these calls was overlooked for some reason,
and caused boot failures on at least OMAP4 platforms. Fixed by updating
the DPLL API calls to use the new parameters.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Fixes: 646cafc6aa ("clk: Change clk_ops->determine_rate")
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Acked-by: Paul Walmsley <paul@pwsan.com>
Tested-by: Kevin Hilman <khilman@linaro.org>
Reported-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Michael Turquette <mturquette@linaro.org>
2014-12-15 17:05:08 -08:00
Michael Turquette b1924c2ec1 Merge tag 'for-v3.19/omap-a' of git://git.kernel.org/pub/scm/linux/kernel/git/pjw/omap-pending into tmp
Some OMAP clock/hwmod patches for v3.19.

Most of the patches are clock-related.  The DPLL implementation is
changed to better align to the common clock framework.
There is also a patch that removes a few lines from the hwmod code -
this patch should have no functional effect.

Basic build, boot, and PM test logs for these patches can be found here:

http://www.pwsan.com/omap/testlogs/omap-a-for-v3.19/20141113094101/
2014-12-15 17:05:07 -08:00
Linus Torvalds 988adfdffd Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "Highlights:

   - AMD KFD driver merge

     This is the AMD HSA interface for exposing a lowlevel interface for
     GPGPU use.  They have an open source userspace built on top of this
     interface, and the code looks as good as it was going to get out of
     tree.

   - Initial atomic modesetting work

     The need for an atomic modesetting interface to allow userspace to
     try and send a complete set of modesetting state to the driver has
     arisen, and been suffering from neglect this past year.  No more,
     the start of the common code and changes for msm driver to use it
     are in this tree.  Ongoing work to get the userspace ioctl finished
     and the code clean will probably wait until next kernel.

   - DisplayID 1.3 and tiled monitor exposed to userspace.

     Tiled monitor property is now exposed for userspace to make use of.

   - Rockchip drm driver merged.

   - imx gpu driver moved out of staging

  Other stuff:

   - core:
        panel - MIPI DSI + new panels.
        expose suggested x/y properties for virtual GPUs

   - i915:
        Initial Skylake (SKL) support
        gen3/4 reset work
        start of dri1/ums removal
        infoframe tracking
        fixes for lots of things.

   - nouveau:
        tegra k1 voltage support
        GM204 modesetting support
        GT21x memory reclocking work

   - radeon:
        CI dpm fixes
        GPUVM improvements
        Initial DPM fan control

   - rcar-du:
        HDMI support added
        removed some support for old boards
        slave encoder driver for Analog Devices adv7511

   - exynos:
        Exynos4415 SoC support

   - msm:
        a4xx gpu support
        atomic helper conversion

   - tegra:
        iommu support
        universal plane support
        ganged-mode DSI support

   - sti:
        HDMI i2c improvements

   - vmwgfx:
        some late fixes.

   - qxl:
        use suggested x/y properties"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits)
  drm: sti: fix module compilation issue
  drm/i915: save/restore GMBUS freq across suspend/resume on gen4
  drm: sti: correctly cleanup CRTC and planes
  drm: sti: add HQVDP plane
  drm: sti: add cursor plane
  drm: sti: enable auxiliary CRTC
  drm: sti: fix delay in VTG programming
  drm: sti: prepare sti_tvout to support auxiliary crtc
  drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off}
  drm: sti: fix hdmi avi infoframe
  drm: sti: remove event lock while disabling vblank
  drm: sti: simplify gdp code
  drm: sti: clear all mixer control
  drm: sti: remove gpio for HDMI hot plug detection
  drm: sti: allow to change hdmi ddc i2c adapter
  drm/doc: Document drm_add_modes_noedid() usage
  drm/i915: Remove '& 0xffff' from the mask given to WA_REG()
  drm/i915: Invert the mask and val arguments in wa_add() and WA_REG()
  drm: Zero out DRM object memory upon cleanup
  drm/i915/bdw: Fix the write setting up the WIZ hashing mode
  ...
2014-12-15 15:52:01 -08:00
Michael S. Tsirkin b9f7ac8c72 vringh: update for virtio 1.0 APIs
When switching everything over to virtio 1.0 memory access APIs,
I missed converting vringh.
Fortunately, it's straight-forward.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-15 23:49:28 +02:00
Michael S. Tsirkin b97a8a9006 vringh: 64 bit features
Pass u64 everywhere.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-15 23:49:23 +02:00
Steven Rostedt (Red Hat) 0daa230296 tracing: Add tp_printk cmdline to have tracepoints go to printk()
Add the kernel command line tp_printk option that will have tracepoints
that are active sent to printk() as well as to the trace buffer.

Passing "tp_printk" will activate this. To turn it off, the sysctl
/proc/sys/kernel/tracepoint_printk can have '0' echoed into it. Note,
this only works if the cmdline option is used. Echoing 1 into the sysctl
file without the cmdline option will have no affect.

Note, this is a dangerous option. Having high frequency tracepoints send
their data to printk() can possibly cause a live lock. This is another
reason why this is only active if the command line option is used.

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-15 10:17:38 -05:00
Steven Rostedt (Red Hat) 5f893b2639 tracing: Move enabling tracepoints to just after rcu_init()
Enabling tracepoints at boot up can be very useful. The tracepoint
can be initialized right after RCU has been. There's no need to
wait for the early_initcall() to be called. That's too late for some
things that can use tracepoints for debugging. Move the logic to
enable tracepoints out of the initcalls and into init/main.c to
right after rcu_init().

This also allows trace_printk() to be used early too.

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-15 10:16:50 -05:00
Rafael J. Wysocki 4f7ad5211e SCSI / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
After commit b2b49ccbdd (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.

Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere under
drivers/scsi/ and in include/scsi/scsi_device.h.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2014-12-15 15:11:06 +01:00
Paolo Bonzini 333bce5aac Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot
problems, clarifies VCPU init, and fixes a regression concerning the
 VGIC init flow.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUjsVhAAoJEEtpOizt6ddy5rIH/1V/YVwhprC55YqdHelU9Qu2
 Muzsx+7F71NxC7xgMGFqPD1YrPR+hxvoPhy+ADOBlvcqlolrkDnV9I+8e3geaYNc
 nZ/yEnoGTtbAggiS1smx7usBv34Z88Sd5txNjmj1cmHBy+VOWlyidWMkGBTsfBRe
 mVc61BDUfyC47udgRHXhwS80sbHLJHElmADisFOVmQNBYwwiHiTdx0hMBMnHcC3Y
 /3T0tKxHdeTISnmA+J+n7TcChtTIM4xqC6kwf3rw3b7XX8gdtTKylDHX2GLAg646
 RdebAG2twmGpIc6SxXZbo38f3oY9OFo1Le5xZGa6iUjD56VDw/e4wg4iA2juo0Y=
 =J2Ut
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-3.19-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot
problems, clarifies VCPU init, and fixes a regression concerning the
VGIC init flow.

Conflicts:
	arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]
2014-12-15 13:06:40 +01:00
Christoffer Dall 05971120fc arm/arm64: KVM: Require in-kernel vgic for the arch timers
It is curently possible to run a VM with architected timers support
without creating an in-kernel VGIC, which will result in interrupts from
the virtual timer going nowhere.

To address this issue, move the architected timers initialization to the
time when we run a VCPU for the first time, and then only initialize
(and enable) the architected timers if we have a properly created and
initialized in-kernel VGIC.

When injecting interrupts from the virtual timer to the vgic, the
current setup should ensure that this never calls an on-demand init of
the VGIC, which is the only call path that could return an error from
kvm_vgic_inject_irq(), so capture the return value and raise a warning
if there's an error there.

We also change the kvm_timer_init() function from returning an int to be
a void function, since the function always succeeds.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-12-15 11:50:42 +01:00
Linus Torvalds 67e2c38838 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer updates from James Morris:
 "In terms of changes, there's general maintenance to the Smack,
  SELinux, and integrity code.

  The IMA code adds a new kconfig option, IMA_APPRAISE_SIGNED_INIT,
  which allows IMA appraisal to require signatures.  Support for reading
  keys from rootfs before init is call is also added"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
  selinux: Remove security_ops extern
  security: smack: fix out-of-bounds access in smk_parse_smack()
  VFS: refactor vfs_read()
  ima: require signature based appraisal
  integrity: provide a hook to load keys when rootfs is ready
  ima: load x509 certificate from the kernel
  integrity: provide a function to load x509 certificate from the kernel
  integrity: define a new function integrity_read_file()
  Security: smack: replace kzalloc with kmem_cache for inode_smack
  Smack: Lock mode for the floor and hat labels
  ima: added support for new kernel cmdline parameter ima_template_fmt
  ima: allocate field pointers array on demand in template_desc_init_fields()
  ima: don't allocate a copy of template_fmt in template_desc_init_fields()
  ima: display template format in meas. list if template name length is zero
  ima: added error messages to template-related functions
  ima: use atomic bit operations to protect policy update interface
  ima: ignore empty and with whitespaces policy lines
  ima: no need to allocate entry for comment
  ima: report policy load status
  ima: use path names cache
  ...
2014-12-14 20:36:37 -08:00
Linus Torvalds 6ae840e7cc Char/Misc driver patches for 3.19-rc1
Here's the big char/misc driver update for 3.19-rc1
 
 Lots of little things all over the place in different drivers, and a new
 subsystem, "coresight" has been added.  Full details are in the
 shortlog.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSODosACgkQMUfUDdst+ykSNwCfcqx1Z3rQzbLwSrR2sa1fV3Zb
 yEAAniJoLZ4ZkoQK4/1ozsFc31q+gXNm
 =/epr
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here's the big char/misc driver update for 3.19-rc1

  Lots of little things all over the place in different drivers, and a
  new subsystem, "coresight" has been added.  Full details are in the
  shortlog"

* tag 'char-misc-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (73 commits)
  parport: parport_pc, do not remove parent devices early
  spmi: Remove shutdown/suspend/resume kernel-doc
  carma-fpga-program: drop videobuf dependency
  carma-fpga: drop videobuf dependency
  carma-fpga-program.c: fix compile errors
  i8k: Fix temperature bug handling in i8k_get_temp()
  cxl: Name interrupts in /proc/interrupt
  CXL: Return error to PSL if IRQ demultiplexing fails & print clearer warning
  coresight-replicator: remove .owner field for driver
  coresight: fixed comments in coresight.h
  coresight: fix typo in comment in coresight-priv.h
  coresight: bindings for coresight drivers
  coresight: Adding ABI documentation
  w1: support auto-load of w1_bq27000 module.
  w1: avoid potential u16 overflow
  cn: verify msg->len before making callback
  mei: export fw status registers through sysfs
  mei: read and print all six FW status registers
  mei: txe: add cherrytrail device id
  mei: kill cached host and me csr values
  ...
2014-12-14 16:43:47 -08:00
Linus Torvalds e6b5be2be4 Driver core patches for 3.19-rc1
Here's the set of driver core patches for 3.19-rc1.
 
 They are dominated by the removal of the .owner field in platform
 drivers.  They touch a lot of files, but they are "simple" changes, just
 removing a line in a structure.
 
 Other than that, a few minor driver core and debugfs changes.  There are
 some ath9k patches coming in through this tree that have been acked by
 the wireless maintainers as they relied on the debugfs changes.
 
 Everything has been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
 53kAoLeteByQ3iVwWurwwseRPiWa8+MI
 =OVRS
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core update from Greg KH:
 "Here's the set of driver core patches for 3.19-rc1.

  They are dominated by the removal of the .owner field in platform
  drivers.  They touch a lot of files, but they are "simple" changes,
  just removing a line in a structure.

  Other than that, a few minor driver core and debugfs changes.  There
  are some ath9k patches coming in through this tree that have been
  acked by the wireless maintainers as they relied on the debugfs
  changes.

  Everything has been in linux-next for a while"

* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
  Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
  fs: debugfs: add forward declaration for struct device type
  firmware class: Deletion of an unnecessary check before the function call "vunmap"
  firmware loader: fix hung task warning dump
  devcoredump: provide a one-way disable function
  device: Add dev_<level>_once variants
  ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
  ath: use seq_file api for ath9k debugfs files
  debugfs: add helper function to create device related seq_file
  drivers/base: cacheinfo: remove noisy error boot message
  Revert "core: platform: add warning if driver has no owner"
  drivers: base: support cpu cache information interface to userspace via sysfs
  drivers: base: add cpu_device_create to support per-cpu devices
  topology: replace custom attribute macros with standard DEVICE_ATTR*
  cpumask: factor out show_cpumap into separate helper function
  driver core: Fix unbalanced device reference in drivers_probe
  driver core: fix race with userland in device_add()
  sysfs/kernfs: make read requests on pre-alloc files use the buffer.
  sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
  fs: sysfs: return EGBIG on write if offset is larger than file size
  ...
2014-12-14 16:10:09 -08:00
Linus Torvalds 37da7bbbe8 TTY/Serial driver patches for 3.19-rc1
Here's the big tty/serial driver update for 3.19-rc1.
 
 There are a number of TTY core changes/fixes in here from Peter Hurley
 that have all been teted in linux-next for a long time now.  There are
 also the normal serial driver updates as well, full details in the
 changelog below.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSOD/MACgkQMUfUDdst+ymW+wCfbSzoYMRObIImMPWfoQtxkvvN
 rpkAnAtyEP/zZIfkQIuKTSH6FJxocF8V
 =WZt3
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here's the big tty/serial driver update for 3.19-rc1.

  There are a number of TTY core changes/fixes in here from Peter Hurley
  that have all been teted in linux-next for a long time now.  There are
  also the normal serial driver updates as well, full details in the
  changelog below"

* tag 'tty-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (219 commits)
  serial: pxa: hold port.lock when reporting modem line changes
  tty-hvsi_lib: Deletion of an unnecessary check before the function call "tty_kref_put"
  tty: Deletion of unnecessary checks before two function calls
  n_tty: Fix read_buf race condition, increment read_head after pushing data
  serial: of-serial: add PM suspend/resume support
  Revert "serial: of-serial: add PM suspend/resume support"
  Revert "serial: of-serial: fix up PM ops on no_console_suspend and port type"
  serial: 8250: don't attempt a trylock if in sysrq
  serial: core: Add big-endian iotype
  serial: samsung: use port->fifosize instead of hardcoded values
  serial: samsung: prefer to use fifosize from driver data
  serial: samsung: fix style problems
  serial: samsung: wait for transfer completion before clock disable
  serial: icom: fix error return code
  serial: tegra: clean up tty-flag assignments
  serial: Fix io address assign flow with Fintek PCI-to-UART Product
  serial: mxs-auart: fix tx_empty against shift register
  serial: mxs-auart: fix gpio change detection on interrupt
  serial: mxs-auart: Fix mxs_auart_set_ldisc()
  serial: 8250_dw: Use 64-bit access for OCTEON.
  ...
2014-12-14 15:23:32 -08:00
Linus Torvalds e7cf773d43 USB patches for 3.19-rc1
Here's the big set of USB and PHY patches for 3.19-rc1.
 
 The normal churn in the USB gadget area is in here, as well as xhci and
 other individual USB driver updates.  The PHY tree is also in here, as
 there were dependancies on the USB tree.
 
 All of these have been in linux-next.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlSOEHcACgkQMUfUDdst+ykziQCgsm1D/af2nac6CTF2pov8VMIY
 ywgAnRi8LtZ2WassrwTNxY86Avaqryis
 =UVp8
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB updates from Greg KH:
 "Here's the big set of USB and PHY patches for 3.19-rc1.

  The normal churn in the USB gadget area is in here, as well as xhci
  and other individual USB driver updates.  The PHY tree is also in
  here, as there were dependancies on the USB tree.

  All of these have been in linux-next"

* tag 'usb-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (351 commits)
  arm: omap3: twl: remove usb phy init data
  usbip: fix error handling in stub_probe()
  usb: gadget: udc: missing curly braces
  USB: mos7720: delete some unneeded code
  wusb: replace memset by memzero_explicit
  usbip: remove unneeded structure
  usb: xhci: fix comment for PORT_DEV_REMOVE
  xhci: don't use the same variable for stopped and halted rings current TD
  xhci: clear extra bits from slot context when setting max exit latency
  xhci: cleanup finish_td function
  USB: adutux: NULL dereferences on disconnect
  usb: chipidea: fix platform_no_drv_owner.cocci warnings
  usb: chipidea: Fixed a few typos in comments
  Documentation: bindings: add doc for the USB2 ChipIdea USB driver
  usb: chipidea: add a usb2 driver for ci13xxx
  usb: chipidea: fix phy handling
  usb: chipidea: remove duplicate dev_set_drvdata for host_start
  usb: chipidea: parameter 'mode' isn't needed for hw_device_reset
  usb: chipidea: add controller reset API
  usb: chipidea: remove flag CI_HDRC_REQUIRE_TRANSCEIVER
  ...
2014-12-14 14:57:16 -08:00
Linus Torvalds 980f3c344f This is the bulk of GPIO changes for the v3.19 series:
- A new API that allows setting more than one GPIO at the
   time. This is implemented for the new descriptor-based
   API only and makes it possible to e.g. toggle a clock and
   data line at the same time, if the hardware can do this
   with a single register write. Both consumers and drivers
   need new calls, and the core will fall back to driving
   individual lines where needed. Implemented for the MPC8xxx
   driver initially.
 - Patched the mdio-mux-gpio and the serial mctrl driver
   that drives modems to use the new multiple-setting API
   to set several signals simultaneously.
 - Get rid of the global GPIO descriptor array, and instead
   allocate descriptors dynamically for each GPIO on a certain
   GPIO chip. This moves us closer to getting rid of the
   limitation of using the global, static GPIO numberspace.
 - New driver and device tree bindings for 74xx ICs.
 - New driver and device tree bindings for the VF610 Vybrid.
 - Support the RCAR r8a7793 and r8a7794.
 - Guidelines for GPIO device tree bindings trying to get
   things a bit more strict with the advent of combined
   device properties.
 - Suspend/resume support for the MVEBU driver.
 - A slew of minor fixes and improvements.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUjgQ7AAoJEEEQszewGV1zuJ8P+wamlDNhJbsgqXPcSCZZFgeP
 1O22VRYqoo/i8mAzNCRi2h6NogO9Da6rCRhHdH35TsuNzIbusHE+btMukj248qJ7
 WYOf25I0ImyUP8kulogW4/+7lYibRLHnN2BSLuAkApofmxDvODPS1KNWHulcOcxl
 VaVsA8wvFzQO1s1Wjv94ctVfs5rqk7mBfPwk61zHuLeETecmKg0e52p0Uzqlq6gi
 UKi9uK3sjQ7kI/+xa+qDrF9GRwRR22oJfD/9zNj8g94iU9iMs5Oh+Zp3RJcvYUSD
 y5BIb+IY2ATy20ZkijWmeP8LJz6pja+C9Ne7lKM0jkv7geGeHGAoavz0n3oUq4oz
 IvUNz6hCAP9PcxWc5a9FFqqORLWrRew6GmZmJvIkmC9K+3UQcWhkzO3vLpfl6Q9h
 S728XexkIlhxG9NcER21bFXV2dw3z/X9dm5mQ473TqJm+wQmRuYcPRg053NbqMcx
 juvkweCksx8qlpnjo/1QXQcVwFM8kuR7xAlVo7zdMDOU5F8pdxRnsTl0cUdx5cPv
 DKeMRg8+FYcHmIoe/EodemIh7cAZtEpijZNNAr9cDmAjifeBjWhCb+zri5SIc96x
 0jKVTXyY4jnHXBVoA0FIl1d2t54yVjh3PYiu0MjeLJ9tyB+Px/nOxW8FrdlFnPJ/
 oP5WK13c8h3bMkxUzsvL
 =ZAhA
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull take two of the GPIO updates:
 "Same stuff as last time, now with a fixup patch for the previous
  compile error plus I ran a few extra rounds of compile-testing.

  This is the bulk of GPIO changes for the v3.19 series:

   - A new API that allows setting more than one GPIO at the time.  This
     is implemented for the new descriptor-based API only and makes it
     possible to e.g. toggle a clock and data line at the same time, if
     the hardware can do this with a single register write.  Both
     consumers and drivers need new calls, and the core will fall back
     to driving individual lines where needed.  Implemented for the
     MPC8xxx driver initially

   - Patched the mdio-mux-gpio and the serial mctrl driver that drives
     modems to use the new multiple-setting API to set several signals
     simultaneously

   - Get rid of the global GPIO descriptor array, and instead allocate
     descriptors dynamically for each GPIO on a certain GPIO chip.  This
     moves us closer to getting rid of the limitation of using the
     global, static GPIO numberspace

   - New driver and device tree bindings for 74xx ICs

   - New driver and device tree bindings for the VF610 Vybrid

   - Support the RCAR r8a7793 and r8a7794

   - Guidelines for GPIO device tree bindings trying to get things a bit
     more strict with the advent of combined device properties

   - Suspend/resume support for the MVEBU driver

   - A slew of minor fixes and improvements"

* tag 'gpio-v3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (33 commits)
  gpio: mcp23s08: fix up compilation error
  gpio: pl061: document gpio-ranges property for bindings file
  gpio: pl061: hook request if gpio-ranges avaiable
  gpio: mcp23s08: Add option to configure IRQ output polarity as active high
  gpio: fix deferred probe detection for legacy API
  serial: mctrl_gpio: use gpiod_set_array function
  mdio-mux-gpio: Use GPIO descriptor interface and new gpiod_set_array function
  gpio: remove const modifier from gpiod_get_direction()
  gpio: remove gpio_descs global array
  gpio: mxs: implement get_direction callback
  gpio: em: Use dynamic allocation of GPIOs
  gpio: Check if base is positive before calling gpio_is_valid()
  gpio: mcp23s08: Add simple IRQ support for SPI devices
  gpio: mcp23s08: request a shared interrupt
  gpio: mcp23s08: Do not free unrequested interrupt
  gpio: rcar: Add r8a7793 and r8a7794 support
  gpio-mpc8xxx: add mpc8xxx_gpio_set_multiple function
  gpiolib: allow simultaneous setting of multiple GPIO outputs
  gpio: mvebu: add suspend/resume support
  gpio: gpio-davinci: remove duplicate check on resource
  ..
2014-12-14 14:05:05 -08:00
Linus Torvalds 7d22286ff7 Merge git://git.kvack.org/~bcrl/aio-next
Pull aio updates from Benjamin LaHaise.

* git://git.kvack.org/~bcrl/aio-next:
  aio: Skip timer for io_getevents if timeout=0
  aio: Make it possible to remap aio ring
2014-12-14 13:36:57 -08:00
Linus Torvalds 96895199c8 Merge branch 'i2c/for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
 "For 3.19, the I2C subsystem has to offer special candy this time.
  Right in time for Christmas :)

   - I2C slave framework: finally, a generic mechanism for Linux being
     an I2C slave (if the bus driver supports that).  Docs are still
     missing but will come later this cycle, the code is good enough to
     go.
   - I2C muxes represent their topology in sysfs much more detailed.
     This will help users to navigate around much easier.
   - irq population of i2c clients is now done at probe time, not device
     creation time, to have better support for deferred probing.
   - new drivers for Imagination SCB, Amlogic Meson
   - DMA support added for Freescale IMX, Renesas SHMobile
   - slightly bigger driver updates to OMAP, i801, AT91, and rk3x
     (mostly quirk handling, timing updates, and using better kernel
     interfaces)
   - eeprom driver can now write with byte-access (very slow, but OK to
     have)
   - and the bunch of smaller fixes, cleanups, ID updates..."

* 'i2c/for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (56 commits)
  i2c: sh_mobile: remove unneeded DMA mask
  i2c: rcar: add slave support
  i2c: slave-eeprom: add eeprom simulator driver
  i2c: core changes for slave support
  MAINTAINERS: add I2C dt bindings also to I2C realm
  i2c: designware: Fix falling time bindings doc
  i2c: davinci: switch to use platform_get_irq
  Documentation: i2c: Use PM ops instead of legacy suspend/resume
  i2c: sh_mobile: optimize irq entry
  i2c: pxa: add support for SCCB devices
  omap: i2c: don't check bus state IP rev3.3 and earlier
  i2c: s3c2410: Handle i2c sys_cfg register in i2c driver
  i2c: rk3x: add Kconfig dependency on COMMON_CLK
  i2c: omap: add notes related to i2c multimaster mode
  i2c: omap: don't reset controller if Arbitration Lost detected
  i2c: omap: implement workaround for handling invalid BB-bit values
  i2c: omap: cleanup register definitions
  i2c: rk3x: handle dynamic clock rate changes correctly
  i2c: at91: enable probe deferring on dma channel request
  i2c: at91: remove legacy DMA support
  ...
2014-12-14 12:54:40 -08:00
Michael S. Tsirkin d71de9ec6b virtio: core support for config generation
virtio 1.0 spec says:

Drivers MUST NOT assume reads from fields greater than 32 bits wide are
atomic, nor are reads from multiple fields: drivers SHOULD read device
configuration space fields like so:
	u32 before, after;
	do {
		before = get_config_generation(device);
		// read config entry/entries.
		after = get_config_generation(device);
	} while (after != before);

Do exactly this, for transports that support it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-14 18:21:31 +02:00
Michael S. Tsirkin 0dce3771fd virtio_pci: add VIRTIO_PCI_NO_LEGACY
Add macro to disable all legacy register defines.
Helpful to make sure legacy macros don't leak
through into modern code.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-14 15:10:31 +02:00
Pavel Emelyanov e4a0d3e720 aio: Make it possible to remap aio ring
There are actually two issues this patch addresses. Let me start with
the one I tried to solve in the beginning.

So, in the checkpoint-restore project (criu) we try to dump tasks'
state and restore one back exactly as it was. One of the tasks' state
bits is rings set up with io_setup() call. There's (almost) no problems
in dumping them, there's a problem restoring them -- if I dump a task
with aio ring originally mapped at address A, I want to restore one
back at exactly the same address A. Unfortunately, the io_setup() does
not allow for that -- it mmaps the ring at whatever place mm finds
appropriate (it calls do_mmap_pgoff() with zero address and without
the MAP_FIXED flag).

To make restore possible I'm going to mremap() the freshly created ring
into the address A (under which it was seen before dump). The problem is
that the ring's virtual address is passed back to the user-space as the
context ID and this ID is then used as search key by all the other io_foo()
calls. Reworking this ID to be just some integer doesn't seem to work, as
this value is already used by libaio as a pointer using which this library
accesses memory for aio meta-data.

So, to make restore work we need to make sure that

a) ring is mapped at desired virtual address
b) kioctx->user_id matches this value

Having said that, the patch makes mremap() on aio region update the
kioctx's user_id and mmap_base values.

Here appears the 2nd issue I mentioned in the beginning of this mail.
If (regardless of the C/R dances I do) someone creates an io context
with io_setup(), then mremap()-s the ring and then destroys the context,
the kill_ioctx() routine will call munmap() on wrong (old) address.
This will result in a) aio ring remaining in memory and b) some other
vma get unexpectedly unmapped.

What do you think?

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
2014-12-13 17:49:50 -05:00
Linus Torvalds 9ea18f8cab Merge branch 'for-3.19/drivers' of git://git.kernel.dk/linux-block
Pull block layer driver updates from Jens Axboe:

 - NVMe updates:
        - The blk-mq conversion from Matias (and others)

        - A stack of NVMe bug fixes from the nvme tree, mostly from Keith.

        - Various bug fixes from me, fixing issues in both the blk-mq
          conversion and generic bugs.

        - Abort and CPU online fix from Sam.

        - Hot add/remove fix from Indraneel.

 - A couple of drbd fixes from the drbd team (Andreas, Lars, Philipp)

 - With the generic IO stat accounting from 3.19/core, converting md,
   bcache, and rsxx to use those.  From Gu Zheng.

 - Boundary check for queue/irq mode for null_blk from Matias.  Fixes
   cases where invalid values could be given, causing the device to hang.

 - The xen blkfront pull request, with two bug fixes from Vitaly.

* 'for-3.19/drivers' of git://git.kernel.dk/linux-block: (56 commits)
  NVMe: fix race condition in nvme_submit_sync_cmd()
  NVMe: fix retry/error logic in nvme_queue_rq()
  NVMe: Fix FS mount issue (hot-remove followed by hot-add)
  NVMe: fix error return checking from blk_mq_alloc_request()
  NVMe: fix freeing of wrong request in abort path
  xen/blkfront: remove redundant flush_op
  xen/blkfront: improve protection against issuing unsupported REQ_FUA
  NVMe: Fix command setup on IO retry
  null_blk: boundary check queue_mode and irqmode
  block/rsxx: use generic io stats accounting functions to simplify io stat accounting
  md: use generic io stats accounting functions to simplify io stat accounting
  drbd: use generic io stats accounting functions to simplify io stat accounting
  md/bcache: use generic io stats accounting functions to simplify io stat accounting
  NVMe: Update module version major number
  NVMe: fail pci initialization if the device doesn't have any BARs
  NVMe: add ->exit_hctx() hook
  NVMe: make setup work for devices that don't do INTx
  NVMe: enable IO stats by default
  NVMe: nvme_submit_async_admin_req() must use atomic rq allocation
  NVMe: replace blk_put_request() with blk_mq_free_request()
  ...
2014-12-13 14:22:26 -08:00
Linus Torvalds caf292ae5b Merge branch 'for-3.19/core' of git://git.kernel.dk/linux-block
Pull block driver core update from Jens Axboe:
 "This is the pull request for the core block IO changes for 3.19.  Not
  a huge round this time, mostly lots of little good fixes:

   - Fix a bug in sysfs blktrace interface causing a NULL pointer
     dereference, when enabled/disabled through that API.  From Arianna
     Avanzini.

   - Various updates/fixes/improvements for blk-mq:

        - A set of updates from Bart, mostly fixing buts in the tag
          handling.

        - Cleanup/code consolidation from Christoph.

        - Extend queue_rq API to be able to handle batching issues of IO
          requests. NVMe will utilize this shortly. From me.

        - A few tag and request handling updates from me.

        - Cleanup of the preempt handling for running queues from Paolo.

        - Prevent running of unmapped hardware queues from Ming Lei.

        - Move the kdump memory limiting check to be in the correct
          location, from Shaohua.

        - Initialize all software queues at init time from Takashi. This
          prevents a kobject warning when CPUs are brought online that
          weren't online when a queue was registered.

   - Single writeback fix for I_DIRTY clearing from Tejun.  Queued with
     the core IO changes, since it's just a single fix.

   - Version X of the __bio_add_page() segment addition retry from
     Maurizio.  Hope the Xth time is the charm.

   - Documentation fixup for IO scheduler merging from Jan.

   - Introduce (and use) generic IO stat accounting helpers for non-rq
     drivers, from Gu Zheng.

   - Kill off artificial limiting of max sectors in a request from
     Christoph"

* 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits)
  bio: modify __bio_add_page() to accept pages that don't start a new segment
  blk-mq: Fix uninitialized kobject at CPU hotplugging
  blktrace: don't let the sysfs interface remove trace from running list
  blk-mq: Use all available hardware queues
  blk-mq: Micro-optimize bt_get()
  blk-mq: Fix a race between bt_clear_tag() and bt_get()
  blk-mq: Avoid that __bt_get_word() wraps multiple times
  blk-mq: Fix a use-after-free
  blk-mq: prevent unmapped hw queue from being scheduled
  blk-mq: re-check for available tags after running the hardware queue
  blk-mq: fix hang in bt_get()
  blk-mq: move the kdump check to blk_mq_alloc_tag_set
  blk-mq: cleanup tag free handling
  blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq <-> cpu map
  blk: introduce generic io stat accounting help function
  blk-mq: handle the single queue case in blk_mq_hctx_next_cpu
  genhd: check for int overflow in disk_expand_part_tbl()
  blk-mq: add blk_mq_free_hctx_request()
  blk-mq: export blk_mq_free_request()
  blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable
  ...
2014-12-13 14:14:23 -08:00
Linus Torvalds a99abce2d9 Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
 "Two small patches from the audit next branch; only one of which has
  any real significant code changes, the other is simply a MAINTAINERS
  update for audit.

  The single code patch is pretty small and rather straightforward, it
  changes the audit "version" number reported to userspace from an
  integer to a bitmap which is used to indicate the functionality of the
  running kernel.  This really doesn't have much impact on the kernel,
  but it will make life easier for the audit userspace folks.

  Thankfully we were still on a version number which allowed us to do
  this without breaking userspace"

* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
  audit: convert status version to a feature bitmap
  audit: add Paul Moore to the MAINTAINERS entry
2014-12-13 13:41:28 -08:00
Linus Torvalds e3aa91a7cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 - The crypto API is now documented :)
 - Disallow arbitrary module loading through crypto API.
 - Allow get request with empty driver name through crypto_user.
 - Allow speed testing of arbitrary hash functions.
 - Add caam support for ctr(aes), gcm(aes) and their derivatives.
 - nx now supports concurrent hashing properly.
 - Add sahara support for SHA1/256.
 - Add ARM64 version of CRC32.
 - Misc fixes.

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits)
  crypto: tcrypt - Allow speed testing of arbitrary hash functions
  crypto: af_alg - add user space interface for AEAD
  crypto: qat - fix problem with coalescing enable logic
  crypto: sahara - add support for SHA1/256
  crypto: sahara - replace tasklets with kthread
  crypto: sahara - add support for i.MX53
  crypto: sahara - fix spinlock initialization
  crypto: arm - replace memset by memzero_explicit
  crypto: powerpc - replace memset by memzero_explicit
  crypto: sha - replace memset by memzero_explicit
  crypto: sparc - replace memset by memzero_explicit
  crypto: algif_skcipher - initialize upon init request
  crypto: algif_skcipher - removed unneeded code
  crypto: algif_skcipher - Fixed blocking recvmsg
  crypto: drbg - use memzero_explicit() for clearing sensitive data
  crypto: drbg - use MODULE_ALIAS_CRYPTO
  crypto: include crypto- module prefix in template
  crypto: user - add MODULE_ALIAS
  crypto: sha-mb - remove a bogus NULL check
  crytpo: qat - Fix 64 bytes requests
  ...
2014-12-13 13:33:26 -08:00
Linus Torvalds 78a45c6f06 Merge branch 'akpm' (second patch-bomb from Andrew)
Merge second patchbomb from Andrew Morton:
 - the rest of MM
 - misc fs fixes
 - add execveat() syscall
 - new ratelimit feature for fault-injection
 - decompressor updates
 - ipc/ updates
 - fallocate feature creep
 - fsnotify cleanups
 - a few other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits)
  cgroups: Documentation: fix trivial typos and wrong paragraph numberings
  parisc: percpu: update comments referring to __get_cpu_var
  percpu: update local_ops.txt to reflect this_cpu operations
  percpu: remove __get_cpu_var and __raw_get_cpu_var macros
  fsnotify: remove destroy_list from fsnotify_mark
  fsnotify: unify inode and mount marks handling
  fallocate: create FAN_MODIFY and IN_MODIFY events
  mm/cma: make kmemleak ignore CMA regions
  slub: fix cpuset check in get_any_partial
  slab: fix cpuset check in fallback_alloc
  shmdt: use i_size_read() instead of ->i_size
  ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
  ipc/msg: increase MSGMNI, remove scaling
  ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
  ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
  lib/decompress.c: consistency of compress formats for kernel image
  decompress_bunzip2: off by one in get_next_block()
  usr/Kconfig: make initrd compression algorithm selection not expert
  fault-inject: add ratelimit option
  ratelimit: add initialization macro
  ...
2014-12-13 13:00:36 -08:00
Christoph Lameter 6c51ec4d18 percpu: remove __get_cpu_var and __raw_get_cpu_var macros
No user is left in the kernel source tree.  Therefore we can drop the
definitions.

This is the final merge of the transition away from __get_cpu_var.  After
this patch the kernel will not build if anyone uses __get_cpu_var.

Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:53 -08:00
Jan Kara 37d469e767 fsnotify: remove destroy_list from fsnotify_mark
destroy_list is used to track marks which still need waiting for srcu
period end before they can be freed.  However by the time mark is added to
destroy_list it isn't in group's list of marks anymore and thus we can
reuse fsnotify_mark->g_list for queueing into destroy_list.  This saves
two pointers for each fsnotify_mark.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:53 -08:00
Jan Kara 0809ab69a2 fsnotify: unify inode and mount marks handling
There's a lot of common code in inode and mount marks handling.  Factor it
out to a common helper function.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:53 -08:00
Manfred Spraul 0050ee059f ipc/msg: increase MSGMNI, remove scaling
SysV can be abused to allocate locked kernel memory.  For most systems, a
small limit doesn't make sense, see the discussion with regards to SHMMAX.

Therefore: increase MSGMNI to the maximum supported.

And: If we ignore the risk of locking too much memory, then an automatic
scaling of MSGMNI doesn't make sense.  Therefore the logic can be removed.

The code preserves auto_msgmni to avoid breaking any user space applications
that expect that the value exists.

Notes:
1) If an administrator must limit the memory allocations, then he can set
MSGMNI as necessary.

Or he can disable sysv entirely (as e.g. done by Android).

2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
to control latency vs. throughput:
If MSGMNB is large, then msgsnd() just returns and more messages can be queued
before a task switch to a task that calls msgrcv() is forced.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
Manfred Spraul e843e7d2c8 ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
a)

SysV can be abused to allocate locked kernel memory.  For most systems, a
small limit doesn't make sense, see the discussion with regards to SHMMAX.

Therefore: Increase the sysv sem limits so that all known applications
will work with these defaults.

b)

With regards to the maximum supported:
Some of the specified hard limits are not correct anymore, therefore the
patch updates the documentation.

- SEMMNI must stay below IPCMNI, which is 32768.
  As for SHMMAX: Stay a bit below this limit.

- SEMMSL was limited to 8k, to ensure that the kmalloc for the kernel array
  was limited to 16 kB (order=2)

  This doesn't apply anymore:
   - the allocation size isn't sizeof(short)*nsems anymore.
   - ipc_alloc falls back to vmalloc

- SEMOPM should stay below 1000, to limit the kmalloc in semtimedop() to an
  order=1 allocation.
  Therefore: Leave it at 500 (order=0 allocation).

Note:
If an administrator must limit the memory allocations, then he can set the
values as necessary.

Or he can disable sysv entirely (as e.g. done by Android).

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
Dmitry Monakhov 6adc4a22f2 fault-inject: add ratelimit option
Current debug levels are not optimal.  Especially if one want to provoke
big numbers of faults(broken device simulator) then any verbose level will
produce giant numbers of identical logging messages.  Let's add ratelimit
parameter for that purpose.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Acked-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
Dmitry Monakhov 89e3f90995 ratelimit: add initialization macro
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
David Drysdale 51f39a1f0c syscalls: implement execveat() system call
This patchset adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc filesystem,
at least for executables (rather than scripts).  The current glibc version
of fexecve(3) is implemented via /proc, which causes problems in sandboxed
or otherwise restricted environments.

Given the desire for a /proc-free fexecve() implementation, HPA suggested
(https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
an appropriate generalization.

Also, having a new syscall means that it can take a flags argument without
back-compatibility concerns.  The current implementation just defines the
AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
added in future -- for example, flags for new namespaces (as suggested at
https://lkml.org/lkml/2006/7/11/474).

Related history:
 - https://lkml.org/lkml/2006/12/27/123 is an example of someone
   realizing that fexecve() is likely to fail in a chroot environment.
 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
   documenting the /proc requirement of fexecve(3) in its manpage, to
   "prevent other people from wasting their time".
 - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
   problem where a process that did setuid() could not fexecve()
   because it no longer had access to /proc/self/fd; this has since
   been fixed.

This patch (of 4):

Add a new execveat(2) system call.  execveat() is to execve() as openat()
is to open(): it takes a file descriptor that refers to a directory, and
resolves the filename relative to that.

In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers.  This
replicates the functionality of fexecve(), which is a system call in other
UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
so relies on /proc being mounted).

The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
(for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
reflecting how the executable was found.  This does however mean that
execution of a script in a /proc-less environment won't work; also, script
execution via an O_CLOEXEC file descriptor fails (as the file will not be
accessible after exec).

Based on patches by Meredydd Luff.

Signed-off-by: David Drysdale <drysdale@google.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: Shuah Khan <shuah.kh@samsung.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rich Felker <dalias@aerifal.cx>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:51 -08:00
Vladimir Davydov 8135be5a80 memcg: fix possible use-after-free in memcg_kmem_get_cache()
Suppose task @t that belongs to a memory cgroup @memcg is going to
allocate an object from a kmem cache @c.  The copy of @c corresponding to
@memcg, @mc, is empty.  Then if kmem_cache_alloc races with the memory
cgroup destruction we can access the memory cgroup's copy of the cache
after it was destroyed:

CPU0				CPU1
----				----
[ current=@t
  @mc->memcg_params->nr_pages=0 ]

kmem_cache_alloc(@c):
  call memcg_kmem_get_cache(@c);
  proceed to allocation from @mc:
    alloc a page for @mc:
      ...

				move @t from @memcg
				destroy @memcg:
				  mem_cgroup_css_offline(@memcg):
				    memcg_unregister_all_caches(@memcg):
				      kmem_cache_destroy(@mc)

    add page to @mc

We could fix this issue by taking a reference to a per-memcg cache, but
that would require adding a per-cpu reference counter to per-memcg caches,
which would look cumbersome.

Instead, let's take a reference to a memory cgroup, which already has a
per-cpu reference counter, in the beginning of kmem_cache_alloc to be
dropped in the end, and move per memcg caches destruction from css offline
to css free.  As a side effect, per-memcg caches will be destroyed not one
by one, but all at once when the last page accounted to the memory cgroup
is freed.  This doesn't sound as a high price for code readability though.

Note, this patch does add some overhead to the kmem_cache_alloc hot path,
but it is pretty negligible - it's just a function call plus a per cpu
counter decrement, which is comparable to what we already have in
memcg_kmem_get_cache.  Besides, it's only relevant if there are memory
cgroups with kmem accounting enabled.  I don't think we can find a way to
handle this race w/o it, because alloc_page called from kmem_cache_alloc
may sleep so we can't flush all pending kmallocs w/o reference counting.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:49 -08:00
Oleg Nesterov d003f371b2 oom: don't assume that a coredumping thread will exit soon
oom_kill.c assumes that PF_EXITING task should exit and free the memory
soon.  This is wrong in many ways and one important case is the coredump.
A task can sleep in exit_mm() "forever" while the coredumping sub-thread
can need more memory.

Change the PF_EXITING checks to take SIGNAL_GROUP_COREDUMP into account,
we add the new trivial helper for that.

Note: this is only the first step, this patch doesn't try to solve other
problems.  The SIGNAL_GROUP_COREDUMP check is obviously racy, a task can
participate in coredump after it was already observed in PF_EXITING state,
so TIF_MEMDIE (which also blocks oom-killer) still can be wrongly set.
fatal_signal_pending() can be true because of SIGNAL_GROUP_COREDUMP so
out_of_memory() and mem_cgroup_out_of_memory() shouldn't blindly trust it.
 And even the name/usage of the new helper is confusing, an exiting thread
can only free its ->mm if it is the only/last task in thread group.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:49 -08:00
Johannes Weiner 6b4f7799c6 mm: vmscan: invoke slab shrinkers from shrink_zone()
The slab shrinkers are currently invoked from the zonelist walkers in
kswapd, direct reclaim, and zone reclaim, all of which roughly gauge the
eligible LRU pages and assemble a nodemask to pass to NUMA-aware
shrinkers, which then again have to walk over the nodemask.  This is
redundant code, extra runtime work, and fairly inaccurate when it comes to
the estimation of actually scannable LRU pages.  The code duplication will
only get worse when making the shrinkers cgroup-aware and requiring them
to have out-of-band cgroup hierarchy walks as well.

Instead, invoke the shrinkers from shrink_zone(), which is where all
reclaimers end up, to avoid this duplication.

Take the count for eligible LRU pages out of get_scan_count(), which
considers many more factors than just the availability of swap space, like
zone_reclaimable_pages() currently does.  Accumulate the number over all
visited lruvecs to get the per-zone value.

Some nodes have multiple zones due to memory addressing restrictions.  To
avoid putting too much pressure on the shrinkers, only invoke them once
for each such node, using the class zone of the allocation as the pivot
zone.

For now, this integrates the slab shrinking better into the reclaim logic
and gets rid of duplicative invocations from kswapd, direct reclaim, and
zone reclaim.  It also prepares for cgroup-awareness, allowing
memcg-capable shrinkers to be added at the lruvec level without much
duplication of both code and runtime work.

This changes kswapd behavior, which used to invoke the shrinkers for each
zone, but with scan ratios gathered from the entire node, resulting in
meaningless pressure quantities on multi-zone nodes.

Zone reclaim behavior also changes.  It used to shrink slabs until the
same amount of pages were shrunk as were reclaimed from the LRUs.  Now it
merely invokes the shrinkers once with the zone's scan ratio, which makes
the shrinkers go easier on caches that implement aging and would prefer
feeding back pressure from recently used slab objects to unused LRU pages.

[vdavydov@parallels.com: assure class zone is populated]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Davidlohr Bueso f5f302e212 mm,vmacache: count number of system-wide flushes
These flushes deal with sequence number overflows, such as for long lived
threads.  These are rare, but interesting from a debugging PoV.  As such,
display the number of flushes when vmacache debugging is enabled.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Joonsoo Kim 48c96a3685 mm/page_owner: keep track of page owners
This is the page owner tracking code which is introduced so far ago.  It
is resident on Andrew's tree, though, nobody tried to upstream so it
remain as is.  Our company uses this feature actively to debug memory leak
or to find a memory hogger so I decide to upstream this feature.

This functionality help us to know who allocates the page.  When
allocating a page, we store some information about allocation in extra
memory.  Later, if we need to know status of all pages, we can get and
analyze it from this stored information.

In previous version of this feature, extra memory is statically defined in
struct page, but, in this version, extra memory is allocated outside of
struct page.  It enables us to turn on/off this feature at boottime
without considerable memory waste.

Although we already have tracepoint for tracing page allocation/free,
using it to analyze page owner is rather complex.  We need to enlarge the
trace buffer for preventing overlapping until userspace program launched.
And, launched program continually dump out the trace buffer for later
analysis and it would change system behaviour with more possibility rather
than just keeping it in memory, so bad for debug.

Moreover, we can use page_owner feature further for various purposes.  For
example, we can use it for fragmentation statistics implemented in this
patch.  And, I also plan to implement some CMA failure debugging feature
using this interface.

I'd like to give the credit for all developers contributed this feature,
but, it's not easy because I don't know exact history.  Sorry about that.
Below is people who has "Signed-off-by" in the patches in Andrew's tree.

Contributor:
Alexander Nyberg <alexn@dsv.su.se>
Mel Gorman <mgorman@suse.de>
Dave Hansen <dave@linux.vnet.ibm.com>
Minchan Kim <minchan@kernel.org>
Michal Nazarewicz <mina86@mina86.com>
Andrew Morton <akpm@linux-foundation.org>
Jungsoo Son <jungsoo.son@lge.com>

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Joonsoo Kim 9a92a6ce6f stacktrace: introduce snprint_stack_trace for buffer output
Current stacktrace only have the function for console output.  page_owner
that will be introduced in following patch needs to print the output of
stacktrace into the buffer for our own output format so so new function,
snprint_stack_trace(), is needed.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Joonsoo Kim 031bc5743f mm/debug-pagealloc: make debug-pagealloc boottime configurable
Now, we have prepared to avoid using debug-pagealloc in boottime.  So
introduce new kernel-parameter to disable debug-pagealloc in boottime, and
makes related functions to be disabled in this case.

Only non-intuitive part is change of guard page functions.  Because guard
page is effective only if debug-pagealloc is enabled, turning off
according to debug-pagealloc is reasonable thing to do.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Joonsoo Kim e30825f186 mm/debug-pagealloc: prepare boottime configurable on/off
Until now, debug-pagealloc needs extra flags in struct page, so we need to
recompile whole source code when we decide to use it.  This is really
painful, because it takes some time to recompile and sometimes rebuild is
not possible due to third party module depending on struct page.  So, we
can't use this good feature in many cases.

Now, we have the page extension feature that allows us to insert extra
flags to outside of struct page.  This gets rid of third party module
issue mentioned above.  And, this allows us to determine if we need extra
memory for this page extension in boottime.  With these property, we can
avoid using debug-pagealloc in boottime with low computational overhead in
the kernel built with CONFIG_DEBUG_PAGEALLOC.  This will help our
development process greatly.

This patch is the preparation step to achive above goal.  debug-pagealloc
originally uses extra field of struct page, but, after this patch, it will
use field of struct page_ext.  Because memory for page_ext is allocated
later than initialization of page allocator in CONFIG_SPARSEMEM, we should
disable debug-pagealloc feature temporarily until initialization of
page_ext.  This patch implements this.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Joonsoo Kim eefa864b70 mm/page_ext: resurrect struct page extending code for debugging
When we debug something, we'd like to insert some information to every
page.  For this purpose, we sometimes modify struct page itself.  But,
this has drawbacks.  First, it requires re-compile.  This makes us
hesitate to use the powerful debug feature so development process is
slowed down.  And, second, sometimes it is impossible to rebuild the
kernel due to third party module dependency.  At third, system behaviour
would be largely different after re-compile, because it changes size of
struct page greatly and this structure is accessed by every part of
kernel.  Keeping this as it is would be better to reproduce errornous
situation.

This feature is intended to overcome above mentioned problems.  This
feature allocates memory for extended data per page in certain place
rather than the struct page itself.  This memory can be accessed by the
accessor functions provided by this code.  During the boot process, it
checks whether allocation of huge chunk of memory is needed or not.  If
not, it avoids allocating memory at all.  With this advantage, we can
include this feature into the kernel in default and can avoid rebuild and
solve related problems.

Until now, memcg uses this technique.  But, now, memcg decides to embed
their variable to struct page itself and it's code to extend struct page
has been removed.  I'd like to use this code to develop debug feature, so
this patch resurrect it.

To help these things to work well, this patch introduces two callbacks for
clients.  One is the need callback which is mandatory if user wants to
avoid useless memory allocation at boot-time.  The other is optional, init
callback, which is used to do proper initialization after memory is
allocated.  Detailed explanation about purpose of these functions is in
code comment.  Please refer it.

Others are completely same with previous extension code in memcg.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Jianyu Zhan 2d48366b3f mm, gfp: escalatedly define GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE
GFP_USER, GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE are escalatedly confined
defined, also implied by their names:

GFP_USER                                  = GFP_USER
GFP_USER + __GFP_HIGHMEM                  = GFP_HIGHUSER
GFP_USER + __GFP_HIGHMEM + __GFP_MOVABLE  = GFP_HIGHUSER_MOVABLE

So just make GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE escalatedly defined to
reflect this fact.  It also makes the definition clear and texturally warn
on any furture break-up of this escalated relastionship.

Signed-off-by: Jianyu Zhan <jianyu.zhan@emc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:48 -08:00
Andrew Morton 66f2ca7e3f include/linux/kmemleak.h: needs slab.h
include/linux/kmemleak.h: In function 'kmemleak_alloc_recursive':
include/linux/kmemleak.h:43: error: 'SLAB_NOLEAKTRACE' undeclared (first use in this function)

Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:47 -08:00
Zhang Zhen 056b7ccef4 mm/memcontrol.c: remove the unused arg in __memcg_kmem_get_cache()
The gfp was passed in but never used in this function.

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:47 -08:00
Tejun Heo bd6dace78b mm: move swp_entry_t definition to include/linux/mm_types.h
swp_entry_t being defined in include/linux/swap.h instead of
include/linux/mm_types.h causes cyclic include dependency later when
include/linux/page_cgroup.h is included from writeback path.  Move the
definition to include/linux/mm_types.h.

While at it, reformat the comment above it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:47 -08:00
Vladimir Davydov 6f185c290e memcg: turn memcg_kmem_skip_account into a bit field
It isn't supposed to stack, so turn it into a bit-field to save 4 bytes on
the task_struct.

Also, remove the memcg_stop/resume_kmem_account helpers - it is clearer to
set/clear the flag inline.  Regarding the overwhelming comment to the
helpers, which is removed by this patch too, we already have a compact yet
accurate explanation in memcg_schedule_cache_create, no need in yet
another one.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:47 -08:00
Michal Nazarewicz 5e19b013f5 lib: bitmap: add alignment offset for bitmap_find_next_zero_area()
Add a bitmap_find_next_zero_area_off() function which works like
bitmap_find_next_zero_area() function except it allows an offset to be
specified when alignment is checked.  This lets caller request a bit such
that its number plus the offset is aligned according to the mask.

[gregory.0xf0@gmail.com: Retrieved from https://patchwork.linuxtv.org/patch/6254/ and updated documentation]
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:46 -08:00
Davidlohr Bueso 3dec0ba0be mm/rmap: share the i_mmap_rwsem
Similarly to the anon memory counterpart, we can share the mapping's lock
ownership as the interval tree is not modified when doing doing the walk,
only the file page.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:45 -08:00
Davidlohr Bueso c8c06efa8b mm: convert i_mmap_mutex to rwsem
The i_mmap_mutex is a close cousin of the anon vma lock, both protecting
similar data, one for file backed pages and the other for anon memory.  To
this end, this lock can also be a rwsem.  In addition, there are some
important opportunities to share the lock when there are no tree
modifications.

This conversion is straightforward.  For now, all users take the write
lock.

[sfr@canb.auug.org.au: update fremap.c]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:45 -08:00
Davidlohr Bueso 8b28f621be mm,fs: introduce helpers around the i_mmap_mutex
This series is a continuation of the conversion of the i_mmap_mutex to
rwsem, following what we have for the anon memory counterpart.  With
Hugh's feedback from the first iteration.

Ultimately, the most obvious paths that require exclusive ownership of the
lock is when we modify the VMA interval tree, via
vma_interval_tree_insert() and vma_interval_tree_remove() families.  Cases
such as unmapping, where the ptes content is changed but the tree remains
untouched should make it safe to share the i_mmap_rwsem.

As such, the code of course is straightforward, however the devil is very
much in the details.  While its been tested on a number of workloads
without anything exploding, I would not be surprised if there are some
less documented/known assumptions about the lock that could suffer from
these changes.  Or maybe I'm just missing something, but either way I
believe its at the point where it could use more eyes and hopefully some
time in linux-next.

Because the lock type conversion is the heart of this patchset,
its worth noting a few comparisons between mutex vs rwsem (xadd):

  (i) Same size, no extra footprint.

  (ii) Both have CONFIG_XXX_SPIN_ON_OWNER capabilities for
       exclusive lock ownership.

  (iii) Both can be slightly unfair wrt exclusive ownership, with
        writer lock stealing properties, not necessarily respecting
        FIFO order for granting the lock when contended.

  (iv) Mutexes can be slightly faster than rwsems when
       the lock is non-contended.

  (v) Both suck at performance for debug (slowpaths), which
      shouldn't matter anyway.

Sharing the lock is obviously beneficial, and sem writer ownership is
close enough to mutexes.  The biggest winner of these changes is
migration.

As for concrete numbers, the following performance results are for a
4-socket 60-core IvyBridge-EX with 130Gb of RAM.

Both alltests and disk (xfs+ramdisk) workloads of aim7 suite do quite well
with this set, with a steady ~60% throughput (jpm) increase for alltests
and up to ~30% for disk for high amounts of concurrency.  Lower counts of
workload users (< 100) does not show much difference at all, so at least
no regressions.

                    3.18-rc1            3.18-rc1-i_mmap_rwsem
alltests-100     17918.72 (  0.00%)    28417.97 ( 58.59%)
alltests-200     16529.39 (  0.00%)    26807.92 ( 62.18%)
alltests-300     16591.17 (  0.00%)    26878.08 ( 62.00%)
alltests-400     16490.37 (  0.00%)    26664.63 ( 61.70%)
alltests-500     16593.17 (  0.00%)    26433.72 ( 59.30%)
alltests-600     16508.56 (  0.00%)    26409.20 ( 59.97%)
alltests-700     16508.19 (  0.00%)    26298.58 ( 59.31%)
alltests-800     16437.58 (  0.00%)    26433.02 ( 60.81%)
alltests-900     16418.35 (  0.00%)    26241.61 ( 59.83%)
alltests-1000    16369.00 (  0.00%)    26195.76 ( 60.03%)
alltests-1100    16330.11 (  0.00%)    26133.46 ( 60.03%)
alltests-1200    16341.30 (  0.00%)    26084.03 ( 59.62%)
alltests-1300    16304.75 (  0.00%)    26024.74 ( 59.61%)
alltests-1400    16231.08 (  0.00%)    25952.35 ( 59.89%)
alltests-1500    16168.06 (  0.00%)    25850.58 ( 59.89%)
alltests-1600    16142.56 (  0.00%)    25767.42 ( 59.62%)
alltests-1700    16118.91 (  0.00%)    25689.58 ( 59.38%)
alltests-1800    16068.06 (  0.00%)    25599.71 ( 59.32%)
alltests-1900    16046.94 (  0.00%)    25525.92 ( 59.07%)
alltests-2000    16007.26 (  0.00%)    25513.07 ( 59.38%)

disk-100          7582.14 (  0.00%)     7257.48 ( -4.28%)
disk-200          6962.44 (  0.00%)     7109.15 (  2.11%)
disk-300          6435.93 (  0.00%)     6904.75 (  7.28%)
disk-400          6370.84 (  0.00%)     6861.26 (  7.70%)
disk-500          6353.42 (  0.00%)     6846.71 (  7.76%)
disk-600          6368.82 (  0.00%)     6806.75 (  6.88%)
disk-700          6331.37 (  0.00%)     6796.01 (  7.34%)
disk-800          6324.22 (  0.00%)     6788.00 (  7.33%)
disk-900          6253.52 (  0.00%)     6750.43 (  7.95%)
disk-1000         6242.53 (  0.00%)     6855.11 (  9.81%)
disk-1100         6234.75 (  0.00%)     6858.47 ( 10.00%)
disk-1200         6312.76 (  0.00%)     6845.13 (  8.43%)
disk-1300         6309.95 (  0.00%)     6834.51 (  8.31%)
disk-1400         6171.76 (  0.00%)     6787.09 (  9.97%)
disk-1500         6139.81 (  0.00%)     6761.09 ( 10.12%)
disk-1600         4807.12 (  0.00%)     6725.33 ( 39.90%)
disk-1700         4669.50 (  0.00%)     5985.38 ( 28.18%)
disk-1800         4663.51 (  0.00%)     5972.99 ( 28.08%)
disk-1900         4674.31 (  0.00%)     5949.94 ( 27.29%)
disk-2000         4668.36 (  0.00%)     5834.93 ( 24.99%)

In addition, a 67.5% increase in successfully migrated NUMA pages, thus
improving node locality.

The patch layout is simple but designed for bisection (in case reversion
is needed if the changes break upstream) and easier review:

o Patches 1-4 convert the i_mmap lock from mutex to rwsem.
o Patches 5-10 share the lock in specific paths, each patch
  details the rationale behind why it should be safe.

This patchset has been tested with: postgres 9.4 (with brand new hugetlb
support), hugetlbfs test suite (all tests pass, in fact more tests pass
with these changes than with an upstream kernel), ltp, aim7 benchmarks,
memcached and iozone with the -B option for mmap'ing.  *Untested* paths
are nommu, memory-failure, uprobes and xip.

This patch (of 8):

Various parts of the kernel acquire and release this mutex, so add
i_mmap_lock_write() and immap_unlock_write() helper functions that will
encapsulate this logic.  The next patch will make use of these.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:45 -08:00
Christoffer Dall 1f57be2895 arm/arm64: KVM: Add (new) vgic_initialized macro
Some code paths will need to check to see if the internal state of the
vgic has been initialized (such as when creating new VCPUs), so
introduce such a macro that checks the nr_cpus field which is set when
the vgic has been initialized.

Also set nr_cpus = 0 in kvm_vgic_destroy, because the error path in
vgic_init() will call this function, and code should never errornously
assume the vgic to be properly initialized after an error.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-12-13 14:17:10 +01:00
Christoffer Dall c52edf5f8c arm/arm64: KVM: Rename vgic_initialized to vgic_ready
The vgic_initialized() macro currently returns the state of the
vgic->ready flag, which indicates if the vgic is ready to be used when
running a VM, not specifically if its internal state has been
initialized.

Rename the macro accordingly in preparation for a more nuanced
initialization flow.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-12-13 14:17:05 +01:00
Peter Maydell 6d3cfbe21b arm/arm64: KVM: vgic: move reset initialization into vgic_init_maps()
VGIC initialization currently happens in three phases:
 (1) kvm_vgic_create() (triggered by userspace GIC creation)
 (2) vgic_init_maps() (triggered by userspace GIC register read/write
     requests, or from kvm_vgic_init() if not already run)
 (3) kvm_vgic_init() (triggered by first VM run)

We were doing initialization of some state to correspond with the
state of a freshly-reset GIC in kvm_vgic_init(); this is too late,
since it will overwrite changes made by userspace using the
register access APIs before the VM is run. Move this initialization
earlier, into the vgic_init_maps() phase.

This fixes a bug where QEMU could successfully restore a saved
VM state snapshot into a VM that had already been run, but could
not restore it "from cold" using the -loadvm command line option
(the symptoms being that the restored VM would run but interrupts
were ignored).

Finally rename vgic_init_maps to vgic_init and renamed kvm_vgic_init to
kvm_vgic_map_resources.

  [ This patch is originally written by Peter Maydell, but I have
    modified it somewhat heavily, renaming various bits and moving code
    around.  If something is broken, I am to be blamed. - Christoffer ]

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2014-12-13 14:15:52 +01:00
Thomas Gleixner c291ee6221 genirq: Prevent proc race against freeing of irq descriptors
Since the rework of the sparse interrupt code to actually free the
unused interrupt descriptors there exists a race between the /proc
interfaces to the irq subsystem and the code which frees the interrupt
descriptor.

CPU0				CPU1
				show_interrupts()
				  desc = irq_to_desc(X);
free_desc(desc)
  remove_from_radix_tree();
  kfree(desc);
				  raw_spinlock_irq(&desc->lock);

/proc/interrupts is the only interface which can actively corrupt
kernel memory via the lock access. /proc/stat can only read from freed
memory. Extremly hard to trigger, but possible.

The interfaces in /proc/irq/N/ are not affected by this because the
removal of the proc file is serialized in procfs against concurrent
readers/writers. The removal happens before the descriptor is freed.

For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
as the descriptor is never freed. It's merely cleared out with the irq
descriptor lock held. So any concurrent proc access will either see
the old correct value or the cleared out ones.

Protect the lookup and access to the irq descriptor in
show_interrupts() with the sparse_irq_lock.

Provide kstat_irqs_usr() which is protecting the lookup and access
with sparse_irq_lock and switch /proc/stat to use it.

Document the existing kstat_irqs interfaces so it's clear that the
caller needs to take care about protection. The users of these
interfaces are either not affected due to SPARSE_IRQ=n or already
protected against removal.

Fixes: 1f5a5b87f7 "genirq: Implement a sane sparse_irq allocator"
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2014-12-13 13:33:07 +01:00
Zhang Rui 2707dbd09a Merge branches 'thermal-core-fix', 'thermal-soc' and 'thermal-int340x' into next 2014-12-13 12:25:19 +08:00
Steven Rostedt (Red Hat) aee4e5f3d3 tracing/sched: Check preempt_count() for current when reading task->state
When recording the state of a task for the sched_switch tracepoint a check of
task_preempt_count() is performed to see if PREEMPT_ACTIVE is set. This is
because, technically, a task being preempted is really in the TASK_RUNNING
state, and that is what should be recorded when tracing a sched_switch,
even if the task put itself into another state (it hasn't scheduled out
in that state yet).

But with the change to use per_cpu preempt counts, the
task_thread_info(p)->preempt_count is no longer used, and instead
task_preempt_count(p) is used.

The problem is that this does not use the current preempt count but a stale
one from a previous sched_switch. The task_preempt_count(p) uses
saved_preempt_count and not preempt_count(). But for tracing sched_switch,
if p is current, we really want preempt_count().

I hit this bug when I was tracing sleep and the call from do_nanosleep()
scheduled out in the "RUNNING" state.

           sleep-4290  [000] 537272.259992: sched_switch:         sleep:4290 [120] R ==> swapper/0:0 [120]
           sleep-4290  [000] 537272.260015: kernel_stack:         <stack trace>
=> __schedule (ffffffff8150864a)
=> schedule (ffffffff815089f8)
=> do_nanosleep (ffffffff8150b76c)
=> hrtimer_nanosleep (ffffffff8108d66b)
=> SyS_nanosleep (ffffffff8108d750)
=> return_to_handler (ffffffff8150e8e5)
=> tracesys_phase2 (ffffffff8150c844)

After a bit of hair pulling, I found that the state was really
TASK_INTERRUPTIBLE, but the saved_preempt_count had an old PREEMPT_ACTIVE
set and caused the sched_switch tracepoint to show it as RUNNING.

Link: http://lkml.kernel.org/r/20141210174428.3cb7542a@gandalf.local.home

Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org # 3.13+
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 0102874755 "sched: Create more preempt_count accessors"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-12-12 23:16:38 -05:00
Linus Torvalds f96fe22567 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull another networking update from David Miller:
 "Small follow-up to the main merge pull from the other day:

  1) Alexander Duyck's DMA memory barrier patch set.

  2) cxgb4 driver fixes from Karen Xie.

  3) Add missing export of fixed_phy_register() to modules, from Mark
     Salter.

  4) DSA bug fixes from Florian Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
  net/macb: add TX multiqueue support for gem
  linux/interrupt.h: remove the definition of unused tasklet_hi_enable
  jme: replace calls to redundant function
  net: ethernet: davicom: Allow to select DM9000 for nios2
  net: ethernet: smsc: Allow to select SMC91X for nios2
  cxgb4: Add support for QSA modules
  libcxgbi: fix freeing skb prematurely
  cxgb4i: use set_wr_txq() to set tx queues
  cxgb4i: handle non-pdu-aligned rx data
  cxgb4i: additional types of negative advice
  cxgb4/cxgb4i: set the max. pdu length in firmware
  cxgb4i: fix credit check for tx_data_wr
  cxgb4i: fix tx immediate data credit check
  net: phy: export fixed_phy_register()
  fib_trie: Fix trie balancing issue if new node pushes down existing node
  vlan: Add ability to always enable TSO/UFO
  r8169:update rtl8168g pcie ephy parameter
  net: dsa: bcm_sf2: force link for all fixed PHY devices
  fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
  r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
  ...
2014-12-12 16:11:12 -08:00
Ulf Hansson 40bd62c619 PM: Remove the SET_PM_RUNTIME_PM_OPS() macro
There're now no users left of the SET_PM_RUNTIME_PM_OPS() macro, since
all have converted to use the SET_RUNTIME_PM_OPS() macro instead, so
let's remove it.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-13 00:45:24 +01:00
Linus Torvalds 26ceb127f7 Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
 "The major updates included in this update are:

   - Clang compatible stack pointer accesses by Behan Webster.
   - SA11x0 updates from Dmitry Eremin-Solenikov.
   - kgdb handling of breakpoints with read-only text/modules
   - Support for Privileged-no-execute feature on ARMv7 to prevent
     userspace code execution by the kernel.
   - AMBA primecell bus handling of irq-safe runtime PM
   - Unwinding support for memset/memzero/memmove/memcpy functions
   - VFP fixes for Krait CPUs and improvements in detecting the VFP
     architecture
   - A number of code cleanups (using pr_*, removing or reducing the
     severity of a couple of kernel messages, splitting ftrace asm code
     out to a separate file, etc.)
   - Add machine name to stack dump output"

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (62 commits)
  ARM: 8247/2: pcmcia: sa1100: make use of device clock
  ARM: 8246/2: pcmcia: sa1111: provide device clock
  ARM: 8245/1: pcmcia: soc-common: enable/disable socket clocks
  ARM: 8244/1: fbdev: sa1100fb: make use of device clock
  ARM: 8243/1: sa1100: add a clock alias for sa1111 pcmcia device
  ARM: 8242/1: sa1100: add cpu clock
  ARM: 8221/1: PJ4: allow building in Thumb-2 mode
  ARM: 8234/1: sa1100: reorder IRQ handling code
  ARM: 8233/1: sa1100: switch to hwirq usage
  ARM: 8232/1: sa1100: merge GPIO multiplexer IRQ to "normal" irq domain
  ARM: 8231/1: sa1100: introduce irqdomains support
  ARM: 8230/1: sa1100: shift IRQs by one
  ARM: 8229/1: sa1100: replace irq numbers with names in irq driver
  ARM: 8228/1: sa1100: drop entry-macro.S
  ARM: 8227/1: sa1100: switch to MULTI_IRQ_HANDLER
  ARM: 8241/1: Update processor_modes for hyp and monitor mode
  ARM: 8240/1: MCPM: document mcpm_sync_init()
  ARM: 8239/1: Introduce {set,clear}_pte_bit
  ARM: 8238/1: mm: Refine set_memory_* functions
  ARM: 8237/1: fix flush_pfn_alias
  ...
2014-12-12 15:26:48 -08:00
Linus Torvalds 8d14066755 IOMMU Updates for Linux v3.19
This time with:
 
 	* A new IOMMU-API call: iommu_map_sg() to map multiple
 	  non-contiguous pages into an IO address space with only one
 	  API call. This allows certain optimizations in the IOMMU
 	  driver.
 
 	* DMAR device hotplug in the Intel VT-d driver. It is now
 	  possible to hotplug the IOMMU itself.
 
 	* A new IOMMU driver for the Rockchip ARM platform.
 
 	* Couple of cleanups and improvements in the OMAP IOMMU driver.
 
 	* Nesting support for the ARM-SMMU driver.
 
 	* Various other small cleanups and improvements.
 
 Please note that this time some branches were also pulled into other
 trees, like the DRI and the Tegra tree. The VT-d branch was also pulled
 into tip/x86/apic.
 Some patches for the AMD IOMMUv2 driver are not in the IOMMU tree but
 were merged by Andrew (or finally ended up in the DRI tree).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJUiwktAAoJECvwRC2XARrjweAP/Rr4igltfUFjfSp7iQBoQxQP
 qmFv3/A6f0gxfJ2IhA8ZnmJwoNAxYnqCEl+vA0FYDnXxO6CCHXWkeO5NprwU+fuG
 BZDn4lmg+GhpTCXS5668l+MZxtqZCMaCQbK5pm+b+uk8qkXznlVs2Nb00BBL/TTo
 4WhyLPbNnAfQkBGTFf47QqSi5YYPmJ44TvIcHPsFBz0dTesO7JKg9c4HpyxUmnMs
 7VDPWJmiuTJ+/UISkFzxHVw9GcL6XXhdu70XEFrVo6wnXRDTqbDtBUP8vnghnqwG
 kOIYrY+7HO3iEDaiiSGqxeMH5Uac2yIlQi4W4TXWg7WpKenZPqnwZusD50vO3HyB
 9L4VIL14gbgV6s+/9YNDYR494d9+xRjGMO4FAIIWnVFmE98+zqHpc8OaF0azY4N7
 3vxlM6VGvWytTmTpU/J/VMhwFPo/6QHdGpBa0k0+ACMQ8LTPt+Q7o+fTBcDCgnJW
 SAa5AGruMBFgZ2RtN9bgAGxAhuSPmh7pD+va5vUzvAngY0o6tamblh9G6Sk26nka
 y0RZVJ+vplP3jvLXMy9SByxGgBv3/iNF8YUd7Zp0rpZyW8KElRm4m9xuvxzu9cIW
 3SR+/pEhvqqXo48XS040FScD65QjGbKWGuiH+Zs+6Ij4MY6xxCAlf7EV+8DIiTO5
 LcI66Fk6oFzFKz2BRBpn
 =x7Ei
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:
 "This time with:

   - A new IOMMU-API call: iommu_map_sg() to map multiple non-contiguous
     pages into an IO address space with only one API call.  This allows
     certain optimizations in the IOMMU driver.

   - DMAR device hotplug in the Intel VT-d driver.  It is now possible
     to hotplug the IOMMU itself.

   - A new IOMMU driver for the Rockchip ARM platform.

   - Couple of cleanups and improvements in the OMAP IOMMU driver.

   - Nesting support for the ARM-SMMU driver.

   - Various other small cleanups and improvements.

  Please note that this time some branches were also pulled into other
  trees, like the DRI and the Tegra tree.  The VT-d branch was also
  pulled into tip/x86/apic.

  Some patches for the AMD IOMMUv2 driver are not in the IOMMU tree but
  were merged by Andrew (or finally ended up in the DRI tree)"

* tag 'iommu-updates-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (42 commits)
  iommu: Decouple iommu_map_sg from CPU page size
  iommu/vt-d: Fix an off-by-one bug in __domain_mapping()
  pci, ACPI, iommu: Enhance pci_root to support DMAR device hotplug
  iommu/vt-d: Enhance intel-iommu driver to support DMAR unit hotplug
  iommu/vt-d: Enhance error recovery in function intel_enable_irq_remapping()
  iommu/vt-d: Enhance intel_irq_remapping driver to support DMAR unit hotplug
  iommu/vt-d: Search for ACPI _DSM method for DMAR hotplug
  iommu/vt-d: Implement DMAR unit hotplug framework
  iommu/vt-d: Dynamically allocate and free seq_id for DMAR units
  iommu/vt-d: Introduce helper function dmar_walk_resources()
  iommu/arm-smmu: add support for DOMAIN_ATTR_NESTING attribute
  iommu/arm-smmu: Play nice on non-ARM/SMMU systems
  iommu/amd: remove compiler warning due to IOMMU_CAP_NOEXEC
  iommu/arm-smmu: add IOMMU_CAP_NOEXEC to the ARM SMMU driver
  iommu: add capability IOMMU_CAP_NOEXEC
  iommu/arm-smmu: change IOMMU_EXEC to IOMMU_NOEXEC
  iommu/amd: Fix accounting of device_state
  x86/vt-d: Fix incorrect bit operations in setting values
  iommu/rockchip: Allow to compile with COMPILE_TEST
  iommu/ipmmu-vmsa: Return proper error if devm_request_irq fails
  ...
2014-12-12 15:10:34 -08:00
Linus Torvalds 87c779baab Merge branch 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
 "Main features this time are:

   - BAM v1.3.0 support form qcom bam dma
   - support for Allwinner sun8i dma
   - atmels eXtended DMA Controller driver
   - chancnt cleanup by Maxime
   - fixes spread over drivers"

* 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma: (56 commits)
  dmaenegine: Delete a check before free_percpu()
  dmaengine: ioatdma: fix dma mapping errors
  dma: cppi41: add a delay while setting the TD bit
  dma: cppi41: wait longer for the HW to return the descriptor
  dmaengine: fsl-edma: fixup reg offset and hw S/G support in big-endian model
  dmaengine: fsl-edma: fix calculation of remaining bytes
  drivers/dma/pch_dma: declare pch_dma_id_table as static
  dmaengine: ste_dma40: fix error return code
  dma: imx-sdma: clarify about firmware not found error
  Documentation: devicetree: Fix Xilinx VDMA specification
  dmaengine: pl330: update author info
  dmaengine: clarify the issue_pending expectations
  dmaengine: at_xdmac: Add DMA_PRIVATE
  ARM: dts: at_xdmac: fix bad value of dma-cells in documentation
  dmaengine: at_xdmac: fix missing spin_unlock
  dmaengine: at_xdmac: fix a bug in transfer residue computation
  dmaengine: at_xdmac: fix software lockup at_xdmac_tx_status()
  dmaengine: at_xdmac: remove chancnt affectation
  dmaengine: at_xdmac: prefer usage of readl/writel_relaxed
  dmaengine: xdmac: fix print warning on dma_addr_t variable
  ...
2014-12-12 14:59:53 -08:00
Linus Torvalds eea0cf3fcd IPMI Driver updates for 3.19
For the following changes:
   - Quite a few bug fixes
   - A new driver for the powernv
   - A new driver for the SMBus interface from the IPMI 2.0 specification
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iEYEABECAAYFAlSKGRsACgkQIXnXXONXEReJngCeJsMEdY9pBJ9GEppkiSv0HG74
 VR8AoJb3PQ2SfqmAbT0RgACWEkSuWCdj
 =9NNo
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.code.sf.net/p/openipmi/linux-ipmi

Pull IPMI driver updates from Corey Minyard:
  - Quite a few bug fixes
  - A new driver for the powernv
  - A new driver for the SMBus interface from the IPMI 2.0 specification

* tag 'for-linus' of git://git.code.sf.net/p/openipmi/linux-ipmi:
  ipmi: Check the BT interrupt enable periodically
  ipmi: Fix attention handling for system interfaces
  ipmi: Periodically check to see if irqs and messages are set right
  drivers/char/ipmi: Add powernv IPMI driver
  ipmi: Add SMBus interface driver (SSIF)
  ipmi: Remove the now unused priority from SMI sender
  ipmi: Remove the now unnecessary message queue
  ipmi: Make the message handler easier to use for SMI interfaces
  ipmi: Move message sending into its own function
  ipmi: rename waiting_msgs to waiting_rcv_msgs
  ipmi: Fix handling of BMC flags
  ipmi: Initialize BMC device attributes
  ipmi: Unregister previously registered driver in error case
  ipmi: Use the proper type for acpi_handle
  ipmi: Fix a bug in hot add/remove
  ipmi: Remove useless sysfs_name parameters
  ipmi: clean up the device handling for the bmc device
  ipmi: Move the address source to string to ipmi-generic code
  ipmi: Ignore SSIF in the PNP handling
2014-12-12 14:49:56 -08:00
Linus Torvalds 823e334ecd Docs changes for the 3.19 merge window
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUiaPRAAoJEI3ONVYwIuV68qQP/2j93CDahLEXlpbimdHCOyKO
 xLKYwSIOxpzCTv8mQouNH4czVDheSSrq+xDUafioaMRlM80fb1EdhC9Nhr/xT/O+
 fCsgfLETlpxAS4j6gjn60H8QYdnBgcz0p2hOgvysf8WHZ0PHzSnfE78BGS1Mnugs
 gjF97y5pa3p+uTNNxHA7aB+Rg3/JbsuDXH0CtdzjpsS0bM/MBnGV0AQLLZ3qdzvf
 6bSwb8LyV54cRjYrs0SZT0OWrPXlUc0AxjrQjq2IcyL+s5Uwu/Eb1RoZ69C14ohC
 h5A1C87v4asqrzsC0VCqG48wi36+2k7zfVT7wSkHfERXWGtjpd7Uy3lccMMPLg/H
 Mnt6EKDVn4L2b8VOM1wl8pj0kIbln/g2whJ6x40Q3R4uO8o94Vy0HpeTTeNxdwyY
 iERVkHPuRAZ2SLrz4BheiGRgcePgocMrofyGcJ3/ezMGcq3sUZnAJYmJYF+S5jkZ
 Q94ee8jgypgEvuQHX52n3AiqJspNpqAGqXE5ZY4PgLt4KlwbT1yQ5hXUX34ou+3u
 sLkFO8vNUQa4o1TTfwCzkPun/Aoy+Rb8BCzgbml/FlPgidXpNcofO/g5BWRjXwOX
 mpUCNPiMhryx0aTrEgl1GChholqToakS6RTvqdxUKC3honKWBQ93zsWSDkEzS+3h
 +qxzXDCN3u3Py9r/XvPK
 =Pq+D
 -----END PGP SIGNATURE-----

Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6

Pull documentation update from Jonathan Corbet:
 "Here's my set of accumulated documentation changes for 3.19.

  It includes a couple of additions to the coding style document, some
  fixes for minor build problems within the documentation tree, the
  relocation of the kselftest docs, and various tweaks and additions.

  A couple of changes reach outside of Documentation/; they only make
  trivial comment changes and I did my best to get the required acks.

  Complete with a shiny signed tag this time around"

* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6:
  kobject: grammar fix
  Input: xpad - update docs to reflect current state
  Documentation: Build mic/mpssd only for x86_64
  cgroups: Documentation: fix wrong cgroupfs paths
  Documentation/email-clients.txt: add info about Claws Mail
  CodingStyle: add some more error handling guidelines
  kselftest: Move the docs to the Documentation dir
  Documentation: fix formatting to make 's' happy
  Documentation: power: Fix typo in Documentation/power
  Documentation: vm: Add 1GB large page support information
  ipv4: add kernel parameter tcpmhash_entries
  Documentation: Fix a typo in mailbox.txt
  treewide: Fix typo in Documentation/DocBook/device-drivers
  CodingStyle: Add a chapter on conditional compilation
2014-12-12 14:42:48 -08:00
Rafael J. Wysocki 175f8e2650 ACPI / PM: Do not disable wakeup GPEs that have not been enabled
In some cases acpi_device_wakeup() may be called to ensure wakeup
power to be off for a given device even though that device's wakeup
GPE has not been enabled so far.  It calls acpi_disable_gpe() on a
GPE that's not enabled and this causes ACPICA to return the AE_LIMIT
status code from that call which then is reported as an error by the
ACPICA's debug facilities (if enabled).  This may lead to a fair
amount of confusion, so introduce a new ACPI device wakeup flag
to store the wakeup GPE status and avoid disabling wakeup GPEs
that have not been enabled.

Reported-and-tested-by: Venkat Raghavulu <venkat.raghavulu@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-12 22:51:58 +01:00
Quentin Lambert 7142637dd9 linux/interrupt.h: remove the definition of unused tasklet_hi_enable
Signed-off-by: Quentin Lambert <lambert.quentin@gmail.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-12 15:15:41 -05:00
Linus Torvalds 6ce4436c9c Couple of pstore-ram enhancements to allow use of different memory attributes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUi0B6AAoJEKurIx+X31iByH8P/jfMgzyUO+KpJMA1DbgCAG7x
 WPJgbMUyPwB63DH09RyMEmiwf61Rl1klXTPVNY0Dnj7qRJOmpB9U3vGIfO4HpD84
 5IZMBlc+Jl+kJCxSAJYbTJTZLsIMjFGOfuVTvlY+HnMBitQVBumKptmC0DoBBqgz
 yYy5MHRMaVoHcogyMyBiknmxdxu6/ruUKY+6yyvdUESt0SCcJG8V6Qik7TMmnx47
 NvIIPzfibvvLLnd8IOEj2fwh8XMtJdfcCxPpAEvEaNq0jZEDF9K22jttTQvl9r92
 NQf7JKQQrNfzloRZ3flKax5ZMGi9RkcirTLLdJ4I2xMGVHOA4XUAjsSCYR6INuuJ
 Ox00FnuiIrADNw37m52Y+ujPTF1C2PQUNK69gwsLd84MSjy+95F2dlC5cC3Yt4N5
 rpstXxWELZTqjMGD8GTPOpv6zlg799IbFexr4H6KTc+47EX0MNayJiI6L597gYnq
 gIiPmDnnz6WlWp4HHgBIwjNAH3Tbf/uU3MlgzqS3Ftd7YkYmLnxvClhrwgErviFn
 Nfnz2LtGuMxMHSt0uSWxODVEaR4reKRVJBvhRSGWL1PufylEyt0YWayiqpohuKD9
 6X/RufWK5qdCBHytoGyMUZ57oqxth9QSVG4RBkGPmaZgMq/5DdyOhBfW0yInjMuo
 AuDMmqrU5yFTitLMGcsG
 =kcmD
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-morepstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull pstore update #2 from Tony Luck:
 "Couple of pstore-ram enhancements to allow use of different memory
  attributes"

* tag 'please-pull-morepstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore-ram: Allow optional mapping with pgprot_noncached
  pstore-ram: Fix hangs by using write-combine mappings
2014-12-12 11:34:13 -08:00
Linus Torvalds bdeb03cada Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
 "From a feature point of view, most of the code here comes from Miao
  Xie and others at Fujitsu to implement scrubbing and replacing devices
  on raid56.  This has been in development for a while, and it's a big
  improvement.

  Filipe and Josef have a great assortment of fixes, many of which solve
  problems corruptions either after a crash or in error conditions.  I
  still have a round two from Filipe for next week that solves
  corruptions with discard and block group removal"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits)
  Btrfs: make get_caching_control unconditionally return the ctl
  Btrfs: fix unprotected deletion from pending_chunks list
  Btrfs: fix fs mapping extent map leak
  Btrfs: fix memory leak after block remove + trimming
  Btrfs: make btrfs_abort_transaction consider existence of new block groups
  Btrfs: fix race between writing free space cache and trimming
  Btrfs: fix race between fs trimming and block group remove/allocation
  Btrfs, replace: enable dev-replace for raid56
  Btrfs: fix freeing used extents after removing empty block group
  Btrfs: fix crash caused by block group removal
  Btrfs: fix invalid block group rbtree access after bg is removed
  Btrfs, raid56: fix use-after-free problem in the final device replace procedure on raid56
  Btrfs, replace: write raid56 parity into the replace target device
  Btrfs, replace: write dirty pages into the replace target device
  Btrfs, raid56: support parity scrub on raid56
  Btrfs, raid56: use a variant to record the operation type
  Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted
  Btrfs, raid56: don't change bbio and raid_map
  Btrfs: remove unnecessary code of stripe_index assignment in __btrfs_map_block
  Btrfs: remove noused bbio_ret in __btrfs_map_block in condition
  ...
2014-12-12 11:15:23 -08:00
Linus Torvalds 0349678ccd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
 - i2c-hid race condition fix from Jean-Baptiste Maneyrol
 - Logitech driver now supports vendor-specific HID++ protocol, allowing
   us to deliver a full multitouch support on wider range of Logitech
   touchpads.  Written by Benjamin Tissoires
 - MS Surface Pro 3 Type Cover support added by Alan Wu
 - RMI touchpad support improvements from Andrew Duggan
 - a lot of updates to Wacom driver from Jason Gerecke and Ping Cheng
 - various small fixes all over the place

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (56 commits)
  HID: rmi: The address of query8 must be calculated based on which query registers are present
  HID: rmi: Check for additional ACM registers appended to F11 data report
  HID: i2c-hid: prevent buffer overflow in early IRQ
  HID: logitech-hidpp: disable io in probe error path
  HID: logitech-hidpp: add boundary check for name retrieval
  HID: logitech-hidpp: check name retrieval return code
  HID: logitech-hidpp: do not return the name length
  HID: wacom: Report input events for each finger on generic devices
  HID: wacom: Initialize MT slots for generic devices at post_parse_hid
  HID: wacom: Update maximum X/Y accounding to outbound offset
  HID: wacom: Add support for DTU-1031X
  HID: wacom: add defines for new Cintiq and DTU outbound tracking
  HID: wacom: fix freeze on open when autosuspend is on
  HID: wacom: re-add accidentally dropped Lenovo PID
  HID: make hid_report_len as a static inline function in hid.h
  HID: wacom: Consult the application usage when determining field type
  HID: wacom: PAD is independent with pen/touch
  HID: multitouch: Add quirk for VTL touch panels
  HID: i2c-hid: fix race condition reading reports
  HID: wacom: Add angular resolution data to some ABS axes
  ...
2014-12-12 10:26:47 -08:00
Linus Torvalds a7cb7bb664 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree update from Jiri Kosina:
 "Usual stuff: documentation updates, printk() fixes, etc"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
  intel_ips: fix a type in error message
  cpufreq: cpufreq-dt: Move newline to end of error message
  ps3rom: fix error return code
  treewide: fix typo in printk and Kconfig
  ARM: dts: bcm63138: change "interupts" to "interrupts"
  Replace mentions of "list_struct" to "list_head"
  kernel: trace: fix printk message
  scsi: mpt2sas: fix ioctl in comment
  zbud, zswap: change module author email
  clocksource: Fix 'clcoksource' typo in comment
  arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help
  gpio: msm-v1: make boolean argument more obvious
  usb: Fix typo in usb-serial-simple.c
  PCI: Fix comment typo 'COMFIG_PM_OPS'
  powerpc: Fix comment typo 'CONIFG_8xx'
  powerpc: Fix comment typos 'CONFiG_ALTIVEC'
  clk: st: Spelling s/stucture/structure/
  isci: Spelling s/stucture/structure/
  usb: gadget: zero: Spelling s/infrastucture/infrastructure/
  treewide: Fix company name in module descriptions
  ...
2014-12-12 10:08:06 -08:00
Linus Torvalds 9bfccec24e Lots of bugs fixes, including Zheng and Jan's extent status shrinker
fixes, which should improve CPU utilization and potential soft lockups
 under heavy memory pressure, and Eric Whitney's bigalloc fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUiRUwAAoJENNvdpvBGATwltQP/3sjHtFw+RUvKgQ8vX9M2THk
 4b9j0ja0mrD3ObTXUxdDuOh1q09MsfSUiOYK6KZOav3nO/dRODqZnWgXz/zJt3LC
 R97s4velgzZi3F2ijnLiCo5RVZahN9xs8bUHZ85orMIr5wogwGdaUpnoqZSg0Ehr
 PIFnTNORyNXBwEm3XPjUmENTdyq9FZ8DsS6ACFzgFi79QTSyJFEM4LAl2XaqwMGV
 fVhNwnOGIyT8lHZAtDcobkaC86NjakmpW2Ip3p9/UEQtynh16UeVXKEO3K7CcQ+L
 YJRDNnSIlGpR1OJp+v6QJPUd8q4fc/8JW9AxxsLak0eqkszuB+MxoQXOCFV5AWaf
 jrs4TV3y0hCuB4OwuYUpnfcU1o+O7p39MqXMv8SA1ZBPbijN/LQSMErFtXj2oih6
 3gJHUWLwELGeR+d9JlI29zxhOeOIotX255UBgj2oasQ0X3BW3qAgQ4LmP3QY90Pm
 BUmxiMoIWB9N3kU4XQGf+Kyy8JeMLJj0frHDxI3XLz+B+IlWCCkBH6y3AD/a13kS
 HHMMLOwHGEs0lYEKsm89dkcij5GuKd8eKT8Q0+CvKD9Z6HPdYvQxoazmF87Q6j/7
 ZmshaVxtWaLpNbDaXVg+IgZifJAN0+mVzVHRhY9TSjx8k9qLdSgSEqYWjkSjx9Ij
 nNB2zVrHZDMvZ7MCZy85
 =ZrTc
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Lots of bugs fixes, including Zheng and Jan's extent status shrinker
  fixes, which should improve CPU utilization and potential soft lockups
  under heavy memory pressure, and Eric Whitney's bigalloc fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits)
  ext4: ext4_da_convert_inline_data_to_extent drop locked page after error
  ext4: fix suboptimal seek_{data,hole} extents traversial
  ext4: ext4_inline_data_fiemap should respect callers argument
  ext4: prevent fsreentrance deadlock for inline_data
  ext4: forbid journal_async_commit in data=ordered mode
  jbd2: remove unnecessary NULL check before iput()
  ext4: Remove an unnecessary check for NULL before iput()
  ext4: remove unneeded code in ext4_unlink
  ext4: don't count external journal blocks as overhead
  ext4: remove never taken branch from ext4_ext_shift_path_extents()
  ext4: create nojournal_checksum mount option
  ext4: update comments regarding ext4_delete_inode()
  ext4: cleanup GFP flags inside resize path
  ext4: introduce aging to extent status tree
  ext4: cleanup flag definitions for extent status tree
  ext4: limit number of scanned extents in status tree shrinker
  ext4: move handling of list of shrinkable inodes into extent status code
  ext4: change LRU to round-robin in extent status tree shrinker
  ext4: cache extent hole in extent status tree for ext4_da_map_blocks()
  ext4: fix block reservation for bigalloc filesystems
  ...
2014-12-12 09:28:03 -08:00
Jiri Kosina 019e129f9b Merge branches 'for-3.19/hid-report-len', 'for-3.19/i2c-hid', 'for-3.19/lenovo', 'for-3.19/logitech', 'for-3.19/microsoft', 'for-3.19/plantronics', 'for-3.19/rmi', 'for-3.19/sony' and 'for-3.19/wacom' into for-linus 2014-12-12 11:15:33 +01:00
Jiri Kosina 3ee420ba2e Merge branches 'for-3.18/upstream-fixes' and 'for-3.19/upstream' into for-linus
Conflicts:
	drivers/hid/hid-input.c
2014-12-12 11:09:23 +01:00
Richard Guy Briggs 63f13448d8 powerpc: add little endian flag to syscall_get_arch()
Since both ppc and ppc64 have LE variants which are now reported by uname, add
that flag (__AUDIT_ARCH_LE) to syscall_get_arch() and add AUDIT_ARCH_PPC64LE
variant.

Without this,  perf trace and auditctl fail.

Mainline kernel reports ppc64le (per a058801) but there is no matching
AUDIT_ARCH_PPC64LE.

Since 32-bit PPC LE is not supported by audit, don't advertise it in
AUDIT_ARCH_PPC* variants.

See:
	https://www.redhat.com/archives/linux-audit/2014-August/msg00082.html
	https://www.redhat.com/archives/linux-audit/2014-December/msg00004.html

Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-12-12 20:14:08 +11:00
Linus Torvalds 2756d373a3 Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup update from Tejun Heo:
 "cpuset got simplified a bit.  cgroup core got a fix on unified
  hierarchy and grew some effective css related interfaces which will be
  used for blkio support for writeback IO traffic which is currently
  being worked on"

* 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: implement cgroup_get_e_css()
  cgroup: add cgroup_subsys->css_e_css_changed()
  cgroup: add cgroup_subsys->css_released()
  cgroup: fix the async css offline wait logic in cgroup_subtree_control_write()
  cgroup: restructure child_subsys_mask handling in cgroup_subtree_control_write()
  cgroup: separate out cgroup_calc_child_subsys_mask() from cgroup_refresh_child_subsys_mask()
  cpuset: lock vs unlock typo
  cpuset: simplify cpuset_node_allowed API
  cpuset: convert callback_mutex to a spinlock
2014-12-11 18:57:19 -08:00
Linus Torvalds 4e8790f77f Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata changes from Tejun Heo:
 "The only interesting piece is the support for shingled drives.  The
  changes in libata layer are minimal.  All it does is identifying the
  new class of device and report upwards accordingly"

* 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: Remove FIXME comment in atapi_request_sense()
  sata_rcar: Document deprecated "renesas,rcar-sata"
  sata_rcar: Add clocks to sata_rcar bindings
  ahci_sunxi: Make AHCI_HFLAG_NO_PMP flag configurable with a module option
  libata-scsi: Update SATL for ZAC drives
  libata: Implement ATA_DEV_ZAC
  libsas: use ata_dev_classify()
2014-12-11 18:52:37 -08:00
Linus Torvalds eedb3d3304 Merge branch 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu updates from Tejun Heo:
 "Nothing interesting.  A patch to convert the remaining __get_cpu_var()
  users, another to fix non-critical off-by-one in an assertion and a
  cosmetic conversion to lockless_dereference() in percpu-ref.

  The back-merge from mainline is to receive lockless_dereference()"

* 'for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: Replace smp_read_barrier_depends() with lockless_dereference()
  percpu: Convert remaining __get_cpu_var uses in 3.18-rcX
  percpu: off by one in BUG_ON()
2014-12-11 18:36:26 -08:00