__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Kconfig: remove BE-only platforms from LE kernel build from Boqun Feng
- Refresh ps3_defconfig from Geoff Levand
- Emit GNU & SysV hashes for the vdso from Michael Ellerman
- Define an enum for the bolted SLB indexes from Anshuman Khandual
- Use a local to avoid multiple calls to get_slb_shadow() from Michael Ellerman
- Add gettimeofday() benchmark from Michael Neuling
- Avoid link stack corruption in __get_datapage() from Michael Neuling
- Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar K.V
- Add ppc64le_defconfig from Michael Ellerman
- pseries: extract of_helpers module from Andy Shevchenko
- Correct string length in pseries_of_derive_parent() from Nathan Fontenot
- Free the MSI bitmap if it was slab allocated from Denis Kirjanov
- Shorten irq_chip name for the SIU from Christophe Leroy
- Wait 1s for secondaries to enter OPAL during kexec from Samuel Mendoza-Jonas
- Fix _ALIGN_* errors due to type difference. from Aneesh Kumar K.V
- powerpc/pseries/hvcserver: don't memset pi_buff if it is null from Colin Ian King
- Disable hugepd for 64K page size. from Aneesh Kumar K.V
- Differentiate between hugetlb and THP during page walk from Aneesh Kumar K.V
- Make PCI non-optional for pseries from Michael Ellerman
- Individual System V IPC system calls from Sam bobroff
- Add selftest of unmuxed IPC calls from Michael Ellerman
- discard .exit.data at runtime from Stephen Rothwell
- Delete old orphaned PrPMC 280/2800 DTS and boot file. from Paul Gortmaker
- Use of_get_next_parent to simplify code from Christophe Jaillet
- Paginate some xmon output from Sam bobroff
- Add some more elements to the xmon PACA dump from Michael Ellerman
- Allow the tm-syscall selftest to build with old headers from Michael Ellerman
- Run EBB selftests only on POWER8 from Denis Kirjanov
- Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael Ellerman
- Avoid reference to potentially freed memory in prom.c from Christophe Jaillet
- Quieten boot wrapper output with run_cmd from Geoff Levand
- EEH fixes and cleanups from Gavin Shan
- Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan
- Use of_get_next_parent() in of_get_ibm_chip_id() from Michael Ellerman
- Fix section mismatch warning in msi_bitmap_alloc() from Denis Kirjanov
- Fix ps3-lpm white space from Rudhresh Kumar J
- Fix ps3-vuart null dereference from Colin King
- nvram: Add missing kfree in error path from Christophe Jaillet
- nvram: Fix function name in some errors messages. from Christophe Jaillet
- drivers/macintosh: adb: fix misleading Kconfig help text from Aaro Koskinen
- agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov
- cxl: Free virtual PHB when removing from Andrew Donnellan
- scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from Michael Ellerman
- scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building with O= from Michael Ellerman
- Freescale updates from Scott: Highlights include 64-bit book3e kexec/kdump
support, a rework of the qoriq clock driver, device tree changes including
qoriq fman nodes, support for a new 85xx board, and some fixes.
- MPC5xxx updates from Anatolij: Highlights include a driver for MPC512x
LocalPlus Bus FIFO with its device tree binding documentation, mpc512x
device tree updates and some minor fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWPEZgAAoJEFHr6jzI4aWANjYQAKX2Q/95hqKfCuF5FBcUmtMC
Pu/Nff027MVzxZ2ApDcvvLGps5Nz2bn3nIhc9zjkXc5E8DuL6X3Yl8ce7qyNcc3g
cJJ8RvtUo6J1OMWetXFehtPYniAAwKMhZYKnj0+WnLr2SyH/Vhl3ehDkFbGyPtuH
r+2E7krFjfVgU+bzciIFnOaDekFuFN/pXWMb6e6zQyBJe9N8ZIp96uouGCebKVd0
VDLItzdaKErT8JFfbymMPvZm3V0rMVx4WWu3kAbQX8LrD5a18NF1zrjAOHRXc61n
kkk8/DPuNOon1PbXXyiS5BcFyZRe+KE3VBnoW5sOMqMIRg5WdO1oU3e2pEfXMO8+
leXYwFLXiKzUZuOgQG2QiUhrzD2yC1o6/TJWATv0dSl9AwrecgPX+Vj6X357slAf
A9E3eMy5tgnpndBWZmvZS3W7YDKH+NkeZ+Q40+NErAlqr++ErrTcKVndk5vWlYTT
7mMZeTXagX66al/k5ATKqwB7iUSpnYHSAa9fcUYPSM2FnXsDxPyeJGkBbcoOmkGj
QrpgNYOvJaUJd076goZCV39v0c1xpfV9/9kyVch8HUadf6JcjpVZwYnbGw2qlJjh
ZanuBG2VOeSwaKQqXiRBSBetnpAg8CVpFjDmX9wOBfSek2wxEJqDX/vQExdbIDQQ
pUs7vnUxLzhmW/x+ygOI
=YwcM
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Kconfig: remove BE-only platforms from LE kernel build from Boqun
Feng
- Refresh ps3_defconfig from Geoff Levand
- Emit GNU & SysV hashes for the vdso from Michael Ellerman
- Define an enum for the bolted SLB indexes from Anshuman Khandual
- Use a local to avoid multiple calls to get_slb_shadow() from Michael
Ellerman
- Add gettimeofday() benchmark from Michael Neuling
- Avoid link stack corruption in __get_datapage() from Michael Neuling
- Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar
K.V
- Add ppc64le_defconfig from Michael Ellerman
- pseries: extract of_helpers module from Andy Shevchenko
- Correct string length in pseries_of_derive_parent() from Nathan
Fontenot
- Free the MSI bitmap if it was slab allocated from Denis Kirjanov
- Shorten irq_chip name for the SIU from Christophe Leroy
- Wait 1s for secondaries to enter OPAL during kexec from Samuel
Mendoza-Jonas
- Fix _ALIGN_* errors due to type difference, from Aneesh Kumar K.V
- powerpc/pseries/hvcserver: don't memset pi_buff if it is null from
Colin Ian King
- Disable hugepd for 64K page size, from Aneesh Kumar K.V
- Differentiate between hugetlb and THP during page walk from Aneesh
Kumar K.V
- Make PCI non-optional for pseries from Michael Ellerman
- Individual System V IPC system calls from Sam bobroff
- Add selftest of unmuxed IPC calls from Michael Ellerman
- discard .exit.data at runtime from Stephen Rothwell
- Delete old orphaned PrPMC 280/2800 DTS and boot file, from Paul
Gortmaker
- Use of_get_next_parent to simplify code from Christophe Jaillet
- Paginate some xmon output from Sam bobroff
- Add some more elements to the xmon PACA dump from Michael Ellerman
- Allow the tm-syscall selftest to build with old headers from Michael
Ellerman
- Run EBB selftests only on POWER8 from Denis Kirjanov
- Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael
Ellerman
- Avoid reference to potentially freed memory in prom.c from Christophe
Jaillet
- Quieten boot wrapper output with run_cmd from Geoff Levand
- EEH fixes and cleanups from Gavin Shan
- Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan
- Use of_get_next_parent() in of_get_ibm_chip_id() from Michael
Ellerman
- Fix section mismatch warning in msi_bitmap_alloc() from Denis
Kirjanov
- Fix ps3-lpm white space from Rudhresh Kumar J
- Fix ps3-vuart null dereference from Colin King
- nvram: Add missing kfree in error path from Christophe Jaillet
- nvram: Fix function name in some errors messages, from Christophe
Jaillet
- drivers/macintosh: adb: fix misleading Kconfig help text from Aaro
Koskinen
- agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov
- cxl: Free virtual PHB when removing from Andrew Donnellan
- scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from
Michael Ellerman
- scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building
with O= from Michael Ellerman
- Freescale updates from Scott: Highlights include 64-bit book3e
kexec/kdump support, a rework of the qoriq clock driver, device tree
changes including qoriq fman nodes, support for a new 85xx board, and
some fixes.
- MPC5xxx updates from Anatolij: Highlights include a driver for
MPC512x LocalPlus Bus FIFO with its device tree binding
documentation, mpc512x device tree updates and some minor fixes.
* tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (106 commits)
powerpc/msi: Fix section mismatch warning in msi_bitmap_alloc()
powerpc/prom: Use of_get_next_parent() in of_get_ibm_chip_id()
powerpc/pseries: Correct string length in pseries_of_derive_parent()
powerpc/e6500: hw tablewalk: make sure we invalidate and write to the same tlb entry
powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)
powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan
powerpc/fsl: Add #clock-cells and clockgen label to clockgen nodes
powerpc: handle error case in cpm_muram_alloc()
powerpc: mpic: use IRQCHIP_SKIP_SET_WAKE instead of redundant mpic_irq_set_wake
powerpc/book3e-64: Enable kexec
powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop
powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32
powerpc/book3e-64/kexec: Enable SMP release
powerpc/book3e-64/kexec: create an identity TLB mapping
powerpc/book3e-64: Don't limit paca to 256 MiB
powerpc/book3e/kdump: Enable crash_kexec_wait_realmode
powerpc/book3e: support CONFIG_RELOCATABLE
powerpc/booke64: Fix args to copy_and_flush
powerpc/book3e-64: rename interrupt_end_book3e with __end_interrupts
powerpc/e6500: kexec: Handle hardware threads
...
handling.
PPC: Mostly bug fixes.
ARM: No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite for
IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state
x86: quite a few changes:
- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new component (in
virt/lib/) that connects VFIO and KVM together. The same infrastructure
will be used for ARM interrupt forwarding as well.
- more Hyper-V features, though the main one Hyper-V synthetic interrupt
controller will have to wait for 4.5. These will let KVM expose Hyper-V
devices.
- nested virtualization now supports VPID (same as PCID but for vCPUs)
which makes it quite a bit faster
- for future hardware that supports NVDIMM, there is support for clflushopt,
clwb, pcommit
- support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
userspace, which reduces the attack surface of the hypervisor
- obligatory smattering of SMM fixes
- on the guest side, stable scheduler clock support was rewritten to not
require help from the hypervisor.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
=iv2z
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"First batch of KVM changes for 4.4.
s390:
A bunch of fixes and optimizations for interrupt and time handling.
PPC:
Mostly bug fixes.
ARM:
No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite
for IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state
x86:
Quite a few changes:
- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new
component (in virt/lib/) that connects VFIO and KVM together.
The same infrastructure will be used for ARM interrupt
forwarding as well.
- more Hyper-V features, though the main one Hyper-V synthetic
interrupt controller will have to wait for 4.5. These will let
KVM expose Hyper-V devices.
- nested virtualization now supports VPID (same as PCID but for
vCPUs) which makes it quite a bit faster
- for future hardware that supports NVDIMM, there is support for
clflushopt, clwb, pcommit
- support for "split irqchip", i.e. LAPIC in kernel +
IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
the hypervisor
- obligatory smattering of SMM fixes
- on the guest side, stable scheduler clock support was rewritten
to not require help from the hypervisor"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
KVM: VMX: Fix commit which broke PML
KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
KVM: x86: allow RSM from 64-bit mode
KVM: VMX: fix SMEP and SMAP without EPT
KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
KVM: device assignment: remove pointless #ifdefs
KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
KVM: x86: zero apic_arb_prio on reset
drivers/hv: share Hyper-V SynIC constants with userspace
KVM: x86: handle SMBASE as physical address in RSM
KVM: x86: add read_phys to x86_emulate_ops
KVM: x86: removing unused variable
KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
KVM: arm/arm64: Optimize away redundant LR tracking
KVM: s390: use simple switch statement as multiplexer
KVM: s390: drop useless newline in debugging data
KVM: s390: SCA must not cross page boundaries
KVM: arm: Do not indent the arguments of DECLARE_BITMAP
...
This time including:
* A new IOMMU driver for s390 pci devices
* Common dma-ops support based on iommu-api for ARM64. The plan is to
use this as a basis for ARM32 and hopefully other architectures as
well in the future.
* MSI support for ARM-SMMUv3
* Cleanups and dead code removal in the AMD IOMMU driver
* Better RMRR handling for the Intel VT-d driver
* Various other cleanups and small fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWOz7hAAoJECvwRC2XARrjbvYQALwtITTA5iTm0y/ApwNMxI7n
pZpjZVPoBPNsGBc4t/MT8pVhUSdmpBOljbV4Y4CayL1mSSB6Bl2gooZjd66m7Z81
qMJYEVWhFQqVsIKkCSNOgaO7W5y+xt3rTgqN6vCu86/CCDfKrTPP/+CRl1T/z9bo
1J8ioM3KnZG9KzG8JuXYFg5wwbKToaBh6swSmj+O4U9hru7zV/ILP7ikcc9pyMji
12WbzCqchRORsJZD65xMRYAqRaPNN/3IlDejs00TOFhY3qpWgEgFUucyeRJBJ/+q
K4U8T5vZsnr1a04l7/BeYbLmP7y/9Qv0N0xMGtTyoy/w/BieGqRWu4hHhqf/44NO
EhCSXcEThMNCGTjP2VWC4dnQ/s7Y8OmSW9nCreUcFVxHoE5LfDoh8RngA2fpeNuS
ixb3OwP+YXHN9Ck+1BQqQCeBznsPTLuDxlhRjCJsWntIfMSkXebOkz83YxyZ9b0Q
gFvptfuknU7cotUwWa3dg8RiUB8kNlKJyEEByaVpWEbEOabnONKEMkstvuBx6Ots
kA63wbe7QcPgbUYuq7g0nijDw6E2aEtf0nx2Xx4ZDL932qjg/xUkiBpmbDXHw4Gu
nimNXVQtbCzF74SyTvxEtupiijOTm5eHtoKtg0mYnqPZ+V9eOwEvW8IHaFFf8XHD
SecikoTtH1Q4RVtqOcAQ
=jLlB
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
"This time including:
- A new IOMMU driver for s390 pci devices
- Common dma-ops support based on iommu-api for ARM64. The plan is
to use this as a basis for ARM32 and hopefully other architectures
as well in the future.
- MSI support for ARM-SMMUv3
- Cleanups and dead code removal in the AMD IOMMU driver
- Better RMRR handling for the Intel VT-d driver
- Various other cleanups and small fixes"
* tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
iommu/vt-d: Fix return value check of parse_ioapics_under_ir()
iommu/vt-d: Propagate error-value from ir_parse_ioapic_hpet_scope()
iommu/vt-d: Adjust the return value of the parse_ioapics_under_ir
iommu: Move default domain allocation to iommu_group_get_for_dev()
iommu: Remove is_pci_dev() fall-back from iommu_group_get_for_dev
iommu/arm-smmu: Switch to device_group call-back
iommu/fsl: Convert to device_group call-back
iommu: Add device_group call-back to x86 iommu drivers
iommu: Add generic_device_group() function
iommu: Export and rename iommu_group_get_for_pci_dev()
iommu: Revive device_group iommu-ops call-back
iommu/amd: Remove find_last_devid_on_pci()
iommu/amd: Remove first/last_device handling
iommu/amd: Initialize amd_iommu_last_bdf for DEV_ALL
iommu/amd: Cleanup buffer allocation
iommu/amd: Remove cmd_buf_size and evt_buf_size from struct amd_iommu
iommu/amd: Align DTE flag definitions
iommu/amd: Remove old alias handling code
iommu/amd: Set alias DTE in do_attach/do_detach
iommu/amd: WARN when __[attach|detach]_device are called with irqs enabled
...
Pull intel iommu updates from David Woodhouse:
"This adds "Shared Virtual Memory" (aka PASID support) for the Intel
IOMMU. This allows devices to do DMA using process address space,
translated through the normal CPU page tables for the relevant mm.
With corresponding support added to the i915 driver, this has been
tested with the graphics device on Skylake. We don't have the
required TLP support in our PCIe root ports for supporting discrete
devices yet, so it's only integrated devices that can do it so far"
* git://git.infradead.org/intel-iommu: (23 commits)
iommu/vt-d: Fix rwxp flags in SVM device fault callback
iommu/vt-d: Expose struct svm_dev_ops without CONFIG_INTEL_IOMMU_SVM
iommu/vt-d: Clean up pasid_enabled() and ecs_enabled() dependencies
iommu/vt-d: Handle Caching Mode implementations of SVM
iommu/vt-d: Fix SVM IOTLB flush handling
iommu/vt-d: Use dev_err(..) in intel_svm_device_to_iommu(..)
iommu/vt-d: fix a loop in prq_event_thread()
iommu/vt-d: Fix IOTLB flushing for global pages
iommu/vt-d: Fix address shifting in page request handler
iommu/vt-d: shift wrapping bug in prq_event_thread()
iommu/vt-d: Fix NULL pointer dereference in page request error case
iommu/vt-d: Implement SVM_FLAG_SUPERVISOR_MODE for kernel access
iommu/vt-d: Implement SVM_FLAG_PRIVATE_PASID to allocate unique PASIDs
iommu/vt-d: Add callback to device driver on page faults
iommu/vt-d: Implement page request handling
iommu/vt-d: Generalise DMAR MSI setup to allow for page request events
iommu/vt-d: Implement deferred invalidate for SVM
iommu/vt-d: Add basic SVM PASID support
iommu/vt-d: Always enable PASID/PRI PCI capabilities before ATS
iommu/vt-d: Add initial support for PASID tables
...
Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch of
debugfs updates, with a smattering of minor driver core fixes and
updates as well.
All have been in linux-next for a long time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlY6ePQACgkQMUfUDdst+ymNTgCgpP0CZw57GpwF/Hp2L/lMkVeo
Kx8AoKhEi4iqD5fdCQS9qTfomB+2/M6g
=g7ZO
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch
of debugfs updates, with a smattering of minor driver core fixes and
updates as well.
All have been in linux-next for a long time"
* tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
debugfs: Add debugfs_create_ulong()
of: to support binding numa node to specified device in devicetree
debugfs: Add read-only/write-only bool file ops
debugfs: Add read-only/write-only size_t file ops
debugfs: Add read-only/write-only x64 file ops
debugfs: Consolidate file mode checks in debugfs_create_*()
Revert "mm: Check if section present during memory block (un)registering"
driver-core: platform: Provide helpers for multi-driver modules
mm: Check if section present during memory block (un)registering
devres: fix a for loop bounds check
CMA: fix CONFIG_CMA_SIZE_MBYTES overflow in 64bit
base/platform: assert that dev_pm_domain callbacks are called unconditionally
sysfs: correctly handle short reads on PREALLOC attrs.
base: soc: siplify ida usage
kobject: move EXPORT_SYMBOL() macros next to corresponding definitions
kobject: explain what kobject's sd field is
debugfs: document that debugfs_remove*() accepts NULL and error values
debugfs: Pass bool pointer to debugfs_create_bool()
ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'
This is the downside of using bitfields in the struct definition, rather
than doing all the explicit masking and shifting.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Two late fixes for the AMD IOMMU driver:
* One adds an additional check to the io page-fault handler to
avoid a BUG_ON being hit in handle_mm_fault()
* Second patch fixes a problem with devices writing to the
system management area and were blocked by the IOMMU because
the driver wrongly cleared out the DTE flags allowing that
access.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJWLeEzAAoJECvwRC2XARrjD/gQAIAihdsuv33iQQAfwvaztNOu
l5WqQ6gr54xQXKebkY9MgiX5qXIqyNPHku08WGp63kWkfKSkTgvqQw/WNoRcpE2+
GrMKl4EvJyvOp9S3tbtx3QNeb19wGEHyOFN6UuhNBhcXzxsqMN4c3D77upZ2N81k
hYzYYjWNL+5NwsUtK8oZSZUwhmGr4Iuim9mJDMMGhZYw88/dQICIQQtPWc8/ritJ
v/sBJA3KdyRvStxuba64NOWnByYXYnzyrJvBtVPMfPfFjfcyC0D0dwfXe3jvyjh3
nDSRoXqGsd35MBwDfVIf3HUKP4Wxwd+5pbSyrTfD5b4anEFL62ifdTb6/lpMFY/X
uo88xn9oTSMHO0TOPJw2XaBB8Y2OwW1FE7BVpa0CYFMDwQ/vaIGXAoBcxGL98a26
O+xd+pcMVELwOT0XFS5ue7eaZZdLCooj52s1ik8tMIB2qu6lFzd4JyWA7O3LbnMU
qT7YvZKbATjBvIaP0fHpZuZv6iyE2L9pdrvDIGeBb2TqE7r89JfRIYZcYfSvtrdA
TwlijU/w3eMdUoDVDCSlVT9UVfFyPZNwVS1qT3iU4OeVS8MPTnM/lNnWokruAoY0
hcbOOZ7EEtT6o/1GXyWDVaBKbAuWRNhbEEUMEBUlgPN7k9AW6dQIZDPftXwQ05tQ
XzUgi1ueNHcw1br6PItG
=4Bny
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"Two late fixes for the AMD IOMMU driver:
- add an additional check to the io page-fault handler to avoid a
BUG_ON being hit in handle_mm_fault()
- fix a problem with devices writing to the system management area
and were blocked by the IOMMU because the driver wrongly cleared
out the DTE flags allowing that access"
* tag 'iommu-fixes-v4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Don't clear DTE flags when modifying it
iommu/amd: Fix BUG when faulting a PROT_NONE VMA
When booted with intel_iommu=ecs_off we were still allocating the PASID
tables even though we couldn't actually use them. We really want to make
the pasid_enabled() macro depend on ecs_enabled().
Which is unfortunate, because currently they're the other way round to
cope with the Broadwell/Skylake problems with ECS.
Instead of having ecs_enabled() depend on pasid_enabled(), which was never
something that made me happy anyway, make it depend in the normal case
on the "broken PASID" bit 28 *not* being set.
Then pasid_enabled() can depend on ecs_enabled() as it should. And we also
don't need to mess with it if we ever see an implementation that has some
features requiring ECS (like PRI) but which *doesn't* have PASID support.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Not entirely clear why, but it seems we need to reserve PASID zero and
flush it when we make a PASID entry present.
Quite we we couldn't use the true PASID value, isn't clear.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Propagate the error-value from the function ir_parse_ioapic_hpet_scope()
in parse_ioapics_under_ir() and cleanup its calling loop.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Adjust the return value of parse_ioapics_under_ir as
negative value representing failure and "0" representing
succcess. Just make it consistent with other function
implementations, and we can judge if calling is successfull
by if (!parse_ioapics_under_ir()) style.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Now that the iommu core support for iommu groups is not
pci-centric anymore, we can move default domain allocation
to the bus independent iommu_group_get_for_dev() function.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
All callers of iommu_group_get_for_dev() provide a
device_group call-back now, so this fall-back is no longer
needed.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This converts the ARM SMMU and the SMMUv3 driver to use the
new device_group call-back.
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Convert the fsl pamu driver to make use of the new
device_group call-back.
Cc: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Rename that function to pci_device_group() and export it, so
that IOMMU drivers can use it as their device_group
call-back.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
That call-back is currently unused, change it into a
call-back function for finding the right IOMMU group for a
device.
This is a first step to remove the hard-coded PCI dependency
in the iommu-group code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Pull intel-iommu bugfix from David Woodhouse:
"This contains a single fix, for when the IOMMU API is used to overlay
an existing mapping comprised of 4KiB pages, with a mapping that can
use superpages.
For the *first* superpage in the new mapping, we were correctly¹
freeing the old bottom-level page table page and clearing the link to
it, before installing the superpage. For subsequent superpages,
however, we weren't. This causes a memory leak, and a warning about
setting a PTE which is already set.
¹ Well, not *entirely* correctly. We just free the page table pages
right there and then, which is wrong. In fact they should only be
freed *after* the IOTLB is flushed so we know the hardware will no
longer be looking at them.... and in fact I note that the IOTLB
flush is completely missing from the intel_iommu_map() code path,
although it needs to be there if it's permitted to overwrite
existing mappings.
Fixing those is somewhat more intrusive though, and will probably
need to wait for 4.4 at this point"
* tag 'for-linus-20151021' of git://git.infradead.org/intel-iommu:
iommu/vt-d: fix range computation when making room for large pages
The code is buggy and the values read from PCI are not
reliable anyway, so it is the best to just remove this code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Clean up the functions to allocate the command, event and
ppr-log buffers. Remove redundant code and change the return
value to int.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The driver always uses a constant size for these buffers
anyway, so there is no need to waste memory to store the
sizes.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This mostly removes the code to create dev_data structures
for alias device ids. They are not necessary anymore, as
they were only created for device ids which have no struct
pci_dev associated with it. But these device ids are
handled in a simpler way now, so there is no need for this
code anymore.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With this we don't have to create dev_data entries for
non-existent devices (which only exist as request-ids).
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The alias list is handled aleady by iommu core code. No need
anymore to handle it in this part of the AMD IOMMU code
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The condition in the BUG_ON is an indicator of a BUG, but no
reason to kill the code path. Turn it into a WARN_ON and
bail out if it is hit.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
During device assignment/deassignment the flags in the DTE
get lost, which might cause spurious faults, for example
when the device tries to access the system management range.
Fix this by not clearing the flags with the rest of the DTE.
Reported-by: G. Richard Bellamy <rbellamy@pteradigm.com>
Tested-by: G. Richard Bellamy <rbellamy@pteradigm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Change the 'pages' parameter to 'unsigned long' to avoid overflow.
Fix the device-IOTLB flush parameter calculation — the size of the IOTLB
flush is indicated by the position of the least significant zero bit in
the address field. For example, a value of 0x12345f000 will flush from
0x123440000 to 0x12347ffff (256KiB).
Finally, the cap_pgsel_inv() is not relevant to SVM; the spec says that
*all* implementations must support page-selective invaliation for
"first-level" translations. So don't check for it.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This will give a little bit of assistance to those developing drivers
using SVM. It might cause a slight annoyance to end-users whose kernel
disables the IOMMU when drivers are trying to use it. But the fix there
is to fix the kernel to enable the IOMMU.
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
There is an extra semi-colon on this if statement so we always break on
the first iteration.
Fixes: 0204a49609 ('iommu/vt-d: Add callback to device driver on page faults')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This really should be VTD_PAGE_SHIFT, not PAGE_SHIFT. Not that we ever
really anticipate seeing this used on IA64, but we should get it right
anyway.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The "req->addr" variable is a bit field declared as "u64 addr:52;".
The "address" variable is a u64. We need to cast "req->addr" to a u64
before the shift or the result is truncated to 52 bits.
Fixes: a222a7f0bb ('iommu/vt-d: Implement page request handling')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Dan Carpenter pointed out an error path which could lead to us
dereferencing the 'svm' pointer after we know it to be NULL because the
PASID lookup failed. Fix that, and make it less likely to happen again.
Fixes: a222a7f0bb ('iommu/vt-d: Implement page request handling')
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Despite being a platform device, the SMMUv3 is capable of signaling
interrupts using MSIs. Hook it into the platform MSI framework and
enjoy faults being reported in a new and exciting way.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
[will: tidied up the binding example and reworked most of the code]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since commit 1463fe44fd ("iommu/arm-smmu: Don't use VMIDs for stage-1
translations"), we don't need the GR0 base address when initialising a
context bank, so remove the useless local variable and its init code.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The bitmap allocator returns an int, which is one of the standard
negative values on failure. Rather than assigning this straight to a
u16 (like we do for the ASID and VMID callers), which means that we
won't detect failure correctly, use an int for the purposes of error
checking.
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This is only usable for the static 1:1 mapping of physical memory.
Any access to vmalloc or module regions will require some way of doing
an IOTLB flush. It's theoretically possible to hook into the
tlb_flush_kernel_range() function, but that seems like overkill — most
of the addresses accessed through a kernel PASID *will* be in the 1:1
mapping.
If we really need to allow access to more interesting kernel regions,
then the answer will probably be an explicit IOTLB flush call after use,
akin to the DMA API's unmap function.
In fact, it might be worth introducing that sooner rather than later, and
making it just BUG() if the address isn't in the static 1:1 mapping.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Taking inspiration from the existing arch/arm code, break out some
generic functions to interface the DMA-API to the IOMMU-API. This will
do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
Since associating an IOVA allocator with an IOMMU domain is a fairly
common need, rather than introduce yet another private structure just to
do this for ourselves, extend the top-level struct iommu_domain with the
notion. A simple opaque cookie allows reuse by other IOMMU API users
with their various different incompatible allocator types.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>