* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: sysfs: fix printk warnings
PCI: fix pci_bus_alloc_resource() hang, prefer positive decode
PCI: read current power state at enable time
PCI: fix size checks for mmap() on /proc/bus/pci files
x86/PCI: coalesce overlapping host bridge windows
PCI hotplug: ibmphp: Add check to prevent reading beyond mapped area
Cast pci_resource_start() and pci_resource_len() to u64 for printk.
drivers/pci/pci-sysfs.c:753: warning: format '%16Lx' expects type 'long long unsigned int', but argument 9 has type 'resource_size_t'
drivers/pci/pci-sysfs.c:753: warning: format '%16Lx' expects type 'long long unsigned int', but argument 10 has type 'resource_size_t'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When a PCI bus has two resources with the same start/end, e.g.,
pci_bus 0000:04: resource 2 [mem 0xd0000000-0xd7ffffff pref]
pci_bus 0000:04: resource 7 [mem 0xd0000000-0xd7ffffff]
the previous pci_bus_find_resource_prev() implementation would alternate
between them forever:
pci_bus_find_resource_prev(... [mem 0xd0000000-0xd7ffffff pref])
returns [mem 0xd0000000-0xd7ffffff]
pci_bus_find_resource_prev(... [mem 0xd0000000-0xd7ffffff])
returns [mem 0xd0000000-0xd7ffffff pref]
pci_bus_find_resource_prev(... [mem 0xd0000000-0xd7ffffff pref])
returns [mem 0xd0000000-0xd7ffffff]
...
This happened because there was no ordering between two resources with the
same start and end. A resource that had the same start and end as the
cursor, but was not itself the cursor, was considered to be before the
cursor.
This patch fixes the hang by making a fixed ordering between any two
resources.
In addition, it tries to allocate from positively decoded regions before
using any subtractively decoded resources. This means we will use a
positive decode region before a subtractive decode one, even if it means
using a smaller address.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=22062
Reported-by: Borislav Petkov <bp@amd64.org>
Tested-by: Borislav Petkov <bp@amd64.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When we enable a PCI device, we avoid doing a lot of the initial setup
work if the device's enable count is non-zero. If we don't fetch the
power state though, we may later fail to set up MSI due to the unknown
status. So pick it up before we short circuit the rest due to a
pre-existing enable or mismatched enable/disable pair (as happens with
VGA devices, which are special in a special way).
Tested-by: Jesse Brandeburg <jesse.brandeburg@gmail.com>
Reported-by: Dave Airlie <airlied@linux.ie>
Tested-by: Dave Airlie <airlied@linux.ie>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The checks for valid mmaps of PCI resources made through /proc/bus/pci files
that were introduced in 9eff02e204 have several
problems:
1. mmap() calls on /proc/bus/pci files are made with real file offsets > 0,
whereas under /sys/bus/pci/devices, the start of the resource corresponds
to offset 0. This may lead to false negatives in pci_mmap_fits(), which
implicitly assumes the /sys/bus/pci/devices layout.
2. The loop in proc_bus_pci_mmap doesn't skip empty resouces. This leads
to false positives, because pci_mmap_fits() doesn't treat empty resources
correctly (the calculated size is 1 << (8*sizeof(resource_size_t)-PAGE_SHIFT)
in this case!).
3. If a user maps resources with BAR > 0, pci_mmap_fits will emit bogus
WARNINGS for the first resources that don't fit until the correct one is found.
On many controllers the first 2-4 BARs are used, and the others are empty.
In this case, an mmap attempt will first fail on the non-empty BARs
(including the "right" BAR because of 1.) and emit bogus WARNINGS because
of 3., and finally succeed on the first empty BAR because of 2.
This is certainly not the intended behaviour.
This patch addresses all 3 issues.
Updated with an enum type for the additional parameter for pci_mmap_fits().
Cc: stable@kernel.org
Signed-off-by: Martin Wilck <martin.wilck@ts.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
While testing various randconfigs with ktest.pl, I hit the following panic:
BUG: unable to handle kernel paging request at f7e54b03
IP: [<c0d63409>] ibmphp_access_ebda+0x101/0x19bb
Adding printks, I found that the loop that reads the ebda blocks
can move out of the mapped section.
ibmphp_access_ebda: start=f7e44c00 size=5120 end=f7e46000
ibmphp_access_ebda: io_mem=f7e44d80 offset=384
ibmphp_access_ebda: io_mem=f7e54b03 offset=65283
The start of the iomap was at f7e44c00 and had a size of 5120,
making the end f7e46000. We start with an offset of 0x180 or
384, giving the first read at 0xf7e44d80. Reading that location
yields 65283, which is much bigger than the 5120 that was allocated
and makes the next read at f7e54b03 which is outside the mapped area.
Perhaps this is a bug in the driver, or buggy hardware, but this patch
is more about not crashing my box on start up and just giving a warning
if it detects this error.
This patch at least lets my box boot with just a warning.
Cc: Chandru Siddalingappa <chandru@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Stanse found that when pdev is found and has no driver a reference is
leaked in pcifront_common_process. So add pci_dev_put there. For the
pdev == NULL case, pci_dev_put(NULL) is fine.
[v2: Updated to not dereference pcidev->dev per Milton's observation]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Milton Miller <miltonm@bga.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
In drivers/pci/xen-pcifront.c the xen/xenbus.h header is included twice -
once is enough.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
and branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
xen: register xen pci notifier
xen: initialize cpu masks for pv guests in xen_smp_init
xen: add a missing #include to arch/x86/pci/xen.c
xen: mask the MTRR feature from the cpuid
xen: make hvc_xen console work for dom0.
xen: add the direct mapping area for ISA bus access
xen: Initialize xenbus for dom0.
xen: use vcpu_ops to setup cpu masks
xen: map a dummy page for local apic and ioapic in xen_set_fixmap
xen: remap MSIs into pirqs when running as initial domain
xen: remap GSIs as pirqs when running as initial domain
xen: introduce XEN_DOM0 as a silent option
xen: map MSIs into pirqs
xen: support GSI -> pirq remapping in PV on HVM guests
xen: add xen hvm acpi_register_gsi variant
acpi: use indirect call to register gsi in different modes
xen: implement xen_hvm_register_pirq
xen: get the maximum number of pirqs from xen
xen: support pirq != irq
* 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (27 commits)
X86/PCI: Remove the dependency on isapnp_disable.
xen: Update Makefile with CONFIG_BLOCK dependency for biomerge.c
MAINTAINERS: Add myself to the Xen Hypervisor Interface and remove Chris Wright.
x86: xen: Sanitse irq handling (part two)
swiotlb-xen: On x86-32 builts, select SWIOTLB instead of depending on it.
MAINTAINERS: Add myself for Xen PCI and Xen SWIOTLB maintainer.
xen/pci: Request ACS when Xen-SWIOTLB is activated.
xen-pcifront: Xen PCI frontend driver.
xenbus: prevent warnings on unhandled enumeration values
xenbus: Xen paravirtualised PCI hotplug support.
xen/x86/PCI: Add support for the Xen PCI subsystem
x86: Introduce x86_msi_ops
msi: Introduce default_[teardown|setup]_msi_irqs with fallback.
x86/PCI: Export pci_walk_bus function.
x86/PCI: make sure _PAGE_IOMAP it set on pci mappings
x86/PCI: Clean up pci_cache_line_size
xen: fix shared irq device passthrough
xen: Provide a variant of xen_poll_irq with timeout.
xen: Find an unbound irq number in reverse order (high to low).
xen: statically initialize cpu_evtchn_mask_p
...
Fix up trivial conflicts in drivers/pci/Makefile
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (27 commits)
x86: allocate space within a region top-down
x86: update iomem_resource end based on CPU physical address capabilities
x86/PCI: allocate space from the end of a region, not the beginning
PCI: allocate bus resources from the top down
resources: support allocating space within a region from the top down
resources: handle overflow when aligning start of available area
resources: ensure callback doesn't allocate outside available space
resources: factor out resource_clip() to simplify find_resource()
resources: add a default alignf to simplify find_resource()
x86/PCI: MMCONFIG: fix region end calculation
PCI: Add support for polling PME state on suspended legacy PCI devices
PCI: Export some PCI PM functionality
PCI: fix message typo
PCI: log vendor/device ID always
PCI: update Intel chipset names and defines
PCI: use new ccflags variable in Makefile
PCI: add PCI_MSIX_TABLE/PBA defines
PCI: add PCI vendor id for STmicroelectronics
x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs
PCI: OLPC: Only enable PCI configuration type override on XO-1
...
The BKL was pushed into this function when it was converted to use the
unlocked_ioctl interface, but nothing that the function touches is
actually protected by the BKL. So just remove the BKL entirely, so that
we finally can get a realistic system build without the BKL being
enabled at all.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allocate space from the highest-address PCI bus resource first, then work
downward.
Previously, we looked for space in PCI host bridge windows in the order
we discovered the windows. For example, given the following windows
(discovered via an ACPI _CRS method):
pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff]
pci_root PNP0A03:00: host bridge window [mem 0x000c0000-0x000effff]
pci_root PNP0A03:00: host bridge window [mem 0x000f0000-0x000fffff]
pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xf7ffffff]
pci_root PNP0A03:00: host bridge window [mem 0xff980000-0xff980fff]
pci_root PNP0A03:00: host bridge window [mem 0xff97c000-0xff97ffff]
pci_root PNP0A03:00: host bridge window [mem 0xfed20000-0xfed9ffff]
we attempted to allocate from [mem 0x000a0000-0x000bffff] first, then
[mem 0x000c0000-0x000effff], and so on.
With this patch, we allocate from [mem 0xff980000-0xff980fff] first, then
[mem 0xff97c000-0xff97ffff], [mem 0xfed20000-0xfed9ffff], etc.
Allocating top-down follows Windows practice, so we're less likely to
trip over BIOS defects in the _CRS description.
On the machine above (a Dell T3500), the [mem 0xbff00000-0xbfffffff] region
doesn't actually work and is likely a BIOS defect. The symptom is that we
move the AHCI controller to 0xbff00000, which leads to "Boot has failed,
sleeping forever," a BUG in ahci_stop_engine(), or some other boot failure.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c43
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=620313
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=629933
Reported-by: Brian Bloniarz <phunge0@hotmail.com>
Reported-and-tested-by: Stefan Becker <chemobejk@gmail.com>
Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: remove in_workqueue_context()
workqueue: Clarify that schedule_on_each_cpu is synchronous
memory_hotplug: drop spurious calls to flush_scheduled_work()
shpchp: update workqueue usage
pciehp: update workqueue usage
isdn/eicon: don't call flush_scheduled_work() from diva_os_remove_soft_isr()
workqueue: add and use WQ_MEM_RECLAIM flag
workqueue: fix HIGHPRI handling in keep_working()
workqueue: add queue_work and activate_work trace points
workqueue: prepare for more tracepoints
workqueue: implement flush[_delayed]_work_sync()
workqueue: factor out start_flush_work()
workqueue: cleanup flush/cancel functions
workqueue: implement alloc_ordered_workqueue()
Fix up trivial conflict in fs/gfs2/main.c as per Tejun
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
vfs: make no_llseek the default
vfs: don't use BKL in default_llseek
llseek: automatically add .llseek fop
libfs: use generic_file_llseek for simple_attr
mac80211: disallow seeks in minstrel debug code
lirc: make chardev nonseekable
viotape: use noop_llseek
raw: use explicit llseek file operations
ibmasmfs: use generic_file_llseek
spufs: use llseek in all file operations
arm/omap: use generic_file_llseek in iommu_debug
lkdtm: use generic_file_llseek in debugfs
net/wireless: use generic_file_llseek in debugfs
drm: use noop_llseek
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
block: autoconvert trivial BKL users to private mutex
drivers: autoconvert trivial BKL users to private mutex
ipmi: autoconvert trivial BKL users to private mutex
mac: autoconvert trivial BKL users to private mutex
mtd: autoconvert trivial BKL users to private mutex
scsi: autoconvert trivial BKL users to private mutex
Fix up trivial conflicts (due to addition of private mutex right next to
deletion of a version string) in drivers/char/pcmcia/cm40[04]0_cs.c
This is a port of the 2.6.18 Xen PCI front driver with fixes
to make it build under 2.6.34 and later (for the full list of
changes: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
historic/xen-pcifront-0.1). It also includes the fixes
to make it work properly.
[v2: Updated Kconfig, removed crud, added Reviewed-by]
[v3: Added 'static', fixed grant table leak, redid Kconfig]
[v4: Added one more 'static' and removed comments]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Jan Beulich <JBeulich@novell.com>
Introduce an override for the arch_[teardown|setup]_msi_irqs
that can be utilized to fallback to the default arch_* code.
If a platform wants to utilize the code paths defined
in driver/pci/msi.c it has to define HAVE_DEFAULT_MSI_TEARDOWN_IRQS
or HAVE_DEFAULT_MSI_SETUP_IRQS. Otherwise the old mechanism
of over-ridding the arch_* works fine.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: x86@kernel.org
In preperation of modularizing Xen-pcifront the pci_walk_bus
needs to be exported so that the xen-pcifront module can walk
call the pci subsystem to walk the PCI devices and claim them.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> [http://marc.info/?l=linux-pci&m=126149958010298&w=2]
The patch below updates broken web addresses in the kernel
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Ben Pfaff <blp@cs.stanford.edu>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* Rename shpchp_wq to shpchp_ordered_wq and add non-ordered shpchp_wq
which is used instead of the system workqueue. This is to remove
the use of flush_scheduled_work() which is deprecated and scheduled
for removal.
* With cmwq in place, there's no point in creating workqueues lazily.
Create both shpchp_wq and shpchp_ordered_wq upfront.
* Include workqueue.h from shpchp.h.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* Rename pciehp_wq to pciehp_ordered_wq and add non-ordered pciehp_wq
which is used instead of the system workqueue. This is to remove
the use of flush_scheduled_work() which is deprecated and scheduled
for removal.
* With cmwq in place, there's no point in creating workqueues lazily.
Create both pciehp_wq and pciehp_ordered_wq upfront.
* Include workqueue.h from pciehp.h.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Not all hardware vendors hook up the PME line for legacy PCI devices,
meaning that wakeup events get lost. The only way around this is to poll
the devices to see if their state has changed, so add support for doing
that on legacy PCI devices that aren't part of the core chipset.
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
It's helpful to have some extra PCI power management functions available to
platform code, so move the declarations to an exported header.
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Previously we had to have CONFIG_PCI_DEBUG=y or CONFIG_DYNAMIC_DEBUG=y
to turn on this printk, but I think the IDs are valuable enough that it's
worth putting them in the log always.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
These are already defined in pcilib's pci/header.h but not in kernel's
linux/pci_regs.h. Copy them to avoid using magic numbers.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
A long time ago I worked on a RHEL5 bug in which kdump hung during boot
on a set of systems. The systems hung because they never received timer
interrupts during calibrate_delay. These systems also all had Opteron
processors on a hypertransport bus, bridged to a pci bus via an Nvidia
MCP55 northbridge chip. After much wrangling I managed to learn from
Nvidia that they have an undocumented register in some versions of that
chip which control how legacy interrupts are send to the cpu complex
when the ioapic isn't active. Nvidia defaults this register to only
send legacy interrupts to the BSP, so if kdump happens to boot on an AP,
we never get timer interrupts and boom. I had initially used this quirk
as a workaround, with my intent being to move apic initalization to an
earlier point in the boot process, so the setting of the register would
be irrelevant. Given the work involved in doing that however, the
fragile nature of the apic initalization code, and the fact that, over
the 2 years since we found this bug, the MCP55 is the only chip which
seems to have this issue, I've figure at this point its likely safer to
just carry the quirk around. By setting the referenced bits in this
hidden register, interrupts will be broadcast to all cpus when the
ioapic isn't active on the above described systems.
Acked-by: Simon Horman <horms@verge.net.au>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There is a design issue related to PCIe AER and _OSC that the BIOS
may be asked to grant control of the AER service even if some
Hardware Error Source Table (HEST) entries contain information
meaning that the BIOS really should control it. Namely,
pcie_port_acpi_setup() calls pcie_aer_get_firmware_first() that
determines whether or not the AER service should be controlled by
the BIOS on the basis of the HEST information for the given PCIe
port. The BIOS is asked to grant control of the AER service for
a PCIe Root Complex if pcie_aer_get_firmware_first() returns 'false'
for at least one root port in that complex, even if all of the other
root ports' HEST entries have the FIRMWARE_FIRST flag set (and none
of them has the GLOBAL flag set). However, if the AER service is
controlled by the kernel, that may interfere with the BIOS' handling
of the error sources having the FIRMWARE_FIRST flag. Moreover,
there may be PCIe endpoints that have the FIRMWARE_FIRST flag set in
HEST and are attached to the root ports in question, in which case it
also may be unsafe to ask the BIOS for control of the AER service.
For this reason, introduce a function checking if there's at least
one PCIe-related HEST entry with the FIRMWARE_FIRST flag set and
disable the native AER service altogether if this function returns
'true'.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Get rid of init_MUTEX[_LOCKED]() and use sema_init() instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
quiet the warning about use of uninitialized e_src in
aer_isr() e_src is initialized by get_e_source()
Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
All operations in the pci procfs ioctl functions are
atomic, so no lock is needed here.
Also add a compat_ioctl method, since all the commands
are compatible in 32 bit mode.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Indent the branch of an if.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r disable braces4@
position p1,p2;
statement S1,S2;
@@
(
if (...) { ... }
|
if (...) S1@p1 S2@p2
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
if (p1[0].column == p2[0].column):
cocci.print_main("branch",p1)
cocci.print_secs("after",p2)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
irq_2_iommu is in struct irq_cfg, so we can do the irq_remapped check
based on irq_cfg instead of going through a lookup function. That's
especially interesting in the eoi_ioapic_irq() hotpath.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Switch the intr_remapping code to use the irq_2_iommu struct in
irg_cfg.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
That interrupt remapping code is x86 specific and tied to the io_apic
code. No need for separate allocator functions in the interrupt
remapping code. This allows to simplify the code and irq_2_iommu is
small (13 bytes on 64bit) so it's not a real problem even if interrupt
remapping is runtime disabled. If it's compile time disabled the
impact is zero.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
No need to dereference irq_desc.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
With SPARSE_IRQ=y the irte descriptors are dynamically allocated, but not
freed in free_irte().
That was ok as long as the sparse irq core was not freeing irq descriptors on
destroy_irq(). Now we leak the irte descriptor. Free it in free_irte().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Handing down irq_desc to msi just so that msi can access
irq_desc.irq_data.msi_desc is a pretty stupid idea. The calling code
can hand down a pointer to msi_desc so msi code does not need to know
about the irq descriptor at all.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
All these files use the big kernel lock in a trivial
way to serialize their private file operations,
typically resulting from an earlier semi-automatic
pushdown from VFS.
None of these drivers appears to want to lock against
other code, and they all use the BKL as the top-level
lock in their file operations, meaning that there
is no lock-order inversion problem.
Consequently, we can remove the BKL completely,
replacing it with a per-file mutex in every case.
Using a scripted approach means we can avoid
typos.
These drivers do not seem to be under active
maintainance from my brief investigation. Apologies
to those maintainers that I have missed.
file=$1
name=$2
if grep -q lock_kernel ${file} ; then
if grep -q 'include.*linux.mutex.h' ${file} ; then
sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
else
sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
fi
sed -i ${file} \
-e "/^#include.*linux.mutex.h/,$ {
1,/^\(static\|int\|long\)/ {
/^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);
} }" \
-e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
-e '/[ ]*cycle_kernel_lock();/d'
else
sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \
-e '/cycle_kernel_lock()/d'
fi
Signed-off-by: Arnd Bergmann <arnd@arndb.de>