The main benefit of using ACPI host bridge window information is that
we can do better resource allocation in systems with multiple host bridges,
e.g., http://bugzilla.kernel.org/show_bug.cgi?id=14183
Sometimes we need _CRS information even if we only have one host bridge,
e.g., https://bugs.launchpad.net/ubuntu/+source/linux/+bug/341681
Most of these systems are relatively new, so this patch turns on
"pci=use_crs" only on machines with a BIOS date of 2008 or newer.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Previously we used a table of size PCI_BUS_NUM_RESOURCES (16) for resources
forwarded to a bus by its upstream bridge. We've increased this size
several times when the table overflowed.
But there's no good limit on the number of resources because host bridges
and subtractive decode bridges can forward any number of ranges to their
secondary buses.
This patch reduces the table to only PCI_BRIDGE_RESOURCE_NUM (4) entries,
which corresponds to the number of windows a PCI-to-PCI (3) or CardBus (4)
bridge can positively decode. Any additional resources, e.g., PCI host
bridge windows or subtractively-decoded regions, are kept in a list.
I'd prefer a single list rather than this split table/list approach, but
that requires simultaneous changes to every architecture. This approach
only requires immediate changes where we set up (a) host bridges with more
than four windows and (b) subtractive-decode P2P bridges, and we can
incrementally change other architectures to use the list.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The roundup() caused a build error (undefined reference to `__udivdi3').
We're aligning to power-of-two boundaries, so it's simpler to just use
ALIGN() anyway, which avoids the division.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCI device BARs are guaranteed to start and end on at least a four-byte
(I/O) or a sixteen-byte (MMIO) boundary because they're aligned on their
size and the low BAR bits are reserved. PCI-to-PCI bridge apertures
have even larger alignment restrictions.
However, some BIOSes (e.g., HP DL360 BIOS P31) report host bridge windows
like "[io 0x0000-0x2cfe]". This is wrong because it excludes the last
port at 0x2cff: it's impossible for a downstream device to claim 0x2cfe
without also claiming 0x2cff. In fact, this BIOS configures a device
behind the bridge to "[io 0x2c00-0x2cff]", so we know the window actually
does include 0x2cff.
This patch rounds the start and end of apertures to the appropriate
boundary. I experimentally determined that Windows contains a similar
workaround; details here:
http://bugzilla.kernel.org/show_bug.cgi?id=14337
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We have occasional problems with PCI resource allocation, and sometimes
they could be avoided by paying attention to what ACPI tells us about
the host bridges. This patch doesn't change the behavior, but it prints
window information that should make debugging easier.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use the dev_printk-like "%04x:%02x" format for printing PCI bus numbers.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Jesse accidentally applied v1 [1] of the patchset instead of v2 [2]. This
is the diff between v1 and v2.
The changes in this patch are:
- tidied vsprintf stack buffer to shrink and compute size more
accurately
- use %pR for decoding and %pr for "raw" (with type and flags) instead
of adding %pRt and %pRf
[1] http://lkml.org/lkml/2009/10/6/491
[2] http://lkml.org/lkml/2009/10/13/441
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This allows us to remove adjust_transparent_bridge_resources and give
x86_pci_root_bus_res_quirks a chance when _CRS is not used or not there.
Acked-by: Gary Hade <garyhade@us.ibm.com>
Tested-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Don't touch info->res_num if we are out of space.
Acked-by: Gary Hade <garyhade@us.ibm.com>
Tested-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This reverts commit 9e9f46c44e.
Quoting from the commit message:
"At this point, it seems to solve more problems than it causes, so let's
try using it by default. It's an easy revert if it ends up causing
trouble."
And guess what? The _CRS code causes trouble.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Issue a warning if _CRS returns too many resource descriptors to be
accommodated by the fixed size resource array instances. If there is no
transparent bridge on the root bus "too many" is the
PCI_BUS_NUM_RESOURCES size of the resource array. Otherwise, the last 3
slots of the resource array must be excluded making the maximum
(PCI_BUS_NUM_RESOURCES - 3).
The current code:
- is silent when _CRS returns too many resource descriptors and
- incorrectly allows use of the last 3 slots of the resource array
for a root bus with a transparent bridge
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
At this point, it seems to solve more problems than it causes, so let's try using it by default. It's an easy revert if it ends up causing trouble.
Reviewed-by: Yinghai Lu <yhlu.kernel@gmail.com>
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since pci_bus has a struct device, use dev_printk directly instead
of faking it by hand.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Change PCI bus locality messages so they have a bit more context
and look like the rest of PCI, e.g.,
- bus 01 -> node 0
- bus 04 -> node 0
+ pci 0000:01: bus on NUMA node 0
+ pci 0000:04: bus on NUMA node 0
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Impact: cleanup
Now that arch/x86/pci/pci.h is used in a number of other places as well,
move the lowlevel x86 pci definitions into the architecture include files.
(not to be confused with the existing arch/x86/include/asm/pci.h file,
which provides public details about x86 PCI)
Tested on: X86_32_UP, X86_32_SMP and X86_64_SMP
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dump all the PIC, local APIC and I/O APIC information at the
fs_initcall() level, which is after ACPI (if used) has initialised PCI
information, making the point of invocation consistent across MP-table and
ACPI platforms. Remove explicit calls to print_IO_APIC() from elsewhere.
Make the interface of all the functions involved consistent between 32-bit
and 64-bit versions and make them all static by default by the means of a
New-and-Improved(TM) __apicdebuginit() macro.
Note that like print_IO_APIC() all these only output anything if
"apic=debug" has been passed to the kernel through the command line.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
So far subsys_initcalls has been executed in this order depending on
the object order in the Makefile:
arch/x86/pci/visws.c:subsys_initcall(pcibios_init);
arch/x86/pci/numa.c:subsys_initcall(pci_numa_init);
arch/x86/pci/acpi.c:subsys_initcall(pci_acpi_init);
arch/x86/pci/legacy.c:subsys_initcall(pci_legacy_init);
arch/x86/pci/irq.c:subsys_initcall(pcibios_irq_init);
arch/x86/pci/common.c:subsys_initcall(pcibios_init);
This patch removes the ordering dependency. There is now only one
subsys_initcall function that contains subsystem initialization code
with a defined order.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
also make bus_numa work even if ACPI_NUMA is not defined.
don't call pxm_to_node again, and use node directly.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a numa system (with multi HT chains) may return node without ram. Aka it
is not online. Try to get an online node, otherwise return -1.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
so we don't align the io port start address for pci cards.
also move out dmi check out acpi.c, because it has nothing to do with acpi.
it could spare some calling when we have several peer root buses.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Currently, on an amd k8 system with multi ht chains, the numa_node of
pci devices under /sys/devices/pci0000:80/* is always 0, even if that
chain is on node 1 or 2 or 3.
Workaround: pcibus_to_node(bus) is used when we want to get the node that
pci_device is on.
In struct device, we already have numa_node member, and we could use
dev_to_node()/set_dev_node() to get and set numa_node in the device.
set_dev_node is called in pci_device_add() with pcibus_to_node(bus),
and pcibus_to_node uses bus->sysdata for nodeid.
The problem is when pci_add_device is called, bus->sysdata is not assigned
correct nodeid yet. The result is that numa_node will always be 0.
pcibios_scan_root and pci_scan_root could take sysdata. So we need to get
mp_bus_to_node mapping before these two are called, and thus
get_mp_bus_to_node could get correct node for sysdata in root bus.
In scanning of the root bus, all child busses will take parent bus sysdata.
So all pci_device->dev.numa_node will be assigned correctly and automatically.
Later we could use dev_to_node(&pci_dev->dev) to get numa_node, and we
could also could make other bus specific device get the correct numa_node
too.
This is an updated version of pci_sysdata and Jeff's pci_domain patch.
[ mingo@elte.hu: build fix ]
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The "pci=routeirq" option was added in 2004, and I don't get any valid
reports anymore. The option is still mentioned in kernel-parameters.txt.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The PCI bus names included in /proc/iomem and /proc/ioports are
of the form 'PCI Bus #XX' where XX is the bus number. This patch
changes the naming to 'PCI Bus XXXX:YY' where XXXX is the domain
number and YY is the bus number. For example, PCI bus 14 in
domain 0 will show as 'PCI Bus 0000:14' instead of 'PCI Bus #14'.
This change makes the naming consistent with other architectures
such as ia64 where multiple PCI domain support has been around
longer.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
http://bugzilla.kernel.org/show_bug.cgi?id=10124
this change:
commit 08f1c192c3
Author: Muli Ben-Yehuda <muli@il.ibm.com>
Date: Sun Jul 22 00:23:39 2007 +0300
x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata
This patch introduces struct pci_sysdata to x86 and x86-64, and
converts the existing two users (NUMA, Calgary) to use it.
This lays the groundwork for having other users of sysdata, such as
the PCI domains work.
The Calgary bits are tested, the NUMA bits just look ok.
replaces pcibios_scan_root by pci_scan_bus_parented...
but in pcibios_scan_root we have a check about scanned busses.
Cc: <yakui.zhao@intel.com>
Cc: Stian Jordet <stian@jordet.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Yinghai Lu" <yhlu.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
need to check info->res_num less than PCI_BUS_NUM_RESOURCES, so
info->bus->resource[info->res_num] = res will not beyond of bus resource
array when acpi returns too many resource entries.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Gary Hade <gary.hade@us.ibm.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch fixes the following section mismatches with CONFIG_HOTPLUG=n:
<-- snip -->
...
WARNING: vmlinux.o(.data+0x23640): Section mismatch: reference to .init.text.20:can_skip_ioresource_align (between 'acpi_pciprobe_dmi_table' and 'pcibios_irq_mask')
WARNING: vmlinux.o(.data+0x2366c): Section mismatch: reference to .init.text.20:can_skip_ioresource_align (between 'acpi_pciprobe_dmi_table' and 'pcibios_irq_mask')
WARNING: vmlinux.o(.data+0x23698): Section mismatch: reference to .init.text.20:can_skip_ioresource_align (between 'acpi_pciprobe_dmi_table' and 'pcibios_irq_mask')
...
<-- snip -->
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Fix DMI const-ification fallout that appeared when merging subsystem
trees.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fix bug in pci_read() and pci_write() which prevented PCI domain
support from working (hardcoded domain 0).
* unconditionally enable CONFIG_PCI_DOMAINS
* implement pci_domain_nr() and pci_proc_domain(), as required of
all arches when CONFIG_PCI_DOMAINS is enabled.
* store domain in struct pci_sysdata, as assigned by ACPI
* support "pci=nodomains"
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Use _CRS for PCI resource allocation
This patch resolves an issue where incorrect PCI memory and i/o ranges
are being assigned to hotplugged PCI devices on some IBM systems. The
resource mis-allocation not only makes the PCI device unuseable but
often makes the entire system unuseable due to resulting machine checks.
The hotplug capable PCI slots on the affected systems are not located
under a standard P2P bridge but are instead located under PCI root
bridges or subtractive decode P2P bridges. For example, the IBM x3850
contains 2 hotplug capable PCI-X slots and 4 hotplug capable PCIe slots
with the PCI-X slots each located under a PCI root bridge and the PCIe
slots each located under a subtractive decode P2P bridge.
The current i386/x86_64 PCI resource allocation code does not use _CRS
returned resource information. No other resource information source is
available for slots that are not below a standard P2P bridge so
incorrect ranges are being allocated from e820 hole causing the bad
result.
This patch causes the kernel to use _CRS returned resource info. It is
roughly based on a change provided by Matthew Wilcox for the ia64 kernel
in 2005. Due to possible buggy BIOS factor and possible yet to be
discovered kernel issues the function is disabled by default and can be
enabled with pci=use_crs.
Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Skip ISA ioresource alignment on some systems
To conserve limited PCI i/o resource on some IBM multi-node systems, the
BIOS allocates (via _CRS) and expects the kernel to use addresses in
ranges currently excluded by pcibios_align_resource() [i386/pci/i386.c].
This change allows the kernel to use the currently excluded address
ranges on the IBM x3800, x3850, and x3950.
Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>