Minor formatting fixup since the information which core was associated
with the MCE is not always valid.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Building for X86_32 produces shift count warnings, so use BIT_64() to
eliminate the warnings.
drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: bluesmoke-devel@lists.sourceforge.net
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Now that everything is inplace, enable MCE decoding on F15h. Make
initcall routine a bit more readable.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Shorten up MCi_STATUS flags and add BD's new deferred and poison types.
Also, simplify formatting.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
MCE bank 2 is redefined from a BU to a CU (Combined Unit) bank on F15h.
Add a decoder function for CU MCEs.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add a decoder for F15h DC MCEs to support the new types of DC MCEs
introduced by the BD microarchitecture.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
F15h enlarges the extended error code of an MCE to a 5-bit field
(MCi_STATUS[20:16]). Add a mask variable which default 0xf is overridden
on F15h.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
K8 does not allow for an atomic RMW to a cacheline as F10h does so
disable the error injection interface for it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Make the ->{get|set}_sdram_scrub_rate return the actual scrub rate
bandwidth it succeeded setting and remove superfluous arg pointer used
for that. A negative value returned still means that an error occurred
while setting the scrubrate. Document this for future reference.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Now that all prerequisites are in place, drop the two-stage driver
instances initialization in favor of the following simple init sequence:
1. Probe PCI device: we only test ECC capabilities here and if none exit
early.
2. If the hw supports ECC and it is/can be enabled, we init the per-node
instance.
Remove "amd64_" prefix from static functions touched, while at it.
There actually should be no visible functional change resulting from
this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Rework the code to check the hardware ECC capabilities at PCI probing
time. We do all further initialization only if we actually can/have ECC
enabled.
While at it:
0. Fix function naming.
1. Simplify/clarify debug output.
2. Remove amd64_ prefix from the static functions
3. Reorganize code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This is in preparation for the init path reorganization where we want
only to
1) test whether a particular node supports ECC
2) can it be enabled
and only then do the necessary allocation/initialization. For that,
we need to decouple the ECC settings of the node from the instance's
descriptor.
The should be no functional change introduced by this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
PCI ECS is being enabled by default since 2.6.26 on AMD so this code is
just superfluous now, remove it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Remove static allocation in favor of dynamically allocating space for as
many driver instances as northbridges present on the system.
There should be no functional change resulting from this patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Add a macro per printk level, shorten up error messages. Add relevant
information to KERN_INFO level. No functional change.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Rename variables representing PCI devices to their BKDG names for faster
search and shorter, clearer code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Move the remaining per-family init code into the proper place and
simplify the rest of the initialization. Reorganize error handling in
amd64_init_one_instance().
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Run a per-family init function which does all the settings based on
the family this driver instance is running on. Move the scrubrate
calculation in it and simplify code.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, cacheinfo: Cleanup L3 cache index disable support
x86, amd-nb: Cleanup AMD northbridge caching code
x86, amd-nb: Complete the rename of AMD NB and related code
When matching error address to the range contained by one memory node,
we're in valid range when node interleaving
1. is disabled, or
2. enabled and when the address bits we interleave on match the
interleave selector on this node (see the "Node Interleaving" section in
the BKDG for an enlightening example).
Thus, when we early-exit, we need to reverse the compound logic
statement properly.
Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This corrects the misprint introduced when moving '#if
PAGE_SHIFT' from i7core_edac.c to edac_core.h (commit
e9144601d3)
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Andrei Konovalov <akonovalov@mvista.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
00740c5854 changed edac_core to
un-/register a workqueue item only if a lowlevel driver supplies a
polling routine. Normally, when we remove a polling low-level driver, we
go and cancel all the queued work. However, the workqueue unreg happens
based on the ->op_state setting, and edac_mc_del_mc() sets this to
OP_OFFLINE _before_ we cancel the work item, leading to NULL ptr oops on
the workqueue list.
Fix it by putting the unreg stuff in proper order.
Cc: <stable@kernel.org> #36.x
Reported-and-tested-by: Tobias Karnat <tobias.karnat@googlemail.com>
LKML-Reference: <1291201307.3029.21.camel@Tobias-Karnat>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Otherwise, variable i will be -1 inside the latest iteration of the
while loop.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Change EDAC's Makefile to use <modules>-y instead of
<modules>-objs because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.
[bp: Fixup commit message]
[bp: Fixup indentation]
Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Support more than just the "Misc Control" part of the northbridges.
Support more flags by turning "gart_supported" into a single bit flag
that is stored in a flags member. Clean up related code by using a set
of functions (amd_nb_num(), amd_nb_has_feature() and node_to_amd_nb())
instead of accessing the NB data structures directly. Reorder the
initialization code and put the GART flush words caching in a separate
function.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Not only the naming of the files was confusing, it was even more so for
the function and variable names.
Renamed the K8 NB and NUMA stuff that is also used on other AMD
platforms. This also renames the CONFIG_K8_NUMA option to
CONFIG_AMD_NUMA and the related file k8topology_64.c to
amdtopology_64.c. No functional changes intended.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (34 commits)
i7core_edac: return -ENODEV when devices were already probed
i7core_edac: properly terminate pci_dev_table
i7core_edac: Avoid PCI refcount to reach zero on successive load/reload
i7core_edac: Fix refcount error at PCI devices
i7core_edac: it is safe to i7core_unregister_mci() when mci=NULL
i7core_edac: Fix an oops at i7core probe
i7core_edac: Remove unused member channels in i7core_pvt
i7core_edac: Remove unused arg csrow from get_dimm_config
i7core_edac: Reduce args of i7core_register_mci
i7core_edac: Introduce i7core_unregister_mci
i7core_edac: Use saved pointers
i7core_edac: Check probe counter in i7core_remove
i7core_edac: Call pci_dev_put() when alloc_i7core_dev() failed
i7core_edac: Fix error path of i7core_register_mci
i7core_edac: Fix order of lines in i7core_register_mci
i7core_edac: Always do get/put for all devices
i7core_edac: Introduce i7core_pci_ctl_create/release
i7core_edac: Introduce free_i7core_dev
i7core_edac: Introduce alloc_i7core_dev
i7core_edac: Reduce args of i7core_get_onedevice
...
* 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/edac: (25 commits)
i7300_edac: Properly initialize per-csrow memory size
V4L/DVB: i7300_edac: better initialize page counts
MAINTAINERS: Add maintainer for i7300-edac driver
i7300-edac: CodingStyle cleanup
i7300_edac: Improve comments
i7300_edac: Cleanup: reorganize the file contents
i7300_edac: Properly detect channel on CE errors
i7300_edac: enrich FBD error info for corrected errors
i7300_edac: enrich FBD error info for fatal errors
i7300_edac: pre-allocate a buffer used to prepare err messages
i7300_edac: Fix MTR x4/x8 detection logic
i7300_edac: Make the debug messages coherent with the others
i7300_edac: Cleanup: remove get_error_info logic
i7300_edac: Add a code to cleanup error registers
i7300_edac: Add support for reporting FBD errors
i7300_edac: Properly detect the type of error correction
i7300_edac: Detect if the device is on single mode
i7300_edac: Adds detection for enhanced scrub mode on x8
i7300_edac: Clear the error bit after reading
i7300_edac: Add error detection code for global errors
...
Due to the nature of i7core, we need to probe and attach all PCI
devices used by this driver during the first time probe is called.
However, PCI core will call the probe routine one time for each CPU
socket. If we return -EINVAL to those calls, it would seem that the
driver fails, when, in fact, there's no more devices left to initialize.
Changing the return code to -ENODEV solves this issue.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
At pci_xeon_fixup(), it waits for a null-terminated table, while at
i7core_get_all_devices, it just do a for 0..ARRAY_SIZE. As other tables
are zero-terminated, change it to be terminate with 0 as well, and fixes
a bug where it may be running out of the table elements.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
That's a nasty bug that took me a lot of time to track, and whose
solution took just one line to solve. The best fragrances and the worse
poisons are shipped on the smalest bottles.
The drivers/pci/quick.c implements the pci_get_device function. The normal
behavior is that you call it, the function returns you a pdev pointer
and increment pdev->kobj.kref.refcount of the pci device. However,
if you want to keep searching an object, you need to pass the previous
pdev function to the search.
When you use a not null pointer to pdev "from" field, pci_get_device
will decrement pdev->kobj.kref.refcount, assuming that the driver won't
be using the previous pdev.
The solution is simple: we just need to call pci_dev_get() manually,
for the pdev's that the driver will actually use.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Probably due to a bug or some testing logic at PCI level, device
refcount for <bus>:00.0 device is decremented at the end of the
pci_get_device, made by i7core_get_all_devices(). The fact is that
the first versions of the driver relied on those devices to probe
for Nehalem, but the current versions don't use it at all.
So, let's just remove those devices from the driver, making it simpler
and fixing the bug.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
changeset c91d57ba9ce5b5c93a7077e2f72510eb1f9131c4 moved the init
of the priv pointer to the end of the probe routine. However, we need
them before that, otherwise, we hit an OOPS:
[ 67.743453] EDAC DEBUG: mci_bind_devs: Associated fn 0.0, dev = ffff88011b46e000, socket 0
[ 67.751861] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 67.759685] IP: [<ffffffffa017e484>] i7core_probe+0x979/0x130c [i7core_edac]
[ 67.766721] PGD 10bd38067 PUD 10bd37067 PMD 0
[ 67.771178] Oops: 0000 [#1] SMP
[ 67.774414] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[ 67.782213] CPU 1
[ 67.784042] Modules linked in: i7core_edac(+) edac_core cpufreq_ondemand binfmt_misc dm_multipath video output pci_slot snd_hda_codd
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>