Add a hook so that the paravirt backend knows when the allocator is
ready. This is useful for the obvious reason that the allocator is
available, but the other side-effect of having the bootmem allocator
available is that each page now has an associated "struct page".
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
It's useful to know which mm is allocating a pagetable. Xen uses this
to determine whether the pagetable being added to is pinned or not.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Use existing elfnote.h to generate vsyscall notes, rather than doing
it locally. Changes elfnote.h a bit to suit, since this is the first
asm user, and it wasn't quite right.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.com>
Rather than using a tri-state integer for the wait flag in
call_usermodehelper_exec, define a proper enum, and use that. I've
preserved the integer values so that any callers I've missed should
still work OK.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: David Howells <dhowells@redhat.com>
fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called ->fallocate().
Applications can use this feature to avoid fragmentation to certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later the
the system becomes full.
Currently, glibc provides an interface called posix_fallocate() which
can be used for similar cause. Though this has the advantage of working
on all file systems, but it is quite slow (since it writes zeroes to
each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and incase the kernel/filesystem does not implement it, it should fall
back to the current implementation of writing zeroes to the new blocks.
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
and ppc). Patches for s390(x) and ia64 are already available from
previous posts, but it was decided that they should be added later
once fallocate is in the mainline. Hence not including those patches
in this take.
2. Changes to glibc,
a) to support fallocate() system call
b) to make posix_fallocate() and posix_fallocate64() call fallocate()
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Mark variables with uninitialized_var() if such a warning appears,
and analysis proves that the var is initialized properly on all paths
it is used.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (80 commits)
KVM: Use CPU_DYING for disabling virtualization
KVM: Tune hotplug/suspend IPIs
KVM: Keep track of which cpus have virtualization enabled
SMP: Allow smp_call_function_single() to current cpu
i386: Allow smp_call_function_single() to current cpu
x86_64: Allow smp_call_function_single() to current cpu
HOTPLUG: Adapt thermal throttle to CPU_DYING
HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING
HOTPLUG: Add CPU_DYING notifier
KVM: Clean up #includes
KVM: Remove kvmfs in favor of the anonymous inodes source
KVM: SVM: Reliably detect if SVM was disabled by BIOS
KVM: VMX: Remove unnecessary code in vmx_tlb_flush()
KVM: MMU: Fix Wrong tlb flush order
KVM: VMX: Reinitialize the real-mode tss when entering real mode
KVM: Avoid useless memory write when possible
KVM: Fix x86 emulator writeback
KVM: Add support for in-kernel pio handlers
KVM: VMX: Fix interrupt checking on lightweight exit
KVM: Adds support for in-kernel mmio handlers
...
Allow fbcon to select the primary display adapter using the
fb_is_primary_device() arch-specific helper. If a a primary adapter is
detected, fbcon will unbind the old adapter from the VT layer, then rebind
using the new adapter. This requires that bind_/unbind_con_driver() be made
public.
Because this feature may produce unexpected behavior (from the user's POV),
this must be explicitly enabled in Kconfig.
[akpm@linux-foundation.org: export unbind_con_driver]
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add function helper, fb_is_primary_device(). Given struct fb_info, it will
return a nonzero value if the device is the primary display.
Currently, only the i386 is supported where the function checks for the
IORESOURCE_ROM_SHADOW flag.
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid dirtying remote cpu's memory if it already has the correct value.
Cc: Andi Kleen <ak@suse.de>
Cc: Konrad Rzeszutek <konrad@darnok.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Based on usage and testing over the past couple of years, kprobes on
i386, ia64, powerpc and x86_64 is no longer EXPERIMENTAL.
This is a follow-up to Robert P.J. Day's patch making "Instrumentation
support" non-EXPERIMENTAL:
http://marc.info/?l=linux-kernel&m=118396955423812&w=2
Arch maintainers for sparc64, avr32 and s390 need to take a similar call.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Identical implementations of PTRACE_POKEDATA go into generic_ptrace_pokedata()
function.
AFAICS, fix bug on xtensa where successful PTRACE_POKEDATA will nevertheless
return EPERM.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.
It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.
The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current generic bug implementation has a call to dump_stack() in case a
WARN_ON(whatever) gets hit. Since report_bug(), which calls dump_stack(),
gets called from an exception handler we can do better: just pass the
pt_regs structure to report_bug() and pass it to show_regs() in case of a
warning. This will give more debug informations like register contents,
etc... In addition this avoids some pointless lines that dump_stack()
emits, since it includes a stack backtrace of the exception handler which
is of no interest in case of a warning. E.g. on s390 the following lines
are currently always present in a stack backtrace if dump_stack() gets
called from report_bug():
[<000000000001517a>] show_trace+0x92/0xe8)
[<0000000000015270>] show_stack+0xa0/0xd0
[<00000000000152ce>] dump_stack+0x2e/0x3c
[<0000000000195450>] report_bug+0x98/0xf8
[<0000000000016cc8>] illegal_op+0x1fc/0x21c
[<00000000000227d6>] sysc_return+0x0/0x10
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This follows a suggestion from Chuck Ebbert on how to make seccomp
absolutely zerocost in schedule too. The only remaining footprint of
seccomp is in terms of the bzImage size that becomes a few bytes (perhaps
even a few kbytes) larger, measure it if you care in the embedded.
Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make a "menuconfig" out of the Kconfig objects "menu, ..., endmenu",
so that the user can disable all the options in that menu at once
instead of having to disable each option separately.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: Philippe Elie <phil.el@wanadoo.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Needed to get fixed virtual address for USB debug and earlycon with mmio.
Signed-off-by: Eric W. Biderman <ebiderman@xmisson.com>
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes the requirement for callers to get_cpu() to check in simple
cases.
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Based on a patch from Joachim which didn't apply, so I fixed
it up by hand, and also corrected the surrounding indentation
a little.
Signed-off-by: Joachim.Deguara <joachim.deguara@amd.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch contains the overdue removal of X86_SPEEDSTEP_CENTRINO_ACPI.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
On some motherboards ACPI C3 is available, but it isn't
causing frequency transition on VIA Nehemiah. Longhaul
wasn't working at all earlier, but due to
scaling_cur_speed returning true CPU frequency now, it
looks like CPU is getting stuck at highest frequency
since 2.6.21. I didn't find a reason. Halt is causing
frequency transition.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Fix sysfs_create_file return value handling
[CPUFREQ] ondemand: fix tickless accounting and software coordination bug
[CPUFREQ] ondemand: add a check to avoid negative load calculation
[CPUFREQ] Keep userspace governor quiet when it is not being used
[CPUFREQ] Longhaul - Proper register access
[CPUFREQ] Kconfig powernow-k8 driver should depend on ACPI P-States driver
[CPUFREQ] Longhaul - Replace ACPI functions with direct I/O
[CPUFREQ] Longhaul - Remove duplicate multipliers
[CPUFREQ] Longhaul - Embedded "conservative"
[CPUFREQ] acpi-cpufreq: Proper ReadModifyWrite of PERF_CTL MSR
[CPUFREQ] check return value of sysfs_create_file
[CPUFREQ] Longhaul - Check ACPI "BM DMA in progress" bit
[CPUFREQ] Longhaul - Move old_ratio to correct place
[CPUFREQ] Longhaul - VT8237 support
[CPUFREQ] Longhaul - Use all kinds of support
[CPUFREQ] powernow-k8: clarify number of cores.
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (34 commits)
PCI: Only build PCI syscalls on architectures that want them
PCI: limit pci_get_bus_and_slot to domain 0
PCI: hotplug: acpiphp: avoid acpiphp "cannot get bridge info" PCI hotplug failure
PCI: hotplug: acpiphp: remove hot plug parameter write to PCI host bridge
PCI: hotplug: acpiphp: fix slot poweroff problem on systems without _PS3
PCI: hotplug: pciehp: wait for 1 second after power off slot
PCI: pci_set_power_state(): check for PM capabilities earlier
PCI: cpci_hotplug: Convert to use the kthread API
PCI: add pci_try_set_mwi
PCI: pcie: remove SPIN_LOCK_UNLOCKED
PCI: ROUND_UP macro cleanup in drivers/pci
PCI: remove pci_dac_dma_... APIs
PCI: pci-x-pci-express-read-control-interfaces cleanups
PCI: Fix typo in include/linux/pci.h
PCI: pci_ids, remove double or more empty lines
PCI: pci_ids, add atheros and 3com_2 vendors
PCI: pci_ids, reorder some entries
PCI: i386: traps, change VENDOR to DEVICE
PCI: ATM: lanai, change VENDOR to DEVICE
PCI: Change all drivers to use pci_device->revision
...
This removes the old i386 setup code. This is done as a separate patch
to avoid breaking git bisect as some of the i386 code was also used by
the old x86-64 code.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch hooks the new x86 setup code into the Makefile machinery. It
also adapts boot/tools/build.c to a two-file (as opposed to three-file)
universe, and simplifies it substantially.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linker script to define the layout of the new x86 setup code.
Includes assert for size overflow and a misaligned setup header.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The assembly header and initialization code, and the main() routine.
main.c also contains some miscellaneous very short routines.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the code which actually does the switch to protected mode,
including all preparation. It is also responsible for invoking the
boot loader hooks, if present.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Video mode probing for the new x86 setup code. This code breaks down
different drivers into modules. This code deliberately drops support
for a lot of the vendor-specific mode probing present in the assembly
version, since a lot of those probes have been found to be stale in
current versions of those chips -- frequently, support for those modes
have been dropped from recent video BIOSes due to space constraints,
but the video BIOS signatures are still the same.
However, additional drivers should be extremely straightforward to plug
in, if desirable.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Voyager support for the new x86 setup code. This implements the same
functionality as the assembly version.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MCA probing support for the new x86 setup code. This implements the
same functionality as the assembly version.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Probe EDD and MBR signatures, in order to make it easier to map
physical hard drives to BIOS drives.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Verify that the CPU has enough features to run the kernel. This may
entail enabling features on some CPUs.
By doing this in the setup code we can be guaranteed to still be able to
write to the console through the BIOS.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Module which only includes the kernel version string.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This implements writing text to the console, including printf().
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simple command-line parser which allows us to access the kernel command
line from the setup code.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
APM probing code for the new x86 setup code. This implements the
same functionality as the assembly version.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A20 handling code for the new x86 setup code. This implements the same
algorithms as the assembly version.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
strcmp(), memcpy(), memset(), as well as routines to copy to and from
other segments (as pointed to by fs and gs).
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A simple collection of bitops for the new x86 setup code.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Top header file for the new x86 setup code.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc for i386 can be used with the assembly prefix ".code16gcc" to generate
16-bit (real-mode) code. This header file provides the assembly prefix.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make struct boot_params a real structure, and remove the handling of
some obsolete fields, in particular hd*_info, which was only used by
the ST-506 driver, and likely to be wrong for that driver on any
modern BIOS.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make definitions for struct e820entry and struct e820map
consistent between i386 and x86-64.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The relocatable kernel code needs a scratch field for the decompressor
to determine its own location. It was using a location inside
struct screen_info; reserve a free location and document it as scratch
instead.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some Intel features are spread around in different CPUID leafs like 0x5,
0x6 and 0xA. Make this feature detection code common across i386 and
x86_64.
Display Intel Dynamic Acceleration feature in /proc/cpuinfo. This feature
will be enabled automatically by current acpi-cpufreq driver.
Refer to Intel Software Developer's Manual for more details about the feature.
Thanks to hpa (H Peter Anvin) for the making the actual code detecting the
scattered features data-driven.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The X86_MINIMUM_CPU_MODEL name isn't really right, so change it to
X86_MINIMUM_CPU_FAMILY. Also, the default minimum should be 3, not 0.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Unify the handling of the CPU features vectors between i386 and x86-64.
This also adopts the collapsing of features which are required at
compile-time into constant tests from x86-64 to i386.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
traps, change VENDOR to DEVICE
Change macro for SGI lithium (arch/i386/mach-visws/traps.c) device from
VENDOR to DEVICE, because it's a device id.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Instead of all drivers reading pci config space to get the revision
ID, they can now use the pci_device->revision member.
This exposes some issues where drivers where reading a word or a dword
for the revision number, and adding useless error-handling around the
read. Some drivers even just read it for no purpose of all.
In devices where the revision ID is being copied over and used in what
appears to be the equivalent of hotpath, I have left the copy code
and the cached copy as not to influence the driver's performance.
Compile tested with make all{yes,mod}config on x86_64 and i386.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
track TSC-unstable events and propagate it to the scheduler code.
Also allow sched_clock() to be used when the TSC is unstable,
the rq_clock() wrapper creates a reliable clock out of it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
the SMP load-balancer uses the boot-time migration-cost estimation
code to attempt to improve the quality of balancing. The reason for
this code is that the discrete priority queues do not preserve
the order of scheduling accurately, so the load-balancer skips
tasks that were running on a CPU 'recently'.
this code is fundamental fragile: the boot-time migration cost detector
doesnt really work on systems that had large L3 caches, it caused boot
delays on large systems and the whole cache-hot concept made the
balancing code pretty undeterministic as well.
(and hey, i wrote most of it, so i can say it out loud that it sucks ;-)
under CFS the same purpose of cache affinity can be achieved without
any special cache-hot special-case: tasks are sorted in the 'timeline'
tree and the SMP balancer picks tasks from the left side of the
tree, thus the most cache-cold task is balanced automatically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The printk level in this printk is bogus, as the previous printk
didn't have a terminating \n resulting in ..
Intel E7520/7320/7525 detected.<6>Disabling irq balancing and affinity
It also never printed a \n at all in the case where we didn't do
the quirk.
Change it to only make noise if it actually does something useful.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Writing to MSR 0x51400017 forces a hard reset on CS5536-based machines,
this has the reboot fixup do just that if such a board is detected.
Acked-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
o Commit 1833d6bc72 broke the build if
compiled with CONFIG_ES7000=y and CONFIG_X86_GENERICARCH=n
arch/i386/kernel/built-in.o(.init.text+0x4fa9): In function `acpi_parse_madt':
: undefined reference to `acpi_madt_oem_check'
arch/i386/kernel/built-in.o(.init.text+0x7406): In function `smp_read_mpc':
: undefined reference to `mps_oem_check'
arch/i386/kernel/built-in.o(.init.text+0x8990): In function
`connect_bsp_APIC':
: undefined reference to `enable_apic_mode'
make: *** [.tmp_vmlinux1] Error 1
o Fix the build issue. Provided the definitions of missing functions.
o Don't have ES7000 machine. Only compile tested.
Cc: Len Brown <lenb@kernel.org>
Cc: Natalie Protasevich <protasnb@gmail.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Processors synchronization in set_mtrr requires the .gate field to be set
after .count field is properly initialized. Without an explicit barrier,
the compiler was reordering those memory stores. That was sometimes
causing a processor (in ipi_handler) to see the .gate change and decrement
.count before the latter is set by set_mtrr() (which then hangs in a
infinite loop with irqs disabled).
Signed-off-by: Loic Prylli <loic@myri.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The commit 635cf99a80 introduced a
regression. Executing a ptrace single step after certain int80
accesses will infinitely loop and never advance the PC.
The TIF_SINGLESTEP check should be done on the return from the syscall
and not before it.
I loops on each single step on the pop right after the int80 which writes out
to the console. At that point you can issue as many single steps as you want
and it will not advance any further.
The test case is below:
/* Test whether singlestep through an int80 syscall works.
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <asm/user.h>
#include <string.h>
static int child, status;
static struct user_regs_struct regs;
static void do_child()
{
char str[80] = "child: int80 test\n";
ptrace(PTRACE_TRACEME, 0, 0, 0);
kill(getpid(), SIGUSR1);
write(fileno(stdout),str,strlen(str));
asm ("int $0x80" : : "a" (20)); /* getpid */
}
static void do_parent()
{
unsigned long eip, expected = 0;
again:
waitpid(child, &status, 0);
if (WIFEXITED(status) || WIFSIGNALED(status))
return;
if (WIFSTOPPED(status)) {
ptrace(PTRACE_GETREGS, child, 0, ®s);
eip = regs.eip;
if (expected)
fprintf(stderr, "child stop @ %08lx, expected %08lx %s\n",
eip, expected,
eip == expected ? "" : " <== ERROR");
if (*(unsigned short *)eip == 0x80cd) {
fprintf(stderr, "int 0x80 at %08x\n", (unsigned int)eip);
expected = eip + 2;
} else
expected = 0;
ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
}
goto again;
}
int main(int argc, char * const argv[])
{
child = fork();
if (child)
do_parent();
else
do_child();
return 0;
}
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: <stable@kernel.org>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When Andi reverted the HPET resource reservation (in commit
0f8dc2f065), he didn't remove the now
unused variables, which just causes gcc to be noisy.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With this change it works again when the nmi watchdog is disabled.
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthias Lenk reports that the PCI subsystem would move the HPET on
SB400/SB600-based systems, where the HPET is in BAR1 of the SMbus
controller.
The reason? The ACPI layer registered the PCI MMIO range as being busy
too early, before PCI enumeration had happened, causing the PCI layer to
decide that it should relocate the resources somewhere else.
Firmware resources should be marked busy _after_ the PCI enumeration and
probing has happened, not before.
Remove the too-early reservation, we'll fix it up to do it properly
later. In the meantime, this solves the regression.
Tested-by: Matthias Lenk <matthias.lenk@amd.com>
Cc: Aaron Durbin <adurbin@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 3ebad59056 ("[PATCH] x86: Save and
restore the fixed-range MTRRs of the BSP when suspending") added mtrr
operations without verifying that the CPU has MTRRs. Crashes transmeta
CPUs.
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux@horizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 9215da3320 "fixed" the MTRR range
check to not allow any MTRR's under the 1MB mark (since that's where the
fixed MTRR's are active).
However, that was totally bogus, since it's normal (and almost required)
to have a large variable MTRR that starts at 0, and covers some large
percentage of the whole RAM, and then using the fixed MTRR's to override
that large MTRR to handle the special ISA hole in the 640k-1M region.
The old check was bogus too (checking that no variable MTRR is used that
is entirely under the 1MB range), but at least it wasn't actively
detrimental, because no sane situation would ever trigger such MTRR
usage in the first place.
That said, the whole notion of not allowing variable MTRR's in the low
1MB is just stupid, so rather than revert the commit, this just removes
the whole sad and unnecessary check entirely.
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Luca Palermo <darkmage@sabayonlinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Register %ebx serves as the "global offset table base register" for
position-independent code. For absolute code, %ebx serves as a local
register and has no specified role in the function calling sequence. In
either case, a function must preserve the register value for the caller.
acpi_copy_wakeup_routine overrides %ebx without saving it, this may corrupt
the called data.
Kevin found that most time the value of Sx is saved in %esi, however
sometimes compiler also uses %ebx. When this happens, suspends fails since
sleep value in ebx is changed by acpi_copy_wakeup_routine.
The same funtion in X86_64 doesn't have this problem.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Looks-okay-to: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is only used for PAE kernels in set_64bit.
The problem is that due to a old Windows bug many CPUs need magic MSRs
to enable CMPXCHG64, and we can't do that nicely early enough before
it is potentially used.
But since we only need it in PAE kernels so only force the checking
for CMPXCHG65 with PAE.
This fixes a boot failure on Transmeta Crusoe
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Do not mark the kernel text read only if KPROBES is in the kernel;
kprobes needs to hot-patch the kernel text to insert it's
instrumentation.
In this case, only mark the .rodata segment as read only.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: S. P. Prasanna <prasanna@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: William Cohen <wcohen@redhat.com>
Cc: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In previous commit I used u32 for u16 register.
This code will work only when ACPI block address is set.
For now it is only for VT8235 and VT8237.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
WARNING: arch/x86_64/kernel/built-in.o(.text+0xace9): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/x86_64/kernel/built-in.o(.text+0xad09): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: arch/x86_64/kernel/built-in.o(.text+0xad38): Section mismatch: reference to .init.text: (between 'get_mtrr_state' and 'mtrr_wrmsr')
WARNING: drivers/built-in.o(.text+0x3a680): Section mismatch: reference to .init.text:acpi_map_pxm_to_node (between 'acpi_get_node' and 'acpi_lock_ac_dir')
AK: also marked mtrr_bp_init __init to avoid some more warnings
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Disable CLFLUSH again; it is still broken. Always do WBINVD.
- Always flush in the i386 case, not only when there are deferred pages.
These are both brute-force inefficient fixes, to be improved
next release cycle.
The changes to i386 are a little more extensive than strictly
needed (some dead code added), but it is more similar to the x86-64 version
now and the dead code will be used soon.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Right now Kprobes cannot write to the write protected kernel text when
DEBUG_RODATA is enabled. Disallow this in Kconfig for now.
Temporary fix for 2.6.22. In .23 add code to temporarily
unprotect it.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Several reports that VIA bridges don't support DAC and corrupt
data. I don't know if it's fixed, but let's just blacklist
them all for now.
It can be overwritten with iommu=usedac
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix oops triggered during: echo 0 > /proc/sys/kernel/nmi_watchdog
The culprit seems to be 09198e68501a7e34737cd9264d266f42429abcdc:
[PATCH] i386: Clean up NMI watchdog code
In two places, the parameters to release_{evntsel,perfctr}_nmi
got interchanged during the cleanup.
Fix interchanged parameters to release_{evntsel,perfctr}_nmi.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When disabled through /proc/sys/kernel/nmi_watchdog, the NMI watchdog uses the
stop() method directly, which does not decrement the activity counter, leading
to a BUG(). Use the wrapper function instead to fix that.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At system boot time, the NMI watchdog no longer reserved its MSRs, allowing
other subsystems to mess with them. Fix that.
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix various bits of obviously-busted code which we're not happening to
compile, due to ifdefs.
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a minor fix, but what is currently there is essentially wrong.
In do_page_fault, if the faulting address from user code happens to be
in kernel address space (int *p = (int*)-1; p = 0xbed;) then the
do_page_fault handler will jump over the local_irq_enable with the
goto bad_area_nosemaphore;
But the first line there sees this is user code and goes through the
process of sending a signal to send SIGSEGV to the user task. This whole
time interrupts are disabled and the task can not be preempted by a
higher priority task.
This patch always enables interrupts in the user path of the
bad_area_nosemaphore.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
powernow-k8 really needs to use ACPI to function on SMP systems.
The current Kconfig allows us to build kernels which fail mysteriously
for some users due to us trying to automatically enable this, and
getting it wrong. It's easier to just present this as an option
to the user.
Signed-off-by: Joshua Hoblitt <jhoblitt@ifa.hawaii.edu>
Signed-off-by: Dave Jones <davej@redhat.com>
Current version of "bm status" bit test works as long as
no USB device is in use. When USB device is plugged in ACPI
function in this context is always returning 1. Until reboot.
Direct I/O is working fine even when many USB devices are
connected.
Change bm_timeout value to less annoying. 1000 is still much
more then worst case observed and it is much better when status
bit gets stuck.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Rafael gets this on an SMP box with kernel preemption enabled, during
hibernation and restore (100% of the time):
Enabling non-boot CPUs ...
BUG: using smp_processor_id() in preemptible [00000001] code: bash/4514
caller is mtrr_save_state+0x9/0x40
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the following section mismatch warnings in microcode.c:
WARNING: arch/i386/kernel/built-in.o(.init.text+0x3966): Section mismatch: reference to .exit.text: (between 'microcode_init' and 'parse_maxcpus')
WARNING: arch/i386/kernel/built-in.o(.init.text+0x3992): Section mismatch: reference to .exit.text: (between 'microcode_init' and 'parse_maxcpus')
The warning are caused by a function marked __init that
calls a function marked __exit.
Functions marked __exit may be discarded either during link or run-time
and thus the reference is not good.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Force Dell E520 to use the BIOS to shutdown/reboot.
I have at least one report that this patch fixes shutdown/reboot
problems on the Dell E520 platform.
(Andi says: People can always set the boot option. It hardly seems like a
critical issue needing a backport.)
Signed-off-by: Tim Gardner <tim.gardner@ubuntu.com>
Acked-by: Andi Kleen <ak@suse.de>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chuck reports that the recent fix from Andi to oprofile
6c977aad03 introduces a double free. Each
cpu's cpu_msrs is setup to point to cpu 0's, which causes free_msrs to free
cpu 0's pointers for_each_possible_cpu. Rather than copy the pointers, do
a deep copy instead.
[acme@redhat.com: allocate_msrs() was using for_each_online_cpu()]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dave Jones <davej@redhat.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfgang gets:
PCI: Cannot allocate resource region 0 of device 0000:00:04.0
PCI: Error while updating region 0000:00:04.0/0 (a8008000 != fec08000)
Note that the BAR seems to have high address bits hardwired to fec00000.
And device 0000:00:04.0 is
00:04.0 System peripheral: Siemens Nixdorf AG FSC Multiprocessor Interrupt Controller (rev 02)
I'd guess that when we try to reassign this resource, PCI interrupts might
just stop working. This could explain SCSI timeouts and other weird things.
Cc: Wolfgang Erig <Wolfgang.Erig@gmx.de>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jarek Poplawski noted that boot_cpu_data.x86_cache_size is signed int
and can be < 0 too.
In fact we test for it. Except we assigned it to an unsigned value..
Cc: Jarek Poplawski <jarkao2@o2.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove duplicate multipliers in clock_ratio table. On 1,4GHz
Nehemiah two frequencies are present twice in table. It isn't
fatal, but with voltage scaling enabled each will be set twice.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Longhaul with voltage scaling enabled works great on Ezra
CPU (Longhaul ver. 2). As long as "conservative" governor is
used. Both "ondemand" and "userspace" can change voltage
from min to max at once. Motherboard unfortunatly turns off
when vid difference is big. Longhaul was printing warning
message, but it is not enough. Now driver will have
"conservative" governor built in and will split bigger
changes to smaller ones.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
During recent acpi-cpufreq changes, writing to PERF_CTL msr
changed from RMW of entire 64 bit to RMW of low 32 bit and clearing of
upper 32 bit. Fix it back to do a proper RMW of the MSR.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
It is good idea to wait for PCI bus to become idle before
frequency change. Thanks to ACPI it is possible. It makes
sense only when northbridge support is in use because it is
only case in which we can disable arbiter after check if PCI
bus is busy.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Move one line where it should be. After first transition
Longhaul will skip frequency transition if destination
frequency is already set.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
Looks like VT8237 has the same bits which VT8235 has.
Poke registers if it is found.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch is removing southbridge support as separate
kind of support. Instead it is used to make other kinds
of support more stable. Also northbridge and ACPI C3
support both will be used if both are available.
Signed-off-by: Rafal Bilski <rafalbilski@interia.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
WARNING: arch/i386/mach-generic/built-in.o(.data+0xc4): Section mismatch: reference to .init.text: (between 'apic_bigsmp' and 'cpu.4905')
This appears to be resolvable by removing all the __init and __initdata
qualifiers from arch/i386/mach-generic/bigsmp.c
Signed-off-by: William Irwin <bill.irwin@oracle.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's not sufficiently documented that CONFIG_HOTPLUG_CPU is required for
suspend/hibernation on SMP.
Point out the non-obvious.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>