WSL2-Linux-Kernel/arch/x86/Kconfig

1813 строки
57 KiB
Plaintext
Исходник Обычный вид История

# x86 configuration
mainmenu "Linux Kernel Configuration for x86"
# Select 32 or 64 bit
config 64BIT
x86: simplify "make ARCH=x86" and fix kconfig all.config Simplify "make ARCH=x86" and fix kconfig so we again can set 64BIT in all.config. For a fix the diffstat is nice: 6 files changed, 3 insertions(+), 36 deletions(-) The patch reverts these commits: - 0f855aa64b3f63d35a891510cf7db932a435c116 ("kconfig: add helper to set config symbol from environment variable") - 2a113281f5cd2febbab21a93c8943f8d3eece4d3 ("kconfig: use $K64BIT to set 64BIT with all*config targets") Roman Zippel pointed out that kconfig supported string compares so the additional complexity introduced by the above two patches were not needed. With this patch we have following behaviour: # make {allno,allyes,allmod,rand}config [ARCH=...] option \ host arch | 32bit | 64bit ===================================================== ./. | 32bit | 64bit ARCH=x86 | 32bit | 32bit ARCH=i386 | 32bit | 32bit ARCH=x86_64 | 64bit | 64bit The general rule are that ARCH= and native architecture takes precedence over the configuration. So make ARCH=i386 [whatever] will always build a 32-bit kernel no matter what the configuration says. The configuration will be updated to 32-bit if it was configured to 64-bit and the other way around. This behaviour is consistent with previous behaviour so no suprises here. make ARCH=x86 will per default result in a 32-bit kernel but as the only ARCH= value x86 allow the user to select between 32-bit and 64-bit using menuconfig. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Andreas Herrmann <aherrman@arcor.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-17 17:37:31 +03:00
bool "64-bit kernel" if ARCH = "x86"
default ARCH = "x86_64"
help
Say yes to build a 64-bit kernel - formerly known as x86_64
Say no to build a 32-bit kernel - formerly known as i386
config X86_32
def_bool !64BIT
config X86_64
def_bool 64BIT
### Arch settings
config X86
def_bool y
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_IDE
select HAVE_OPROFILE
select HAVE_IOREMAP_PROT
x86: lockless get_user_pages_fast() Implement get_user_pages_fast without locking in the fastpath on x86. Do an optimistic lockless pagetable walk, without taking mmap_sem or any page table locks or even mmap_sem. Page table existence is guaranteed by turning interrupts off (combined with the fact that we're always looking up the current mm, means we can do the lockless page table walk within the constraints of the TLB shootdown design). Basically we can do this lockless pagetable walk in a similar manner to the way the CPU's pagetable walker does not have to take any locks to find present ptes. This patch (combined with the subsequent ones to convert direct IO to use it) was found to give about 10% performance improvement on a 2 socket 8 core Intel Xeon system running an OLTP workload on DB2 v9.5 "To test the effects of the patch, an OLTP workload was run on an IBM x3850 M2 server with 2 processors (quad-core Intel Xeon processors at 2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel. Comparing runs with and without the patch resulted in an overall performance benefit of ~9.8%. Correspondingly, oprofiles showed that samples from __up_read and __down_read routines that is seen during thread contention for system resources was reduced from 2.8% down to .05%. Monitoring the /proc/vmstat output from the patched run showed that the counter for fast_gup contained a very high number while the fast_gup_slow value was zero." (fast_gup is the old name for get_user_pages_fast, fast_gup_slow is a counter we had for the number of times the slowpath was invoked). The main reason for the improvement is that DB2 has multiple threads each issuing direct-IO. Direct-IO uses get_user_pages, and thus the threads contend the mmap_sem cacheline, and can also contend on page table locks. I would anticipate larger performance gains on larger systems, however I think DB2 uses an adaptive mix of threads and processes, so it could be that thread contention remains pretty constant as machine size increases. In which case, we stuck with "only" a 10% gain. The downside of using get_user_pages_fast is that if there is not a pte with the correct permissions for the access, we end up falling back to get_user_pages and so the get_user_pages_fast is a bit of extra work. However this should not be the common case in most performance critical code. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: Kconfig fix] [akpm@linux-foundation.org: Makefile fix/cleanup] [akpm@linux-foundation.org: warning fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 06:45:24 +04:00
select HAVE_GET_USER_PAGES_FAST
select HAVE_KPROBES
select ARCH_WANT_OPTIONAL_GPIOLIB
select HAVE_KRETPROBES
select HAVE_DYNAMIC_FTRACE
select HAVE_FTRACE
select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)
select HAVE_ARCH_KGDB if !X86_VOYAGER
select HAVE_GENERIC_DMA_COHERENT if X86_32
select HAVE_EFFICIENT_UNALIGNED_ACCESS
config ARCH_DEFCONFIG
string
default "arch/x86/configs/i386_defconfig" if X86_32
default "arch/x86/configs/x86_64_defconfig" if X86_64
config GENERIC_LOCKBREAK
x86: FIFO ticket spinlocks Introduce ticket lock spinlocks for x86 which are FIFO. The implementation is described in the comments. The straight-line lock/unlock instruction sequence is slightly slower than the dec based locks on modern x86 CPUs, however the difference is quite small on Core2 and Opteron when working out of cache, and becomes almost insignificant even on P4 when the lock misses cache. trylock is more significantly slower, but they are relatively rare. On an 8 core (2 socket) Opteron, spinlock unfairness is extremely noticable, with a userspace test having a difference of up to 2x runtime per thread, and some threads are starved or "unfairly" granted the lock up to 1 000 000 (!) times. After this patch, all threads appear to finish at exactly the same time. The memory ordering of the lock does conform to x86 standards, and the implementation has been reviewed by Intel and AMD engineers. The algorithm also tells us how many CPUs are contending the lock, so lockbreak becomes trivial and we no longer have to waste 4 bytes per spinlock for it. After this, we can no longer spin on any locks with preempt enabled and cannot reenable interrupts when spinning on an irq safe lock, because at that point we have already taken a ticket and the would deadlock if the same CPU tries to take the lock again. These are questionable anyway: if the lock happens to be called under a preempt or interrupt disabled section, then it will just have the same latency problems. The real fix is to keep critical sections short, and ensure locks are reasonably fair (which this patch does). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30 15:31:21 +03:00
def_bool n
config GENERIC_TIME
def_bool y
config GENERIC_CMOS_UPDATE
def_bool y
config CLOCKSOURCE_WATCHDOG
def_bool y
config GENERIC_CLOCKEVENTS
def_bool y
config GENERIC_CLOCKEVENTS_BROADCAST
def_bool y
depends on X86_64 || (X86_32 && X86_LOCAL_APIC)
config LOCKDEP_SUPPORT
def_bool y
config STACKTRACE_SUPPORT
def_bool y
config HAVE_LATENCYTOP_SUPPORT
def_bool y
SLUB: Alternate fast paths using cmpxchg_local Provide an alternate implementation of the SLUB fast paths for alloc and free using cmpxchg_local. The cmpxchg_local fast path is selected for arches that have CONFIG_FAST_CMPXCHG_LOCAL set. An arch should only set CONFIG_FAST_CMPXCHG_LOCAL if the cmpxchg_local is faster than an interrupt enable/disable sequence. This is known to be true for both x86 platforms so set FAST_CMPXCHG_LOCAL for both arches. Currently another requirement for the fastpath is that the kernel is compiled without preemption. The restriction will go away with the introduction of a new per cpu allocator and new per cpu operations. The advantages of a cmpxchg_local based fast path are: 1. Potentially lower cycle count (30%-60% faster) 2. There is no need to disable and enable interrupts on the fast path. Currently interrupts have to be disabled and enabled on every slab operation. This is likely avoiding a significant percentage of interrupt off / on sequences in the kernel. 3. The disposal of freed slabs can occur with interrupts enabled. The alternate path is realized using #ifdef's. Several attempts to do the same with macros and inline functions resulted in a mess (in particular due to the strange way that local_interrupt_save() handles its argument and due to the need to define macros/functions that sometimes disable interrupts and sometimes do something else). [clameter: Stripped preempt bits and disabled fastpath if preempt is enabled] Signed-off-by: Christoph Lameter <clameter@sgi.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-01-08 10:20:30 +03:00
config FAST_CMPXCHG_LOCAL
bool
default y
config MMU
def_bool y
config ZONE_DMA
def_bool y
config SBUS
bool
config GENERIC_ISA_DMA
def_bool y
config GENERIC_IOMAP
def_bool y
config GENERIC_BUG
def_bool y
depends on BUG
config GENERIC_HWEIGHT
def_bool y
config GENERIC_GPIO
def_bool n
config ARCH_MAY_HAVE_PC_FDC
def_bool y
config RWSEM_GENERIC_SPINLOCK
def_bool !X86_XADD
config RWSEM_XCHGADD_ALGORITHM
def_bool X86_XADD
config ARCH_HAS_ILOG2_U32
def_bool n
config ARCH_HAS_ILOG2_U64
def_bool n
config ARCH_HAS_CPU_IDLE_WAIT
def_bool y
config GENERIC_CALIBRATE_DELAY
def_bool y
config GENERIC_TIME_VSYSCALL
bool
default X86_64
config ARCH_HAS_CPU_RELAX
def_bool y
config ARCH_HAS_CACHE_LINE_SIZE
def_bool y
config HAVE_SETUP_PER_CPU_AREA
x86: cleanup early per cpu variables/accesses v4 * Introduce a new PER_CPU macro called "EARLY_PER_CPU". This is used by some per_cpu variables that are initialized and accessed before there are per_cpu areas allocated. ["Early" in respect to per_cpu variables is "earlier than the per_cpu areas have been setup".] This patchset adds these new macros: DEFINE_EARLY_PER_CPU(_type, _name, _initvalue) EXPORT_EARLY_PER_CPU_SYMBOL(_name) DECLARE_EARLY_PER_CPU(_type, _name) early_per_cpu_ptr(_name) early_per_cpu_map(_name, _idx) early_per_cpu(_name, _cpu) The DEFINE macro defines the per_cpu variable as well as the early map and pointer. It also initializes the per_cpu variable and map elements to "_initvalue". The early_* macros provide access to the initial map (usually setup during system init) and the early pointer. This pointer is initialized to point to the early map but is then NULL'ed when the actual per_cpu areas are setup. After that the per_cpu variable is the correct access to the variable. The early_per_cpu() macro is not very efficient but does show how to access the variable if you have a function that can be called both "early" and "late". It tests the early ptr to be NULL, and if not then it's still valid. Otherwise, the per_cpu variable is used instead: #define early_per_cpu(_name, _cpu) \ (early_per_cpu_ptr(_name) ? \ early_per_cpu_ptr(_name)[_cpu] : \ per_cpu(_name, _cpu)) A better method is to actually check the pointer manually. In the case below, numa_set_node can be called both "early" and "late": void __cpuinit numa_set_node(int cpu, int node) { int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map); if (cpu_to_node_map) cpu_to_node_map[cpu] = node; else per_cpu(x86_cpu_to_node_map, cpu) = node; } * Add a flag "arch_provides_topology_pointers" that indicates pointers to topology cpumask_t maps are available. Otherwise, use the function returning the cpumask_t value. This is useful if cpumask_t set size is very large to avoid copying data on to/off of the stack. * The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while the non-debug case has been optimized a bit. * Remove an unreferenced compiler warning in drivers/base/topology.c * Clean up #ifdef in setup.c For inclusion into sched-devel/latest tree. Based on: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git + sched-devel/latest .../mingo/linux-2.6-sched-devel.git Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-12 23:21:12 +04:00
def_bool X86_64_SMP || (X86_SMP && !X86_VOYAGER)
config HAVE_CPUMASK_OF_CPU_MAP
def_bool X86_64_SMP
config ARCH_HIBERNATION_POSSIBLE
def_bool y
depends on !SMP || !X86_VOYAGER
config ARCH_SUSPEND_POSSIBLE
def_bool y
depends on !X86_VOYAGER
config ZONE_DMA32
bool
default X86_64
config ARCH_POPULATES_NODE_MAP
def_bool y
config AUDIT_ARCH
bool
default X86_64
config ARCH_SUPPORTS_AOUT
def_bool y
config ARCH_SUPPORTS_OPTIMIZED_INLINING
def_bool y
# Use the generic interrupt handling code in kernel/irq/:
config GENERIC_HARDIRQS
bool
default y
config GENERIC_IRQ_PROBE
bool
default y
config GENERIC_PENDING_IRQ
bool
depends on GENERIC_HARDIRQS && SMP
default y
config X86_SMP
bool
depends on SMP && ((X86_32 && !X86_VOYAGER) || X86_64)
select USE_GENERIC_SMP_HELPERS
default y
config X86_32_SMP
def_bool y
depends on X86_32 && SMP
config X86_64_SMP
def_bool y
depends on X86_64 && SMP
config X86_HT
bool
depends on SMP
depends on (X86_32 && !X86_VOYAGER) || X86_64
default y
config X86_BIOS_REBOOT
bool
depends on !X86_VOYAGER
default y
config X86_TRAMPOLINE
bool
depends on X86_SMP || (X86_VOYAGER && SMP) || (64BIT && ACPI_SLEEP)
default y
config KTIME_SCALAR
def_bool X86_32
source "init/Kconfig"
menu "Processor type and features"
source "kernel/time/Kconfig"
config SMP
bool "Symmetric multi-processing support"
---help---
This enables support for systems with more than one CPU. If you have
a system with only one CPU, like most personal computers, say N. If
you have a system with more than one CPU, say Y.
If you say N here, the kernel will run on single and multiprocessor
machines, but will use only one CPU of a multiprocessor machine. If
you say Y here, the kernel will run on many, but not all,
singleprocessor machines. On a singleprocessor machine, the kernel
will run faster if you say N here.
Note that if you say Y here and choose architecture "586" or
"Pentium" under "Processor family", the kernel will not work on 486
architectures. Similarly, multiprocessor kernels for the "PPro"
architecture may not work on all Pentium based boards.
People using multiprocessor machines who say Y here should also say
Y to "Enhanced Real Time Clock Support", below. The "Advanced Power
Management" code will be disabled if you say Y here.
See also <file:Documentation/i386/IO-APIC.txt>,
<file:Documentation/nmi_watchdog.txt> and the SMP-HOWTO available at
<http://www.tldp.org/docs.html#howto>.
If you don't know what to do here, say N.
config X86_FIND_SMP_CONFIG
def_bool y
depends on X86_MPPARSE || X86_VOYAGER
if ACPI
config X86_MPPARSE
def_bool y
bool "Enable MPS table"
depends on X86_LOCAL_APIC
help
For old smp systems that do not have proper acpi support. Newer systems
(esp with 64bit cpus) with acpi support, MADT and DSDT will override it
endif
if !ACPI
config X86_MPPARSE
def_bool y
depends on X86_LOCAL_APIC
endif
choice
prompt "Subarchitecture Type"
default X86_PC
config X86_PC
bool "PC-compatible"
help
Choose this option if your computer is a standard PC or compatible.
config X86_ELAN
bool "AMD Elan"
depends on X86_32
help
Select this for an AMD Elan processor.
Do not use this option for K6/Athlon/Opteron processors!
If unsure, choose "PC-compatible" instead.
config X86_VOYAGER
bool "Voyager (NCR)"
depends on X86_32 && (SMP || BROKEN) && !PCI
help
Voyager is an MCA-based 32-way capable SMP architecture proprietary
to NCR Corp. Machine classes 345x/35xx/4100/51xx are Voyager-based.
*** WARNING ***
If you do not specifically know you have a Voyager based machine,
say N here, otherwise the kernel you build will not be bootable.
config X86_GENERICARCH
bool "Generic architecture"
depends on X86_32
help
This option compiles in the NUMAQ, Summit, bigsmp, ES7000, default
subarchitectures. It is intended for a generic binary kernel.
if you select them all, kernel will probe it one by one. and will
fallback to default.
if X86_GENERICARCH
config X86_NUMAQ
bool "NUMAQ (IBM/Sequent)"
depends on SMP && X86_32 && PCI && X86_MPPARSE
select NUMA
help
This option is used for getting Linux to run on a NUMAQ (IBM/Sequent)
NUMA multiquad box. This changes the way that processors are
bootstrapped, and uses Clustered Logical APIC addressing mode instead
of Flat Logical. You will need a new lynxer.elf file to flash your
firmware with - send email to <Martin.Bligh@us.ibm.com>.
config X86_SUMMIT
bool "Summit/EXA (IBM x440)"
depends on X86_32 && SMP
help
This option is needed for IBM systems that use the Summit/EXA chipset.
In particular, it is needed for the x440.
config X86_ES7000
bool "Support for Unisys ES7000 IA32 series"
depends on X86_32 && SMP
help
Support for Unisys ES7000 systems. Say 'Y' here if this kernel is
supposed to run on an IA32-based Unisys ES7000 system.
config X86_BIGSMP
bool "Support for big SMP systems with more than 8 CPUs"
depends on X86_32 && SMP
help
This option is needed for the systems that have more than 8 CPUs
and if the system is not of any sub-arch type above.
endif
config X86_VSMP
bool "Support for ScaleMP vSMP"
select PARAVIRT
depends on X86_64 && PCI
help
Support for ScaleMP vSMP systems. Say 'Y' here if this kernel is
supposed to run on these EM64T-based machines. Only choose this option
if you have one of these machines.
endchoice
config X86_VISWS
bool "SGI 320/540 (Visual Workstation)"
depends on X86_32 && PCI && !X86_VOYAGER && X86_MPPARSE && PCI_GODIRECT
help
The SGI Visual Workstation series is an IA32-based workstation
based on SGI systems chips with some legacy PC hardware attached.
Say Y here to create a kernel to run on the SGI 320 or 540.
A kernel compiled for the Visual Workstation will run on general
PCs as well. See <file:Documentation/sgi-visws.txt> for details.
config X86_RDC321X
bool "RDC R-321x SoC"
depends on X86_32
select M486
select X86_REBOOTFIXUPS
help
This option is needed for RDC R-321x system-on-chip, also known
as R-8610-(G).
If you don't have one of these chips, you should say N here.
config SCHED_NO_NO_OMIT_FRAME_POINTER
def_bool y
prompt "Single-depth WCHAN output"
depends on X86_32
help
Calculate simpler /proc/<PID>/wchan values. If this option
is disabled then wchan values will recurse back to the
caller function. This provides more accurate wchan values,
at the expense of slightly more scheduling overhead.
If in doubt, say "Y".
menuconfig PARAVIRT_GUEST
bool "Paravirtualized guest support"
help
Say Y here to get to see options related to running Linux under
various hypervisors. This option alone does not add any kernel code.
If you say N, all options in this submenu will be skipped and disabled.
if PARAVIRT_GUEST
source "arch/x86/xen/Kconfig"
config VMI
bool "VMI Guest support"
select PARAVIRT
depends on X86_32
depends on !X86_VOYAGER
help
VMI provides a paravirtualized interface to the VMware ESX server
(it could be used by other hypervisors in theory too, but is not
at the moment), by linking the kernel to a GPL-ed ROM module
provided by the hypervisor.
config KVM_CLOCK
bool "KVM paravirtualized clock"
select PARAVIRT
select PARAVIRT_CLOCK
depends on !X86_VOYAGER
help
Turning on this option will allow you to run a paravirtualized clock
when running over the KVM hypervisor. Instead of relying on a PIT
(or probably other) emulation by the underlying device model, the host
provides the guest with timing infrastructure such as time of day, and
system time
config KVM_GUEST
bool "KVM Guest support"
select PARAVIRT
depends on !X86_VOYAGER
help
This option enables various optimizations for running under the KVM
hypervisor.
source "arch/x86/lguest/Kconfig"
config PARAVIRT
bool "Enable paravirtualization code"
depends on !X86_VOYAGER
help
This changes the kernel so it can modify itself when it is run
under a hypervisor, potentially improving performance significantly
over full virtualization. However, when run without a hypervisor
the kernel is theoretically slower and slightly larger.
config PARAVIRT_CLOCK
bool
default n
endif
config PARAVIRT_DEBUG
bool "paravirt-ops debugging"
depends on PARAVIRT && DEBUG_KERNEL
help
Enable to debug paravirt_ops internals. Specifically, BUG if
a paravirt_op is missing when it is called.
config MEMTEST
bool "Memtest"
help
This option adds a kernel parameter 'memtest', which allows memtest
to be set.
memtest=0, mean disabled; -- default
memtest=1, mean do 1 test pattern;
...
memtest=4, mean do 4 test patterns.
If you are unsure how to answer this question, answer N.
config X86_SUMMIT_NUMA
def_bool y
depends on X86_32 && NUMA && X86_GENERICARCH
config X86_CYCLONE_TIMER
def_bool y
depends on X86_GENERICARCH
config ES7000_CLUSTERED_APIC
def_bool y
depends on SMP && X86_ES7000 && MPENTIUMIII
source "arch/x86/Kconfig.cpu"
config HPET_TIMER
def_bool X86_64
prompt "HPET Timer Support" if X86_32
help
Use the IA-PC HPET (High Precision Event Timer) to manage
time in preference to the PIT and RTC, if a HPET is
present.
HPET is the next generation timer replacing legacy 8254s.
The HPET provides a stable time base on SMP
systems, unlike the TSC, but it is more expensive to access,
as it is off-chip. You can find the HPET spec at
<http://www.intel.com/hardwaredesign/hpetspec.htm>.
You can safely choose Y here. However, HPET will only be
activated if the platform and the BIOS support this feature.
Otherwise the 8254 will be used for timing services.
Choose N to continue using the legacy 8254 timer.
config HPET_EMULATE_RTC
def_bool y
depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || RTC_DRV_CMOS=y)
# Mark as embedded because too many people got it wrong.
# The code disables itself when not needed.
config DMI
default y
bool "Enable DMI scanning" if EMBEDDED
help
Enabled scanning of DMI to identify machine quirks. Say Y
here unless you have verified that your setup is not
affected by entries in the DMI blacklist. Required by PNP
BIOS code.
config GART_IOMMU
bool "GART IOMMU support" if EMBEDDED
default y
select SWIOTLB
select AGP
depends on X86_64 && PCI
help
Support for full DMA access of devices with 32bit memory access only
on systems with more than 3GB. This is usually needed for USB,
sound, many IDE/SATA chipsets and some other devices.
Provides a driver for the AMD Athlon64/Opteron/Turion/Sempron GART
based hardware IOMMU and a software bounce buffer based IOMMU used
on Intel systems and as fallback.
The code is only active when needed (enough memory and limited
device) unless CONFIG_IOMMU_DEBUG or iommu=force is specified
too.
config CALGARY_IOMMU
bool "IBM Calgary IOMMU support"
select SWIOTLB
depends on X86_64 && PCI && EXPERIMENTAL
help
Support for hardware IOMMUs in IBM's xSeries x366 and x460
systems. Needed to run systems with more than 3GB of memory
properly with 32-bit PCI devices that do not support DAC
(Double Address Cycle). Calgary also supports bus level
isolation, where all DMAs pass through the IOMMU. This
prevents them from going anywhere except their intended
destination. This catches hard-to-find kernel bugs and
mis-behaving drivers and devices that do not use the DMA-API
properly to set up their DMA buffers. The IOMMU can be
turned off at boot time with the iommu=off parameter.
Normally the kernel will make the right choice by itself.
If unsure, say Y.
config CALGARY_IOMMU_ENABLED_BY_DEFAULT
def_bool y
prompt "Should Calgary be enabled by default?"
depends on CALGARY_IOMMU
help
Should Calgary be enabled by default? if you choose 'y', Calgary
will be used (if it exists). If you choose 'n', Calgary will not be
used even if it exists. If you choose 'n' and would like to use
Calgary anyway, pass 'iommu=calgary' on the kernel command line.
If unsure, say Y.
config AMD_IOMMU
bool "AMD IOMMU support"
select SWIOTLB
depends on X86_64 && PCI && ACPI
help
With this option you can enable support for AMD IOMMU hardware in
your system. An IOMMU is a hardware component which provides
remapping of DMA memory accesses from devices. With an AMD IOMMU you
can isolate the the DMA memory of different devices and protect the
system from misbehaving device drivers or hardware.
You can find out if your system has an AMD IOMMU if you look into
your BIOS for an option to enable it or if you have an IVRS ACPI
table.
# need this always selected by IOMMU for the VIA workaround
config SWIOTLB
bool
help
Support for software bounce buffers used on x86-64 systems
which don't have a hardware IOMMU (e.g. the current generation
of Intel's x86-64 CPUs). Using this PCI devices which can only
access 32-bits of memory can be used on systems with more than
3 GB of memory. If unsure, say Y.
config IOMMU_HELPER
def_bool (CALGARY_IOMMU || GART_IOMMU || SWIOTLB || AMD_IOMMU)
config MAXSMP
bool "Configure Maximum number of SMP Processors and NUMA Nodes"
depends on X86_64 && SMP
default n
help
Configure maximum number of CPUS and NUMA Nodes for this architecture.
If unsure, say N.
if MAXSMP
config NR_CPUS
int
default "4096"
endif
if !MAXSMP
config NR_CPUS
int "Maximum number of CPUs (2-4096)"
range 2 4096
depends on SMP
default "32" if X86_NUMAQ || X86_SUMMIT || X86_BIGSMP || X86_ES7000
default "8"
help
This allows you to specify the maximum number of CPUs which this
kernel will support. The maximum supported value is 4096 and the
minimum value which makes sense is 2.
This is purely to save memory - each supported CPU adds
approximately eight kilobytes to the kernel image.
endif
config SCHED_SMT
bool "SMT (Hyperthreading) scheduler support"
depends on X86_HT
help
SMT scheduler support improves the CPU scheduler's decision making
when dealing with Intel Pentium 4 chips with HyperThreading at a
cost of slightly increased overhead in some places. If unsure say
N here.
config SCHED_MC
def_bool y
prompt "Multi-core scheduler support"
depends on X86_HT
help
Multi-core scheduler support improves the CPU scheduler's decision
making when dealing with multi-core CPU chips at a cost of slightly
increased overhead in some places. If unsure say N here.
source "kernel/Kconfig.preempt"
config X86_UP_APIC
bool "Local APIC support on uniprocessors"
depends on X86_32 && !SMP && !(X86_VOYAGER || X86_GENERICARCH)
help
A local APIC (Advanced Programmable Interrupt Controller) is an
integrated interrupt controller in the CPU. If you have a single-CPU
system which has a processor with a local APIC, you can say Y here to
enable and use it. If you say Y here even though your machine doesn't
have a local APIC, then the kernel will still run with no slowdown at
all. The local APIC supports CPU-generated self-interrupts (timer,
performance counters), and the NMI watchdog which detects hard
lockups.
config X86_UP_IOAPIC
bool "IO-APIC support on uniprocessors"
depends on X86_UP_APIC
help
An IO-APIC (I/O Advanced Programmable Interrupt Controller) is an
SMP-capable replacement for PC-style interrupt controllers. Most
SMP systems and many recent uniprocessor systems have one.
If you have a single-CPU system with an IO-APIC, you can say Y here
to use it. If you say Y here even though your machine doesn't have
an IO-APIC, then the kernel will still run with no slowdown at all.
config X86_LOCAL_APIC
def_bool y
depends on X86_64 || (X86_32 && (X86_UP_APIC || (SMP && !X86_VOYAGER) || X86_GENERICARCH))
config X86_IO_APIC
def_bool y
depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP && !X86_VOYAGER) || X86_GENERICARCH))
config X86_VISWS_APIC
def_bool y
depends on X86_32 && X86_VISWS
config X86_MCE
bool "Machine Check Exception"
depends on !X86_VOYAGER
---help---
Machine Check Exception support allows the processor to notify the
kernel if it detects a problem (e.g. overheating, component failure).
The action the kernel takes depends on the severity of the problem,
ranging from a warning message on the console, to halting the machine.
Your processor must be a Pentium or newer to support this - check the
flags in /proc/cpuinfo for mce. Note that some older Pentium systems
have a design flaw which leads to false MCE events - hence MCE is
disabled on all P5 processors, unless explicitly enabled with "mce"
as a boot argument. Similarly, if MCE is built in and creates a
problem on some new non-standard machine, you can boot with "nomce"
to disable it. MCE support simply ignores non-MCE processors like
the 386 and 486, so nearly everyone can say Y here.
config X86_MCE_INTEL
def_bool y
prompt "Intel MCE features"
depends on X86_64 && X86_MCE && X86_LOCAL_APIC
help
Additional support for intel specific MCE features such as
the thermal monitor.
config X86_MCE_AMD
def_bool y
prompt "AMD MCE features"
depends on X86_64 && X86_MCE && X86_LOCAL_APIC
help
Additional support for AMD specific MCE features such as
the DRAM Error Threshold.
config X86_MCE_NONFATAL
tristate "Check for non-fatal errors on AMD Athlon/Duron / Intel Pentium 4"
depends on X86_32 && X86_MCE
help
Enabling this feature starts a timer that triggers every 5 seconds which
will look at the machine check registers to see if anything happened.
Non-fatal problems automatically get corrected (but still logged).
Disable this if you don't want to see these messages.
Seeing the messages this option prints out may be indicative of dying
or out-of-spec (ie, overclocked) hardware.
This option only does something on certain CPUs.
(AMD Athlon/Duron and Intel Pentium 4)
config X86_MCE_P4THERMAL
bool "check for P4 thermal throttling interrupt."
depends on X86_32 && X86_MCE && (X86_UP_APIC || SMP)
help
Enabling this feature will cause a message to be printed when the P4
enters thermal throttling.
config VM86
bool "Enable VM86 support" if EMBEDDED
default y
depends on X86_32
help
This option is required by programs like DOSEMU to run 16-bit legacy
code on X86 processors. It also may be needed by software like
XFree86 to initialize some video cards via BIOS. Disabling this
option saves about 6k.
config TOSHIBA
tristate "Toshiba Laptop support"
depends on X86_32
---help---
This adds a driver to safely access the System Management Mode of
the CPU on Toshiba portables with a genuine Toshiba BIOS. It does
not work on models with a Phoenix BIOS. The System Management Mode
is used to set the BIOS and power saving options on Toshiba portables.
For information on utilities to make use of this driver see the
Toshiba Linux utilities web site at:
<http://www.buzzard.org.uk/toshiba/>.
Say Y if you intend to run this kernel on a Toshiba portable.
Say N otherwise.
config I8K
tristate "Dell laptop support"
---help---
This adds a driver to safely access the System Management Mode
of the CPU on the Dell Inspiron 8000. The System Management Mode
is used to read cpu temperature and cooling fan status and to
control the fans on the I8K portables.
This driver has been tested only on the Inspiron 8000 but it may
also work with other Dell laptops. You can force loading on other
models by passing the parameter `force=1' to the module. Use at
your own risk.
For information on utilities to make use of this driver see the
I8K Linux utilities web site at:
<http://people.debian.org/~dz/i8k/>
Say Y if you intend to run this kernel on a Dell Inspiron 8000.
Say N otherwise.
config X86_REBOOTFIXUPS
def_bool n
prompt "Enable X86 board specific fixups for reboot"
depends on X86_32 && X86
---help---
This enables chipset and/or board specific fixups to be done
in order to get reboot to work correctly. This is only needed on
some combinations of hardware and BIOS. The symptom, for which
this config is intended, is when reboot ends with a stalled/hung
system.
Currently, the only fixup is for the Geode machines using
CS5530A and CS5536 chipsets and the RDC R-321x SoC.
Say Y if you want to enable the fixup. Currently, it's safe to
enable this option even if you don't need it.
Say N otherwise.
config MICROCODE
tristate "/dev/cpu/microcode - Intel IA32 CPU microcode support"
select FW_LOADER
---help---
If you say Y here, you will be able to update the microcode on
Intel processors in the IA32 family, e.g. Pentium Pro, Pentium II,
Pentium III, Pentium 4, Xeon etc. You will obviously need the
actual microcode binary data itself which is not shipped with the
Linux kernel.
For latest news and information on obtaining all the required
ingredients for this driver, check:
<http://www.urbanmyth.org/microcode/>.
To compile this driver as a module, choose M here: the
module will be called microcode.
config MICROCODE_OLD_INTERFACE
def_bool y
depends on MICROCODE
config X86_MSR
tristate "/dev/cpu/*/msr - Model-specific register support"
help
This device gives privileged processes access to the x86
Model-Specific Registers (MSRs). It is a character device with
major 202 and minors 0 to 31 for /dev/cpu/0/msr to /dev/cpu/31/msr.
MSR accesses are directed to a specific CPU on multi-processor
systems.
config X86_CPUID
tristate "/dev/cpu/*/cpuid - CPU information support"
help
This device gives processes access to the x86 CPUID instruction to
be executed on a specific processor. It is a character device
with major 203 and minors 0 to 31 for /dev/cpu/0/cpuid to
/dev/cpu/31/cpuid.
choice
prompt "High Memory Support"
default HIGHMEM4G if !X86_NUMAQ
default HIGHMEM64G if X86_NUMAQ
depends on X86_32
config NOHIGHMEM
bool "off"
depends on !X86_NUMAQ
---help---
Linux can use up to 64 Gigabytes of physical memory on x86 systems.
However, the address space of 32-bit x86 processors is only 4
Gigabytes large. That means that, if you have a large amount of
physical memory, not all of it can be "permanently mapped" by the
kernel. The physical memory that's not permanently mapped is called
"high memory".
If you are compiling a kernel which will never run on a machine with
more than 1 Gigabyte total physical RAM, answer "off" here (default
choice and suitable for most users). This will result in a "3GB/1GB"
split: 3GB are mapped so that each process sees a 3GB virtual memory
space and the remaining part of the 4GB virtual memory space is used
by the kernel to permanently map as much physical memory as
possible.
If the machine has between 1 and 4 Gigabytes physical RAM, then
answer "4GB" here.
If more than 4 Gigabytes is used then answer "64GB" here. This
selection turns Intel PAE (Physical Address Extension) mode on.
PAE implements 3-level paging on IA32 processors. PAE is fully
supported by Linux, PAE mode is implemented on all recent Intel
processors (Pentium Pro and better). NOTE: If you say "64GB" here,
then the kernel will not boot on CPUs that don't support PAE!
The actual amount of total physical memory will either be
auto detected or can be forced by using a kernel command line option
such as "mem=256M". (Try "man bootparam" or see the documentation of
your boot loader (lilo or loadlin) about how to pass options to the
kernel at boot time.)
If unsure, say "off".
config HIGHMEM4G
bool "4GB"
depends on !X86_NUMAQ
help
Select this if you have a 32-bit processor and between 1 and 4
gigabytes of physical RAM.
config HIGHMEM64G
bool "64GB"
depends on !M386 && !M486
select X86_PAE
help
Select this if you have a 32-bit processor and more than 4
gigabytes of physical RAM.
endchoice
choice
depends on EXPERIMENTAL
prompt "Memory split" if EMBEDDED
default VMSPLIT_3G
depends on X86_32
help
Select the desired split between kernel and user memory.
If the address range available to the kernel is less than the
physical memory installed, the remaining memory will be available
as "high memory". Accessing high memory is a little more costly
than low memory, as it needs to be mapped into the kernel first.
Note that increasing the kernel address space limits the range
available to user programs, making the address space there
tighter. Selecting anything other than the default 3G/1G split
will also likely make your kernel incompatible with binary-only
kernel modules.
If you are not absolutely sure what you are doing, leave this
option alone!
config VMSPLIT_3G
bool "3G/1G user/kernel split"
config VMSPLIT_3G_OPT
depends on !X86_PAE
bool "3G/1G user/kernel split (for full 1G low memory)"
config VMSPLIT_2G
bool "2G/2G user/kernel split"
config VMSPLIT_2G_OPT
depends on !X86_PAE
bool "2G/2G user/kernel split (for full 2G low memory)"
config VMSPLIT_1G
bool "1G/3G user/kernel split"
endchoice
config PAGE_OFFSET
hex
default 0xB0000000 if VMSPLIT_3G_OPT
default 0x80000000 if VMSPLIT_2G
default 0x78000000 if VMSPLIT_2G_OPT
default 0x40000000 if VMSPLIT_1G
default 0xC0000000
depends on X86_32
config HIGHMEM
def_bool y
depends on X86_32 && (HIGHMEM64G || HIGHMEM4G)
config X86_PAE
def_bool n
prompt "PAE (Physical Address Extension) Support"
depends on X86_32 && !HIGHMEM4G
select RESOURCES_64BIT
help
PAE is required for NX support, and furthermore enables
larger swapspace support for non-overcommit purposes. It
has the cost of more pagetable lookup overhead, and also
consumes more pagetable space per process.
# Common NUMA Features
config NUMA
bool "Numa Memory Allocation and Scheduler Support (EXPERIMENTAL)"
depends on SMP
depends on X86_64 || (X86_32 && HIGHMEM64G && (X86_NUMAQ || X86_BIGSMP || X86_SUMMIT && ACPI) && EXPERIMENTAL)
default n if X86_PC
default y if (X86_NUMAQ || X86_SUMMIT || X86_BIGSMP)
help
Enable NUMA (Non Uniform Memory Access) support.
The kernel will try to allocate memory used by a CPU on the
local memory controller of the CPU and add some more
NUMA awareness to the kernel.
For i386 this is currently highly experimental and should be only
used for kernel development. It might also cause boot failures.
For x86_64 this is recommended on all multiprocessor Opteron systems.
If the system is EM64T, you should say N unless your system is
EM64T NUMA.
comment "NUMA (Summit) requires SMP, 64GB highmem support, ACPI"
depends on X86_32 && X86_SUMMIT && (!HIGHMEM64G || !ACPI)
config K8_NUMA
def_bool y
prompt "Old style AMD Opteron NUMA detection"
depends on X86_64 && NUMA && PCI
help
Enable K8 NUMA node topology detection. You should say Y here if
you have a multi processor AMD K8 system. This uses an old
method to read the NUMA configuration directly from the builtin
Northbridge of Opteron. It is recommended to use X86_64_ACPI_NUMA
instead, which also takes priority if both are compiled in.
config X86_64_ACPI_NUMA
def_bool y
prompt "ACPI NUMA detection"
depends on X86_64 && NUMA && ACPI && PCI
select ACPI_NUMA
help
Enable ACPI SRAT based node topology detection.
# Some NUMA nodes have memory ranges that span
# other nodes. Even though a pfn is valid and
# between a node's start and end pfns, it may not
# reside on that node. See memmap_init_zone()
# for details.
config NODES_SPAN_OTHER_NODES
def_bool y
depends on X86_64_ACPI_NUMA
config NUMA_EMU
bool "NUMA emulation"
depends on X86_64 && NUMA
help
Enable NUMA emulation. A flat machine will be split
into virtual nodes when booted with "numa=fake=N", where N is the
number of nodes. This is only useful for debugging.
if MAXSMP
config NODES_SHIFT
int
default "9"
endif
if !MAXSMP
config NODES_SHIFT
int "Maximum NUMA Nodes (as a power of 2)"
range 1 9 if X86_64
default "6" if X86_64
default "4" if X86_NUMAQ
default "3"
depends on NEED_MULTIPLE_NODES
help
Specify the maximum number of NUMA Nodes available on the target
system. Increases memory reserved to accomodate various tables.
endif
config HAVE_ARCH_BOOTMEM_NODE
def_bool y
depends on X86_32 && NUMA
config ARCH_HAVE_MEMORY_PRESENT
def_bool y
depends on X86_32 && DISCONTIGMEM
config NEED_NODE_MEMMAP_SIZE
def_bool y
depends on X86_32 && (DISCONTIGMEM || SPARSEMEM)
config HAVE_ARCH_ALLOC_REMAP
def_bool y
depends on X86_32 && NUMA
config ARCH_FLATMEM_ENABLE
def_bool y
depends on X86_32 && ARCH_SELECT_MEMORY_MODEL && X86_PC && !NUMA
config ARCH_DISCONTIGMEM_ENABLE
def_bool y
x86: 64-bit, make sparsemem vmemmap the only memory model Use sparsemem as the only memory model for UP, SMP and NUMA. Measurements indicate that DISCONTIGMEM has a higher overhead than sparsemem. And FLATMEMs benefits are minimal. So I think its best to simply standardize on sparsemem. Results of page allocator tests (test can be had via git from slab git tree branch tests) Measurements in cycle counts. 1000 allocations were performed and then the average cycle count was calculated. Order FlatMem Discontig SparseMem 0 639 665 641 1 567 647 593 2 679 774 692 3 763 967 781 4 961 1501 962 5 1356 2344 1392 6 2224 3982 2336 7 4869 7225 5074 8 12500 14048 12732 9 27926 28223 28165 10 58578 58714 58682 (Note that FlatMem is an SMP config and the rest NUMA configurations) Memory use: SMP Sparsemem ------------- Kernel size: text data bss dec hex filename 3849268 397739 1264856 5511863 541ab7 vmlinux total used free shared buffers cached Mem: 8242252 41164 8201088 0 352 11512 -/+ buffers/cache: 29300 8212952 Swap: 9775512 0 9775512 SMP Flatmem ----------- Kernel size: text data bss dec hex filename 3844612 397739 1264536 5506887 540747 vmlinux So 4.5k growth in text size vs. FLATMEM. total used free shared buffers cached Mem: 8244052 40544 8203508 0 352 11484 -/+ buffers/cache: 28708 8215344 2k growth in overall memory use after boot. NUMA discontig: text data bss dec hex filename 3888124 470659 1276504 5635287 55fcd7 vmlinux total used free shared buffers cached Mem: 8256256 56908 8199348 0 352 11496 -/+ buffers/cache: 45060 8211196 Swap: 9775512 0 9775512 NUMA sparse: text data bss dec hex filename 3896428 470659 1276824 5643911 561e87 vmlinux 8k text growth. Given that we fully inline virt_to_page and friends now that is rather good. total used free shared buffers cached Mem: 8264720 57240 8207480 0 352 11516 -/+ buffers/cache: 45372 8219348 Swap: 9775512 0 9775512 The total available memory is increased by 8k. This patch makes sparsemem the default and removes discontig and flatmem support from x86. [ akpm@linux-foundation.org: allnoconfig build fix ] Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 15:30:47 +03:00
depends on NUMA && X86_32
config ARCH_DISCONTIGMEM_DEFAULT
def_bool y
x86: 64-bit, make sparsemem vmemmap the only memory model Use sparsemem as the only memory model for UP, SMP and NUMA. Measurements indicate that DISCONTIGMEM has a higher overhead than sparsemem. And FLATMEMs benefits are minimal. So I think its best to simply standardize on sparsemem. Results of page allocator tests (test can be had via git from slab git tree branch tests) Measurements in cycle counts. 1000 allocations were performed and then the average cycle count was calculated. Order FlatMem Discontig SparseMem 0 639 665 641 1 567 647 593 2 679 774 692 3 763 967 781 4 961 1501 962 5 1356 2344 1392 6 2224 3982 2336 7 4869 7225 5074 8 12500 14048 12732 9 27926 28223 28165 10 58578 58714 58682 (Note that FlatMem is an SMP config and the rest NUMA configurations) Memory use: SMP Sparsemem ------------- Kernel size: text data bss dec hex filename 3849268 397739 1264856 5511863 541ab7 vmlinux total used free shared buffers cached Mem: 8242252 41164 8201088 0 352 11512 -/+ buffers/cache: 29300 8212952 Swap: 9775512 0 9775512 SMP Flatmem ----------- Kernel size: text data bss dec hex filename 3844612 397739 1264536 5506887 540747 vmlinux So 4.5k growth in text size vs. FLATMEM. total used free shared buffers cached Mem: 8244052 40544 8203508 0 352 11484 -/+ buffers/cache: 28708 8215344 2k growth in overall memory use after boot. NUMA discontig: text data bss dec hex filename 3888124 470659 1276504 5635287 55fcd7 vmlinux total used free shared buffers cached Mem: 8256256 56908 8199348 0 352 11496 -/+ buffers/cache: 45060 8211196 Swap: 9775512 0 9775512 NUMA sparse: text data bss dec hex filename 3896428 470659 1276824 5643911 561e87 vmlinux 8k text growth. Given that we fully inline virt_to_page and friends now that is rather good. total used free shared buffers cached Mem: 8264720 57240 8207480 0 352 11516 -/+ buffers/cache: 45372 8219348 Swap: 9775512 0 9775512 The total available memory is increased by 8k. This patch makes sparsemem the default and removes discontig and flatmem support from x86. [ akpm@linux-foundation.org: allnoconfig build fix ] Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 15:30:47 +03:00
depends on NUMA && X86_32
config ARCH_SPARSEMEM_DEFAULT
def_bool y
depends on X86_64
config ARCH_SPARSEMEM_ENABLE
def_bool y
x86: 64-bit, make sparsemem vmemmap the only memory model Use sparsemem as the only memory model for UP, SMP and NUMA. Measurements indicate that DISCONTIGMEM has a higher overhead than sparsemem. And FLATMEMs benefits are minimal. So I think its best to simply standardize on sparsemem. Results of page allocator tests (test can be had via git from slab git tree branch tests) Measurements in cycle counts. 1000 allocations were performed and then the average cycle count was calculated. Order FlatMem Discontig SparseMem 0 639 665 641 1 567 647 593 2 679 774 692 3 763 967 781 4 961 1501 962 5 1356 2344 1392 6 2224 3982 2336 7 4869 7225 5074 8 12500 14048 12732 9 27926 28223 28165 10 58578 58714 58682 (Note that FlatMem is an SMP config and the rest NUMA configurations) Memory use: SMP Sparsemem ------------- Kernel size: text data bss dec hex filename 3849268 397739 1264856 5511863 541ab7 vmlinux total used free shared buffers cached Mem: 8242252 41164 8201088 0 352 11512 -/+ buffers/cache: 29300 8212952 Swap: 9775512 0 9775512 SMP Flatmem ----------- Kernel size: text data bss dec hex filename 3844612 397739 1264536 5506887 540747 vmlinux So 4.5k growth in text size vs. FLATMEM. total used free shared buffers cached Mem: 8244052 40544 8203508 0 352 11484 -/+ buffers/cache: 28708 8215344 2k growth in overall memory use after boot. NUMA discontig: text data bss dec hex filename 3888124 470659 1276504 5635287 55fcd7 vmlinux total used free shared buffers cached Mem: 8256256 56908 8199348 0 352 11496 -/+ buffers/cache: 45060 8211196 Swap: 9775512 0 9775512 NUMA sparse: text data bss dec hex filename 3896428 470659 1276824 5643911 561e87 vmlinux 8k text growth. Given that we fully inline virt_to_page and friends now that is rather good. total used free shared buffers cached Mem: 8264720 57240 8207480 0 352 11516 -/+ buffers/cache: 45372 8219348 Swap: 9775512 0 9775512 The total available memory is increased by 8k. This patch makes sparsemem the default and removes discontig and flatmem support from x86. [ akpm@linux-foundation.org: allnoconfig build fix ] Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 15:30:47 +03:00
depends on X86_64 || NUMA || (EXPERIMENTAL && X86_PC)
select SPARSEMEM_STATIC if X86_32
select SPARSEMEM_VMEMMAP_ENABLE if X86_64
config ARCH_SELECT_MEMORY_MODEL
def_bool y
x86: 64-bit, make sparsemem vmemmap the only memory model Use sparsemem as the only memory model for UP, SMP and NUMA. Measurements indicate that DISCONTIGMEM has a higher overhead than sparsemem. And FLATMEMs benefits are minimal. So I think its best to simply standardize on sparsemem. Results of page allocator tests (test can be had via git from slab git tree branch tests) Measurements in cycle counts. 1000 allocations were performed and then the average cycle count was calculated. Order FlatMem Discontig SparseMem 0 639 665 641 1 567 647 593 2 679 774 692 3 763 967 781 4 961 1501 962 5 1356 2344 1392 6 2224 3982 2336 7 4869 7225 5074 8 12500 14048 12732 9 27926 28223 28165 10 58578 58714 58682 (Note that FlatMem is an SMP config and the rest NUMA configurations) Memory use: SMP Sparsemem ------------- Kernel size: text data bss dec hex filename 3849268 397739 1264856 5511863 541ab7 vmlinux total used free shared buffers cached Mem: 8242252 41164 8201088 0 352 11512 -/+ buffers/cache: 29300 8212952 Swap: 9775512 0 9775512 SMP Flatmem ----------- Kernel size: text data bss dec hex filename 3844612 397739 1264536 5506887 540747 vmlinux So 4.5k growth in text size vs. FLATMEM. total used free shared buffers cached Mem: 8244052 40544 8203508 0 352 11484 -/+ buffers/cache: 28708 8215344 2k growth in overall memory use after boot. NUMA discontig: text data bss dec hex filename 3888124 470659 1276504 5635287 55fcd7 vmlinux total used free shared buffers cached Mem: 8256256 56908 8199348 0 352 11496 -/+ buffers/cache: 45060 8211196 Swap: 9775512 0 9775512 NUMA sparse: text data bss dec hex filename 3896428 470659 1276824 5643911 561e87 vmlinux 8k text growth. Given that we fully inline virt_to_page and friends now that is rather good. total used free shared buffers cached Mem: 8264720 57240 8207480 0 352 11516 -/+ buffers/cache: 45372 8219348 Swap: 9775512 0 9775512 The total available memory is increased by 8k. This patch makes sparsemem the default and removes discontig and flatmem support from x86. [ akpm@linux-foundation.org: allnoconfig build fix ] Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30 15:30:47 +03:00
depends on ARCH_SPARSEMEM_ENABLE
config ARCH_MEMORY_PROBE
def_bool X86_64
depends on MEMORY_HOTPLUG
source "mm/Kconfig"
config HIGHPTE
bool "Allocate 3rd-level pagetables from highmem"
depends on X86_32 && (HIGHMEM4G || HIGHMEM64G)
help
The VM uses one page table entry for each page of physical memory.
For systems with a lot of RAM, this can be wasteful of precious
low memory. Setting this option will put user-space page table
entries in high memory.
config MATH_EMULATION
bool
prompt "Math emulation" if X86_32
---help---
Linux can emulate a math coprocessor (used for floating point
operations) if you don't have one. 486DX and Pentium processors have
a math coprocessor built in, 486SX and 386 do not, unless you added
a 487DX or 387, respectively. (The messages during boot time can
give you some hints here ["man dmesg"].) Everyone needs either a
coprocessor or this emulation.
If you don't have a math coprocessor, you need to say Y here; if you
say Y here even though you have a coprocessor, the coprocessor will
be used nevertheless. (This behavior can be changed with the kernel
command line option "no387", which comes handy if your coprocessor
is broken. Try "man bootparam" or see the documentation of your boot
loader (lilo or loadlin) about how to pass options to the kernel at
boot time.) This means that it is a good idea to say Y here if you
intend to use this kernel on different machines.
More information about the internals of the Linux math coprocessor
emulation can be found in <file:arch/x86/math-emu/README>.
If you are not sure, say Y; apart from resulting in a 66 KB bigger
kernel, it won't hurt.
config MTRR
bool "MTRR (Memory Type Range Register) support"
---help---
On Intel P6 family processors (Pentium Pro, Pentium II and later)
the Memory Type Range Registers (MTRRs) may be used to control
processor access to memory ranges. This is most useful if you have
a video (VGA) card on a PCI or AGP bus. Enabling write-combining
allows bus write transfers to be combined into a larger transfer
before bursting over the PCI/AGP bus. This can increase performance
of image write operations 2.5 times or more. Saying Y here creates a
/proc/mtrr file which may be used to manipulate your processor's
MTRRs. Typically the X server should use this.
This code has a reasonably generic interface so that similar
control registers on other processors can be easily supported
as well:
The Cyrix 6x86, 6x86MX and M II processors have Address Range
Registers (ARRs) which provide a similar functionality to MTRRs. For
these, the ARRs are used to emulate the MTRRs.
The AMD K6-2 (stepping 8 and above) and K6-3 processors have two
MTRRs. The Centaur C6 (WinChip) has 8 MCRs, allowing
write-combining. All of these processors are supported by this code
and it makes sense to say Y here if you have one of them.
Saying Y here also fixes a problem with buggy SMP BIOSes which only
set the MTRRs for the boot CPU and not for the secondary CPUs. This
can lead to all sorts of problems, so it's good to say Y here.
You can safely say Y even if your machine doesn't have MTRRs, you'll
just add about 9 KB to your kernel.
See <file:Documentation/mtrr.txt> for more information.
config MTRR_SANITIZER
bool
prompt "MTRR cleanup support"
depends on MTRR
help
Convert MTRR layout from continuous to discrete, so X drivers can
add writeback entries.
Can be disabled with disable_mtrr_cleanup on the kernel command line.
The largest mtrr entry size for a continous block can be set with
mtrr_chunk_size.
If unsure, say N.
config MTRR_SANITIZER_ENABLE_DEFAULT
int "MTRR cleanup enable value (0-1)"
range 0 1
default "0"
depends on MTRR_SANITIZER
help
Enable mtrr cleanup default value
config MTRR_SANITIZER_SPARE_REG_NR_DEFAULT
int "MTRR cleanup spare reg num (0-7)"
range 0 7
default "1"
depends on MTRR_SANITIZER
help
mtrr cleanup spare entries default, it can be changed via
mtrr_spare_reg_nr=N on the kernel command line.
config X86_PAT
bool
prompt "x86 PAT support"
depends on MTRR
help
Use PAT attributes to setup page level cache control.
PATs are the modern equivalents of MTRRs and are much more
flexible than MTRRs.
Say N here if you see bootup problems (boot crash, boot hang,
spontaneous reboots) or a non-working video driver.
If unsure, say Y.
config EFI
def_bool n
prompt "EFI runtime service support"
depends on ACPI
---help---
This enables the kernel to use EFI runtime services that are
available (such as the EFI variable services).
This option is only useful on systems that have EFI firmware.
In addition, you should use the latest ELILO loader available
at <http://elilo.sourceforge.net> in order to take advantage
of EFI runtime services. However, even with this option, the
resultant kernel should continue to boot on existing non-EFI
platforms.
config IRQBALANCE
def_bool y
prompt "Enable kernel irq balancing"
depends on X86_32 && SMP && X86_IO_APIC
help
The default yes will allow the kernel to do irq load balancing.
Saying no will keep the kernel from doing irq load balancing.
config SECCOMP
def_bool y
prompt "Enable seccomp to safely compute untrusted bytecode"
depends on PROC_FS
help
This kernel feature is useful for number crunching applications
that may need to compute untrusted bytecode during their
execution. By using pipes or other transports made available to
the process as file descriptors supporting the read/write
syscalls, it's possible to isolate those applications in
their own address space using seccomp. Once seccomp is
enabled via /proc/<pid>/seccomp, it cannot be disabled
and the task is only allowed to execute a few safe syscalls
defined by each seccomp mode.
If unsure, say Y. Only embedded should say N here.
config CC_STACKPROTECTOR
bool "Enable -fstack-protector buffer overflow detection (EXPERIMENTAL)"
depends on X86_64 && EXPERIMENTAL && BROKEN
help
This option turns on the -fstack-protector GCC feature. This
feature puts, at the beginning of critical functions, a canary
value on the stack just before the return address, and validates
the value just before actually returning. Stack based buffer
overflows (that need to overwrite this return address) now also
overwrite the canary, which gets detected and the attack is then
neutralized via a kernel panic.
This feature requires gcc version 4.2 or above, or a distribution
gcc with the feature backported. Older versions are automatically
detected and for those versions, this configuration option is ignored.
config CC_STACKPROTECTOR_ALL
bool "Use stack-protector for all functions"
depends on CC_STACKPROTECTOR
help
Normally, GCC only inserts the canary value protection for
functions that use large-ish on-stack buffers. By enabling
this option, GCC will be asked to do this for ALL functions.
source kernel/Kconfig.hz
config KEXEC
bool "kexec system call"
depends on X86_BIOS_REBOOT
help
kexec is a system call that implements the ability to shutdown your
current kernel, and to start another kernel. It is like a reboot
but it is independent of the system firmware. And like a reboot
you can start any kernel with it, not just Linux.
The name comes from the similarity to the exec system call.
It is an ongoing process to be certain the hardware in a machine
is properly shutdown, so do not be surprised if this code does not
initially work for you. It may help to enable device hotplugging
support. As of this writing the exact hardware interface is
strongly in flux, so no good recommendation can be made.
config CRASH_DUMP
bool "kernel crash dumps (EXPERIMENTAL)"
depends on X86_64 || (X86_32 && HIGHMEM)
help
Generate crash dump after being started by kexec.
This should be normally only set in special crash dump kernels
which are loaded in the main kernel with kexec-tools into
a specially reserved region and then later executed after
a crash by kdump/kexec. The crash dump kernel must be compiled
to a memory address not used by the main kernel or BIOS using
PHYSICAL_START, or it must be built as a relocatable image
(CONFIG_RELOCATABLE=y).
For more details see Documentation/kdump/kdump.txt
kexec jump This patch provides an enhancement to kexec/kdump. It implements the following features: - Backup/restore memory used by the original kernel before/after kexec. - Save/restore CPU state before/after kexec. The features of this patch can be used as a general method to call program in physical mode (paging turning off). This can be used to call BIOS code under Linux. kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 Usage example of calling some physical mode code and return: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_KEXEC=y CONFIG_PM=y CONFIG_KEXEC_JUMP=y 2. Build patched kexec-tool or download the pre-built one. 3. Build some physical mode executable named such as "phy_mode" 4. Boot kernel compiled in step 1. 5. Load physical mode executable with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context --args-none phy_mode 6. Call physical mode executable with following shell command line: /sbin/kexec -e Implementation point: To support jumping without reserving memory. One shadow backup page (source page) is allocated for each page used by kexeced code image (destination page). When do kexec_load, the image of kexeced code is loaded into source pages, and before executing, the destination pages and the source pages are swapped, so the contents of destination pages are backupped. Before jumping to the kexeced code image and after jumping back to the original kernel, the destination pages and the source pages are swapped too. C ABI (calling convention) is used as communication protocol between kernel and called code. A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to indicate that the loaded kernel image is used for jumping back. Now, only the i386 architecture is supported. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 06:45:07 +04:00
config KEXEC_JUMP
bool "kexec jump (EXPERIMENTAL)"
depends on EXPERIMENTAL
kexec jump: save/restore device state This patch implements devices state save/restore before after kexec. This patch together with features in kexec_jump patch can be used for following: - A simple hibernation implementation without ACPI support. You can kexec a hibernating kernel, save the memory image of original system and shutdown the system. When resuming, you restore the memory image of original system via ordinary kexec load then jump back. - Kernel/system debug through making system snapshot. You can make system snapshot, jump back, do some thing and make another system snapshot. - Cooperative multi-kernel/system. With kexec jump, you can switch between several kernels/systems quickly without boot process except the first time. This appears like swap a whole kernel/system out/in. - A general method to call program in physical mode (paging turning off). This can be used to invoke BIOS code under Linux. The following user-space tools can be used with kexec jump: - kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 - makedumpfile with patches are used as memory image saving tool, it can exclude free pages from original kernel memory image file. The patches and the precompiled makedumpfile can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10 - An initramfs image can be used as the root file system of kexeced kernel. An initramfs image built with "BuildRoot" can be downloaded from the following URL: initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz All user space tools above are included in the initramfs image. Usage example of simple hibernation: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_RELOCATABLE=y CONFIG_KEXEC=y CONFIG_CRASH_DUMP=y CONFIG_PM=y CONFIG_HIBERNATION=y CONFIG_KEXEC_JUMP=y 2. Build an initramfs image contains kexec-tool and makedumpfile, or download the pre-built initramfs image, called rootfs.gz in following text. 3. Prepare a partition to save memory image of original kernel, called hibernating partition in following text. 4. Boot kernel compiled in step 1 (kernel A). 5. In the kernel A, load kernel compiled in step 1 (kernel B) with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000 --mem-max=0xffffff --initrd=rootfs.gz 6. Boot the kernel B with following shell command line: /sbin/kexec -e 7. The kernel B will boot as normal kexec. In kernel B the memory image of kernel A can be saved into hibernating partition as follow: jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '='` echo $jump_back_entry > kexec_jump_back_entry cp /proc/vmcore dump.elf Then you can shutdown the machine as normal. 8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as root file system. 9. In kernel C, load the memory image of kernel A as follow: /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf 10. Jump back to the kernel A as follow: /sbin/kexec -e Then, kernel A is resumed. Implementation point: To support jumping between two kernels, before jumping to (executing) the new kernel and jumping back to the original kernel, the devices are put into quiescent state, and the state of devices and CPU is saved. After jumping back from kexeced kernel and jumping to the new kernel, the state of devices and CPU are restored accordingly. The devices/CPU state save/restore code of software suspend is called to implement corresponding function. Known issues: - Because the segment number supported by sys_kexec_load is limited, hibernation image with many segments may not be load. This is planned to be eliminated by adding a new flag to sys_kexec_load to make a image can be loaded with multiple sys_kexec_load invoking. Now, only the i386 architecture is supported. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 06:45:10 +04:00
depends on KEXEC && HIBERNATION && X86_32
kexec jump This patch provides an enhancement to kexec/kdump. It implements the following features: - Backup/restore memory used by the original kernel before/after kexec. - Save/restore CPU state before/after kexec. The features of this patch can be used as a general method to call program in physical mode (paging turning off). This can be used to call BIOS code under Linux. kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 Usage example of calling some physical mode code and return: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_KEXEC=y CONFIG_PM=y CONFIG_KEXEC_JUMP=y 2. Build patched kexec-tool or download the pre-built one. 3. Build some physical mode executable named such as "phy_mode" 4. Boot kernel compiled in step 1. 5. Load physical mode executable with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context --args-none phy_mode 6. Call physical mode executable with following shell command line: /sbin/kexec -e Implementation point: To support jumping without reserving memory. One shadow backup page (source page) is allocated for each page used by kexeced code image (destination page). When do kexec_load, the image of kexeced code is loaded into source pages, and before executing, the destination pages and the source pages are swapped, so the contents of destination pages are backupped. Before jumping to the kexeced code image and after jumping back to the original kernel, the destination pages and the source pages are swapped too. C ABI (calling convention) is used as communication protocol between kernel and called code. A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to indicate that the loaded kernel image is used for jumping back. Now, only the i386 architecture is supported. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 06:45:07 +04:00
help
kexec jump: save/restore device state This patch implements devices state save/restore before after kexec. This patch together with features in kexec_jump patch can be used for following: - A simple hibernation implementation without ACPI support. You can kexec a hibernating kernel, save the memory image of original system and shutdown the system. When resuming, you restore the memory image of original system via ordinary kexec load then jump back. - Kernel/system debug through making system snapshot. You can make system snapshot, jump back, do some thing and make another system snapshot. - Cooperative multi-kernel/system. With kexec jump, you can switch between several kernels/systems quickly without boot process except the first time. This appears like swap a whole kernel/system out/in. - A general method to call program in physical mode (paging turning off). This can be used to invoke BIOS code under Linux. The following user-space tools can be used with kexec jump: - kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 - makedumpfile with patches are used as memory image saving tool, it can exclude free pages from original kernel memory image file. The patches and the precompiled makedumpfile can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10 - An initramfs image can be used as the root file system of kexeced kernel. An initramfs image built with "BuildRoot" can be downloaded from the following URL: initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz All user space tools above are included in the initramfs image. Usage example of simple hibernation: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_RELOCATABLE=y CONFIG_KEXEC=y CONFIG_CRASH_DUMP=y CONFIG_PM=y CONFIG_HIBERNATION=y CONFIG_KEXEC_JUMP=y 2. Build an initramfs image contains kexec-tool and makedumpfile, or download the pre-built initramfs image, called rootfs.gz in following text. 3. Prepare a partition to save memory image of original kernel, called hibernating partition in following text. 4. Boot kernel compiled in step 1 (kernel A). 5. In the kernel A, load kernel compiled in step 1 (kernel B) with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000 --mem-max=0xffffff --initrd=rootfs.gz 6. Boot the kernel B with following shell command line: /sbin/kexec -e 7. The kernel B will boot as normal kexec. In kernel B the memory image of kernel A can be saved into hibernating partition as follow: jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '='` echo $jump_back_entry > kexec_jump_back_entry cp /proc/vmcore dump.elf Then you can shutdown the machine as normal. 8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as root file system. 9. In kernel C, load the memory image of kernel A as follow: /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf 10. Jump back to the kernel A as follow: /sbin/kexec -e Then, kernel A is resumed. Implementation point: To support jumping between two kernels, before jumping to (executing) the new kernel and jumping back to the original kernel, the devices are put into quiescent state, and the state of devices and CPU is saved. After jumping back from kexeced kernel and jumping to the new kernel, the state of devices and CPU are restored accordingly. The devices/CPU state save/restore code of software suspend is called to implement corresponding function. Known issues: - Because the segment number supported by sys_kexec_load is limited, hibernation image with many segments may not be load. This is planned to be eliminated by adding a new flag to sys_kexec_load to make a image can be loaded with multiple sys_kexec_load invoking. Now, only the i386 architecture is supported. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 06:45:10 +04:00
Jump between original kernel and kexeced kernel and invoke
code in physical address mode via KEXEC
kexec jump This patch provides an enhancement to kexec/kdump. It implements the following features: - Backup/restore memory used by the original kernel before/after kexec. - Save/restore CPU state before/after kexec. The features of this patch can be used as a general method to call program in physical mode (paging turning off). This can be used to call BIOS code under Linux. kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 Usage example of calling some physical mode code and return: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_KEXEC=y CONFIG_PM=y CONFIG_KEXEC_JUMP=y 2. Build patched kexec-tool or download the pre-built one. 3. Build some physical mode executable named such as "phy_mode" 4. Boot kernel compiled in step 1. 5. Load physical mode executable with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context --args-none phy_mode 6. Call physical mode executable with following shell command line: /sbin/kexec -e Implementation point: To support jumping without reserving memory. One shadow backup page (source page) is allocated for each page used by kexeced code image (destination page). When do kexec_load, the image of kexeced code is loaded into source pages, and before executing, the destination pages and the source pages are swapped, so the contents of destination pages are backupped. Before jumping to the kexeced code image and after jumping back to the original kernel, the destination pages and the source pages are swapped too. C ABI (calling convention) is used as communication protocol between kernel and called code. A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to indicate that the loaded kernel image is used for jumping back. Now, only the i386 architecture is supported. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 06:45:07 +04:00
config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
default "0x1000000" if X86_NUMAQ
default "0x200000" if X86_64
default "0x100000"
help
This gives the physical address where the kernel is loaded.
If kernel is a not relocatable (CONFIG_RELOCATABLE=n) then
bzImage will decompress itself to above physical address and
run from there. Otherwise, bzImage will run from the address where
it has been loaded by the boot loader and will ignore above physical
address.
In normal kdump cases one does not have to set/change this option
as now bzImage can be compiled as a completely relocatable image
(CONFIG_RELOCATABLE=y) and be used to load and run from a different
address. This option is mainly useful for the folks who don't want
to use a bzImage for capturing the crash dump and want to use a
vmlinux instead. vmlinux is not relocatable hence a kernel needs
to be specifically compiled to run from a specific memory area
(normally a reserved region) and this option comes handy.
So if you are using bzImage for capturing the crash dump, leave
the value here unchanged to 0x100000 and set CONFIG_RELOCATABLE=y.
Otherwise if you plan to use vmlinux for capturing the crash dump
change this value to start of the reserved region (Typically 16MB
0x1000000). In other words, it can be set based on the "X" value as
specified in the "crashkernel=YM@XM" command line boot parameter
passed to the panic-ed kernel. Typically this parameter is set as
crashkernel=64M@16M. Please take a look at
Documentation/kdump/kdump.txt for more details about crash dumps.
Usage of bzImage for capturing the crash dump is recommended as
one does not have to build two kernels. Same kernel can be used
as production kernel and capture kernel. Above option should have
gone away after relocatable bzImage support is introduced. But it
is present because there are users out there who continue to use
vmlinux for dump capture. This option should go away down the
line.
Don't change this unless you know what you are doing.
config RELOCATABLE
bool "Build a relocatable kernel (EXPERIMENTAL)"
depends on EXPERIMENTAL
help
This builds a kernel image that retains relocation information
so it can be loaded someplace besides the default 1MB.
The relocations tend to make the kernel binary about 10% larger,
but are discarded at runtime.
One use is for the kexec on panic case where the recovery kernel
must live at a different physical address than the primary
kernel.
Note: If CONFIG_RELOCATABLE=y, then the kernel runs from the address
it has been loaded at and the compile time physical address
(CONFIG_PHYSICAL_START) is ignored.
config PHYSICAL_ALIGN
hex
prompt "Alignment value to which kernel should be aligned" if X86_32
default "0x100000" if X86_32
default "0x200000" if X86_64
range 0x2000 0x400000
help
This value puts the alignment restrictions on physical address
where kernel is loaded and run from. Kernel is compiled for an
address which meets above alignment restriction.
If bootloader loads the kernel at a non-aligned address and
CONFIG_RELOCATABLE is set, kernel will move itself to nearest
address aligned to above value and run from there.
If bootloader loads the kernel at a non-aligned address and
CONFIG_RELOCATABLE is not set, kernel will ignore the run time
load address and decompress itself to the address it has been
compiled for and run from there. The address for which kernel is
compiled already meets above alignment restrictions. Hence the
end result is that kernel runs from a physical address meeting
above alignment restrictions.
Don't change this unless you know what you are doing.
config HOTPLUG_CPU
bool "Support for suspend on SMP and hot-pluggable CPUs (EXPERIMENTAL)"
depends on SMP && HOTPLUG && EXPERIMENTAL && !X86_VOYAGER
---help---
Say Y here to experiment with turning CPUs off and on, and to
enable suspend on SMP systems. CPUs can be controlled through
/sys/devices/system/cpu.
Say N if you want to disable CPU hotplug and don't need to
suspend.
config COMPAT_VDSO
def_bool y
prompt "Compat VDSO support"
depends on X86_32 || IA32_EMULATION
help
Map the 32-bit VDSO to the predictable old-style address too.
---help---
Say N here if you are running a sufficiently recent glibc
version (2.3.3 or later), to remove the high-mapped
VDSO mapping and to exclusively use the randomized VDSO.
If unsure, say Y.
endmenu
config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y
depends on X86_64 || (X86_32 && HIGHMEM)
config HAVE_ARCH_EARLY_PFN_TO_NID
def_bool X86_64
depends on NUMA
menu "Power management options"
depends on !X86_VOYAGER
config ARCH_HIBERNATION_HEADER
def_bool y
depends on X86_64 && HIBERNATION
source "kernel/power/Kconfig"
source "drivers/acpi/Kconfig"
config X86_APM_BOOT
bool
default y
depends on APM || APM_MODULE
menuconfig APM
tristate "APM (Advanced Power Management) BIOS support"
depends on X86_32 && PM_SLEEP
---help---
APM is a BIOS specification for saving power using several different
techniques. This is mostly useful for battery powered laptops with
APM compliant BIOSes. If you say Y here, the system time will be
reset after a RESUME operation, the /proc/apm device will provide
battery status information, and user-space programs will receive
notification of APM "events" (e.g. battery status change).
If you select "Y" here, you can disable actual use of the APM
BIOS by passing the "apm=off" option to the kernel at boot time.
Note that the APM support is almost completely disabled for
machines with more than one CPU.
In order to use APM, you will need supporting software. For location
and more information, read <file:Documentation/power/pm.txt> and the
Battery Powered Linux mini-HOWTO, available from
<http://www.tldp.org/docs.html#howto>.
This driver does not spin down disk drives (see the hdparm(8)
manpage ("man 8 hdparm") for that), and it doesn't turn off
VESA-compliant "green" monitors.
This driver does not support the TI 4000M TravelMate and the ACER
486/DX4/75 because they don't have compliant BIOSes. Many "green"
desktop machines also don't have compliant BIOSes, and this driver
may cause those machines to panic during the boot phase.
Generally, if you don't have a battery in your machine, there isn't
much point in using this driver and you should say N. If you get
random kernel OOPSes or reboots that don't seem to be related to
anything, try disabling/enabling this option (or disabling/enabling
APM in your BIOS).
Some other things you should try when experiencing seemingly random,
"weird" problems:
1) make sure that you have enough swap space and that it is
enabled.
2) pass the "no-hlt" option to the kernel
3) switch on floating point emulation in the kernel and pass
the "no387" option to the kernel
4) pass the "floppy=nodma" option to the kernel
5) pass the "mem=4M" option to the kernel (thereby disabling
all but the first 4 MB of RAM)
6) make sure that the CPU is not over clocked.
7) read the sig11 FAQ at <http://www.bitwizard.nl/sig11/>
8) disable the cache from your BIOS settings
9) install a fan for the video card or exchange video RAM
10) install a better fan for the CPU
11) exchange RAM chips
12) exchange the motherboard.
To compile this driver as a module, choose M here: the
module will be called apm.
if APM
config APM_IGNORE_USER_SUSPEND
bool "Ignore USER SUSPEND"
help
This option will ignore USER SUSPEND requests. On machines with a
compliant APM BIOS, you want to say N. However, on the NEC Versa M
series notebooks, it is necessary to say Y because of a BIOS bug.
config APM_DO_ENABLE
bool "Enable PM at boot time"
---help---
Enable APM features at boot time. From page 36 of the APM BIOS
specification: "When disabled, the APM BIOS does not automatically
power manage devices, enter the Standby State, enter the Suspend
State, or take power saving steps in response to CPU Idle calls."
This driver will make CPU Idle calls when Linux is idle (unless this
feature is turned off -- see "Do CPU IDLE calls", below). This
should always save battery power, but more complicated APM features
will be dependent on your BIOS implementation. You may need to turn
this option off if your computer hangs at boot time when using APM
support, or if it beeps continuously instead of suspending. Turn
this off if you have a NEC UltraLite Versa 33/C or a Toshiba
T400CDT. This is off by default since most machines do fine without
this feature.
config APM_CPU_IDLE
bool "Make CPU Idle calls when idle"
help
Enable calls to APM CPU Idle/CPU Busy inside the kernel's idle loop.
On some machines, this can activate improved power savings, such as
a slowed CPU clock rate, when the machine is idle. These idle calls
are made after the idle loop has run for some length of time (e.g.,
333 mS). On some machines, this will cause a hang at boot time or
whenever the CPU becomes idle. (On machines with more than one CPU,
this option does nothing.)
config APM_DISPLAY_BLANK
bool "Enable console blanking using APM"
help
Enable console blanking using the APM. Some laptops can use this to
turn off the LCD backlight when the screen blanker of the Linux
virtual console blanks the screen. Note that this is only used by
the virtual console screen blanker, and won't turn off the backlight
when using the X Window system. This also doesn't have anything to
do with your VESA-compliant power-saving monitor. Further, this
option doesn't work for all laptops -- it might not turn off your
backlight at all, or it might print a lot of errors to the console,
especially if you are using gpm.
config APM_ALLOW_INTS
bool "Allow interrupts during APM BIOS calls"
help
Normally we disable external interrupts while we are making calls to
the APM BIOS as a measure to lessen the effects of a badly behaving
BIOS implementation. The BIOS should reenable interrupts if it
needs to. Unfortunately, some BIOSes do not -- especially those in
many of the newer IBM Thinkpads. If you experience hangs when you
suspend, try setting this to Y. Otherwise, say N.
config APM_REAL_MODE_POWER_OFF
bool "Use real mode APM BIOS call to power off"
help
Use real mode APM BIOS calls to switch off the computer. This is
a work-around for a number of buggy BIOSes. Switch this option on if
your computer crashes instead of powering off properly.
endif # APM
source "arch/x86/kernel/cpu/cpufreq/Kconfig"
source "drivers/cpuidle/Kconfig"
endmenu
menu "Bus options (PCI etc.)"
config PCI
bool "PCI support"
default y
select ARCH_SUPPORTS_MSI if (X86_LOCAL_APIC && X86_IO_APIC)
help
Find out whether you have a PCI motherboard. PCI is the name of a
bus system, i.e. the way the CPU talks to the other stuff inside
your box. Other bus systems are ISA, EISA, MicroChannel (MCA) or
VESA. If you have PCI, say Y, otherwise N.
choice
prompt "PCI access mode"
depends on X86_32 && PCI
default PCI_GOANY
---help---
On PCI systems, the BIOS can be used to detect the PCI devices and
determine their configuration. However, some old PCI motherboards
have BIOS bugs and may crash if this is done. Also, some embedded
PCI-based systems don't have any BIOS at all. Linux can also try to
detect the PCI hardware directly without using the BIOS.
With this option, you can specify how Linux should detect the
PCI devices. If you choose "BIOS", the BIOS will be used,
if you choose "Direct", the BIOS won't be used, and if you
choose "MMConfig", then PCI Express MMCONFIG will be used.
If you choose "Any", the kernel will try MMCONFIG, then the
direct access method and falls back to the BIOS if that doesn't
work. If unsure, go with the default, which is "Any".
config PCI_GOBIOS
bool "BIOS"
config PCI_GOMMCONFIG
bool "MMConfig"
config PCI_GODIRECT
bool "Direct"
config PCI_GOOLPC
bool "OLPC"
depends on OLPC
config PCI_GOANY
bool "Any"
endchoice
config PCI_BIOS
def_bool y
depends on X86_32 && PCI && (PCI_GOBIOS || PCI_GOANY)
# x86-64 doesn't support PCI BIOS access from long mode so always go direct.
config PCI_DIRECT
def_bool y
depends on PCI && (X86_64 || (PCI_GODIRECT || PCI_GOANY || PCI_GOOLPC))
config PCI_MMCONFIG
def_bool y
depends on X86_32 && PCI && ACPI && (PCI_GOMMCONFIG || PCI_GOANY)
config PCI_OLPC
def_bool y
depends on PCI && OLPC && (PCI_GOOLPC || PCI_GOANY)
config PCI_DOMAINS
def_bool y
depends on PCI
config PCI_MMCONFIG
bool "Support mmconfig PCI config space access"
depends on X86_64 && PCI && ACPI
config DMAR
bool "Support for DMA Remapping Devices (EXPERIMENTAL)"
depends on X86_64 && PCI_MSI && ACPI && EXPERIMENTAL
help
DMA remapping (DMAR) devices support enables independent address
translations for Direct Memory Access (DMA) from devices.
These DMA remapping devices are reported via ACPI tables
and include PCI device scope covered by these DMA
remapping devices.
config DMAR_GFX_WA
def_bool y
prompt "Support for Graphics workaround"
depends on DMAR
help
Current Graphics drivers tend to use physical address
for DMA and avoid using DMA APIs. Setting this config
option permits the IOMMU driver to set a unity map for
all the OS-visible memory. Hence the driver can continue
to use physical addresses for DMA.
config DMAR_FLOPPY_WA
def_bool y
depends on DMAR
help
Floppy disk drivers are know to bypass DMA API calls
thereby failing to work when IOMMU is enabled. This
workaround will setup a 1:1 mapping for the first
16M to make floppy (an ISA device) work.
source "drivers/pci/pcie/Kconfig"
source "drivers/pci/Kconfig"
# x86_64 have no ISA slots, but do have ISA-style DMA.
config ISA_DMA_API
def_bool y
if X86_32
config ISA
bool "ISA support"
depends on !X86_VOYAGER
help
Find out whether you have ISA slots on your motherboard. ISA is the
name of a bus system, i.e. the way the CPU talks to the other stuff
inside your box. Other bus systems are PCI, EISA, MicroChannel
(MCA) or VESA. ISA is an older system, now being displaced by PCI;
newer boards don't support it. If you have ISA, say Y, otherwise N.
config EISA
bool "EISA support"
depends on ISA
---help---
The Extended Industry Standard Architecture (EISA) bus was
developed as an open alternative to the IBM MicroChannel bus.
The EISA bus provided some of the features of the IBM MicroChannel
bus while maintaining backward compatibility with cards made for
the older ISA bus. The EISA bus saw limited use between 1988 and
1995 when it was made obsolete by the PCI bus.
Say Y here if you are building a kernel for an EISA-based machine.
Otherwise, say N.
source "drivers/eisa/Kconfig"
config MCA
bool "MCA support" if !X86_VOYAGER
default y if X86_VOYAGER
help
MicroChannel Architecture is found in some IBM PS/2 machines and
laptops. It is a bus system similar to PCI or ISA. See
<file:Documentation/mca.txt> (and especially the web page given
there) before attempting to build an MCA bus kernel.
source "drivers/mca/Kconfig"
config SCx200
tristate "NatSemi SCx200 support"
depends on !X86_VOYAGER
help
This provides basic support for National Semiconductor's
(now AMD's) Geode processors. The driver probes for the
PCI-IDs of several on-chip devices, so its a good dependency
for other scx200_* drivers.
If compiled as a module, the driver is named scx200.
config SCx200HR_TIMER
tristate "NatSemi SCx200 27MHz High-Resolution Timer Support"
depends on SCx200 && GENERIC_TIME
default y
help
This driver provides a clocksource built upon the on-chip
27MHz high-resolution timer. Its also a workaround for
NSC Geode SC-1100's buggy TSC, which loses time when the
processor goes idle (as is done by the scheduler). The
other workaround is idle=poll boot option.
config GEODE_MFGPT_TIMER
def_bool y
prompt "Geode Multi-Function General Purpose Timer (MFGPT) events"
depends on MGEODE_LX && GENERIC_TIME && GENERIC_CLOCKEVENTS
help
This driver provides a clock event source based on the MFGPT
timer(s) in the CS5535 and CS5536 companion chip for the geode.
MFGPTs have a better resolution and max interval than the
generic PIT, and are suitable for use as high-res timers.
config OLPC
bool "One Laptop Per Child support"
default n
help
Add support for detecting the unique features of the OLPC
XO hardware.
endif # X86_32
config K8_NB
def_bool y
depends on AGP_AMD64 || (X86_64 && (GART_IOMMU || (PCI && NUMA)))
source "drivers/pcmcia/Kconfig"
source "drivers/pci/hotplug/Kconfig"
endmenu
menu "Executable file formats / Emulations"
source "fs/Kconfig.binfmt"
config IA32_EMULATION
bool "IA32 Emulation"
depends on X86_64
select COMPAT_BINFMT_ELF
help
Include code to run 32-bit programs under a 64-bit kernel. You should
likely turn this on, unless you're 100% sure that you don't have any
32-bit programs left.
config IA32_AOUT
tristate "IA32 a.out support"
depends on IA32_EMULATION && ARCH_SUPPORTS_AOUT
help
Support old a.out binaries in the 32bit emulation.
config COMPAT
def_bool y
depends on IA32_EMULATION
config COMPAT_FOR_U64_ALIGNMENT
def_bool COMPAT
depends on X86_64
config SYSVIPC_COMPAT
def_bool y
depends on X86_64 && COMPAT && SYSVIPC
endmenu
source "net/Kconfig"
source "drivers/Kconfig"
source "drivers/firmware/Kconfig"
source "fs/Kconfig"
source "arch/x86/Kconfig.debug"
source "security/Kconfig"
source "crypto/Kconfig"
source "arch/x86/kvm/Kconfig"
source "lib/Kconfig"