When booting a kexec/kdump kernel on a system that has specific memory
hotplug regions the boot will fail with warnings like:
swapper/0: page allocation failure: order:9, mode:0x84d0
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-65.el7.x86_64 #1
Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011
0000000000000000 ffff8800341bd8c8 ffffffff815bcc67 ffff8800341bd950
ffffffff8113b1a0 ffff880036339b00 0000000000000009 00000000000084d0
ffff8800341bd950 ffffffff815b87ee 0000000000000000 0000000000000200
Call Trace:
[<ffffffff815bcc67>] dump_stack+0x19/0x1b
[<ffffffff8113b1a0>] warn_alloc_failed+0xf0/0x160
[<ffffffff815b87ee>] ? __alloc_pages_direct_compact+0xac/0x196
[<ffffffff8113f14f>] __alloc_pages_nodemask+0x7ff/0xa00
[<ffffffff815b417c>] vmemmap_alloc_block+0x62/0xba
[<ffffffff815b41e9>] vmemmap_alloc_block_buf+0x15/0x3b
[<ffffffff815b1ff6>] vmemmap_populate+0xb4/0x21b
[<ffffffff815b461d>] sparse_mem_map_populate+0x27/0x35
[<ffffffff815b400f>] sparse_add_one_section+0x7a/0x185
[<ffffffff815a1e9f>] __add_pages+0xaf/0x240
[<ffffffff81047359>] arch_add_memory+0x59/0xd0
[<ffffffff815a21d9>] add_memory+0xb9/0x1b0
[<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
[<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
[<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
[<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
[<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
[<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
[<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6
[<ffffffff81a1fd58>] ? acpi_sleep_proc_init+0x2a/0x2a
[<ffffffff810020e2>] do_one_initcall+0xe2/0x190
[<ffffffff819e20c4>] kernel_init_freeable+0x17c/0x207
[<ffffffff819e18d0>] ? do_early_param+0x88/0x88
[<ffffffff8159fea0>] ? rest_init+0x80/0x80
[<ffffffff8159feae>] kernel_init+0xe/0x180
[<ffffffff815cca2c>] ret_from_fork+0x7c/0xb0
[<ffffffff8159fea0>] ? rest_init+0x80/0x80
Mem-Info:
Node 0 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: hi: 42, btch: 7 usd: 0
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
free:872 slab_reclaimable:13 slab_unreclaimable:1880
mapped:0 shmem:0 pagetables:0 bounce:0
free_cma:0
because the system has run out of memory at boot time. This occurs
because of the following sequence in the boot:
Main kernel boots and sets E820 map. The second kernel is booted with a
map generated by the kdump service using memmap= and memmap=exactmap.
These parameters are added to the kernel parameters of the kexec/kdump
kernel. The kexec/kdump kernel has limited memory resources so as not
to severely impact the main kernel.
The system then panics and the kdump/kexec kernel boots (which is a
completely new kernel boot). During this boot ACPI is initialized and the
kernel (as can be seen above) traverses the ACPI namespace and finds an
entry for a memory device to be hotadded.
ie)
[<ffffffff815a1e9f>] __add_pages+0xaf/0x240
[<ffffffff81047359>] arch_add_memory+0x59/0xd0
[<ffffffff815a21d9>] add_memory+0xb9/0x1b0
[<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
[<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
[<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
[<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
[<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
[<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
[<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
[<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6
At this point the kernel adds page table information and the the kexec/kdump
kernel runs out of memory.
This can also be reproduced by using the memmap=exactmap and mem=X
parameters on the main kernel and booting.
This patchset resolves the problem by adding a kernel parameter,
acpi_no_memhotplug, to disable ACPI memory hotplug.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add disable_cpu_apicid kernel parameter. To use this kernel parameter,
specify an initial APIC ID of the corresponding CPU you want to
disable.
This is mostly used for the kdump 2nd kernel to disable BSP to wake up
multiple CPUs without causing system reset or hang due to sending INIT
from AP to BSP.
Kdump users first figure out initial APIC ID of the BSP, CPU0 in the
1st kernel, for example from /proc/cpuinfo and then set up this kernel
parameter for the 2nd kernel using the obtained APIC ID.
However, doing this procedure at each boot time manually is awkward,
which should be automatically done by user-land service scripts, for
example, kexec-tools on fedora/RHEL distributions.
This design is more flexible than disabling BSP in kernel boot time
automatically in that in kernel boot time we have no choice but
referring to ACPI/MP table to obtain initial APIC ID for BSP, meaning
that the method is not applicable to the systems without such BIOS
tables.
One assumption behind this design is that users get initial APIC ID of
the BSP in still healthy state and so BSP is uniquely kept in
CPU0. Thus, through the kernel parameter, only one initial APIC ID can
be specified.
In a comparison with disabled_cpu_apicid, we use read_apic_id(), not
boot_cpu_physical_apicid, because on some platforms, the variable is
modified to the apicid reported as BSP through MP table and this
function is executed with the temporarily modified
boot_cpu_physical_apicid. As a result, disabled_cpu_apicid kernel
parameter doesn't work well for apicids of APs.
Fixing the wrong handling of boot_cpu_physical_apicid requires some
reviews and tests beyond some platforms and it could take some
time. The fix here is a kind of workaround to focus on the main topic
of this patch.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/20140115064458.1545.38775.stgit@localhost6.localdomain6
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The default audit_backlog_limit is 64. This was a reasonable limit at one time.
systemd causes so much audit queue activity on startup that auditd doesn't
start before the backlog queue has already overflowed by more than a factor of
2. On a system with audit= not set on the kernel command line, this isn't an
issue since that history isn't kept for auditd when it is available. On a
system with audit=1 set on the kernel command line, kaudit tries to keep that
history until auditd is able to drain the queue.
This default can be changed by the "-b" option in audit.rules once the system
has booted, but won't help with lost messages on boot.
One way to solve this would be to increase the default backlog queue size to
avoid losing any messages before auditd is able to consume them. This would
be overkill to the embedded community and insufficient for some servers.
Another way to solve it might be to add a kconfig option to set the default
based on the system type. An embedded system would get the current (or
smaller) default, while Workstations might get more than now and servers might
get more.
None of these solutions helps if a system's compiled default is too small to
see the lost messages without compiling a new kernel.
This patch adds a kernel set-up parameter (audit already has one to
enable/disable it) "audit_backlog_limit=<n>" that overrides the default to
allow the system administrator to set the backlog limit.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Add the "audit=" kernel start-up parameter to Documentation/kernel-parameters.txt.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
This new parameter is used to control how to report HW error reporting,
especially for newer Intel platform, like Ivybridge-EX, which contains
an enhanced error decoding functionality in the firmware, i.e. eMCA.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1386310630-12529-2-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
In doc, it said that 'Currently supported controllers - "memory"',
but actually we can use cgroup_disable=cpu,cpuset and all other
controllers, so this is confusing for cgroup users without much
cgroup knowledge. We need to make it clear.
[some comments copied from Paul Menage's original patch 8bab8dded]
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Acked-by: Rob Landley <rob@landley.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
groundwork for kexec support on EFI - Borislav Petkov
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJSfMDjAAoJEC84WcCNIz1V0LIP/2WyTbJR6bL0HXwnGLpxmxag
v0VgnKRhypNboA3WEu4a9as6AdExqB7qsiWIipHuDSMj/vkfZgHAKTd2f1iRxmsJ
RZxzwV6YzvsWkdXjvpCoLWKSsvQDug++BAIti1PitW6RXRjo01t3ymo/Ho1CQrpI
hNJbB3bbihMF+uqFvdSpO0KZtZE6EtnylrfBeuo0GzqqJdTGe1MmqlWmyUlEy5JW
ZiHV8E/xTjh3N675tWPcT9hGVfCyOXXu/kPrXsJTXrdYyZL9qgA9b8SVRLs6DctX
wVgL9lNv4wobsmZJ5DxkYl9+TaF7rbshUeIJbzrQyMVJjb3TpXk/ZpspDMAEjL7e
bb76c1bAx4xZuUatR/f1ykkWKAEryAhXHkvwcbIBjebW33if1MgGJLk5udJQQv6H
j+J9ROH38MDr0Geg+pM2RnCyTz8l+q+8Mfu4Yh9TSte+ttB6fr9phs3/G+fdSUn1
0vI627v1HWzDcBh4eZjjslzJviR8PldsGVT3EsIaOnHGtk/9FPz/7n4efph4v7+9
yqTkLvQHxsAx7f0tR/qRpkEIQ9WTMXO0IO79OC13QTSATJSl+WSPTJM7ccqOgn+P
h89ssBnzlwmFHkuvTi599KVHdzOZrWTsB2zROn+NnSchJ+YAY6TXwznk3/MNo9rB
d+euVcffL4yfWZ7Bzj8Z
=w6ff
-----END PGP SIGNATURE-----
Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi
Pull EFI virtual mapping changes from Matt Fleming:
* New static EFI runtime services virtual mapping layout which is
groundwork for kexec support on EFI. (Borislav Petkov)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull security subsystem updates from James Morris:
"In this patchset, we finally get an SELinux update, with Paul Moore
taking over as maintainer of that code.
Also a significant update for the Keys subsystem, as well as
maintenance updates to Smack, IMA, TPM, and Apparmor"
and since I wanted to know more about the updates to key handling,
here's the explanation from David Howells on that:
"Okay. There are a number of separate bits. I'll go over the big bits
and the odd important other bit, most of the smaller bits are just
fixes and cleanups. If you want the small bits accounting for, I can
do that too.
(1) Keyring capacity expansion.
KEYS: Consolidate the concept of an 'index key' for key access
KEYS: Introduce a search context structure
KEYS: Search for auth-key by name rather than target key ID
Add a generic associative array implementation.
KEYS: Expand the capacity of a keyring
Several of the patches are providing an expansion of the capacity of a
keyring. Currently, the maximum size of a keyring payload is one page.
Subtract a small header and then divide up into pointers, that only gives
you ~500 pointers on an x86_64 box. However, since the NFS idmapper uses
a keyring to store ID mapping data, that has proven to be insufficient to
the cause.
Whatever data structure I use to handle the keyring payload, it can only
store pointers to keys, not the keys themselves because several keyrings
may point to a single key. This precludes inserting, say, and rb_node
struct into the key struct for this purpose.
I could make an rbtree of records such that each record has an rb_node
and a key pointer, but that would use four words of space per key stored
in the keyring. It would, however, be able to use much existing code.
I selected instead a non-rebalancing radix-tree type approach as that
could have a better space-used/key-pointer ratio. I could have used the
radix tree implementation that we already have and insert keys into it by
their serial numbers, but that means any sort of search must iterate over
the whole radix tree. Further, its nodes are a bit on the capacious side
for what I want - especially given that key serial numbers are randomly
allocated, thus leaving a lot of empty space in the tree.
So what I have is an associative array that internally is a radix-tree
with 16 pointers per node where the index key is constructed from the key
type pointer and the key description. This means that an exact lookup by
type+description is very fast as this tells us how to navigate directly to
the target key.
I made the data structure general in lib/assoc_array.c as far as it is
concerned, its index key is just a sequence of bits that leads to a
pointer. It's possible that someone else will be able to make use of it
also. FS-Cache might, for example.
(2) Mark keys as 'trusted' and keyrings as 'trusted only'.
KEYS: verify a certificate is signed by a 'trusted' key
KEYS: Make the system 'trusted' keyring viewable by userspace
KEYS: Add a 'trusted' flag and a 'trusted only' flag
KEYS: Separate the kernel signature checking keyring from module signing
These patches allow keys carrying asymmetric public keys to be marked as
being 'trusted' and allow keyrings to be marked as only permitting the
addition or linkage of trusted keys.
Keys loaded from hardware during kernel boot or compiled into the kernel
during build are marked as being trusted automatically. New keys can be
loaded at runtime with add_key(). They are checked against the system
keyring contents and if their signatures can be validated with keys that
are already marked trusted, then they are marked trusted also and can
thus be added into the master keyring.
Patches from Mimi Zohar make this usable with the IMA keyrings also.
(3) Remove the date checks on the key used to validate a module signature.
X.509: Remove certificate date checks
It's not reasonable to reject a signature just because the key that it was
generated with is no longer valid datewise - especially if the kernel
hasn't yet managed to set the system clock when the first module is
loaded - so just remove those checks.
(4) Make it simpler to deal with additional X.509 being loaded into the kernel.
KEYS: Load *.x509 files into kernel keyring
KEYS: Have make canonicalise the paths of the X.509 certs better to deduplicate
The builder of the kernel now just places files with the extension ".x509"
into the kernel source or build trees and they're concatenated by the
kernel build and stuffed into the appropriate section.
(5) Add support for userspace kerberos to use keyrings.
KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches
KEYS: Implement a big key type that can save to tmpfs
Fedora went to, by default, storing kerberos tickets and tokens in tmpfs.
We looked at storing it in keyrings instead as that confers certain
advantages such as tickets being automatically deleted after a certain
amount of time and the ability for the kernel to get at these tokens more
easily.
To make this work, two things were needed:
(a) A way for the tickets to persist beyond the lifetime of all a user's
sessions so that cron-driven processes can still use them.
The problem is that a user's session keyrings are deleted when the
session that spawned them logs out and the user's user keyring is
deleted when the UID is deleted (typically when the last log out
happens), so neither of these places is suitable.
I've added a system keyring into which a 'persistent' keyring is
created for each UID on request. Each time a user requests their
persistent keyring, the expiry time on it is set anew. If the user
doesn't ask for it for, say, three days, the keyring is automatically
expired and garbage collected using the existing gc. All the kerberos
tokens it held are then also gc'd.
(b) A key type that can hold really big tickets (up to 1MB in size).
The problem is that Active Directory can return huge tickets with lots
of auxiliary data attached. We don't, however, want to eat up huge
tracts of unswappable kernel space for this, so if the ticket is
greater than a certain size, we create a swappable shmem file and dump
the contents in there and just live with the fact we then have an
inode and a dentry overhead. If the ticket is smaller than that, we
slap it in a kmalloc()'d buffer"
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (121 commits)
KEYS: Fix keyring content gc scanner
KEYS: Fix error handling in big_key instantiation
KEYS: Fix UID check in keyctl_get_persistent()
KEYS: The RSA public key algorithm needs to select MPILIB
ima: define '_ima' as a builtin 'trusted' keyring
ima: extend the measurement list to include the file signature
kernel/system_certificate.S: use real contents instead of macro GLOBAL()
KEYS: fix error return code in big_key_instantiate()
KEYS: Fix keyring quota misaccounting on key replacement and unlink
KEYS: Fix a race between negating a key and reading the error set
KEYS: Make BIG_KEYS boolean
apparmor: remove the "task" arg from may_change_ptraced_domain()
apparmor: remove parent task info from audit logging
apparmor: remove tsk field from the apparmor_audit_struct
apparmor: fix capability to not use the current task, during reporting
Smack: Ptrace access check mode
ima: provide hash algo info in the xattr
ima: enable support for larger default filedata hash algorithms
ima: define kernel parameter 'ima_template=' to change configured default
ima: add Kconfig default measurement list template
...
The CONFIG_HPET_MMAP Kconfig option exposes the memory map of the HPET
registers to userspace. The Kconfig help points out that in some cases
this can be a security risk as some systems may erroneously configure the
map such that additional data is exposed to userspace.
This is a problem for distributions -- some users want the MMAP
functionality but it comes with a significant security risk. In an effort
to mitigate this risk, and due to the low number of users of the MMAP
functionality, I've introduced a kernel parameter, hpet_mmap_enable, that
is required in order to actually have the HPET MMAP exposed.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Matt Wilson <msw@amazon.com>
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The hot-Pluggable field in SRAT specifies which memory is hotpluggable.
As we mentioned before, if hotpluggable memory is used by the kernel, it
cannot be hot-removed. So memory hotplug users may want to set all
hotpluggable memory in ZONE_MOVABLE so that the kernel won't use it.
Memory hotplug users may also set a node as movable node, which has
ZONE_MOVABLE only, so that the whole node can be hot-removed.
But the kernel cannot use memory in ZONE_MOVABLE. By doing this, the
kernel cannot use memory in movable nodes. This will cause NUMA
performance down. And other users may be unhappy.
So we need a way to allow users to enable and disable this functionality.
In this patch, we introduce movable_node boot option to allow users to
choose to not to consume hotpluggable memory at early boot time and later
we can set it as ZONE_MOVABLE.
To achieve this, the movable_node boot option will control the memblock
allocation direction. That said, after memblock is ready, before SRAT is
parsed, we should allocate memory near the kernel image as we explained in
the previous patches. So if movable_node boot option is set, the kernel
does the following:
1. After memblock is ready, make memblock allocate memory bottom up.
2. After SRAT is parsed, make memblock behave as default, allocate memory
top down.
Users can specify "movable_node" in kernel commandline to enable this
functionality. For those who don't use memory hotplug or who don't want
to lose their NUMA performance, just don't specify anything. The kernel
will work as before.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Suggested-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86/intel-mid changes from Ingo Molnar:
"Update the 'intel mid' (mobile internet device) platform code as Intel
is rolling out more SoC designs.
This gets rid of most of the 'MRST' platform code in the process,
mostly by renaming and shuffling code around into their respective
'intel-mid' platform drivers"
* 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, intel-mid: Do not re-introduce usage of obsolete __cpuinit
intel_mid: Move platform device setups to their own platform_<device>.* files
x86: intel-mid: Add section for sfi device table
intel-mid: sfi: Allow struct devs_id.get_platform_data to be NULL
intel_mid: Moved SFI related code to sfi.c
intel_mid: Added custom handler for ipc devices
intel_mid: Added custom device_handler support
intel_mid: Refactored sfi_parse_devs() function
intel_mid: Renamed *mrst* to *intel_mid*
pci: intel_mid: Return true/false in function returning bool
intel_mid: Renamed *mrst* to *intel_mid*
mrst: Fixed indentation issues
mrst: Fixed printk/pr_* related issues
Pull x86 EFI changes from Ingo Molnar:
"Main changes:
- Add support for earlyprintk=efi which uses the EFI framebuffer.
Very useful for debugging boot problems.
- EFI stub support for large memory maps (more than 128 entries)
- EFI ARM support - this was mostly done by generalizing x86 <-> ARM
platform differences, such as by moving x86 EFI code into
drivers/firmware/efi/ and sharing it with ARM.
- Documentation updates
- misc fixes"
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
x86/efi: Add EFI framebuffer earlyprintk support
boot, efi: Remove redundant memset()
x86/efi: Fix config_table_type array termination
x86 efi: bugfix interrupt disabling sequence
x86: EFI stub support for large memory maps
efi: resolve warnings found on ARM compile
efi: Fix types in EFI calls to match EFI function definitions.
efi: Renames in handle_cmdline_files() to complete generalization.
efi: Generalize handle_ramdisks() and rename to handle_cmdline_files().
efi: Allow efi_free() to be called with size of 0
efi: use efi_get_memory_map() to get final map for x86
efi: generalize efi_get_memory_map()
efi: Rename __get_map() to efi_get_memory_map()
efi: Move unicode to ASCII conversion to shared function.
efi: Generalize relocate_kernel() for use by other architectures.
efi: Move relocate_kernel() to shared file.
efi: Enforce minimum alignment of 1 page on allocations.
efi: Rename memory allocation/free functions
efi: Add system table pointer argument to shared functions.
efi: Move common EFI stub code from x86 arch code to common location
...
We map the EFI regions needed for runtime services non-contiguously,
with preserved alignment on virtual addresses starting from -4G down
for a total max space of 64G. This way, we provide for stable runtime
services addresses across kernels so that a kexec'd kernel can still use
them.
Thus, they're mapped in a separate pagetable so that we don't pollute
the kernel namespace.
Add an efi= kernel command line parameter for passing miscellaneous
options and chicken bits from the command line.
While at it, add a chicken bit called "efi=old_map" which can be used as
a fallback to the old runtime services mapping method in case there's
some b0rkage with a particular EFI implementation (haha, it is hard to
hold up the sarcasm here...).
Also, add the UEFI RT VA space to Documentation/x86/x86_64/mm.txt.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
It's incredibly difficult to diagnose early EFI boot issues without
special hardware because earlyprintk=vga doesn't work on EFI systems.
Add support for writing to the EFI framebuffer, via earlyprintk=efi,
which will actually give users a chance of providing debug output.
Cc: H. Peter Anvin <hpa@zytor.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Jones <pjones@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
The IMA measurement list contains two hashes - a template data hash
and a filedata hash. The template data hash is committed to the TPM,
which is limited, by the TPM v1.2 specification, to 20 bytes. The
filedata hash is defined as 20 bytes as well.
Now that support for variable length measurement list templates was
added, the filedata hash is not limited to 20 bytes. This patch adds
Kconfig support for defining larger default filedata hash algorithms
and replacing the builtin default with one specified on the kernel
command line.
<uapi/linux/hash_info.h> contains a list of hash algorithms. The
Kconfig default hash algorithm is a subset of this list, but any hash
algorithm included in the list can be specified at boot, using the
'ima_hash=' kernel command line option.
Changelog v2:
- update Kconfig
Changelog:
- support hashes that are configured
- use generic HASH_ALGO_ definitions
- add Kconfig support
- hash_setup must be called only once (Dmitry)
- removed trailing whitespaces (Roberto Sassu)
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
This patch allows users to specify from the kernel command line the
template descriptor, among those defined, that will be used to generate
and display measurement entries. If an user specifies a wrong template,
IMA reverts to the template descriptor set in the kernel configuration.
Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
mrst is used as common name to represent all intel_mid type
soc's. But moorsetwon is just one of the intel_mid soc. So
renamed them to use intel_mid.
This patch mainly renames the variables and related
functions that uses *mrst* prefix with *intel_mid*.
To ensure that there are no functional changes, I have compared
the objdump of related files before and after rename and found
the only difference is symbol and name changes.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: http://lkml.kernel.org/r/1382049336-21316-6-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This allows decompress_kernel to return a new location for the kernel to
be relocated to. Additionally, enforces CONFIG_PHYSICAL_START as the
minimum relocation position when building with CONFIG_RELOCATABLE.
With CONFIG_RANDOMIZE_BASE set, the choose_kernel_location routine
will select a new location to decompress the kernel, though here it is
presently a no-op. The kernel command line option "nokaslr" is introduced
to bypass these routines.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/1381450698-28710-3-git-send-email-keescook@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Recently commit bab55417b1 ("block: support embedded device command
line partition") introduced CONFIG_CMDLINE_PARSER. However, that name
is too generic and sounds like it enables/disables generic kernel boot
arg processing, when it really is block specific.
Before this option becomes a part of a full/final release, add the BLK_
prefix to it so that it is clear in absence of any other context that it
is block specific.
In addition, fix up the following less critical items:
- help text was not really at all helpful.
- index file for Documentation was not updated
- add the new arg to Documentation/kernel-parameters.txt
- clarify wording in source comments
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Cai Zhiyong <caizhiyong@huawei.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Han Pingtian found a typo in Documentation/kernel-parameters.txt about
"kernelcore=", that "kernelcore" should be replaced with "Movable" here.
Signed-off-by: Weiping Pan <wpan@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix PV spinlocks triggering jump_label code bug
- Remove extraneous code in the tpm front driver
- Fix ballooning out of pages when non-preemptible
- Fix deadlock when using a 32-bit initial domain with large amount of memory.
- Add xen_nopvpsin parameter to the documentation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSQvzCAAoJEFjIrFwIi8fJyCIIAMENABapdLhrOiRdQ1Y7T5v1
4bogPDLwpVxHzwo/vnHcNpl35/dUZrC6wQa51Bkoqq0V8o1XmjFy3SY/EBGjEAvw
hh4qxGY0p0NNi6hKrWC8mH9u2TcluZGm1uecabkXUhl9mrAB5oBsfJdbBZ5N69gO
QXXt0j7Xwv1APwH86T0e1Lz+lulhdw2ItXP4osYkEbRYNSaaGnuwsd0Jxcb4DeMk
qhKgP7QMn3C7zDDaapJo1axeYQRBNEtv5M8+0wwMleX4yX1+IBRZeQTsRfMr7RB/
8FhssWiH15xU6Gmzgi/VR8xhTEIbQh5GWsVReGf6pqIYSxGSYTvvyhm0bVRH4JI=
=c+7u
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen fixes from Konrad Rzeszutek Wilk:
"Bug-fixes and one update to the kernel-paramters.txt documentation.
- Fix PV spinlocks triggering jump_label code bug
- Remove extraneous code in the tpm front driver
- Fix ballooning out of pages when non-preemptible
- Fix deadlock when using a 32-bit initial domain with large amount
of memory
- Add xen_nopvpsin parameter to the documentation"
* tag 'stable/for-linus-3.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/spinlock: Document the xen_nopvspin parameter.
xen/p2m: check MFN is in range before using the m2p table
xen/balloon: don't alloc page while non-preemptible
xen: Do not enable spinlocks before jump_label_init() has executed
tpm: xen-tpmfront: Remove the locality sysfs attribute
tpm: xen-tpmfront: Fix default durations
Which disables in the ticketlock slowpath the Xen PV optimization's.
Useful for diagnosing issues and comparing benchmarks in
over-commit CPU scenarios.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Highlights include:
- Fix NFSv4 recovery so that it doesn't recover lost locks in cases such as
lease loss due to a network partition, where doing so may result in data
corruption. Add a kernel parameter to control choice of legacy behaviour
or not.
- Performance improvements when 2 processes are writing to the same file.
- Flush data to disk when an RPCSEC_GSS session timeout is imminent.
- Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
NFS clients from being able to manipulate our lease and file lockingr
state.
- Allow sharing of RPCSEC_GSS caches between different rpc clients
- Fix the broken NFSv4 security auto-negotiation between client and server
- Fix rmdir() to wait for outstanding sillyrename unlinks to complete
- Add a tracepoint framework for debugging NFSv4 state recovery issues.
- Add tracing to the generic NFS layer.
- Add tracing for the SUNRPC socket connection state.
- Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration support.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSLelVAAoJEGcL54qWCgDyo2IQAKOfRJyZVnf4ipxi3xLNl1QF
w/70DVSIF1S1djWN7G3vgkxj/R8KCvJ8CcvkAD2BEgRDeZJ9TtyKAdM/jYLZ+W05
7k2QKk8fkwZmc1Y2qDqFwKHzP5ZgP5L2nGx7FNhi/99wEAe47yFG3qd3rUWKrcOf
mnd863zgGDE2Q10slhoq/bywwMJo6tKZNeaIE8kPjgFbBEh/jslpAWr8dSA4QgvJ
nZ8VB5XU8L+XJ0GpHHdjYm9LvQ51DbQ6omOF+0P4fI093azKmf4ZsrjMDWT8+iu3
XkXlnQmKLGTi7yB43hHtn2NiRqwGzCcZ1Amo9PpCFaHUt1RP9cc37UhG1T+x1xWJ
STEKDbvCdQ3FU9FvbgrGEwBR0e8fNS4fZY3ToDBflIcfwre0aWs5RCodZMUD0nUI
4wY5J9NsQR/bL+v8KeUR4V4cXK8YrgL0zB4u4WYzH5Npxr5KD0NEKDNqRPhrB9l2
LLF9Haql8j76Ff0ek6UGFIZjDE0h6Fs71wLBpLj+ZWArOJ7vBuLMBSOVqNpld9+9
f2fEG7qoGF4FGTY4myH/eakMPaWnk9Ol4Ls/svSIapJ9+rePD+a93e/qnmdofIMf
4TuEYk6ERib1qXgaeDRQuCsm2YE1Co5skGMaOsRFWgReE1c12QoJQVst2nMtEKp3
uV2w8LgX18aZOZXJVkCM
=ZuW+
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- Fix NFSv4 recovery so that it doesn't recover lost locks in cases
such as lease loss due to a network partition, where doing so may
result in data corruption. Add a kernel parameter to control
choice of legacy behaviour or not.
- Performance improvements when 2 processes are writing to the same
file.
- Flush data to disk when an RPCSEC_GSS session timeout is imminent.
- Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
NFS clients from being able to manipulate our lease and file
locking state.
- Allow sharing of RPCSEC_GSS caches between different rpc clients.
- Fix the broken NFSv4 security auto-negotiation between client and
server.
- Fix rmdir() to wait for outstanding sillyrename unlinks to complete
- Add a tracepoint framework for debugging NFSv4 state recovery
issues.
- Add tracing to the generic NFS layer.
- Add tracing for the SUNRPC socket connection state.
- Clean up the rpc_pipefs mount/umount event management.
- Merge more patches from Chuck in preparation for NFSv4 migration
support"
* tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
NFSv4: Allow security autonegotiation for submounts
NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
NFSv4: Fix security auto-negotiation
NFS: Clean up nfs_parse_security_flavors()
NFS: Clean up the auth flavour array mess
NFSv4.1 Use MDS auth flavor for data server connection
NFS: Don't check lock owner compatability unless file is locked (part 2)
NFS: Don't check lock owner compatibility in writes unless file is locked
nfs4: Map NFS4ERR_WRONG_CRED to EPERM
nfs4.1: Add SP4_MACH_CRED write and commit support
nfs4.1: Add SP4_MACH_CRED stateid support
nfs4.1: Add SP4_MACH_CRED secinfo support
nfs4.1: Add SP4_MACH_CRED cleanup support
nfs4.1: Add state protection handler
nfs4.1: Minimal SP4_MACH_CRED implementation
SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
SUNRPC: Add an identifier for struct rpc_clnt
SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
...
Rename the new 'recover_locks' kernel parameter to 'recover_lost_locks'
and change the default to 'false'. Document why in
Documentation/kernel-parameters.txt
Move the 'recover_lost_locks' kernel parameter to fs/nfs/super.c to
make it easy to backport to kernels prior to 3.6.x, which don't have
a separate NFSv4 module.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction
of Intel Thunderbolt support on systems that use ACPI for signalling
Thunderbolt hotplug events. This also should make ACPIPHP work in
some cases in which it was known to have problems. From
Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov.
2) ACPI core code cleanups and dock station support cleanups from
Jiang Liu and Rafael J Wysocki.
3) Fixes for locking problems related to ACPI device hotplug from
Rafael J Wysocki.
4) ACPICA update to version 20130725 includig fixes, cleanups, support
for more than 256 GPEs per GPE block and a change to make the ACPI
PM Timer optional (we've seen systems without the PM Timer in the
field already). One of the fixes, related to the DeRefOf operator,
is necessary to prevent some Windows 8 oriented AML from causing
problems to happen. From Bob Moore, Lv Zheng, and Jung-uk Kim.
5) Removal of the old and long deprecated /proc/acpi/event interface
and related driver changes from Thomas Renninger.
6) ACPI and Xen changes to make the reduced hardware sleep work with
the latter from Ben Guthro.
7) ACPI video driver cleanups and a blacklist of systems that should
not tell the BIOS that they are compatible with Windows 8 (or ACPI
backlight and possibly other things will not work on them). From
Felipe Contreras.
8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo,
Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen,
Toshi Kani, and Wei Yongjun.
9) cpufreq ondemand governor target frequency selection change to
reduce oscillations between min and max frequencies (essentially,
it causes the governor to choose target frequencies proportional
to load) from Stratos Karafotis.
10) cpufreq fixes allowing sysfs attributes file permissions to be
preserved over suspend/resume cycles Srivatsa S Bhat.
11) Removal of Device Tree parsing for CPU device nodes from multiple
cpufreq drivers that required some changes related to
of_get_cpu_node() to be made in a few architectures and in the
driver core. From Sudeep KarkadaNagesha.
12) cpufreq core fixes and cleanups related to mutual exclusion and
driver module references from Viresh Kumar, Lukasz Majewski and
Rafael J Wysocki.
13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap,
Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo,
Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd,
Stratos Karafotis, and Viresh Kumar.
14) Fixes to prevent race conditions in coupled cpuidle from happening
from Colin Cross.
15) cpuidle core fixes and cleanups from Daniel Lezcano and
Tuukka Tikkanen.
16) Assorted cpuidle fixes and cleanups from Daniel Lezcano,
Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij,
and Sahara.
17) System sleep tracing changes from Todd E Brandt and Shuah Khan.
18) PNP subsystem conversion to using struct dev_pm_ops for power
management from Shuah Khan.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABCAAGBQJSJcKhAAoJEKhOf7ml8uNsplIQAJSOshxhkkemvFOuHZ+0YIbh
R9aufjXeDkMDBi8YtU+tB7ERth1j+0LUSM0NTnP51U7e+7eSGobA9s5jSZQj2l7r
HFtnSOegLuKAfqwgfSLK91xa1rTFdfW0Kych9G2nuHtBIt6P0Oc59Cb5M0oy6QXs
nVtaDEuU//tmO71+EF5HnMJHabRTrpvtn/7NbDUpU7LZYpWJrHJFT9xt1rXNab7H
YRCATPm3kXGRg58Doc3EZE4G3D7DLvq74jWMaI089X/m5Pg1G6upqArypOy6oxdP
p2FEzYVrb2bi8fakXp7BBeO1gCJTAqIgAkbSSZHLpGhFaeEMmb9/DWPXdm2TjzMV
c1EEucvsqZWoprXgy12i5Hk814xN8d8nBBLg/UYiRJ44nc/hevXfyE9ZYj6bkseJ
+GNHmZIa1QYC05nnGli4+W4kHns8EZf/gmvIxnPuco1RN2yMWagrud5/G6Dr9M2B
hzJV6qauLVzgZso4oe79zv9aVxe/dPHKANLD/sg23WBiJJbJF1ocBlnj2Xlbpqze
pmMUWGiO/gUiS0fmpW/lAJauza5jFmSCjE4E8R0Gyn0j4YXjmMhdEanaU6J3VuCi
yVgEzYEth4sowq4AflMMLKYN+WmozDnK7taZRGmT0t+EKRFKLT6EgnNrkQgs1vKl
oawD9LM4fZ8E0yroOEme
=CgqW
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
1) ACPI-based PCI hotplug (ACPIPHP) subsystem rework and introduction
of Intel Thunderbolt support on systems that use ACPI for signalling
Thunderbolt hotplug events. This also should make ACPIPHP work in
some cases in which it was known to have problems. From
Rafael J Wysocki, Mika Westerberg and Kirill A Shutemov.
2) ACPI core code cleanups and dock station support cleanups from
Jiang Liu and Rafael J Wysocki.
3) Fixes for locking problems related to ACPI device hotplug from
Rafael J Wysocki.
4) ACPICA update to version 20130725 includig fixes, cleanups, support
for more than 256 GPEs per GPE block and a change to make the ACPI
PM Timer optional (we've seen systems without the PM Timer in the
field already). One of the fixes, related to the DeRefOf operator,
is necessary to prevent some Windows 8 oriented AML from causing
problems to happen. From Bob Moore, Lv Zheng, and Jung-uk Kim.
5) Removal of the old and long deprecated /proc/acpi/event interface
and related driver changes from Thomas Renninger.
6) ACPI and Xen changes to make the reduced hardware sleep work with
the latter from Ben Guthro.
7) ACPI video driver cleanups and a blacklist of systems that should
not tell the BIOS that they are compatible with Windows 8 (or ACPI
backlight and possibly other things will not work on them). From
Felipe Contreras.
8) Assorted ACPI fixes and cleanups from Aaron Lu, Hanjun Guo,
Kuppuswamy Sathyanarayanan, Lan Tianyu, Sachin Kamat, Tang Chen,
Toshi Kani, and Wei Yongjun.
9) cpufreq ondemand governor target frequency selection change to
reduce oscillations between min and max frequencies (essentially,
it causes the governor to choose target frequencies proportional
to load) from Stratos Karafotis.
10) cpufreq fixes allowing sysfs attributes file permissions to be
preserved over suspend/resume cycles Srivatsa S Bhat.
11) Removal of Device Tree parsing for CPU device nodes from multiple
cpufreq drivers that required some changes related to
of_get_cpu_node() to be made in a few architectures and in the
driver core. From Sudeep KarkadaNagesha.
12) cpufreq core fixes and cleanups related to mutual exclusion and
driver module references from Viresh Kumar, Lukasz Majewski and
Rafael J Wysocki.
13) Assorted cpufreq fixes and cleanups from Amit Daniel Kachhap,
Bartlomiej Zolnierkiewicz, Hanjun Guo, Jingoo Han, Joseph Lo,
Julia Lawall, Li Zhong, Mark Brown, Sascha Hauer, Stephen Boyd,
Stratos Karafotis, and Viresh Kumar.
14) Fixes to prevent race conditions in coupled cpuidle from happening
from Colin Cross.
15) cpuidle core fixes and cleanups from Daniel Lezcano and
Tuukka Tikkanen.
16) Assorted cpuidle fixes and cleanups from Daniel Lezcano,
Geert Uytterhoeven, Jingoo Han, Julia Lawall, Linus Walleij,
and Sahara.
17) System sleep tracing changes from Todd E Brandt and Shuah Khan.
18) PNP subsystem conversion to using struct dev_pm_ops for power
management from Shuah Khan.
* tag 'pm+acpi-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (217 commits)
cpufreq: Don't use smp_processor_id() in preemptible context
cpuidle: coupled: fix race condition between pokes and safe state
cpuidle: coupled: abort idle if pokes are pending
cpuidle: coupled: disable interrupts after entering safe state
ACPI / hotplug: Remove containers synchronously
driver core / ACPI: Avoid device hot remove locking issues
cpufreq: governor: Fix typos in comments
cpufreq: governors: Remove duplicate check of target freq in supported range
cpufreq: Fix timer/workqueue corruption due to double queueing
ACPI / EC: Add ASUSTEK L4R to quirk list in order to validate ECDT
ACPI / thermal: Add check of "_TZD" availability and evaluating result
cpufreq: imx6q: Fix clock enable balance
ACPI: blacklist win8 OSI for buggy laptops
cpufreq: tegra: fix the wrong clock name
cpuidle: Change struct menu_device field types
cpuidle: Add a comment warning about possible overflow
cpuidle: Fix variable domains in get_typical_interval()
cpuidle: Fix menu_device->intervals type
cpuidle: CodingStyle: Break up multiple assignments on single line
cpuidle: Check called function parameter in get_typical_interval()
...
Here's the big tty/serial driver pull request for 3.12-rc1.
Lots of n_tty reworks to resolve some very long-standing issues, removing the
3-4 different locks that were taken for every character. This code has been
beaten on for a long time in linux-next with no reported regressions.
Other than that, a range of serial and tty driver updates and revisions. Full
details in the shortlog.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.21 (GNU/Linux)
iEYEABECAAYFAlIlI6UACgkQMUfUDdst+ym7kgCgmysv/TVeqsdvmkiO2eEB4+xs
ddwAoMqkJ/enCJ2f+fC8y2Wz+5+kDrU7
=CiCp
-----END PGP SIGNATURE-----
Merge tag 'tty-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver patches from Greg KH:
"Here's the big tty/serial driver pull request for 3.12-rc1.
Lots of n_tty reworks to resolve some very long-standing issues,
removing the 3-4 different locks that were taken for every character.
This code has been beaten on for a long time in linux-next with no
reported regressions.
Other than that, a range of serial and tty driver updates and
revisions. Full details in the shortlog"
* tag 'tty-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (226 commits)
hvc_xen: Remove unnecessary __GFP_ZERO from kzalloc
serial: imx: initialize the local variable
tty: ar933x_uart: add device tree support and binding documentation
tty: ar933x_uart: allow to build the driver as a module
ARM: dts: msm: Update uartdm compatible strings
devicetree: serial: Document msm_serial bindings
serial: unify serial bindings into a single dir
serial: fsl-imx-uart: Cleanup duplicate device tree binding
tty: ar933x_uart: use config_enabled() macro to clean up ifdefs
tty: ar933x_uart: remove superfluous assignment of ar933x_uart_driver.nr
tty: ar933x_uart: use the clk API to get the uart clock
tty: serial: cpm_uart: Adding proper request of GPIO used by cpm_uart driver
serial: sirf: fix the amount of serial ports
serial: sirf: define macro for some magic numbers of USP
serial: icom: move array overflow checks earlier
TTY: amiserial, remove unnecessary platform_set_drvdata()
serial: st-asc: remove unnecessary platform_set_drvdata()
msm_serial: Send more than 1 character on the console w/ UARTDM
msm_serial: Add support for non-GSBI UARTDM devices
msm_serial: Switch clock consumer strings and simplify code
...
* acpica:
ACPICA: Update version to 20130725.
ACPICA: Update names for walk_namespace callbacks to clarify usage.
ACPICA: Return error if DerefOf resolves to a null package element.
ACPICA: Make ACPI Power Management Timer (PM Timer) optional.
ACPICA: Fix divergences of the commit - ACPICA: Expose OSI version.
ACPICA: Fix possible fault for methods that optionally have no return value.
ACPICA: DeRefOf operator: Update to fully resolve FieldUnit and BufferField refs.
ACPICA: Emit all unresolved method externals in a text block
ACPICA: Export acpi_tb_validate_rsdp().
ACPI: Add facility to remove all _OSI strings
ACPI: Add facility to disable all _OSI OS vendor strings
ACPICA: Add acpi_update_interfaces() public interface
ACPICA: Update version to 20130626
ACPICA: Fix compiler warnings for casting issues (only some compilers)
ACPICA: Remove restriction of 256 maximum GPEs in any GPE block
ACPICA: Disassembler: Expand maximum output string length to 64K
ACPICA: TableManager: Export acpi_tb_scan_memory_for_rsdp()
ACPICA: Update comments about behavior when _STA does not exist
The swapaccount kernel parameter without any values has been removed by
commit a2c8990aed ("memsw: remove noswapaccount kernel parameter") but
it seems that we didn't get rid of all the left overs.
Make sure that menuconfig help text and kernel-parameters.txt are clear
about value for the paramter and remove the stalled comment which is not
very much useful on its own.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Gergely Risko <gergely@risko.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The virtual console has (undocumented) module parameters to set the
colors for italic and underlined text, but the default text color was
hardcoded for some reason. This made it impossible to change the color
for startup messages, or to set the default for new virtual consoles.
Add a module parameter for that, and document the entire bunch.
Any hacker who thinks that a command prompt on a "black screen with
white font" is not supicious enough can now use the kernel parameter
vt.color=10 to get a nice, evil green.
Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch changes the "acpi_osi=" boot parameter implementation so
that:
1. "acpi_osi=!" can be used to disable all _OSI OS vendor strings by
default. It is meaningless to specify "acpi_osi=!" multiple
times as it can only affect the default state of the target _OSI
strings.
2. "acpi_osi=!*" can be used to remove all _OSI OS vendor strings
and all _OSI feature group strings. It is useful to specify
"acpi_osi=!*" multiple times through kernel command line to
override the current state of the target _OSI strings.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch introduces "acpi_osi=!" command line to force Linux replying
"UNSUPPORTED" to all of the _OSI strings. This patch is based on an
ACPICA enhancement - the new API acpi_update_interfaces().
The _OSI object provides the platform with the ability to query OSPM
to determine the set of ACPI related interfaces, behaviors, or
features that the operating system supports. The argument passed to
the _OSI is a string like the followings:
1. Feature Group String, examples include
Module Device
Processor Device
3.0 _SCP Extensions
Processor Aggregator Device
...
2. OS Vendor String, examples include
Linux
FreeBSD
Windows
...
There are AML codes provided in the ACPI namespace written in the
following style to determine OSPM interfaces / features:
Method(OSCK)
{
if (CondRefOf(_OSI, Local0))
{
if (\_OSI("Windows"))
{
Return (One)
}
if (\_OSI("Windows 2006"))
{
Return (Ones)
}
Return (Zero)
}
Return (Zero)
}
There is a debugging facility implemented in Linux. Users can pass
"acpi_osi=" boot parameters to the kernel to tune the _OSI evaluation
result so that certain AML codes can be executed. Current
implementation includes:
1. 'acpi_osi=' - this makes CondRefOf(_OSI, Local0) TRUE
2. 'acpi_osi="Windows"' - this makes \_OSI("Windows") TRUE
3. 'acpi_osi="!Windows"' - this makes \_OSI("Windows") FALSE
The function to implement this feature is also used as a quirk mechanism
in the Linux ACPI subystem.
When _OSI is evaluatated by the AML codes, ACPICA replies "SUPPORTED"
to all Windows operating system vendor strings. This is because
Windows operating systems return "SUPPORTED" if the argument to the
_OSI method specifies an earlier version of Windows. Please refer to
the following MSDN document:
How to Identify the Windows Version in ACPI by Using _OSI
http://msdn.microsoft.com/en-us/library/hardware/gg463275.aspx
This adds difficulties when developers want to feed specific Windows
operating system vendor string to the BIOS codes for debugging
purpose, multiple acpi_osi="!xxx" have to be specified in the command
line to force Linux replying "UNSUPPORTED" to the Windows OS vendor
strings listed in the AML codes.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
were added to 3.10, which includes several bug fixes that have been
marked for stable.
As for new features, there were a few, but nothing to write to LWN about.
These include:
New function trigger called "dump" and "cpudump" that will cause
ftrace to dump its buffer to the console when the function is called.
The difference between "dump" and "cpudump" is that "dump" will dump
the entire contents of the ftrace buffer, where as "cpudump" will only
dump the contents of the ftrace buffer for the CPU that called the function.
Another small enhancement is a new sysctl switch called "traceoff_on_warning"
which, when enabled, will disable tracing if any WARN_ON() is triggered.
This is useful if you want to debug what caused a warning and do not
want to risk losing your trace data by the ring buffer overwriting the
data before you can disable it. There's also a kernel command line
option that will make this enabled at boot up called the same thing.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJR1uF2AAoJEOdOSU1xswtMJ1IH/2LSiZAKTA2QaRgGQC/5Bb9c
XSOI1HfD/78lmUvTyb0AX8sLpkzZlvIONEQ/WaZUFo1Zjbrl45zJUwMkTE9uImEg
ZqI5x8OiiN6j4XrRbfYn3Ti060H/Jq41pZXa+shh961Vv51ilv/1yyLkoRmnjzuO
JTloPdXDV7icOqqiSdgxSdtUSv59Ef1ZdHgvvsb3aqzMC5btVQPi4kIys0ST1Tr1
pMWBY+UgvH0xYm3gvTR+W6jjDlkVZEH2alkmcinfr+uC1tm9DDqK2HA17Pd5yZ5z
HNdT76lCzf9iqRF5F8HUvUt+PIp76dNNxAt2qpB6APqAuJTojyguxXHDbY/0kzs=
=UvLi
-----END PGP SIGNATURE-----
Merge tag 'trace-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing changes from Steven Rostedt:
"The majority of the changes here are cleanups for the large changes
that were added to 3.10, which includes several bug fixes that have
been marked for stable.
As for new features, there were a few, but nothing to write to LWN
about. These include:
New function trigger called "dump" and "cpudump" that will cause
ftrace to dump its buffer to the console when the function is called.
The difference between "dump" and "cpudump" is that "dump" will dump
the entire contents of the ftrace buffer, where as "cpudump" will only
dump the contents of the ftrace buffer for the CPU that called the
function.
Another small enhancement is a new sysctl switch called
"traceoff_on_warning" which, when enabled, will disable tracing if any
WARN_ON() is triggered. This is useful if you want to debug what
caused a warning and do not want to risk losing your trace data by the
ring buffer overwriting the data before you can disable it. There's
also a kernel command line option that will make this enabled at boot
up called the same thing"
* tag 'trace-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (34 commits)
tracing: Make tracing_open_generic_{tr,tc}() static
tracing: Remove ftrace() function
tracing: Remove TRACE_EVENT_TYPE enum definition
tracing: Make tracer_tracing_{off,on,is_on}() static
tracing: Fix irqs-off tag display in syscall tracing
uprobes: Fix return value in error handling path
tracing: Fix race between deleting buffer and setting events
tracing: Add trace_array_get/put() to event handling
tracing: Get trace_array ref counts when accessing trace files
tracing: Add trace_array_get/put() to handle instance refs better
tracing: Protect ftrace_trace_arrays list in trace_events.c
tracing: Make trace_marker use the correct per-instance buffer
ftrace: Do not run selftest if command line parameter is set
tracing/kprobes: Don't pass addr=ip to perf_trace_buf_submit()
tracing: Use flag buffer_disabled for irqsoff tracer
tracing/kprobes: Turn trace_probe->files into list_head
tracing: Fix disabling of soft disable
tracing: Add missing syscall_metadata comment
tracing: Simplify code for showing of soft disabled flag
tracing/kprobes: Kill probe_enable_lock
...
Merge together the unicore32, arm, and x86 reboot= command line
parameter handling.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull libata updates from Tejun Heo:
"Overview of changes:
- The rest of maintainer email address updates.
- Some core updates - more robust default behavior for port
multipliers, better error reporting for SG_IO commands, and a way
to better work around now ancient and probably pretty rare PATA ->
SATA bridges with ATAPI devices.
- sata_rcar stabilization.
- Some hardware PCI ID additions and one-off low level driver
updates."
* 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (22 commits)
AHCI: use ATA_BUSY
libata-zpodd: must use ata_tf_init()
ahci: AHCI-mode SATA patch for Intel Coleto Creek DeviceIDs
ata_piix: IDE-mode SATA patch for Intel Coleto Creek DeviceIDs
libata: cleanup SAT error translation
ahci: sata: add support for exynos5440 sata
libata: skip SRST for all SIMG [34]7x port-multipliers
ahci: remove pmp link online check in FBS EH
sata highbank: add bit-banged SGPIO driver support
ahci: make ahci_transmit_led_message into a function pointer
sata_rcar: fix compilation warning in sata_rcar_thaw()
sata_highbank: increase retry count but shorten duration for Calxeda controller
ata: use pci_get_drvdata()
ipr: qc_fill_rtf() method should not store alternate status register
sata_rcar: add 'base' local variable to some functions
sata_rcar: correct 'sata_rcar_sht'
sata_rcar: kill superfluous code in sata_rcar_bmdma_fill_sg()
libata: do not limit R-Car SATA driver to shmobile
ata: use platform_{get,set}_drvdata()
AHCI: Make distinct names for ports in /proc/interrupts
...
- Hotplug changes allowing device hot-removal operations to fail
gracefully (instead of crashing the kernel) if they cannot be
carried out completely. From Rafael J Wysocki and Toshi Kani.
- Freezer update from Colin Cross and Mandeep Singh Baines targeted
at making the freezing of tasks a bit less heavy weight operation.
- cpufreq resume fix from Srivatsa S Bhat for a regression introduced
during the 3.10 cycle causing some cpufreq sysfs attributes to
return wrong values to user space after resume.
- New freqdomain_cpus sysfs attribute for the acpi-cpufreq driver to
provide information previously available via related_cpus from
Lan Tianyu.
- cpufreq fixes and cleanups from Viresh Kumar, Jacob Shin,
Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia, Arnd Bergmann, and
Tang Yuantian.
- Fix for an ACPICA regression causing suspend/resume issues to
appear on some systems introduced during the 3.4 development cycle
from Lv Zheng.
- ACPICA fixes and cleanups from Bob Moore, Tomasz Nowicki, Lv Zheng,
Chao Guan, and Zhang Rui.
- New cupidle driver for Xilinx Zynq processors from Michal Simek.
- cpuidle fixes and cleanups from Daniel Lezcano.
- Changes to make suspend/resume work correctly in Xen guests from
Konrad Rzeszutek Wilk.
- ACPI device power management fixes and cleanups from Fengguang Wu
and Rafael J Wysocki.
- ACPI documentation updates from Lv Zheng, Aaron Lu and Hanjun Guo.
- Fix for the IA-64 issue that was the reason for reverting commit
9f29ab1 and updates of the ACPI scan code from Rafael J Wysocki.
- Mechanism for adding CMOS RTC address space handlers from Lan Tianyu
(to allow some EC-related breakage to be fixed on some systems).
- Spec-compliant implementation of acpi_os_get_timer() from
Mika Westerberg.
- Modification of do_acpi_find_child() to execute _STA in order to
to avoid situations in which a pointer to a disabled device object
is returned instead of an enabled one with the same _ADR value.
From Jeff Wu.
- Intel BayTrail PCH (Platform Controller Hub) support for the ACPI
Intel Low-Power Subsystems (LPSS) driver and modificaions of that
driver to work around a couple of known BIOS issues from
Mika Westerberg and Heikki Krogerus.
- EC driver fix from Vasiliy Kulikov to make it use get_user() and
put_user() instead of dereferencing user space pointers blindly.
- Assorted ACPI code cleanups from Bjorn Helgaas, Nicholas Mazzuca and
Toshi Kani.
- Modification of the "runtime idle" helper routine to take the return
values of the callbacks executed by it into account and to call
rpm_suspend() if they return 0, which allows some code bloat
reduction to be done, from Rafael J Wysocki and Alan Stern.
- New trace points for PM QoS from Sahara <keun-o.park@windriver.com>.
- PM QoS documentation update from Lan Tianyu.
- Assorted core PM code cleanups and changes from Bernie Thompson,
Bjorn Helgaas, Julius Werner, and Shuah Khan.
- New devfreq driver for the Exynos5-bus device from Abhilash Kesavan.
- Minor devfreq cleanups, fixes and MAINTAINERS update from
MyungJoo Ham, Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and
Wei Yongjun.
- OMAP Adaptive Voltage Scaling (AVS) SmartReflex voltage control
driver updates from Andrii Tseglytskyi and Nishanth Menon.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJR0ZNOAAoJEKhOf7ml8uNsDLYP/0EU4rmvw0TWTITfp6RS1KDE
9GwBn96ZR4Q5bJd9gBCTPSqhHOYMqxWEUp99sn/M2wehG1pk/jw5LO56+2IhM3UZ
g1HDcJ7te2nVT/iXsKiAGTVhU9Rk0aYwoVSknwk27qpIBGxW9w/s5tLX8pY3Q3Zq
wL/7aTPjyL+PFFFEaxgH7qLqsl3DhbtYW5AriUBTkXout/tJ4eO1b7MNBncLDh8X
VQ/0DNCKE95VEJfkO4rk9RKUyVp9GDn0i+HXCD/FS4IA5oYzePdVdNDmXf7g+swe
CGlTZq8pB+oBpDiHl4lxzbNrKQjRNbGnDUkoRcWqn0nAw56xK+vmYnWJhW99gQ/I
fKnvxeLca5po1aiqmC4VSJxZIatFZqLrZAI4dzoCLWY+bGeTnCKmj0/F8ytFnZA2
8IuLLs7/dFOaHXV/pKmpg6FAlFa9CPxoqRFoyqb4M0GjEarADyalXUWsPtG+6xCp
R/p0CISpwk+guKZR/qPhL7M654S7SHrPwd2DPF0KgGsvk+G2GhoB8EzvD8BVp98Z
9siCGCdgKQfJQVI6R0k9aFmn/4gRQIAgyPhkhv9tqULUUkiaXki+/t8kPfnb8O/d
zep+CA57E2G8MYLkDJfpFeKS7GpPD6TIdgFdGmOUC0Y6sl9iTdiw4yTx8O2JM37z
rHBZfYGkJBrbGRu+Q1gs
=VBBq
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"This time the total number of ACPI commits is slightly greater than
the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
remains the most active patch submitter.
To me, the most significant change is the addition of offline/online
device operations to the driver core (with the Greg's blessing) and
the related modifications of the ACPI core hotplug code. Next are the
freezer updates from Colin Cross that should make the freezing of
tasks a bit less heavy weight.
We also have a couple of regression fixes, a number of fixes for
issues that have not been identified as regressions, two new drivers
and a bunch of cleanups all over.
Highlights:
- Hotplug changes to support graceful hot-removal failures.
It sometimes is necessary to fail device hot-removal operations
gracefully if they cannot be carried out completely. For example,
if memory from a memory module being hot-removed has been allocated
for the kernel's own use and cannot be moved elsewhere, it's
desirable to fail the hot-removal operation in a graceful way
rather than to crash the kernel, but currenty a success or a kernel
crash are the only possible outcomes of an attempted memory
hot-removal. Needless to say, that is not a very attractive
alternative and it had to be addressed.
However, in order to make it work for memory, I first had to make
it work for CPUs and for this purpose I needed to modify the ACPI
processor driver. It's been split into two parts, a resident one
handling the low-level initialization/cleanup and a modular one
playing the actual driver's role (but it binds to the CPU system
device objects rather than to the ACPI device objects representing
processors). That's been sort of like a live brain surgery on a
patient who's riding a bike.
So this is a little scary, but since we found and fixed a couple of
regressions it caused to happen during the early linux-next testing
(a month ago), nobody has complained.
As a bonus we remove some duplicated ACPI hotplug code, because the
ACPI-based CPU hotplug is now going to use the common ACPI hotplug
code.
- Lighter weight freezing of tasks.
These changes from Colin Cross and Mandeep Singh Baines are
targeted at making the freezing of tasks a bit less heavy weight
operation. They reduce the number of tasks woken up every time
during the freezing, by using the observation that the freezer
simply doesn't need to wake up some of them and wait for them all
to call refrigerator(). The time needed for the freezer to decide
to report a failure is reduced too.
Also reintroduced is the check causing a lockdep warining to
trigger when try_to_freeze() is called with locks held (which is
generally unsafe and shouldn't happen).
- cpufreq updates
First off, a commit from Srivatsa S Bhat fixes a resume regression
introduced during the 3.10 cycle causing some cpufreq sysfs
attributes to return wrong values to user space after resume. The
fix is kind of fresh, but also it's pretty obvious once Srivatsa
has identified the root cause.
Second, we have a new freqdomain_cpus sysfs attribute for the
acpi-cpufreq driver to provide information previously available via
related_cpus. From Lan Tianyu.
Finally, we fix a number of issues, mostly related to the
CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
up some code. The majority of changes from Viresh Kumar with bits
from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
Arnd Bergmann, and Tang Yuantian.
- ACPICA update
A usual bunch of updates from the ACPICA upstream.
During the 3.4 cycle we introduced support for ACPI 5 extended
sleep registers, but they are only supposed to be used if the
HW-reduced mode bit is set in the FADT flags and the code attempted
to use them without checking that bit. That caused suspend/resume
regressions to happen on some systems. Fix from Lv Zheng causes
those registers to be used only if the HW-reduced mode bit is set.
Apart from this some other ACPICA bugs are fixed and code cleanups
are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
Zhang Rui.
- cpuidle updates
New driver for Xilinx Zynq processors is added by Michal Simek.
Multidriver support simplification, addition of some missing
kerneldoc comments and Kconfig-related fixes come from Daniel
Lezcano.
- ACPI power management updates
Changes to make suspend/resume work correctly in Xen guests from
Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
cleanups and fixes of the ACPI device power state selection
routine.
- ACPI documentation updates
Some previously missing pieces of ACPI documentation are added by
Lv Zheng and Aaron Lu (hopefully, that will help people to
uderstand how the ACPI subsystem works) and one outdated doc is
updated by Hanjun Guo.
- Assorted ACPI updates
We finally nailed down the IA-64 issue that was the reason for
reverting commit 9f29ab11dd ("ACPI / scan: do not match drivers
against objects having scan handlers"), so we can fix it and move
the ACPI scan handler check added to the ACPI video driver back to
the core.
A mechanism for adding CMOS RTC address space handlers is
introduced by Lan Tianyu to allow some EC-related breakage to be
fixed on some systems.
A spec-compliant implementation of acpi_os_get_timer() is added by
Mika Westerberg.
The evaluation of _STA is added to do_acpi_find_child() to avoid
situations in which a pointer to a disabled device object is
returned instead of an enabled one with the same _ADR value. From
Jeff Wu.
Intel BayTrail PCH (Platform Controller Hub) support is added to
the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
driver is modified to work around a couple of known BIOS issues.
Changes from Mika Westerberg and Heikki Krogerus.
The EC driver is fixed by Vasiliy Kulikov to use get_user() and
put_user() instead of dereferencing user space pointers blindly.
Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
Kani.
- Assorted power management updates
The "runtime idle" helper routine is changed to take the return
values of the callbacks executed by it into account and to call
rpm_suspend() if they return 0, which allows us to reduce the
overall code bloat a bit (by dropping some code that's not
necessary any more after that modification).
The runtime PM documentation is updated by Alan Stern (to reflect
the "runtime idle" behavior change).
New trace points for PM QoS are added by Sahara
(<keun-o.park@windriver.com>).
PM QoS documentation is updated by Lan Tianyu.
Code cleanups are made and minor issues are addressed by Bernie
Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.
- devfreq updates
New driver for the Exynos5-bus device from Abhilash Kesavan.
Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.
- OMAP power management updates
Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
updates from Andrii Tseglytskyi and Nishanth Menon."
* tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
cpufreq: Fix cpufreq regression after suspend/resume
ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
PM / Sleep: Warn about system time after resume with pm_trace
cpufreq: don't leave stale policy pointer in cdbs->cur_policy
acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
cpufreq: make sure frequency transitions are serialized
ACPI: implement acpi_os_get_timer() according the spec
ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
ACPI: Add CMOS RTC Operation Region handler support
ACPI / processor: Drop unused variable from processor_perflib.c
cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
...
Pull security subsystem updates from James Morris:
"In this update, Smack learns to love IPv6 and to mount a filesystem
with a transmutable hierarchy (i.e. security labels are inherited
from parent directory upon creation rather than creating process).
The rest of the changes are maintenance"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (37 commits)
tpm/tpm_i2c_infineon: Remove unused header file
tpm: tpm_i2c_infinion: Don't modify i2c_client->driver
evm: audit integrity metadata failures
integrity: move integrity_audit_msg()
evm: calculate HMAC after initializing posix acl on tmpfs
maintainers: add Dmitry Kasatkin
Smack: Fix the bug smackcipso can't set CIPSO correctly
Smack: Fix possible NULL pointer dereference at smk_netlbl_mls()
Smack: Add smkfstransmute mount option
Smack: Improve access check performance
Smack: Local IPv6 port based controls
tpm: fix regression caused by section type conflict of tpm_dev_release() in ppc builds
maintainers: Remove Kent from maintainers
tpm: move TPM_DIGEST_SIZE defintion
tpm_tis: missing platform_driver_unregister() on error in init_tis()
security: clarify cap_inode_getsecctx description
apparmor: no need to delay vfree()
apparmor: fix fully qualified name parsing
apparmor: fix setprocattr arg processing for onexec
apparmor: localize getting the security context to a few macros
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRzhAdAAoJEKurIx+X31iByQEP/1eAEHNiSCrrKUm/ovGCiVBZ
w+ZRJkcKYBj+/3cQUeMds7OevmM8fSU5t+4zGpikobQ75xCjz7vJ6nQNLrSylqQG
Z4weVme6a529szRwv6H9an+psj5wP0yWAbCgBSw4/O3cogDS0eMD8T6ZhEi8mWJ9
PRSdqOLdxl7AZpJpJpP3GF7VQLJLDbVv56FcKgAZ3mtFM6qr7HFiTtnTc+Jl33V5
bM7v23HXyoyRk+JOUMtO+4BlrkKzbtL4Zh8aGw4s71QX3zZbP0BIw9sKffNMD+2k
oYXOcJjTwbtxQqXKoSKOtvLUQzf/SDqzlFf7yU38Spw+nRECPhVVRO0aAXKOmFKD
74qyNuW6ZXMw7xH4M7iG6MuM/cZIvueDtp0RSZVc6Gmb5CGb1aXz4GeotvKF5Bev
nVL65UVo6XwfB4m77ayPYqfD6KX0BsWHdCzWABNl8WbgyCY5dXRJJs/c9iLCl05I
EXRgWLJK/H/wOY9mj3NGcvRicYJigZP2luJEdwn8mVlpZl+3wstDcgq9Ijxr0LGi
bj2H9ro3vSaXpAnJw3FR9QbXgH/8Nbb8sV7yBddy9zntrnfEfvrYtrhOKR+fPrQL
C5zXeaXGEnqCEcYNESSjyhDCa1az49BlJC5ebEHHqdviHzRpVVi/EG920GViWw+D
z+z25N5hi4yoUF/+1vc/
=Dm06
-----END PGP SIGNATURE-----
Merge tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull thermal power-limit update from Tony Luck:
"Thermal limit warnings are too scary and cause unnecessary concern"
* tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
x86 thermal: Disable power limit notification interrupt by default
x86 thermal: Delete power-limit-notification console messages
Pull workqueue changes from Tejun Heo:
"Surprisingly, Lai and I didn't break too many things implementing
custom pools and stuff last time around and there aren't any follow-up
changes necessary at this point.
The only change in this pull request is Viresh's patches to make some
per-cpu workqueues to behave as unbound workqueues dependent on a boot
param whose default can be configured via a config option. This leads
to higher processing overhead / lower bandwidth as more work items are
bounced across CPUs; however, it can lead to noticeable powersave in
certain configurations - ~10% w/ idlish constant workload on a
big.LITTLE configuration according to Viresh.
This is because per-cpu workqueues interfere with how the scheduler
perceives whether or not each CPU is idle by forcing pinned tasks on
them, which makes the scheduler's power-aware scheduling decisions
less effective.
Its effectiveness is likely less pronounced on homogenous
configurations and this type of optimization can probably be made
automatic; however, the changes are pretty minimal and the affected
workqueues are clearly marked, so it's an easy gain for some
configurations for the time being with pretty unintrusive changes."
* 'for-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
fbcon: queue work on power efficient wq
block: queue work on power efficient wq
PHYLIB: queue work on system_power_efficient_wq
workqueue: Add system wide power_efficient workqueues
workqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueues
Add description for video module's parameter brightness_switch_enabled
into kernel-parameters.txt.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch moves the integrity_audit_msg() function and defintion to
security/integrity/, the parent directory, renames the 'ima_audit'
boot command line option to 'integrity_audit', and fixes the Kconfig
help text to reflect the actual code.
Changelog:
- Fixed ifdef inclusion of integrity_audit_msg() (Fengguang Wu)
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Add a traceoff_on_warning option in both the kernel command line as well
as a sysctl option. When set, any WARN*() function that is hit will cause
the tracing_on variable to be cleared, which disables writing to the
ring buffer.
This is useful especially when tracing a bug with function tracing. When
a warning is hit, the print caused by the warning can flood the trace with
the functions that producing the output for the warning. This can make the
resulting trace useless by either hiding where the bug happened, or worse,
by overflowing the buffer and losing the trace of the bug totally.
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The package power limit notification interrupt is primarily for
system diagnosis, and should not be blindly enabled on every
system by default -- particuarly since Linux does nothing in the
handler except count how many times it has been called...
Add a new kernel cmdline parameter "int_pln_enable" for situations where
users want to oberve these events via existing system counters:
$ grep TRM /proc/interrupts
$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*
https://bugzilla.kernel.org/show_bug.cgi?id=36182
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Pull block layer fixes from Jens Axboe:
"Outside of bcache (which really isn't super big), these are all
few-liners. There are a few important fixes in here:
- Fix blk pm sleeping when holding the queue lock
- A small collection of bcache fixes that have been done and tested
since bcache was included in this merge window.
- A fix for a raid5 regression introduced with the bio changes.
- Two important fixes for mtip32xx, fixing an oops and potential data
corruption (or hang) due to wrong bio iteration on stacked devices."
* 'for-linus' of git://git.kernel.dk/linux-block:
scatterlist: sg_set_buf() argument must be in linear mapping
raid5: Initialize bi_vcnt
pktcdvd: silence static checker warning
block: remove refs to XD disks from documentation
blkpm: avoid sleep when holding queue lock
mtip32xx: Correctly handle bio->bi_idx != 0 conditions
mtip32xx: Fix NULL pointer dereference during module unload
bcache: Fix error handling in init code
bcache: clarify free/available/unused space
bcache: drop "select CLOSURES"
bcache: Fix incompatible pointer type warning
Some device require DMADIR to be enabled, but are not detected as such
by atapi_id_dmadir. One such example is "Asus Serillel 2"
SATA-host-to-PATA-device bridge: the bridge itself requires DMADIR,
even if the bridged device does not.
As atapi_dmadir module parameter can cause problems with some devices
(as per Tejun Heo's memory), enabling it globally may not be possible
depending on the hardware.
This patch adds atapi_dmadir in the form of a "force" horkage value,
allowing global, per-bus and per-device control.
Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit d1a6f4f197
"block: delete super ancient PC-XT driver for 1980's hardware"
deleted the XD disk driver, but there are still a few
references to it in the documentation directory. Delete
the remnants and thus also free up the major block device
13 for reuse.
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no point. We would just squeeze the guest to put more and
more pages in the swap disk without any purpose.
The only time it makes sense to use the selfballooning and shrinking
is when frontswap is being utilized.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If tmem is built-in or a module, the user has the option on
the command line to influence it by doing: tmem.<some option>
instead of having a variety of "nocleancache", and
"nofrontswap". The others: "noselfballooning" and "selfballooning";
and "noselfshrink" are in a different driver xen-selfballoon.c
and the patches:
xen/tmem: Remove the usage of 'noselfshrink' and use 'tmem.selfshrink' bool instead.
xen/tmem: Remove the usage of 'noselfballoon','selfballoon' and use 'tmem.selfballon' bool instead.
remove them.
Also add documentation.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Workqueues can be performance or power-oriented. Currently, most workqueues are
bound to the CPU they were created on. This gives good performance (due to cache
effects) at the cost of potentially waking up otherwise idle cores (Idle from
scheduler's perspective. Which may or may not be physically idle) just to
process some work. To save power, we can allow the work to be rescheduled on a
core that is already awake.
Workqueues created with the WQ_UNBOUND flag will allow some power savings.
However, we don't change the default behaviour of the system. To enable
power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to
be turned on. This option can also be overridden by the
workqueue.power_efficient boot parameter.
tj: Updated config description and comments. Renamed
CONFIG_WQ_POWER_EFFICIENT to CONFIG_WQ_POWER_EFFICIENT_DEFAULT.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
The updates are mostly about the x86 IOMMUs this time. Exceptions are
the groundwork for the PAMU IOMMU from Freescale (for a PPC platform)
and an extension to the IOMMU group interface. On the x86 side this
includes a workaround for VT-d to disable interrupt remapping on broken
chipsets. On the AMD-Vi side the most important new feature is a kernel
command-line interface to override broken information in IVRS ACPI
tables and get interrupt remapping working this way. Besides that there
are small fixes all over the place.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRh2vAAAoJECvwRC2XARrjbjkP/jzKzeffybUpQsIJF8rs/IEt
hSwqpGLr6WR5FdneEH9fiBIp4pyMDXmuAb/2ZNgB+DgPN3xgqmWVo4WLk7pMo3BS
/xIz/lu7hIX3AtKt807pL9+rPdhGYEJ43Vmr4bW9x0l1kuNXy6fmMLcN5FaPKjV4
p4hY4jOstEgtYQw4wi39/9b4FsYoipZizkOUSdtCzWwTv7jOHH7/Wra8iZyzL6Je
1VlF/efp0ytTcwLdHOfGwPCIlZrQRtQCM4SqdAUG9bOL3ARR9Yu/0iW1295nbLzo
CQX5CfKePvo/fGxki1jcBi+UCyxYKPosB5kCxmh4MAxCg/VzzMsaME/A73tLJa6W
Y29bbjwPoBPMq03HX8S9R5QWY8HpFujUUp+J4TXcKuTgYEV28WfLu1uaeKD716nM
LoXUojov7Cj8ZQZnhyu5l+XNaephBZLfw/8bM6bAxhlKXwAjmLiS5Z+srPl1GJee
5GCV+L94JifHLZaREWh3JFsh9O3W7Wno2++c4JU32uCWJHXH7tMgs2P8n5AY9rnT
Km1a9y6w2MF3Gg9j4y6u75m0XnFTNzYjeJMUtqVlwVhNHhgaXfuIWY63xOQCLJs1
ThTHOjoh0VqONGobR/ywn+0ouo9X07DnWpluyFd+zY3XK0UE0NOu9XMr4i6TWxOf
mlzWoEKxtw36XGHB/FtQ
=MVc/
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"The updates are mostly about the x86 IOMMUs this time.
Exceptions are the groundwork for the PAMU IOMMU from Freescale (for a
PPC platform) and an extension to the IOMMU group interface.
On the x86 side this includes a workaround for VT-d to disable
interrupt remapping on broken chipsets. On the AMD-Vi side the most
important new feature is a kernel command-line interface to override
broken information in IVRS ACPI tables and get interrupt remapping
working this way.
Besides that there are small fixes all over the place."
* tag 'iommu-updates-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (24 commits)
iommu/tegra: Fix printk formats for dma_addr_t
iommu: Add a function to find an iommu group by id
iommu/vt-d: Remove warning for HPET scope type
iommu: Move swap_pci_ref function to drivers/iommu/pci.h.
iommu/vt-d: Disable translation if already enabled
iommu/amd: fix error return code in early_amd_iommu_init()
iommu/AMD: Per-thread IOMMU Interrupt Handling
iommu: Include linux/err.h
iommu/amd: Workaround for ERBT1312
iommu/amd: Document ivrs_ioapic and ivrs_hpet parameters
iommu/amd: Don't report firmware bugs with cmd-line ivrs overrides
iommu/amd: Add ioapic and hpet ivrs override
iommu/amd: Add early maps for ioapic and hpet
iommu/amd: Extend IVRS special device data structure
iommu/amd: Move add_special_device() to __init
iommu: Fix compile warnings with forward declarations
iommu/amd: Properly initialize irq-table lock
iommu/amd: Use AMD specific data structure for irq remapping
iommu/amd: Remove map_sg_no_iommu()
iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
...
Pull 'full dynticks' support from Ingo Molnar:
"This tree from Frederic Weisbecker adds a new, (exciting! :-) core
kernel feature to the timer and scheduler subsystems: 'full dynticks',
or CONFIG_NO_HZ_FULL=y.
This feature extends the nohz variable-size timer tick feature from
idle to busy CPUs (running at most one task) as well, potentially
reducing the number of timer interrupts significantly.
This feature got motivated by real-time folks and the -rt tree, but
the general utility and motivation of full-dynticks runs wider than
that:
- HPC workloads get faster: CPUs running a single task should be able
to utilize a maximum amount of CPU power. A periodic timer tick at
HZ=1000 can cause a constant overhead of up to 1.0%. This feature
removes that overhead - and speeds up the system by 0.5%-1.0% on
typical distro configs even on modern systems.
- Real-time workload latency reduction: CPUs running critical tasks
should experience as little jitter as possible. The last remaining
source of kernel-related jitter was the periodic timer tick.
- A single task executing on a CPU is a pretty common situation,
especially with an increasing number of cores/CPUs, so this feature
helps desktop and mobile workloads as well.
The cost of the feature is mainly related to increased timer
reprogramming overhead when a CPU switches its tick period, and thus
slightly longer to-idle and from-idle latency.
Configuration-wise a third mode of operation is added to the existing
two NOHZ kconfig modes:
- CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
as a config option. This is the traditional Linux periodic tick
design: there's a HZ tick going on all the time, regardless of
whether a CPU is idle or not.
- CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
periodic tick when a CPU enters idle mode.
- CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
tick when a CPU is idle, also slows the tick down to 1 Hz (one
timer interrupt per second) when only a single task is running on a
CPU.
The .config behavior is compatible: existing !CONFIG_NO_HZ and
CONFIG_NO_HZ=y settings get translated to the new values, without the
user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
default.
This feature is based on a lot of infrastructure work that has been
steadily going upstream in the last 2-3 cycles: related RCU support
and non-periodic cputime support in particular is upstream already.
This tree adds the final pieces and activates the feature. The pull
request is marked RFC because:
- it's marked 64-bit only at the moment - the 32-bit support patch is
small but did not get ready in time.
- it has a number of fresh commits that came in after the merge
window. The overwhelming majority of commits are from before the
merge window, but still some aspects of the tree are fresh and so I
marked it RFC.
- it's a pretty wide-reaching feature with lots of effects - and
while the components have been in testing for some time, the full
combination is still not very widely used. That it's default-off
should reduce its regression abilities and obviously there are no
known regressions with CONFIG_NO_HZ_FULL=y enabled either.
- the feature is not completely idempotent: there is no 100%
equivalent replacement for a periodic scheduler/timer tick. In
particular there's ongoing work to map out and reduce its effects
on scheduler load-balancing and statistics. This should not impact
correctness though, there are no known regressions related to this
feature at this point.
- it's a pretty ambitious feature that with time will likely be
enabled by most Linux distros, and we'd like you to make input on
its design/implementation, if you dislike some aspect we missed.
Without flaming us to crisp! :-)
Future plans:
- there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
the periodic tick altogether when there's a single busy task on a
CPU. We'd first like 1 Hz to be exposed more widely before we go
for the 0 Hz target though.
- once we reach 0 Hz we can remove the periodic tick assumption from
nr_running>=2 as well, by essentially interrupting busy tasks only
as frequently as the sched_latency constraints require us to do -
once every 4-40 msecs, depending on nr_running.
I am personally leaning towards biting the bullet and doing this in
v3.10, like the -rt tree this effort has been going on for too long -
but the final word is up to you as usual.
More technical details can be found in Documentation/timers/NO_HZ.txt"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
sched: Keep at least 1 tick per second for active dynticks tasks
rcu: Fix full dynticks' dependency on wide RCU nocb mode
nohz: Protect smp_processor_id() in tick_nohz_task_switch()
nohz_full: Add documentation.
cputime_nsecs: use math64.h for nsec resolution conversion helpers
nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
nohz: Reduce overhead under high-freq idling patterns
nohz: Remove full dynticks' superfluous dependency on RCU tree
nohz: Fix unavailable tick_stop tracepoint in dynticks idle
nohz: Add basic tracing
nohz: Select wide RCU nocb for full dynticks
nohz: Disable the tick when irq resume in full dynticks CPU
nohz: Re-evaluate the tick for the new task after a context switch
nohz: Prepare to stop the tick on irq exit
nohz: Implement full dynticks kick
nohz: Re-evaluate the tick from the scheduler IPI
sched: New helper to prevent from stopping the tick in full dynticks
sched: Kick full dynticks CPU that have more than one task enqueued.
perf: New helper to prevent full dynticks CPUs from stopping tick
perf: Kick full dynticks CPU if events rotation is needed
...
This branch contains platform updates for 3.10. Among the highlights:
- Support for the new Atmel Cortex-A5 based platforms (SAMA5D3)
- New support for CSR SiRFatlas6 SoCs
- A handful of updates for NVidia T114 (a.k.a. Tegra 4)
- A bunch of updates for the shmobile platforms
- A handful of updates for davinci
- A few updates for Qualcomm MSM
- Plus a handful of other patches, defconfig updates, etc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRgg+LAAoJEIwa5zzehBx3ePcP/3NUsSOTRQ2SZIVpyjnWOhkf
RMZiRaVsxrY0BPfDB9E2Vcb6lannKmACTujs/Ux7kJC22BreuFM1PnZoDfhkRuSE
n/nVB1981XJS82z2uONRSZGlUPSGWYzhTTUDJ0nHiBGmIGf5ctnC0iYWp3As3lv9
kNY14H7NkwQ4zBVNEMu7WfW8d2IJgqZJgR9xhZPv5fOZ+LlQmK6VaHWTmQtjyea1
bG1qoJ0dPbfJB4Vnr3a49rBkSJxZUiv8xQucw9+vo+ADRi64M4sZ1Jj2vVyDpqZp
F4fxBNMVvg7xM0TcBbItFFYJBXlUjeT4z+UI5iYjkbnE7EV9ndFeZXHCWX1qzOSy
X/nrJKuoe7ISQanBE9SHS9DpDGlkPDO0Mn0vb1f2VUQOY513pt/D1iFYEucZ6WCN
fWUYtvt5GayidUr55D1U8ssbE0oGt2rizd9x7GUk4KbRVAnUUNopIQAhXrefTrZm
jfdZNDckJ2F3aq8IPjsKuyJTpe61xD4Wvb3P/pEE3Q8fowPF5WIxXV+qjqHQ9vtt
Tz4LkP/YdynVFGmhOwz3QZmPaQItaabaYyCcZ5cVCvt5mdxx5VuHYppafhCPJz+V
KCQpKi1azuIv+sDR+nlGOl6+Ideea3s7TsRudfbmQFp5GsqkqOdJzR9gbbKmJauQ
4JPpRd+4W8wC8zXQnhVY
=HXX3
-----END PGP SIGNATURE-----
Merge tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC platform updates from Olof Johansson:
"This branch contains part 1 of the platform updates for 3.10. Among
the highlights:
- Support for the new Atmel Cortex-A5 based platforms (SAMA5D3)
- New support for CSR SiRFatlas6 SoCs
- A handful of updates for NVidia T114 (a.k.a. Tegra 4)
- A bunch of updates for the shmobile platforms
- A handful of updates for davinci
- A few updates for Qualcomm MSM
- Plus a handful of other patches, defconfig updates, etc."
* tag 'soc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (135 commits)
ARM: tegra: pm: fix build error w/o PM_SLEEP
ARM: davinci: ensure global variables are declared
ARM: davinci: sram.c: fix incorrect type in assignment
ARM: davinci: da8xx dt: make file local symbols static
ARM: davinci: da8xx: add remoteproc support
ARM: socfpga: Upgrade clk driver for socfpga to make use of dts clock entries
ARM: socfpga: Add clock entries into device tree
ARM: socfpga: Enable soft reset
ARM: EXYNOS: replace cpumask by the corresponding macro
ARM: EXYNOS: handle properly the return values
ARM: EXYNOS: factor out the idle states
ARM: OMAP4: Enable fix for Cortex-A9 erratas
ARM: OMAP2+: Export SoC information to userspace
ARM: OMAP2+: SoC name and revision unification
ARM: OMAP2+: Move common part of late init into common function
ARM: tegra: pm: remove duplicated include from pm.c
ARM: davinci: da850: override mmc DT node device name
ARM: davinci: da850: add mmc DT entries
mmc: davinci_mmc: add DT support
ARM: SAMSUNG: check processor type before cache restoration in resume
...
The full dynticks tree needs the latest RCU and sched
upstream updates in order to fix some dependencies.
Merge a common upstream merge point that has these
updates.
Conflicts:
include/linux/perf_event.h
kernel/rcutree.h
kernel/rcutree_plugin.h
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Pull x86 debug update from Ingo Molnar:
"Two small changes: a documentation update and a constification"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, early-printk: Update earlyprintk documentation (and kill x86 copy)
x86: Constify a few items
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle are mostly related to preparatory work
for the full-dynticks work:
- Remove restrictions on no-CBs CPUs, make RCU_FAST_NO_HZ take
advantage of numbered callbacks, do callback accelerations based on
numbered callbacks. Posted to LKML at
https://lkml.org/lkml/2013/3/18/960
- RCU documentation updates. Posted to LKML at
https://lkml.org/lkml/2013/3/18/570
- Miscellaneous fixes. Posted to LKML at
https://lkml.org/lkml/2013/3/18/594"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
rcu: Make rcu_accelerate_cbs() note need for future grace periods
rcu: Abstract rcu_start_future_gp() from rcu_nocb_wait_gp()
rcu: Rename n_nocb_gp_requests to need_future_gp
rcu: Push lock release to rcu_start_gp()'s callers
rcu: Repurpose no-CBs event tracing to future-GP events
rcu: Rearrange locking in rcu_start_gp()
rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
rcu: Accelerate RCU callbacks at grace-period end
rcu: Export RCU_FAST_NO_HZ parameters to sysfs
rcu: Distinguish "rcuo" kthreads by RCU flavor
rcu: Add event tracing for no-CBs CPUs' grace periods
rcu: Add event tracing for no-CBs CPUs' callback registration
rcu: Introduce proper blocking to no-CBs kthreads GP waits
rcu: Provide compile-time control for no-CBs CPUs
rcu: Tone down debugging during boot-up and shutdown.
rcu: Add softirq-stall indications to stall-warning messages
rcu: Documentation update
rcu: Make bugginess of code sample more evident
rcu: Fix hlist_bl_set_first_rcu() annotation
rcu: Delete unused rcu_node "wakemask" field
...
Pull workqueue updates from Tejun Heo:
"A lot of activities on workqueue side this time. The changes achieve
the followings.
- WQ_UNBOUND workqueues - the workqueues which are per-cpu - are
updated to be able to interface with multiple backend worker pools.
This involved a lot of churning but the end result seems actually
neater as unbound workqueues are now a lot closer to per-cpu ones.
- The ability to interface with multiple backend worker pools are
used to implement unbound workqueues with custom attributes.
Currently the supported attributes are the nice level and CPU
affinity. It may be expanded to include cgroup association in
future. The attributes can be specified either by calling
apply_workqueue_attrs() or through /sys/bus/workqueue/WQ_NAME/* if
the workqueue in question is exported through sysfs.
The backend worker pools are keyed by the actual attributes and
shared by any workqueues which share the same attributes. When
attributes of a workqueue are changed, the workqueue binds to the
worker pool with the specified attributes while leaving the work
items which are already executing in its previous worker pools
alone.
This allows converting custom worker pool implementations which
want worker attribute tuning to use workqueues. The writeback pool
is already converted in block tree and there are a couple others
are likely to follow including btrfs io workers.
- WQ_UNBOUND's ability to bind to multiple worker pools is also used
to make it NUMA-aware. Because there's no association between work
item issuer and the specific worker assigned to execute it, before
this change, using unbound workqueue led to unnecessary cross-node
bouncing and it couldn't be helped by autonuma as it requires tasks
to have implicit node affinity and workers are assigned randomly.
After these changes, an unbound workqueue now binds to multiple
NUMA-affine worker pools so that queued work items are executed in
the same node. This is turned on by default but can be disabled
system-wide or for individual workqueues.
Crypto was requesting NUMA affinity as encrypting data across
different nodes can contribute noticeable overhead and doing it
per-cpu was too limiting for certain cases and IO throughput could
be bottlenecked by one CPU being fully occupied while others have
idle cycles.
While the new features required a lot of changes including
restructuring locking, it didn't complicate the execution paths much.
The unbound workqueue handling is now closer to per-cpu ones and the
new features are implemented by simply associating a workqueue with
different sets of backend worker pools without changing queue,
execution or flush paths.
As such, even though the amount of change is very high, I feel
relatively safe in that it isn't likely to cause subtle issues with
basic correctness of work item execution and handling. If something
is wrong, it's likely to show up as being associated with worker pools
with the wrong attributes or OOPS while workqueue attributes are being
changed or during CPU hotplug.
While this creates more backend worker pools, it doesn't add too many
more workers unless, of course, there are many workqueues with unique
combinations of attributes. Assuming everything else is the same,
NUMA awareness costs an extra worker pool per NUMA node with online
CPUs.
There are also a couple things which are being routed outside the
workqueue tree.
- block tree pulled in workqueue for-3.10 so that writeback worker
pool can be converted to unbound workqueue with sysfs control
exposed. This simplifies the code, makes writeback workers
NUMA-aware and allows tuning nice level and CPU affinity via sysfs.
- The conversion to workqueue means that there's no 1:1 association
between a specific worker, which makes writeback folks unhappy as
they want to be able to tell which filesystem caused a problem from
backtrace on systems with many filesystems mounted. This is
resolved by allowing work items to set debug info string which is
printed when the task is dumped. As this change involves unifying
implementations of dump_stack() and friends in arch codes, it's
being routed through Andrew's -mm tree."
* 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (84 commits)
workqueue: use kmem_cache_free() instead of kfree()
workqueue: avoid false negative WARN_ON() in destroy_workqueue()
workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity
workqueue: implement NUMA affinity for unbound workqueues
workqueue: introduce put_pwq_unlocked()
workqueue: introduce numa_pwq_tbl_install()
workqueue: use NUMA-aware allocation for pool_workqueues
workqueue: break init_and_link_pwq() into two functions and introduce alloc_unbound_pwq()
workqueue: map an unbound workqueues to multiple per-node pool_workqueues
workqueue: move hot fields of workqueue_struct to the end
workqueue: make workqueue->name[] fixed len
workqueue: add workqueue->unbound_attrs
workqueue: determine NUMA node of workers accourding to the allowed cpumask
workqueue: drop 'H' from kworker names of unbound worker pools
workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]
workqueue: move pwq_pool_locking outside of get/put_unbound_pool()
workqueue: fix memory leak in apply_workqueue_attrs()
workqueue: fix unbound workqueue attrs hashing / comparison
workqueue: fix race condition in unbound workqueue free path
workqueue: remove pwq_lock which is no longer used
...
existing platforms, as well as adoption of the framework by new
platforms and devices. Some long-needed fixes to the core framework are
here as well as new features such as improved initialization of clocks
from DT as well as framework reentrancy for nested clock operations.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRfqtLAAoJEDqPOy9afJhJsxwP/RLvfeeMIU3804ahVNK2C59h
ehJ06ZP+b0u0A7+YSC7CX1pHXIFW+UoZgYLJiLdV2kEdpOIKMELZyUcEVB97u1Of
TVlsmHfTLv2zVAq/LYRVSKFYeMUd/6RRoq7Cm6hoj638IVeXG7C+8pei2aVZe++t
1ENmb4UGFJ7NLfpE5zQ3fEuIfHfuWA8Od6SmPaV/YG5Io8HgkDGF3/tCJURJGII6
xLN2Rh8qbFktJLVvKe6yLyvUEZiWh8A6HNPyNiFYYGX11wU76zK2wMN3BW6Nn/kW
3PubzISoKRaoCZvuVK+CoLWnhFl2LteFVVmL1TBc/jxJe6q+rLX33sXl1q9K+SLt
POnHf/7nDyO3zbZWgfRR1r3FdeZqdLYw8HVsLcOKFcv9n1UligzuUNml5PklKwNh
BDMmSo5ytS1QPV1e9ZtVrk6IyvDyrenwfDW1Mw43ST6D23FVrivywB4X9ur6WljI
d1/CBvQXQZ11Hd4OAvqRL8QYFJvc5WlERjSd1j6I6XS6xioKOTKMkUC/KpRcCid9
avA6mJ5k/a1jTojvh2wl37paI//OzY0VDlxRSeMZIu9Dsn29DnPlE5CLg535Ovu+
mn9OtLFEDNnlgWCMQYUehGd7ITgtwrB/fxxNeBbMYjDz4AIirR2BIvMR7I8CMTQz
M0rHu8NpwKH6eqC6kAup
=+LO3
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linux
Pull clock framework update from Michael Turquette:
"The common clock framework changes for 3.10 include many fixes for
existing platforms, as well as adoption of the framework by new
platforms and devices.
Some long-needed fixes to the core framework are here as well as new
features such as improved initialization of clocks from DT as well as
framework reentrancy for nested clock operations."
* tag 'clk-for-linus-3.10' of git://git.linaro.org/people/mturquette/linux: (44 commits)
clk: add clk_ignore_unused option to keep boot clocks on
clk: ux500: fix mismatched types
clk: vexpress: Add separate SP810 driver
clk: si5351: make clk-si5351 depend on CONFIG_OF
clk: export __clk_get_flags for modular clock providers
clk: vt8500: Missing breaks in vtwm_pll_round_rate/_set_rate.
clk: sunxi: Unify oscillator clock
clk: composite: allow fixed rates & fixed dividers
clk: composite: rename 'div' references to 'rate'
clk: add si5351 i2c common clock driver
clk: add device tree fixed-factor-clock binding support
clk: Properly handle notifier return values
clk: ux500: abx500: Define clock tree for ab850x
clk: ux500: Add support for sysctrl clocks
clk: mvebu: Fix valid value range checking for cpu_freq_select
clk: Fixup locking issues for clk_set_parent
clk: Fixup errorhandling for clk_set_parent
clk: Restructure code for __clk_reparent
clk: sunxi: drop an unnecesary kmalloc
clk: sunxi: drop CLK_IGNORE_UNUSED
...
Along with the usual minor fixes and clean ups there are a few major
changes with this pull request.
1) Multiple buffers for the ftrace facility
This feature has been requested by many people over the last few years.
I even heard that Google was about to implement it themselves. I finally
had time and cleaned up the code such that you can now create multiple
instances of the ftrace buffer and have different events go to different
buffers. This way, a low frequency event will not be lost in the noise
of a high frequency event.
Note, currently only events can go to different buffers, the tracers
(ie. function, function_graph and the latency tracers) still can only
be written to the main buffer.
2) The function tracer triggers have now been extended.
The function tracer had two triggers. One to enable tracing when a
function is hit, and one to disable tracing. Now you can record a
stack trace on a single (or many) function(s), take a snapshot of the
buffer (copy it to the snapshot buffer), and you can enable or disable
an event to be traced when a function is hit.
3) A perf clock has been added.
A "perf" clock can be chosen to be used when tracing. This will cause
ftrace to use the same clock as perf uses, and hopefully this will make
it easier to interleave the perf and ftrace data for analysis.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJRfnTPAAoJEOdOSU1xswtMqYYH/1WIdrwXmxHflErnYkCIr3sU
QtYae2K5A1HcgiqOvRJrdWMOt016iMx5CaQQyBFM1vvMiPY0sTWRmwNxDfZzz9LN
10jRvWEzZSLtzl+a9mkFWLEpr5nR/QODOxkWFCnRWscp46sp04LSTxGDYsOnPQZB
sam/AQ1h4xA+DqDBChm9BDEUEPorGleTlN54LBaCGgSFGvrbF+eAg2s4vHNAQAvQ
8d5xjSE9zC7J+FqbVxvJTbKI3+EqKL6hMsJKsKfi0SI+FuxBaFMSltXck5zKyTI4
HpNJzXCmw+v90Tju7oMkPHh6RTbESPCHoGU+wqE52fM6m7oScVeuI/kfc6USwU4=
=W1n+
-----END PGP SIGNATURE-----
Merge tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"Along with the usual minor fixes and clean ups there are a few major
changes with this pull request.
1) Multiple buffers for the ftrace facility
This feature has been requested by many people over the last few
years. I even heard that Google was about to implement it themselves.
I finally had time and cleaned up the code such that you can now
create multiple instances of the ftrace buffer and have different
events go to different buffers. This way, a low frequency event will
not be lost in the noise of a high frequency event.
Note, currently only events can go to different buffers, the tracers
(ie function, function_graph and the latency tracers) still can only
be written to the main buffer.
2) The function tracer triggers have now been extended.
The function tracer had two triggers. One to enable tracing when a
function is hit, and one to disable tracing. Now you can record a
stack trace on a single (or many) function(s), take a snapshot of the
buffer (copy it to the snapshot buffer), and you can enable or disable
an event to be traced when a function is hit.
3) A perf clock has been added.
A "perf" clock can be chosen to be used when tracing. This will cause
ftrace to use the same clock as perf uses, and hopefully this will
make it easier to interleave the perf and ftrace data for analysis."
* tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (82 commits)
tracepoints: Prevent null probe from being added
tracing: Compare to 1 instead of zero for is_signed_type()
tracing: Remove obsolete macro guard _TRACE_PROFILE_INIT
ftrace: Get rid of ftrace_profile_bits
tracing: Check return value of tracing_init_dentry()
tracing: Get rid of unneeded key calculation in ftrace_hash_move()
tracing: Reset ftrace_graph_filter_enabled if count is zero
tracing: Fix off-by-one on allocating stat->pages
kernel: tracing: Use strlcpy instead of strncpy
tracing: Update debugfs README file
tracing: Fix ftrace_dump()
tracing: Rename trace_event_mutex to trace_event_sem
tracing: Fix comment about prefix in arch_syscall_match_sym_name()
tracing: Convert trace_destroy_fields() to static
tracing: Move find_event_field() into trace_events.c
tracing: Use TRACE_MAX_PRINT instead of constant
tracing: Use pr_warn_once instead of open coded implementation
ring-buffer: Add ring buffer startup selftest
tracing: Bring Documentation/trace/ftrace.txt up to date
tracing: Add "perf" trace_clock
...
Conflicts:
kernel/trace/ftrace.c
kernel/trace/trace.c
This is primarily useful when there's a driver that doesn't claim clocks
properly, but the bootloader leaves them on. It's not expected to be used
in normal cases, but for bringup and debug it's very useful to have the
option to not gate unclaimed clocks that are still on.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: fixed up trivial merge issue]
Pull kdump fixes from Peter Anvin:
"The kexec/kdump people have found several problems with the support
for loading over 4 GiB that was introduced in this merge cycle. This
is partly due to a number of design problems inherent in the way the
various pieces of kdump fit together (it is pretty horrifically manual
in many places.)
After a *lot* of iterations this is the patchset that was agreed upon,
but of course it is now very late in the cycle. However, because it
changes both the syntax and semantics of the crashkernel option, it
would be desirable to avoid a stable release with the broken
interfaces."
I'm not happy with the timing, since originally the plan was to release
the final 3.9 tomorrow. But apparently I'm doing an -rc8 instead...
* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kexec: use Crash kernel for Crash kernel low
x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low
x86, kdump: Retore crashkernel= to allocate under 896M
x86, kdump: Set crashkernel_low automatically
Document the new kernel commandline parameters in the
appropriate file.
Reviewed-by: Shuah Khan <shuahkhan@gmail.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
We need full dynticks CPU to also be RCU nocb so
that we don't have to keep the tick to handle RCU
callbacks.
Make sure the range passed to nohz_full= boot
parameter is a subset of rcu_nocbs=
The CPUs that fail to meet this requirement will be
excluded from the nohz_full range. This is checked
early in boot time, before any CPU has the opportunity
to stop its tick.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
The timekeeping job must be able to run early on boot
because there may be some pre-SMP (and thus pre-initcalls )
components that rely on it. The IO-APIC is one such users
as it tests the timer health by watching jiffies progression.
Given that it happens before we know the initial online
set, we can't rely on it to select a timekeeper. We need
one before SMP time otherwise we simply crash on boot.
To fix this and keep things simple for now, force the boot CPU
outside of the full dynticks range in any case and do this early
on kernel parameter parsing time.
We might want a trickier solution later, expecially for aSMP
architectures that need to assign housekeeping tasks to arbitrary
low power CPUs.
But it's still first pass KISS time for now.
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Per hpa, use crashkernel=X,high crashkernel=Y,low instead of
crashkernel_hign=X crashkernel_low=Y. As that could be extensible.
-v2: according to Vivek, change delimiter to ;
-v3: let hign and low only handle simple form and it conforms to
description in kernel-parameters.txt
still keep crashkernel=X override any crashkernel=X,high
crashkernel=Y,low
-v4: update get_last_crashkernel returning and add more strict
checking in parse_crashkernel_simple() found by HATAYAMA.
-v5: Change delimiter back to , according to HPA.
also separate parse_suffix from parse_simper according to vivek.
so we can avoid @pos in that path.
-v6: Tight the checking about crashkernel=X,highblahblah,high
found by HTYAYAMA.
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1366089828-19692-5-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Vivek found old kexec-tools does not work new kernel anymore.
So change back crashkernel= back to old behavoir, and add crashkernel_high=
to let user decide if buffer could be above 4G, and also new kexec-tools will
be needed.
-v2: let crashkernel=X override crashkernel_high=
update description about _high will be ignored by crashkernel=X
-v3: update description about kernel-parameters.txt according to Vivek.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1366089828-19692-4-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Chao said that kdump does does work well on his system on 3.8
without extra parameter, even iommu does not work with kdump.
And now have to append crashkernel_low=Y in first kernel to make
kdump work.
We have now modified crashkernel=X to allocate memory beyong 4G (if
available) and do not allocate low range for crashkernel if the user
does not specify that with crashkernel_low=Y. This causes regression
if iommu is not enabled. Without iommu, swiotlb needs to be setup in
first 4G and there is no low memory available to second kernel.
Set crashkernel_low automatically if the user does not specify that.
For system that does support IOMMU with kdump properly, user could
specify crashkernel_low=0 to save that 72M low ram.
-v3: add swiotlb_size() according to Konrad.
-v4: add comments what 8M is for according to hpa.
also update more crashkernel_low= in kernel-parameters.txt
-v5: update changelog according to Vivek.
-v6: Change description about swiotlb referring according to HATAYAMA.
Reported-by: WANG Chao <chaowang@redhat.com>
Tested-by: WANG Chao <chaowang@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1366089828-19692-2-git-send-email-yinghai@kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Using this parameter one can disable the storage_size/2 check if
he is really sure that the UEFI does sane gc and fulfills the spec.
This parameter is useful if a devices uses more than 50% of the
storage by default.
The Intel DQSW67 desktop board is such a sucker for exmaple.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Add remoteproc platform device for controlling the DSP
on da8xx. The patch uses CMA-based reservation of physical
memory block for DSP use. A new kernel command-line parameter
has been added to allow boot-time specification of the physical
memory block.
Signed-off-by: Robert Tivy <rtivy@ti.com>
[nsekhar@ti.com: edit commit message for readability and
style improvements]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
"Extended nohz" was used as a naming base for the full dynticks
API and Kconfig symbols. It reflects the fact the system tries
to stop the tick in more places than just idle.
But that "extended" name is a bit opaque and vague. Rename it to
"full" makes it clearer what the system tries to do under this
config: try to shutdown the tick anytime it can. The various
constraints that prevent that to happen shouldn't be considered
as fundamental properties of this feature but rather technical
issues that may be solved in the future.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Documentation/kernel-parameters.txt and
Documentation/x86/x86_64/boot-options.txt contain virtually
identical text describing earlyprintk.
This consolidates the two copies and updates the documentation a
bit. No one ever documented the:
earlyprintk=serial,0x1008,115200
syntax, nor mentioned that ARM is now a supported earlyprintk
arch.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rob Landley <rob@landley.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20130410210338.E2930E98@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Unbound workqueues are now NUMA aware. Let's add some control knobs
and update sysfs interface accordingly.
* Add kernel param workqueue.numa_disable which disables NUMA affinity
globally.
* Replace sysfs file "pool_id" with "pool_ids" which contain
node:pool_id pairs. This change is userland-visible but "pool_id"
hasn't seen a release yet, so this is okay.
* Add a new sysf files "numa" which can toggle NUMA affinity on
individual workqueues. This is implemented as attrs->no_numa whichn
is special in that it isn't part of a pool's attributes. It only
affects how apply_workqueue_attrs() picks which pools to use.
After "pool_ids" change, first_pwq() doesn't have any user left.
Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
The PPC specific kernel parameter 'noresidual' was removed in v2.6.27,
though commit 917f0af9e5 ("powerpc: Remove
arch/ppc and include/asm-ppc").
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Because RCU callbacks are now associated with the number of the grace
period that they must wait for, CPUs can now take advance callbacks
corresponding to grace periods that ended while a given CPU was in
dyntick-idle mode. This eliminates the need to try forcing the RCU
state machine while entering idle, thus reducing the CPU intensiveness
of RCU_FAST_NO_HZ, which should increase its energy efficiency.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently, the per-no-CBs-CPU kthreads are named "rcuo" followed by
the CPU number, for example, "rcuo". This is problematic given that
there are either two or three RCU flavors, each of which gets a per-CPU
kthread with exactly the same name. This commit therefore introduces
a one-letter abbreviation for each RCU flavor, namely 'b' for RCU-bh,
'p' for RCU-preempt, and 's' for RCU-sched. This abbreviation is used
to distinguish the "rcuo" kthreads, for example, for CPU 0 we would have
"rcuob/0", "rcuop/0", and "rcuos/0".
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
For extreme usecases such as Real Time or HPC, having
the ability to shutdown the tick when a single task runs
on a CPU is a desired feature:
* Reducing the amount of interrupts improves throughput
for CPU-bound tasks. The CPU is less distracted from its
real job, from an execution time and from the cache point
of views.
* This also improve latency response as we have less critical
sections.
Start with introducing a very simple interface to define
full dynticks CPU: use a boot time option defined cpumask
through the "nohz_extended=" kernel parameter. CPUs that
are part of this range will have their tick shutdown
whenever possible: provided they run a single task and
they don't do kernel activity that require the periodic
tick. These details will be later documented in
Documentation/*
An online CPU must be kept outside this range to handle the
timekeeping.
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
If debugging the kernel, and the developer wants to use
tracing_snapshot() in places where tracing_snapshot_alloc() may
be difficult (or more likely, the developer is lazy and doesn't
want to bother with tracing_snapshot_alloc() at all), then adding
alloc_snapshot
to the kernel command line parameter will tell ftrace to allocate
the snapshot buffer (if configured) when it allocates the main
tracing buffer.
I also noticed that ring_buffer_expanded and tracing_selftest_disabled
had inconsistent use of boolean "true" and "false" with "0" and "1".
I cleaned that up too.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This adds core architecture support for Imagination's Meta processor
cores, followed by some later miscellaneous arch/metag cleanups and
fixes which I kept separate to ease review:
- Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture
- A few fixes all over, particularly for symbol prefixes
- A few privilege protection fixes
- Several cleanups (setup.c includes, split out a lot of metag_ksyms.c)
- Fix some missing exports
- Convert hugetlb to use vm_unmapped_area()
- Copy device tree to non-init memory
- Provide dma_get_sgtable()
Signed-off-by: James Hogan <james.hogan@imgtec.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRMmVXAAoJEKHZs+irPybfivgP/inEXqJyfw59omQdjwvYcU/a
/u0MJ3UKSNS3U+HknfaFCy/Nwk1dqPLjqqyVC1V6AbUPBXlaEwGcimlNRx2uRjdq
Uh96upMLHsNuF/xiiR477g3RwY0egIJdM1R1bGi3mZ3vVrNQGF+wbni6f61xCWGz
M/4rDglpQvE79oLhYdgj6tidZtHQT0YWtERA9W90zkQWXGYmpFPKBKbfZAi5+rKQ
U6Gpg26orUugzXNaxltJEYKE8gjLTppEabx8DARnItZ4zCMy4dw5RBJ35RFvQw6e
eSmfgTy9w9WqBMY2+QMSgU0KQt1IITCzX7OlOXC0jALQJXoU0WWbOELlBVQLCwF1
T0OcR/5ZP/hIlOk5Dh+e9U3AtbASXdMtqA0ZUe78woH1CBf7Nc/0c0vRg23EdMh8
lnHDJxT/UqskoOYLI4kgWbEdLDy4uTh19U2pVi7VCo7ksLB9Bj9Xc8VSKgscSXTl
OwzN+c4Jgtu8FDFTp+Af4AT8pYGJ08j8L2ErsV2sOv3Q44U5WXdrMz3GSgwXj8+4
wZk3HvdkQVkMD5sJCUZgAswaN6BnbB0pHdCz4wMQ8jR/Ogs015Ipk64Ecym9S/4n
uES7PnDtt/4lb5EyX2ScbvdnZTAFTaaP7OOhC77BOQvbQjIW1tkAcxWJqRry86uS
iM0BFgK6Ohx3geqa5Ft0
=65cR
-----END PGP SIGNATURE-----
Merge tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag
Pull new ImgTec Meta architecture from James Hogan:
"This adds core architecture support for Imagination's Meta processor
cores, followed by some later miscellaneous arch/metag cleanups and
fixes which I kept separate to ease review:
- Support for basic Meta 1 (ATP) and Meta 2 (HTP) core architecture
- A few fixes all over, particularly for symbol prefixes
- A few privilege protection fixes
- Several cleanups (setup.c includes, split out a lot of
metag_ksyms.c)
- Fix some missing exports
- Convert hugetlb to use vm_unmapped_area()
- Copy device tree to non-init memory
- Provide dma_get_sgtable()"
* tag 'metag-v3.9-rc1-v4' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag: (61 commits)
metag: Provide dma_get_sgtable()
metag: prom.h: remove declaration of metag_dt_memblock_reserve()
metag: copy devicetree to non-init memory
metag: cleanup metag_ksyms.c includes
metag: move mm/init.c exports out of metag_ksyms.c
metag: move usercopy.c exports out of metag_ksyms.c
metag: move setup.c exports out of metag_ksyms.c
metag: move kick.c exports out of metag_ksyms.c
metag: move traps.c exports out of metag_ksyms.c
metag: move irq enable out of irqflags.h on SMP
genksyms: fix metag symbol prefix on crc symbols
metag: hugetlb: convert to vm_unmapped_area()
metag: export clear_page and copy_page
metag: export metag_code_cache_flush_all
metag: protect more non-MMU memory regions
metag: make TXPRIVEXT bits explicit
metag: kernel/setup.c: sort includes
perf: Enable building perf tools for Meta
metag: add boot time LNKGET/LNKSET check
metag: add __init to metag_cache_probe()
...
Add basic metag documentation. This includes an outline description of
the ABIs (including syscall ABI) and calling conventions, similar to the
one in Documentation/frv/.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Rob Landley <rob@landley.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: linux-doc@vger.kernel.org
Tim found:
WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
Hardware name: S2600CP
sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
smpboot: Booting Node 1, Processors #1
Modules linked in:
Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
Call Trace:
set_cpu_sibling_map+0x279/0x449
start_secondary+0x11d/0x1e5
Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d1955258 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")
It turns out movable_map has some problems, and it breaks several things
1. numa_init is called several times, NOT just for srat. so those
nodes_clear(numa_nodes_parsed)
memset(&numa_meminfo, 0, sizeof(numa_meminfo))
can not be just removed. Need to consider sequence is: numaq, srat, amd, dummy.
and make fall back path working.
2. simply split acpi_numa_init to early_parse_srat.
a. that early_parse_srat is NOT called for ia64, so you break ia64.
b. for (i = 0; i < MAX_LOCAL_APIC; i++)
set_apicid_to_node(i, NUMA_NO_NODE)
still left in numa_init. So it will just clear result from early_parse_srat.
it should be moved before that....
c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
early before override from INITRD is settled.
3. that patch TITLE is total misleading, there is NO x86 in the title,
but it changes critical x86 code. It caused x86 guys did not
pay attention to find the problem early. Those patches really should
be routed via tip/x86/mm.
4. after that commit, following range can not use movable ram:
a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
b. initrd... it will be freed after booting, so it could be on movable...
c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
anymore.
d. init_mem_mapping: can not put page table high anymore.
e. initmem_init: vmemmap can not be high local node anymore. That is
not good.
If node is hotplugable, the mem related range like page table and
vmemmap could be on the that node without problem and should be on that
node.
We have workaround patch that could fix some problems, but some can not
be fixed.
So just remove that offending commit and related ones including:
f7210e6c4a ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
protect movablecore_map in memblock_overlaps_region().")
01a178a94e ("acpi, memory-hotplug: support getting hotplug info from
SRAT")
27168d38fa ("acpi, memory-hotplug: extend movablemem_map ranges to
the end of node")
e8d1955258 ("acpi, memory-hotplug: parse SRAT before memblock is
ready")
fb06bc8e5f ("page_alloc: bootmem limit with movablecore_map")
42f47e27e7 ("page_alloc: make movablemem_map have higher priority")
6981ec3114 ("page_alloc: introduce zone_movable_limit[] to keep
movable limit for nodes")
34b71f1e04 ("page_alloc: add movable_memmap kernel parameter")
4d59a75125 ("x86: get pg_data_t's memory from other node")
Later we should have patches that will make sure kernel put page table
and vmemmap on local node ram instead of push them down to node0. Also
need to find way to put other kernel used ram to local node ram.
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: Don Morris <don.morris@hp.com>
Bisected-by: Don Morris <don.morris@hp.com>
Tested-by: Don Morris <don.morris@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more x86 fixes from Peter Anvin:
"Additional x86 fixes. Three of these patches are pure documentation,
two are pretty trivial; the remaining one fixes boot problems on some
non-BIOS machines."
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Make sure we can boot in the case the BDA contains pure garbage
x86, efi: Mark disable_runtime as __initdata
x86, doc: Fix incorrect comment about 64-bit code segment descriptors
doc, kernel-parameters: Document 'console=hvc<n>'
doc, xen: Mention 'earlyprintk=xen' in the documentation.
ACPI: Overriding ACPI tables via initrd only works with an initrd and on X86
Host bridge hotplug
- Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu)
- Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu)
- Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu)
- Stop caching _PRT and make independent of bus numbers (Yinghai Lu)
PCI device hotplug
- Clean up cpqphp dead code (Sasha Levin)
- Disable ARI unless device and upstream bridge support it (Yijing Wang)
- Initialize all hot-added devices (not functions 0-7) (Yijing Wang)
Power management
- Don't touch ASPM if disabled (Joe Lawrence)
- Fix ASPM link state management (Myron Stowe)
Miscellaneous
- Fix PCI_EXP_FLAGS accessor (Alex Williamson)
- Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov)
- Document hotplug resource and MPS parameters (Yijing Wang)
- Add accessor for PCIe capabilities (Myron Stowe)
- Drop pciehp suspend/resume messages (Paul Bolle)
- Make pci_slot built-in only (not a module) (Jiang Liu)
- Remove unused PCI/ACPI bind ops (Jiang Liu)
- Removed used pci_root_bus (Bjorn Helgaas)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRKS3hAAoJEFmIoMA60/r8xxoP/j1CS4oCZAnBIVT9fKBkis+/
CENcfHIUKj6J9iMfJEVvqBELvqaLqtpeNwAGMcGPxV7VuT3K1QumChfaTpRDP0HC
VDRmrjcmfenEK+YPOG7acsrTyqk2wjpLOyu9MKRxtC5u7tF6376KQpkEFpO4haL4
eUHTxfE76OkrPBSvx3+PUSf6jqrvrNbjX8K6HdDVVlm3sVAQKmYJU/Wphv2NPOqa
CAMyCzEGybFjr8hDRwvWgr+06c718GMwQUbnrPdHXAe7lMNMrN/XVBmU9ABN3Aas
icd3lrDs+yPObgcO/gT8+sAZErCtdJ9zuHGYHdYpRbIQj/5JT4TMk7tw/Bj7vKY9
Mqmho9GR5YmYTRN9f1r+2n5AQ/KYWXJDrRNOnt5/ys5BOM3vwJ7WJ902zpSwtFQp
nLX+oD/hLfzpnoIQGDuBAoAXp2Kam3XWRgVvG78buRNrPj+kUzimk14a8qQeY+CB
El6UKuwi5Uv/qgs1gAqqjmZmsAkon2DnsRZa6Fl8NTkDlis7LY4gp9OU38ySFpB+
PhCmRyCZmDDqTVtwj6XzR3nPQ5LBSbvsTfgMxYMIUSXHa06tyb2q5p4mEIas0OmU
RKaP5xQqZuTgD8fbdYrx0xgSrn7JHt/j/X//Qs6unlLCWhlpm3LjJZKxyw2FwBGr
o4Lci+PiBh3MowCrju9D
=ER3b
-----END PGP SIGNATURE-----
Merge tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI changes from Bjorn Helgaas:
"Host bridge hotplug
- Major overhaul of ACPI host bridge add/start (Rafael Wysocki, Yinghai Lu)
- Major overhaul of PCI/ACPI binding (Rafael Wysocki, Yinghai Lu)
- Split out ACPI host bridge and ACPI PCI device hotplug (Yinghai Lu)
- Stop caching _PRT and make independent of bus numbers (Yinghai Lu)
PCI device hotplug
- Clean up cpqphp dead code (Sasha Levin)
- Disable ARI unless device and upstream bridge support it (Yijing Wang)
- Initialize all hot-added devices (not functions 0-7) (Yijing Wang)
Power management
- Don't touch ASPM if disabled (Joe Lawrence)
- Fix ASPM link state management (Myron Stowe)
Miscellaneous
- Fix PCI_EXP_FLAGS accessor (Alex Williamson)
- Disable Bus Master in pci_device_shutdown (Konstantin Khlebnikov)
- Document hotplug resource and MPS parameters (Yijing Wang)
- Add accessor for PCIe capabilities (Myron Stowe)
- Drop pciehp suspend/resume messages (Paul Bolle)
- Make pci_slot built-in only (not a module) (Jiang Liu)
- Remove unused PCI/ACPI bind ops (Jiang Liu)
- Removed used pci_root_bus (Bjorn Helgaas)"
* tag 'pci-v3.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (51 commits)
PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers
PCI: Fix PCI Express Capability accessors for PCI_EXP_FLAGS
ACPI / PCI: Make pci_slot built-in only, not a module
PCI/PM: Clear state_saved during suspend
PCI: Use atomic_inc_return() rather than atomic_add_return()
PCI: Catch attempts to disable already-disabled devices
PCI: Disable Bus Master unconditionally in pci_device_shutdown()
PCI: acpiphp: Remove dead code for PCI host bridge hotplug
PCI: acpiphp: Create companion ACPI devices before creating PCI devices
PCI: Remove unused "rc" in virtfn_add_bus()
PCI: pciehp: Drop suspend/resume ENTRY messages
PCI/ASPM: Don't touch ASPM if forcibly disabled
PCI/ASPM: Deallocate upstream link state even if device is not PCIe
PCI: Document MPS parameters pci=pcie_bus_safe, pci=pcie_bus_perf, etc
PCI: Document hpiosize= and hpmemsize= resource reservation parameters
PCI: Use PCI Express Capability accessor
PCI: Introduce accessor to retrieve PCIe Capabilities Register
PCI: Put pci_dev in device tree as early as possible
PCI: Skip attaching driver in device_add()
PCI: acpiphp: Keep driver loaded even if no slots found
...
Both the PowerPC hypervisor and Xen hypervisor can utilize the
hvc driver.
Cc: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1361825650-14031-3-git-send-email-konrad.wilk@oracle.com
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The earlyprintk for Xen PV guests utilizes a simple hypercall
(console_io) to provide output to Xen emergency console.
Note that the Xen hypervisor should be booted with 'loglevel=all'
to output said information.
Reported-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1361825650-14031-2-git-send-email-konrad.wilk@oracle.com
Cc: <stable@vger.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
We now provide an option for users who don't want to specify physical
memory address in kernel commandline.
/*
* For movablemem_map=acpi:
*
* SRAT: |_____| |_____| |_________| |_________| ......
* node id: 0 1 1 2
* hotpluggable: n y y n
* movablemem_map: |_____| |_________|
*
* Using movablemem_map, we can prevent memblock from allocating memory
* on ZONE_MOVABLE at boot time.
*/
So user just specify movablemem_map=acpi, and the kernel will use
hotpluggable info in SRAT to determine which memory ranges should be set
as ZONE_MOVABLE.
If all the memory ranges in SRAT is hotpluggable, then no memory can be
used by kernel. But before parsing SRAT, memblock has already reserve
some memory ranges for other purposes, such as for kernel image, and so
on. We cannot prevent kernel from using these memory. So we need to
exclude these ranges even if these memory is hotpluggable.
Furthermore, there could be several memory ranges in the single node
which the kernel resides in. We may skip one range that have memory
reserved by memblock, but if the rest of memory is too small, then the
kernel will fail to boot. So, make the whole node which the kernel
resides in un-hotpluggable. Then the kernel has enough memory to use.
NOTE: Using this way will cause NUMA performance down because the
whole node will be set as ZONE_MOVABLE, and kernel cannot use memory
on it. If users don't want to lose NUMA performance, just don't use
it.
[akpm@linux-foundation.org: fix warning]
[akpm@linux-foundation.org: use strcmp()]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add functions to parse movablemem_map boot option. Since the option
could be specified more then once, all the maps will be stored in the
global variable movablemem_map.map array.
And also, we keep the array in monotonic increasing order by start_pfn.
And merge all overlapped ranges.
[akpm@linux-foundation.org: improve comment]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: remove unneeded parens]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Tested-by: Lin Feng <linfeng@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 mm changes from Peter Anvin:
"This is a huge set of several partly interrelated (and concurrently
developed) changes, which is why the branch history is messier than
one would like.
The *really* big items are two humonguous patchsets mostly developed
by Yinghai Lu at my request, which completely revamps the way we
create initial page tables. In particular, rather than estimating how
much memory we will need for page tables and then build them into that
memory -- a calculation that has shown to be incredibly fragile -- we
now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
a #PF handler which creates temporary page tables on demand.
This has several advantages:
1. It makes it much easier to support things that need access to data
very early (a followon patchset uses this to load microcode way
early in the kernel startup).
2. It allows the kernel and all the kernel data objects to be invoked
from above the 4 GB limit. This allows kdump to work on very large
systems.
3. It greatly reduces the difference between Xen and native (Xen's
equivalent of the #PF handler are the temporary page tables created
by the domain builder), eliminating a bunch of fragile hooks.
The patch series also gets us a bit closer to W^X.
Additional work in this pull is the 64-bit get_user() work which you
were also involved with, and a bunch of cleanups/speedups to
__phys_addr()/__pa()."
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits)
x86, mm: Move reserving low memory later in initialization
x86, doc: Clarify the use of asm("%edx") in uaccess.h
x86, mm: Redesign get_user with a __builtin_choose_expr hack
x86: Be consistent with data size in getuser.S
x86, mm: Use a bitfield to mask nuisance get_user() warnings
x86/kvm: Fix compile warning in kvm_register_steal_time()
x86-32: Add support for 64bit get_user()
x86-32, mm: Remove reference to alloc_remap()
x86-32, mm: Remove reference to resume_map_numa_kva()
x86-32, mm: Rip out x86_32 NUMA remapping code
x86/numa: Use __pa_nodebug() instead
x86: Don't panic if can not alloc buffer for swiotlb
mm: Add alloc_bootmem_low_pages_nopanic()
x86, 64bit, mm: hibernate use generic mapping_init
x86, 64bit, mm: Mark data/bss/brk to nx
x86: Merge early kernel reserve for 32bit and 64bit
x86: Add Crash kernel low reservation
x86, kdump: Remove crashkernel range find limit for 64bit
memblock: Add memblock_mem_size()
x86, boot: Not need to check setup_header version for setup_data
...