Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.
A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.
Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.
Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.
This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.
This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.
After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs pile (part one) from Al Viro:
"Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
locking violations, etc.
The most visible changes here are death of FS_REVAL_DOT (replaced with
"has ->d_weak_revalidate()") and a new helper getting from struct file
to inode. Some bits of preparation to xattr method interface changes.
Misc patches by various people sent this cycle *and* ocfs2 fixes from
several cycles ago that should've been upstream right then.
PS: the next vfs pile will be xattr stuff."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
saner proc_get_inode() calling conventions
proc: avoid extra pde_put() in proc_fill_super()
fs: change return values from -EACCES to -EPERM
fs/exec.c: make bprm_mm_init() static
ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
ocfs2: fix possible use-after-free with AIO
ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
target: writev() on single-element vector is pointless
export kernel_write(), convert open-coded instances
fs: encode_fh: return FILEID_INVALID if invalid fid_type
kill f_vfsmnt
vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
nfsd: handle vfs_getattr errors in acl protocol
switch vfs_getattr() to struct path
default SET_PERSONALITY() in linux/elf.h
ceph: prepopulate inodes only when request is aborted
d_hash_and_lookup(): export, switch open-coded instances
9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
9p: split dropping the acls from v9fs_set_create_acl()
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRJ+kUAAoJEKurIx+X31iBKeUQAJ3x0QN6fiT0PXu4wPQYGeGV
XJi/9WXkmdHi4grhNYvk/RE7F/jsm0GJjpmq+GngO4KkUPDPKzM9ayP0RrLkWV81
h/aDsp5P04KlvA46hM2/q6DRCJZxx0kyBRrAA/3qpWDU6Xmvr4zmcG6IKKc/gsAS
BFYiXTmIyNmca7Xwv23T0nNKn6j725gxfe0evaMHlhqFQsY/qbi94dltgCrCi2dD
CtzqhFaZtGQ0ic/fZ42zi4BWFr23rpDKYYWeDg8LOyueNX1GaDeMDIioQkoa43mA
/IVHWJnqtaeq92BFRf3rfHHnB61mRcTRuZ4cYoINB5O/b505RN8C3TmVwfUXQWRk
zw0rbXWkzZZL6DKKQcCc12fZV2z2EZhwhZgtD/pwyUMvBW+pNhaXIUpjBbm//NoW
u4AAQqsR52C+VAtoyRVLEAt0Pa56TCiz64B9noPS9Tew6Vjx/Gy4NgQ2lJ5SLw1F
UBr4mVmUxXfciq5n/RLpg5z3LoUjTQUYyJkTJj47+nmfvWeUNfUAP78nowAbHNqm
jnvw8hnwEnhWB5RMpJWZtFq3K61WWOtJ9RXAU6JtzYcwHsEBvn3qqaLoh8/hPBnK
+IVcygAF+2GPwaid7Qb7rmLBhIIDPQp+lmS0OS/NqfxVlsHq/F8Q5keMvtIyDToS
DiAvEnrU1wToeNfUR4R6
=v9Yo
-----END PGP SIGNATURE-----
Merge tag 'please-pull-vm_unwrapped' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull ia64 update from Tony Luck:
"ia64 vm patch series that was cooking in -mm tree"
* tag 'please-pull-vm_unwrapped' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
mm: use vm_unmapped_area() in hugetlbfs on ia64 architecture
mm: use vm_unmapped_area() on ia64 architecture
lockdep, but it's a mechanical change.
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRJAcuAAoJENkgDmzRrbjxsw0P/3eXb+LddYnx0V0uHYdKpCUf
4vdW7X0fX3Z+aUK69IWRL/6ahoO4TpaHYGHBDjEoivyQ0GDq14X7JNWsYYt3LdMf
3wmDgRc2cn/mZOJbFeVpNV8ox5l/xc0CUvV+iQ8tMjfQItXMXgWUFZKMECsXKSO6
eex3lrw9M2jAX2uL8LQPp9W8xtKu24nSZRC6tH5riE/8fCzi1cZPPAqfxP5c8Lee
ZXtbCRSyAFENZLpKyMe1PC7HvtJyi5NDn9xwOQiXULZV/VOlvP94DGBLIKCM/6dn
4QvZxpG0P0uOlpCgRAVLyh/z7g4XY4VF/fHopLCmEcqLsvgD+V2LQpQ9zWUalLPC
Z+pUpz2vu0gIddPU1nR8R6oGpEdJ8O12aJle62p/RSXWZGx12qUQ+Tamu0tgKcv1
AsiJfbUGNDYfxgU6sHsoQjl2f68LTVckCU1C1LqEbW/S104EIORtGx30CHM4LRiO
32kDC5TtgYDBKQAIqJ4bL48ZMh+9W3uX40p7xzOI5khHQjvswUKa3jcxupU0C1uv
lx8KXo7pn8WT33QGysWC782wJCgJuzSc2vRn+KQoqoynuHGM6agaEtR59gil3QWO
rQEcxH63BBRDgHlg4FM9IkJwwsnC3PWKL8gbX0uAWXAPMbgapJkuuGZAwt0WDGVK
+GszxsFkCjlW0mK0egTb
=tiSY
-----END PGP SIGNATURE-----
Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module update from Rusty Russell:
"The sweeping change is to make add_taint() explicitly indicate whether
to disable lockdep, but it's a mechanical change."
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MODSIGN: Add option to not sign modules during modules_install
MODSIGN: Add -s <signature> option to sign-file
MODSIGN: Specify the hash algorithm on sign-file command line
MODSIGN: Simplify Makefile with a Kconfig helper
module: clean up load_module a little more.
modpost: Ignore ARC specific non-alloc sections
module: constify within_module_*
taint: add explicit flag to show whether lock dep is still OK.
module: printk message when module signature fail taints kernel.
Pull signal handling cleanups from Al Viro:
"This is the first pile; another one will come a bit later and will
contain SYSCALL_DEFINE-related patches.
- a bunch of signal-related syscalls (both native and compat)
unified.
- a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
(fixing several potential problems with missing argument
validation, while we are at it)
- a lot of now-pointless wrappers killed
- a couple of architectures (cris and hexagon) forgot to save
altstack settings into sigframe, even though they used the
(uninitialized) values in sigreturn; fixed.
- microblaze fixes for delivery of multiple signals arriving at once
- saner set of helpers for signal delivery introduced, several
architectures switched to using those."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
x86: convert to ksignal
sparc: convert to ksignal
arm: switch to struct ksignal * passing
alpha: pass k_sigaction and siginfo_t using ksignal pointer
burying unused conditionals
make do_sigaltstack() static
arm64: switch to generic old sigaction() (compat-only)
arm64: switch to generic compat rt_sigaction()
arm64: switch compat to generic old sigsuspend
arm64: switch to generic compat rt_sigqueueinfo()
arm64: switch to generic compat rt_sigpending()
arm64: switch to generic compat rt_sigprocmask()
arm64: switch to generic sigaltstack
sparc: switch to generic old sigsuspend
sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
sparc: kill sign-extending wrappers for native syscalls
kill sparc32_open()
sparc: switch to use of generic old sigaction
sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
mips: switch to generic sys_fork() and sys_clone()
...
Allocating a file structure in function get_empty_filp() might fail because
of several reasons:
- not enough memory for file structures
- operation is not allowed
- user is over its limit
Currently the function returns NULL in all cases and we loose the exact
reason of the error. All callers of get_empty_filp() assume that the function
can fail with ENFILE only.
Return error through pointer. Change all callers to preserve this error code.
[AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
(things remaining here deal with alloc_file()), removed pipe(2) behaviour change]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Update the ia64 arch_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
- Rework of the ACPI namespace scanning code from Rafael J. Wysocki
with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg,
Toshi Kani, and Yinghai Lu.
- ACPI power resources handling and ACPI device PM update from
Rafael J. Wysocki.
- ACPICA update to version 20130117 from Bob Moore and Lv Zheng
with contributions from Aaron Lu, Chao Guan, Jesper Juhl, and
Tim Gardner.
- Support for Intel Lynxpoint LPSS from Mika Westerberg.
- cpuidle update from Len Brown including Intel Haswell support, C1
state for intel_idle, removal of global pm_idle.
- cpuidle fixes and cleanups from Daniel Lezcano.
- cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri
with contributions from Stratos Karafotis and Rickard Andersson.
- Intel P-states driver for Sandy Bridge processors from
Dirk Brandewie.
- cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn.
- cpufreq fixes related to ordering issues between acpi-cpufreq and
powernow-k8 from Borislav Petkov and Matthew Garrett.
- cpufreq support for Calxeda Highbank processors from Mark Langsdorf
and Rob Herring.
- cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update
from Shawn Guo.
- cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat,
and Inderpal Singh.
- Support for "lightweight suspend" from Zhang Rui.
- Removal of the deprecated power trace API from Paul Gortmaker.
- Assorted updates from Andreas Fleig, Colin Ian King,
Davidlohr Bueso, Joseph Salisbury, Kees Cook, Li Fei,
Nishanth Menon, ShuoX Liu, Srinivas Pandruvada, Tejun Heo,
Thomas Renninger, and Yasuaki Ishimatsu.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJRIsArAAoJEKhOf7ml8uNsD6MP/j7C4NA+GTq6RdwoJt+Yki0K
9Ep8I4pEuRFoN/oskv24EyQhpGJIk6UxWcJ/DWFBc+1VhmKORta7k2Idv/wlJA77
s7AcDveA9xcDh+TVfbh87TeuiMSXiSdDZbiaQO+wMizWJAF3F84AnjiAqqqyQcSK
bA5/Siz/vWlt9PyYDaQtHTVE4lpvPuVcQdYewsdaH2PsmUjvIg/TUzg28CTrdyvv
eHOdBK9R0/OLQLhzRbL0VOGJ//wEl+HJRO0QEhTKPgdQ1e/VH/4Zu5WSzF8P/x4C
s2f8U4IKQqulDuDHXtpMpelFm7hRWgsOqZLkcyXLs+0dvSM9CTPO6P0ZaImxUctk
5daHWEsXUnCErDQawt1mcZP8l6qnxofMQIfLXyPVzvlSnHyToTmrtXa1v2u4AuL/
hOo4MYWsFNUmRdtGFFGlExGgEDZ4G5NwiYjRBl/6XJ3v4nhnnMbuzxP8scpoe5m1
8tjroJHZFUUs/mFU/H+oRbHzSzXPmp1sddNaTg4OpVmTn3DDh6ljnFhiItd1Ndw0
5ldVbSe6ETq5RoK0TbzvQOeVpa9F3JfqbrXLQPqfd2iz/No41LQYG1uShRYuXKuA
wfEcc+c9VMd3FILu05pGwBnU8VS9VbxTYMz7xDxg6b29Ywnb7u+Q1ycCk2gFYtkS
E2oZDuyewTJxaskzYsNr
=wijn
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
- Rework of the ACPI namespace scanning code from Rafael J. Wysocki
with contributions from Bjorn Helgaas, Jiang Liu, Mika Westerberg,
Toshi Kani, and Yinghai Lu.
- ACPI power resources handling and ACPI device PM update from Rafael
J Wysocki.
- ACPICA update to version 20130117 from Bob Moore and Lv Zheng with
contributions from Aaron Lu, Chao Guan, Jesper Juhl, and Tim Gardner.
- Support for Intel Lynxpoint LPSS from Mika Westerberg.
- cpuidle update from Len Brown including Intel Haswell support, C1
state for intel_idle, removal of global pm_idle.
- cpuidle fixes and cleanups from Daniel Lezcano.
- cpufreq fixes and cleanups from Viresh Kumar and Fabio Baltieri with
contributions from Stratos Karafotis and Rickard Andersson.
- Intel P-states driver for Sandy Bridge processors from Dirk
Brandewie.
- cpufreq driver for Marvell Kirkwood SoCs from Andrew Lunn.
- cpufreq fixes related to ordering issues between acpi-cpufreq and
powernow-k8 from Borislav Petkov and Matthew Garrett.
- cpufreq support for Calxeda Highbank processors from Mark Langsdorf
and Rob Herring.
- cpufreq driver for the Freescale i.MX6Q SoC and cpufreq-cpu0 update
from Shawn Guo.
- cpufreq Exynos fixes and cleanups from Jonghwan Choi, Sachin Kamat,
and Inderpal Singh.
- Support for "lightweight suspend" from Zhang Rui.
- Removal of the deprecated power trace API from Paul Gortmaker.
- Assorted updates from Andreas Fleig, Colin Ian King, Davidlohr Bueso,
Joseph Salisbury, Kees Cook, Li Fei, Nishanth Menon, ShuoX Liu,
Srinivas Pandruvada, Tejun Heo, Thomas Renninger, and Yasuaki
Ishimatsu.
* tag 'pm+acpi-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (267 commits)
PM idle: remove global declaration of pm_idle
unicore32 idle: delete stray pm_idle comment
openrisc idle: delete pm_idle
mn10300 idle: delete pm_idle
microblaze idle: delete pm_idle
m32r idle: delete pm_idle, and other dead idle code
ia64 idle: delete pm_idle
cris idle: delete idle and pm_idle
ARM64 idle: delete pm_idle
ARM idle: delete pm_idle
blackfin idle: delete pm_idle
sparc idle: rename pm_idle to sparc_idle
sh idle: rename global pm_idle to static sh_idle
x86 idle: rename global pm_idle to static x86_idle
APM idle: register apm_cpu_idle via cpuidle
cpufreq / intel_pstate: Add kernel command line option disable intel_pstate.
cpufreq / intel_pstate: Change to disallow module build
tools/power turbostat: display SMI count by default
intel_idle: export both C1 and C1E
ACPI / hotplug: Fix concurrency issues and memory leaks
...
pm_idle() on ia64 was a synonym for default_idle().
So simply invoke default_idle() directly.
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: linux-ia64@vger.kernel.org
Typical cputime stats infrastructure relies on the timer tick and
its periodic polling on the CPU to account the amount of time
spent by the CPUs and the tasks per high level domains such as
userspace, kernelspace, guest, ...
Now we are preparing to implement full dynticks capability on
Linux for Real Time and HPC users who want full CPU isolation.
This feature requires a cputime accounting that doesn't depend
on the timer tick.
To implement it, this new cputime infrastructure plugs into
kernel/user/guest boundaries to take snapshots of cputime and
flush these to the stats when needed. This performs pretty
much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
and cputime snaphots are synchronized between write and read
side such that the latter can safely retrieve the pending tickless
cputime of a task and add it to its latest cputime snapshot to
return the correct result to the user.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRBsKnAAoJEIUkVEdQjox3lMgP/2R6DU2f8PyGIao3hne4M3Pu
L3q+mAG53b24Dy014KeW7gd8yv45fE7wp/rs8CGLte9VzbLkRCDSFQPgBuXVagRj
tV5nfAuqD0wHTnA+HhBE3l3C2RKAPGIu79rBpnIR/QIPPl8Z3Dby8YgmxEQKDf8G
j7MEBu2LthSuqEi2ZXemnO5r0oEnQAzAp4TTi/M38k0Fmt59nOGyjLnI+xHYCBMa
1pnz7j3jjR9NJExGu8iVvbo+jupuQngP8qmkLXHvYnj/TEJNwzO1hHVoSwOpjYpS
9ycl+T8IKQLbAkBywLtq3Mzde43xt/t8wYyGZ0oAV+Z7MIpz/9YIfDJwqQeqoNbD
dAdbNjKMbsxCgmrnyqSagfMQg/r3CPZ4vf40TMCaN4gNUJC4Ie+E4kPRKRh59+PB
Ukthmqujn0f40LAa+HXTUuzafd3b0s/ewH+8FuQ6LAG9b5+WnoN8JTJ5u6+ydokO
ZleeOowuRZZEg+abQ8Sm2GRm/BzN29gi/npb//I+ZDXWv/+3yccgsiPjCRzCAAaO
g1RmYryFSRUwHQbGNNypVWVuOLWvrBQ4jqbGO7BBuBByZMSHryKxR6mb+inH3qLE
xIDM9SdSJisc292OzoFKwVZki4MaXaadJXJduVvqYlZQvXXs7eAa4wo3euhtVITD
NLQO5OZXE4oIQmDFb0FV
=1Tzp
-----END PGP SIGNATURE-----
Merge tag 'full-dynticks-cputime-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core
Pull full-dynticks (user-space execution is undisturbed and
receives no timer IRQs) preparation changes that convert the
cputime accounting code to be full-dynticks ready,
from Frederic Weisbecker:
"This implements the cputime accounting on full dynticks CPUs.
Typical cputime stats infrastructure relies on the timer tick and
its periodic polling on the CPU to account the amount of time
spent by the CPUs and the tasks per high level domains such as
userspace, kernelspace, guest, ...
Now we are preparing to implement full dynticks capability on
Linux for Real Time and HPC users who want full CPU isolation.
This feature requires a cputime accounting that doesn't depend
on the timer tick.
To implement it, this new cputime infrastructure plugs into
kernel/user/guest boundaries to take snapshots of cputime and
flush these to the stats when needed. This performs pretty
much like CONFIG_VIRT_CPU_ACCOUNTING except that context location
and cputime snaphots are synchronized between write and read
side such that the latter can safely retrieve the pending tickless
cputime of a task and add it to its latest cputime snapshot to
return the correct result to the user."
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Do some ground preparatory work before adding guest_enter()
and guest_exit() context tracking callbacks. Those will
be later used to read the guest cputime safely when we
run in full dynticks mode.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
If we want to stop the tick further idle, we need to be
able to account the cputime without using the tick.
Virtual based cputime accounting solves that problem by
hooking into kernel/user boundaries.
However implementing CONFIG_VIRT_CPU_ACCOUNTING require
low level hooks and involves more overhead. But we already
have a generic context tracking subsystem that is required
for RCU needs by archs which plan to shut down the tick
outside idle.
This patch implements a generic virtual based cputime
accounting that relies on these generic kernel/user hooks.
There are some upsides of doing this:
- This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
if context tracking is already built (already necessary for RCU in full
tickless mode).
- We can rely on the generic context tracking subsystem to dynamically
(de)activate the hooks, so that we can switch anytime between virtual
and tick based accounting. This way we don't have the overhead
of the virtual accounting when the tick is running periodically.
And one downside:
- There is probably more overhead than a native virtual based cputime
accounting. But this relies on hooks that are already set anyway.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The ia64 function "thread_matches()" has no users since commit
e868a55c2a ("[IA64] remove find_thread_for_addr()"). Remove it.
This allows us to make ptrace_check_attach() static to kernel/ptrace.c,
which is good since we'll need to change the semantics of it and fix up
all the callers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here are the remaining __dev* removal patches against the 3.8-rc2 tree.
All of these patches were previously sent to the subsystem maintainers,
most of them were picked up and pushed to you, but there were a number
that fell through the cracks, and new drivers were added during the
merge window, so this series cleans up the rest of the instances of
these markings.
Third time's the charm...
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlDmHOIACgkQMUfUDdst+ykTZgCePgK84Im3FFooEXJwaPbaf4ls
lO4AoMEDoWK+BHWOsjQwFPOwFFPEN2Xh
=6oAQ
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core __dev* removal patches - take 3 - from Greg Kroah-Hartman:
"Here are the remaining __dev* removal patches against the 3.8-rc2
tree. All of these patches were previously sent to the subsystem
maintainers, most of them were picked up and pushed to you, but there
were a number that fell through the cracks, and new drivers were added
during the merge window, so this series cleans up the rest of the
instances of these markings.
Third time's the charm...
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
Fixed up trivial conflict with the pinctrl pull in pinctrl-sirf.c.
* tag 'driver-core-3.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (54 commits)
misc: remove __dev* attributes.
include: remove __dev* attributes.
Documentation: remove __dev* attributes.
Drivers: misc: remove __dev* attributes.
Drivers: block: remove __dev* attributes.
Drivers: bcma: remove __dev* attributes.
Drivers: char: remove __dev* attributes.
Drivers: clocksource: remove __dev* attributes.
Drivers: ssb: remove __dev* attributes.
Drivers: dma: remove __dev* attributes.
Drivers: gpu: remove __dev* attributes.
Drivers: infinband: remove __dev* attributes.
Drivers: memory: remove __dev* attributes.
Drivers: mmc: remove __dev* attributes.
Drivers: iommu: remove __dev* attributes.
Drivers: power: remove __dev* attributes.
Drivers: message: remove __dev* attributes.
Drivers: macintosh: remove __dev* attributes.
Drivers: mfd: remove __dev* attributes.
pstore: remove __dev* attributes.
...
CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.
This change removes the use of __devinit, __devexit_p, __devinitdata,
and __devexit from these drivers.
Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux was granted a new system call to load modules by file descriptor
in commit 34e1169d99 ("module: add syscall to load module from fd").
Wire it up for ia64 (ready for the Chrome port :-)
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull big execve/kernel_thread/fork unification series from Al Viro:
"All architectures are converted to new model. Quite a bit of that
stuff is actually shared with architecture trees; in such cases it's
literally shared branch pulled by both, not a cherry-pick.
A lot of ugliness and black magic is gone (-3KLoC total in this one):
- kernel_thread()/kernel_execve()/sys_execve() redesign.
We don't do syscalls from kernel anymore for either kernel_thread()
or kernel_execve():
kernel_thread() is essentially clone(2) with callback run before we
return to userland, the callbacks either never return or do
successful do_execve() before returning.
kernel_execve() is a wrapper for do_execve() - it doesn't need to
do transition to user mode anymore.
As a result kernel_thread() and kernel_execve() are
arch-independent now - they live in kernel/fork.c and fs/exec.c
resp. sys_execve() is also in fs/exec.c and it's completely
architecture-independent.
- daemonize() is gone, along with its parts in fs/*.c
- struct pt_regs * is no longer passed to do_fork/copy_process/
copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.
- sys_fork()/sys_vfork()/sys_clone() unified; some architectures
still need wrappers (ones with callee-saved registers not saved in
pt_regs on syscall entry), but the main part of those suckers is in
kernel/fork.c now."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
do_coredump(): get rid of pt_regs argument
print_fatal_signal(): get rid of pt_regs argument
ptrace_signal(): get rid of unused arguments
get rid of ptrace_signal_deliver() arguments
new helper: signal_pt_regs()
unify default ptrace_signal_deliver
flagday: kill pt_regs argument of do_fork()
death to idle_regs()
don't pass regs to copy_process()
flagday: don't pass regs to copy_thread()
bfin: switch to generic vfork, get rid of pointless wrappers
xtensa: switch to generic clone()
openrisc: switch to use of generic fork and clone
unicore32: switch to generic clone(2)
score: switch to generic fork/vfork/clone
c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
mn10300: switch to generic fork/vfork/clone
h8300: switch to generic fork/vfork/clone
tile: switch to generic clone()
...
Conflicts:
arch/microblaze/include/asm/Kbuild
namespace collision from other work being merged.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQxi0vAAoJEKurIx+X31iBEmEP/RfgYV/tR4GYWPL+/PZRS8VP
LqwGcq7GqscYPfe1zv6NXxI6A7jdyB+dMhuBtE+lCkhiea/z7r4FgNS3AJE/klTc
90Gpi7DwVck4laDEOG3BRDhFFuN06/m/c+TfKuAUKNmOSXwJqh8mLdIk8XXzH1PQ
B53iTOzIR2gy3rLO3tg78NZSywvyvl3kTmOrCqh+EoTCF7u51Y5KzzV+eiGyBv58
WwQF13ZTKF5zZ7unexE2QitnseKqhEe32DF9RTd4Efwh0grt0VzBg8eC3Ajr+qtr
Ub9eGxjWo3eynl/DCWIpoj7daBnYQcP6PDwjCLueQzEq5wC2aR/v7oJTxHU49ewt
+gBX/B5KT2IwkLGAd/uBCRyWY3Xghd2BYA14yc/OF0n6ZvB31dX1321u4VqEZIu1
TUV68H67P1IWml3EgtZp893GnjRhhRmcjTrZq4lP1rFt0bFoJT/lR4BXX0Ghe+ge
ICsBRmsxhCBpygnEJ6PvAB6iksTMmLRlqks/XkS1i+7MkF31/PpA2UDAEsmFMCqR
f7q1/LXuLXED4yCJZlKrorO1iiNcstIxMHLI6CLnDwnvy1d7wFcDHMRmkoFrmzJz
YH3Q5w9BgONOAVw2fi8QW4tFVB5qHiegbVlgpqzRVbcSSc9bgtqpjzzhQnatiNZh
w171vHhmw8juRej5hma/
=yOb1
-----END PGP SIGNATURE-----
Merge tag 'please-pull-misc-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull ia64 fix from Tony Luck:
"Miscellaneous ia64 fix for 3.8. Just need to avoid a pending
namespace collision from other work being merged."
* tag 'please-pull-misc-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
[IA64] Resolve name space collision for cache_show()
Pull scheduler updates from Ingo Molnar:
"The biggest change affects group scheduling: we now track the runnable
average on a per-task entity basis, allowing a smoother, exponential
decay average based load/weight estimation instead of the previous
binary on-the-runqueue/off-the-runqueue load weight method.
This will inevitably disturb workloads that were in some sort of
borderline balancing state or unstable equilibrium, so an eye has to
be kept on regressions.
For that reason the new load average is only limited to group
scheduling (shares distribution) at the moment (which was also hurting
the most from the prior, crude weight calculation and whose scheduling
quality wins most from this change) - but we plan to extend this to
regular SMP balancing as well in the future, which will simplify and
speed up things a bit.
Other changes involve ongoing preparatory work to extend NOHZ to the
scheduler as well, eventually allowing completely irq-free user-space
execution."
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
Revert "sched/autogroup: Fix crash on reboot when autogroup is disabled"
cputime: Comment cputime's adjusting code
cputime: Consolidate cputime adjustment code
cputime: Rename thread_group_times to thread_group_cputime_adjusted
cputime: Move thread_group_cputime() to sched code
vtime: Warn if irqs aren't disabled on system time accounting APIs
vtime: No need to disable irqs on vtime_account()
vtime: Consolidate a bit the ctx switch code
vtime: Explicitly account pending user time on process tick
vtime: Remove the underscore prefix invasion
sched/autogroup: Fix crash on reboot when autogroup is disabled
cputime: Separate irqtime accounting from generic vtime
cputime: Specialize irq vtime hooks
kvm: Directly account vtime to system on guest switch
vtime: Make vtime_account_system() irqsafe
vtime: Gather vtime declarations to their own header file
sched: Describe CFS load-balancer
sched: Introduce temporary FAIR_GROUP_SCHED dependency for load-tracking
sched: Make __update_entity_runnable_avg() fast
sched: Update_cfs_shares at period edge
...
Here is the "big" char/misc driver patches for 3.8-rc1. I'm starting to
put random driver subsystems that I had previously sent you through the
driver-core tree in this tree, as it makes more sense to do so.
Nothing major here, the various __dev* removals, some mei driver
updates, and other random driver-specific things from the different
maintainers and developers.
Note, some MFD drivers got added through this tree, and they are also
coming in through the "real" MFD tree as well, due to some major
mis-communication between me and the different developers. If you have
any merge conflicts, take the ones from the MFD tree, not these ones,
sorry about that.
All of this has been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlDHj7AACgkQMUfUDdst+ym7pQCgxhFDGQRJimG+Ddag+ghfLhQh
Ql0AoJsWVFvQjb7q1NO7OvOABaxjEJdu
=na5b
-----END PGP SIGNATURE-----
Merge tag 'char-misc-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull Char/Misc driver merge from Greg Kroah-Hartman:
"Here is the "big" char/misc driver patches for 3.8-rc1. I'm starting
to put random driver subsystems that I had previously sent you through
the driver-core tree in this tree, as it makes more sense to do so.
Nothing major here, the various __dev* removals, some mei driver
updates, and other random driver-specific things from the different
maintainers and developers.
Note, some MFD drivers got added through this tree, and they are also
coming in through the "real" MFD tree as well, due to some major
mis-communication between me and the different developers. If you
have any merge conflicts, take the ones from the MFD tree, not these
ones, sorry about that.
All of this has been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
Fix up trivial conflict in drivers/mmc/host/Kconfig due to new drivers
having been added (both at the end, as usual..)
* tag 'char-misc-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (84 commits)
MAINTAINERS: remove drivers/staging/hv/
misc/st_kim: Free resources in the error path of probe()
drivers/char: for hpet, add count checking, and ~0UL instead of -1
w1-gpio: Simplify & get rid of defines
w1-gpio: Pinctrl-fy
extcon: remove use of __devexit_p
extcon: remove use of __devinit
extcon: remove use of __devexit
drivers: uio: Only allocate new private data when probing device tree node
drivers: uio_dmem_genirq: Allow partial success when opening device
drivers: uio_dmem_genirq: Don't use DMA_ERROR_CODE to indicate unmapped regions
drivers: uio_dmem_genirq: Don't mix address spaces for dynamic region vaddr
uio: remove use of __devexit
uio: remove use of __devinitdata
uio: remove use of __devinit
uio: remove use of __devexit_p
char: remove use of __devexit
char: remove use of __devinitconst
char: remove use of __devinitdata
char: remove use of __devinit
...
System time accounting APIs such as vtime_account_system() and
vtime_account_idle() need to be irqsafe. Current callers include
irq entry, exit and kvm, all of which have been checked against that
requirement. Now it's better to grow that with an automatic check
in case we have further callers or we missed something.
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
On ia64 and powerpc, vtime context switch only consists
in flushing system and user pending time, plus a few
arch housekeeping.
Consolidate that into a generic implementation. s390 is
a special case because pending user and system time accounting
there is hard to dissociate. So it's keeping its own implementation.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
All vtime implementations just flush the user time on process
tick. Consolidate that in generic code by calling a user time
accounting helper. This avoids an indirect call in ia64 and
prepare to also consolidate vtime context switch code.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Prepending irq-unsafe vtime APIs with underscores was actually
a bad idea as the result is a big mess in the API namespace that
is even waiting to be further extended. Also these helpers
are always called from irq safe callers except kvm. Just
provide a vtime_account_system_irqsafe() for this specific
case so that we can remove the underscore prefix on other
vtime functions.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
These functions might be called from modules as well so make sure
they are exported.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We have a local static function named rather generically
"cache_show()". Changes in progress in the slab code want
to use this same name globally - so they are adding their
declaration to <linux/slab.h> which then causes the compiler
to choke with:
arch/ia64/kernel/topology.c:278: error: conflicting types for 'cache_show'
Fix by adding an "ia64_" prefix to our local function.
Signed-off-by: Tony Luck <tony.luck@intel.com>
vtime_account_system() currently has only one caller with
vtime_account() which is irq safe.
Now we are going to call it from other places like kvm where
irqs are not always disabled by the time we account the cputime.
So let's make it irqsafe. The arch implementation part is now
prefixed with "__".
vtime_account_idle() arch implementation is prefixed accordingly
to stay consistent.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
This patch fixes the /dev/mem driver to use phys_addr_t for physical
addresses. This is required on PAE systems, especially those that run
entirely out of >4G physical memory space.
Signed-off-by: Cyril Chemparathy <cyril@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
the only thing we use that for is in copy_thread(), where the way
we set it will result in rbs_size being 0. Just move that calculating
rbs_size and copying rbs to non-kernel-thread side of
if (user_mode(regs)) in copy_thread() and set rbs_size to 0 on
kernel thread side.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
It used to be called with NULL regs when creating idle threads on
secondaries. Not used that way since wli had buried that kludge...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Used to be used by kernel_thread(); dead code for 8 years...
Note that it's not sys_clone/sys_clone2 - those are used just
fine.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull third pile of VFS updates from Al Viro:
"Stuff from Jeff Layton, mostly. Sanitizing interplay between audit
and namei, removing a lot of insanity from audit_inode() mess and
getting things ready for his ESTALE patchset."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
procfs: don't need a PATH_MAX allocation to hold a string representation of an int
vfs: embed struct filename inside of names_cache allocation if possible
audit: make audit_inode take struct filename
vfs: make path_openat take a struct filename pointer
vfs: turn do_path_lookup into wrapper around struct filename variant
audit: allow audit code to satisfy getname requests from its names_list
vfs: define struct filename and have getname() return it
vfs: unexport getname and putname symbols
acct: constify the name arg to acct_on
vfs: allocate page instead of names_cache buffer in mount_block_root
audit: overhaul __audit_inode_child to accomodate retrying
audit: optimize audit_compare_dname_path
audit: make audit_compare_dname_path use parent_len helper
audit: remove dirlen argument to audit_compare_dname_path
audit: set the name_len in audit_inode for parent lookups
audit: add a new "type" field to audit_names struct
audit: reverse arguments to audit_inode_child
audit: no need to walk list in audit_inode if name is NULL
audit: pass in dentry to audit_copy_inode wherever possible
audit: remove unnecessary NULL ptr checks from do_path_lookup
getname() is intended to copy pathname strings from userspace into a
kernel buffer. The result is just a string in kernel space. It would
however be quite helpful to be able to attach some ancillary info to
the string.
For instance, we could attach some audit-related info to reduce the
amount of audit-related processing needed. When auditing is enabled,
we could also call getname() on the string more than once and not
need to recopy it from userspace.
This patchset converts the getname()/putname() interfaces to return
a struct instead of a string. For now, the struct just tracks the
string in kernel space and the original userland pointer for it.
Later, we'll add other information to the struct as it becomes
convenient.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull timer core update from Thomas Gleixner:
- Bug fixes (one for a longstanding dead loop issue)
- Rework of time related vsyscalls
- Alarm timer updates
- Jiffies updates to remove compile time dependencies
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Cast raw_interval to u64 to avoid shift overflow
timers: Fix endless looping between cascade() and internal_add_timer()
time/jiffies: bring back unconditional LATCH definition
time: Convert x86_64 to using new update_vsyscall
time: Only do nanosecond rounding on GENERIC_TIME_VSYSCALL_OLD systems
time: Introduce new GENERIC_TIME_VSYSCALL
time: Convert CONFIG_GENERIC_TIME_VSYSCALL to CONFIG_GENERIC_TIME_VSYSCALL_OLD
time: Move update_vsyscall definitions to timekeeper_internal.h
time: Move timekeeper structure to timekeeper_internal.h for vsyscall changes
jiffies: Remove compile time assumptions about CLOCK_TICK_RATE
jiffies: Kill unused TICK_USEC_TO_NSEC
alarmtimer: Rename alarmtimer_remove to alarmtimer_dequeue
alarmtimer: Remove unused helpers & defines
alarmtimer: Use hrtimer per-alarm instead of per-base
alarmtimer: Implement minimum alarm interval for allowing suspend
Pull pile 2 of execve and kernel_thread unification work from Al Viro:
"Stuff in there: kernel_thread/kernel_execve/sys_execve conversions for
several more architectures plus assorted signal fixes and cleanups.
There'll be more (in particular, real fixes for the alpha
do_notify_resume() irq mess)..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (43 commits)
alpha: don't open-code trace_report_syscall_{enter,exit}
Uninclude linux/freezer.h
m32r: trim masks
avr32: trim masks
tile: don't bother with SIGTRAP in setup_frame
microblaze: don't bother with SIGTRAP in setup_rt_frame()
mn10300: don't bother with SIGTRAP in setup_frame()
frv: no need to raise SIGTRAP in setup_frame()
x86: get rid of duplicate code in case of CONFIG_VM86
unicore32: remove pointless test
h8300: trim _TIF_WORK_MASK
parisc: decide whether to go to slow path (tracesys) based on thread flags
parisc: don't bother looping in do_signal()
parisc: fix double restarts
bury the rest of TIF_IRET
sanitize tsk_is_polling()
bury _TIF_RESTORE_SIGMASK
unicore32: unobfuscate _TIF_WORK_MASK
mips: NOTIFY_RESUME is not needed in TIF masks
mips: merge the identical "return from syscall" per-ABI code
...
Conflicts:
arch/arm/include/asm/thread_info.h
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:
| effect | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
This patch removes reserved_vm counter from mm_struct. Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.
Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.
remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.
[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs update from Al Viro:
- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c
(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).
A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.
- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).
- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.
- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.
- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."
Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...
Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.
The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.
The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.
Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.
Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.
While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.
Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.
After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."
Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQaf0EAAoJEKurIx+X31iBAq4QAIQpopjc7NECvmcSE8BFbYzJ
fMyG5GumkMq6o70lGZZjyRsHObyvYA4Os4pzJyDz27mycnd9gQg29F4j/b/X79ls
Bs/S9jBfnic5ed1Ti/wJCtLOQSq04n+70DolMr6D3+oHygRX5VvlnvWg44H7SIv/
37SFzpsxAHcWLoJaDC8US2uX0xTJ9YrARse5iiRUqRvoYP6hvpPE7+sPu0eWKEwH
Yr+A6ffOe6KdpxOevGBRx5W72JMImH+iI+vACdFC1Yl5Ad5dxA/7myByzQDVcjaP
u2qu8afRd4S2sSSvXigknIH18fyGxgS2WXzQmbn5fOSuVlP4OfEMklhbxL8r1VRZ
ANm53dcBsj8cHqzEdbtZLBLVJdxMfhFSB/BTrSflmEKaQ02cDOywt/uYM78umjHE
2sFN1fd4mipsqYlyV1nG6tEWp+v8GCW27ohtYKI5h3NKuyz3d3GWGnUhf+Z7cpkJ
MzUvhc08XCCkuRmFc0zW+NTjmoHN2Hn8P8nQurxrXFWfDaJ68I4JGyWrbPBqve2R
ZWqPSGRuV5kSEzp6iaE4l646IGC2RjYMltoHCTJS7abIwiQfY8ac3X3pfDDNAHIo
+fVmECngHGEUbd8RONkjYXaZ9ahXpbqQNn9bXlnISJfRqBaqCP3dDryLtkLDe7vI
TCr4eAuNKPzIUrIOK8NH
=S7H/
-----END PGP SIGNATURE-----
Merge tag 'please-pull-ia64-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull ia64 update from Tony Luck:
"Usual mish-mash of ia64 fixes for next merge window"
* tag 'please-pull-ia64-for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
[IA64] xen: Fix return value check in xencomm_vtop()
[IA64] Must enable interrupts in do_notify_resume_user before calling tracehook_notify_resume()
[IA64] kexec: Move the dereference below the NULL test
[IA64] Fix a node distance bug
Pull scheduler changes from Ingo Molnar:
"Continued quest to clean up and enhance the cputime code by Frederic
Weisbecker, in preparation for future tickless kernel features.
Other than that, smallish changes."
Fix up trivial conflicts due to additions next to each other in arch/{x86/}Kconfig
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
cputime: Make finegrained irqtime accounting generally available
cputime: Gather time/stats accounting config options into a single menu
ia64: Reuse system and user vtime accounting functions on task switch
ia64: Consolidate user vtime accounting
vtime: Consolidate system/idle context detection
cputime: Use a proper subsystem naming for vtime related APIs
sched: cpu_power: enable ARCH_POWER
sched/nohz: Clean up select_nohz_load_balancer()
sched: Fix load avg vs. cpu-hotplug
sched: Remove __ARCH_WANT_INTERRUPTS_ON_CTXSW
sched: Fix nohz_idle_balance()
sched: Remove useless code in yield_to()
sched: Add time unit suffix to sched sysctl knobs
sched/debug: Limit sd->*_idx range on sysctl
sched: Remove AFFINE_WAKEUPS feature flag
s390: Remove leftover account_tick_vtime() header
cputime: Consolidate vtime handling on context switch
sched: Move cputime code to its own file
cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING
tile: Remove SD_PREFER_LOCAL leftover
...
Factorize the code that accounts user time into a
single function to avoid code duplication.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Move the code that finds out to which context we account the
cputime into generic layer.
Archs that consider the whole time spent in the idle task as idle
time (ia64, powerpc) can rely on the generic vtime_account()
and implement vtime_account_system() and vtime_account_idle(),
letting the generic code to decide when to call which API.
Archs that have their own meaning of idle time, such as s390
that only considers the time spent in CPU low power mode as idle
time, can just override vtime_account().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Use a naming based on vtime as a prefix for virtual based
cputime accounting APIs:
- account_system_vtime() -> vtime_account()
- account_switch_vtime() -> vtime_task_switch()
It makes it easier to allow for further declension such
as vtime_account_system(), vtime_account_idle(), ... if we
want to find out the context we account to from generic code.
This also make it better to know on which subsystem these APIs
refer to.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
To help migrate archtectures over to the new update_vsyscall method,
redfine CONFIG_GENERIC_TIME_VSYSCALL as CONFIG_GENERIC_TIME_VSYSCALL_OLD
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Since users will need to include timekeeper_internal.h, move
update_vsyscall definitions to timekeeper_internal.h.
Cc: Tony Luck <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Traditionally, the entire idle task served as an RCU quiescent state.
But when RCU read side critical sections started appearing within the
idle loop, this traditional strategy became untenable. The fix was to
create new RCU APIs named rcu_idle_enter() and rcu_idle_exit(), which
must be called by each architecture's idle loop so that RCU can tell
when it is safe to ignore a given idle CPU.
Unfortunately, this fix was never applied to ia64, a shortcoming remedied
by this commit.
Reported by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Cc: <stable@vger.kernel.org> # 3.3+
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
These ia64 uses of current_uid and current_gid slipped through the
cracks when I was converting everything to kuids and kgids convert
them now.
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The dereference should be moved below the NULL test.
spatch with a semantic match is used to found this.
(http://coccinelle.lip6.fr/)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The archs that implement virtual cputime accounting all
flush the cputime of a task when it gets descheduled
and sometimes set up some ground initialization for the
next task to account its cputime.
These archs all put their own hooks in their context
switch callbacks and handle the off-case themselves.
Consolidate this by creating a new account_switch_vtime()
callback called in generic code right after a context switch
and that these archs must implement to flush the prev task
cputime and initialize the next task cputime related state.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Otherwise you could run into:
WARN_ON in numa_register_memblks(), because node_possible_map is zero
References: https://bugzilla.novell.com/show_bug.cgi?id=757888
On this machine (ProLiant ML570 G3) the SRAT table contains:
- No processor affinities
- One memory affinity structure (which is set disabled)
CC: Per Jessen <per@opensuse.org>
CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Merge Andrew's second set of patches:
- MM
- a few random fixes
- a couple of RTC leftovers
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
rtc/rtc-88pm80x: remove unneed devm_kfree
rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
tmpfs: distribute interleave better across nodes
mm: remove redundant initialization
mm: warn if pg_data_t isn't initialized with zero
mips: zero out pg_data_t when it's allocated
memcg: gix memory accounting scalability in shrink_page_list
mm/sparse: remove index_init_lock
mm/sparse: more checks on mem_section number
mm/sparse: optimize sparse_index_alloc
memcg: add mem_cgroup_from_css() helper
memcg: further prevent OOM with too many dirty pages
memcg: prevent OOM with too many dirty pages
mm: mmu_notifier: fix freed page still mapped in secondary MMU
mm: memcg: only check anon swapin page charges for swap cache
mm: memcg: only check swap cache pages for repeated charging
mm: memcg: split swapin charge function into private and public part
mm: memcg: remove needless !mm fixup to init_mm when charging
mm: memcg: remove unneeded shmem charge type
...
from interrupts for /dev/random and /dev/urandom. The goal is to
addresses weaknesses discussed in the paper "Mining your Ps and Qs:
Detection of Widespread Weak Keys in Network Devices", by Nadia
Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman, which will
be published in the Proceedings of the 21st Usenix Security Symposium,
August 2012. (See https://factorable.net for more information and an
extended version of the paper.)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJQF/0DAAoJENNvdpvBGATwIowQAOep9QKtLrBvb2lwIRVmeiy8
lRf7V/tYZnz4FePbR0W92JQfKYkCV8yyOO0bmeRzWL3v4m+lRwDTSyA1DDyQMoH+
LOMzvDKSLJMSXTXdSOIr1WYACphViCR/9CrbMBCKSkYfZLJ1MdaEDxT3rcpTGD0T
6iknUweiSkHHhkerU5yQL7FKzD5kYUe0hsF47w7QVlHRHJsW2fsZqkFoh+RpnhNw
03u+djxNGBo9qV81vZ9D1b0vA9uRlEjoWOOEG2XE4M2iq6TUySueA72dQnCwunfi
3kG/u1Swv2dgq6aRrP3H7zdwhYSourGxziu3jNhEKwKEohrxYY7xjNX3RVeTqP67
AzlKsOTWpRLIDrzjSLlb8VxRQiZewu8Unex3e1G+eo20sbcIObHGrxNp7K00zZvd
QZiMHhOwItwFTe4lBO+XbqH2JKbL9/uJmwh5EipMpQTraKO9E6N3CJiUHjzBLo2K
iGDZxRMKf4gVJRwDxbbP6D70JPVu8ZJ09XVIpsXQ3Z1xNqaMF0QdCmP3ty56q1o0
NvkSXxPKrijZs8Sk0rVDqnJ3ll8PuDnXMv5eDtL42VT818I5WxESn9djjwEanGv0
TYxbFub/NRxmPEE5B2Js5FBpqsLf5f282OSMeS/5WLBbnHJR1OoPoAhGVpHvxntC
bi5FC1OolqhvzVIdsqgt
=u7KM
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random subsystem patches from Ted Ts'o:
"This patch series contains a major revamp of how we collect entropy
from interrupts for /dev/random and /dev/urandom.
The goal is to addresses weaknesses discussed in the paper "Mining
your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman,
which will be published in the Proceedings of the 21st Usenix Security
Symposium, August 2012. (See https://factorable.net for more
information and an extended version of the paper.)"
Fix up trivial conflicts due to nearby changes in
drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits)
random: mix in architectural randomness in extract_buf()
dmi: Feed DMI table to /dev/random driver
random: Add comment to random_initialize()
random: final removal of IRQF_SAMPLE_RANDOM
um: remove IRQF_SAMPLE_RANDOM which is now a no-op
sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
[ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
...
vm_stat_account() accounts the shared_vm, stack_vm and reserved_vm now.
But we can also account for total_vm in the vm_stat_account() which makes
the code tidy.
Even for mprotect_fixup(), we can get the right result in the end.
Signed-off-by: Huang Shijie <shijie8@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The most important part of these updates is the IOMMU groups code
enhancement written by Alex Williamson. It abstracts the problem that a
given hardware IOMMU can't isolate any given device from any other
device (e.g. 32 bit PCI devices can't usually be isolated). Devices that
can't be isolated are grouped together. This code is required for the
upcoming VFIO framework.
Another IOMMU-API change written by be is the introduction of domain
attributes. This makes it easier to handle GART-like IOMMUs with the
IOMMU-API because now the start-address and the size of the domain
address space can be queried.
Besides that there are a few cleanups and fixes for the NVidia Tegra
IOMMU drivers and the reworked init-code for the AMD IOMMU. The later is
from my patch-set to support interrupt remapping. The rest of this
patch-set requires x86 changes which are not mergabe yet. So full
support for interrupt remapping with AMD IOMMUs will come in a future
merge window.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQDV/MAAoJECvwRC2XARrjSDcP+gJbtSHDMyZ71zyfQfAZcxJt
rTqLbdZRtIjrjgtKSEDp8u5Bo5TK9dAYoZVuJMOZewFzwI/fSfbRsWp1PU0I88Fr
ZzM+/o1N9MLvf1e3kRVOzNzUfku+jTQgUBD4txsbtQzc/IeGHe9qP1Bqzs/xg4Pk
SjWu7pLNYxaER10z76nRodNn6zGjsc7GFdOW8cJu2HOAHhisIAR291jSQgd6Rz9r
zWqSTsXIEzYt2CtU3G2/tFJ554Mp8v5F80gHo+0Ldw8aNxlD6nGtbqGNt+KI8qTv
MUL8KJ0TNms9CZdti1CSlSNp51VgJi2GaWKCaDAkYuuER2IbC/8Yp/p2DIIA0GNp
HpziIs+dauZPWfZHc6oU7lJAClGAG4MUx7CysVIOzl7ML/Bf4mjYv0faGf5YQfyE
weOR+OPPIWDUwgjzHKMAboA4ijkE/v+EKjOaN/S9rEqFEMKC99fwGkf9wUcpZTne
8lzdI2JrgYNDWMVNYlomeLD4lBAbxb/QsnRUa33igjr0MclvMDkp5HaO631Z1+Zx
be2z8Rl1CtMwS4qeaOXoeaoNWHU26+oJRZNtCGi/Fw4aKqYXP1dnE/m0GtqEP9Yi
+CU2rKbZn3j0+ZcQjCQop8FREPrZ2/Uaji70b6G7WZ2ApcqBxzBffpbMKOmd6T1D
HIzGh0fpdYNDuwn6Txit
=MbAC
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"The most important part of these updates is the IOMMU groups code
enhancement written by Alex Williamson. It abstracts the problem that
a given hardware IOMMU can't isolate any given device from any other
device (e.g. 32 bit PCI devices can't usually be isolated). Devices
that can't be isolated are grouped together. This code is required
for the upcoming VFIO framework.
Another IOMMU-API change written by me is the introduction of domain
attributes. This makes it easier to handle GART-like IOMMUs with the
IOMMU-API because now the start-address and the size of the domain
address space can be queried.
Besides that there are a few cleanups and fixes for the NVidia Tegra
IOMMU drivers and the reworked init-code for the AMD IOMMU. The
latter is from my patch-set to support interrupt remapping. The rest
of this patch-set requires x86 changes which are not mergabe yet. So
full support for interrupt remapping with AMD IOMMUs will come in a
future merge window."
* tag 'iommu-updates-v3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (33 commits)
iommu/amd: Fix hotplug with iommu=pt
iommu/amd: Add missing spin_lock initialization
iommu/amd: Convert iommu initialization to state machine
iommu/amd: Introduce amd_iommu_init_dma routine
iommu/amd: Move unmap_flush message to amd_iommu_init_dma_ops()
iommu/amd: Split enable_iommus() routine
iommu/amd: Introduce early_amd_iommu_init routine
iommu/amd: Move informational prinks out of iommu_enable
iommu/amd: Split out PCI related parts of IOMMU initialization
iommu/amd: Use acpi_get_table instead of acpi_table_parse
iommu/amd: Fix sparse warnings
iommu/tegra: Don't call alloc_pdir with as->lock
iommu/tegra: smmu: Fix unsleepable memory allocation at alloc_pdir()
iommu/tegra: smmu: Remove unnecessary sanity check at alloc_pdir()
iommu/exynos: Implement DOMAIN_ATTR_GEOMETRY attribute
iommu/tegra: Implement DOMAIN_ATTR_GEOMETRY attribute
iommu/msm: Implement DOMAIN_ATTR_GEOMETRY attribute
iommu/omap: Implement DOMAIN_ATTR_GEOMETRY attribute
iommu/vt-d: Implement DOMAIN_ATTR_GEOMETRY attribute
iommu/amd: Implement DOMAIN_ATTR_GEOMETRY attribute
...
Pull trivial tree from Jiri Kosina:
"Trivial updates all over the place as usual."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (29 commits)
Fix typo in include/linux/clk.h .
pci: hotplug: Fix typo in pci
iommu: Fix typo in iommu
video: Fix typo in drivers/video
Documentation: Add newline at end-of-file to files lacking one
arm,unicore32: Remove obsolete "select MISC_DEVICES"
module.c: spelling s/postition/position/g
cpufreq: Fix typo in cpufreq driver
trivial: typo in comment in mksysmap
mach-omap2: Fix typo in debug message and comment
scsi: aha152x: Fix sparse warning and make printing pointer address more portable.
Change email address for Steve Glendinning
Btrfs: fix typo in convert_extent_bit
via: Remove bogus if check
netprio_cgroup.c: fix comment typo
backlight: fix memory leak on obscure error path
Documentation: asus-laptop.txt references an obsolete Kconfig item
Documentation: ManagementStyle: fixed typo
mm/vmscan: cleanup comment error in balance_pgdat
mm: cleanup on the comments of zone_reclaim_stat
...
With the new interrupt sampling system, we are no longer using the
timer_rand_state structure in the irq descriptor, so we can stop
initializing it now.
[ Merged in fixes from Sedat to find some last missing references to
rand_initialize_irq() ]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
The iommu=group_mf is really no longer needed with the addition of ACS
support in IOMMU drivers creating groups. Most multifunction devices
will now be grouped already. If a device has gone to the trouble of
exposing ACS, trust that it works. We can use the device specific ACS
function for fixing devices we trust individually. This largely
reverts bcb71abe.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Pull third pile of signal handling patches from Al Viro:
"This time it's mostly helpers and conversions to them; there's a lot
of stuff remaining in the tree, but that'll either go in -rc2
(isolated bug fixes, ideally via arch maintainers' trees) or will sit
there until the next cycle."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
x86: get rid of calling do_notify_resume() when returning to kernel mode
blackfin: check __get_user() return value
whack-a-mole with TIF_FREEZE
FRV: Optimise the system call exit path in entry.S [ver #2]
FRV: Shrink TIF_WORK_MASK [ver #2]
FRV: Prevent syscall exit tracing and notify_resume at end of kernel exceptions
new helper: signal_delivered()
powerpc: get rid of restore_sigmask()
most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set
set_restore_sigmask() is never called without SIGPENDING (and never should be)
TIF_RESTORE_SIGMASK can be set only when TIF_SIGPENDING is set
don't call try_to_freeze() from do_signal()
pull clearing RESTORE_SIGMASK into block_sigmask()
sh64: failure to build sigframe != signal without handler
openrisc: tracehook_signal_handler() is supposed to be called on success
new helper: sigmask_to_save()
new helper: restore_saved_sigmask()
new helpers: {clear,test,test_and_clear}_restore_sigmask()
HAVE_RESTORE_SIGMASK is defined on all architectures now
Pull vfs changes from Al Viro.
"A lot of misc stuff. The obvious groups:
* Miklos' atomic_open series; kills the damn abuse of
->d_revalidate() by NFS, which was the major stumbling block for
all work in that area.
* ripping security_file_mmap() and dealing with deadlocks in the
area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
general.
* ->encode_fh() switched to saner API; insane fake dentry in
mm/cleancache.c gone.
* assorted annotations in fs (endianness, __user)
* parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
* ->update_time() work from Josef.
* other bits and pieces all over the place.
Normally it would've been in two or three pull requests, but
signal.git stuff had eaten a lot of time during this cycle ;-/"
Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
'truncate_range' inode method was removed by the VM changes, the VFS
update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
to sparse fix added twice, with other changes nearby).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
nfs: don't open in ->d_revalidate
vfs: retry last component if opening stale dentry
vfs: nameidata_to_filp(): don't throw away file on error
vfs: nameidata_to_filp(): inline __dentry_open()
vfs: do_dentry_open(): don't put filp
vfs: split __dentry_open()
vfs: do_last() common post lookup
vfs: do_last(): add audit_inode before open
vfs: do_last(): only return EISDIR for O_CREAT
vfs: do_last(): check LOOKUP_DIRECTORY
vfs: do_last(): make ENOENT exit RCU safe
vfs: make follow_link check RCU safe
vfs: do_last(): use inode variable
vfs: do_last(): inline walk_component()
vfs: do_last(): make exit RCU safe
vfs: split do_lookup()
Btrfs: move over to use ->update_time
fs: introduce inode operation ->update_time
reiserfs: get rid of resierfs_sync_super
reiserfs: mark the superblock as dirty a bit later
...
Does block_sigmask() + tracehook_signal_handler(); called when
sigframe has been successfully built. All architectures converted
to it; block_sigmask() itself is gone now (merged into this one).
I'm still not too happy with the signature, but that's a separate
story (IMO we need a structure that would contain signal number +
siginfo + k_sigaction, so that get_signal_to_deliver() would fill one,
signal_delivered(), handle_signal() and probably setup...frame() -
take one).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only 3 out of 63 do not. Renamed the current variant to __set_current_blocked(),
added set_current_blocked() that will exclude unblockable signals, switched
open-coded instances to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
replace boilerplate "should we use ->saved_sigmask or ->blocked?"
with calls of obvious inlined helper...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
first fruits of ..._restore_sigmask() helpers: now we can take
boilerplate "signal didn't have a handler, clear RESTORE_SIGMASK
and restore the blocked mask from ->saved_mask" into a common
helper. Open-coded instances switched...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
get_unmapped_area() returns -E... on failure, not 0. Moreover, the
wrapper around it is completely pointless.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull first series of signal handling cleanups from Al Viro:
"This is just the first part of the queue (about a half of it);
assorted fixes all over the place in signal handling.
This one ends with all sigsuspend() implementations switched to
generic one (->saved_sigmask-based).
With this, a bunch of assorted old buglets are fixed and most of the
missing bits of NOTIFY_RESUME hookup are in place. Two more fixes sit
in arm and um trees respectively, and there's a couple of broken ones
that need obvious fixes - parisc and avr32 check TIF_NOTIFY_RESUME
only on one of two codepaths; fixes for that will happen in the next
series"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (55 commits)
unicore32: if there's no handler we need to restore sigmask, syscall or no syscall
xtensa: add handling of TIF_NOTIFY_RESUME
microblaze: drop 'oldset' argument of do_notify_resume()
microblaze: handle TIF_NOTIFY_RESUME
score: add handling of NOTIFY_RESUME to do_notify_resume()
m68k: add TIF_NOTIFY_RESUME and handle it.
sparc: kill ancient comment in sparc_sigaction()
h8300: missing checks of __get_user()/__put_user() return values
frv: missing checks of __get_user()/__put_user() return values
cris: missing checks of __get_user()/__put_user() return values
powerpc: missing checks of __get_user()/__put_user() return values
sh: missing checks of __get_user()/__put_user() return values
sparc: missing checks of __get_user()/__put_user() return values
avr32: struct old_sigaction is never used
m32r: struct old_sigaction is never used
xtensa: xtensa_sigaction doesn't exist
alpha: tidy signal delivery up
score: don't open-code force_sigsegv()
cris: don't open-code force_sigsegv()
blackfin: don't open-code force_sigsegv()
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPvQqdAAoJEKurIx+X31iBg6gP/0gclssJFuc509wn18ng+zmP
QOfRqsQ/Qta0he4t5WuLzQ7Ht26RlivIVwd23tHcRdgQzdEztXB7J6iz/PwQ7l8A
Cgf/BrZ1r7F54JQh06y9B6ClggB2VEwj711fqTQtr8nRazxfuj+3m5L+iVQjgzlq
DGlcoFN1rfl2FZqzCPFChqZslyZMm5QVd9cHZaEfP2srU3Y52JbcbiErzxDdSK28
Q2jvX6iPalXEizk4oIXh8qz8KhnX29oTH+ujY1PtxaP6G+7cPJy+601GB5mAzG0K
vn1tb1+Lwu1/gKJ3soO3elFX3PFW4TVlK/TF/ahnVXe8aDXkrftP1SsBsXQ+Xd6Y
jFl2iNZkGKKuAIMu6+mjgQ9LWt3Xa0qTAIlDoOgRYK/OlWs3cGlCD07+rzsqtbcC
amXS4eNa4AYwb5Qo5BFX2Hejqsock78nmG+AUOvmYOHHDNFEvmeNCfhDDcE+iHRq
Nm0ZINAXXWqJTtMwrH/DaXO/U7LRbuwzMBd/hCX5aC5xcd2fwERDtisFyZfvDSX6
0vVflWYfUpUdnf/KeGHh4C3AFmroy3P9S0+aKJJQ4NZ7oQE/15zn7MRPWVC+6hzY
/OgeoUQKaIh8Q/oEs8P+xOarl91sdF34ZgEkVHDM5Nuzt8EgUFxhMAvDJ+7h8npA
fQ3y7YMjRDNIzpPIKGRQ
=WOV+
-----END PGP SIGNATURE-----
Merge tag 'ia64-3.5-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull Itanium fixes from Tony Luck.
* tag 'ia64-3.5-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
[IA64] Liberate the signal layer from IA64 assembler
[IA64] Add cmpxchg.h to exported userspace headers
[IA64] Fix fast syscall version of getcpu()
[IA64] Removed "task_size" element from thread_struct - it is now constant
As described in e6fa16ab ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block is
pending in the shared queue.
Also, use the new helper function introduced in commit 5e6292c0f2
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate code
across architectures. In the past some architectures got this code wrong,
so using this helper function should stop that from happening again.
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently IA64 has a assembler implementation of sigrtprocmask.
Having a single architecture implement this in assembler language
is a serious maintenance problem that inhibits further evolution of the
signal subsystem. Everyone who wants to do deep changes to signals
would need to learn that assembler language.
Whatever performance improvements IA64 gets from this it cannot be worth
the price in maintainability.
We have some locking problems in signal that need to be fixed,
but this roadblock needs to be removed first.
So just disable the special assembler IA64 implementation and fall back to a
normal syscall there.
Acked-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
GETCPU(2) says:
int getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache);
...
When either cpu or node is NULL nothing is written to the respective pointer.
But the fast system call path had no checks for NULL, and would
thus return -EFAULT if either (or both) of these were NULL.
Reported-by: Mike Frysinger <vapier@gentoo.org>
Tested-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
IA64 does not set CONFIG_ARCH_HAVE_IDLE_WAIT and cpuidle uses a
generic function now. Remove the unused code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20120507175652.392394511@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20120420124557.380965133@linutronix.de
Preparatory patch to make the idle thread allocation for secondary
cpus generic.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20120420124556.964170564@linutronix.de
pfm_vm_munmap() is simply vm_munmap() and pfm_remove_smpl_mapping()
always get current as the first argument.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Like the vm_brk() function, this is the same as "do_munmap()", except it
does the VM locking for the caller.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull DMA mapping branch from Marek Szyprowski:
"Short summary for the whole series:
A few limitations have been identified in the current dma-mapping
design and its implementations for various architectures. There exist
more than one function for allocating and freeing the buffers:
currently these 3 are used dma_{alloc, free}_coherent,
dma_{alloc,free}_writecombine, dma_{alloc,free}_noncoherent.
For most of the systems these calls are almost equivalent and can be
interchanged. For others, especially the truly non-coherent ones
(like ARM), the difference can be easily noticed in overall driver
performance. Sadly not all architectures provide implementations for
all of them, so the drivers might need to be adapted and cannot be
easily shared between different architectures. The provided patches
unify all these functions and hide the differences under the already
existing dma attributes concept. The thread with more references is
available here:
http://www.spinics.net/lists/linux-sh/msg09777.html
These patches are also a prerequisite for unifying DMA-mapping
implementation on ARM architecture with the common one provided by
dma_map_ops structure and extending it with IOMMU support. More
information is available in the following thread:
http://thread.gmane.org/gmane.linux.kernel.cross-arch/12819
More works on dma-mapping framework are planned, especially in the
area of buffer sharing and managing the shared mappings (together with
the recently introduced dma_buf interface: commit d15bd7ee44
"dma-buf: Introduce dma buffer sharing mechanism").
The patches in the current set introduce a new alloc/free methods
(with support for memory attributes) in dma_map_ops structure, which
will later replace dma_alloc_coherent and dma_alloc_writecombine
functions."
People finally started piping up with support for merging this, so I'm
merging it as the last of the pending stuff from the merge window.
Looks like pohmelfs is going to wait for 3.5 and more external support
for merging.
* 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
common: DMA-mapping: add NON-CONSISTENT attribute
common: DMA-mapping: add WRITE_COMBINE attribute
common: dma-mapping: introduce mmap method
common: dma-mapping: remove old alloc_coherent and free_coherent methods
Hexagon: adapt for dma_map_ops changes
Unicore32: adapt for dma_map_ops changes
Microblaze: adapt for dma_map_ops changes
SH: adapt for dma_map_ops changes
Alpha: adapt for dma_map_ops changes
SPARC: adapt for dma_map_ops changes
PowerPC: adapt for dma_map_ops changes
MIPS: adapt for dma_map_ops changes
X86 & IA64: adapt for dma_map_ops changes
common: dma-mapping: introduce generic alloc() and free() methods
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPc+5PAAoJENkgDmzRrbjx8qwQAIRGDWGAJ7fiu8QBVbjycXJG
7828enxrbBQodNmc+uAkYvTv3KEoi8tlweMsk/lWDv8WovZV4IlQDEFCX/f4hWVY
S+2PmqJkN/alsG3dXd00zotK9mOJD+mQPAdjUBaNnRdp3QoV3YrjgihkWiL23DyT
dZTgqXdbUJkHk/d9YD1qcDvWdSr1EufSLYa52PhLJqYiYVk8zCdX82deJX1MWh64
v9I6htA73ORoX4JBGsFAOHO8fmLaq1yhBUMHOL4+gfEJVv4kSTU05GgepBHQP1fm
BbG2hN6G4vqqiqhV5A59+h271o/2d/KBGKx8/twRGk8tNJIwTIVnr/qcGuUfytC3
vA1fmq3vul0bzbqRgph8bGJyoVIg8CHjq24BFJQOXiQ1/6HOvjxnKBYs+3sVA829
ZYQYuEoRKmTsD3vv3nmcqAdZZDzehBQ499bEqDNsnQRLOjOVNag/pJSaENkeVC4T
CKYXt9BEabYnermPLdrjiabPE27GaEznX11SzCSXiWJsKX2kJnvz5RxVo8nlh1fc
/KQWJyWi/QVmAdy4eCJFp48513BqncHvKtPZ6zN9+Y6NHKmnmAqieZhh4yV/SCqi
EcK2oHQXmioKldn5DANQjeUCWlmEYXHbR08ahGRLNc7GZ1qKCgDr8+WEC0XYB/gQ
XLH3KKLM+VmvtonqjDV7
=W59/
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://github.com/rustyrussell/linux
Pull cpumask cleanups from Rusty Russell:
"(Somehow forgot to send this out; it's been sitting in linux-next, and
if you don't want it, it can sit there another cycle)"
I'm a sucker for things that actually delete lines of code.
Fix up trivial conflict in arch/arm/kernel/kprobes.c, where Rusty fixed
a user of &cpu_online_map to be cpu_online_mask, but that code got
deleted by commit b21d55e98a ("ARM: 7332/1: extract out code patch
function from kprobes").
* tag 'for-linus' of git://github.com/rustyrussell/linux:
cpumask: remove old cpu_*_map.
documentation: remove references to cpu_*_map.
drivers/cpufreq/db8500-cpufreq: remove references to cpu_*_map.
remove references to cpu_*_map in arch/
Pull timer core updates from Thomas Gleixner.
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ia64: vsyscall: Add missing paranthesis
alarmtimer: Don't call rtc_timer_init() when CONFIG_RTC_CLASS=n
x86: vdso: Put declaration before code
x86-64: Inline vdso clock_gettime helpers
x86-64: Simplify and optimize vdso clock_gettime monotonic variants
kernel-time: fix s/then/than/ spelling errors
time: remove no_sync_cmos_clock
time: Avoid scary backtraces when warning of > 11% adj
alarmtimer: Make sure we initialize the rtctimer
ntp: Fix leap-second hrtimer livelock
x86, tsc: Skip refined tsc calibration on systems with reliable TSC
rtc: Provide flag for rtc devices that don't support UIE
ia64: vsyscall: Use seqcount instead of seqlock
x86: vdso: Use seqcount instead of seqlock
x86: vdso: Remove bogus locking in update_vsyscall_tz()
time: Remove bogus comments
time: Fix change_clocksource locking
time: x86: Fix race switching from vsyscall to non-vsyscall clock
Fix this build error on ia64:
In file included from include/linux/sched.h:92,
from arch/ia64/kernel/asm-offsets.c:9:
include/linux/llist.h:59:25: error: asm/cmpxchg.h: No such file or directory
make[1]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
Right now we don't seem to need any actual contents for the
asm/cmpxchg.h to make the build work ... so leave the migration of
xchg() and cmpxchg() to this new header file for a future patch.
Also process.c needs <asm/switch_to.h> (for definition of pfm_syst_info).
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>