WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Russell King	2ef75701d1	ARM: CPU hotplug: fix abuse of irqdesc->node irqdesc's node member is supposed to mark the numa node number for the interrupt. Our use of it is non-standard. Remove this, replacing the functionality with a test of the affinity mask. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2011-07-21 15:00:27 +01:00
Huang Ying	050438ed5a	kexec, x86: Fix incorrect jump back address if not preserving context In kexec jump support, jump back address passed to the kexeced kernel via function calling ABI, that is, the function call return address is the jump back entry. Furthermore, jump back entry == 0 should be used to signal that the jump back or preserve context is not enabled in the original kernel. But in the current implementation the stack position used for function call return address is not cleared context preservation is disabled. The patch fixes this bug. Reported-and-tested-by: Yin Kangkai <kangkai.yin@intel.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/1310607277-25029-1-git-send-email-ying.huang@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 11:19:28 +02:00
Alan Cox	43605ef188	x86, config: Introduce an INTEL_MID configuration We need to carve up the configuration between: - MID general - Moorestown specific - Medfield specific - Future devices As a base point create an INTEL_MID configuration property. We make the existing MRST configuration a sub-option. This means that the rest of the kernel config can still use X86_MRST checks without anything going backwards. After this is merged future patches will tidy up which devices are MID and which are X86_MRST, as well as add options for Medfield. Signed-off-by: Alan Cox <alan@linux.intel.com> Link: http://lkml.kernel.org/r/20110712164859.7642.84136.stgit@bob.linux.org.uk Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 10:35:14 +02:00
Sergei Shtylyov	38175051f8	x86, quirks: Use pci_dev->revision This code uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so it wasn't converted by commit `44c10138fd` ("PCI: Change all drivers to use pci_device->revision") before being moved to arch/x86/... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Dave Jones <davej@redhat.com> Link: http://lkml.kernel.org/r/201107111901.39281.sshtylyov@ru.mvista.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 10:26:00 +02:00
Greg Dietsche	a6c23905ff	x86, smpboot: Mark the names[] array in __inquire_remote_apic() as const This array is read-only. Make it explicit by marking as const. Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu> Link: http://lkml.kernel.org/r/1309482653-23648-1-git-send-email-Gregory.Dietsche@cuw.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 10:04:51 +02:00
Jan Beulich	ef68c8f87e	x86: Serialize EFI time accesses on rtc_lock The EFI specification requires that callers of the time related runtime functions serialize with other CMOS accesses in the kernel, as the EFI time functions may choose to also use the legacy CMOS RTC. Besides fixing a latent bug, this is a prerequisite to safely enable the rtc-efi driver for x86, which ought to be preferred over rtc-cmos on all EFI platforms. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Matthew Garrett <mjg59@srcf.ucam.org> Cc: <mjg@redhat.com> Link: http://lkml.kernel.org/r/4E257E33020000780004E319@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Matthew Garrett <mjg@redhat.com>	2011-07-21 09:21:00 +02:00
Jan Beulich	ac619f4eba	x86: Serialize SMP bootup CMOS accesses on rtc_lock With CPU hotplug, there is a theoretical race between other CMOS (namely RTC) accesses and those done in the SMP secondary processor bringup path. I am unware of the problem having been noticed by anyone in practice, but it would very likely be rather spurious and very hard to reproduce. So to be on the safe side, acquire rtc_lock around those accesses. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/4E257AE7020000780004E2FF@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 09:20:59 +02:00
Jan Beulich	a750036f35	x86: Fix write lock scalability 64-bit issue With the write lock path simply subtracting RW_LOCK_BIAS there is, on large systems, the theoretical possibility of overflowing the 32-bit value that was used so far (namely if 128 or more CPUs manage to do the subtraction, but don't get to do the inverse addition in the failure path quickly enough). A first measure is to modify RW_LOCK_BIAS itself - with the new value chosen, it is good for up to 2048 CPUs each allowed to nest over 2048 times on the read path without causing an issue. Quite possibly it would even be sufficient to adjust the bias a little further, assuming that allowing for significantly less nesting would suffice. However, as the original value chosen allowed for even more nesting levels, to support more than 2048 CPUs (possible currently only for 64-bit kernels) the lock itself gets widened to 64 bits. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4E258E0D020000780004E3F0@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 09:03:36 +02:00
Jan Beulich	a738669464	x86: Unify rwsem assembly implementation Rather than having two functionally identical implementations for 32- and 64-bit configurations, use the previously extended assembly abstractions to fold the rwsem two implementations into a shared one. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4E258DF3020000780004E3ED@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 09:03:32 +02:00
Jan Beulich	4625cd6379	x86: Unify rwlock assembly implementation Rather than having two functionally identical implementations for 32- and 64-bit configurations, extend the existing assembly abstractions enough to fold the two rwlock implementations into a shared one. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4E258DD7020000780004E3EA@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-21 09:03:31 +02:00
Josef Bacik	02c24a8218	fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers Btrfs needs to be able to control how filemap_write_and_wait_range() is called in fsync to make it less of a painful operation, so push down taking i_mutex and the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some file systems can drop taking the i_mutex altogether it seems, like ext3 and ocfs2. For correctness sake I just pushed everything down in all cases to make sure that we keep the current behavior the same for everybody, and then each individual fs maintainer can make up their mind about what to do from there. Thanks, Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:59 -04:00
Al Viro	c066b65abf	arm: don't create useless copies to pass into debugfs_create_dir() its first argument is const char * and it's really not modified... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:52 -04:00
Al Viro	12520c438f	switch assorted clock drivers to debugfs_remove_recursive() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 20:47:51 -04:00
Linus Torvalds	919d25a710	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86. reboot: Make Dell Latitude E6320 use reboot=pci x86, doc only: Correct real-mode kernel header offset for init_size x86: Disable AMD_NUMA for 32bit for now	2011-07-20 15:33:59 -07:00
Zhangfei Gao	a5556f3cc7	ARM: mmp2: update mmp2_defconfig to support mmc 1. support brownstone 2. support mmc 3. support basic filesystem and language 4. remove dynamic_debug, since too many log during access sd Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com> Acked-by: Philip Rakity <prakity@marvell.com> Acked-by: Mark F. Brown <mark.brown314@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2011-07-20 17:20:52 -04:00
Zhangfei Gao	bfed345edf	mmc: sdhci-pxa: move platform data to include/linux/platform_data As suggested by Arnd, move platform data to include/linux/platform_data in order to improve build coverage for the driver. Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Chris Ball <cjb@laptop.org>	2011-07-20 17:20:52 -04:00
Zhangfei Gao	6f984f3b26	mmc: update mmp2 mmc resources in arch/arm Update MMP2 platform code to "sdhci-pxav3", following the driver rename. Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com> Acked-by: Philip Rakity <prakity@marvell.com> Acked-by: Mark F. Brown <mark.brown314@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2011-07-20 17:20:50 -04:00
Zhangfei Gao	a702c8abb2	mmc: host: split up sdhci-pxa, create sdhci-pxav3.c sdhci-pltfm driver for PXAV3 SoCs, such as MMP2. Signed-off-by: Zhangfei Gao <zhangfei.gao@marvell.com> Signed-off-by: Philip Rakity <prakity@marvell.com> Acked-by: Philip Rakity <prakity@marvell.com> Acked-by: Mark F. Brown <mark.brown314@gmail.com> Signed-off-by: Chris Ball <cjb@laptop.org>	2011-07-20 17:20:49 -04:00
Lai Jiangshan	f218a7ee7a	ia64,rcu: Convert call_rcu(sn_irq_info_free) to kfree_rcu() The rcu callback sn_irq_info_free() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(sn_irq_info_free). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jes Sorensen <jes@sgi.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2011-07-20 14:10:13 -07:00
Al Viro	1ba1068186	switch do_spufs_create() to user_path_create(), fix double-unlock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-20 01:44:06 -04:00
Linus Torvalds	47126d807a	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc: davinci: DM365 EVM: fix video input mux bits ARM: davinci: Check for NULL return from irq_alloc_generic_chip arm: davinci: Fix low level gpio irq handlers' argument	2011-07-19 22:10:05 -07:00
Catalin Marinas	140d5dc1d7	ARM: 7000/1: LPAE: Use long long printk format for displaying the pud Currently using just long but this is not enough for the LPAE format (64-bit entries). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2011-07-19 12:01:27 +01:00
Dave Martin	540b573875	ARM: 6999/1: head, zImage: Always Enter the kernel in ARM state Currently, the documented kernel entry requirements are not explicit about whether the kernel should be entered in ARM or Thumb, leading to an ambiguitity about how to enter Thumb-2 kernels. As a result, the kernel is reliant on the zImage decompressor to enter the kernel proper in the correct instruction set state. This patch changes the boot entry protocol for head.S and Image to be the same as for zImage: in all cases, the kernel is now entered in ARM. Documentation/arm/Booting is updated to reflect this new policy. A different rule will be needed for Cortex-M class CPUs as and when support for those lands in mainline, since these CPUs don't support the ARM instruction set at all: a note is added to the effect that the kernel must be entered in Thumb on such systems. Signed-off-by: Dave Martin <dave.martin@linaro.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2011-07-19 12:00:53 +01:00
Russell King	4348810a24	ARM: btc: avoid invalidating the branch target cache on kernel TLB maintanence Kernel space needs very little in the way of BTC maintanence as most mappings which are created and destroyed are non-executable, and so could never enter the instruction stream. The case which does warrant BTC maintanence is when a module is loaded. This creates a new executable mapping, but at that point the pages have not been initialized with code and data, so at that point they contain unpredictable information. Invalidating the BTC at this stage serves little useful purpose. Before we execute module code, we call flush_icache_range(), which deals with the BTC maintanence requirements. This ensures that we have a BTC maintanence operation before we execute code via the newly created mapping. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2011-07-19 11:44:06 +01:00
Jon Povey	9daedd833a	davinci: DM365 EVM: fix video input mux bits Video input mux settings for tvp7002 and imager inputs were swapped. Comment was correct. Tested on EVM with tvp7002 input. Signed-off-by: Jon Povey <jon.povey@racelogic.co.uk> Acked-by: Manjunath Hadli <manjunath.hadli@ti.com> Cc: stable@kernel.org Signed-off-by: Sekhar Nori <nsekhar@ti.com>	2011-07-19 14:00:29 +05:30
Todd Poynor	33e1e5e317	ARM: davinci: Check for NULL return from irq_alloc_generic_chip Avoid NULL dereference of irq_alloc_generic_chip return in low memory conditions. Signed-off-by: Todd Poynor <toddpoynor@google.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com>	2011-07-19 12:49:00 +05:30
Jeremy Fitzhardinge	2a6f6d0955	xen/multicall: move *idx fields to start of mc_buffer The CPU would prefer small offsets. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:46 -07:00
Jeremy Fitzhardinge	eac303bf2e	xen/multicall: special-case singleton hypercalls Singleton calls seem to end up being pretty common, so just directly call the hypercall rather than going via multicall. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:45 -07:00
Jeremy Fitzhardinge	4a7b005dbf	xen/multicalls: add unlikely around slowpath in __xen_mc_entry() Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:45 -07:00
Jeremy Fitzhardinge	ffc78767f2	xen/multicalls: disable MC_DEBUG It's useful - and probably should be a config - but its very heavyweight, especially with the tracing stuff to help sort out problems. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:28 -07:00
Jeremy Fitzhardinge	bc7fe1d977	xen/mmu: tune pgtable alloc/release Make sure the fastpath code is inlined. Batch the page permission change and the pin/unpin, and make sure that it can be batched with any adjacent set_pte/pmd/etc operations. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:28 -07:00
Jeremy Fitzhardinge	dcf7435cfe	xen/mmu: use extend_args for more mmuext updates Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge	c8eed1719a	xen/trace: add tlb flush tracepoints Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge	ab78f7ad2c	xen/trace: add segment desc tracing Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge	5f94fb5b8e	xen/trace: add xen_pgd_(un)pin tracepoints Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge	c2ba050d2e	xen/trace: add ptpage alloc/release tracepoints Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge	8470880791	xen/trace: add mmu tracepoints Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:27 -07:00
Jeremy Fitzhardinge	c796f213a6	xen/trace: add multicall tracing Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:26 -07:00
Jeremy Fitzhardinge	f04e2ee41d	xen/trace: set up tracepoint skeleton Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:04 -07:00
Jeremy Fitzhardinge	84cdee76b1	xen/multicalls: remove debugfs stats Remove debugfs stats to make way for tracing. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2011-07-18 15:43:04 -07:00
Russell King	07f1c295de	Merge branch 'dma' of http://git.linaro.org/git/people/nico/linux into devel-stable	2011-07-18 23:00:42 +01:00
Nicolas Pitre	fb89fcfb15	ARM: ARM_DMA_ZONE_SIZE is no more One less dependency on mach/memory.h. Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-18 15:30:57 -04:00
Nicolas Pitre	fded1ef9c5	ARM: mach-shark: move ARM_DMA_ZONE_SIZE to mdesc->dma_zone_size Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-18 15:30:04 -04:00
Nicolas Pitre	e9107ab623	ARM: mach-sa1100: move ARM_DMA_ZONE_SIZE to mdesc->dma_zone_size Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-18 15:30:03 -04:00
Nicolas Pitre	00e9125e06	ARM: mach-realview: move from ARM_DMA_ZONE_SIZE to mdesc->dma_zone_size Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-18 15:30:02 -04:00
Nicolas Pitre	805e88dc40	ARM: mach-pxa: move from ARM_DMA_ZONE_SIZE to mdesc->dma_zone_size Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-18 15:30:01 -04:00
Nicolas Pitre	7553ee777b	ARM: mach-ixp4xx: move from ARM_DMA_ZONE_SIZE to mdesc->dma_zone_size Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-18 15:30:00 -04:00
Nicolas Pitre	5065c71d75	ARM: mach-h720x: move from ARM_DMA_ZONE_SIZE to mdesc->dma_zone_size Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-18 15:29:59 -04:00
Nicolas Pitre	f68deabf3d	ARM: mach-davinci: move from ARM_DMA_ZONE_SIZE to mdesc->dma_zone_size Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-18 15:29:58 -04:00
Nicolas Pitre	4fddcaebb9	ARM: add dma_zone_size to the machine_desc structure Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-18 15:29:57 -04:00
Borislav Petkov	8c400f6ce0	x86, vdso: Drop now wrong comment Now that `1b3f2a72bb` is in, it is very important that the below lying comment be removed! :-) Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20110718191054.GA18359@liondog.tnic Acked-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-18 12:29:50 -07:00
Nicolas Pitre	650320181a	ARM: change ARM_DMA_ZONE_SIZE into a variable Having this value defined at compile time prevents multiple machines with conflicting definitions to coexist. Move it to a variable in preparation for having a per machine value selected at run time. This is relevant only when CONFIG_ZONE_DMA is selected. Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-18 15:29:41 -04:00
Linus Torvalds	247dc220f4	Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM / MIPS: Convert i8259.c to using syscore_ops	2011-07-17 12:47:27 -07:00
Linus Torvalds	94b0522ab7	Merge branch 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung * 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: ARM: SAMSUNG: DMA Cleanup as per sparse ARM: SAMSUNG: Check NULL return from irq_alloc_generic_chip	2011-07-17 12:47:11 -07:00
Linus Torvalds	d4bd4b40f8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc: sun4m SMP: fix wrong shift instruction in IPI handler sparc32,leon: Added __init declaration to leon_flush_needed() sparc/irqs: Do not trace arch_local_{,irq_} functions	2011-07-17 12:43:58 -07:00
Will Simoneau	1ef48593bd	sparc: sun4m SMP: fix wrong shift instruction in IPI handler This shift instruction appears to be shifting in the wrong direction. Without this change, my SparcStation-20MP hangs just after bringing up the second CPU: Entering SMP Mode... Starting CPU 2 at f02b4e90 Brought up 2 CPUs Total of 2 processors activated (99.52 BogoMIPS). * stuck * Signed-off-by: Will Simoneau <simoneau@ele.uri.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-16 10:45:12 -07:00
Sangwook Lee	d670ac019f	ARM: SAMSUNG: DMA Cleanup as per sparse Function declaration differs between file: dma.c and file:dma.h and SPARSE (Documentation/sparse.txt) gives error messages All dma channels are members of 'enum dma_ch' and not 'unsigned int' Please have a look at channel definitions in: arch/arm/mach-s3c64xx/include/mach/dma.h arch/arm/plat-samsung/include/plat/s3c-dma-pl330.h arch/arm/mach-s3c2410/include/mach/dma.h So all arguments should be of type 'enum dma_ch' Signed-off-by: Sangwook Lee <sangwook.lee@linaro.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>	2011-07-16 15:50:19 +09:00
Todd Poynor	691abd0abf	ARM: SAMSUNG: Check NULL return from irq_alloc_generic_chip Signed-off-by: Todd Poynor <toddpoynor@google.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>	2011-07-16 11:13:47 +09:00
Richard Cochran	900b170af4	ARM: fix regression in IXP4xx clocksource Commit `234b6ceddb` clocksource: convert ARM 32-bit up counting clocksources broke the build for ixp4xx and made big endian operation impossible. This commit restores the original behaviour. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl> [ Thomas says that we might want to have generic BE accessor functions to the MMIO clock source, but that hasn't happened yet, so in the meantime this seems to be the short-term fix for the particular problem - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-15 18:02:35 -07:00
Rafael J. Wysocki	84652e834b	PM / MIPS: Convert i8259.c to using syscore_ops The code in arch/mips/kernel/i8259.c still hasn't been converted to using struct syscore_ops instead of a sysdev for resume and shutdown. As a result, this code doesn't build any more after suspend, resume and shutdown callbacks have been removed from struct sysdev_class. Fix this problem by converting i8259.c to using syscore_ops. Reported-and-tested-by: Roland Vossen <rvossen@broadcom.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Ralf Baechle <ralf@linux-mips.org>	2011-07-16 00:59:54 +02:00
Len Brown	17edf2d79f	x86, intel, power: Correct the MSR_IA32_ENERGY_PERF_BIAS message Fix the printk_once() so that it actually prints (didn't print before due to a stray comma.) [ hpa: changed to an incremental patch and adjusted the description accordingly. ] Signed-off-by: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107151732480.18606@x980 Cc: <table@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-15 15:13:55 -07:00
Rafael J. Wysocki	7ae033cc0d	Merge branch 'pm-runtime' into for-linus * pm-runtime: OMAP: PM: disable idle on suspend for GPIO and UART OMAP: PM: omap_device: add API to disable idle on suspend OMAP: PM: omap_device: add system PM methods for PM domain handling OMAP: PM: omap_device: conditionally use PM domain runtime helpers PM / Runtime: Add new helper function: pm_runtime_status_suspended() PM / Runtime: Consistent utilization of deferred_resume PM / Runtime: Prevent runtime_resume from racing with probe PM / Runtime: Replace "run-time" with "runtime" in documentation PM / Runtime: Improve documentation of enable, disable and barrier PM: Limit race conditions between runtime PM and system sleep (v2) PCI / PM: Detect early wakeup in pci_pm_prepare() PM / Runtime: Return special error code if runtime PM is disabled PM / Runtime: Update documentation of interactions with system sleep	2011-07-15 23:59:25 +02:00
Rafael J. Wysocki	ba1389d74f	Merge branch 'pm-domains' into for-linus * pm-domains: (33 commits) ARM / shmobile: Return -EBUSY from A4LC power off if A3RV is active PM / Domains: Take .power_off() error code into account ARM / shmobile: Use genpd_queue_power_off_work() ARM / shmobile: Use pm_genpd_poweroff_unused() PM / Domains: Introduce function to power off all unused PM domains PM / Domains: Queue up power off work only if it is not pending PM / Domains: Improve handling of wakeup devices during system suspend PM / Domains: Do not restore all devices on power off error PM / Domains: Allow callbacks to execute all runtime PM helpers PM / Domains: Do not execute device callbacks under locks PM / Domains: Make failing pm_genpd_prepare() clean up properly PM / Domains: Set device state to "active" during system resume ARM: mach-shmobile: sh7372 A3RV requires A4LC PM / Domains: Export pm_genpd_poweron() in header ARM: mach-shmobile: sh7372 late pm domain off ARM: mach-shmobile: Runtime PM late init callback ARM: mach-shmobile: sh7372 D4 support ARM: mach-shmobile: sh7372 A4MP support ARM: mach-shmobile: sh7372: make sure that fsi is peripheral of spu2 ARM: mach-shmobile: sh7372 A3SG support ...	2011-07-15 23:59:09 +02:00
Russell King	4aa96ccf9e	Merge branch 'kprobes-thumb' of git://git.yxit.co.uk/linux into devel-stable	2011-07-15 10:06:42 +01:00
Oleg Nesterov	73d382decc	x86: Kill handle_signal()->set_fs() handle_signal()->set_fs() has a nice comment which explains what set_fs() is, but it doesn't explain why it is needed and why it depends on CONFIG_X86_64. Afaics, the history of this confusion is: 1. I guess today nobody can explain why it was needed in arch/i386/kernel/signal.c, perhaps it was always wrong. This predates 2.4.0 kernel. 2. then it was copy-and-past'ed to the new x86_64 arch. 3. then it was removed from i386 (but not from x86_64) by `b93b6ca3` "i386: remove unnecessary code". 4. then it was reintroduced under CONFIG_X86_64 when x86 unified i386 and x86_64, because the patch above didn't touch x86_64. Remove it. ->addr_limit should be correct. Even if it was possible that it is wrong, it is too late to fix it after setup_rt_frame(). Linus commented in: http://lkml.kernel.org/r/alpine.LFD.0.999.0707170902570.19166@woody.linux-foundation.org ... about the equivalent bit from i386: Heh. I think it's entirely historical. Please realize that the whole reason that function is called "set_fs()" is that it literally used to set the %fs segment register, not "->addr_limit". So I think the "set_fs(USER_DS)" is there _only_ to match the other regs->xds = __USER_DS; regs->xes = __USER_DS; regs->xss = __USER_DS; regs->xcs = __USER_CS; things, and never mattered. And now it matters even less, and has been copied to all other architectures where it is just totally insane. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20110710164424.GA20261@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 21:46:20 -07:00
Oleg Nesterov	9b42962074	x86, do_signal: Simplify the TS_RESTORE_SIGMASK logic 1. do_signal() looks at TS_RESTORE_SIGMASK and calculates the mask which should be stored in the signal frame, then it passes "oldset" to the callees, down to setup_rt_frame(). This is ugly, setup_rt_frame() can do this itself and nobody else needs this sigset_t. Move this code into setup_rt_frame. 2. do_signal() also clears TS_RESTORE_SIGMASK if handle_signal() succeeds. We can move this to setup_rt_frame() as well, this avoids the unnecessary checks and makes the logic more clear. 3. use set_current_blocked() instead of sigprocmask(SIG_SETMASK), sigprocmask() should be avoided. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20110710182203.GA27979@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 21:22:11 -07:00
Oleg Nesterov	3982294b03	x86, signals: Convert the X86_32 code to use set_current_blocked() sys_sigsuspend() and sys_sigreturn() change ->blocked directly. This is not correct, see the changelog in `e6fa16ab` "signal: sigprocmask() should do retarget_shared_pending()" Change them to use set_current_blocked(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20110710192727.GA31759@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 21:21:57 -07:00
Oleg Nesterov	905f29e2aa	x86, signals: Convert the IA32_EMULATION code to use set_current_blocked() sys32_sigsuspend() and sys32_*sigreturn() change ->blocked directly. This is not correct, see the changelog in `e6fa16ab` "signal: sigprocmask() should do retarget_shared_pending()" Change them to use set_current_blocked(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20110710192724.GA31755@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 21:21:31 -07:00
Andy Lutomirski	574c44fa8f	ia64: Replace clocksource.fsys_mmio with generic arch data Now that clocksource.archdata is available, use it for ia64-specific code. Cc: Clemens Ladisch <clemens@ladisch.de> Cc: linux-ia64@vger.kernel.org Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/d31de0ee0842a0e322fb6441571c2b0adb323fa2.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 17:57:09 -07:00
Andy Lutomirski	98d0ac38ca	x86-64: Move vread_tsc and vread_hpet into the vDSO The vsyscall page now consists entirely of trap instructions. Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/637648f303f2ef93af93bae25186e9a1bea093f5.1310639973.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 17:57:05 -07:00
H. Peter Anvin	4bb82178f5	x86, msr: Fix typo in ENERGY_PERF_BIAS_POWERSAVE Fix a trivial typo in the name of the constant ENERGY_PERF_BIAS_POWERSAVE. This didn't cause trouble because this constant is not currently used for anything. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/tip-abe48b108247e9b90b4c6739662a2e5c765ed114@git.kernel.org	2011-07-14 14:58:44 -07:00
Michał Mirosław	512e4002ab	net: m68k/nfeth: Remove wrong usage of dev->flags Remove wrong setting of dev->flags. NETIF_F_NO_CSUM maps to IFF_DEBUG there, so looks like a mistake. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-07-14 14:38:17 -07:00
Cyrill Gorcunov	f912987097	perf, x86: P4 PMU - Introduce event alias feature Instead of hw_nmi_watchdog_set_attr() weak function and appropriate x86_pmu::hw_watchdog_set_attr() call we introduce even alias mechanism which allow us to drop this routines completely and isolate quirks of Netburst architecture inside P4 PMU code only. The main idea remains the same though -- to allow nmi-watchdog and perf top run simultaneously. Note the aliasing mechanism applies to generic PERF_COUNT_HW_CPU_CYCLES event only because arbitrary event (say passed as RAW initially) might have some additional bits set inside ESCR register changing the behaviour of event and we can't guarantee anymore that alias event will give the same result. P.S. Thanks a huge to Don and Steven for for testing and early review. Acked-by: Don Zickus <dzickus@redhat.com> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Ingo Molnar <mingo@elte.hu> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Stephane Eranian <eranian@google.com> CC: Lin Ming <ming.m.lin@intel.com> CC: Arnaldo Carvalho de Melo <acme@redhat.com> CC: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-07-14 17:25:04 -04:00
Shawn Guo	04ee585495	spi/imx: use soc name in spi device type naming scheme Software defined version number is not stable enough to be used in device type naming scheme. The patch changes it to use implicit soc name for spi device type definition. In this way, we can easily align the naming scheme with device tree binding, which comes later. It removes fifosize from spi_imx_data and adds devtype there, so that fifosize can be set in an inline function according to devtype. Also, cpu_is_mx can be replaced by inline functions checking devtype. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2011-07-14 13:47:02 -06:00
Len Brown	abe48b1082	x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS Since 2.6.36 (`23016bf0d2`), Linux prints the existence of "epb" in /proc/cpuinfo, Since 2.6.38 (`d5532ee7b4`), the x86_energy_perf_policy(8) utility has been available in-tree to update MSR_IA32_ENERGY_PERF_BIAS. However, the typical BIOS fails to initialize the MSR, presumably because this is handled by high-volume shrink-wrap operating systems... Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8). As a result, WSM-EP, SNB, and later hardware from Intel will run in its default hardware power-on state (performance), which assumes that users care for performance at all costs and not for energy efficiency. While that is fine for performance benchmarks, the hardware's intended default operating point is "normal" mode... Initialize the MSR to the "normal" by default during kernel boot. x86_energy_perf_policy(8) is available to change the default after boot, should the user have a different preference. Signed-off-by: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107140051020.18606@x980 Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@kernel.org>	2011-07-14 12:13:42 -07:00
Rafael J. Wysocki	5ca80817e2	ARM / shmobile: Return -EBUSY from A4LC power off if A3RV is active Since the A4LC should only be powered off if the A3RV is off, make the A4LC's power down routine return -EBUSY if A3RV is not off to indicate to the core that it doesn't want to power off the domain in that case. This will cause the core to regard A4LC as active, so the pm_genpd_poweron() in pd_power_down_a3rv() is not necessary any more. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Magnus Damm <damm@opensource.se>	2011-07-14 20:59:35 +02:00
Rafael J. Wysocki	0bc5b2debb	ARM / shmobile: Use genpd_queue_power_off_work() Make pd_power_down_a3rv() use genpd_queue_power_off_work() to queue up the powering off of the A4LC domain to avoid queuing it up when it is pending. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Magnus Damm <damm@opensource.se>	2011-07-14 20:59:07 +02:00
David S. Miller	6a7ebdf2fd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/bluetooth/l2cap_core.c	2011-07-14 07:56:40 -07:00
Glauber Costa	095c0aa83e	sched: adjust scheduler cpu power for stolen time This patch makes update_rq_clock() aware of steal time. The mechanism of operation is not different from irq_time, and follows the same principles. This lives in a CONFIG option itself, and can be compiled out independently of the rest of steal time reporting. The effect of disabling it is that the scheduler will still report steal time (that cannot be disabled), but won't use this information for cpu power adjustments. Everytime update_rq_clock_task() is invoked, we query information about how much time was stolen since last call, and feed it into sched_rt_avg_update(). Although steal time reporting in account_process_tick() keeps track of the last time we read the steal clock, in prev_steal_time, this patch do it independently using another field, prev_steal_time_rq. This is because otherwise, information about time accounted in update_process_tick() would never reach us in update_rq_clock(). Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-14 12:59:47 +03:00
Glauber Costa	747f292583	ia64: add jump labels for paravirt Since in a later patch I intend to call jump labels inside CONFIG_PARAVIRT, IA64 would fail to compile if they are not provided. This patch provides those jump labels for the IA64 architecture. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Isaku Yamahata <yamahata@valinux.co.jp> Acked-by: Rik van Riel <riel@redhat.com> CC: Tony Luck <tony.luck@intel.com> CC: Eddie Dong <eddie.dong@intel.com> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> CC: Eric B Munson <emunson@mgebm.net> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-14 12:59:45 +03:00
Glauber Costa	3c404b578f	KVM guest: Add a pv_ops stub for steal time This patch adds a function pointer in one of the many paravirt_ops structs, to allow guests to register a steal time function. Besides a steal time function, we also declare two jump_labels. They will be used to allow the steal time code to be easily bypassed when not in use. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-14 12:59:44 +03:00
Glauber Costa	c9aaa8957f	KVM: Steal time implementation To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM, while the vcpu had meaningful work to do - halt time does not count. This information is acquired through the run_delay field of delayacct/schedstats infrastructure, that counts time spent in a runqueue but not running. Steal time is a per-cpu information, so the traditional MSR-based infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the memory area address containing information about steal time This patch contains the hypervisor part of the steal time infrasructure, and can be backported independently of the guest portion. [avi, yongjie: export delayacct_on, to avoid build failures in some configs] Signed-off-by: Glauber Costa <glommer@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Rik van Riel <riel@redhat.com> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Yongjie Ren <yongjie.ren@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-14 12:59:14 +03:00
Kukjin Kim	15964d3885	[CPUFREQ] Move compile for S3C64XX cpufreq to /drivers/cpufreq Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Dave Jones <davej@redhat.com>	2011-07-13 18:29:53 -04:00
Kukjin Kim	f7d770790f	[CPUFREQ] Move ARM Samsung cpufreq drivers to drivers/cpufreq/ According to discussion of the ARM arch subsystem migration, ARM cpufreq drivers move to drivers/cpufreq. So this patch adds Kconfig.arm for ARM like x86 and adds Samsung S5PV210 and EXYNOS4210 cpufreq driver compile in there. As a note, otherw will be moved. Cc: Dave Jones <davej@redhat.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Dave Jones <davej@redhat.com>	2011-07-13 18:29:51 -04:00
Mark Brown	be2de99bea	[CPUFREQ/S3C64xx] Move S3C64xx CPUfreq driver into drivers/cpufreq This is a straight code motion patch, there are no changes to the driver itself. The Kconfig is left untouched as the ARM CPUfreq Kconfig is all in one big block in arm/Kconfig and should be moved en masse rather than being done piecemeal. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Dave Jones <davej@redhat.com>	2011-07-13 18:29:51 -04:00
Andy Lutomirski	433bd805e5	clocksource: Replace vread with generic arch data The vread field was bloating struct clocksource everywhere except x86_64, and I want to change the way this works on x86_64, so let's split it out into per-arch data. Cc: x86@kernel.org Cc: Clemens Ladisch <clemens@ladisch.de> Cc: linux-ia64@vger.kernel.org Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/3ae5ec76a168eaaae63f08a2a1060b91aa0b7759.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-13 11:23:12 -07:00
Andy Lutomirski	7f79ad15f3	x86-64: Add --no-undefined to vDSO build This gives much nicer diagnostics when something goes wrong. It's supported at least as far back as binutils 2.15. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/de0b50920469ff6359c529526e7639fdd36fa83c.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-13 11:23:09 -07:00
Andy Lutomirski	1b3f2a72bb	x86-64: Allow alternative patching in the vDSO This code is short enough and different enough from the module loader that it's not worth trying to share anything. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/e73112e4381fff29e31b882c2d0856822edaea53.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-13 11:23:07 -07:00
Andy Lutomirski	59e97e4d6f	x86: Make alternative instruction pointers relative This save a few bytes on x86-64 and means that future patches can apply alternatives to unrelocated code. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/ff64a6b9a1a3860ca4a7b8b6dc7b4754f9491cd7.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-13 11:22:56 -07:00
Andy Lutomirski	c9712944b2	x86-64: Improve vsyscall emulation CS and RIP handling Three fixes here: - Send SIGSEGV if called from compat code or with a funny CS. - Don't BUG on impossible addresses. - Add a missing local_irq_disable. This patch also removes an unused variable. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/6fb2b13ab39b743d1e4f466eef13425854912f7f.1310563276.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-13 11:22:55 -07:00
Jon Medhurst	8f2ffa00fb	ARM: kprobes: Remove now unused code Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:51 +00:00
Jon Medhurst	0239269db6	ARM: kprobes: Decode ARM preload (immediate) instructions These were missing from the previous implementation. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:51 +00:00
Jon Medhurst	465f1ea595	ARM: kprobes: Reject probing of unprivileged load and store instructions These occur extremely rarely in the kernel and writing test cases for them is difficult. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:51 +00:00
Jon Medhurst	711bf10633	ARM: kprobes: Use new versions of emulate_ldr() and emulate_str() Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:51 +00:00
Jon Medhurst	3c48fbb147	ARM: kprobes: Add new versions of emulate_ldr() and emulate_str() These use the register calling conventions required by the new decoding table framework for calling simulated instructions. We rename the old versions of these functions to *_old for now. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:50 +00:00
Jon Medhurst	12ce5d3388	ARM: kprobes: Add emulate_rdlo12rdhi16rn0rm8_rwflags_nopc() This is the emulation function for the instruction format used by the ARM multiply long instructions. It replaces use of prep_emulate_rdhi16rdlo12rs8rm0_wflags(). Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:50 +00:00
Jon Medhurst	c82584ebdf	ARM: kprobes: Add emulate_rd12rm0_noflags_nopc() This is the emulation function for the instruction format used by the ARM bit-field manipulation instructions. Various other instruction forms can also make use of this and it is used to replace use of prep_emulate_rd12{rm0}{_modify} Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:50 +00:00
Jon Medhurst	35fab77469	ARM: kprobes: Replace use of prep_emulate_rd12rn16rm0_wflags() These can now use emulate_rd12rn16rm0_rwflags_nopc(). Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:50 +00:00
Jon Medhurst	6091dfae4f	ARM: kprobes: Add emulate_rd16rn12rm0rs8_rwflags_nopc() This is the emulation function for the instruction format used by the ARM multiply-accumulate instructions. These don't allow use of PC so we don't have to add special cases for this. This function is used to replace use of prep_emulate_rd16rs8rm0_wflags and prep_emulate_rd16rn12rs8rm0_wflags. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:50 +00:00
Jon Medhurst	e9a92859e9	ARM: kprobes: Migrate remaining instruction decoding functions to tables Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:50 +00:00
Jon Medhurst	0d32e7d11b	ARM: kprobes: Migrate ARM space_cccc_100x to decoding tables Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:50 +00:00
Jon Medhurst	56d8fbddc2	ARM: kprobes: Migrate ARM space_cccc_01xx to decoding tables Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:49 +00:00
Jon Medhurst	ad2e81a78d	ARM: kprobes: Migrate ARM space_cccc_0111__1 to decoding tables Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:49 +00:00
Jon Medhurst	2ce5d03307	ARM: kprobes: Migrate ARM space_cccc_0110__1 to decoding tables Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:49 +00:00
Jon Medhurst	0e44e9a0fa	ARM: kprobes: Add emulate_rd12rn16rm0_rwflags_nopc() This is the emulation function for the instruction format used by the ARM media instructions. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:49 +00:00
Jon Medhurst	c038f3af50	ARM: kprobes: Migrate ARM space_cccc_001x to decoding tables Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:49 +00:00
Jon Medhurst	75f115c087	ARM: kprobes: Migrate ARM space_cccc_000x to decoding tables Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:49 +00:00
Jon Medhurst	6c8a192929	ARM: kprobes: Migrate ARM LDRD and STRD to decoding tables Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:49 +00:00
Jon Medhurst	8723942f7b	ARM: kprobes: Add emulate_ldrdstrd() This is an emulation function for the LDRD and STRD instructions. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:48 +00:00
Jon Medhurst	3535a89ab2	ARM: kprobes: Migrate ARM data-processing (register) instructions to decoding tables Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:48 +00:00
Jon Medhurst	9f596e5126	ARM: kprobes: Add emulate_rd12rn16rm0rs8_rwflags() This is the emulation function for the instruction format used by the ARM data-processing instructions. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:48 +00:00
Jon Medhurst	7be7ee2d29	ARM: kprobes: Add BLX macro This is for use by inline assembler which will be added to kprobes-arm.c It saves memory when used on newer ARM architectures and also provides correct interworking should ARM probes be required on Thumb kernels in the future. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:48 +00:00
Jon Medhurst	df4fa1f8dd	ARM: kprobes: Add alu_write_pc() This writes a new value to PC which was obtained as the result of an ARM ALU instruction. For ARMv7 and later this performs interworking. On ARM kernels we shouldn't encounter any ALU instructions trying to switch to Thumb mode so support for this isn't strictly necessary. However, the approach taken in all other instruction decoding is for us to avoid unpredictable modification of the PC for security reasons. This is usually achieved by rejecting insertion of probes on problematic instruction, but for ALU instructions we can't do this as it depends on the contents of the CPU registers at the time the probe is hit. So, as we require some form of run-time checking to trap undesirable PC modification, we may as well simulate the instructions correctly, i.e. in the way they would behave in the absence of a probe. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:48 +00:00
Jon Medhurst	9a5c1284a3	ARM: kprobes: Migrate ARM space_1111 to decoding tables Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:48 +00:00
Jon Medhurst	bb1085f827	ARM: kprobes: Decode 32-bit Thumb multiply and absolute difference instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:48 +00:00
Jon Medhurst	231fb150c6	ARM: kprobes: Decode 32-bit Thumb long multiply and divide instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:47 +00:00
Jon Medhurst	31656c1a9a	ARM: kprobes: Decode 32-bit Thumb data-processing (register) instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:47 +00:00
Jon Medhurst	d691023b62	ARM: kprobes: Decode 32-bit Thumb load/store single data item instructions We will reject probing of unprivileged load and store instructions. These rarely occur and writing test cases for them is difficult. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:47 +00:00
Jon Medhurst	46009cc5c5	ARM: kprobes: Decode 32-bit Thumb memory hint instructions We'll treat the preload instructions as nops as they are just performance hints. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:47 +00:00
Jon Medhurst	6a0d1a1c56	ARM: kprobes: Reject 32-bit Thumb coprocessor and SIMD instructions The kernel doesn't currently support VFP or Neon code, and probing of code with CP15 operations is fraught with bad consequences. So we will just reject probing these instructions. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:47 +00:00
Jon Medhurst	ce715c772f	ARM: kprobes: Decode 32-bit Thumb branch instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:46 +00:00
Jon Medhurst	b06f3ee34d	ARM: kprobes: Decode 32-bit miscellaneous control instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:46 +00:00
Jon Medhurst	7848786a7a	ARM: kprobes: Decode 32-bit Thumb data-processing (plain binary immediate) instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:46 +00:00
Jon Medhurst	2fcaf7e758	ARM: kprobes: Decode 32-bit Thumb data-processing (modified immediate) instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:46 +00:00
Jon Medhurst	080e001326	ARM: kprobes: Decode 32-bit Thumb data-processing (shifted register) instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:46 +00:00
Jon Medhurst	dd212bd3cb	ARM: kprobes: Decode 32-bit Thumb table branch instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:46 +00:00
Jon Medhurst	b48354d358	ARM: kprobes: Decode 32-bit Thumb load/store dual and load/store exclusive instructions We reject probing of load/store exclusive instructions because any emulation routine could never succeed in gaining exclusive access as the exception framework clears the exclusivity monitor when a probes breakpoint is hit. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:46 +00:00
Jon Medhurst	eaf1d06500	ARM: kprobes: Decode 32-bit Thumb load/store multiple instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:45 +00:00
Jon Medhurst	3d4a99785a	ARM: kprobes: Optimise emulation of LDM and STM This patch improves the performance of LDM and STM instruction emulation. This is desirable because. - jprobes and kretprobes probe the first instruction in a function and, when the frame pointer is omitted, this instruction is often a STM used to push registers onto the stack. - The STM and LDM instructions are common in the body and tail of functions. - At the same time as being a common instruction form, they also have one of the slowest and most complicated simulation routines. The approach taken to optimisation is to use emulation rather than simulation, that is, a modified form of the instruction is run with an appropriate register context. Benchmarking on an OMAP3530 shows the optimised emulation is between 2 and 3 times faster than the simulation routines. On a Kirkwood based device the relative performance was very significantly better than this. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:45 +00:00
Jon Medhurst	235a4ce79f	ARM: kprobes: Add common decoding function for LDM and STM The encoding of these instructions is substantially the same for both ARM and Thumb, so we can have common decoding and simulation functions. This patch moves the simulation functions from kprobes-arm.c to kprobes-common.c. It also adds a new simulation function (simulate_ldm1_pc) for the case where we load into PC because this may need to interwork. The instruction decoding is done by a custom function (kprobe_decode_ldmstm) rather than just relying on decoding table entries because we will later be adding optimisation code. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:45 +00:00
Jon Medhurst	263e368a2f	ARM: kprobes: Add load_write_pc() This writes a value to PC which was obtained as the result of a LDR or LDM instruction. For ARMv5T and later this must perform interworking. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:45 +00:00
Jon Medhurst	f39ca8b488	ARM: kprobes: Decode 32-bit Thumb hint instructions For hints which may have observable effects, like SEV (send event), we use kprobe_emulate_none which emulates the hint by executing the original instruction. For NOP we simulate the instruction using kprobe_simulate_nop, which does nothing. As probes execute with interrupts disabled this is also used for hints which may block for an indefinite time, like WFE (wait for event). Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:45 +00:00
Jon Medhurst	0a188ccb5e	ARM: kprobes: Reject 16-bit Thumb SETEND, CPS and BKPT instructions These are very rare and/or problematic to emulate so we will take the easy option and disallow probing them (as does the existing ARM implementation). Rejecting these instructions doesn't actually require any entries in the decoding table as it is the default case for instructions which aren't found. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:45 +00:00
Jon Medhurst	396b41f68d	ARM: kprobes: Decode 16-bit Thumb branch instructions We previously changed the behaviour of probes so that conditional instructions don't fire when the condition isn't met. For ARM branches, and Thumb branches in IT blocks, this means they don't fire if the branch isn't taken. For consistency, we implement the same for Thumb conditional branch instructions. This involves setting up insn_check_cc to point to the relevant condition checking function. As the emulation routine is only called when this condition passes, it doesn't need to check again and can unconditionally update PC. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:45 +00:00
Jon Medhurst	444956677e	ARM: kprobes: Reject 16-bit Thumb SVC and UNDEFINED instructions SVC (SWI) instructions shouldn't occur in kernel code so we don't need to be able to probe them. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:44 +00:00
Jon Medhurst	5b94faf8d7	ARM: kprobes: Decode 16-bit Thumb IT instruction The normal Thumb singlestepping routine updates the IT state after calling the instruction handler. We don't what this to happen after the IT instruction simulation sets the IT state, therefore we need to provide a custom singlestep routine. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:44 +00:00
Jon Medhurst	fd0c8d8a48	ARM: kprobes: Decode 16-bit Thumb PUSH and POP instructions These instructions are equivalent to stmdb sp!,{r0-r7,lr} ldmdb sp!,{r0-r7,pc} and we emulate them by transforming them into the 32-bit Thumb instructions stmdb r9!,{r0-r7,r8} ldmdb r9!,{r0-r7,r8} This is simpler, and almost certainly executes faster, than writing simulation functions. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:44 +00:00
Jon Medhurst	32818f31f8	ARM: kprobes: Decode 16-bit Thumb CBZ and bit manipulation instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:44 +00:00
Jon Medhurst	2f33582904	ARM: kprobes: Decode 16-bit Thumb PC- and SP-relative address instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:44 +00:00
Jon Medhurst	f869514282	ARM: kprobes: Decode 16-bit Thumb load and store instructions Most of these instructions only operate on the low registers R0-R7 so they can make use of t16_emulate_loregs_rwflags. The instructions which use SP or PC for addressing have their own simulation functions. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:44 +00:00
Jon Medhurst	3b5940e811	ARM: kprobes: Decode 16-bit Thumb special data instructions These data-processing instructions operate on the full range of CPU registers, so to simulate them we have to modify the registers used by the instruction. We can't make use of the decoding table framework to do this because the registers aren't encoded cleanly in separate nibbles, therefore we need a custom decode function. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:43 +00:00
Jon Medhurst	a9c3c29e72	ARM: kprobes: Decode 16-bit Thumb BX and BLX instructions Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:43 +00:00
Jon Medhurst	059987ffa7	ARM: kprobes: Add bx_write_pc() This writes a value to PC, with interworking. I.e. switches to Thumb or ARM mode depending on the state of the least significant bit. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:43 +00:00
Jon Medhurst	02d194f647	ARM: kprobes: Decode 16-bit Thumb data-processing instructions These instructions only operate on the low registers R0-R7, therefore it is possible to emulate them by executing the original instruction unaltered if we restore and save these registers. This is what t16_emulate_loregs does. Some of these instructions don't update the PSR when they execute in an IT block, so there are two flavours of emulation functions: t16_emulate_loregs_{noit}rwflags Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:43 +00:00
Jon Medhurst	7460bce423	ARM: ptrace: Add APSR_MASK definition to ptrace.h APSR_MASK can be used to extract the APSR bits from the CPSR. The comment for these definitions is also changed because it was inaccurate as the existing defines didn't refer to any part of the APSR. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:43 +00:00
Jon Medhurst	3f92dfed6a	ARM: kprobes: Decode 16-bit Thumb hint instructions For hints which may have observable effects, like SEV (send event), we use kprobe_emulate_none which emulates the hint by executing the original instruction. For NOP we simulate the instruction using kprobe_simulate_nop, which does nothing. As probes execute with interrupts disabled this is also used for hints which may block for an indefinite time, like WFE (wait for event). Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:43 +00:00
Jon Medhurst	0d1a095aa1	ARM: kprobes: Infrastructure for table driven decoding of CPU instructions The existing ARM instruction decoding functions are a mass of if/else code. Rather than follow this pattern for Thumb instruction decoding this patch implements an infrastructure for a new table driven scheme. This has several advantages: - Reduces the kernel size by approx 2kB. (The ARM instruction decoding will eventually have -3.1kB code, +1.3kB data; with similar or better estimated savings for Thumb decoding.) - Allows programmatic checking of decoding consistency and test case coverage. - Provides more uniform source code and is therefore, arguably, clearer. For a detailed explanation of how decoding tables work see the in-source documentation in kprobes.h, and also for kprobe_decode_insn(). Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:43 +00:00
Jon Medhurst	e2960317d4	ARM: kprobes: Extend arch_specific_insn to add pointer to emulated instruction When we come to emulating Thumb instructions then, to interwork correctly, the code on in the instruction slot must be invoked with a function pointer which has the least significant bit set. Rather that set this by hand in every Thumb emulation function we will add a new field for this purpose to arch_specific_insn, called insn_fn. This also enables us to seamlessly share emulation functions between ARM and Thumb code. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:42 +00:00
Jon Medhurst	c6a7d97d57	ARM: kprobes: Add hooks to override singlestep() When a probe fires we must single-step the instruction which was replaced by a breakpoint. As the steps to do this vary between ARM and Thumb instructions we need a way to customise single-stepping. This is done by adding a new hook called insn_singlestep to arch_specific_insn which is initialised by the instruction decoding functions. These single-step hooks must update PC and call the instruction handler. For Thumb instructions an additional step of updating ITSTATE is needed. We do this after calling the handler because some handlers will need to test if they are running in an IT block. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:42 +00:00
Jon Medhurst	3b26945597	ARM: kprobes: Use conditional breakpoints for ARM probes Now we no longer trigger probes on conditional instructions when the condition is false, we can make use of conditional instructions as breakpoints in ARM code to avoid taking unnecessary exceptions. Note, we can't rely on not getting an exception when the condition check fails, as that is Implementation Defined on newer ARM architectures. We therefore still need to perform manual condition checks as well. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:42 +00:00
Jon Medhurst	3cca6c2435	ARM: kprobes: Don't trigger probes on conditional instructions when condition is false This patch changes the behavior of kprobes on ARM so that: Kprobes on conditional instructions don't trigger when the condition is false. For conditional branches, this means that they don't trigger in the branch not taken case. Rationale: When probes are placed onto conditionally executed instructions in a Thumb IT block, they may not fire if the condition is not met. This is because we use invalid instructions for breakpoints and "it is IMPLEMENTATION DEFINED whether the instruction executes as a NOP or causes an Undefined Instruction exception". Therefore, for consistency, we will ignore all probes on any conditional instructions when the condition is false. Alternative solutions seem to be too complex to implement or inconsistent. This issue was discussed on linux.arm.kernel in the thread titled "[RFC] kprobes with thumb2 conditional code" See http://comments.gmane.org/gmane.linux.linaro.devel/2985 Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:42 +00:00
Jon Medhurst	6aaa8b5570	ARM: kprobes: Add it_advance() This advances the ITSTATE bits in CPSR to their values for the next instruction. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:42 +00:00
Jon Medhurst	eaf4f33fec	ARM: kprobes: Add condition code checking to Thumb emulation Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:42 +00:00
Jon Medhurst	aceb487ab2	ARM: kprobes: Add Thumb breakpoint support Extend the breakpoint insertion and catching functions to support Thumb code. As breakpoints are no longer of a fixed size, the flush_insns macro is modified to take a size argument instead of an instruction count. Note, we need both 16- and 32-bit Thumb breakpoints, because if we were to use a 16-bit breakpoint to replace a 32-bit instruction which was in an IT block, and the condition check failed, then the breakpoint may not fire (it's unpredictable behaviour) and the CPU could then try and execute the second half of the 32-bit Thumb instruction. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:42 +00:00
Jon Medhurst	856bc35639	ARM: Kconfig: Allow kprobes on Thumb-2 kernels Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:41 +00:00
Jon Medhurst	2437170710	ARM: kprobes: Add Thumb instruction decoding stubs Extend arch_prepare_kprobe to support probing of Thumb code. For the actual decoding of Thumb instructions, stub functions are added which currently just reject the probe. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:41 +00:00
Jon Medhurst	de41984003	ARM: kprobes: Make kprobes framework work on Thumb-2 kernels Fix up kprobes framework so that it builds and correctly interworks on Thumb-2 kernels. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:41 +00:00
Jon Medhurst	aea490299f	ARM: kprobes: Make str_pc_offset a constant on ARMv7 The str_pc_offset value is architecturally defined on ARMv7 onwards so we can make it a compile time constant. This means on Thumb kernels the runtime checking code isn't needed, which saves us from having to fix it to work for Thumb. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:41 +00:00
Jon Medhurst	6c8df3300f	ARM: kprobes: Move find_str_pc_offset into kprobes-common.c Move str_pc_offset into kprobes-common.c as it will be needed by common code later. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:41 +00:00
Jon Medhurst	1b59d87466	ARM: kprobes: Move is_writeback define to header file. This will be used later in other files. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:41 +00:00
Jon Medhurst	0ab4c02dda	ARM: kprobes: Add kprobes-common.c This file will contain the instruction decoding and emulation code which is common to both ARM and Thumb instruction sets. For now, we will just move over condition_checks from kprobes-arm.c This table is also renamed to kprobe_condition_checks to avoid polluting the public namespace with a too generic name. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:41 +00:00
Jon Medhurst	221bf15ffd	ARM: kprobes: Split out internal parts of kprobes.h Later, we will be adding a considerable amount of internal implementation definitions to kprobe header files and it would be good to have these in local header file along side the source code, rather than pollute the existing header which is include by all users of kprobes. To this end, we add arch/arm/kernel/kprobes.h and move into this the existing internal defintions from arch/arm/include/asm/kprobes.h Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:40 +00:00
Jon Medhurst	691b2ff294	ARM: kprobes: Rename kprobes-decode.c to kprobes-arm.c This file contains decoding and emulation functions for the ARM instruction set. As we will later be adding a file for Thumb and a file with common decoding functions, this renaming makes things clearer. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:40 +00:00
Jon Medhurst	592201a9f1	ARM: Thumb-2: Support Thumb-2 in undefined instruction handler This patch allows undef_hook's to be specified for 32-bit Thumb instructions and also to be used for thumb kernel-side code. 32-bit Thumb instructions are specified in the form: ((first_half << 16 ) \| second_half) which matches the layout used by the ARM ARM. ptrace was handling 32-bit Thumb instructions by hooking the first halfword and manually checking the second half. This method would be broken by this patch so it is migrated to make use of the new Thumb-2 support. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:40 +00:00
Jon Medhurst	594810621d	ARM: Thumb-2: Fix exception return sequence to restore stack correctly The implementation of svc_exit didn't take into account any stack hole created by svc_entry; as happens with the undef handler when kprobes are configured. The fix is to read the saved value of SP rather than trying to calculate it. Signed-off-by: Jon Medhurst <tixy@yxit.co.uk> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>	2011-07-13 17:32:40 +00:00
Rafael J. Wysocki	796204142a	ARM / shmobile: Use pm_genpd_poweroff_unused() Make shmobile use pm_genpd_poweroff_unused() instead of the open-coded powering off PM domains without devices in use. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Magnus Damm <damm@opensource.se>	2011-07-13 12:32:07 +02:00
Tejun Heo	1e01979c8f	x86, numa: Implement pfn -> nid mapping granularity check SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use sections array to map pfn to nid which is limited in granularity. If NUMA nodes are laid out such that the mapping cannot be accurate, boot will fail triggering BUG_ON() in mminit_verify_page_links(). On 32bit, it's 512MiB w/ PAE and SPARSEMEM. This seems to have been granular enough until commit `2706a0bf7b` (x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too). Apparently, there is a machine which aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT. This led to the following BUG_ON(). On node 0 totalpages: 2096615 DMA zone: 32 pages used for memmap DMA zone: 0 pages reserved DMA zone: 3927 pages, LIFO batch:0 Normal zone: 1740 pages used for memmap Normal zone: 220978 pages, LIFO batch:31 HighMem zone: 16405 pages used for memmap HighMem zone: 1853533 pages, LIFO batch:31 BUG: Int 6: CR2 (null) EDI (null) ESI 00000002 EBP 00000002 ESP c1543ecc EBX f2400000 EDX 00000006 ECX (null) EAX 00000001 err (null) EIP c16209aa CS 00000060 flg 00010002 Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000 (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe (null) f7200b80 c16395f0 00200a02 f7200a80 (null) 000375fe 00000002 (null) Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17 Call Trace: [<c136b1e5>] ? early_fault+0x2e/0x2e [<c16209aa>] ? mminit_verify_page_links+0x12/0x42 [<c1620613>] ? memmap_init_zone+0xaf/0x10c [<c1620929>] ? free_area_init_node+0x2b9/0x2e3 [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451 [<c1601d80>] ? paging_init+0x112/0x118 [<c15f578d>] ? setup_arch+0x791/0x82f [<c15f43d9>] ? start_kernel+0x6a/0x257 This patch implements node_map_pfn_alignment() which determines maximum internode alignment and update numa_register_memblks() to reject NUMA configuration if alignment exceeds the pfn -> nid mapping granularity of the memory model as determined by PAGES_PER_SECTION. This makes the problematic machine boot w/ flatmem by rejecting the NUMA config and provides protection against crazy NUMA configurations. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com> Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Conny Seidel <conny.seidel@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-12 21:58:29 -07:00
Tejun Heo	d0ead15738	x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ DISCONTIGMEM on x86-32 implements pfn -> nid mapping similarly to SPARSEMEM; however, it calls each mapping unit ELEMENT instead of SECTION. This patch renames it to SECTION so that PAGES_PER_SECTION is valid for both DISCONTIGMEM and SPARSEMEM. This will be used by the next patch to implement mapping granularity check. This patch is trivial constant rename. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110712074422.GA2872@htj.dyndns.org Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-12 21:58:11 -07:00
Maxime Ripard	3628c3f5c8	x86. reboot: Make Dell Latitude E6320 use reboot=pci The Dell Latitude E6320 doesn't reboot unless reboot=pci is set. Force it thanks to DMI. Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Link: http://lkml.kernel.org/r/1309269451-4966-1-git-send-email-maxime.ripard@free-electrons.com Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-07-12 21:42:48 -07:00
Naga Chumbalkar	42f0efc5aa	x86, ioapic: Print IR_IO_APIC_route_entry when IR is enabled When IR (interrupt remapping) is enabled print_IO_APIC() displays output according to legacy RTE (redirection table entry) definitons: NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect: 00 00 1 0 0 0 0 0 0 00 01 00 0 0 0 0 0 0 0 01 02 00 0 0 0 0 0 0 0 02 03 00 1 0 0 0 0 0 0 03 04 00 1 0 0 0 0 0 0 04 05 00 1 0 0 0 0 0 0 05 06 00 1 0 0 0 0 0 0 06 ... The above output is as per Sec 3.2.4 of the IOAPIC datasheet: 82093AA I/O Advanced Programmable Interrupt Controller (IOAPIC): http://download.intel.com/design/chipsets/datashts/29056601.pdf Instead the output should display the fields as discussed in Sec 5.5.1 of the VT-d specification: (Intel Virtualization Technology for Directed I/O: http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf) After the fix: NR Indx Fmt Mask Trig IRR Pol Stat Indx2 Zero Vect: 00 0000 0 1 0 0 0 0 0 0 00 01 000F 1 0 0 0 0 0 0 0 01 02 0001 1 0 0 0 0 0 0 0 02 03 0002 1 1 0 0 0 0 0 0 03 04 0011 1 1 0 0 0 0 0 0 04 05 0004 1 1 0 0 0 0 0 0 05 06 0005 1 1 0 0 0 0 0 0 06 ... Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Link: http://lkml.kernel.org/r/20110712211658.2939.93123.sendpatchset@nchumbalkar.americas.cpqcorp.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-12 20:17:58 -07:00
Naga Chumbalkar	3040db92ee	x86, ioapic: Print IRTE when IR is enabled When "apic=debug" is used as a boot parameter, Linux prints the IOAPIC routing entries in "dmesg". Below is output from IOAPIC whose apic_id is 8: # dmesg \| grep "routing entry" IOAPIC[8]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0) IOAPIC[8]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0) IOAPIC[8]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0) ... Similarly, when IR (interrupt remapping) is enabled, and the IRTE (interrupt remapping table entry) is set up we should display it. After the fix: # dmesg \| grep IRTE IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:31 Dest:00000000 SID:00F1 SQ:0 SVT:1) IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:30 Dest:00000000 SID:00F1 SQ:0 SVT:1) IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:33 Dest:00000000 SID:00F1 SQ:0 SVT:1) ... The IRTE is defined in Sec 9.5 of the Intel VT-d Specification. Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Link: http://lkml.kernel.org/r/20110712211704.2939.71291.sendpatchset@nchumbalkar.americas.cpqcorp.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-12 14:34:00 -07:00
Naga Chumbalkar	2597085228	x86, x2apic: Preserve high 32-bits of IA32_APIC_BASE MSR If there's no special reason to zero-out the "high" 32-bits of the IA32_APIC_BASE MSR, let's preserve it. The x2APIC Specification doesn't explicitly state any such requirement. (Sec 2.2 in: http://www.intel.com/Assets/PDF/manual/318148.pdf). Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Link: http://lkml.kernel.org/r/20110712055831.2498.78521.sendpatchset@nchumbalkar.americas.cpqcorp.net Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-12 14:33:49 -07:00
Linus Torvalds	8d86e5f914	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/mm: Fix memory_block_size_bytes() for non-pseries mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header	2011-07-12 14:21:19 -07:00
Linus Torvalds	d93a881dd7	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc: pcmcia: pxa2xx/vpac270: free gpios on exist rather than requesting ARM: pxa/raumfeld: fix device name for codec ak4104 ARM: pxa/raumfeld: display initialisation fixes ARM: pxa/raumfeld: adapt to upcoming hardware change ARM: pxa: fix gpio_to_chip() clash with gpiolib namespace genirq: replace irq_gc_ack() with {set,clr}_bit variants (fwd) arm: mach-vt8500: add forgotten irq_data conversion ARM: pxa168: correct nand pmu setting ARM: pxa910: correct nand pmu setting ARM: pxa: fix PGSR register address calculation	2011-07-12 14:19:51 -07:00
Kevin Hilman	c8c9fda506	OMAP: PM: disable idle on suspend for GPIO and UART Until these drivers are runtime PM converted, their device power states are managed by calling custom driver hooks late in the idle/suspend path. Therefore, do not let the suspend/resume core code automatically idle these devices since they will be managed manually by the OMAP PM core very late in the idle/suspend path. Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-07-12 22:48:42 +02:00
Kevin Hilman	80c6d1e65c	OMAP: PM: omap_device: add API to disable idle on suspend By default, omap_devices will be automatically idled on suspend (and re-enabled on resume.) Using this new API, device init code can disable this feature if desired. NOTE: any driver/device that has been runtime PM converted should not be using this API. Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-07-12 22:48:29 +02:00
Kevin Hilman	c03f007a8b	OMAP: PM: omap_device: add system PM methods for PM domain handling In the omap_device PM domain callbacks, use omap_device idle/enable to automatically manage device idle states during system suspend/resume. If an omap_device has not already been runtime suspended, the ->suspend_noirq() method of the PM domain will use omap_device_idle() to idle the HW after calling the driver's ->runtime_suspend() callback. Similarily, upon resume, if the device was suspended during ->suspend_noirq(), the ->resume_noirq() method of the PM domain will use omap_device_enable() to enable the HW and then call the driver's ->runtime_resume() callback. If a device has already been runtime suspended, the noirq methods of the PM domain leave the device runtime suspended by default. However, if a driver needs to runtime resume a device during suspend (for example, to change its wakeup settings), it may do so using pm_runtime_get* in it's ->suspend() callback. Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-07-12 22:48:19 +02:00
Kevin Hilman	256a543597	OMAP: PM: omap_device: conditionally use PM domain runtime helpers Only build and use the runtime PM helper functions only when runtime PM is actually enabled. Signed-off-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2011-07-12 22:48:03 +02:00
Christoph Lameter	688d3be815	percpu: Fixup __this_cpu_xchg* operations Somehow we got into a situation where the __this_cpu_xchg() operations were not defined in the same way as this_cpu_xchg() and friends. I had some build failures under 32 bit that were addressed by these fixes. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-07-12 13:47:16 +02:00
Glauber Costa	9ddabbe72e	KVM: KVM Steal time guest/host interface To implement steal time, we need the hypervisor to pass the guest information about how much time was spent running other processes outside the VM. This is per-vcpu, and using the kvmclock structure for that is an abuse we decided not to make. In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that holds the memory area address containing information about steal time This patch contains the headers for it. I am keeping it separate to facilitate backports to people who wants to backport the kernel part but not the hypervisor, or the other way around. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:17:03 +03:00
Glauber Costa	4b6b35f55c	KVM: Add constant to represent KVM MSRs enabled bit in guest/host interface This patch is simple, put in a different commit so it can be more easily shared between guest and hypervisor. It just defines a named constant to indicate the enable bit for KVM-specific MSRs. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:17:02 +03:00
Alexander Graf	29d03158f9	KVM: PPC: Remove prog_flags Commit c8f729d408 (KVM: PPC: Deliver program interrupts right away instead of queueing them) made away with all users of prog_flags, so we can just remove it from the headers. Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:17:00 +03:00
Paul Mackerras	9e368f2915	KVM: PPC: book3s_hv: Add support for PPC970-family processors This adds support for running KVM guests in supervisor mode on those PPC970 processors that have a usable hypervisor mode. Unfortunately, Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to 1), but the YDL PowerStation does have a usable hypervisor mode. There are several differences between the PPC970 and POWER7 in how guests are managed. These differences are accommodated using the CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature bits. Notably, on PPC970: * The LPCR, LPID or RMOR registers don't exist, and the functions of those registers are provided by bits in HID4 and one bit in HID0. * External interrupts can be directed to the hypervisor, but unlike POWER7 they are masked by MSR[EE] in non-hypervisor modes and use SRR0/1 not HSRR0/1. * There is no virtual RMA (VRMA) mode; the guest must use an RMO (real mode offset) area. * The TLB entries are not tagged with the LPID, so it is necessary to flush the whole TLB on partition switch. Furthermore, when switching partitions we have to ensure that no other CPU is executing the tlbie or tlbsync instructions in either the old or the new partition, otherwise undefined behaviour can occur. * The PMU has 8 counters (PMC registers) rather than 6. * The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist. * The SLB has 64 entries rather than 32. * There is no mediated external interrupt facility, so if we switch to a guest that has a virtual external interrupt pending but the guest has MSR[EE] = 0, we have to arrange to have an interrupt pending for it so that we can get control back once it re-enables interrupts. We do that by sending ourselves an IPI with smp_send_reschedule after hard-disabling interrupts. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:59 +03:00
Paul Mackerras	969391c58a	powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to indicate that we have a usable hypervisor mode, and another to indicate that the processor conforms to PowerISA version 2.06. We also add another bit to indicate that the processor conforms to ISA version 2.01 and set that for PPC970 and derivatives. Some PPC970 chips (specifically those in Apple machines) have a hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode is not useful in the sense that there is no way to run any code in supervisor mode (HV=0 PR=0). On these processors, the LPES0 and LPES1 bits in HID4 are always 0, and we use that as a way of detecting that hypervisor mode is not useful. Where we have a feature section in assembly code around code that only applies on POWER7 in hypervisor mode, we use a construct like END_FTR_SECTION_IFSET(CPU_FTR_HVMODE \| CPU_FTR_ARCH_206) The definition of END_FTR_SECTION_IFSET is such that the code will be enabled (not overwritten with nops) only if all bits in the provided mask are set. Note that the CPU feature check in __tlbie() only needs to check the ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called if we are running bare-metal, i.e. in hypervisor mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:58 +03:00
Paul Mackerras	aa04b4cc5b	KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests This adds infrastructure which will be needed to allow book3s_hv KVM to run on older POWER processors, including PPC970, which don't support the Virtual Real Mode Area (VRMA) facility, but only the Real Mode Offset (RMO) facility. These processors require a physically contiguous, aligned area of memory for each guest. When the guest does an access in real mode (MMU off), the address is compared against a limit value, and if it is lower, the address is ORed with an offset value (from the Real Mode Offset Register (RMOR)) and the result becomes the real address for the access. The size of the RMA has to be one of a set of supported values, which usually includes 64MB, 128MB, 256MB and some larger powers of 2. Since we are unlikely to be able to allocate 64MB or more of physically contiguous memory after the kernel has been running for a while, we allocate a pool of RMAs at boot time using the bootmem allocator. The size and number of the RMAs can be set using the kvm_rma_size=xx and kvm_rma_count=xx kernel command line options. KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability of the pool of preallocated RMAs. The capability value is 1 if the processor can use an RMA but doesn't require one (because it supports the VRMA facility), or 2 if the processor requires an RMA for each guest. This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the pool and returns a file descriptor which can be used to map the RMA. It also returns the size of the RMA in the argument structure. Having an RMA means we will get multiple KMV_SET_USER_MEMORY_REGION ioctl calls from userspace. To cope with this, we now preallocate the kvm->arch.ram_pginfo array when the VM is created with a size sufficient for up to 64GB of guest memory. Subsequently we will get rid of this array and use memory associated with each memslot instead. This moves most of the code that translates the user addresses into host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level to kvmppc_core_prepare_memory_region. Also, instead of having to look up the VMA for each page in order to check the page size, we now check that the pages we get are compound pages of 16MB. However, if we are adding memory that is mapped to an RMA, we don't bother with calling get_user_pages_fast and instead just offset from the base pfn for the RMA. Typically the RMA gets added after vcpus are created, which makes it inconvenient to have the LPCR (logical partition control register) value in the vcpu->arch struct, since the LPCR controls whether the processor uses RMA or VRMA for the guest. This moves the LPCR value into the kvm->arch struct and arranges for the MER (mediated external request) bit, which is the only bit that varies between vcpus, to be set in assembly code when going into the guest if there is a pending external interrupt request. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:57 +03:00
Paul Mackerras	371fefd6f2	KVM: PPC: Allow book3s_hv guests to use SMT processor modes This lifts the restriction that book3s_hv guests can only run one hardware thread per core, and allows them to use up to 4 threads per core on POWER7. The host still has to run single-threaded. This capability is advertised to qemu through a new KVM_CAP_PPC_SMT capability. The return value of the ioctl querying this capability is the number of vcpus per virtual CPU core (vcore), currently 4. To use this, the host kernel should be booted with all threads active, and then all the secondary threads should be offlined. This will put the secondary threads into nap mode. KVM will then wake them from nap mode and use them for running guest code (while they are still offline). To wake the secondary threads, we send them an IPI using a new xics_wake_cpu() function, implemented in arch/powerpc/sysdev/xics/icp-native.c. In other words, at this stage we assume that the platform has a XICS interrupt controller and we are using icp-native.c to drive it. Since the woken thread will need to acknowledge and clear the IPI, we also export the base physical address of the XICS registers using kvmppc_set_xics_phys() for use in the low-level KVM book3s code. When a vcpu is created, it is assigned to a virtual CPU core. The vcore number is obtained by dividing the vcpu number by the number of threads per core in the host. This number is exported to userspace via the KVM_CAP_PPC_SMT capability. If qemu wishes to run the guest in single-threaded mode, it should make all vcpu numbers be multiples of the number of threads per core. We distinguish three states of a vcpu: runnable (i.e., ready to execute the guest), blocked (that is, idle), and busy in host. We currently implement a policy that the vcore can run only when all its threads are runnable or blocked. This way, if a vcpu needs to execute elsewhere in the kernel or in qemu, it can do so without being starved of CPU by the other vcpus. When a vcore starts to run, it executes in the context of one of the vcpu threads. The other vcpu threads all go to sleep and stay asleep until something happens requiring the vcpu thread to return to qemu, or to wake up to run the vcore (this can happen when another vcpu thread goes from busy in host state to blocked). It can happen that a vcpu goes from blocked to runnable state (e.g. because of an interrupt), and the vcore it belongs to is already running. In that case it can start to run immediately as long as the none of the vcpus in the vcore have started to exit the guest. We send the next free thread in the vcore an IPI to get it to start to execute the guest. It synchronizes with the other threads via the vcore->entry_exit_count field to make sure that it doesn't go into the guest if the other vcpus are exiting by the time that it is ready to actually enter the guest. Note that there is no fixed relationship between the hardware thread number and the vcpu number. Hardware threads are assigned to vcpus as they become runnable, so we will always use the lower-numbered hardware threads in preference to higher-numbered threads if not all the vcpus in the vcore are runnable, regardless of which vcpus are runnable. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:57 +03:00
David Gibson	54738c0971	KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode This improves I/O performance for guests using the PAPR paravirtualization interface by making the H_PUT_TCE hcall faster, by implementing it in real mode. H_PUT_TCE is used for updating virtual IOMMU tables, and is used both for virtual I/O and for real I/O in the PAPR interface. Since this moves the IOMMU tables into the kernel, we define a new KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables. The ioctl returns a file descriptor which can be used to mmap the newly created table. The qemu driver models use them in the same way as userspace managed tables, but they can be updated directly by the guest with a real-mode H_PUT_TCE implementation, reducing the number of host/guest context switches during guest IO. There are certain circumstances where it is useful for userland qemu to write to the TCE table even if the kernel H_PUT_TCE path is used most of the time. Specifically, allowing this will avoid awkwardness when we need to reset the table. More importantly, we will in the future need to write the table in order to restore its state after a checkpoint resume or migration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:56 +03:00
Paul Mackerras	a8606e20e4	KVM: PPC: Handle some PAPR hcalls in the kernel This adds the infrastructure for handling PAPR hcalls in the kernel, either early in the guest exit path while we are still in real mode, or later once the MMU has been turned back on and we are in the full kernel context. The advantage of handling hcalls in real mode if possible is that we avoid two partition switches -- and this will become more important when we support SMT4 guests, since a partition switch means we have to pull all of the threads in the core out of the guest. The disadvantage is that we can only access the kernel linear mapping, not anything vmalloced or ioremapped, since the MMU is off. This also adds code to handle the following hcalls in real mode: H_ENTER Add an HPTE to the hashed page table H_REMOVE Remove an HPTE from the hashed page table H_READ Read HPTEs from the hashed page table H_PROTECT Change the protection bits in an HPTE H_BULK_REMOVE Remove up to 4 HPTEs from the hashed page table H_SET_DABR Set the data address breakpoint register Plus code to handle the following hcalls in the kernel: H_CEDE Idle the vcpu until an interrupt or H_PROD hcall arrives H_PROD Wake up a ceded vcpu H_REGISTER_VPA Register a virtual processor area (VPA) The code that runs in real mode has to be in the base kernel, not in the module, if KVM is compiled as a module. The real-mode code can only access the kernel linear mapping, not vmalloc or ioremap space. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:55 +03:00
Paul Mackerras	de56a948b9	KVM: PPC: Add support for Book3S processors in hypervisor mode This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:54 +03:00
Paul Mackerras	3c42bf8a71	KVM: PPC: Split host-state fields out of kvmppc_book3s_shadow_vcpu There are several fields in struct kvmppc_book3s_shadow_vcpu that temporarily store bits of host state while a guest is running, rather than anything relating to the particular guest or vcpu. This splits them out into a new kvmppc_host_state structure and modifies the definitions in asm-offsets.c to suit. On 32-bit, we have a kvmppc_host_state structure inside the kvmppc_book3s_shadow_vcpu since the assembly code needs to be able to get to them both with one pointer. On 64-bit they are separate fields in the PACA. This means that on 64-bit we don't need to copy the kvmppc_host_state in and out on vcpu load/unload, and in future will mean that the book3s_hv code doesn't need a shadow_vcpu struct in the PACA at all. That does mean that we have to be careful not to rely on any values persisting in the hstate field of the paca across any point where we could block or get preempted. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:53 +03:00
Paul Mackerras	923c53caea	powerpc: Set up LPCR for running guest partitions In hypervisor mode, the LPCR controls several aspects of guest partitions, including virtual partition memory mode, and also controls whether the hypervisor decrementer interrupts are enabled. This sets up LPCR at boot time so that guest partitions will use a virtual real memory area (VRMA) composed of 16MB large pages, and hypervisor decrementer interrupts are disabled. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:52 +03:00
Paul Mackerras	df6909e5d5	KVM: PPC: Move guest enter/exit down into subarch-specific code Instead of doing the kvm_guest_enter/exit() and local_irq_dis/enable() calls in powerpc.c, this moves them down into the subarch-specific book3s_pr.c and booke.c. This eliminates an extra local_irq_enable() call in book3s_pr.c, and will be needed for when we do SMT4 guest support in the book3s hypervisor mode code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:51 +03:00
Paul Mackerras	f9e0554dec	KVM: PPC: Pass init/destroy vm and prepare/commit memory region ops down This arranges for the top-level arch/powerpc/kvm/powerpc.c file to pass down some of the calls it gets to the lower-level subarchitecture specific code. The lower-level implementations (in booke.c and book3s.c) are no-ops. The coming book3s_hv.c will need this. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:50 +03:00
Paul Mackerras	3cf658b605	KVM: PPC: Deliver program interrupts right away instead of queueing them Doing so means that we don't have to save the flags anywhere and gets rid of the last reference to to_book3s(vcpu) in arch/powerpc/kvm/book3s.c. Doing so is OK because a program interrupt won't be generated at the same time as any other synchronous interrupt. If a program interrupt and an asynchronous interrupt (external or decrementer) are generated at the same time, the program interrupt will be delivered, which is correct because it has a higher priority, and then the asynchronous interrupt will be masked. We don't ever generate system reset or machine check interrupts to the guest, but if we did, then we would need to make sure they got delivered rather than the program interrupt. The current code would be wrong in this situation anyway since it would deliver the program interrupt as well as the reset/machine check interrupt. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:49 +03:00
Paul Mackerras	b01c8b54a1	powerpc, KVM: Rework KVM checks in first-level interrupt handlers Instead of branching out-of-line with the DO_KVM macro to check if we are in a KVM guest at the time of an interrupt, this moves the KVM check inline in the first-level interrupt handlers. This speeds up the non-KVM case and makes sure that none of the interrupt handlers are missing the check. Because the first-level interrupt handlers are now larger, some things had to be move out of line in exceptions-64s.S. This all necessitated some minor changes to the interrupt entry code in KVM. This also streamlines the book3s_32 KVM test. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:48 +03:00
Paul Mackerras	f05ed4d56e	KVM: PPC: Split out code from book3s.c into book3s_pr.c In preparation for adding code to enable KVM to use hypervisor mode on 64-bit Book 3S processors, this splits book3s.c into two files, book3s.c and book3s_pr.c, where book3s_pr.c contains the code that is specific to running the guest in problem state (user mode) and book3s.c contains code which should apply to all Book 3S processors. In doing this, we abstract some details, namely the interrupt offset, updating the interrupt pending flag, and detecting if the guest is in a critical section. These are all things that will be different when we use hypervisor mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:47 +03:00
Paul Mackerras	c4befc58a0	KVM: PPC: Move fields between struct kvm_vcpu_arch and kvmppc_vcpu_book3s This moves the slb field, which represents the state of the emulated SLB, from the kvmppc_vcpu_book3s struct to the kvm_vcpu_arch, and the hpte_hash_[v]pte[_long] fields from kvm_vcpu_arch to kvmppc_vcpu_book3s. This is in accord with the principle that the kvm_vcpu_arch struct represents the state of the emulated CPU, and the kvmppc_vcpu_book3s struct holds the auxiliary data structures used in the emulation. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:46 +03:00
Paul Mackerras	149dbdb185	KVM: PPC: Fix machine checks on 32-bit Book3S Commit 69acc0d3ba ("KVM: PPC: Resolve real-mode handlers through function exports") resulted in vcpu->arch.trampoline_lowmem and vcpu->arch.trampoline_enter ending up with kernel virtual addresses rather than physical addresses. This is OK on 64-bit Book3S machines, which ignore the top 4 bits of the effective address in real mode, but on 32-bit Book3S machines, accessing these addresses in real mode causes machine check interrupts, as the hardware uses the whole effective address as the physical address in real mode. This fixes the problem by using __pa() to convert these addresses to physical addresses. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:45 +03:00
Takuya Yoshikawa	3c8c652ae4	KVM: MMU: Introduce is_last_gpte() to clean up walk_addr_generic() Suggested by Ingo and Avi. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:44 +03:00
Takuya Yoshikawa	92c1c1e85b	KVM: MMU: Rename the walk label in walk_addr_generic() The current name does not explain the meaning well. So give it a better name "retry_walk" to show that we are trying the walk again. This was suggested by Ingo Molnar. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:43 +03:00
Takuya Yoshikawa	134291bf3c	KVM: MMU: Clean up the error handling of walk_addr_generic() Avoid two step jump to the error handling part. This eliminates the use of the variables present and rsvd_fault. We also use the const type qualifier to show that write/user/fetch_fault do not change in the function. Both of these were suggested by Ingo Molnar. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:42 +03:00
Marcelo Tosatti	f8f7e5ee10	Revert "KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB" This reverts commit bee931d31e588b8eb86b7edee32fac2d16930cd7. TLB flush should be done lazily during guest entry, in kvm_mmu_load(). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:41 +03:00
Scott Wood	1aee47a027	KVM: PPC: e500: Don't search over the entire TLB0. Only look in the 4 entries that could possibly contain the entry we're looking for. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:40 +03:00
Liu Yu	dd9ebf1f94	KVM: PPC: e500: Add shadow PID support Dynamically assign host PIDs to guest PIDs, splitting each guest PID into multiple host (shadow) PIDs based on kernel/user and MSR[IS/DS]. Use both PID0 and PID1 so that the shadow PIDs for the right mode can be selected, that correspond both to guest TID = zero and guest TID = guest PID. This allows us to significantly reduce the frequency of needing to invalidate the entire TLB. When the guest mode or PID changes, we just update the host PID0/PID1. And since the allocation of shadow PIDs is global, multiple guests can share the TLB without conflict. Note that KVM does not yet support the guest setting PID1 or PID2 to a value other than zero. This will need to be fixed for nested KVM to work. Until then, we enforce the requirement for guest PID1/PID2 to stay zero by failing the emulation if the guest tries to set them to something else. Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:39 +03:00
Liu Yu	08b7fa92b9	KVM: PPC: e500: Stop keeping shadow TLB Instead of a fully separate set of TLB entries, keep just the pfn and dirty status. Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:38 +03:00
Scott Wood	a4cd8b23ac	KVM: PPC: e500: enable magic page This is a shared page used for paravirtualization. It is always present in the guest kernel's effective address space at the address indicated by the hypercall that enables it. The physical address specified by the hypercall is not used, as e500 does not have real mode. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:37 +03:00
Scott Wood	9973d54eea	KVM: PPC: e500: Support large page mappings of PFNMAP vmas. This allows large pages to be used on guest mappings backed by things like /dev/mem, resulting in a significant speedup when guest memory is mapped this way (it's useful for directly-assigned MMIO, too). This is not a substitute for hugetlbfs integration, but is useful for configurations where devices are directly assigned on chips without an IOMMU -- in these cases, we need guest physical and true physical to match, and be contiguous, so static reservation and mapping via /dev/mem is the most straightforward way to set things up. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:36 +03:00
Scott Wood	59c1f4e35c	KVM: PPC: e500: Eliminate shadow_pages[], and use pfns instead. This is in line with what other architectures do, and will allow us to map things other than ordinary, unreserved kernel pages -- such as dedicated devices, or large contiguous reserved regions. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:35 +03:00
Scott Wood	0ef309956c	KVM: PPC: e500: don't use MAS0 as intermediate storage. This avoids races. It also means that we use the shadow TLB way, rather than the hardware hint -- if this is a problem, we could do a tlbsx before inserting a TLB0 entry. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:34 +03:00
Scott Wood	6fc4d1eb91	KVM: PPC: e500: Disable preloading TLB1 in tlb_load(). Since TLB1 loading doesn't check the shadow TLB before allocating another entry, you can get duplicates. Once shadow PIDs are enabled in a later patch, we won't need to invalidate the TLB on every switch, so this optimization won't be needed anyway. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:33 +03:00
Scott Wood	4cd35f675b	KVM: PPC: e500: Save/restore SPE state This is done lazily. The SPE save will be done only if the guest has used SPE since the last preemption or heavyweight exit. Restore will be done only on demand, when enabling MSR_SPE in the shadow MSR, in response to an SPE fault or mtmsr emulation. For SPEFSCR, Linux already switches it on context switch (non-lazily), so the only remaining bit is to save it between qemu and the guest. Signed-off-by: Liu Yu <yu.liu@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:32 +03:00
Scott Wood	ecee273fc4	KVM: PPC: booke: use shadow_msr Keep the guest MSR and the guest-mode true MSR separate, rather than modifying the guest MSR on each guest entry to produce a true MSR. Any bits which should be modified based on guest MSR must be explicitly propagated from vcpu->arch.shared->msr to vcpu->arch.shadow_msr in kvmppc_set_msr(). While we're modifying the guest entry code, reorder a few instructions to bury some load latencies. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:32 +03:00
Scott Wood	c51584d52e	powerpc/e500: SPE register saving: take arbitrary struct offset Previously, these macros hardcoded THREAD_EVR0 as the base of the save area, relative to the base register passed. This base offset is now passed as a separate macro parameter, allowing reuse with other SPE save areas, such as used by KVM. Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:31 +03:00
yu liu	685659ee70	powerpc/e500: Save SPEFCSR in flush_spe_to_thread() giveup_spe() saves the SPE state which is protected by MSR[SPE]. However, modifying SPEFSCR does not trap when MSR[SPE]=0. And since SPEFSCR is already saved/restored in _switch(), not all the callers want to save SPEFSCR again. Thus, saving SPEFSCR should not belong to giveup_spe(). This patch moves SPEFSCR saving to flush_spe_to_thread(), and cleans up the caller that needs to save SPEFSCR accordingly. Signed-off-by: Liu Yu <yu.liu@freescale.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:30 +03:00
Alexander Graf	a22a2daccf	KVM: PPC: Resolve real-mode handlers through function exports Up until now, Book3S KVM had variables stored in the kernel that a kernel module or the kvm code in the kernel could read from to figure out where some real mode helper functions are located. This is all unnecessary. The high bits of the EA get ignore in real mode, so we can just use the pointer as is. Also, it's a lot easier on relocations when we use the normal way of resolving the address to a function, instead of jumping through hoops. This patch fixes compilation with CONFIG_RELOCATABLE=y. Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:29 +03:00
Stuart Yoder	24294b9a3f	KVM: PPC: fix partial application of "exit timing in ticks" When http://www.spinics.net/lists/kvm-ppc/msg02664.html was applied to produce commit b51e7aa7ed6d8d134d02df78300ab0f91cfff4d2, the removal of the conversion in add_exit_timing was left out. Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:28 +03:00
Avi Kivity	45bd07b9d5	KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB kvm_set_cr0() and kvm_set_cr4(), and possible other functions, assume that kvm_mmu_reset_context() flushes the guest TLB. However, it does not. Fix by flushing the tlb (and syncing the new root as well). Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:27 +03:00
Avi Kivity	411c588dfb	KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0 When CR0.WP=0, we sometimes map user pages as kernel pages (to allow the kernel to write to them). Unfortunately this also allows the kernel to fetch from these pages, even if CR4.SMEP is set. Adjust for this by also setting NX on the spte in these circumstances. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:26 +03:00
Yang, Wei	a01c8f9b4e	KVM: Enable ERMS feature support for KVM This patch exposes ERMS feature to KVM guests. The REP MOVSB/STOSB instruction can enhance fast strings attempts to move as much of the data with larger size load/stores as possible. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:25 +03:00
Yang, Wei	176f61da82	KVM: Expose RDWRGSFS bit to KVM guests This patch exposes RDWRGSFS bit to KVM guests. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:24 +03:00
Yang, Wei	74dc2b4ffe	KVM: Add RDWRGSFS support when setting CR4 This patch adds RDWRGSFS support when setting CR4. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:23 +03:00
Yang, Wei	d9c3476d8a	KVM: Remove RDWRGSFS bit from CR4_RESERVED_BITS This patch removes RDWRGSFS bit from CR4_RESERVED_BITS. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:22 +03:00
Yang, Wei Y	4a00efdf0c	KVM: Enable DRNG feature support for KVM This patch exposes DRNG feature to KVM guests. The RDRAND instruction can provide software with sequences of random numbers generated from white noise. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:21 +03:00
Andre Przywara	02668b061d	KVM: fix XSAVE bit scanning (now properly) commit 123108f1c1aafd51d6a5c79cc04d7999dd88a930 tried to fix KVMs XSAVE valid feature scanning, but it was wrong. It was not considering the sparse nature of this bitfield, instead reading values from uninitialized members of the entries array. This patch now separates subleaf indicies from KVM's array indicies and fills the entry before querying it's value. This fixes AVX support in KVM guests. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:20 +03:00
Yang, Wei Y	e57d4a356a	KVM: Add instruction fetch checking when walking guest page table This patch adds instruction fetch checking when walking guest page table, to implement SMEP when emulating instead of executing natively. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Shan, Haitao <haitao.shan@intel.com> Signed-off-by: Li, Xin <xin.li@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:15 +03:00
Yang, Wei Y	611c120f74	KVM: Mask function7 ebx against host capability word9 This patch masks CPUID leaf 7 ebx against host capability word9. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Shan, Haitao <haitao.shan@intel.com> Signed-off-by: Li, Xin <xin.li@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:14 +03:00
Yang, Wei Y	c68b734fba	KVM: Add SMEP support when setting CR4 This patch adds SMEP handling when setting CR4. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Shan, Haitao <haitao.shan@intel.com> Signed-off-by: Li, Xin <xin.li@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:13 +03:00
Yang, Wei Y	8d9c975fc5	KVM: Remove SMEP bit from CR4_RESERVED_BITS This patch removes SMEP bit from CR4_RESERVED_BITS. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Shan, Haitao <haitao.shan@intel.com> Signed-off-by: Li, Xin <xin.li@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 13:16:12 +03:00
Nadav Har'El	509c75ea19	KVM: nVMX: Fix bug preventing more than two levels of nesting The nested VMX feature is supposed to fully emulate VMX for the guest. This (theoretically) not only allows it to run its own guests, but also also to further emulate VMX for its own guests, and allow arbitrarily deep nesting. This patch fixes a bug (discovered by Kevin Tian) in handling a VMLAUNCH by L2, which prevented deeper nesting. Deeper nesting now works (I only actually tested L3), but is currently absurdly slow, to the point of being unusable. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:11 +03:00
Avi Kivity	9dac77fa40	KVM: x86 emulator: fold decode_cache into x86_emulate_ctxt This saves a lot of pointless casts x86_emulate_ctxt and decode_cache. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:09 +03:00
Avi Kivity	36dd9bb5ce	KVM: x86 emulator: rename decode_cache::eip to _eip The name eip conflicts with a field of the same name in x86_emulate_ctxt, which we plan to fold decode_cache into. The name _eip is unfortunate, but what's really needed is a refactoring here, not a better name. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:09 +03:00
Jan Kiszka	2e4ce7f574	KVM: VMX: Silence warning on 32-bit hosts a is unused now on CONFIG_X86_32. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:08 +03:00
Takuya Yoshikawa	f411e6cdc2	KVM: x86 emulator: Use opcode::execute for CLI/STI(FA/FB) Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:07 +03:00
Takuya Yoshikawa	d06e03adcb	KVM: x86 emulator: Use opcode::execute for LOOP/JCXZ LOOP/LOOPcc : E0-E2 JCXZ/JECXZ/JRCXZ : E3 Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:06 +03:00
Takuya Yoshikawa	5c5df76b8b	KVM: x86 emulator: Clean up INT n/INTO/INT 3(CC/CD/CE) Call emulate_int() directly to avoid spaghetti goto's. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:04 +03:00
Takuya Yoshikawa	1bd5f469b2	KVM: x86 emulator: Use opcode::execute for MOV(8C/8E) Different functions for those which take segment register operands. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:03 +03:00
Takuya Yoshikawa	ebda02c2a5	KVM: x86 emulator: Use opcode::execute for RET(C3) Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:02 +03:00
Takuya Yoshikawa	e4f973ae91	KVM: x86 emulator: Use opcode::execute for XCHG(86/87) In addition, replace one "goto xchg" with an em_xchg() call. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:01 +03:00
Takuya Yoshikawa	9f21ca599c	KVM: x86 emulator: Use opcode::execute for TEST(84/85, A8/A9) Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:16:00 +03:00
Takuya Yoshikawa	db5b0762f3	KVM: x86 emulator: Use opcode::execute for some instructions Move the following functions to the opcode tables: RET (Far return) : CB IRET : CF JMP (Jump far) : EA SYSCALL : 0F 05 CLTS : 0F 06 SYSENTER : 0F 34 SYSEXIT : 0F 35 Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:15:59 +03:00
Takuya Yoshikawa	e01991e71a	KVM: x86 emulator: Rename emulate_xxx() to em_xxx() The next patch will change these to be called by opcode::execute. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:15:58 +03:00
Takuya Yoshikawa	9d74191ab1	KVM: x86 emulator: Use the pointers ctxt and c consistently We should use the local variables ctxt and c when the emulate_ctxt and decode appears many times. At least, we need to be consistent about how we use these in a function. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:15:57 +03:00
Russell King	022ae537b2	ARM: dma: replace ISA_DMA_THRESHOLD with a variable ISA_DMA_THRESHOLD has been unused by non-arch code, so lets now get rid of it from ARM by replacing it with arm_dma_zone_mask. Move dma_supported() and dma_set_mask() out of line, and have dma_supported() check this new variable instead. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2011-07-12 11:08:12 +01:00
Ido Yariv	f299bb9527	arm: davinci: Fix low level gpio irq handlers' argument Commit `7416401` ("arm: davinci: Fix fallout from generic irq chip conversion") introduced a bug, causing low level interrupt handlers to get a bogus irq number as an argument. The gpio irq handler falsely assumes that the handler data is the irq base number and that is no longer true. Set the irq handler data to be a pointer to the corresponding gpio controller. The chained irq handler can then use it to extract both the irq base number and the gpio registers structure. Signed-off-by: Ido Yariv <ido@wizery.com> CC: Thomas Gleixner <tglx@linutronix.de> [nsekhar@ti.com: renamed "ctl" to "d", simplified indexing logic for chips and took care of odd bank handling in irq handler] Signed-off-by: Sekhar Nori <nsekhar@ti.com>	2011-07-12 14:21:43 +05:30
Nadav Har'El	2844d84905	KVM: nVMX: Miscellenous small corrections Small corrections of KVM (spelling, etc.) not directly related to nested VMX. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:19 +03:00
Nadav Har'El	7b8050f570	KVM: nVMX: Add VMX to list of supported cpuid features If the "nested" module option is enabled, add the "VMX" CPU feature to the list of CPU features KVM advertises with the KVM_GET_SUPPORTED_CPUID ioctl. Qemu uses this ioctl, and intersects KVM's list with its own list of desired cpu features (depending on the -cpu option given to qemu) to determine the final list of features presented to the guest. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:19 +03:00
Nadav Har'El	7991825b85	KVM: nVMX: Additional TSC-offset handling In the unlikely case that L1 does not capture MSR_IA32_TSC, L0 needs to emulate this MSR write by L2 by modifying vmcs02.tsc_offset. We also need to set vmcs12.tsc_offset, for this change to survive the next nested entry (see prepare_vmcs02()). Additionally, we also need to modify vmx_adjust_tsc_offset: The semantics of this function is that the TSC of all guests on this vcpu, L1 and possibly several L2s, need to be adjusted. To do this, we need to adjust vmcs01's tsc_offset (this offset will also apply to each L2s we enter). We can't set vmcs01 now, so we have to remember this adjustment and apply it when we later exit to L1. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:19 +03:00
Nadav Har'El	36cf24e01e	KVM: nVMX: Further fixes for lazy FPU loading KVM's "Lazy FPU loading" means that sometimes L0 needs to set CR0.TS, even if a guest didn't set it. Moreover, L0 must also trap CR0.TS changes and NM exceptions, even if we have a guest hypervisor (L1) who didn't want these traps. And of course, conversely: If L1 wanted to trap these events, we must let it, even if L0 is not interested in them. This patch fixes some existing KVM code (in update_exception_bitmap(), vmx_fpu_activate(), vmx_fpu_deactivate()) to do the correct merging of L0's and L1's needs. Note that handle_cr() was already fixed in the above patch, and that new code in introduced in previous patches already handles CR0 correctly (see prepare_vmcs02(), prepare_vmcs12(), and nested_vmx_vmexit()). Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:18 +03:00
Nadav Har'El	eeadf9e755	KVM: nVMX: Handling of CR0 and CR4 modifying instructions When L2 tries to modify CR0 or CR4 (with mov or clts), and modifies a bit which L1 asked to shadow (via CR[04]_GUEST_HOST_MASK), we already do the right thing: we let L1 handle the trap (see nested_vmx_exit_handled_cr() in a previous patch). When L2 modifies bits that L1 doesn't care about, we let it think (via CR[04]_READ_SHADOW) that it did these modifications, while only changing (in GUEST_CR[04]) the bits that L0 doesn't shadow. This is needed for corect handling of CR0.TS for lazy FPU loading: L0 may want to leave TS on, while pretending to allow the guest to change it. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:18 +03:00
Nadav Har'El	66c78ae40c	KVM: nVMX: Correct handling of idt vectoring info This patch adds correct handling of IDT_VECTORING_INFO_FIELD for the nested case. When a guest exits while delivering an interrupt or exception, we get this information in IDT_VECTORING_INFO_FIELD in the VMCS. When L2 exits to L1, there's nothing we need to do, because L1 will see this field in vmcs12, and handle it itself. However, when L2 exits and L0 handles the exit itself and plans to return to L2, L0 must inject this event to L2. In the normal non-nested case, the idt_vectoring_info case is discovered after the exit, and the decision to inject (though not the injection itself) is made at that point. However, in the nested case a decision of whether to return to L2 or L1 also happens during the injection phase (see the previous patches), so in the nested case we can only decide what to do about the idt_vectoring_info right after the injection, i.e., in the beginning of vmx_vcpu_run, which is the first time we know for sure if we're staying in L2. Therefore, when we exit L2 (is_guest_mode(vcpu)), we disable the regular vmx_complete_interrupts() code which queues the idt_vectoring_info for injection on next entry - because such injection would not be appropriate if we will decide to exit to L1. Rather, we just save the idt_vectoring_info and related fields in vmcs12 (which is a convenient place to save these fields). On the next entry in vmx_vcpu_run (after the injection phase, potentially exiting to L1 to inject an event requested by user space), if we find ourselves in L1 we don't need to do anything with those values we saved (as explained above). But if we find that we're in L2, or rather still at L2 (it's not nested_run_pending, meaning that this is the first round of L2 running after L1 having just launched it), we need to inject the event saved in those fields - by writing the appropriate VMCS fields. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:18 +03:00

... 3 4 5 6 7 ...

59552 Коммитов