Commit Graph

59552 Commits

Author SHA1 Message Date
Jon Medhurst 0d32e7d11b ARM: kprobes: Migrate ARM space_cccc_100x to decoding tables
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:50 +00:00
Jon Medhurst 56d8fbddc2 ARM: kprobes: Migrate ARM space_cccc_01xx to decoding tables
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:49 +00:00
Jon Medhurst ad2e81a78d ARM: kprobes: Migrate ARM space_cccc_0111__1 to decoding tables
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:49 +00:00
Jon Medhurst 2ce5d03307 ARM: kprobes: Migrate ARM space_cccc_0110__1 to decoding tables
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:49 +00:00
Jon Medhurst 0e44e9a0fa ARM: kprobes: Add emulate_rd12rn16rm0_rwflags_nopc()
This is the emulation function for the instruction format used by the
ARM media instructions.
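
For illustration, the helper could look roughly like this (a sketch, not
necessarily the exact code added here; the LDRD/STRD and rs8 variants in
the neighbouring patches follow the same pattern). The instruction in the
slot has been rewritten at decode time to use r0, r2 and r3, so emulation
just shuttles the operands in and out of pt_regs:

	static void __kprobes
	emulate_rd12rn16rm0_rwflags_nopc(struct kprobe *p, struct pt_regs *regs)
	{
		kprobe_opcode_t insn = p->opcode;
		int rd = (insn >> 12) & 0xf;	/* Rd in bits[15:12] */
		int rn = (insn >> 16) & 0xf;	/* Rn in bits[19:16] */
		int rm = insn & 0xf;		/* Rm in bits[3:0] */

		register unsigned long rdv asm("r0") = regs->uregs[rd];
		register unsigned long rnv asm("r2") = regs->uregs[rn];
		register unsigned long rmv asm("r3") = regs->uregs[rm];
		unsigned long cpsr = regs->ARM_cpsr;

		__asm__ __volatile__ (
			"msr	cpsr_fs, %[cpsr]	\n\t"
			BLX("%[fn]")			/* run insn from slot */
			"mrs	%[cpsr], cpsr		\n\t"
			: "=r" (rdv), [cpsr] "=r" (cpsr)
			: "0" (rdv), "r" (rnv), "r" (rmv),
			  "1" (cpsr), [fn] "r" (p->ainsn.insn_fn)
			: "lr", "memory", "cc"
		);

		regs->uregs[rd] = rdv;		/* _nopc: Rd is never PC here */
		regs->ARM_cpsr = (regs->ARM_cpsr & ~APSR_MASK)
				 | (cpsr & APSR_MASK);
	}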

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:49 +00:00
Jon Medhurst c038f3af50 ARM: kprobes: Migrate ARM space_cccc_001x to decoding tables
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:49 +00:00
Jon Medhurst 75f115c087 ARM: kprobes: Migrate ARM space_cccc_000x to decoding tables
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:49 +00:00
Jon Medhurst 6c8a192929 ARM: kprobes: Migrate ARM LDRD and STRD to decoding tables
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:49 +00:00
Jon Medhurst 8723942f7b ARM: kprobes: Add emulate_ldrdstrd()
This is an emulation function for the LDRD and STRD instructions.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:48 +00:00
Jon Medhurst 3535a89ab2 ARM: kprobes: Migrate ARM data-processing (register) instructions to decoding tables
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:48 +00:00
Jon Medhurst 9f596e5126 ARM: kprobes: Add emulate_rd12rn16rm0rs8_rwflags()
This is the emulation function for the instruction format used by the
ARM data-processing instructions.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:48 +00:00
Jon Medhurst 7be7ee2d29 ARM: kprobes: Add BLX macro
This is for use by inline assembler which will be added to kprobes-arm.c.
It saves memory when used on newer ARM architectures and also provides
correct interworking should ARM probes be required on Thumb kernels in
the future.
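
A sketch of such a macro (illustrative, not necessarily the exact
definition): on ARMv5+ a single BLX both branches and interworks, while
older architectures need the two-instruction sequence.

	#if __LINUX_ARM_ARCH__ >= 5
	#define BLX(reg)	"blx	"reg"		\n\t"
	#else
	#define BLX(reg)	"mov	lr, pc		\n\t" \
				"mov	pc, "reg"	\n\t"
	#endif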

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:48 +00:00
Jon Medhurst df4fa1f8dd ARM: kprobes: Add alu_write_pc()
This writes a new value to PC which was obtained as the result of an ARM
ALU instruction. For ARMv7 and later this performs interworking.

On ARM kernels we shouldn't encounter any ALU instructions trying to
switch to Thumb mode so support for this isn't strictly necessary.
However, the approach taken in all other instruction decoding is for us
to avoid unpredictable modification of the PC for security reasons. This
is usually achieved by rejecting insertion of probes on problematic
instructions, but for ALU instructions we can't do this as it depends on
the contents of the CPU registers at the time the probe is hit. So, as
we require some form of run-time checking to trap undesirable PC
modification, we may as well simulate the instructions correctly, i.e.
in the way they would behave in the absence of a probe.
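
A minimal sketch of the helper, assuming the interworking rules
described above:

	static inline void __kprobes alu_write_pc(long pcv, struct pt_regs *regs)
	{
	#if __LINUX_ARM_ARCH__ >= 7
		bx_write_pc(pcv, regs);	/* ALU results written to PC interwork */
	#else
		regs->ARM_pc = pcv;	/* no interworking before ARMv7 */
	#endif
	}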

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:48 +00:00
Jon Medhurst 9a5c1284a3 ARM: kprobes: Migrate ARM space_1111 to decoding tables
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:48 +00:00
Jon Medhurst bb1085f827 ARM: kprobes: Decode 32-bit Thumb multiply and absolute difference instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:48 +00:00
Jon Medhurst 231fb150c6 ARM: kprobes: Decode 32-bit Thumb long multiply and divide instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:47 +00:00
Jon Medhurst 31656c1a9a ARM: kprobes: Decode 32-bit Thumb data-processing (register) instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:47 +00:00
Jon Medhurst d691023b62 ARM: kprobes: Decode 32-bit Thumb load/store single data item instructions
We will reject probing of unprivileged load and store instructions.
These rarely occur and writing test cases for them is difficult.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:47 +00:00
Jon Medhurst 46009cc5c5 ARM: kprobes: Decode 32-bit Thumb memory hint instructions
We'll treat the preload instructions as nops as they are just
performance hints.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:47 +00:00
Jon Medhurst 6a0d1a1c56 ARM: kprobes: Reject 32-bit Thumb coprocessor and SIMD instructions
The kernel doesn't currently support VFP or Neon code, and probing of
code with CP15 operations is fraught with bad consequences. So we will
just reject probing these instructions.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:47 +00:00
Jon Medhurst ce715c772f ARM: kprobes: Decode 32-bit Thumb branch instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:46 +00:00
Jon Medhurst b06f3ee34d ARM: kprobes: Decode 32-bit miscellaneous control instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:46 +00:00
Jon Medhurst 7848786a7a ARM: kprobes: Decode 32-bit Thumb data-processing (plain binary immediate) instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:46 +00:00
Jon Medhurst 2fcaf7e758 ARM: kprobes: Decode 32-bit Thumb data-processing (modified immediate) instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:46 +00:00
Jon Medhurst 080e001326 ARM: kprobes: Decode 32-bit Thumb data-processing (shifted register) instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:46 +00:00
Jon Medhurst dd212bd3cb ARM: kprobes: Decode 32-bit Thumb table branch instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:46 +00:00
Jon Medhurst b48354d358 ARM: kprobes: Decode 32-bit Thumb load/store dual and load/store exclusive instructions
We reject probing of load/store exclusive instructions because any
emulation routine could never succeed in gaining exclusive access as the
exception framework clears the exclusivity monitor when a probe's
breakpoint is hit.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:46 +00:00
Jon Medhurst eaf1d06500 ARM: kprobes: Decode 32-bit Thumb load/store multiple instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:45 +00:00
Jon Medhurst 3d4a99785a ARM: kprobes: Optimise emulation of LDM and STM
This patch improves the performance of LDM and STM instruction
emulation. This is desirable because:

- jprobes and kretprobes probe the first instruction in a function and,
  when the frame pointer is omitted, this instruction is often a STM
  used to push registers onto the stack.

- The STM and LDM instructions are common in the body and tail of
  functions.

- At the same time as being a common instruction form, they also have
  one of the slowest and most complicated simulation routines.

The approach taken to optimisation is to use emulation rather than
simulation, that is, a modified form of the instruction is run with
an appropriate register context.

Benchmarking on an OMAP3530 shows the optimised emulation is between 2
and 3 times faster than the simulation routines. On a Kirkwood-based
device the relative speedup was even greater.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:45 +00:00
Jon Medhurst 235a4ce79f ARM: kprobes: Add common decoding function for LDM and STM
The encoding of these instructions is substantially the same for both
ARM and Thumb, so we can have common decoding and simulation functions.

This patch moves the simulation functions from kprobes-arm.c to
kprobes-common.c. It also adds a new simulation function
(simulate_ldm1_pc) for the case where we load into PC because this may
need to interwork.

The instruction decoding is done by a custom function
(kprobe_decode_ldmstm) rather than just relying on decoding table
entries because we will later be adding optimisation code.
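
The new simulation function can be a thin wrapper; a sketch, assuming
the load_write_pc() helper added separately in this series:

	static void __kprobes simulate_ldm1_pc(struct kprobe *p, struct pt_regs *regs)
	{
		simulate_ldm1stm1(p, regs);		/* normal load-multiple */
		load_write_pc(regs->ARM_pc, regs);	/* interwork on loaded PC */
	}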

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:45 +00:00
Jon Medhurst 263e368a2f ARM: kprobes: Add load_write_pc()
This writes a value to PC which was obtained as the result of a
LDR or LDM instruction. For ARMv5T and later this must perform
interworking.
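
A minimal sketch, assuming the bx_write_pc() helper from this series:

	static inline void __kprobes load_write_pc(long pcv, struct pt_regs *regs)
	{
	#if __LINUX_ARM_ARCH__ >= 5
		bx_write_pc(pcv, regs);	/* ARMv5T+: loads to PC interwork */
	#else
		regs->ARM_pc = pcv;
	#endif
	}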

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:45 +00:00
Jon Medhurst f39ca8b488 ARM: kprobes: Decode 32-bit Thumb hint instructions
For hints which may have observable effects, like SEV (send event), we
use kprobe_emulate_none which emulates the hint by executing the
original instruction.

For NOP we simulate the instruction using kprobe_simulate_nop, which
does nothing. As probes execute with interrupts disabled this is also
used for hints which may block for an indefinite time, like WFE (wait
for event).
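
Sketches of the two handlers (illustrative):

	/* Execute the unmodified hint from the instruction slot (e.g. SEV). */
	void __kprobes kprobe_emulate_none(struct kprobe *p, struct pt_regs *regs)
	{
		p->ainsn.insn_fn();
	}

	/* NOP, and hints like WFE which must not block with IRQs disabled. */
	void __kprobes kprobe_simulate_nop(struct kprobe *p, struct pt_regs *regs)
	{
	}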

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:45 +00:00
Jon Medhurst 0a188ccb5e ARM: kprobes: Reject 16-bit Thumb SETEND, CPS and BKPT instructions
These are very rare and/or problematic to emulate so we will take the
easy option and disallow probing them (as does the existing ARM
implementation).

Rejecting these instructions doesn't actually require any entries in the
decoding table as it is the default case for instructions which aren't
found.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:45 +00:00
Jon Medhurst 396b41f68d ARM: kprobes: Decode 16-bit Thumb branch instructions
We previously changed the behaviour of probes so that conditional
instructions don't fire when the condition isn't met. For ARM branches,
and Thumb branches in IT blocks, this means they don't fire if the
branch isn't taken.

For consistency, we implement the same for Thumb conditional branch
instructions. This involves setting up insn_check_cc to point to the
relevant condition checking function. As the emulation routine is only
called when this condition passes, it doesn't need to check again and
can unconditionally update PC.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:45 +00:00
Jon Medhurst 444956677e ARM: kprobes: Reject 16-bit Thumb SVC and UNDEFINED instructions
SVC (SWI) instructions shouldn't occur in kernel code so we don't
need to be able to probe them.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:44 +00:00
Jon Medhurst 5b94faf8d7 ARM: kprobes: Decode 16-bit Thumb IT instruction
The normal Thumb singlestepping routine updates the IT state after
calling the instruction handler. We don't want this to happen after the
IT instruction simulation sets the IT state, therefore we need to
provide a custom singlestep routine.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:44 +00:00
Jon Medhurst fd0c8d8a48 ARM: kprobes: Decode 16-bit Thumb PUSH and POP instructions
These instructions are equivalent to

	stmdb sp!,{r0-r7,lr}
	ldmia sp!,{r0-r7,pc}

and we emulate them by transforming them into the 32-bit Thumb
instructions

	stmdb r9!,{r0-r7,r8}
	ldmia r9!,{r0-r7,r8}

This is simpler, and almost certainly executes faster, than writing
simulation functions.
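
A sketch of the PUSH transformation at decode time (POP is analogous).
0xe929 is the first halfword of the 32-bit "STMDB R9!,{...}" encoding,
and the low 9 bits of the 16-bit instruction carry the register list,
with the M bit (LR) landing on bit 8, i.e. R8:

	static enum kprobe_insn __kprobes
	t16_decode_push(kprobe_opcode_t insn, struct arch_specific_insn *asi)
	{
		((u16 *)asi->insn)[0] = 0xe929;		/* STMDB R9!,{...} */
		((u16 *)asi->insn)[1] = insn & 0x1ff;	/* reglist; M -> r8 */
		return INSN_GOOD;
	}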

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:44 +00:00
Jon Medhurst 32818f31f8 ARM: kprobes: Decode 16-bit Thumb CBZ and bit manipulation instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:44 +00:00
Jon Medhurst 2f33582904 ARM: kprobes: Decode 16-bit Thumb PC- and SP-relative address instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:44 +00:00
Jon Medhurst f869514282 ARM: kprobes: Decode 16-bit Thumb load and store instructions
Most of these instructions only operate on the low registers R0-R7
so they can make use of t16_emulate_loregs_rwflags.

The instructions which use SP or PC for addressing have their own
simulation functions.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:44 +00:00
Jon Medhurst 3b5940e811 ARM: kprobes: Decode 16-bit Thumb special data instructions
These data-processing instructions operate on the full range of CPU
registers, so to simulate them we have to modify the registers used
by the instruction. We can't make use of the decoding table framework to
do this because the registers aren't encoded cleanly in separate
nibbles, therefore we need a custom decode function.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:43 +00:00
Jon Medhurst a9c3c29e72 ARM: kprobes: Decode 16-bit Thumb BX and BLX instructions
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:43 +00:00
Jon Medhurst 059987ffa7 ARM: kprobes: Add bx_write_pc()
This writes a value to PC, with interworking, i.e. it switches to
Thumb or ARM mode depending on the state of the least significant bit.
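
In essence (a sketch):

	static inline void __kprobes bx_write_pc(long pcv, struct pt_regs *regs)
	{
		long cpsr = regs->ARM_cpsr;
		if (pcv & 0x1) {
			cpsr |= PSR_T_BIT;	/* switch to Thumb */
			pcv &= ~0x1;
		} else {
			cpsr &= ~PSR_T_BIT;	/* switch to ARM */
			pcv &= ~0x2;		/* avoid unpredictable alignment */
		}
		regs->ARM_cpsr = cpsr;
		regs->ARM_pc = pcv;
	}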

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:43 +00:00
Jon Medhurst 02d194f647 ARM: kprobes: Decode 16-bit Thumb data-processing instructions
These instructions only operate on the low registers R0-R7, therefore
it is possible to emulate them by executing the original instruction
unaltered if we restore and save these registers. This is what
t16_emulate_loregs does.

Some of these instructions don't update the PSR when they execute in an
IT block, so there are two flavours of emulation function:
t16_emulate_loregs_rwflags and t16_emulate_loregs_noitrwflags.
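
A sketch of the core helper; it returns the new flags so each flavour
can decide whether to write them back:

	static unsigned long __kprobes
	t16_emulate_loregs(struct kprobe *p, struct pt_regs *regs)
	{
		unsigned long oldcpsr = regs->ARM_cpsr;
		unsigned long newcpsr;

		__asm__ __volatile__ (
			"msr	cpsr_fs, %[oldcpsr]	\n\t"
			"ldmia	%[regs], {r0-r7}	\n\t"	/* restore R0-R7 */
			"blx	%[fn]			\n\t"	/* run original insn */
			"stmia	%[regs], {r0-r7}	\n\t"	/* save R0-R7 */
			"mrs	%[newcpsr], cpsr	\n\t"
			: [newcpsr] "=r" (newcpsr)
			: [oldcpsr] "r" (oldcpsr), [regs] "r" (regs),
			  [fn] "r" (p->ainsn.insn_fn)
			: "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
			  "lr", "memory", "cc"
		);
		return (oldcpsr & ~APSR_MASK) | (newcpsr & APSR_MASK);
	}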

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:43 +00:00
Jon Medhurst 7460bce423 ARM: ptrace: Add APSR_MASK definition to ptrace.h
APSR_MASK can be used to extract the APSR bits from the CPSR. The
comment for these definitions is also changed because it was inaccurate
as the existing defines didn't refer to any part of the APSR.
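
The added definition is, in essence (N, Z, C, V and Q live in bits
31-27 and the GE flags in bits 19-16):

	#define APSR_MASK	0xf80f0000	/* N, Z, C, V, Q and GE flags */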

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:43 +00:00
Jon Medhurst 3f92dfed6a ARM: kprobes: Decode 16-bit Thumb hint instructions
For hints which may have observable effects, like SEV (send event), we
use kprobe_emulate_none which emulates the hint by executing the
original instruction.

For NOP we simulate the instruction using kprobe_simulate_nop, which
does nothing. As probes execute with interrupts disabled this is also
used for hints which may block for an indefinite time, like WFE (wait
for event).

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:43 +00:00
Jon Medhurst 0d1a095aa1 ARM: kprobes: Infrastructure for table driven decoding of CPU instructions
The existing ARM instruction decoding functions are a mass of if/else
code. Rather than follow this pattern for Thumb instruction decoding
this patch implements an infrastructure for a new table driven scheme.

This has several advantages:

- Reduces the kernel size by approx 2kB. (The ARM instruction decoding
  will eventually have -3.1kB code, +1.3kB data; with similar or better
  estimated savings for Thumb decoding.)

- Allows programmatic checking of decoding consistency and test case
  coverage.

- Provides more uniform source code and is therefore, arguably, clearer.

For a detailed explanation of how decoding tables work see the in-source
documentation in kprobes.h, and also for kprobe_decode_insn().
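
A sketch of the flavour of table this enables (the entry macros and
masks here are illustrative, not the exact ones from kprobes.h): each
entry matches instruction bits against a mask/value pair and either
routes to a sub-table, rejects the probe, or attaches a handler.

	static const union decode_item t16_table_1011[] = {
		/* CBZ/CBNZ		1011 x0x1 xxxx xxxx */
		DECODE_SIMULATE	(0xf500, 0xb100, t16_simulate_cbz),
		/* PUSH			1011 010x xxxx xxxx */
		DECODE_CUSTOM	(0xfe00, 0xb400, t16_decode_push),
		/* anything unmatched falls through and is rejected */
		DECODE_END
	};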

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:43 +00:00
Jon Medhurst e2960317d4 ARM: kprobes: Extend arch_specific_insn to add pointer to emulated instruction
When we come to emulating Thumb instructions then, to interwork
correctly, the code in the instruction slot must be invoked with a
function pointer which has the least significant bit set. Rather than
set this by hand in every Thumb emulation function we will add a new
field for this purpose to arch_specific_insn, called insn_fn.

This also enables us to seamlessly share emulation functions between ARM
and Thumb code.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:42 +00:00
Jon Medhurst c6a7d97d57 ARM: kprobes: Add hooks to override singlestep()
When a probe fires we must single-step the instruction which was
replaced by a breakpoint. As the steps to do this vary between ARM and
Thumb instructions we need a way to customise single-stepping.

This is done by adding a new hook called insn_singlestep to
arch_specific_insn which is initialised by the instruction decoding
functions.

These single-step hooks must update PC and call the instruction handler.
For Thumb instructions an additional step of updating ITSTATE is needed.
We do this after calling the handler because some handlers will need to
test if they are running in an IT block.
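
Sketches of what these hooks might look like (names illustrative):

	static void __kprobes arm_singlestep(struct kprobe *p, struct pt_regs *regs)
	{
		regs->ARM_pc += 4;
		p->ainsn.insn_handler(p, regs);
	}

	static void __kprobes thumb16_singlestep(struct kprobe *p, struct pt_regs *regs)
	{
		regs->ARM_pc += 2;
		p->ainsn.insn_handler(p, regs);
		/* ITSTATE is advanced only after the handler has run */
		regs->ARM_cpsr = it_advance(regs->ARM_cpsr);
	}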

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:42 +00:00
Jon Medhurst 3b26945597 ARM: kprobes: Use conditional breakpoints for ARM probes
Now we no longer trigger probes on conditional instructions when the
condition is false, we can make use of conditional instructions as
breakpoints in ARM code to avoid taking unnecessary exceptions.

Note, we can't rely on not getting an exception when the condition check
fails, as that is Implementation Defined on newer ARM architectures. We
therefore still need to perform manual condition checks as well.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:42 +00:00
Jon Medhurst 3cca6c2435 ARM: kprobes: Don't trigger probes on conditional instructions when condition is false
This patch changes the behavior of kprobes on ARM so that:

    Kprobes on conditional instructions don't trigger when the
    condition is false. For conditional branches, this means that
    they don't trigger in the branch not taken case.

Rationale:

When probes are placed onto conditionally executed instructions in a
Thumb IT block, they may not fire if the condition is not met. This
is because we use invalid instructions for breakpoints and "it is
IMPLEMENTATION DEFINED whether the instruction executes as a NOP or
causes an Undefined Instruction exception". Therefore, for consistency,
we will ignore all probes on any conditional instructions when the
condition is false. Alternative solutions seem to be too complex to
implement or inconsistent.

This issue was discussed on linux.arm.kernel in the thread titled
"[RFC] kprobes with thumb2 conditional code" See
http://comments.gmane.org/gmane.linux.linaro.devel/2985

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:42 +00:00
Jon Medhurst 6aaa8b5570 ARM: kprobes: Add it_advance()
This advances the ITSTATE bits in CPSR to their values for the next
instruction.
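
In essence (ITSTATE<1:0> sit in CPSR bits 26:25 and ITSTATE<7:2> in
bits 15:10, which is what makes the bit juggling necessary):

	static inline unsigned long it_advance(unsigned long cpsr)
	{
		if ((cpsr & 0x06000400) == 0) {
			/* ITSTATE<2:0> == 0 means this is the last IT insn */
			cpsr &= ~PSR_IT_MASK;
		} else {
			/* We need to shift left ITSTATE<4:0> */
			const unsigned long mask = 0x06001c00;
			unsigned long it = cpsr & mask;
			it <<= 1;
			it |= it >> (27 - 10);	/* carry ITSTATE<2> into place */
			it &= mask;
			cpsr &= ~mask;
			cpsr |= it;
		}
		return cpsr;
	}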

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:42 +00:00
Jon Medhurst eaf4f33fec ARM: kprobes: Add condition code checking to Thumb emulation
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:42 +00:00
Jon Medhurst aceb487ab2 ARM: kprobes: Add Thumb breakpoint support
Extend the breakpoint insertion and catching functions to support Thumb
code.

As breakpoints are no longer of a fixed size, the flush_insns macro
is modified to take a size argument instead of an instruction count.

Note, we need both 16- and 32-bit Thumb breakpoints, because if we
were to use a 16-bit breakpoint to replace a 32-bit instruction which
was in an IT block, and the condition check failed, then the breakpoint
may not fire (it's unpredictable behaviour) and the CPU could then try
and execute the second half of the 32-bit Thumb instruction.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:42 +00:00
Jon Medhurst 856bc35639 ARM: Kconfig: Allow kprobes on Thumb-2 kernels
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:41 +00:00
Jon Medhurst 2437170710 ARM: kprobes: Add Thumb instruction decoding stubs
Extend arch_prepare_kprobe to support probing of Thumb code. For
the actual decoding of Thumb instructions, stub functions are
added which currently just reject the probe.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:41 +00:00
Jon Medhurst de41984003 ARM: kprobes: Make kprobes framework work on Thumb-2 kernels
Fix up kprobes framework so that it builds and correctly interworks on
Thumb-2 kernels.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:41 +00:00
Jon Medhurst aea490299f ARM: kprobes: Make str_pc_offset a constant on ARMv7
The str_pc_offset value is architecturally defined on ARMv7 onwards so
we can make it a compile time constant. This means on Thumb kernels the
runtime checking code isn't needed, which saves us from having to fix it
to work for Thumb.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:41 +00:00
Jon Medhurst 6c8df3300f ARM: kprobes: Move find_str_pc_offset into kprobes-common.c
Move str_pc_offset into kprobes-common.c as it will be needed by common
code later.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:41 +00:00
Jon Medhurst 1b59d87466 ARM: kprobes: Move is_writeback define to header file.
This will be used later in other files.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:41 +00:00
Jon Medhurst 0ab4c02dda ARM: kprobes: Add kprobes-common.c
This file will contain the instruction decoding and emulation code
which is common to both ARM and Thumb instruction sets.

For now, we will just move over condition_checks from kprobes-arm.c
This table is also renamed to kprobe_condition_checks to avoid polluting
the public namespace with a too generic name.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:41 +00:00
Jon Medhurst 221bf15ffd ARM: kprobes: Split out internal parts of kprobes.h
Later, we will be adding a considerable amount of internal
implementation definitions to kprobe header files and it would be good
to have these in a local header file alongside the source code, rather
than pollute the existing header which is included by all users of
kprobes.

To this end, we add arch/arm/kernel/kprobes.h and move into this the
existing internal definitions from arch/arm/include/asm/kprobes.h

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:40 +00:00
Jon Medhurst 691b2ff294 ARM: kprobes: Rename kprobes-decode.c to kprobes-arm.c
This file contains decoding and emulation functions for the ARM
instruction set. As we will later be adding a file for Thumb and a
file with common decoding functions, this renaming makes things clearer.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:40 +00:00
Jon Medhurst 592201a9f1 ARM: Thumb-2: Support Thumb-2 in undefined instruction handler
This patch allows undef_hooks to be specified for 32-bit Thumb
instructions and also to be used for Thumb kernel-side code.

32-bit Thumb instructions are specified in the form:
	((first_half << 16) | second_half)
which matches the layout used by the ARM ARM.

ptrace was handling 32-bit Thumb instructions by hooking the first
halfword and manually checking the second half. This method would be
broken by this patch so it is migrated to make use of the new Thumb-2
support.
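
A hook for a 32-bit Thumb instruction can now be registered directly;
a sketch (the encoding and handler name are illustrative):

	static struct undef_hook thumb32_hook = {
		.instr_mask	= 0xffffffff,
		.instr_val	= (0xf7f0 << 16) | 0xa000,	/* example only */
		.cpsr_mask	= PSR_T_BIT,
		.cpsr_val	= PSR_T_BIT,	/* only match in Thumb state */
		.fn		= my_handler,	/* hypothetical handler */
	};

	register_undef_hook(&thumb32_hook);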

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:40 +00:00
Jon Medhurst 594810621d ARM: Thumb-2: Fix exception return sequence to restore stack correctly
The implementation of svc_exit didn't take into account any stack hole
created by svc_entry; as happens with the undef handler when kprobes are
configured. The fix is to read the saved value of SP rather than trying
to calculate it.

Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-13 17:32:40 +00:00
Rafael J. Wysocki 796204142a ARM / shmobile: Use pm_genpd_poweroff_unused()
Make shmobile use pm_genpd_poweroff_unused() instead of open-coding
the powering off of PM domains that have no devices in use.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Magnus Damm <damm@opensource.se>
2011-07-13 12:32:07 +02:00
Tejun Heo 1e01979c8f x86, numa: Implement pfn -> nid mapping granularity check
SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use
sections array to map pfn to nid which is limited in granularity.  If
NUMA nodes are laid out such that the mapping cannot be accurate, boot
will fail triggering BUG_ON() in mminit_verify_page_links().

On 32bit, it's 512MiB w/ PAE and SPARSEMEM.  This seems to have been
granular enough until commit 2706a0bf7b (x86, NUMA: Enable
CONFIG_AMD_NUMA on 32bit too).  Apparently, there is a machine which
aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT.  This
led to the following BUG_ON().

 On node 0 totalpages: 2096615
   DMA zone: 32 pages used for memmap
   DMA zone: 0 pages reserved
   DMA zone: 3927 pages, LIFO batch:0
   Normal zone: 1740 pages used for memmap
   Normal zone: 220978 pages, LIFO batch:31
   HighMem zone: 16405 pages used for memmap
   HighMem zone: 1853533 pages, LIFO batch:31
 BUG: Int 6: CR2   (null)
      EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
      EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
      err   (null)  EIP c16209aa   CS 00000060  flg 00010002
 Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
          (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
        f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
 Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17
 Call Trace:
  [<c136b1e5>] ? early_fault+0x2e/0x2e
  [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
  [<c1620613>] ? memmap_init_zone+0xaf/0x10c
  [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
  [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
  [<c1601d80>] ? paging_init+0x112/0x118
  [<c15f578d>] ? setup_arch+0x791/0x82f
  [<c15f43d9>] ? start_kernel+0x6a/0x257

This patch implements node_map_pfn_alignment() which determines
maximum internode alignment and update numa_register_memblks() to
reject NUMA configuration if alignment exceeds the pfn -> nid mapping
granularity of the memory model as determined by PAGES_PER_SECTION.

This makes the problematic machine boot w/ flatmem by rejecting the
NUMA config and provides protection against crazy NUMA configurations.
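
The added check is, in essence (a sketch; node_map_pfn_alignment()
returns the largest power-of-two pfn alignment that all node boundaries
satisfy):

	pfn_align = node_map_pfn_alignment();
	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
		printk(KERN_WARNING "Node alignment %LuMB < min %LuMB, rejecting NUMA config\n",
		       PFN_PHYS(pfn_align) >> 20,
		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
		return -EINVAL;
	}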

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org
LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 21:58:29 -07:00
Tejun Heo d0ead15738 x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/
DISCONTIGMEM on x86-32 implements pfn -> nid mapping similarly to
SPARSEMEM; however, it calls each mapping unit ELEMENT instead of
SECTION.  This patch renames it to SECTION so that PAGES_PER_SECTION
is valid for both DISCONTIGMEM and SPARSEMEM.  This will be used by
the next patch to implement mapping granularity check.

This patch is a trivial constant rename.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110712074422.GA2872@htj.dyndns.org
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 21:58:11 -07:00
Maxime Ripard 3628c3f5c8 x86, reboot: Make Dell Latitude E6320 use reboot=pci
The Dell Latitude E6320 doesn't reboot unless reboot=pci is set.
Force it via a DMI quirk.

Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Link: http://lkml.kernel.org/r/1309269451-4966-1-git-send-email-maxime.ripard@free-electrons.com
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-07-12 21:42:48 -07:00
Naga Chumbalkar 42f0efc5aa x86, ioapic: Print IR_IO_APIC_route_entry when IR is enabled
When IR (interrupt remapping) is enabled print_IO_APIC() displays output according
to legacy RTE (redirection table entry) definitons:

 NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 00  1    0    0   0   0    0    0    00
 01 00  0    0    0   0   0    0    0    01
 02 00  0    0    0   0   0    0    0    02
 03 00  1    0    0   0   0    0    0    03
 04 00  1    0    0   0   0    0    0    04
 05 00  1    0    0   0   0    0    0    05
 06 00  1    0    0   0   0    0    0    06
...

The above output is as per Sec 3.2.4 of the IOAPIC datasheet:
82093AA I/O Advanced Programmable Interrupt Controller (IOAPIC):
http://download.intel.com/design/chipsets/datashts/29056601.pdf

Instead the output should display the fields as discussed in Sec 5.5.1
of the VT-d specification:

(Intel Virtualization Technology for Directed I/O:
http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf)

After the fix:
 NR Indx Fmt Mask Trig IRR Pol Stat Indx2 Zero Vect:
 00 0000 0   1    0    0   0   0    0     0    00
 01 000F 1   0    0    0   0   0    0     0    01
 02 0001 1   0    0    0   0   0    0     0    02
 03 0002 1   1    0    0   0   0    0     0    03
 04 0011 1   1    0    0   0   0    0     0    04
 05 0004 1   1    0    0   0   0    0     0    05
 06 0005 1   1    0    0   0   0    0     0    06
...

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110712211658.2939.93123.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 20:17:58 -07:00
Naga Chumbalkar 3040db92ee x86, ioapic: Print IRTE when IR is enabled
When "apic=debug" is used as a boot parameter, Linux prints the IOAPIC routing
entries in "dmesg". Below is output from IOAPIC whose apic_id is 8:

# dmesg | grep "routing entry"
IOAPIC[8]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0)
IOAPIC[8]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0)
IOAPIC[8]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0)
...

Similarly, when IR (interrupt remapping) is enabled, and the IRTE
(interrupt remapping table entry) is set up we should display it.

After the fix:

# dmesg | grep IRTE
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:31 Dest:00000000 SID:00F1 SQ:0 SVT:1)
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:30 Dest:00000000 SID:00F1 SQ:0 SVT:1)
IOAPIC[8]: Set IRTE entry (P:1 FPD:0 Dst_Mode:0 Redir_hint:1 Trig_Mode:0 Dlvry_Mode:0 Avail:0 Vector:33 Dest:00000000 SID:00F1 SQ:0 SVT:1)
...

The IRTE is defined in Sec 9.5 of the Intel VT-d Specification.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110712211704.2939.71291.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 14:34:00 -07:00
Naga Chumbalkar 2597085228 x86, x2apic: Preserve high 32-bits of IA32_APIC_BASE MSR
If there's no special reason to zero-out the "high" 32-bits of the IA32_APIC_BASE
MSR, let's preserve it.

The x2APIC Specification doesn't explicitly state any such requirement. (Sec 2.2
in: http://www.intel.com/Assets/PDF/manual/318148.pdf).
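
The change amounts to reading both halves of the MSR and writing the
high half back unchanged (a sketch):

	u32 lo, hi;

	rdmsr(MSR_IA32_APICBASE, lo, hi);
	if (!(lo & X2APIC_ENABLE))
		wrmsr(MSR_IA32_APICBASE, lo | X2APIC_ENABLE, hi); /* was hi=0 */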

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110712055831.2498.78521.sendpatchset@nchumbalkar.americas.cpqcorp.net
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-12 14:33:49 -07:00
Linus Torvalds 8d86e5f914 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/mm: Fix memory_block_size_bytes() for non-pseries
  mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header
2011-07-12 14:21:19 -07:00
Linus Torvalds d93a881dd7 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc:
  pcmcia: pxa2xx/vpac270: free gpios on exist rather than requesting
  ARM: pxa/raumfeld: fix device name for codec ak4104
  ARM: pxa/raumfeld: display initialisation fixes
  ARM: pxa/raumfeld: adapt to upcoming hardware change
  ARM: pxa: fix gpio_to_chip() clash with gpiolib namespace
  genirq: replace irq_gc_ack() with {set,clr}_bit variants (fwd)
  arm: mach-vt8500: add forgotten irq_data conversion
  ARM: pxa168: correct nand pmu setting
  ARM: pxa910: correct nand pmu setting
  ARM: pxa: fix PGSR register address calculation
2011-07-12 14:19:51 -07:00
Kevin Hilman c8c9fda506 OMAP: PM: disable idle on suspend for GPIO and UART
Until these drivers are runtime PM converted, their device power
states are managed by calling custom driver hooks late in the
idle/suspend path.  Therefore, do not let the suspend/resume core code
automatically idle these devices since they will be managed manually
by the OMAP PM core very late in the idle/suspend path.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-12 22:48:42 +02:00
Kevin Hilman 80c6d1e65c OMAP: PM: omap_device: add API to disable idle on suspend
By default, omap_devices will be automatically idled on suspend
(and re-enabled on resume.)  Using this new API, device init code
can disable this feature if desired.

NOTE: any driver/device that has been runtime PM converted should
      not be using this API.
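
Device init code would then do something like the following (a sketch;
usage as assumed from this series):

	/* e.g. in OMAP serial/GPIO init, after creating the omap_device: */
	omap_device_disable_idle_on_suspend(pdev);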

Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-12 22:48:29 +02:00
Kevin Hilman c03f007a8b OMAP: PM: omap_device: add system PM methods for PM domain handling
In the omap_device PM domain callbacks, use omap_device idle/enable to
automatically manage device idle states during system suspend/resume.

If an omap_device has not already been runtime suspended, the
->suspend_noirq() method of the PM domain will use omap_device_idle()
to idle the HW after calling the driver's ->runtime_suspend()
callback.  Similarly, upon resume, if the device was suspended during
->suspend_noirq(), the ->resume_noirq() method of the PM domain will
use omap_device_enable() to enable the HW and then call the driver's
->runtime_resume() callback.

If a device has already been runtime suspended, the noirq methods of
the PM domain leave the device runtime suspended by default.

However, if a driver needs to runtime resume a device during suspend
(for example, to change its wakeup settings), it may do so using
pm_runtime_get* in its ->suspend() callback.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-12 22:48:19 +02:00
Kevin Hilman 256a543597 OMAP: PM: omap_device: conditionally use PM domain runtime helpers
Build and use the runtime PM helper functions only when runtime
PM is actually enabled.

Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-12 22:48:03 +02:00
Christoph Lameter 688d3be815 percpu: Fixup __this_cpu_xchg* operations
Somehow we got into a situation where the __this_cpu_xchg() operations were
not defined in the same way as this_cpu_xchg() and friends. I had some build
failures under 32 bit that were addressed by these fixes.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2011-07-12 13:47:16 +02:00
Glauber Costa 9ddabbe72e KVM: KVM Steal time guest/host interface
To implement steal time, we need the hypervisor to pass the guest information
about how much time was spent running other processes outside the VM.
This is per-vcpu, and using the kvmclock structure for that is an abuse
we decided not to make.

In this patchset, I am introducing a new msr, KVM_MSR_STEAL_TIME, that
holds the memory area address containing information about steal time

This patch contains the headers for it. I am keeping it separate to facilitate
backports to people who want to backport the kernel part but not the
hypervisor, or the other way around.
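
The interface added is essentially (layout as introduced by this
series; the header spells the constant MSR_KVM_STEAL_TIME):

	#define MSR_KVM_STEAL_TIME	0x4b564d03

	struct kvm_steal_time {
		__u64 steal;	/* time spent runnable but not running, in ns */
		__u32 version;	/* guest checks this is even for consistency */
		__u32 flags;
		__u32 pad[12];	/* pad the structure to 64 bytes */
	};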

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:17:03 +03:00
Glauber Costa 4b6b35f55c KVM: Add constant to represent KVM MSRs enabled bit in guest/host interface
This patch is simple, put in a different commit so it can be more easily
shared between guest and hypervisor. It just defines a named constant
to indicate the enable bit for KVM-specific MSRs.
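
In essence:

	#define KVM_MSR_ENABLED	1
	/* a guest enables such an MSR by writing area_addr | KVM_MSR_ENABLED */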

Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric B Munson <emunson@mgebm.net>
CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:17:02 +03:00
Alexander Graf 29d03158f9 KVM: PPC: Remove prog_flags
Commit c8f729d408 (KVM: PPC: Deliver program interrupts right away instead
of queueing them) did away with all users of prog_flags, so we can just
remove it from the headers.

Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:17:00 +03:00
Paul Mackerras 9e368f2915 KVM: PPC: book3s_hv: Add support for PPC970-family processors
This adds support for running KVM guests in supervisor mode on those
PPC970 processors that have a usable hypervisor mode.  Unfortunately,
Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to
1), but the YDL PowerStation does have a usable hypervisor mode.

There are several differences between the PPC970 and POWER7 in how
guests are managed.  These differences are accommodated using the
CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature
bits.  Notably, on PPC970:

* The LPCR, LPID or RMOR registers don't exist, and the functions of
  those registers are provided by bits in HID4 and one bit in HID0.

* External interrupts can be directed to the hypervisor, but unlike
  POWER7 they are masked by MSR[EE] in non-hypervisor modes and use
  SRR0/1 not HSRR0/1.

* There is no virtual RMA (VRMA) mode; the guest must use an RMO
  (real mode offset) area.

* The TLB entries are not tagged with the LPID, so it is necessary to
  flush the whole TLB on partition switch.  Furthermore, when switching
  partitions we have to ensure that no other CPU is executing the tlbie
  or tlbsync instructions in either the old or the new partition,
  otherwise undefined behaviour can occur.

* The PMU has 8 counters (PMC registers) rather than 6.

* The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist.

* The SLB has 64 entries rather than 32.

* There is no mediated external interrupt facility, so if we switch to
  a guest that has a virtual external interrupt pending but the guest
  has MSR[EE] = 0, we have to arrange to have an interrupt pending for
  it so that we can get control back once it re-enables interrupts.  We
  do that by sending ourselves an IPI with smp_send_reschedule after
  hard-disabling interrupts.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:59 +03:00
Paul Mackerras 969391c58a powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits
This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to
indicate that we have a usable hypervisor mode, and another to indicate
that the processor conforms to PowerISA version 2.06.  We also add
another bit to indicate that the processor conforms to ISA version 2.01
and set that for PPC970 and derivatives.

Some PPC970 chips (specifically those in Apple machines) have a
hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode
is not useful in the sense that there is no way to run any code in
supervisor mode (HV=0 PR=0).  On these processors, the LPES0 and LPES1
bits in HID4 are always 0, and we use that as a way of detecting that
hypervisor mode is not useful.

Where we have a feature section in assembly code around code that
only applies on POWER7 in hypervisor mode, we use a construct like

END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)

The definition of END_FTR_SECTION_IFSET is such that the code will
be enabled (not overwritten with nops) only if all bits in the
provided mask are set.

Note that the CPU feature check in __tlbie() only needs to check the
ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called
if we are running bare-metal, i.e. in hypervisor mode.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:58 +03:00
Paul Mackerras aa04b4cc5b KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests
This adds infrastructure which will be needed to allow book3s_hv KVM to
run on older POWER processors, including PPC970, which don't support
the Virtual Real Mode Area (VRMA) facility, but only the Real Mode
Offset (RMO) facility.  These processors require a physically
contiguous, aligned area of memory for each guest.  When the guest does
an access in real mode (MMU off), the address is compared against a
limit value, and if it is lower, the address is ORed with an offset
value (from the Real Mode Offset Register (RMOR)) and the result becomes
the real address for the access.  The size of the RMA has to be one of
a set of supported values, which usually includes 64MB, 128MB, 256MB
and some larger powers of 2.

Since we are unlikely to be able to allocate 64MB or more of physically
contiguous memory after the kernel has been running for a while, we
allocate a pool of RMAs at boot time using the bootmem allocator.  The
size and number of the RMAs can be set using the kvm_rma_size=xx and
kvm_rma_count=xx kernel command line options.

KVM exports a new capability, KVM_CAP_PPC_RMA, to signal the availability
of the pool of preallocated RMAs.  The capability value is 1 if the
processor can use an RMA but doesn't require one (because it supports
the VRMA facility), or 2 if the processor requires an RMA for each guest.

This adds a new ioctl, KVM_ALLOCATE_RMA, which allocates an RMA from the
pool and returns a file descriptor which can be used to map the RMA.  It
also returns the size of the RMA in the argument structure.

Having an RMA means we will get multiple KVM_SET_USER_MEMORY_REGION
ioctl calls from userspace.  To cope with this, we now preallocate the
kvm->arch.ram_pginfo array when the VM is created with a size sufficient
for up to 64GB of guest memory.  Subsequently we will get rid of this
array and use memory associated with each memslot instead.

This moves most of the code that translates the user addresses into
host pfns (page frame numbers) out of kvmppc_prepare_vrma up one level
to kvmppc_core_prepare_memory_region.  Also, instead of having to look
up the VMA for each page in order to check the page size, we now check
that the pages we get are compound pages of 16MB.  However, if we are
adding memory that is mapped to an RMA, we don't bother with calling
get_user_pages_fast and instead just offset from the base pfn for the
RMA.

Typically the RMA gets added after vcpus are created, which makes it
inconvenient to have the LPCR (logical partition control register) value
in the vcpu->arch struct, since the LPCR controls whether the processor
uses RMA or VRMA for the guest.  This moves the LPCR value into the
kvm->arch struct and arranges for the MER (mediated external request)
bit, which is the only bit that varies between vcpus, to be set in
assembly code when going into the guest if there is a pending external
interrupt request.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:57 +03:00
Paul Mackerras 371fefd6f2 KVM: PPC: Allow book3s_hv guests to use SMT processor modes
This lifts the restriction that book3s_hv guests can only run one
hardware thread per core, and allows them to use up to 4 threads
per core on POWER7.  The host still has to run single-threaded.

This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
capability.  The return value of the ioctl querying this capability
is the number of vcpus per virtual CPU core (vcore), currently 4.

To use this, the host kernel should be booted with all threads
active, and then all the secondary threads should be offlined.
This will put the secondary threads into nap mode.  KVM will then
wake them from nap mode and use them for running guest code (while
they are still offline).  To wake the secondary threads, we send
them an IPI using a new xics_wake_cpu() function, implemented in
arch/powerpc/sysdev/xics/icp-native.c.  In other words, at this stage
we assume that the platform has a XICS interrupt controller and
we are using icp-native.c to drive it.  Since the woken thread will
need to acknowledge and clear the IPI, we also export the base
physical address of the XICS registers using kvmppc_set_xics_phys()
for use in the low-level KVM book3s code.

When a vcpu is created, it is assigned to a virtual CPU core.
The vcore number is obtained by dividing the vcpu number by the
number of threads per core in the host.  This number is exported
to userspace via the KVM_CAP_PPC_SMT capability.  If qemu wishes
to run the guest in single-threaded mode, it should make all vcpu
numbers be multiples of the number of threads per core.

We distinguish three states of a vcpu: runnable (i.e., ready to execute
the guest), blocked (that is, idle), and busy in host.  We currently
implement a policy that the vcore can run only when all its threads
are runnable or blocked.  This way, if a vcpu needs to execute elsewhere
in the kernel or in qemu, it can do so without being starved of CPU
by the other vcpus.

When a vcore starts to run, it executes in the context of one of the
vcpu threads.  The other vcpu threads all go to sleep and stay asleep
until something happens requiring the vcpu thread to return to qemu,
or to wake up to run the vcore (this can happen when another vcpu
thread goes from busy in host state to blocked).

It can happen that a vcpu goes from blocked to runnable state (e.g.
because of an interrupt), and the vcore it belongs to is already
running.  In that case it can start to run immediately as long as
none of the vcpus in the vcore have started to exit the guest.
We send the next free thread in the vcore an IPI to get it to start
to execute the guest.  It synchronizes with the other threads via
the vcore->entry_exit_count field to make sure that it doesn't go
into the guest if the other vcpus are exiting by the time that it
is ready to actually enter the guest.

Note that there is no fixed relationship between the hardware thread
number and the vcpu number.  Hardware threads are assigned to vcpus
as they become runnable, so we will always use the lower-numbered
hardware threads in preference to higher-numbered threads if not all
the vcpus in the vcore are runnable, regardless of which vcpus are
runnable.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:57 +03:00
David Gibson 54738c0971 KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode
This improves I/O performance for guests using the PAPR
paravirtualization interface by making the H_PUT_TCE hcall faster, by
implementing it in real mode.  H_PUT_TCE is used for updating virtual
IOMMU tables, and is used both for virtual I/O and for real I/O in the
PAPR interface.

Since this moves the IOMMU tables into the kernel, we define a new
KVM_CREATE_SPAPR_TCE ioctl to allow qemu to create the tables.  The
ioctl returns a file descriptor which can be used to mmap the newly
created table.  The qemu driver models use them in the same way as
userspace managed tables, but they can be updated directly by the
guest with a real-mode H_PUT_TCE implementation, reducing the number
of host/guest context switches during guest IO.

There are certain circumstances where it is useful for userland qemu
to write to the TCE table even if the kernel H_PUT_TCE path is used
most of the time.  Specifically, allowing this will avoid awkwardness
when we need to reset the table.  More importantly, we will in the
future need to write the table in order to restore its state after a
checkpoint resume or migration.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:56 +03:00
Paul Mackerras a8606e20e4 KVM: PPC: Handle some PAPR hcalls in the kernel
This adds the infrastructure for handling PAPR hcalls in the kernel,
either early in the guest exit path while we are still in real mode,
or later once the MMU has been turned back on and we are in the full
kernel context.  The advantage of handling hcalls in real mode if
possible is that we avoid two partition switches -- and this will
become more important when we support SMT4 guests, since a partition
switch means we have to pull all of the threads in the core out of
the guest.  The disadvantage is that we can only access the kernel
linear mapping, not anything vmalloced or ioremapped, since the MMU
is off.

This also adds code to handle the following hcalls in real mode:

H_ENTER       Add an HPTE to the hashed page table
H_REMOVE      Remove an HPTE from the hashed page table
H_READ        Read HPTEs from the hashed page table
H_PROTECT     Change the protection bits in an HPTE
H_BULK_REMOVE Remove up to 4 HPTEs from the hashed page table
H_SET_DABR    Set the data address breakpoint register

Plus code to handle the following hcalls in the kernel:

H_CEDE        Idle the vcpu until an interrupt or H_PROD hcall arrives
H_PROD        Wake up a ceded vcpu
H_REGISTER_VPA Register a virtual processor area (VPA)

The code that runs in real mode has to be in the base kernel, not in
the module, if KVM is compiled as a module.  The real-mode code can
only access the kernel linear mapping, not vmalloc or ioremap space.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:55 +03:00
Paul Mackerras de56a948b9 KVM: PPC: Add support for Book3S processors in hypervisor mode
This adds support for KVM running on 64-bit Book 3S processors,
specifically POWER7, in hypervisor mode.  Using hypervisor mode means
that the guest can use the processor's supervisor mode.  That means
that the guest can execute privileged instructions and access privileged
registers itself without trapping to the host.  This gives excellent
performance, but does mean that KVM cannot emulate a processor
architecture other than the one that the hardware implements.

This code assumes that the guest is running paravirtualized using the
PAPR (Power Architecture Platform Requirements) interface, which is the
interface that IBM's PowerVM hypervisor uses.  That means that existing
Linux distributions that run on IBM pSeries machines will also run
under KVM without modification.  In order to communicate the PAPR
hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
to include/linux/kvm.h.

Currently the choice between book3s_hv support and book3s_pr support
(i.e. the existing code, which runs the guest in user mode) has to be
made at kernel configuration time, so a given kernel binary can only
do one or the other.

This new book3s_hv code doesn't support MMIO emulation at present.
Since we are running paravirtualized guests, this isn't a serious
restriction.

With the guest running in supervisor mode, most exceptions go straight
to the guest.  We will never get data or instruction storage or segment
interrupts, alignment interrupts, decrementer interrupts, program
interrupts, single-step interrupts, etc., coming to the hypervisor from
the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
exception entry path so that we don't have to do the KVM test on entry
to those exception handlers.

We do however get hypervisor decrementer, hypervisor data storage,
hypervisor instruction storage, and hypervisor emulation assist
interrupts, so we have to handle those.

In hypervisor mode, real-mode accesses can access all of RAM, not just
a limited amount.  Therefore we put all the guest state in the vcpu.arch
and use the shadow_vcpu in the PACA only for temporary scratch space.
We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
We don't have a shared page with the guest, but we still need a
kvm_vcpu_arch_shared struct to store the values of various registers,
so we include one in the vcpu_arch struct.

The POWER7 processor has a restriction that all threads in a core have
to be in the same partition.  MMU-on kernel code counts as a partition
(partition 0), so we have to do a partition switch on every entry to and
exit from the guest.  At present we require the host and guest to run
in single-thread mode because of this hardware restriction.

This code allocates a hashed page table for the guest and initializes
it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
require that the guest memory is allocated using 16MB huge pages, in
order to simplify the low-level memory management.  This also means that
we can get away without tracking paging activity in the host for now,
since huge pages can't be paged or swapped.

This also adds a few new exports needed by the book3s_hv code.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:54 +03:00
Paul Mackerras 3c42bf8a71 KVM: PPC: Split host-state fields out of kvmppc_book3s_shadow_vcpu
There are several fields in struct kvmppc_book3s_shadow_vcpu that
temporarily store bits of host state while a guest is running,
rather than anything relating to the particular guest or vcpu.
This splits them out into a new kvmppc_host_state structure and
modifies the definitions in asm-offsets.c to suit.

On 32-bit, we have a kvmppc_host_state structure inside the
kvmppc_book3s_shadow_vcpu since the assembly code needs to be able
to get to them both with one pointer.  On 64-bit they are separate
fields in the PACA.  This means that on 64-bit we don't need to
copy the kvmppc_host_state in and out on vcpu load/unload, and
in future will mean that the book3s_hv code doesn't need a
shadow_vcpu struct in the PACA at all.  That does mean that we
have to be careful not to rely on any values persisting in the
hstate field of the paca across any point where we could block
or get preempted.
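
As a sketch of the resulting layout (the field names below are
illustrative assumptions, not the actual definitions):

  struct kvmppc_host_state {
          unsigned long host_r1;          /* saved host stack pointer */
          unsigned long host_msr;         /* saved host MSR */
          unsigned long scratch0;         /* temporary scratch */
  };

  #ifdef CONFIG_PPC32
  /* 32-bit: embedded, so one pointer reaches host and shadow state */
  struct kvmppc_book3s_shadow_vcpu {
          struct kvmppc_host_state hstate;
          /* guest shadow state follows */
  };
  #endif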

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:53 +03:00
Paul Mackerras 923c53caea powerpc: Set up LPCR for running guest partitions
In hypervisor mode, the LPCR controls several aspects of guest
partitions, including virtual partition memory mode, and also controls
whether the hypervisor decrementer interrupts are enabled.  This sets
up LPCR at boot time so that guest partitions will use a virtual real
memory area (VRMA) composed of 16MB large pages, and hypervisor
decrementer interrupts are disabled.
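
In code terms the boot-time setup amounts to something like the
following sketch (which LPCR bits are touched, and under which names,
should be treated as assumptions):

  unsigned long lpcr = mfspr(SPRN_LPCR);

  lpcr &= ~LPCR_HDICE;    /* disable hypervisor decrementer interrupts */
  lpcr |= LPCR_VPM0;      /* virtualized partition memory via the VRMA */
  mtspr(SPRN_LPCR, lpcr);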

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:52 +03:00
Paul Mackerras df6909e5d5 KVM: PPC: Move guest enter/exit down into subarch-specific code
Instead of doing the kvm_guest_enter/exit() and local_irq_dis/enable()
calls in powerpc.c, this moves them down into the subarch-specific
book3s_pr.c and booke.c.  This eliminates an extra local_irq_enable()
call in book3s_pr.c, and will be needed for when we do SMT4 guest
support in the book3s hypervisor mode code.
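
After the move, each subarch's run path brackets the low-level entry
itself, roughly like this (a sketch; the __kvmppc_vcpu_run() name is
an assumption):

  local_irq_disable();
  kvm_guest_enter();

  ret = __kvmppc_vcpu_run(kvm_run, vcpu);   /* low-level guest entry */

  kvm_guest_exit();
  local_irq_enable();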

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:51 +03:00
Paul Mackerras f9e0554dec KVM: PPC: Pass init/destroy vm and prepare/commit memory region ops down
This arranges for the top-level arch/powerpc/kvm/powerpc.c file to
pass down some of the calls it gets to the lower-level subarchitecture
specific code.  The lower-level implementations (in booke.c and book3s.c)
are no-ops.  The coming book3s_hv.c will need this.
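
The stubs are trivial; as a sketch (the names follow the kvmppc_core_*
convention used by the other hooks and are assumptions here):

  int kvmppc_core_init_vm(struct kvm *kvm)
  {
          return 0;
  }

  void kvmppc_core_destroy_vm(struct kvm *kvm)
  {
  }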

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:50 +03:00
Paul Mackerras 3cf658b605 KVM: PPC: Deliver program interrupts right away instead of queueing them
Doing so means that we don't have to save the flags anywhere and gets
rid of the last reference to to_book3s(vcpu) in arch/powerpc/kvm/book3s.c.

Doing so is OK because a program interrupt won't be generated at the
same time as any other synchronous interrupt.  If a program interrupt
and an asynchronous interrupt (external or decrementer) are generated
at the same time, the program interrupt will be delivered, which is
correct because it has a higher priority, and then the asynchronous
interrupt will be masked.

We don't ever generate system reset or machine check interrupts to the
guest, but if we did, then we would need to make sure they got delivered
rather than the program interrupt.  The current code would be wrong in
this situation anyway since it would deliver the program interrupt as
well as the reset/machine check interrupt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:49 +03:00
Paul Mackerras b01c8b54a1 powerpc, KVM: Rework KVM checks in first-level interrupt handlers
Instead of branching out-of-line with the DO_KVM macro to check if we
are in a KVM guest at the time of an interrupt, this moves the KVM
check inline in the first-level interrupt handlers.  This speeds up
the non-KVM case and makes sure that none of the interrupt handlers
are missing the check.

Because the first-level interrupt handlers are now larger, some things
had to be moved out of line in exceptions-64s.S.

This all necessitated some minor changes to the interrupt entry code
in KVM.  This also streamlines the book3s_32 KVM test.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:48 +03:00
Paul Mackerras f05ed4d56e KVM: PPC: Split out code from book3s.c into book3s_pr.c
In preparation for adding code to enable KVM to use hypervisor mode
on 64-bit Book 3S processors, this splits book3s.c into two files,
book3s.c and book3s_pr.c, where book3s_pr.c contains the code that is
specific to running the guest in problem state (user mode) and book3s.c
contains code which should apply to all Book 3S processors.

In doing this, we abstract some details, namely the interrupt offset,
updating the interrupt pending flag, and detecting if the guest is
in a critical section.  These are all things that will be different
when we use hypervisor mode.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:47 +03:00
Paul Mackerras c4befc58a0 KVM: PPC: Move fields between struct kvm_vcpu_arch and kvmppc_vcpu_book3s
This moves the slb field, which represents the state of the emulated
SLB, from the kvmppc_vcpu_book3s struct to the kvm_vcpu_arch, and the
hpte_hash_[v]pte[_long] fields from kvm_vcpu_arch to kvmppc_vcpu_book3s.
This is in accord with the principle that the kvm_vcpu_arch struct
represents the state of the emulated CPU, and the kvmppc_vcpu_book3s
struct holds the auxiliary data structures used in the emulation.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:46 +03:00
Paul Mackerras 149dbdb185 KVM: PPC: Fix machine checks on 32-bit Book3S
Commit 69acc0d3ba ("KVM: PPC: Resolve real-mode handlers through
function exports") resulted in vcpu->arch.trampoline_lowmem and
vcpu->arch.trampoline_enter ending up with kernel virtual addresses
rather than physical addresses.  This is OK on 64-bit Book3S machines,
which ignore the top 4 bits of the effective address in real mode,
but on 32-bit Book3S machines, accessing these addresses in real mode
causes machine check interrupts, as the hardware uses the whole
effective address as the physical address in real mode.

This fixes the problem by using __pa() to convert these addresses
to physical addresses.
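
In other words, roughly (a sketch; the exact symbol names are
assumptions):

  vcpu->arch.trampoline_lowmem = __pa(kvmppc_handler_lowmem_trampoline);
  vcpu->arch.trampoline_enter = __pa(kvmppc_handler_trampoline_enter);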

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:45 +03:00
Takuya Yoshikawa 3c8c652ae4 KVM: MMU: Introduce is_last_gpte() to clean up walk_addr_generic()
Suggested by Ingo and Avi.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:44 +03:00
Takuya Yoshikawa 92c1c1e85b KVM: MMU: Rename the walk label in walk_addr_generic()
The current name does not explain the meaning well.  So give it a better
name "retry_walk" to show that we are trying the walk again.

This was suggested by Ingo Molnar.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:43 +03:00
Takuya Yoshikawa 134291bf3c KVM: MMU: Clean up the error handling of walk_addr_generic()
Avoid a two-step jump to the error handling part.  This eliminates the use
of the variables present and rsvd_fault.

We also use the const type qualifier to show that write/user/fetch_fault
do not change in the function.

Both of these were suggested by Ingo Molnar.

Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:42 +03:00
Marcelo Tosatti f8f7e5ee10 Revert "KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB"
This reverts commit bee931d31e588b8eb86b7edee32fac2d16930cd7.

TLB flush should be done lazily during guest entry, in
kvm_mmu_load().

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:41 +03:00
Scott Wood 1aee47a027 KVM: PPC: e500: Don't search over the entire TLB0.
Only look in the 4 entries that could possibly contain the
entry we're looking for.
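
A sketch of the narrowed search, assuming TLB0 is 4-way set-associative
with the set selected by low EA bits (the helper and the tlbe_matches()
predicate are hypothetical):

  static struct tlbe *tlb0_find_entry(struct tlbe *tlb0, unsigned int sets,
                                      gva_t eaddr, unsigned int pid)
  {
          unsigned int set = (eaddr >> PAGE_SHIFT) & (sets - 1);
          unsigned int way;

          for (way = 0; way < 4; way++) {
                  struct tlbe *e = &tlb0[set * 4 + way];

                  if (tlbe_matches(e, eaddr, pid))   /* hypothetical */
                          return e;
          }
          return NULL;
  }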

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:40 +03:00
Liu Yu dd9ebf1f94 KVM: PPC: e500: Add shadow PID support
Dynamically assign host PIDs to guest PIDs, splitting each guest PID into
multiple host (shadow) PIDs based on kernel/user and MSR[IS/DS].  Use
both PID0 and PID1 so that the shadow PIDs for the right mode can be
selected, corresponding both to guest TID = zero and guest TID = guest
PID.

This allows us to significantly reduce the frequency of needing to
invalidate the entire TLB.  When the guest mode or PID changes, we just
update the host PID0/PID1.  And since the allocation of shadow PIDs is
global, multiple guests can share the TLB without conflict.

Note that KVM does not yet support the guest setting PID1 or PID2 to
a value other than zero.  This will need to be fixed for nested KVM
to work.  Until then, we enforce the requirement for guest PID1/PID2
to stay zero by failing the emulation if the guest tries to set them
to something else.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:39 +03:00
Liu Yu 08b7fa92b9 KVM: PPC: e500: Stop keeping shadow TLB
Instead of a fully separate set of TLB entries, keep just the
pfn and dirty status.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:38 +03:00
Scott Wood a4cd8b23ac KVM: PPC: e500: enable magic page
This is a shared page used for paravirtualization.  It is always present
in the guest kernel's effective address space at the address indicated
by the hypercall that enables it.

The physical address specified by the hypercall is not used, as
e500 does not have real mode.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:37 +03:00
Scott Wood 9973d54eea KVM: PPC: e500: Support large page mappings of PFNMAP vmas.
This allows large pages to be used on guest mappings backed by things like
/dev/mem, resulting in a significant speedup when guest memory
is mapped this way (it's useful for directly-assigned MMIO, too).

This is not a substitute for hugetlbfs integration, but is useful for
configurations where devices are directly assigned on chips without an
IOMMU -- in these cases, we need guest physical and true physical to
match, and be contiguous, so static reservation and mapping via /dev/mem
is the most straightforward way to set things up.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:36 +03:00
Scott Wood 59c1f4e35c KVM: PPC: e500: Eliminate shadow_pages[], and use pfns instead.
This is in line with what other architectures do, and will allow us to
map things other than ordinary, unreserved kernel pages -- such as
dedicated devices, or large contiguous reserved regions.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:35 +03:00
Scott Wood 0ef309956c KVM: PPC: e500: don't use MAS0 as intermediate storage.
This avoids races.  It also means that we use the shadow TLB way,
rather than the hardware hint -- if this is a problem, we could do
a tlbsx before inserting a TLB0 entry.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:34 +03:00
Scott Wood 6fc4d1eb91 KVM: PPC: e500: Disable preloading TLB1 in tlb_load().
Since TLB1 loading doesn't check the shadow TLB before allocating another
entry, you can get duplicates.

Once shadow PIDs are enabled in a later patch, we won't need to
invalidate the TLB on every switch, so this optimization won't be
needed anyway.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:33 +03:00
Scott Wood 4cd35f675b KVM: PPC: e500: Save/restore SPE state
This is done lazily.  The SPE save will be done only if the guest has
used SPE since the last preemption or heavyweight exit.  Restore will be
done only on demand, when enabling MSR_SPE in the shadow MSR, in response
to an SPE fault or mtmsr emulation.

For SPEFSCR, Linux already switches it on context switch (non-lazily), so
the only remaining bit is to save it between qemu and the guest.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:32 +03:00
Scott Wood ecee273fc4 KVM: PPC: booke: use shadow_msr
Keep the guest MSR and the guest-mode true MSR separate, rather than
modifying the guest MSR on each guest entry to produce a true MSR.

Any bits which should be modified based on guest MSR must be explicitly
propagated from vcpu->arch.shared->msr to vcpu->arch.shadow_msr in
kvmppc_set_msr().
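
So kvmppc_set_msr() ends up shaped roughly like this (a sketch; the
MSR_GUEST_BITS mask name is an assumption):

  void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
  {
          vcpu->arch.shared->msr = new_msr;

          /* Only whitelisted bits propagate to the true, guest-mode MSR. */
          vcpu->arch.shadow_msr &= ~MSR_GUEST_BITS;       /* assumed mask */
          vcpu->arch.shadow_msr |= new_msr & MSR_GUEST_BITS;
  }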

While we're modifying the guest entry code, reorder a few instructions
to bury some load latencies.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:32 +03:00
Scott Wood c51584d52e powerpc/e500: SPE register saving: take arbitrary struct offset
Previously, these macros hardcoded THREAD_EVR0 as the base of the save
area, relative to the base register passed.  This base offset is now
passed as a separate macro parameter, allowing reuse with other SPE
save areas, such as used by KVM.

Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:31 +03:00
yu liu 685659ee70 powerpc/e500: Save SPEFSCR in flush_spe_to_thread()
giveup_spe() saves the SPE state which is protected by MSR[SPE].
However, modifying SPEFSCR does not trap when MSR[SPE]=0.
And since SPEFSCR is already saved/restored in _switch(),
not all the callers want to save SPEFSCR again.
Thus, saving SPEFSCR should not belong to giveup_spe().

This patch moves SPEFSCR saving to flush_spe_to_thread(),
and cleans up the caller that needs to save SPEFSCR accordingly.
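
A sketch of the result, close to (but not necessarily identical with)
the final code:

  void flush_spe_to_thread(struct task_struct *tsk)
  {
          if (tsk->thread.regs) {
                  preempt_disable();
                  if (tsk->thread.regs->msr & MSR_SPE) {
                          tsk->thread.spefscr = mfspr(SPRN_SPEFSCR);
                          giveup_spe(tsk);
                  }
                  preempt_enable();
          }
  }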

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:30 +03:00
Alexander Graf a22a2daccf KVM: PPC: Resolve real-mode handlers through function exports
Up until now, Book3S KVM had variables stored in the kernel that a kernel module
or the kvm code in the kernel could read from to figure out where some real mode
helper functions are located.

This is all unnecessary. The high bits of the EA get ignored in real mode, so we
can just use the pointer as is. Also, it's a lot easier on relocations when we
use the normal way of resolving the address to a function, instead of jumping
through hoops.

This patch fixes compilation with CONFIG_RELOCATABLE=y.

Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:29 +03:00
Stuart Yoder 24294b9a3f KVM: PPC: fix partial application of "exit timing in ticks"
When http://www.spinics.net/lists/kvm-ppc/msg02664.html
was applied to produce commit b51e7aa7ed6d8d134d02df78300ab0f91cfff4d2,
the removal of the conversion in add_exit_timing was left out.

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12 13:16:28 +03:00
Avi Kivity 45bd07b9d5 KVM: MMU: make kvm_mmu_reset_context() flush the guest TLB
kvm_set_cr0() and kvm_set_cr4(), and possible other functions,
assume that kvm_mmu_reset_context() flushes the guest TLB.  However,
it does not.

Fix by flushing the TLB (and syncing the new root as well).

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:27 +03:00
Avi Kivity 411c588dfb KVM: MMU: Adjust shadow paging to work when SMEP=1 and CR0.WP=0
When CR0.WP=0, we sometimes map user pages as kernel pages (to allow
the kernel to write to them).  Unfortunately this also allows the kernel
to fetch from these pages, even if CR4.SMEP is set.

Adjust for this by also setting NX on the spte in these circumstances.
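
The adjustment itself is tiny; a sketch of the relevant branch in the
spte construction path (the surrounding conditions are elided):

  /* Kernel-writable mapping of a user page while CR0.WP=0: the kernel
   * may write to it, but with SMEP set it must not fetch from it. */
  if (!is_write_protection(vcpu) &&
      kvm_read_cr4_bits(vcpu, X86_CR4_SMEP))
          spte |= PT64_NX_MASK;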

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:26 +03:00
Yang, Wei a01c8f9b4e KVM: Enable ERMS feature support for KVM
This patch exposes ERMS feature to KVM guests.

The REP MOVSB/STOSB instructions, when ERMS is present, enhance
fast-string operation by moving data with the largest load/store sizes
possible.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:25 +03:00
Yang, Wei 176f61da82 KVM: Expose RDWRGSFS bit to KVM guests
This patch exposes RDWRGSFS bit to KVM guests.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:24 +03:00
Yang, Wei 74dc2b4ffe KVM: Add RDWRGSFS support when setting CR4
This patch adds RDWRGSFS support when setting CR4.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:23 +03:00
Yang, Wei d9c3476d8a KVM: Remove RDWRGSFS bit from CR4_RESERVED_BITS
This patch removes RDWRGSFS bit from CR4_RESERVED_BITS.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:22 +03:00
Yang, Wei Y 4a00efdf0c KVM: Enable DRNG feature support for KVM
This patch exposes DRNG feature to KVM guests.

The RDRAND instruction can provide software with sequences of
random numbers generated from white noise.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:21 +03:00
Andre Przywara 02668b061d KVM: fix XSAVE bit scanning (now properly)
commit 123108f1c1aafd51d6a5c79cc04d7999dd88a930 tried to fix KVMs
XSAVE valid feature scanning, but it was wrong. It was not considering
the sparse nature of this bitfield, instead reading values from
uninitialized members of the entries array.
This patch now separates subleaf indices from KVM's array indices
and fills the entry before querying its value.
This fixes AVX support in KVM guests.
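
The corrected loop walks the architectural subleaf index separately
from the output array index, roughly as follows (a sketch;
do_cpuid_1_ent() fills one entry for a given function/index pair):

  for (idx = 1, i = 1; idx < 64; ++idx) {
          if (*nent >= maxnent)
                  break;
          do_cpuid_1_ent(&entry[i], function, idx);
          if (entry[i].eax == 0)      /* subleaf not supported: skip */
                  continue;
          ++*nent;
          ++i;
  }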

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:20 +03:00
Yang, Wei Y e57d4a356a KVM: Add instruction fetch checking when walking guest page table
This patch adds instruction fetch checking when walking guest page table,
to implement SMEP when emulating instead of executing natively.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:15 +03:00
Yang, Wei Y 611c120f74 KVM: Mask function7 ebx against host capability word9
This patch masks CPUID leaf 7 ebx against host capability word9.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:14 +03:00
Yang, Wei Y c68b734fba KVM: Add SMEP support when setting CR4
This patch adds SMEP handling when setting CR4.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:13 +03:00
Yang, Wei Y 8d9c975fc5 KVM: Remove SMEP bit from CR4_RESERVED_BITS
This patch removes SMEP bit from CR4_RESERVED_BITS.

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Shan, Haitao <haitao.shan@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:12 +03:00
Nadav Har'El 509c75ea19 KVM: nVMX: Fix bug preventing more than two levels of nesting
The nested VMX feature is supposed to fully emulate VMX for the guest. This
(theoretically) not only allows it to run its own guests, but also
to further emulate VMX for its own guests, and allow arbitrarily deep nesting.

This patch fixes a bug (discovered by Kevin Tian) in handling a VMLAUNCH
by L2, which prevented deeper nesting.

Deeper nesting now works (I only actually tested L3), but is currently
*absurdly* slow, to the point of being unusable.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:11 +03:00
Avi Kivity 9dac77fa40 KVM: x86 emulator: fold decode_cache into x86_emulate_ctxt
This saves a lot of pointless casts between x86_emulate_ctxt and decode_cache.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:09 +03:00
Avi Kivity 36dd9bb5ce KVM: x86 emulator: rename decode_cache::eip to _eip
The name eip conflicts with a field of the same name in x86_emulate_ctxt,
which we plan to fold decode_cache into.

The name _eip is unfortunate, but what's really needed is a refactoring
here, not a better name.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:09 +03:00
Jan Kiszka 2e4ce7f574 KVM: VMX: Silence warning on 32-bit hosts
The variable 'a' is unused now on CONFIG_X86_32.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:08 +03:00
Takuya Yoshikawa f411e6cdc2 KVM: x86 emulator: Use opcode::execute for CLI/STI(FA/FB)
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:07 +03:00
Takuya Yoshikawa d06e03adcb KVM: x86 emulator: Use opcode::execute for LOOP/JCXZ
  LOOP/LOOPcc      : E0-E2
  JCXZ/JECXZ/JRCXZ : E3

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:06 +03:00
Takuya Yoshikawa 5c5df76b8b KVM: x86 emulator: Clean up INT n/INTO/INT 3(CC/CD/CE)
Call emulate_int() directly to avoid spaghetti goto's.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:04 +03:00
Takuya Yoshikawa 1bd5f469b2 KVM: x86 emulator: Use opcode::execute for MOV(8C/8E)
Different functions for those which take segment register operands.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:03 +03:00
Takuya Yoshikawa ebda02c2a5 KVM: x86 emulator: Use opcode::execute for RET(C3)
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:02 +03:00
Takuya Yoshikawa e4f973ae91 KVM: x86 emulator: Use opcode::execute for XCHG(86/87)
In addition, replace one "goto xchg" with an em_xchg() call.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:01 +03:00
Takuya Yoshikawa 9f21ca599c KVM: x86 emulator: Use opcode::execute for TEST(84/85, A8/A9)
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:16:00 +03:00
Takuya Yoshikawa db5b0762f3 KVM: x86 emulator: Use opcode::execute for some instructions
Move the following functions to the opcode tables:

  RET (Far return) : CB
  IRET             : CF
  JMP (Jump far)   : EA

  SYSCALL          : 0F 05
  CLTS             : 0F 06
  SYSENTER         : 0F 34
  SYSEXIT          : 0F 35

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:15:59 +03:00
Takuya Yoshikawa e01991e71a KVM: x86 emulator: Rename emulate_xxx() to em_xxx()
The next patch will change these to be called by opcode::execute.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:15:58 +03:00
Takuya Yoshikawa 9d74191ab1 KVM: x86 emulator: Use the pointers ctxt and c consistently
We should use the local variables ctxt and c when the emulate_ctxt and
decode appear many times.  At least, we need to be consistent about
how we use these in a function.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 13:15:57 +03:00
Russell King 022ae537b2 ARM: dma: replace ISA_DMA_THRESHOLD with a variable
ISA_DMA_THRESHOLD has been unused by non-arch code, so let's now get
rid of it from ARM by replacing it with arm_dma_zone_mask.  Move
dma_supported() and dma_set_mask() out of line, and have
dma_supported() check this new variable instead.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-12 11:08:12 +01:00
Ido Yariv f299bb9527 arm: davinci: Fix low level gpio irq handlers' argument
Commit 7416401 ("arm: davinci: Fix fallout from generic irq chip
conversion") introduced a bug, causing low level interrupt handlers to
get a bogus irq number as an argument. The gpio irq handler falsely
assumes that the handler data is the irq base number and that is no
longer true.

Set the irq handler data to be a pointer to the corresponding gpio
controller. The chained irq handler can then use it to extract both the
irq base number and the gpio registers structure.
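
That is, roughly (a sketch; davinci_gpio_controller stands for the
controller structure described above, and chips[] is hypothetical):

  /* at setup time: hand the handler the controller, not an irq base */
  irq_set_handler_data(bank_irq, &chips[bank]);

  /* in the chained handler: */
  struct davinci_gpio_controller *d = irq_get_handler_data(irq);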

Signed-off-by: Ido Yariv <ido@wizery.com>
CC: Thomas Gleixner <tglx@linutronix.de>
[nsekhar@ti.com: renamed "ctl" to "d", simplified indexing logic for chips and
took care of odd bank handling in irq handler]
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
2011-07-12 14:21:43 +05:30
Nadav Har'El 2844d84905 KVM: nVMX: Miscellaneous small corrections
Small corrections of KVM (spelling, etc.) not directly related to nested VMX.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:19 +03:00
Nadav Har'El 7b8050f570 KVM: nVMX: Add VMX to list of supported cpuid features
If the "nested" module option is enabled, add the "VMX" CPU feature to the
list of CPU features KVM advertises with the KVM_GET_SUPPORTED_CPUID ioctl.

Qemu uses this ioctl, and intersects KVM's list with its own list of desired
cpu features (depending on the -cpu option given to qemu) to determine the
final list of features presented to the guest.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:19 +03:00
Nadav Har'El 7991825b85 KVM: nVMX: Additional TSC-offset handling
In the unlikely case that L1 does not capture MSR_IA32_TSC, L0 needs to
emulate this MSR write by L2 by modifying vmcs02.tsc_offset. We also need to
set vmcs12.tsc_offset, for this change to survive the next nested entry (see
prepare_vmcs02()).
Additionally, we also need to modify vmx_adjust_tsc_offset: The semantics
of this function is that the TSC of all guests on this vcpu, L1 and possibly
several L2s, need to be adjusted. To do this, we need to adjust vmcs01's
tsc_offset (this offset will also apply to each L2s we enter). We can't set
vmcs01 now, so we have to remember this adjustment and apply it when we
later exit to L1.
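
Roughly, the adjusted function becomes (a sketch of the shape described
above):

  static void vmx_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment)
  {
          vmcs_write64(TSC_OFFSET, vmcs_read64(TSC_OFFSET) + adjustment);
          if (is_guest_mode(vcpu))
                  /* remember it; vmcs01 is updated when we exit to L1 */
                  to_vmx(vcpu)->nested.vmcs01_tsc_offset += adjustment;
  }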

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:19 +03:00
Nadav Har'El 36cf24e01e KVM: nVMX: Further fixes for lazy FPU loading
KVM's "Lazy FPU loading" means that sometimes L0 needs to set CR0.TS, even
if a guest didn't set it. Moreover, L0 must also trap CR0.TS changes and
NM exceptions, even if we have a guest hypervisor (L1) who didn't want these
traps. And of course, conversely: If L1 wanted to trap these events, we
must let it, even if L0 is not interested in them.

This patch fixes some existing KVM code (in update_exception_bitmap(),
vmx_fpu_activate(), vmx_fpu_deactivate()) to do the correct merging of L0's
and L1's needs. Note that handle_cr() was already fixed in the above patch,
and that new code in introduced in previous patches already handles CR0
correctly (see prepare_vmcs02(), prepare_vmcs12(), and nested_vmx_vmexit()).
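
The merge in update_exception_bitmap() boils down to something like
this (a sketch, with L0's own bits abbreviated):

  u32 eb = (1u << PF_VECTOR) | (1u << NM_VECTOR);     /* L0's needs */

  if (is_guest_mode(vcpu))
          eb |= get_vmcs12(vcpu)->exception_bitmap;   /* L1's needs */
  vmcs_write32(EXCEPTION_BITMAP, eb);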

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:18 +03:00
Nadav Har'El eeadf9e755 KVM: nVMX: Handling of CR0 and CR4 modifying instructions
When L2 tries to modify CR0 or CR4 (with mov or clts), and modifies a bit
which L1 asked to shadow (via CR[04]_GUEST_HOST_MASK), we already do the right
thing: we let L1 handle the trap (see nested_vmx_exit_handled_cr() in a
previous patch).
When L2 modifies bits that L1 doesn't care about, we let it think (via
CR[04]_READ_SHADOW) that it did these modifications, while only changing
(in GUEST_CR[04]) the bits that L0 doesn't shadow.

This is needed for correct handling of CR0.TS for lazy FPU loading: L0 may
want to leave TS on, while pretending to allow the guest to change it.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:18 +03:00
Nadav Har'El 66c78ae40c KVM: nVMX: Correct handling of idt vectoring info
This patch adds correct handling of IDT_VECTORING_INFO_FIELD for the nested
case.

When a guest exits while delivering an interrupt or exception, we get this
information in IDT_VECTORING_INFO_FIELD in the VMCS. When L2 exits to L1,
there's nothing we need to do, because L1 will see this field in vmcs12, and
handle it itself. However, when L2 exits and L0 handles the exit itself and
plans to return to L2, L0 must inject this event to L2.

In the normal non-nested case, the idt_vectoring_info is discovered after
the exit, and the decision to inject (though not the injection itself) is made
at that point. However, in the nested case a decision of whether to return
to L2 or L1 also happens during the injection phase (see the previous
patches), so in the nested case we can only decide what to do about the
idt_vectoring_info right after the injection, i.e., in the beginning of
vmx_vcpu_run, which is the first time we know for sure if we're staying in
L2.

Therefore, when we exit L2 (is_guest_mode(vcpu)), we disable the regular
vmx_complete_interrupts() code which queues the idt_vectoring_info for
injection on next entry - because such injection would not be appropriate
if we will decide to exit to L1. Rather, we just save the idt_vectoring_info
and related fields in vmcs12 (which is a convenient place to save these
fields). On the next entry in vmx_vcpu_run (*after* the injection phase,
potentially exiting to L1 to inject an event requested by user space), if
we find ourselves in L1 we don't need to do anything with those values
we saved (as explained above). But if we find that we're in L2, or rather
*still* at L2 (it's not nested_run_pending, meaning that this is the first
round of L2 running after L1 having just launched it), we need to inject
the event saved in those fields - by writing the appropriate VMCS fields.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:18 +03:00
Nadav Har'El 0b6ac343fc KVM: nVMX: Correct handling of exception injection
Similar to the previous patch, but concerning injection of exceptions rather
than external interrupts.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:17 +03:00
Nadav Har'El b6f1250edb KVM: nVMX: Correct handling of interrupt injection
The code in this patch correctly emulates external-interrupt injection
while a nested guest L2 is running.

Because of this code's relative un-obviousness, I include here a longer-than-
usual justification for what it does - much longer than the code itself ;-)

To understand how to correctly emulate interrupt injection while L2 is
running, let's look first at what we need to emulate: How would things look
like if the extra L0 hypervisor layer is removed, and instead of L0 injecting
an interrupt, we had hardware delivering an interrupt?

Now we have L1 running on bare metal with a guest L2, and the hardware
generates an interrupt. Assuming that L1 set PIN_BASED_EXT_INTR_MASK to 1, and
VM_EXIT_ACK_INTR_ON_EXIT to 0 (we'll revisit these assumptions below), what
happens now is this: The processor exits from L2 to L1, with an external-
interrupt exit reason but without an interrupt vector. L1 runs, with
interrupts disabled, and it doesn't yet know what the interrupt was. Soon
after, it enables interrupts and only at that moment, it gets the interrupt
from the processor. When L1 is KVM, Linux handles this interrupt.

Now we need exactly the same thing to happen when that L1->L2 system runs
on top of L0, instead of real hardware. This is how we do this:

When L0 wants to inject an interrupt, it needs to exit from L2 to L1, with
external-interrupt exit reason (with an invalid interrupt vector), and run L1.
Just like in the bare metal case, it likely can't deliver the interrupt to
L1 now because L1 is running with interrupts disabled, in which case it turns
on the interrupt window when running L1 after the exit. L1 will soon enable
interrupts, and at that point L0 will gain control again and inject the
interrupt to L1.

Finally, there is an extra complication in the code: when nested_run_pending,
we cannot return to L1 now, and must launch L2. We need to remember the
interrupt we wanted to inject (and not clear it now), and do it on the
next exit.

The above explanation shows that the relative strangeness of the nested
interrupt injection code in this patch, and the extra interrupt-window
exit incurred, are in fact necessary for accurate emulation, and are not
just an unoptimized implementation.

Let's revisit now the two assumptions made above:

If L1 turns off PIN_BASED_EXT_INTR_MASK (no hypervisor that I know
does, by the way), things are simple: L0 may inject the interrupt directly
to the L2 guest - using the normal code path that injects to any guest.
We support this case in the code below.

If L1 turns on VM_EXIT_ACK_INTR_ON_EXIT, things look very different from the
description above: L1 expects to see an exit from L2 with the interrupt vector
already filled in the exit information, and does not expect to be interrupted
again with this interrupt. The current code does not (yet) support this case,
so we do not allow the VM_EXIT_ACK_INTR_ON_EXIT exit-control to be turned on
by L1.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:17 +03:00
Nadav Har'El 644d711aa0 KVM: nVMX: Deciding if L0 or L1 should handle an L2 exit
This patch contains the logic of whether an L2 exit should be handled by L0
and then L2 should be resumed, or whether L1 should be run to handle this
exit (using the nested_vmx_vmexit() function of the previous patch).

The basic idea is to let L1 handle the exit only if it actually asked to
trap this sort of event. For example, when L2 exits on a change to CR0,
we check L1's CR0_GUEST_HOST_MASK to see if L1 expressed interest in any
bit which changed; if it did, we exit to L1. But if it didn't, it means that
it is we (L0) that wished to trap this event, so we handle it ourselves.

The next two patches add additional logic of what to do when an interrupt or
exception is injected: Does L0 need to do it, should we exit to L1 to do it,
or should we resume L2 and keep the exception to be injected later.

We keep a new flag, "nested_run_pending", which can override the decision of
which should run next, L1 or L2. nested_run_pending=1 means that we *must* run
L2 next, not L1. This is necessary in particular when L1 did a VMLAUNCH of L2
and therefore expects L2 to be run (and perhaps be injected with an event it
specified, etc.). Nested_run_pending is especially intended to avoid switching
to L1 in the injection decision-point described above.
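
A skeleton of the decision function (heavily abbreviated; only the
shape, not the full exit-reason table, is meant to be accurate):

  static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
  {
          struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
          u32 exit_reason = vmcs_read32(VM_EXIT_REASON);

          switch (exit_reason) {
          case EXIT_REASON_CR_ACCESS:
                  /* did L1 ask to shadow any of the bits that changed? */
                  return nested_vmx_exit_handled_cr(vcpu, vmcs12);
          case EXIT_REASON_EXTERNAL_INTERRUPT:
                  return false;           /* L0 takes interrupts itself */
          default:
                  return true;            /* reflect the exit to L1 */
          }
  }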

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:16 +03:00
Nadav Har'El 7c1779384a KVM: nVMX: vmcs12 checks on nested entry
This patch adds a bunch of tests of the validity of the vmcs12 fields,
according to what the VMX spec and our implementation allows. If fields
we cannot (or don't want to) honor are discovered, an entry failure is
emulated.

According to the spec, there are two types of entry failures: If the problem
was in vmcs12's host state or control fields, the VMLAUNCH instruction simply
fails. But if a problem is found in the guest state, the behavior is more
similar to that of an exit.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:16 +03:00
Nadav Har'El 4704d0befb KVM: nVMX: Exiting from L2 to L1
This patch implements nested_vmx_vmexit(), called when the nested L2 guest
exits and we want to run its L1 parent and let it handle this exit.

Note that this will not necessarily be called on every L2 exit. L0 may decide
to handle a particular exit on its own, without L1's involvement; In that
case, L0 will handle the exit, and resume running L2, without running L1 and
without calling nested_vmx_vmexit(). The logic for deciding whether to handle
a particular exit in L1 or in L0, i.e., whether to call nested_vmx_vmexit(),
will appear in a separate patch below.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:16 +03:00
Nadav Har'El 99e65e805d KVM: nVMX: No need for handle_vmx_insn function any more
Before nested VMX support, the exit handler for a guest executing a VMX
instruction (vmclear, vmlaunch, vmptrld, vmptrst, vmread, vmresume,
vmwrite, vmon, vmoff), was handle_vmx_insn(). This handler simply threw a #UD
exception. Now that all these exit reasons are properly handled (and emulate
the respective VMX instruction), nothing calls this dummy handler and it can
be removed.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:15 +03:00
Nadav Har'El cd232ad02f KVM: nVMX: Implement VMLAUNCH and VMRESUME
Implement the VMLAUNCH and VMRESUME instructions, allowing a guest
hypervisor to run its own guests.

This patch does not include some of the necessary validity checks on
vmcs12 fields before the entry. These will appear in a separate patch
below.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:15 +03:00
Nadav Har'El fe3ef05c75 KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12
This patch contains code to prepare the VMCS which can be used to actually
run the L2 guest, vmcs02. prepare_vmcs02 appropriately merges the information
in vmcs12 (the vmcs that L1 built for L2) and in vmcs01 (our desires for our
own guests).

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:14 +03:00
Nadav Har'El bf8179a011 KVM: nVMX: Move control field setup to functions
Move some of the control field setup to common functions. These functions will
also be needed for running L2 guests - L0's desires (expressed in these
functions) will be appropriately merged with L1's desires.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:14 +03:00
Nadav Har'El a3a8ff8ebf KVM: nVMX: Move host-state field setup to a function
Move the setting of constant host-state fields (fields that do not change
throughout the life of the guest) from vmx_vcpu_setup to a new common function
vmx_set_constant_host_state(). This function will also be used to set the
host state when running L2 guests.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:14 +03:00
Nadav Har'El 49f705c532 KVM: nVMX: Implement VMREAD and VMWRITE
Implement the VMREAD and VMWRITE instructions. With these instructions, L1
can read and write to the VMCS it is holding. The values are read or written
to the fields of the vmcs12 structure introduced in a previous patch.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:14 +03:00
Nadav Har'El 6a4d755060 KVM: nVMX: Implement VMPTRST
This patch implements the VMPTRST instruction.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:13 +03:00
Nadav Har'El 63846663ea KVM: nVMX: Implement VMPTRLD
This patch implements the VMPTRLD instruction.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:12 +03:00
Nadav Har'El 27d6c86521 KVM: nVMX: Implement VMCLEAR
This patch implements the VMCLEAR instruction.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:12 +03:00
Nadav Har'El 0140caea3b KVM: nVMX: Success/failure of VMX instructions.
VMX instructions specify success or failure by setting certain RFLAGS bits.
This patch contains common functions to do this, and they will be used in
the following patches which emulate the various VMX instructions.
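
Per the VMX spec, "succeed" clears all six arithmetic flags, "fail
invalid" sets only CF, and "fail valid" sets only ZF while recording an
error number in the current vmcs12. A sketch of the first two helpers:

  static void nested_vmx_succeed(struct kvm_vcpu *vcpu)
  {
          vmx_set_rflags(vcpu, vmx_get_rflags(vcpu) &
                  ~(X86_EFLAGS_CF | X86_EFLAGS_PF | X86_EFLAGS_AF |
                    X86_EFLAGS_ZF | X86_EFLAGS_SF | X86_EFLAGS_OF));
  }

  static void nested_vmx_failInvalid(struct kvm_vcpu *vcpu)
  {
          vmx_set_rflags(vcpu, (vmx_get_rflags(vcpu) &
                  ~(X86_EFLAGS_PF | X86_EFLAGS_AF | X86_EFLAGS_ZF |
                    X86_EFLAGS_SF | X86_EFLAGS_OF)) | X86_EFLAGS_CF);
  }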

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:12 +03:00
Nadav Har'El 22bd035868 KVM: nVMX: Add VMCS fields to the vmcs12
In this patch we add to vmcs12 (the VMCS that L1 keeps for L2) all the
standard VMCS fields.

Later patches will enable L1 to read and write these fields using VMREAD/
VMWRITE, and they will be used during a VMLAUNCH/VMRESUME in preparing vmcs02,
a hardware VMCS for running L2.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:11 +03:00
Nadav Har'El ff2f6fe961 KVM: nVMX: Introduce vmcs02: VMCS used to run L2
We saw in a previous patch that L1 controls its L2 guest with a vmcs12.
L0 needs to create a real VMCS for running L2. We call that "vmcs02".
A later patch will contain the code, prepare_vmcs02(), for filling the vmcs02
fields. This patch only contains code for allocating vmcs02.

In this version, prepare_vmcs02() sets *all* of vmcs02's fields each time we
enter from L1 to L2, so keeping just one vmcs02 for the vcpu is enough: It can
be reused even when L1 runs multiple L2 guests. However, in future versions
we'll probably want to add an optimization where vmcs02 fields that rarely
change will not be set each time. For that, we may want to keep around several
vmcs02s of L2 guests that have recently run, so that potentially we could run
these L2s again more quickly because fewer vmwrites to vmcs02 will be needed.

This patch adds to each vcpu a vmcs02 pool, vmx->nested.vmcs02_pool,
which remembers the vmcs02s last used to run up to VMCS02_POOL_SIZE L2s.
As explained above, in the current version we choose VMCS02_POOL_SIZE=1,
i.e., one vmcs02 is allocated (and loaded onto the processor), and it is
reused to enter any L2 guest. In the future, when prepare_vmcs02() is
optimized not to set all fields every time, VMCS02_POOL_SIZE should be
increased.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:11 +03:00
Nadav Har'El 064aea7747 KVM: nVMX: Decoding memory operands of VMX instructions
This patch includes a utility function for decoding pointer operands of VMX
instructions issued by L1 (a guest hypervisor).

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:11 +03:00
Nadav Har'El b87a51ae28 KVM: nVMX: Implement reading and writing of VMX MSRs
When the guest can use VMX instructions (when the "nested" module option is
on), it should also be able to read and write VMX MSRs, e.g., to query about
VMX capabilities. This patch adds this support.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:11 +03:00
Nadav Har'El a9d30f33dd KVM: nVMX: Introduce vmcs12: a VMCS structure for L1
An implementation of VMX needs to define a VMCS structure. This structure
is kept in guest memory, but is opaque to the guest (who can only read or
write it with VMX instructions).

This patch starts to define the VMCS structure which our nested VMX
implementation will present to L1. We call it "vmcs12", as it is the VMCS
that L1 keeps for its L2 guest. We will add more content to this structure
in later patches.

This patch also adds the notion (as required by the VMX spec) of L1's "current
VMCS", and finally includes utility functions for mapping the guest-allocated
VMCSs in host memory.
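
At this stage the structure is little more than its header; a sketch:

  struct __packed vmcs12 {
          u32 revision_id;   /* checked against our VMCS12_REVISION */
          u32 abort;
          /* the standard VMCS fields are added by later patches */
  };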

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:10 +03:00
Nadav Har'El 5e1746d620 KVM: nVMX: Allow setting the VMXE bit in CR4
This patch allows the guest to enable the VMXE bit in CR4, which is a
prerequisite to running VMXON.

Whether to allow setting the VMXE bit now depends on the architecture (svm
or vmx), so its checking has moved to kvm_x86_ops->set_cr4(). This function
now returns an int: If kvm_x86_ops->set_cr4() returns 1, __kvm_set_cr4()
will also return 1, and this will cause kvm_set_cr4() to throw a #GP.

Turning on the VMXE bit is allowed only when the nested VMX feature is
enabled, and turning it off is forbidden after a vmxon.
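
A sketch of the vmx side of the check:

  static int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
  {
          if (cr4 & X86_CR4_VMXE) {
                  if (!nested)
                          return 1;       /* caller injects #GP */
          } else if (to_vmx(vcpu)->nested.vmxon) {
                  return 1;               /* VMXE must stay set after vmxon */
          }
          /* the existing CR4 update follows */
          return 0;
  }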

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:10 +03:00
Nadav Har'El ec378aeef9 KVM: nVMX: Implement VMXON and VMXOFF
This patch allows a guest to use the VMXON and VMXOFF instructions, and
emulates them accordingly. Basically this amounts to checking some
prerequisites, and then remembering whether the guest has enabled or disabled
VMX operation.

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:09 +03:00
Nadav Har'El 801d342432 KVM: nVMX: Add "nested" module option to kvm_intel
This patch adds to kvm_intel a module option "nested". This option controls
whether the guest can use VMX instructions, i.e., whether we allow nested
virtualization. A similar, but separate, option already exists for the
SVM module.

This option currently defaults to 0, meaning that nested VMX must be
explicitly enabled by giving nested=1. When nested VMX matures, the default
should probably be changed to enable nested VMX by default - just like
nested SVM is currently enabled by default.
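
The option itself amounts to the usual module-parameter boilerplate
(a sketch):

  static bool __read_mostly nested = 0;
  module_param(nested, bool, S_IRUGO);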

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:09 +03:00
Takuya Yoshikawa b5c9ff731f KVM: x86 emulator: Avoid clearing the whole decode_cache
While tracing the emulator, we noticed that init_emulate_ctxt()
sometimes took a bit longer than we expected.

This patch is for mitigating the problem by some degree.

By looking into the function, we soon notice that it clears the whole
decode_cache whose size is about 2.5K bytes now.  Furthermore, most of
the bytes are taken for the two read_cache arrays, which are used only
by a few instructions.

Considering the fact that we are not assuming the cache arrays have
been cleared when we store actual data, we do not need to clear the
arrays: 2K bytes elimination.  In addition, we can avoid clearing the
fetch_cache and regs arrays.

This patch changes the initialization not to clear the arrays.

On our 64-bit host, init_emulate_ctxt() becomes 0.3 to 0.5us faster with
this patch applied.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 11:45:09 +03:00
Takuya Yoshikawa adf52235b4 KVM: x86 emulator: Clean up init_emulate_ctxt()
Use a local pointer to the emulate_ctxt for simplicity.  Then, arrange
the hard-to-read mode selection lines neatly.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 11:45:08 +03:00
Jan Kiszka d780592b99 KVM: Clean up error handling during VCPU creation
So far kvm_arch_vcpu_setup is responsible for freeing the vcpu struct if
it fails. Move this confusing responsibility back into the hands of
kvm_vm_ioctl_create_vcpu. Only kvm_arch_vcpu_setup of x86 is affected,
all other archs cannot fail.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 11:45:08 +03:00
Nadav Har'El d462b81923 KVM: VMX: Keep list of loaded VMCSs, instead of vcpus
In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it
because (at least in theory) the processor might not have written all of its
content back to memory. Since a patch from June 26, 2008, this is done using
a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU.

The problem is that with nested VMX, we no longer have the concept of a
vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for
L2s), and each of those may have been last loaded on a different cpu.

So instead of linking the vcpus, we link the VMCSs, using a new structure
loaded_vmcs. This structure contains the VMCS, and the information pertaining
to its loading on a specific cpu (namely, the cpu number, and whether it
was already launched on this cpu once). In nested we will also use the same
structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the
currently active VMCS.
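
The new structure, as described above (a sketch):

  struct loaded_vmcs {
          struct vmcs *vmcs;
          int cpu;            /* cpu it was last loaded on, -1 if none */
          int launched;       /* already VMLAUNCHed on that cpu? */
          struct list_head loaded_vmcss_on_cpu_link;
  };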

Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 11:45:08 +03:00
Avi Kivity 24c82e576b KVM: Sanitize cpuid
Instead of blacklisting known-unsupported cpuid leaves, whitelist known-
supported leaves.  This is more conservative and prevents us from reporting
features we don't support.  Also whitelist a few more leaves while at it.

Signed-off-by: Avi Kivity <avi@redhat.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:07 +03:00
Xiao Guangrong bcdd9a93c5 KVM: MMU: cleanup for dropping parent pte
Introduce drop_parent_pte to remove the rmap of parent pte and
clear parent pte

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:07 +03:00
Xiao Guangrong 38e3b2b28c KVM: MMU: cleanup for kvm_mmu_page_unlink_children
Clean up the operation duplicated between kvm_mmu_page_unlink_children
and mmu_pte_write_zap_pte.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:07 +03:00
Xiao Guangrong 67052b3508 KVM: MMU: remove the arithmetic of parent pte rmap
Parent pte rmap and page rmap are very similar, so use the same arithmetic
for them

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:07 +03:00
Xiao Guangrong 53c07b1878 KVM: MMU: abstract the operation of rmap
Abstract the operation of rmap to spte_list, then we can use it for the
reverse mapping of parent pte in the later patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:06 +03:00
Xiao Guangrong 1249b96e72 KVM: fix uninitialized warning
Fix:

 warning: ‘cs_sel’ may be used uninitialized in this function
 warning: ‘ss_sel’ may be used uninitialized in this function

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:06 +03:00
Xiao Guangrong 8b0cedff04 KVM: use __copy_to_user/__clear_user to write guest page
Simply use __copy_to_user/__clear_user to write the guest page, since we
have already verified the user address when the memslot was set.
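
That is, writes of guest pages become direct __copy_to_user() calls on
the pre-validated address (a sketch):

  r = __copy_to_user((void __user *)addr + offset, data, len);
  if (r)
          return -EFAULT;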

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:03 +03:00
Xiao Guangrong 332b207d65 KVM: MMU: optimize pte write path if don't have protected sp
Simply return from kvm_mmu_pte_write path if no shadow page is
write-protected, so that we avoid walking all shadow pages and holding
mmu-lock.
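
The early-out itself is a single test (the counter name is an
assumption):

  /* no write-protected shadow pages means there is nothing to zap */
  if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
          return;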

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:02 +03:00
Avi Kivity 96304217a7 KVM: VMX: always_inline VMREADs
vmcs_readl() and friends are really short, but gcc thinks they are long because of
the out-of-line exception handlers.  Mark them always_inline to clear the
misunderstanding.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:01 +03:00
Avi Kivity 5e520e6278 KVM: VMX: Move VMREAD cleanup to exception handler
We clean up a failed VMREAD by clearing the output register.  Do
it in the exception handler instead of unconditionally.  This is
worthwhile since there are more than a hundred call sites.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:00 +03:00
Takuya Yoshikawa 7b105ca290 KVM: x86 emulator: Stop passing ctxt->ops as arg of emul functions
Dereference it in the actual users.

This not only cleans up the emulator but also makes it easy to convert
the old emulation functions to the new em_xxx() form later.

Note: Remove some inline keywords to let the compiler decide inlining.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:44:59 +03:00
Takuya Yoshikawa ef5d75cc9a KVM: x86 emulator: Stop passing ctxt->ops as arg of decode helpers
Dereference it in the actual users: only do_insn_fetch_byte().

This is consistent with the way __linearize() dereferences it.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:44:57 +03:00
Takuya Yoshikawa 67cbc90db5 KVM: x86 emulator: Place insn_fetch helpers together
The two macros need special care to use:
  Assume rc, ctxt, ops and done exist outside of them.
  Can goto outside.

Considering the fact that these are used only in decode functions,
moving these right after do_insn_fetch() seems to be the right thing
to improve the readability.

We also rename do_fetch_insn_byte() to do_insn_fetch_byte() to be
consistent.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:44:56 +03:00
Benjamin Herrenschmidt 770e1ac5f2 powerpc/mm: Fix memory_block_size_bytes() for non-pseries
Just compiling pseries in the kernel causes it to override
memory_block_size_bytes() regardless of the runtime platform.

This cleans up the implementation of that function, fixing
a bug or two while at it, so that it's harmless (and potentially
useful) for other platforms. Without this, bugs in that code
would trigger a WARN_ON() in drivers/base/memory.c when
booting some different platforms.

If/when we have another platform supporting memory hotplug we
might want to either move that out to a generic place or
make it a ppc_md. callback.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-12 11:16:45 +10:00
Benjamin Herrenschmidt a63fdc5156 mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header
The macro MIN_MEMORY_BLOCK_SIZE is currently defined twice in two .c
files, and I need it in a third one to fix a powerpc bug, so let's
first move it into a header
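
The definition being moved (a sketch of what lands in the header):

  /* include/linux/memory.h */
  #define MIN_MEMORY_BLOCK_SIZE   (1UL << SECTION_SIZE_BITS)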

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
2011-07-12 11:08:01 +10:00
Arnd Bergmann 24109afd17 Merge branch 'fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ycmiao/pxa-linux-2.6 into fixes 2011-07-11 23:26:16 +02:00
Raghavendra D Prabhu 3c52b7bf69 xen:pvhvm: Modpost section mismatch fix
Removing __init from check_platform_magic since it is called by
xen_unplug_emulated_devices in non-init contexts (it probably gets inlined
because of -finline-functions-called-once; removing __init is mostly to
avoid the mismatch being reported).

Signed-off-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11 13:37:04 -04:00
Konrad Rzeszutek Wilk 97ffab1f14 xen/pci: Use 'acpi_gsi_to_irq' value unconditionally.
In the past we would only use the function's value if the
returned value was not equal to 'acpi_sci_override_gsi', meaning
that the INT_SRC_OVR values for the global and source IRQ were different.
But it is OK to use the function's value even when the global
and source irq are the same.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11 13:19:34 -04:00
Konrad Rzeszutek Wilk 78316ada22 xen/pci: Remove 'xen_allocate_pirq_gsi'.
In the past (2.6.38) 'xen_allocate_pirq_gsi' would allocate an entry
in a Linux IRQ -> {XEN_IRQ, type, event, ..} array. All of that was
removed in 2.6.39, and the Xen IRQ subsystem now uses a linked list
that is populated by the call to 'xen_allocate_irq_gsi' (universally
done from any of the xen_bind_* calls). 'xen_allocate_pirq_gsi' is a
NOP and there is no need for it anymore, so let's remove it.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11 13:19:33 -04:00
Konrad Rzeszutek Wilk 34b1d1269d xen/pci: Retire unnecessary #ifdef CONFIG_ACPI
The code paths that are guarded by CONFIG_XEN_DOM0 already depend
on CONFIG_ACPI, so the extra #ifdef is not required. The earlier
patch that added them had done its job.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11 13:19:32 -04:00
Konrad Rzeszutek Wilk 9b6519db5e xen/pci: Move the allocation of IRQs when there are no IOAPIC's to the end
.. which means we can preset the NR_IRQS_LEGACY interrupts using
the 'acpi_get_override_irq' API before this loop.
This lets us get each IRQ's polarity (and trigger) from either
ACPI or the MP table, or fall back to the default values. This fixes
a bug: if we did not have an IOAPIC, we would not have been able to
preset the IRQ's polarity even when an MP table existed.
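
A minimal sketch of that preset (the registration helper
xen_register_gsi_sketch() is hypothetical, and the return-value
convention of acpi_get_override_irq() is assumed here):

	int i, trigger, polarity;

	for (i = 0; i < NR_IRQS_LEGACY; i++) {
		if (acpi_get_override_irq(i, &trigger, &polarity) < 0)
			continue;	/* no ACPI/MP entry: defaults apply */
		/* bind IRQ i with the trigger/polarity found above */
		xen_register_gsi_sketch(i, trigger, polarity);
	}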

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11 13:19:31 -04:00
Konrad Rzeszutek Wilk a0ee056709 xen/pci: Squash pci_xen_initial_domain and xen_setup_pirqs together.
They are only called once, and the rest of the pci_xen_*
functions follow the same setup pattern.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11 13:19:30 -04:00
Konrad Rzeszutek Wilk ed89eb6396 xen/pci: Use the xen_register_pirq for HVM and initial domain users
.. to cut down on the code duplication.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11 13:19:29 -04:00
Konrad Rzeszutek Wilk 30bd35edfd xen/pci: In xen_register_pirq bind the GSI to the IRQ after the hypercall.
Not before. With this, that code segment also starts looking like the HVM one.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11 13:19:28 -04:00
Konrad Rzeszutek Wilk d92edd814e xen/pci: Provide #ifdef CONFIG_ACPI to ease code squashing.
In the past we would guard these code segments to be dependent on
CONFIG_XEN_DOM0 (which depends on CONFIG_ACPI), so this patch is not
strictly necessary. But the next patch will merge the common HVM and
initial domain code, and we want to make sure the CONFIG_ACPI
dependency is preserved - as the HVM code does not depend on
CONFIG_XEN_DOM0.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11 13:19:27 -04:00
Konrad Rzeszutek Wilk 996c34aee3 xen/pci: Update comments and fix empty spaces.
Update the outdated comment at the beginning of the file.
Also provide the copyrights of folks who have been contributing
to this code lately.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11 13:19:26 -04:00
Konrad Rzeszutek Wilk fef6e26208 xen/pci: Shuffle code around.
The file is hard to read. Move the code around so that
its contents follow a uniform format:
 - setup GSIs - PV, HVM, and initial domain case
 - then MSI/MSI-x setup - PV, HVM and then initial domain case.
 - then MSI/MSI-x teardown - same order.
 - lastly, the __init functions in PV, HVM, and initial domain order.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-11 13:19:25 -04:00
Naga Chumbalkar 7fece83235 x86, ioapic: Also print Dest field
The code in setup_ioapic_irq() determines the Destination Field,
so why not also include it in the debug printk output that gets
displayed when the boot parameter "apic=debug" is used?

Before the change, "dmesg" will show:

 IOAPIC[0]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0)
 IOAPIC[0]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0)
 IOAPIC[0]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0) ...

After the change, you will see:

 IOAPIC[0]: Set routing entry (8-1 -> 0x31 -> IRQ 1 Mode:0 Active:0 Dest:0)
 IOAPIC[0]: Set routing entry (8-2 -> 0x30 -> IRQ 0 Mode:0 Active:0 Dest:0)
 IOAPIC[0]: Set routing entry (8-3 -> 0x33 -> IRQ 3 Mode:0 Active:0 Dest:0) ...

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110708184603.2734.91071.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-11 16:31:05 +02:00
Naga Chumbalkar bd6a46e087 x86, ioapic: Format clean up for IOAPIC output
When IOAPIC data is displayed in "dmesg" with the help of the
boot parameter "apic=debug", certain values are not formatted
correctly with respect to their size.

In the "dmesg" snippet below, note that the output for "max
redirection entries", and "IO APIC version" which are each
defined to be just 8-bits long are displayed as 2 bytes in
length. Similarly, "Dst" under the "IRQ redirection table"
should only be 8-bits long.

IO APIC #0......
...
...
.... register #01: 00170020
.......     : max redirection entries: 0017
.......     : PRQ implemented: 0
.......     : IO APIC version: 0020
...
...
.... IRQ redirection table:
 NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 000 1    0    0   0   0    0    0    00
 01 000 0    0    0   0   0    0    0    31
 02 000 0    0    0   0   0    0    0    30
 03 000 1    0    0   0   0    0    0    33
...
...

Do some formatting clean up, so you will see output like below:

IO APIC #0......
...
...
.... register #01: 00170020
.......     : max redirection entries: 17
.......     : PRQ implemented: 0
.......     : IO APIC version: 20
...
...
.... IRQ redirection table:
 NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 00  1    0    0   0   0    0    0    00
 01 00  0    0    0   0   0    0    0    31
 02 00  0    0    0   0   0    0    0    30
 03 00  1    0    0   0   0    0    0    33
...
...

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110708184557.2734.61830.sendpatchset@nchumbalkar.americas.cpqcorp.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-11 16:31:05 +02:00
Tejun Heo 5da0ef9a85 x86: Disable AMD_NUMA for 32bit for now
Commit 2706a0bf7b ("x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit
too") enabled AMD NUMA for 32bit too.  Unfortunately, SPARSEMEM
on 32bit had rather coarse (512MiB) addr->node mapping
granularity due to lack of space in page->flags.  This led to
boot failure on certain AMD NUMA machines which had 128MiB
alignment on nodes.

Patches to properly detect this condition and reject NUMA
configuration are posted[1] but deemed too pervasive for merge
at this point (-rc6).  Disable AMD NUMA for 32bit for now and
re-enable once the detection logic is merged.

[1] http://thread.gmane.org/gmane.linux.kernel/1161279/focus=1162583

Reported-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Conny Seidel <conny.seidel@amd.com>
Link: http://lkml.kernel.org/r/20110711083432.GC943@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-11 16:25:30 +02:00
Daniel Mack 9c0de4947a ARM: pxa/raumfeld: fix device name for codec ak4104
In commit f0fba2ad (ASoC: multi-component - ASoC Multi-Component
Support), the name of the ak4104 codec driver was changed without
also amending the platform code which uses it.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
2011-07-11 14:26:33 +08:00
Sven Neumann a74fe1194b ARM: pxa/raumfeld: display initialisation fixes
The display requires some milliseconds between GPIO_TFT_VA_EN
and GPIO_DISPLAY_ENABLE. Reorder initialisation to comply with
the display spec.

Also tune timings for better compliance with the spec.

Signed-off-by: Sven Neumann <s.neumann@raumfeld.com>
Acked-by: Daniel Mack <zonque@gmail.com>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
2011-07-11 14:26:32 +08:00
Sven Neumann 8c568df9d7 ARM: pxa/raumfeld: adapt to upcoming hardware change
The backlight control is going to change back to PWM in the
upcoming Raumfeld Controller hardware revision.

Signed-off-by: Sven Neumann <s.neumann@raumfeld.com>
Acked-by: Daniel Mack <zonque@gmail.com>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
2011-07-11 14:26:31 +08:00
Linus Walleij a065685d2f ARM: pxa: fix gpio_to_chip() clash with gpiolib namespace
The PXA platform code has a static inline helper called
gpio_to_chip which clashes with the gpiolib namespace if we
try to expose the function with the same name from gpiolib,
and it's still confusing even if we don't do that. So rename
it to gpio_to_pxachip().

Reported-by: H Hartley Sweeten <hartleys@visionengravers.com>
Cc: Eric Miao <eric.miao@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
2011-07-11 14:11:00 +08:00
Linus Torvalds 7fc7693627 Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
  ARM: 6994/1: smp_twd: Fix typo in 'twd_timer_rate' printing
  ARM: 6987/1: l2x0: fix disabling function to avoid deadlock
  ARM: 6966/1: ep93xx: fix inverted RTS/DTR signals on uart1
  ARM: 6980/1: mmci: use StartBitErr to detect bad connections
  ARM: 6979/1: mach-vt8500: add forgotten irq_data conversion
  ARM: move memory layout sanity checking before meminfo initialization
  ARM: 6990/1: MAINTAINERS: add entry for ARM PMU profiling and debugging
  ARM: 6989/1: perf: do not start the PMU when no events are present
  ARM: dmabounce: fix map_single() error return value
2011-07-10 07:28:30 -07:00
Magnus Damm 775b8ae870 ARM: mach-shmobile: sh7372 A3RV requires A4LC
Add a power domain workaround for the VPU and A3RV on sh7372.

The sh7372 data sheet mentions that the VPU is located in the
A3RV power domain. The A3RV power domain is not related to A4LC
in any way, but testing shows that unless A3RV _and_ A4LC are
powered on, the VPU test program will bomb out.

This issue may be caused by a more or less undocumented dependency
on the MERAM block that happens to be located in A4LC. So now we
know that the out-of-reset requirement of the VPU is that the MERAM
is powered on.

This patch adds a workaround for A3RV to make sure A4LC is powered
on - this so we can use the VPU even though the LCDCs are in blanking
state and A4LC is supposed to be off.

Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-10 10:39:32 +02:00
Magnus Damm b9416f03fd ARM: mach-shmobile: sh7372 late pm domain off
Add sh7372 specific code to power down unused pm domains.

This should really be replaced by some generic PM core
code IMO, but until that happens this patch makes sure
we don't waste power by leaving unused power domains on.

Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-10 10:38:53 +02:00
Magnus Damm 999a4d2a4d ARM: mach-shmobile: Runtime PM late init callback
Add a mach-shmobile specific callback for SoC-specific code
to hook into. By having the late_initcall() in a common place
we can have multi-SoC/board support in the same kernel binary.

Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-10 10:38:34 +02:00
Magnus Damm d24771dec9 ARM: mach-shmobile: sh7372 D4 support
Add support for the sh7372 D4 power domain. This power domain
contains the Coresight-ETM hardware block.

Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-10 10:38:22 +02:00
Kuninori Morimoto c1ba5bb560 ARM: mach-shmobile: sh7372 A4MP support
Add support for the sh7372 A4MP power domain
and hook up the FSI/SPU2 device.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-10 10:12:08 +02:00
Kuninori Morimoto a41b64665a ARM: mach-shmobile: sh7372: make sure that fsi is peripheral of spu2
The FSI acts as a peripheral circuit of the SPU2.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Acked-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-07-10 10:11:57 +02:00
Russell King 19dad35fe0 ARM: vfp: ensure that thread flushing works if preempted
Prevent a preemption event from causing the initialized VFP state to
be overwritten, by ensuring that VFP hardware access is disabled
prior to starting initialization.  We can then do this safely while
still allowing preemption to occur.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-09 17:41:33 +01:00
Russell King f8f2a8522a ARM: vfp: fix a hole in VFP thread migration
Fix a hole in the VFP thread migration.  Let's define two threads.

Thread 1, we'll call 'interesting_thread' which is a thread which is
running on CPU0, using VFP (so vfp_current_hw_state[0] =
&interesting_thread->vfpstate) and gets migrated off to CPU1, where
it continues execution of VFP instructions.

Thread 2, we'll call 'new_cpu0_thread' which is the thread which takes
over on CPU0.  This has also been using VFP, and last used VFP on CPU0,
but doesn't use it again.

The following code will be executed twice:

		cpu = thread->cpu;

		/*
		 * On SMP, if VFP is enabled, save the old state in
		 * case the thread migrates to a different CPU. The
		 * restoring is done lazily.
		 */
		if ((fpexc & FPEXC_EN) && vfp_current_hw_state[cpu]) {
			vfp_save_state(vfp_current_hw_state[cpu], fpexc);
			vfp_current_hw_state[cpu]->hard.cpu = cpu;
		}
		/*
		 * Thread migration, just force the reloading of the
		 * state on the new CPU in case the VFP registers
		 * contain stale data.
		 */
		if (thread->vfpstate.hard.cpu != cpu)
			vfp_current_hw_state[cpu] = NULL;

The first execution will be on CPU0 to switch away from 'interesting_thread'.
interesting_thread->cpu will be 0.

So, vfp_current_hw_state[0] points at interesting_thread->vfpstate.
The hardware state will be saved, along with the CPU number (0) that
it was executing on.

'thread' will be 'new_cpu0_thread' with new_cpu0_thread->cpu = 0.
Also, because it was executing on CPU0, new_cpu0_thread->vfpstate.hard.cpu = 0,
and so the thread migration check is not triggered.

This means that vfp_current_hw_state[0] remains pointing at interesting_thread.

The second execution will be on CPU1 to switch _to_ 'interesting_thread'.
So, 'thread' will be 'interesting_thread' and interesting_thread->cpu now
will be 1.  The previous thread executing on CPU1 is not relevant to this
so we shall ignore that.

We get to the thread migration check.  Here, we discover that
interesting_thread->vfpstate.hard.cpu = 0, yet interesting_thread->cpu is
now 1, indicating thread migration.  We set vfp_current_hw_state[1] to
NULL.

So, at this point vfp_current_hw_state[] contains the following:

[0] = &interesting_thread->vfpstate
[1] = NULL

Our interesting thread now executes a VFP instruction, takes a fault
which loads the state into the VFP hardware.  Now, through the
assembly, we have:

[0] = &interesting_thread->vfpstate
[1] = &interesting_thread->vfpstate

CPU1 stops due to ptrace (and so saves its VFP state using the thread
switch code above), and CPU0 calls vfp_sync_hwstate().

	if (vfp_current_hw_state[cpu] == &thread->vfpstate) {
		vfp_save_state(&thread->vfpstate, fpexc | FPEXC_EN);

BANG, we corrupt interesting_thread's VFP state by overwriting the
more up-to-date state saved by CPU1 with the old VFP state from CPU0.

Fix this by ensuring that we have sane semantics for the various state
describing variables:

1. vfp_current_hw_state[] points to the current owner of the context
   information stored in each CPU's hardware, or NULL if that state
   information is invalid.
2. thread->vfpstate.hard.cpu always contains the most recent CPU number
   into which the state was loaded, or NR_CPUS if no CPU owns the state.

So, for a particular CPU to be a valid owner of the VFP state for a
particular thread t, two things must be true:

 vfp_current_hw_state[cpu] == &t->vfpstate && t->vfpstate.hard.cpu == cpu.

and that is valid from the moment a CPU loads the saved VFP context
into the hardware.  This gives clear and consistent semantics to
interpreting these variables.
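
A small helper capturing that two-part ownership test could look like
this (a sketch; the helper name is an assumption):

	static bool vfp_state_in_hw(unsigned int cpu, struct thread_info *thread)
	{
		/* both halves of the ownership test must hold */
		return vfp_current_hw_state[cpu] == &thread->vfpstate &&
		       thread->vfpstate.hard.cpu == cpu;
	}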

This patch also fixes thread copying, ensuring that t->vfpstate.hard.cpu
is invalidated, otherwise CPU0 may believe it was the last owner.  The
hole can happen thus:

- thread1 runs on CPU2 using VFP, migrates to CPU3, exits and thread_info
  freed.
- New thread allocated from a previously running thread on CPU2, reusing
  memory for thread1 and copying vfp.hard.cpu.

At this point, the following are true:

	new_thread1->vfpstate.hard.cpu == 2
	&new_thread1->vfpstate == vfp_current_hw_state[2]

Lastly, this also addresses thread flushing in a similar way to thread
copying.  Hole is:

- thread runs on CPU0, using VFP, migrates to CPU1 but does not use VFP.
- thread calls execve(), so thread flush happens, leaving
  vfp_current_hw_state[0] intact.  This vfpstate is memset to 0 causing
  thread->vfpstate.hard.cpu = 0.
- thread migrates back to CPU0 before using VFP.

At this point, the following are true:

	thread->vfpstate.hard.cpu == 0
	&thread->vfpstate == vfp_current_hw_state[0]

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-09 17:22:12 +01:00
Russell King 08409c33d6 ARM: vfp: rename check_exception to vfp_hw_state_valid
Rename this branch to more accurately reflect why it's taken, rather
than what the following code does.  It is the only caller of this code.
This helps to clarify following changes, yet this change results in no
actual code change.

Document the VFP hardware state at the target of this branch.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-09 17:11:23 +01:00
Russell King af61bdf035 ARM: vfp: rename last_VFP_context to vfp_current_hw_state
Rename the slightly confusing 'last_VFP_context' variable to be more
descriptive of what it actually is.  This variable stores a pointer
to the current owner's vfpstate structure for the context held in the
VFP hardware.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-09 17:11:22 +01:00
Linus Torvalds f1a04dbd1e Merge branch 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung
* 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
  ARM: S3C2440: fix section mismatch on mini2440
  ARM: S3C24XX: drop return codes in void function of dma.c
  ARM: S3C24XX: don't use uninitialized variable in dma.c
  ARM: EXYNOS4: Set appropriate I2C device variant
  ARM: S5PC100: Fix for compilation error
  spi/s3c64xx: Bug fix for SPI with different FIFO level
  ARM: SAMSUNG: Add tx_st_done variable
  ARM: EXYNOS4: Address a section mismatch w/ suspend issue.
  ARM: S5P: Fix bug on init of PWMTimers for HRTimer
  ARM: SAMSUNG: header file revised to prevent declaring duplicated
  ARM: EXYNOS4: fix improper gpio configuration
  ARM: EXYNOS4: Fix card detection for sdhci 0 and 2
2011-07-08 23:28:23 -07:00
Anupam Chanda 24a42bae68 x86, hyper: Change hypervisor detection order
Detect Xen before HyperV because in Viridian compatibility mode Xen
presents itself as HyperV.  Move Xen to the top since it seems more
likely that Xen would emulate VMware than vice versa.

Signed-off-by: Anupam Chanda <achanda@nicira.com>
Link: http://lkml.kernel.org/r/1310150570-26810-1-git-send-email-achanda@nicira.com
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-08 16:22:29 -07:00
Vivek Goyal 14cb6dcf0a x86, boot: Wait for boot cpu to show up if nr_cpus limit is about to hit
nr_cpus allows one to specify the number of possible cpus in the system.
The current assumption seems to be that the first cpu to show up is the
boot cpu, and this assumption is broken in the kdump scenario, where we
can be booting on a non-boot cpu with nr_cpus=1.

It might happen that the first cpu we parse is not the cpu we boot on,
and later we ignore the boot cpu. The code does seem to recognize this
anomaly later and forcibly sets the boot cpu in the physical cpu map,
with the following warning.

if (!physid_isset(hard_smp_processor_id(), phys_cpu_present_map)) {
        printk(KERN_WARNING
                "weird, boot CPU (#%d) not listed by the BIOS.\n",
                hard_smp_processor_id());

        physid_set(hard_smp_processor_id(), phys_cpu_present_map);
}

This patch waits for the boot cpu to show up and starts ignoring cpus
once we have hit (nr_cpus - 1) cpus. So effectively we are reserving
one slot out of nr_cpus for the boot cpu explicitly.
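
Illustratively, the reservation amounts to a check of this shape in
the CPU-enumeration path (a sketch; boot_cpu_detected is a
hypothetical flag set once the boot CPU's apicid has been parsed):

	if (num_processors >= nr_cpu_ids - 1 && !boot_cpu_detected &&
	    apicid != boot_cpu_physical_apicid) {
		/* hold the last slot free for the boot CPU */
		return;
	}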

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/20110708171926.GF2930@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-08 15:33:35 -07:00
Heechul Yun c7e89b16eb ARM: 6995/2: mm: remove unnecessary cache flush on v6 copypage
This flush was originally introduced to maintain coherency between the
icache and dcache in v6 non-aliasing mode. That has been handled by
__sync_icache_dcache since c0177800, so the flush is unnecessary in
this function.

Signed-off-by: Heechul Yun <heechul@illinois.edu>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-08 19:46:37 +01:00
Shawn Guo 14305e685d ARM: mxc: use ARCH_NR_GPIOS to define gpio number
The patch removes MXC_GPIO_IRQS and instead uses ARCH_NR_GPIOS to
define the gpio number.  This change is needed when we make the mxc
gpio driver device tree aware.  When migrating the driver to device
tree, pdev->id becomes unusable.  The driver must instead get its gpio
range from the gpio core, which dynamically allocates numbers
downwards from ARCH_NR_GPIOS.

As a bonus, it removes lines of '#if' and makes the code a little bit
cleaner.  The side effect is that some gpio numbers are wasted, but
that is not a concern once we go single image.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-07-08 12:38:18 -06:00
Shawn Guo e7fc6ae744 gpio/mxc: get rid of the uses of cpu_is_mx()
The patch removes all the uses of cpu_is_mx().  Instead, it utilizes
platform_device_id to distinguish the different gpio types, IMX1_GPIO
on i.mx1, IMX21_GPIO on i.mx21 and i.mx27, IMX31_GPIO on all other
i.mx SoCs.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-07-08 12:38:12 -06:00
Linus Torvalds c60ffcbb62 Merge branch 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
* 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6:
  omap: drop __initdata tags from static struct platform_device declarations
2011-07-08 09:00:02 -07:00
Russell King 3973c33775 ARM: dmabounce: simplify dma_set_mask()
Simplify the dmabounce specific code in dma_set_mask().  We can just
omit setting the dma mask if dmabounce is enabled (we will have already
set dma mask via callbacks when the device is created in that case.)
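
The simplified shape might look like this (a sketch; the exact guard
used by the patch is an assumption):

	int dma_set_mask(struct device *dev, u64 dma_mask)
	{
		if (!dev->dma_mask || !dma_supported(dev, dma_mask))
			return -EIO;

	#ifndef CONFIG_DMABOUNCE
		/* dmabounce devices had their mask set at creation time */
		*dev->dma_mask = dma_mask;
	#endif
		return 0;
	}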

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-08 14:02:21 +01:00
Naga Chumbalkar ded1f6ab43 x86: print APIC data a little later during boot
To view IOAPIC data you could boot with "apic=debug".

When booting this way, the kernel will dump the
IO-APIC's registers, for example:

NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 000 1    0    0   0   0    0    0    00
 01 000 0    0    0   0   0    0    0    31
 02 000 0    0    0   0   0    0    0    30
 03 000 0    0    0   0   0    0    0    33
 04 000 0    0    0   0   0    0    0    34
 05 000 0    0    0   0   0    0    0    35
 06 000 0    0    0   0   0    0    0    36
 07 000 0    0    0   0   0    0    0    37
 08 000 0    0    0   0   0    0    0    38
 09 000 0    1    0   0   0    0    0    39
 0a 000 0    0    0   0   0    0    0    3A
 0b 000 0    0    0   0   0    0    0    3B
 0c 000 0    0    0   0   0    0    0    3C
 0d 000 0    0    0   0   0    0    0    3D
 0e 000 0    0    0   0   0    0    0    3E
 0f 000 0    0    0   0   0    0    0    3F
 10 000 1    0    0   0   0    0    0    00
 11 000 1    0    0   0   0    0    0    00
 12 000 1    0    0   0   0    0    0    00
 13 000 1    0    0   0   0    0    0    00
 14 000 1    0    0   0   0    0    0    00
 15 000 1    0    0   0   0    0    0    00
 16 000 1    0    0   0   0    0    0    00
 17 000 1    0    0   0   0    0    0    00

Delaying the call to print_ICs() gives better results:

NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
 00 000 1    0    0   0   0    0    0    00
 01 000 0    0    0   0   0    0    0    31
 02 000 0    0    0   0   0    0    0    30
 03 000 1    0    0   0   0    0    0    33
 04 000 1    0    0   0   0    0    0    34
 05 000 1    0    0   0   0    0    0    35
 06 000 1    0    0   0   0    0    0    36
 07 000 1    0    0   0   0    0    0    37
 08 000 0    0    0   0   0    0    0    38
 09 000 0    1    0   0   0    0    0    39
 0a 000 1    0    0   0   0    0    0    3A
 0b 000 1    0    0   0   0    0    0    3B
 0c 000 0    0    0   0   0    0    0    3C
 0d 000 1    0    0   0   0    0    0    3D
 0e 000 1    0    0   0   0    0    0    3E
 0f 000 1    0    0   0   0    0    0    3F
 10 000 1    1    0   1   0    0    0    29
 11 000 1    0    0   0   0    0    0    00
 12 000 1    0    0   0   0    0    0    00
 13 000 1    0    0   0   0    0    0    00
 14 000 0    1    0   1   0    0    0    51
 15 000 1    0    0   0   0    0    0    00
 16 000 0    1    0   1   0    0    0    61
 17 000 0    1    0   1   0    0    0    59

Notice that the entries beyond interrupt input signal 0x0f also
get populated and aren't just the hw-initialization default of
all zeroes.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Link: http://lkml.kernel.org/r/20110708083555.2598.42216.sendpatchset@nchumbalkar.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-08 13:20:14 +02:00
Stephen Boyd 54d5257351 ARM: 6996/1: mm: Poison freed init memory
Poisoning __init marked memory can be useful when tracking down
obscure memory corruption bugs. Therefore, poison init memory
with 0xe7fddef0 to catch bugs earlier. The poison value is an
undefined instruction in ARM mode and a branch to an undefined
instruction in Thumb mode.
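
A minimal sketch of such a poisoning helper (the helper name is an
assumption):

	static inline void poison_init_mem(void *s, size_t count)
	{
		u32 *p = s;

		/* 0xe7fddef0 faults whether executed as ARM or Thumb */
		while (count >= 4) {
			*p++ = 0xe7fddef0;
			count -= 4;
		}
	}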

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-08 10:28:50 +01:00
Wolfram Sang d34c1fcddc ARM: S3C2440: fix section mismatch on mini2440
If mini2440_init() is in __init, mini2440_parse_features() should also
be in __init. Fixes:

(.text+0x9adc): Section mismatch in reference from the function mini2440_parse_features.clone.0() to the
(unknown reference) .init.data:(unknown)
The function mini2440_parse_features.clone.0() references the (unknown reference) __initdata (unknown).

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Michel Pollet <buserror@gmail.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-07-08 18:16:56 +09:00
Russell King cb5fd904f0 Merge branch 'for-rmk' of git://linux-arm.org/linux-2.6-wd into devel-stable 2011-07-08 09:54:29 +01:00
Wolfram Sang cb26a7b1c1 ARM: S3C24XX: drop return codes in void function of dma.c
Commit bb072c3c (ARM / Samsung: Use struct syscore_ops for "core" power
management) turned s3c2410_dma_resume_chan() from int to void. So, drop
the actual return values, too. Fixes:

arch/arm/plat-s3c24xx/dma.c: In function 's3c2410_dma_resume_chan':
arch/arm/plat-s3c24xx/dma.c:1238:3: warning: 'return' with a value, in function returning void
arch/arm/plat-s3c24xx/dma.c:1250:2: warning: 'return' with a value, in function returning void

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-07-08 14:34:05 +09:00
Wolfram Sang 5838e9b8da ARM: S3C24XX: don't use uninitialized variable in dma.c
Commit 8970ef47 (S3C24XX: Remove hardware specific registers from DMA
calls) removed the dcon parameter from s3c2410_dma_config() and now
calculates it internally. So the debug output for the old parameter
can go, too. Fixes:

arch/arm/plat-s3c24xx/dma.c: In function 's3c2410_dma_config':
arch/arm/plat-s3c24xx/dma.c:1030:2: warning: 'dcon' is used uninitialized in this function

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-07-08 14:33:47 +09:00
Russell King e2f81844ef ARM: vmlinux.lds: use _text and _stext the same way as x86
x86 uses _text to mark the start of the kernel image including the
head text, and _stext to mark the start of the .text section.  Change
our vmlinux.lds to conform.  An audit of the places which use _stext
and _text in arch/arm indicates no users of either symbol are impacted
by this change.  It does mean a slight change to /proc/iomem output.

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-07 23:36:34 +01:00
Russell King 3835d69a6c ARM: vmlinux.lds: move init sections between text and data sections
Place the init sections between the text and data sections.  This
means all code is grouped together at the beginning of the kernel
image, and all data is at the end of the image.  This avoids problems
with the 24-bit branch instruction relocations becoming invalid with
large initramfs images.

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-07 23:36:31 +01:00
Russell King 43fc9d2fa5 ARM: vmlinux.lds: remove .rodata/.rodata1 from main .text segment
RODATA() already handles these sections, so allow it to take care
of them for us.

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-07 23:36:28 +01:00
Russell King 1604d79d37 ARM: vmlinux.lds: rearrange .init output section
Keep the various linker tables as separate output sections rather
than combining them together into one big .init section.  This makes
it easier to see in 'vmlinux' what is placed where.

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-07 23:35:41 +01:00
Russell King 39df88872f ARM: vmlinux.lds: move discarded sections to beginning
Rather than scattering the discarded sections throughout the linker
file, move them to the start.

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-07 23:35:38 +01:00
Russell King 3002b41bc6 ARM: decompressor: use better output sections
Place read-only data in a .rodata output section, and the compressed
piggy data in .piggydata.  Place the .got.plt section before the .got
section, as is standard ELF practice.

This allows the piggydata to be more easily extracted from the
compressed vmlinux file for verification purposes.

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-07 23:35:35 +01:00
Russell King c1f2d99910 ARM: ensure tag tables are const
Nothing should ever modify a tag table entry, so mark these const.

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-07 23:35:21 +01:00
Linus Torvalds 075d9db131 Merge branch 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pci: Move check for acpi_sci_override_gsi to xen_setup_acpi_sci.
2011-07-07 13:19:04 -07:00
Linus Torvalds e55f1b1c00 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Don't use the EFI reboot method by default
  x86, suspend: Restore MISC_ENABLE MSR in realmode wakeup
  x86, reboot: Acer Aspire One A110 reboot quirk
  x86-32, NUMA: Fix boot regression caused by NUMA init unification on highmem machines
2011-07-07 13:18:13 -07:00
Linus Torvalds 27a3b735b7 Merge branches 'core-urgent-for-linus', 'perf-urgent-for-linus' and 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  debugobjects: Fix boot crash when kmemleak and debugobjects enabled

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  jump_label: Fix jump_label update for modules
  oprofile, x86: Fix race in nmi handler while starting counters

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: Disable (revert) SCHED_LOAD_SCALE increase
  sched, cgroups: Fix MIN_SHARES on 64-bit boxen
2011-07-07 13:17:45 -07:00
Will Deacon f4daf06fc2 ARM: proc: add definition of cpu_reset for ARMv6 and ARMv7 cores
This patch adds simple definitions of cpu_reset for ARMv6 and ARMv7
cores, which disable the MMU via the SCTLR.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-07-07 19:20:53 +01:00
Will Deacon 14abd038a7 ARM: perf: add support for the Cortex-A15 PMU
This patch adds support for the Cortex-A15 PMU to the ARMv7
perf-event backend.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-07-07 19:20:53 +01:00
Will Deacon 0c205cbe20 ARM: perf: add support for the Cortex-A5 PMU
This patch adds support for the Cortex-A5 PMU to the ARMv7 perf-event
backend.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-07-07 19:20:53 +01:00
Will Deacon 6d4eaf991c ARM: perf: add PMUv2 common event definitions
The PMUv2 specification reserves a number of event encodings
for common events.

This patch adds these events to the common event enumeration
in preparation for PMUv2 cores, such as Cortex-A15.

Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-07-07 19:20:53 +01:00
Will Deacon 7b35fa47ee ARM: perf: remove confusing comment from v7 perf events backend
The comment about measuring TLB misses and refills in the ARMv7 perf
backend makes little sense and refers loosely to raw counters that
should be used instead.

This patch removes the comments to avoid any confusion.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-07-07 19:20:53 +01:00
Will Deacon 7665d9d2df ARM: proc: add proc info for Cortex-A15MP using classic page tables
Multicore implementations of the Cortex-A15 require bit 6 of the
auxiliary control register to be set in order for cache and TLB
maintenance operations to be broadcast between CPUs.

This patch adds a new proc_info structure for Cortex-A15, which enables
the SMP bit during setup and includes the new HWCAP for integer
division.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-07-07 19:20:52 +01:00
Pawel Moll 15eb169bfe ARM: proc: add Cortex-A5 proc info
This patch adds processor info for the ARM Ltd. Cortex-A5,
which has an SCU initialisation procedure identical to the A9's.

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-07-07 19:20:52 +01:00
Pawel Moll dc939cd835 ARM: proc: convert v7 proc infos into a common macro
As most of the proc info content is common across all v7
processors, this patch converts existing A9 and generic v7
descriptions into a macro (allowing extra flags in future).

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-07-07 19:20:52 +01:00
Will Deacon 18b9dc130c ARM: vfp: add VFPv4 capability detection and populate elf_hwcap
The presence of VFPv4 cannot be detected simply by looking at the FPSID
subarchitecture field, as a value >= 2 signifies the architecture as
VFPv3 or later.

This patch reads from MVFR1 to check whether or not the fused multiply
accumulate instructions are supported. Since these are introduced with
VFPv4, this tells us what we need to know.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-07-07 19:20:52 +01:00
Will Deacon 254cdf8ec3 ARM: hwcaps: add new HWCAP defines for ARMv7-A
Modern ARMv7-A cores can optionally implement these new hardware
features:

- VFPv4:
    The latest version of the ARMv7 vector floating-point extensions,
    including hardware support for fused multiply accumulate. D16 or D32
    variants may be implemented.

- Integer divide:
    The SDIV and UDIV instructions provide signed and unsigned integer
    division in hardware. When implemented, these instructions may be
    available either in both ARM and Thumb, or in Thumb only.

This patch adds new HWCAP defines to describe these new features. The
integer divide capabilities are split into two bits for ARM and Thumb
respectively. Whilst HWCAP_IDIVA should never be set if HWCAP_IDIVT is
clear, separating the bits makes it easier to interpret from userspace.
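
For illustration, the new defines could take this shape (the exact
bit positions shown here are assumptions in this sketch):

	#define HWCAP_VFPv4	(1 << 16)
	#define HWCAP_IDIVA	(1 << 17)	/* SDIV/UDIV in ARM state */
	#define HWCAP_IDIVT	(1 << 18)	/* SDIV/UDIV in Thumb state */
	#define HWCAP_IDIV	(HWCAP_IDIVA | HWCAP_IDIVT)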

Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-07-07 19:20:51 +01:00
Will Deacon 194a7f720f ARM: hwcaps: use shifts instead of hardcoded constants
The HWCAP numbers are defined as constants, each one being a power of 2.
This has become slightly unwieldy now that we have reached 32k.

This patch changes the HWCAP defines to use (1 << n) instead of coding
the constant directly. The values remain unchanged.
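
For example, the first few defines then read (sketch):

	/* same values as before, only the notation changes */
	#define HWCAP_SWP	(1 << 0)	/* was 1 */
	#define HWCAP_HALF	(1 << 1)	/* was 2 */
	#define HWCAP_THUMB	(1 << 2)	/* was 4 */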

Signed-off-by: Will Deacon <will.deacon@arm.com>
2011-07-07 19:20:51 +01:00
Steven Rostedt e08fbb78f0 tracing, x86/irq: Do not trace arch_local_{*,irq_*}() functions
I triggered a triple fault with gcc 4.5.1 because it did not
honor the inline annotation to arch_local_save_flags() function
and that function was added to the pool of functions traced by
the function tracer.

When preempt_schedule() called arch_local_save_flags() (called
by irqs_disabled()), it was traced, but the first thing the
function tracer does is disable preemption. When it enables
preemption, the NEED_RESCHED flag will not have been cleared and
the preemption check will trigger the call to preempt_schedule()
again.

Although the dynamic function tracer crashed immediately, the
static version of the function tracer (CONFIG_DYNAMIC_FTRACE is
not set) actually was able to show where the problem was.

 swapper-1       3.N.. 103885us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103886us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103886us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103887us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103887us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103888us : arch_local_save_flags <-preempt_schedule
 swapper-1       3.N.. 103888us : arch_local_save_flags <-preempt_schedule

It went on for a while before it triple faulted with a corrupted
stack.

The arch_local_save_flags and arch_local_irq_* functions should
not be traced. Even though they are marked as inline, gcc may
still emit them as real functions and allow them to be traced.

The simple solution is to just mark them as notrace. I had to
include <linux/types.h> in this file to pick up the notrace
tag.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110702033852.733414762@goodmis.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-07 19:22:32 +02:00
Konrad Rzeszutek Wilk ee339fe63a xen/pci: Move check for acpi_sci_override_gsi to xen_setup_acpi_sci.
Previously we would check for acpi_sci_override_gsi == gsi every time
a PCI device was enabled. That works during early bootup, but later
on it could unnecessarily trigger the acpi_gsi_to_irq(..) lookup.
The reason is that acpi_sci_override_gsi is declared __initdata and
can contain bogus values after early bootup.

This patch moves the check for acpi_sci_override_gsi to the
site where the ACPI SCI is set up.

CC: stable@kernel.org
Reported-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
Tested-by: Raghavendra D Prabhu <rprabhu@wnohang.net>
[http://lists.xensource.com/archives/html/xen-devel/2011-07/msg00154.html]
Suggested-by:  Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2011-07-07 12:19:08 -04:00
Simon Guinot 659fb32d1b genirq: replace irq_gc_ack() with {set,clr}_bit variants (fwd)
This fixes a regression introduced by e59347a "arm: orion:
Use generic irq chip".

Depending on the device, interrupt acknowledgement is done by setting
or by clearing a dedicated register bit. Replacing irq_gc_ack() with
{set,clr}_bit variants allows both cases to be handled.

Note that this patch affects the following SoCs: Davinci, Samsung and
Orion. Except for the last, the change is minor: irq_gc_ack() is just
renamed to irq_gc_ack_set_bit().

For the Orion SoCs, edge GPIO interrupt support is currently broken.
irq_gc_ack() tries to acknowledge such an interrupt by setting the
corresponding cause register bit, but the Orion GPIO device expects
the opposite. To fix this issue, the irq_gc_ack_clr_bit() variant is
used.
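
A sketch of the two variants (the register-access details, such as
the cur_regs() lookup, are assumptions here):

	/* acknowledge by writing a 1 to the bit */
	void irq_gc_ack_set_bit(struct irq_data *d)
	{
		struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
		u32 mask = 1 << (d->irq - gc->irq_base);

		irq_gc_lock(gc);
		irq_reg_writel(mask, gc->reg_base + cur_regs(d)->ack);
		irq_gc_unlock(gc);
	}

	/* acknowledge by writing a 0 to the bit (Orion GPIO) */
	void irq_gc_ack_clr_bit(struct irq_data *d)
	{
		struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
		u32 mask = ~(1 << (d->irq - gc->irq_base));

		irq_gc_lock(gc);
		irq_reg_writel(mask, gc->reg_base + cur_regs(d)->ack);
		irq_gc_unlock(gc);
	}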

Tested on Network Space v2.

Reported-by: Joey Oravec <joravec@drewtech.com>
Signed-off-by: Simon Guinot <sguinot@lacie.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2011-07-07 16:02:26 +00:00
Wolfram Sang d30e1521b2 arm: mach-vt8500: add forgotten irq_data conversion
This platform has not been converted to 'struct irq_data' when the big
pile was done. It fails to compile nowadays, because the compatibility
code has gone.

  CC      arch/arm/mach-vt8500/irq.o
arch/arm/mach-vt8500/irq.c:118:2: error: unknown field 'ack' specified in initializer
arch/arm/mach-vt8500/irq.c:118:2: warning: initialization from incompatible pointer type
arch/arm/mach-vt8500/irq.c:119:2: error: unknown field 'mask' specified in initializer
arch/arm/mach-vt8500/irq.c:119:2: warning: initialization from incompatible pointer type
arch/arm/mach-vt8500/irq.c:120:2: error: unknown field 'unmask' specified in initializer
arch/arm/mach-vt8500/irq.c:120:2: warning: initialization from incompatible pointer type
arch/arm/mach-vt8500/irq.c:121:2: error: unknown field 'set_type' specified in initializer
arch/arm/mach-vt8500/irq.c:121:2: warning: initialization from incompatible pointer type
make[1]: *** [arch/arm/mach-vt8500/irq.o] Error 1

Add the missing conversion. Tested on a JayPC-Tablet.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Alexey Charkov <alchark@gmail.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2011-07-07 14:35:25 +00:00
Dave Martin 2ba564b90f ARM: mm: tlb-v7: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:12 +01:00
Dave Martin 3b7f39fa9a ARM: mm: tlb-v6: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:12 +01:00
Dave Martin ca56096367 ARM: mm: tlb-v4wbi: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:12 +01:00
Dave Martin 2284ece2d6 ARM: mm: tlb-v4wb: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:12 +01:00
Dave Martin f249a6425a ARM: mm: tlb-v4: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:12 +01:00
Dave Martin d7b84d2438 ARM: mm: tlb-v3: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:11 +01:00
Dave Martin 27eebe4cc9 ARM: mm: tlb-fa: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:11 +01:00
Dave Martin ab1a746da9 ARM: mm: proc-xscale: Use new generic struct definition macros
Without this patch, xscale_80200_A0_A1 is missing the
icache_flush_all entry, which would result in the wrong functions
being called at run-time.

This patch re-uses xscale_icache_flush_all for
xscale_80200_A0_A1_cache_fns.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:11 +01:00
Dave Martin c21898f9f0 ARM: mm: proc-xsc3: Use new generic struct definition macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:11 +01:00
Dave Martin 78a8f3c365 ARM: mm: proc-v7: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:11 +01:00
Dave Martin 7b7dc6e888 ARM: mm: proc-v6: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:10 +01:00
Dave Martin f58d59f6c5 ARM: mm: proc-sa1100: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:10 +01:00
Dave Martin 5973ba5886 ARM: mm: proc-sa110: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:10 +01:00
Dave Martin a39a32187b ARM: mm: proc-mohawk: Use the new processor struct macros
This patch also defines a suitable flush_icache_all implementation
which would otherwise be missing, resulting in a link failure.
Thanks to Nicolas Pitre for suggesting the code for this.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-07 15:31:10 +01:00
Dave Martin e43b670e24 ARM: mm: proc-feroceon: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-07 15:31:10 +01:00
Dave Martin d6ed10ce39 ARM: mm: proc-fa526: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:09 +01:00
Dave Martin b4a19f03c4 ARM: mm: proc-arm9tdmi: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:09 +01:00
Dave Martin b1b9892bcd ARM: mm: proc-arm946: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:09 +01:00
Dave Martin b375580147 ARM: mm: proc-arm940: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:09 +01:00
Dave Martin be90da45a9 ARM: mm: proc-arm926: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:09 +01:00
Dave Martin 6c240aec6f ARM: mm: proc-arm925: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:09 +01:00
Dave Martin f3e7383f2a ARM: mm: proc-arm922: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:08 +01:00
Dave Martin 68f5e1acb5 ARM: mm: proc-arm920: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:08 +01:00
Dave Martin fd10e2725e ARM: mm: proc-arm7tdmi: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:08 +01:00
Dave Martin 4cc1f0fa06 ARM: mm: proc-arm740: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:08 +01:00
Dave Martin 449870b1de ARM: mm: proc-arm720: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:07 +01:00
Dave Martin c3be06eb58 ARM: mm: proc-arm6_7: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:07 +01:00
Dave Martin 5c9369bc2e ARM: mm: proc-arm1026: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:07 +01:00
Dave Martin f2d8cae1d5 ARM: mm: proc-arm1022: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:07 +01:00
Dave Martin fbaa840730 ARM: mm: proc-arm1020e: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:07 +01:00
Dave Martin 56d916509b ARM: mm: proc-arm1020: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:07 +01:00
Dave Martin 455a01ec30 ARM: mm: cache-v7: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:06 +01:00
Dave Martin 641d8233a6 ARM: mm: cache-v6: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:06 +01:00
Dave Martin d5b5b2e2f8 ARM: mm: cache-v4wt: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:06 +01:00
Dave Martin eec95e56e6 ARM: mm: cache-v4wb: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:06 +01:00
Dave Martin 54d4e9ebbc ARM: mm: cache-v4: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:06 +01:00
Dave Martin 9c373968d6 ARM: mm: cache-v3: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:05 +01:00
Dave Martin 9bc7491341 ARM: mm: cache-fa: Use the new processor struct macros
Signed-off-by: Dave Martin <dave.martin@linaro.org>
2011-07-07 15:31:05 +01:00
Dave Martin 8f51965e70 ARM: assembler.h: Add string declaration macro
Declaring strings in assembler source involves a certain amount of
tedious boilerplate code in order to annotate the resulting symbol
correctly.

Encapsulating this boilerplate in a macro should help to avoid some
duplication and the occasional mistake.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-07 15:31:05 +01:00
Dave Martin 66a625a881 ARM: mm: proc-macros: Add generic proc/cache/tlb struct definition macros
This patch adds some generic macros to reduce boilerplate when
declaring certain common structures in arch/arm/mm/*.S

Thanks to Russell King for outlining what the
define_processor_functions macro could look like.

Signed-off-by: Dave Martin <dave.martin@linaro.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
2011-07-07 15:30:35 +01:00
Vitaly Kuzmichev 90c5ffe592 ARM: 6994/1: smp_twd: Fix typo in 'twd_timer_rate' printing
To get hundredths of MHz the rate needs to be divided by 10000.
Here is an example:
 twd_timer_rate = 123456789
 Before the patch:
    twd_timer_rate / 1000000 = 123
    (twd_timer_rate / 1000000) % 100 = 23
    Result: 123.23MHz.
 After being fixed:
    twd_timer_rate / 1000000 = 123
    (twd_timer_rate / 10000) % 100 = 45
    Result: 123.45MHz.
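
The corrected print then has this shape (a sketch; the exact format
string is approximated):

	printk(KERN_INFO "twd: %lu.%02luMHz\n",
	       twd_timer_rate / 1000000, (twd_timer_rate / 10000) % 100);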

Signed-off-by: Vitaly Kuzmichev <vkuzmichev@mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-07 15:00:12 +01:00
Stephen Boyd 7fa22bd546 ARM: 6993/1: platsmp: Allow secondary cpu hotplug with maxcpus=1
If an ARM system has multiple cpus in the same socket and the
kernel is booted with maxcpus=1, secondary cpus are possible but
not present due to how platform_smp_prepare_cpus() is called.
Since most typical ARM processors don't actually support physical
hotplug, initialize the present map to be equal to the possible
map in generic ARM SMP code. Also, always call
platform_smp_prepare_cpus() as long as max_cpus is non-zero (0
means no SMP) to allow platform code to do any SMP setup.
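
The core of the change might be as small as this (sketch):

	/* every possible cpu is also present, so it can be
	 * hotplugged via sysfs later even with maxcpus=1 */
	init_cpu_present(cpu_possible_mask);

	if (max_cpus)	/* 0 means no SMP at all */
		platform_smp_prepare_cpus(max_cpus);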

After applying this patch it's possible to boot an ARM system
with maxcpus=1 on the command line and then hotplug in secondary
cpus via sysfs. This is more in line with how x86 does things.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-07 14:45:07 +01:00
Ingo Molnar b395fb36d5 Merge branch 'iommu-3.1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu into core/iommu 2011-07-07 12:58:28 +02:00
Matthew Garrett f70e957cda x86: Don't use the EFI reboot method by default
Testing suggests that at least some Lenovos and some Intels will
fail to reboot via EFI, attempting to jump to an unmapped
physical address. In the long run we could handle this by
providing a page table with a 1:1 mapping of physical addresses,
but for now it's probably just easier to assume that ACPI or
legacy methods will be present and reboot via those.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/r/1309985557-15350-1-git-send-email-mjg@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-07 11:35:05 +02:00
Kees Cook 7a3136666b x86, suspend: Restore MISC_ENABLE MSR in realmode wakeup
Some BIOSes will reset the Intel MISC_ENABLE MSR (specifically the
XD_DISABLE bit) when resuming from S3, which can interact poorly with
ebba638ae7. In 32bit PAE mode, this can
lead to a fault when EFER is restored by the kernel wakeup routines,
due to it setting the NX bit for a CPU that (thanks to the BIOS reset)
now incorrectly thinks it lacks the NX feature. (64bit is not affected
because it uses a common CPU bring-up that specifically handles the
XD_DISABLE bit.)

The need for MISC_ENABLE being restored so early is specific to the S3
resume path. Normally, MISC_ENABLE is saved in save_processor_state(),
but this happens after the resume header is created, so just reproduce
the logic here. (acpi_suspend_lowlevel() creates the header, calls
do_suspend_lowlevel, which calls save_processor_state(), so the saved
processor context isn't available during resume header creation.)

[ hpa: Consider for stable if OK in mainline ]

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Link: http://lkml.kernel.org/r/20110707011034.GA8523@outflux.net
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: <stable@kernel.org> 2.6.38+
2011-07-06 20:09:34 -07:00
Rob Herring 10cdc7e512 ARM: 6960/1: allow enabling SCU code on UP
The scu_power_mode function can be used on UP builds as it drives signals
to an SOC power controller. So make it selectable for !SMP.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:55:15 +01:00
Will Deacon eca5dc2a00 ARM: 6988/1: multi-cpu: remove arguments from CPU proc macros
The macros for invoking functions via the processor struct in the
MULTI_CPU case define the arguments as part of the macros, making it
impossible to take the address of those functions.

This patch removes the arguments from the macro definitions so that we
can take the address of these functions like we can for the !MULTI_CPU
case.

Reported-by: Frank Hofmann <frank.hofmann@tomtom.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:49:45 +01:00
Linus Walleij f022e4e41b ARM: 6986/1: mach-realview: add TCM support for PB1176
Enable TCM support on the RealView PB1176 - we have now taken
the precautions necessary to support even multi-board builds of
RealView systems with TCM enabled.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:49:45 +01:00
Linus Walleij 201043f227 ARM: 6985/1: export functions to determine the presence of I/DTCM
By allowing code to detect whether DTCM or ITCM is present, code paths
involving TCM can be avoided when running on platforms that lack it.
This is good for creating single kernels across several archs, if some
of them utilize TCM but others don't.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:49:45 +01:00
Linus Walleij 9715efb8dc ARM: 6984/1: enhance TCM robustness
The PB11MPCore reports "3" DTCM banks, but anything above 2 is an
"undefined" value, so push this to become 0. Further, add some checks
for the case where code is compiled to TCM even though no D/ITCM is
present in the system, and for whether the compiled code really fits.
We don't BUG() since that's not helpful; it's better to deal with
non-present TCM dynamically. If nothing is compiled to the TCM and no
TCM is detected, it will now just shut up even if TCM support is
enabled.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:49:45 +01:00
Will Deacon 38a8914f9a ARM: 6987/1: l2x0: fix disabling function to avoid deadlock
The l2x0_disable function attempts to writel with the l2x0_lock held.
This results in deadlock when the writel contains an outer_sync call
for the platform since the l2x0_lock is already held by the disable
function. A further problem is that disabling the L2 without flushing it
first can lead to the spin_lock operation becoming visible after the
spin_unlock, causing any subsequent L2 maintenance to deadlock.

This patch replaces the writel with a call to writel_relaxed in the
disabling code and adds a flush before disabling in the control
register, preventing livelock from occurring.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:48:08 +01:00
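
A schematic sketch of the fixed ordering, with the locking and MMIO
primitives stubbed out in plain C so the control flow can be read in
isolation (this is not the actual mainline code):

  #include <stdio.h>

  static int l2x0_lock;                        /* stands in for a raw spinlock */

  static void spin_lock(int *l)   { *l = 1; }
  static void spin_unlock(int *l) { *l = 0; }

  static void l2x0_flush_all(void) { puts("flush L2 while it is still on"); }

  /* writel() implies an outer_sync() barrier that itself takes
   * l2x0_lock, so calling it here would deadlock; writel_relaxed()
   * performs the store without the barrier. */
  static void writel_relaxed(unsigned int v, const char *reg)
  {
          printf("write 0x%x to %s (no outer_sync)\n", v, reg);
  }

  static void l2x0_disable(void)
  {
          spin_lock(&l2x0_lock);
          l2x0_flush_all();                /* flush before turning it off */
          writel_relaxed(0, "L2X0_CTRL");  /* relaxed write: no recursion */
          spin_unlock(&l2x0_lock);
  }

  int main(void) { l2x0_disable(); return 0; }
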
Petr Štetiar 186dcaa448 ARM: 6966/1: ep93xx: fix inverted RTS/DTR signals on uart1
Roberto Bergo discovered that the RTS/DTR signals were inverted after
boot, which caused him problems with a hardware-controlled modem
connected on ttyAM0. Todd Valentic came up with this patch for the issue.

Discussion: http://tech.groups.yahoo.com/group/ts-7000/message/20259

Comments from Petr Štetiar:

 Sorry, but I forgot to add the Acked-by[1]:

 1. https://patchwork.kernel.org/patch/873052/

Cc: Ryan Mallon <ryan@bluewatersys.com>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Todd Valentic <todd.valentic@sri.com>
Tested-by: Roberto Bergo <roberto.bergo@robson.it>
Signed-off-by: Petr Štetiar <ynezz@true.cz>
Acked-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:46:36 +01:00
Wolfram Sang 2eb5af44b1 ARM: 6979/1: mach-vt8500: add forgotten irq_data conversion
This platform was not converted to 'struct irq_data' when the big
pile was done, and it fails to compile nowadays. Tested on a JayPC-Tablet.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Alexey Charkov <alchark@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-06 20:46:04 +01:00
Grant Likely 6eae1ace68 gpio: Move mpc5200 gpio driver to drivers/gpio
GPIO drivers are getting consolidated into drivers/gpio.  While at it,
change the driver name to mpc5200-gpio* to avoid collisions.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-07-06 11:57:15 -06:00
Lei Wen 6662498e13 ARM: pxa168: correct nand pmu setting
The original pair of <0x01db, 208000000> is invalid. Correct it to
the valid value.

The 6th bit of the NFC APMU register selects whether the NFC runs at
156MHz or at 78MHz: 0x19b means the NFC runs at 156MHz, while 0x1db
means it runs at 78MHz.

Signed-off-by: Lei Wen <leiwen@marvell.com>
Cc: stable@kernel.org
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
2011-07-06 23:51:36 +08:00
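
The two values differ in exactly one bit, which is easy to verify (the
macro name below is made up; the values are the ones from the commit):

  #include <stdio.h>

  #define NFC_CLK_SEL (1u << 6)   /* the 6th bit of the NFC APMU value */

  int main(void)
  {
          unsigned int fast = 0x19b, slow = 0x1db;

          printf("diff = 0x%x (bit 6 only)\n", fast ^ slow);
          printf("0x%x -> %s MHz\n", fast, (fast & NFC_CLK_SEL) ? "78" : "156");
          printf("0x%x -> %s MHz\n", slow, (slow & NFC_CLK_SEL) ? "78" : "156");
          return 0;
  }
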
Lei Wen d204b2c5b1 ARM: pxa910: correct nand pmu setting
The original pair of <0x01db, 208000000> is invalid.
Correct it to the valid value.

Signed-off-by: Lei Wen <leiwen@marvell.com>
Cc: stable@kernel.org
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
2011-07-06 23:51:22 +08:00
Paul Parsons beb0c9b056 ARM: pxa: fix PGSR register address calculation
The file mfp-pxa2xx.c defines a macro, PGSR(), which translates a gpio
bank number to a PGSR register address. The function pxa2xx_mfp_suspend()
erroneously passed in a gpio number instead of a gpio bank number.

Signed-off-by: Paul Parsons <lost.distance@yahoo.com>
Cc: stable@kernel.org
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
2011-07-06 23:47:40 +08:00
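
On PXA2xx each gpio bank covers 32 lines, so the bank number is the
gpio number divided by 32.  A sketch of the bug class (the PGSR base
address is taken from mfp-pxa2xx.c; the rest is illustrative):

  #include <stdio.h>

  #define PGSR_BASE      0x40F00020u   /* PXA PM sleep-state registers */
  #define GPIOS_PER_BANK 32

  /* PGSR() expects a *bank* number: each 32-bit register covers 32 gpios. */
  #define PGSR(bank) (PGSR_BASE + ((bank) << 2))

  int main(void)
  {
          unsigned int gpio = 35;                    /* a gpio in bank 1 */
          unsigned int bank = gpio / GPIOS_PER_BANK; /* == gpio >> 5 */

          printf("buggy: PGSR(gpio=%u) -> 0x%08x (past the PGSR block)\n",
                 gpio, PGSR(gpio));
          printf("fixed: PGSR(bank=%u) -> 0x%08x\n", bank, PGSR(bank));
          return 0;
  }
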
Matthias Rosenfelder 6d999da4d2 sparc32,leon: Added __init declaration to leon_flush_needed()
The function leon_flush_needed() is called only during bootup from another
__init function. Therefore, we can also add __init to leon_flush_needed().

Signed-off-by: Matthias Rosenfelder <rosenfelder.lkml@googlemail.com>
Acked-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-06 08:01:52 -07:00
Steven Rostedt 18d85bc558 sparc/irqs: Do not trace arch_local_{*,irq_*} functions
Do not trace arch_local_save_flags(), arch_local_irq_*() and friends.
Although they are marked inline, gcc may still make a function out of
them and add it to the pool of functions that are traced by the function
tracer. This can cause undesirable results (kernel panic, triple faults,
etc).

Add the notrace notation to prevent them from ever being traced.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-06 08:00:29 -07:00
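
The kernel's notrace marker is a thin wrapper around a GCC attribute.
A self-contained sketch of the pattern (the body is a stub; the real
helper reads processor state):

  /* include/linux/compiler.h defines notrace like this: */
  #define notrace __attribute__((no_instrument_function))

  /* Marked both inline and notrace: even if gcc emits an out-of-line
   * copy, that copy is excluded from function-tracer instrumentation. */
  static inline notrace unsigned long arch_local_save_flags(void)
  {
          unsigned long flags = 0;   /* stub; the real code reads %psr */
          return flags;
  }

  int main(void) { return (int)arch_local_save_flags(); }
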
Sylwester Nawrocki 5f27275edb ARM: EXYNOS4: Set appropriate I2C device variant
Set up a correct I2C bus controller variant name for Exynos4.
Without this change the I2C bus driver fails to acquire its
clocks.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-07-06 16:04:09 +09:00
Kukjin Kim d60c98c26a Merge branch 'for-spi' into s5p-fixes-for-linus 2011-07-06 15:46:41 +09:00
Padmavathi Venna 8fa9dd04b7 ARM: S5PC100: Fix for compilation error
S5PC100 compilation fails without this patch.

Signed-off-by: Padmavathi Venna <padma.v@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-07-06 15:37:08 +09:00
David S. Miller e12fe68ce3 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2011-07-05 23:23:37 -07:00
Padmavathi Venna 8918034dfb ARM: SAMSUNG: Add tx_st_done variable
tx_st_done is required for checking the transmission status of SPI
channels with different FIFO levels.

Signed-off-by: Padmavathi Venna <padma.v@samsung.com>
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-07-06 15:03:08 +09:00
Peter Chubb b49c78d482 x86, reboot: Acer Aspire One A110 reboot quirk
Since git commit 660e34cebf ("x86: reorder reboot method
preferences"), my Acer Aspire One hangs on reboot.  It appears that
its ACPI method
for rebooting is broken.  The attached patch adds a quirk so that the
machine will reboot via the BIOS.

[ hpa: verified that the ACPI control on this machine is just plain broken. ]

Signed-off-by: Peter Chubb <peter.chubb@nicta.com.au>
Link: http://lkml.kernel.org/r/w439iki5vl.wl%25peter@chubb.wattle.id.au
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2011-07-05 19:43:23 -07:00
MyungJoo Ham 152c036a80 ARM: EXYNOS4: Address a section mismatch w/ suspend issue.
The section mismatch in headsmp.S made hotplug stop working after the
first instance of suspend-to-RAM and its wakeup.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-07-06 08:43:03 +09:00
Sangbeom Kim 27ea7fd288 ARM: S5P: Fix bug on init of PWMTimers for HRTimer
This patch fixes the following:

<6>[    0.000000] sched_clock: 32 bits at 33MHz, ...
<6>[  128.651309] Calibrating delay loop...

There is a big jump. The reason is that the PWM timer backing the
HRTimer was used before it had been initialized.

So this patch changes the initialization order; the kernel boot log
after the change reads:

<6>[    0.000000] sched_clock: 32 bits at 33MHz, ...
<6>[    0.000088] Calibrating delay loop...

Signed-off-by: Sangbeom Kim <sbkim73@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-07-06 08:43:02 +09:00
MyungJoo Ham bb8bb57b21 ARM: SAMSUNG: header file revised to prevent duplicate declarations
This header file had no #ifndef / #define / #endif include-guard
protection.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-07-06 08:43:02 +09:00
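
The fix is the conventional include-guard pattern, sketched here with
a hypothetical header name:

  #ifndef __PLAT_SAMSUNG_EXAMPLE_H
  #define __PLAT_SAMSUNG_EXAMPLE_H

  struct samsung_example_cfg;   /* declarations now seen exactly once */

  #endif /* __PLAT_SAMSUNG_EXAMPLE_H */
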
Naveen Krishna Chatradhi a3df1d4811 ARM: EXYNOS4: fix improper gpio configuration
These pins were incorrectly configured for PCM2; configure them for
SPDIF (_OUT & _EXT_CLK).

Signed-off-by: Naveen Krishna Chatradhi <ch.naveen@samsung.com>
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-07-06 08:43:02 +09:00
Thomas Abraham a0d8efedb2 ARM: EXYNOS4: Fix card detection for sdhci 0 and 2
On the SMDKV310 board, a card-detect gpio pin is available that is
directly connected to the io pad of the sdhci controller. Fix the
incorrect value of the cd_type field in platform data for sdhci
instances 0 and 2.

Signed-off-by: Thomas Abraham <thomas.ab@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-07-06 08:43:02 +09:00
Russell King 0371d3f7e8 ARM: move memory layout sanity checking before meminfo initialization
Ensure that the meminfo array is sanity checked before we pass the
memory to memblock.  This helps to ensure that memblock and meminfo
agree on the dimensions of memory, especially when more memory is
passed than the kernel can deal with.

Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-05 20:27:16 +01:00
Frederic Weisbecker bd96efe17d xtensa: Use generic config PREEMPT definition
So that it can handle the new CONFIG_PREEMPT_COUNT.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Chris Zankel <chris@zankel.net>
2011-07-05 18:01:16 +02:00
Frederic Weisbecker e22c8f4616 h8300: Use generic config PREEMPT definition
So that it can handle the new CONFIG_PREEMPT_COUNT.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
2011-07-05 18:01:15 +02:00
Frederic Weisbecker dab16ae1a9 m32r: Use generic PREEMPT config
Use the generic preempt config definition in m32r instead of
using a custom one.

This also makes it handle the new CONFIG_PREEMPT_COUNT that
need to be selected by CONFIG_PREEMPT.

Without that it breaks

  kernel/sched.c: In function 'preempt_schedule':
  kernel/sched.c:4364: error: implicit declaration of function 'add_preempt_count_notrace'
  kernel/sched.c:4366: error: implicit declaration of function 'sub_preempt_count_notrace'

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
2011-07-05 18:01:15 +02:00
Jan Beulich d80603c9d8 x86, efi: Properly pre-initialize table pointers
Consumers of the table pointers in struct efi check for
EFI_INVALID_TABLE_ADDR to determine validity, hence these
pointers should all be pre-initialized to this value (rather
than zero).

Noticed by the discrepancy between efivars' systab sysfs entry
showing all tables (and their pointers) despite the code
intending to only display the valid ones. No other bad effects
known, but having the various table parsing routines bogusly
access physical address zero is certainly not very desirable
(even though they're unlikely to find anything useful there).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Link: http://lkml.kernel.org/r/4E13100A020000780004C256@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-05 13:40:34 +02:00
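
A minimal sketch of the initialization rule (EFI_INVALID_TABLE_ADDR
matches the kernel's ~0UL definition; the struct here is abbreviated
to two tables):

  #include <stdio.h>

  #define EFI_INVALID_TABLE_ADDR (~0UL)

  struct efi {
          unsigned long acpi;
          unsigned long smbios;
  };

  /* Pre-initialize every table pointer to the "invalid" marker rather
   * than zero, so consumers that compare against EFI_INVALID_TABLE_ADDR
   * behave correctly even before the firmware tables are parsed. */
  static struct efi efi = {
          .acpi   = EFI_INVALID_TABLE_ADDR,
          .smbios = EFI_INVALID_TABLE_ADDR,
  };

  int main(void)
  {
          if (efi.acpi != EFI_INVALID_TABLE_ADDR)
                  printf("ACPI table at 0x%lx\n", efi.acpi);
          else
                  puts("ACPI table absent (correctly skipped)");
          return 0;
  }
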
Will Deacon f4f38430c9 ARM: 6989/1: perf: do not start the PMU when no events are present
armpmu_enable can be called in situations where no events are present
(for example, from the event rotation tick after a profiled task has
exited). In this case, we currently start the PMU anyway, which may
leave it active indefinitely without any events being monitored.

This patch adds a simple check to the enabling code so that we avoid
starting the PMU when no events are present.

Cc: <stable@kernel.org>
Reported-by: Ashwin Chaugle <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-05 12:37:23 +01:00
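
The guard itself is a few lines.  A mock of the enable path (the names
and the event counter are illustrative):

  #include <stdio.h>

  static int active_events;        /* events currently scheduled in */

  static void start_pmu(void) { puts("PMU started"); }

  static void armpmu_enable(void)
  {
          if (!active_events)
                  return;          /* nothing to count: leave the PMU off */
          start_pmu();
  }

  int main(void)
  {
          armpmu_enable();         /* no events: PMU stays off */
          active_events = 1;
          armpmu_enable();         /* with events: PMU starts */
          return 0;
  }
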
Ingo Molnar 931da6137e Merge branch 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2011-07-05 11:55:43 +02:00
Linus Torvalds fc52693f8f Merge branch 'at91/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-2.6-arm-soc
* 'at91/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-2.6-arm-soc:
  AT91: Change nand buswidth logic to match hardware default configuration
  at91: Use "pclk" as con_id on at91cap9 and at91rm9200
  at91: fix udc, ehci and mmc clock device name for cap9/9g45/9rl
  atmel_serial: fix internal port num
  at91: fix at91_set_serial_console: use platform device id
2011-07-04 15:54:18 -07:00
Nicolas Ferre 64393b3ae4 AT91: Change nand buswidth logic to match hardware default configuration
The recently modified nand buswidth configuration is not aligned with
board reality: the double footprint on boards is always populated with
8-bit buswidth nand flashes. So we have to consider that, without any
particular configuration, the 8-bit buswidth is selected by default.
Moreover, the previous logic always used !board_have_nand_8bit(); we
change it to the simpler board_have_nand_16bit().

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Tested-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2011-07-04 23:22:25 +02:00
Peter Korsgaard c25ded1fbc mach-s3c2410: remove unused spi-gpio.h file
Not used anymore as the spi_s3c24xx_gpio driver is gone (replaced by
the generic spi-gpio).

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-07-04 10:53:04 -06:00
Peter Korsgaard 947b35d52d mach-gta2: remove unused spi-gpio.h include
Not needed, and the file is going away.

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-07-04 10:52:54 -06:00
Peter Korsgaard aa35316922 mach-qt2410: convert to spi_gpio
Rather than the deprecated spi_s3c24xx_gpio driver.

Only compile tested. Note that the board support seems quite broken,
as the spi_s3c24xx_gpio platform device name was misspelled and there
is no struct spi_board_info defined, but this at least doesn't make
it any worse.

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-07-04 10:52:49 -06:00
Peter Korsgaard 932036ce03 mach-jive: convert to spi_gpio
Rather than the deprecated spi_s3c24xx_gpio driver. Compile tested only.

Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-07-04 10:51:58 -06:00
Jan Glauber e3b4f515c4 crypto: s390 - support hardware accelerated SHA-224
On recent s390 machines hardware acceleration is available for SHA-256.
SHA-224 is based on SHA-256 so it can also be accelerated by hardware.
Do this by adding the proper algorithm description and initialization.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2011-07-04 20:06:01 +08:00
Janusz Krzysztofik 07ad6ab3d7 omap: drop __initdata tags from static struct platform_device declarations
Pointers to statically declared platform device structures which are
registered with platform_device_register() are then used during run time
to access these structure members, for example from platform_uevent()
and much more. Therefore, these structures should never be placed inside
sections which are dropped after boot. Fix platform devices incorrectly
tagged with __initdata which happen to exist inside OMAP sub-trees.

This bug has exhibited itself on my ARM/OMAP1 based Amstrad Delta
videophone after commit 6d3163ce86, "mm:
check if any page in a pageblock is reserved before marking it
MIGRATE_RESERVE", resulting in reading from several
/sys/device/platform/*/uevent files always ending up with segmentation
faults.

Signed-off-by: Janusz Krzysztofik <jkrzyszt@tis.icnet.pl>
Acked-by: Felipe Balbi <balbi@ti.com>
Cc: Varadarajan, Charulatha <charu@ti.com>
Cc: Jarkko Nikula <jhnikula@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2011-07-04 03:56:15 -07:00
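
The wrong and right declarations side by side, in a reduced sketch
(the struct and device name are placeholders; __initdata stands for
the kernel's free-after-boot section attribute):

  struct platform_device {
          const char *name;
          int id;
  };

  /* WRONG: __initdata puts the object in a section freed after boot,
   * but platform_device_register() keeps the pointer for the device's
   * whole lifetime, so e.g. platform_uevent() later reads freed memory:
   *
   *   static struct platform_device omap_foo_device __initdata = { ... };
   */

  /* RIGHT: plain static storage stays valid for the life of the kernel. */
  static struct platform_device omap_foo_device = {
          .name = "omap_foo",
          .id   = -1,
  };

  int main(void) { return omap_foo_device.id + 1; }
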
Ingo Molnar 9f8b6a6cf0 Merge branch 'core' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/core 2011-07-04 12:27:28 +02:00
Russell King 5c8598fca1 ARM: dmabounce: no need to check dev->bus type in needs_bounce function
As the needs_bounce function is passed at DMA bounce register time,
we already know what the device bus type is, so we don't need to check
it each time the needs_bounce function is called.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-04 08:39:56 +01:00
Russell King 0703ed2a6b ARM: dmabounce: get rid of dma_needs_bounce global function
Pass the device type specific needs_bounce function in at dmabounce
register time, avoiding the need for a platform specific global
function to do this.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-04 08:39:55 +01:00
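
A reduced sketch of the callback-at-registration pattern (the types
and the address threshold are simplified stand-ins for the dmabounce
interface):

  #include <stddef.h>
  #include <stdio.h>
  #include <stdint.h>

  typedef uint64_t dma_addr_t;
  struct device { const char *name; };

  typedef int (*needs_bounce_fn)(struct device *, dma_addr_t, size_t);

  struct dmabounce_info {
          struct device *dev;
          needs_bounce_fn needs_bounce;   /* per-device policy, no global hook */
  };

  /* Hypothetical policy: this bus cannot address memory above 1MB. */
  static int example_needs_bounce(struct device *dev, dma_addr_t addr,
                                  size_t size)
  {
          return addr + size > 0x100000;
  }

  static void dmabounce_register_dev(struct dmabounce_info *info,
                                     struct device *dev, needs_bounce_fn fn)
  {
          info->dev = dev;
          info->needs_bounce = fn;   /* bus type already known here */
  }

  int main(void)
  {
          struct device dev = { "example-bus" };
          struct dmabounce_info info;

          dmabounce_register_dev(&info, &dev, example_needs_bounce);
          printf("bounce? %d\n", info.needs_bounce(&dev, 0x200000, 64));
          return 0;
  }
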
Russell King c289b2e0cc ARM: dmabounce: correct unmap_single dev_dbg
DMA addresses should not be cast to void * for printing.  Fix that
to be consistent with the rest of the file.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-07-03 23:56:17 +01:00
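
The underlying point: dma_addr_t can be wider than a CPU pointer (for
example with LPAE), so casting it to void * both truncates and lies
about the type.  A minimal illustration:

  #include <stdio.h>
  #include <stdint.h>

  typedef uint64_t dma_addr_t;   /* may be wider than void * on 32-bit */

  int main(void)
  {
          dma_addr_t dma = 0x1ffff0000ULL;   /* does not fit in 32 bits */

          /* print the value as an integer, keeping all the bits */
          printf("dma=%#llx\n", (unsigned long long)dma);
          return 0;
  }
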