Граф коммитов

5618 Коммитов

Автор SHA1 Сообщение Дата
Jeremy Fitzhardinge 8ce7996009 x86: add swiotlb allocation functions
Add x86-specific swiotlb allocation functions.  These are purely
default for the moment.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-17 18:58:18 +01:00
Joerg Roedel 84df817595 AMD IOMMU: panic if completion wait loop fails
Impact: prevents data corruption after a failed completion wait loop

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-17 16:36:44 +01:00
Joerg Roedel cf558d25e5 AMD IOMMU: set cmd buffer pointers to zero manually
Impact: set cmd buffer head and tail pointers to zero in case nobody else did

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-17 15:06:01 +01:00
Ingo Molnar d733e00d7c x86: update io_apic.c to the new cpumask code
Impact: build fix

The sparseirq tree crossed with the cpumask changes, fix the fallout.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-17 13:35:51 +01:00
Ingo Molnar 855caa37b9 Merge branch 'x86/crashdump' into cpus4096
Conflicts:
	arch/x86/kernel/crash.c

Merged for semantic conflict:
	arch/x86/kernel/reboot.c
2008-12-17 13:24:52 +01:00
Ingo Molnar 948a7b2b5e Merge branch 'irq/sparseirq' into cpus4096
Conflicts:
	arch/x86/kernel/io_apic.c

Merge irq/sparseirq here, to resolve conflicts.
2008-12-17 13:16:08 +01:00
Ingo Molnar 9466d6036f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/travis/linux-2.6-cpus4096-for-ingo into cpus4096 2008-12-17 13:08:34 +01:00
Ingo Molnar 1f3f424a6b Merge branch 'linus' into cpus4096 2008-12-17 13:07:48 +01:00
Mike Travis 83b19597f7 x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
Impact: new API

The old topology_core_siblings() and topology_thread_siblings() return
a cpumask_t; these new ones return a (const) struct cpumask *.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-16 17:40:59 -08:00
Mike Travis e4d98207ea x86: xen: use smp_call_function_many()
Impact: use new API, remove cpumask from stack.

Change smp_call_function_mask() callers to smp_call_function_many().

This removes a cpumask from the stack, and falls back should allocating
the cpumask var fail (only possible with CONFIG_CPUMASKS_OFFSTACK).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: jeremy@xensource.com
2008-12-16 17:40:59 -08:00
Mike Travis 4cd4601d59 x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
Impact: Remove cpumask_t's from stack.

Simple transition to work_on_cpu(), rather than cpumask games.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Robert Richter <robert.richter@amd.com>
Cc: jacob.shin@amd.com
2008-12-16 17:40:59 -08:00
Mike Travis b2bb855491 x86: Remove cpumask games in x86/kernel/cpu/intel_cacheinfo.c
Impact: remove cpumask_t from stack.

We should not try to save and restore cpus_allowed on current.

We can't use work_on_cpu() here, since it's in the hotplug cpu path
(if anyone else tries to get the hotplug lock from a workqueue we
could deadlock against them).

Fortunately, we can just use smp_call_function_single() since the
function can run from an interrupt.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
2008-12-16 17:40:58 -08:00
Mike Travis 1de88cd4a3 x86: Use cpumask accessors code for possible/present maps.
Impact: use new API

Use the accessors rather than frobbing bits directly.  Most of this is
in arch code I haven't even compiled, but is straightforward.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-16 17:40:58 -08:00
Mike Travis 168ef543a4 x86: prepare for cpumask iterators to only go to nr_cpu_ids
Impact: cleanup, futureproof

In fact, all cpumask ops will only be valid (in general) for bit
numbers < nr_cpu_ids.  So use that instead of NR_CPUS in various
places.

This is always safe: no cpu number can be >= nr_cpu_ids, and
nr_cpu_ids is initialized to NR_CPUS at boot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 17:40:58 -08:00
Mike Travis 78637a97b7 x86: Set CONFIG_NR_CPUS even on UP
Impact: cleanup

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-16 17:40:58 -08:00
Mike Travis bcda016edd x86: cosmetic changes apic-related files.
This patch simply changes cpumask_t to struct cpumask and similar
trivial modernizations.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-16 17:40:57 -08:00
Mike Travis d7b381bb7b x86: fixup_irqs() doesnt need an argument.
Impact: cleanup, remove on-stack cpumask.

The "map" arg is always cpu_online_mask.  Importantly, set_affinity
always ands the argument with cpu_online_mask anyway, so we don't need
to do it in fixup_irqs(), avoiding a temporary.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-16 17:40:57 -08:00
Mike Travis b78936e14e xen: convert to cpumask_var_t and new cpumask primitives.
Simple change, and eventual space saving when NR_CPUS >> nr_cpu_ids.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
2008-12-16 17:40:57 -08:00
Mike Travis 22f65d31b2 x86: Update io_apic.c to use new cpumask API
Impact: cleanup, consolidate patches, use new API

Consolidate the following into a single patch to adapt to new
sparseirq code in arch/x86/kernel/io_apic.c, add allocation of
cpumask_var_t's in domain and old_domain, and reduce further
merge conflicts.  Only one file (arch/x86/kernel/io_apic.c) is
changed in all of these patches.

	0006-x86-io_apic-change-irq_cfg-domain-old_domain-to.patch
	0007-x86-io_apic-set_desc_affinity.patch
	0008-x86-io_apic-send_cleanup_vector.patch
	0009-x86-io_apic-eliminate-remaining-cpumask_ts-from-st.patch
	0021-x86-final-cleanups-in-io_apic-to-use-new-cpumask-AP.patch

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-16 17:40:57 -08:00
Mike Travis 6eeb7c5a99 x86: update add-cpu_mask_to_apicid_and to use struct cpumask*
Impact: use updated APIs

Various API updates for x86:add-cpu_mask_to_apicid_and

(Note: separate because previous patch has been "backported" to 2.6.27.)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-16 17:40:56 -08:00
Mike Travis 95d313cf1c x86: Add cpu_mask_to_apicid_and
Impact: new API

Add a helper function that takes two cpumask's, and's them and then
returns the apicid of the result.  This removes a need in io_apic.c
that uses a temporary cpumask to hold (mask & cfg->domain).

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-16 17:40:56 -08:00
Mike Travis a168196501 x86: move and enhance debug printk for nr_cpu_ids etc.
Impact: cleanup, better debugging

This has proven useful in debugging, *before* we try to use
for_each_possible_cpu().  It also now shows nr_cpumask_bits.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-12-16 17:40:56 -08:00
Mike Travis e7986739a7 x86 smp: modify send_IPI_mask interface to accept cpumask_t pointers
Impact: cleanup, change parameter passing

  * Change genapic interfaces to accept cpumask_t pointers where possible.

  * Modify external callers to use cpumask_t pointers in function calls.

  * Create new send_IPI_mask_allbutself which is the same as the
    send_IPI_mask functions but removes smp_processor_id() from list.
    This removes another common need for a temporary cpumask_t variable.

  * Functions that used a temp cpumask_t variable for:

	cpumask_t allbutme = cpu_online_map;

	cpu_clear(smp_processor_id(), allbutme);
	if (!cpus_empty(allbutme))
		...

    become:

	if (!cpus_equal(cpu_online_map, cpumask_of_cpu(cpu)))
		...

  * Other minor code optimizations (like using cpus_clear instead of
    CPU_MASK_NONE, etc.)

Applies to linux-2.6.tip/master.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 17:40:56 -08:00
Mike Travis 36f5101a60 x86: enable MAXSMP
Impact: activates new off-stack cpumask code on MAXSMP (non-default) x86 configs

Set MAXSMP to enable CONFIG_CPUMASK_OFFSTACK which moves cpumask's off
the stack (and in structs) when using cpumask_var_t.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hy>
2008-12-16 17:40:55 -08:00
Cyrill Gorcunov d680fe4477 x86: entry_64 - introduce FTRACE_ frame macro v2
Impact: clean up

Itroduce MCOUNT_SAVE/RESTORE_FRAME which allow us to
save a number of lines on source level.

Also fix a comment in ftrace.h.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-17 00:26:38 +01:00
Yinghai Lu 17483a1f34 sparseirq: fix !SMP building, #2
Impact: build fix

make intr_remapping.c to include smp.h, so could use boot_cpu_id there

also remove old change that disabling sparseirq with !SMP

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-17 00:15:55 +01:00
Yinghai Lu 48a1b10aff x86, sparseirq: move irq_desc according to smp_affinity, v7
Impact: improve NUMA handling by migrating irq_desc on smp_affinity changes

if CONFIG_NUMA_MIGRATE_IRQ_DESC is set:

-  make irq_desc to go with affinity aka irq_desc moving etc
-  call move_irq_desc in irq_complete_move()
-  legacy irq_desc is not moved, because they are allocated via static array

for logical apic mode, need to add move_desc_in_progress_in_same_domain,
otherwise it will not be moved ==> also could need two phases to get
irq_desc moved.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-17 00:14:01 +01:00
Mike Travis c8cae544bb x86: fix build error with post-merge of tip/cpus4096 and rr-for-ingo/master.
Ingo Molnar wrote:

> allyes64 build failure:
>
> arch/x86/kernel/io_apic.c: In function ‘set_ir_ioapic_affinity_irq_desc’:
> arch/x86/kernel/io_apic.c:2295: error: incompatible type for argument 2 of
> ‘migrate_ioapic_irq_desc’
> arch/x86/kernel/io_apic.c: In function ‘ir_set_msi_irq_affinity’:
> arch/x86/kernel/io_apic.c:3205: error: incompatible type for argument 2 of
> ‘set_extra_move_desc’
> make[1]: *** wait: No child processes.  Stop.

Here's a small patch to correct the build error with the post-merge tree.
Built and boot-tested.  I'll will reset the follow on patches in my brand
new git tree to accommodate this change.

Fix two references in io_apic.c that were incorrect.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 23:12:18 +01:00
Hiroshi Shimamoto 8bee3f0a66 x86: ia32_signal: use proper macro __USER32_DS
Impact: cleanup

Use __USER32_DS instead of __USER_DS in ia32_signal.c.
No impact, because __USER32_DS is defined __USER_DS.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 23:06:13 +01:00
Hiroshi Shimamoto d0b48ca189 x86: ia32_signal: use __put_user() instead of __copy_to_user()
Impact: cleanup

__put_user() can be used for constant size 8, like arch/x86/kernel/signal.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 23:06:12 +01:00
Hiroshi Shimamoto ae417bb487 x86: signal: use signal_fault() in sys_sigreturn()
Impact: cleanup

Call signal_fault() in error route of sys_sigreturn().
Change log level to KERN_EMERG if current is init.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 23:06:07 +01:00
Russ Anderson c8182f0016 sgi-xp: xpc needs to pass the physical address, not virtual
Impact: fix crash

xpc needs to pass the physical address, not virtual.

Testing uncovered this problem.  The virtual address happens to work
most of the time due to the way bios was masking off the node bits.
Passing the physical address makes it work all of the time.

Signed-off-by: Russ Anderson <rja@sgi.com>
Acked-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 23:04:24 +01:00
Andi Kleen cf9b303e55 x86: re-enable MCE on secondary CPUS after suspend/resume
Impact: fix disabled MCE after resume

Don't prevent multiple initialization of MCEs.

Back from early prehistory mcheck_init() has a reentry check. Presumably
that was needed in very old kernels to prevent it entering twice.

But as Andreas points out this prevents CPU hotplug (and therefore resume)
to correctly reinitialize MCEs when a AP boots again after being
offlined.

Just drop the check.

Reported-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 23:03:02 +01:00
Jack Steiner 189f67c440 x86: UV fix for global physical addresses
Impact: fix UV boot crash

This fixes a UV bug related to generating global memory addresses
on partitioned systems. Partition systems do not have physical memory
at address 0. Instead, a chunk of high memory is remapped by the chipset
so that it appears to be at address 0. This remapping is INVISIBLE to most
of the OS. The only OS functions that need to be aware of the remaping are
functions that directly interface to the chipset. The GRU is one example.

Also, delete a couple of unused macros related to global memory addresses.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 22:54:03 +01:00
Ingo Molnar c15cb37cc4 Merge commit 'v2.6.28-rc8' into x86/uv 2008-12-16 22:53:53 +01:00
Zachary Amsden fde9071167 x86: clean up dead code in vmi_32.c
Impact: cleanup, remove dead debug code

I ran across some old debugging code in vmi paravirt-ops code that was
already dead, but still potentially useful.  After reviewing recent
changes to the way kernel page tables are allocated and initialized, and
the lack of bugs caught by this debugging code, I've concluded it is now
totally useless to have around, and it's already been #if 0'd for quite
some time.

There's no rush to get this in mainline, but it's also totally harmless,
so I'll let the x86 maintainers decide where it should be tucked.  I've
been out of the mainstream dev loop for a couple months, so apologies if
I haven't got any protocol changes in order.

Remove mummified remains found in vmi_32.c

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 22:52:19 +01:00
Hiroshi Shimamoto 8ae9366909 x86: hardirq: use inc_irq_stat() in non-unified functions
Impact: cleanup

Replace incrementing irq stat with inc_irq_stat() in non-unified functions.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 22:30:19 +01:00
Ingo Molnar 78f902ccc5 Merge commit 'v2.6.28-rc8' into x86/doc 2008-12-16 22:04:48 +01:00
Jeremy Fitzhardinge 39c04b5524 x86: make sure we really have an hpet mapping before using it
Impact: prepare the hpet code for Xen dom0 booting

When booting in Xen dom0, the hpet isn't really accessible, so make
sure the mapping is non-NULL before use.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 22:01:46 +01:00
Jeremy Fitzhardinge ecbf29cdb3 xen: clean up asm/xen/hypervisor.h
Impact: cleanup

hypervisor.h had accumulated a lot of crud, including lots of spurious
#includes.  Clean it all up, and go around fixing up everything else
accordingly.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:50:31 +01:00
Jeremy Fitzhardinge a79b7a2a75 x86: remove unused iommu_nr_pages
Impact: cleanup, remove dead code

The last usage was removed by the patch set culminating in

| commit e3c449f526
| Author: Joerg Roedel <joerg.roedel@amd.com>
| Date:   Wed Oct 15 22:02:11 2008 -0700
|
|     x86, AMD IOMMU: convert driver to generic iommu_num_pages function

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:31:37 +01:00
Jaswinder Singh a9b43c7d98 x86: setup.c find_and_reserve_crashkernel should be static
Impact: cleanup, reduce kernel size a bit

Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:18:22 +01:00
Jaswinder Singh c0195b6da0 x86: ldt.c declare sys_modify_ldt before they get used
Impact: cleanup

In asm/syscalls.h moved out sys_modify_ldt from CONFIG_X86_32 as it is
common for both 32 and 64 bit.

Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:18:21 +01:00
Jaswinder Singh 7b5b50f1be x86: signal.c declare do_notify_resume before they get used
Impact: cleanup

In asm/signal.h moved out do_notify_resume from __i386__ as it is common
for both 32 and 64 bit.

Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 arch/x86/include/asm/signal.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
2008-12-16 21:10:28 +01:00
Jaswinder Singh aab02f0ae2 x86: process_64.c declare __switch_to() and sys_arch_prctl before they get used
Impact: cleanup

In asm/system.h moved out __switch_to from CONFIG_X86_32 as it is common for
both 32 and 64 bit.

In asm/pctl.h defined sys_arch_prctl
Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:10:27 +01:00
Tej f63c2f2489 xen: whitespace/checkpatch cleanup
Impact: cleanup

Signed-off-by: Tej <bewith.tej@gmail.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:05:01 +01:00
Venki Pallipadi 40fb17152c x86: support always running TSC on Intel CPUs
Impact: reward non-stop TSCs with good TSC-based clocksources, etc.

Add support for CPUID_0x80000007_Bit8 on Intel CPUs as well. This bit means
that the TSC is invariant with C/P/T states and always runs at constant
frequency.

With Intel CPUs, we have 3 classes
* CPUs where TSC runs at constant rate and does not stop n C-states
* CPUs where TSC runs at constant rate, but will stop in deep C-states
* CPUs where TSC rate will vary based on P/T-states and TSC will stop in deep
  C-states.

To cover these 3, one feature bit (CONSTANT_TSC) is not enough. So, add a
second bit (NONSTOP_TSC). CONSTANT_TSC indicates that the TSC runs at
constant frequency irrespective of P/T-states, and NONSTOP_TSC indicates
that TSC does not stop in deep C-states.

CPUID_0x8000000_Bit8 indicates both these feature bit can be set.
We still have CONSTANT_TSC _set_ and NONSTOP_TSC _not_set_ on some older Intel
CPUs, based on model checks. We can use TSC on such CPUs for time, as long as
those CPUs do not support/enter deep C-states.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:02:50 +01:00
Ingo Molnar 7e91a122b1 Merge branch 'x86/cpufeature' into x86/tsc
Merge itto in x86/tsc because an upcoming patch relies on a new
cpuid bit defined in the x86/cpufeature branch.
2008-12-16 21:02:13 +01:00
Ingo Molnar d437797406 x86: support always running TSC on Intel CPUs, add cpufeature definition
Impact: add new synthetic-cpuid bit definition

add X86_FEATURE_NONSTOP_TSC to the cpufeature bits - this is in
preparation of Venki's always-running-TSC patch.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 21:01:15 +01:00
Ingo Molnar dd7a5230cd Merge commit 'v2.6.28-rc8' into x86/cpufeature 2008-12-16 20:57:41 +01:00
Janne Kulmala bacbe99945 x86: enable HPET on Fujitsu u9200
Impact: auto-enable HPET on Fujitsu u9200

HPET timer is listed in the ACPI table, but needs a quirk entry in order to
work. Unfortunately, the quirk code runs after first HPET hpet_enable() which
has already determined that the timer doesn't work (reads 0xFFFFFFFF). This
patch allows hpet_enable() to be called again after running the quirk code.

Signed-off-by: Janne Kulmala <janne.t.kulmala@tut.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 20:36:44 +01:00
Andreas Herrmann df23cab563 x86: microcode_amd: modify log messages
Impact: change microcode printk content

Change log level and provide (at least I tried to;-) consistent, short,
meaningful content.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:58:07 +01:00
Andreas Herrmann 5549b94bc7 x86: microcode_amd: use 'packed' attribute for structs
Impact: cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:58:07 +01:00
Andreas Herrmann 98415301ea x86: microcode_amd: remove (wrong) chipset deivce ID checks
Impact: remove dead/incorrect code

Currently there is no chipset specific ucode. The checks are incorrect
anyway (e.g. pci device IDs are 16 bit and not 8 bit).

Thus I remove the stuff for the time being and will reintroduce it if
it's foreseeable that it is really needed.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:58:06 +01:00
Andreas Herrmann 6cc9b6d94b x86: microcode_amd: consolidate macro definitions
Impact: cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:58:05 +01:00
Andreas Herrmann 29d0887ffd x86: microcode_amd: replace inline asm by common rdmsr/wrmsr functions
Impact: cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:58:04 +01:00
Andreas Herrmann 0657d9ebff x86: microcode_amd: don't pass superfluous function pointer for get_ucode_data
Impact: cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:58:03 +01:00
Andreas Herrmann 8c135206c8 x86: microcode_amd: fix compile warning
Impact: fix build warning

  CC      arch/x86/kernel/microcode_amd.o
arch/x86/kernel/microcode_amd.c: In function ‘request_microcode_fw’:
arch/x86/kernel/microcode_amd.c:393: warning: passing argument 2 of ‘generic_load_microcode’ discards qualifiers from pointer target type

(Respect "const" qualifier of firmware->data.)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:58:02 +01:00
Andreas Herrmann be957763b0 x86: microcode_amd: fix checkpatch warnings/errors
Impact: cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:58:02 +01:00
Andreas Herrmann 2a3282a77b x86: microcode_amd: fix typos and trailing whitespaces in log messages
Impact: fix printk typos

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:58:01 +01:00
Andreas Herrmann 3c763fd77e x86: microcode_amd: fix wrong handling of equivalent CPU id
Impact: fix bug resulting in non-loaded AMD microcode

mc_header->processor_rev_id is a 2 byte value. Similar is true for
equiv_cpu in an equiv_cpu_entry -- only 2 bytes are of interest.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:58:00 +01:00
Ingo Molnar b6fd6f2673 x86, mm: limit MAXMEM on 64-bit
on 64-bit x86 the physical memory limit is controlled by the sparsemem
bits - which are 44 bits right now. But MAXMEM (the max pfn number
e820 parsing will allow to enter our sizing routines) is set to
0x00003fffffffffff, i.e. 46 bits - that's too large because it overlaps
into the vmalloc range.

So couple MAXMEM to MAX_PHYSMEM_BITS, and add a comment that the
maximum of MAX_PHYSMEM_BITS is 45 bits.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 19:31:52 +01:00
Joerg Roedel 83fd5cc648 AMD IOMMU: allocate rlookup_table with __GFP_ZERO
Impact: fix bug which can lead to panic in prealloc_protection_domains()

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-16 19:17:11 +01:00
Jan Beulich cfc319833b x86, 32-bit: improve lazy TLB handling code
Impact: micro-optimize the 32-bit TLB flush code

Use the faster x86_{read,write}_percpu() accessors here.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:47:17 +01:00
Jan Beulich beeb4195cb x86, 32-bit: add some compile time checks to mem_init()
Some of the inconsistencies checked for at run time can be detected at
build time already, so duplicate the checks done at run time to also be
done at build time.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:42:51 +01:00
Jan Beulich d6be89ad66 x86, 32-bit: simplify alloc_low_page()
Impact: cleanup

Neither of the callers really needs the physical address this function
returns, so eliminate the pointless argument.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:41:37 +01:00
Jan Beulich b93a531e31 allow bug table entries to use relative pointers (and use it on x86-64)
Impact: reduce bug table size

This allows reducing the bug table size by half. Perhaps there are
other 64-bit architectures that could also make use of this.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:40:32 +01:00
Jan Beulich 1796316a8b x86: consolidate __swp_XXX() macros
Impact: cleanup, code robustization

The __swp_...() macros silently relied upon which bits are used for
_PAGE_FILE and _PAGE_PROTNONE. After having changed _PAGE_PROTNONE in
our Xen kernel to no longer overlap _PAGE_PAT, live locks and crashes
were reported that could have been avoided if these macros properly
used the symbolic constants. Since, as pointed out earlier, for Xen
Dom0 support mainline likewise will need to eliminate the conflict
between _PAGE_PAT and _PAGE_PROTNONE, this patch does all the necessary
adjustments, plus it introduces a mechanism to check consistency
between MAX_SWAPFILES_SHIFT and the actual encoding macros.

This also fixes a latent bug in that x86-64 used a 6-bit mask in
__swp_type(), and if MAX_SWAPFILES_SHIFT was increased beyond 5 in (the
seemingly unrelated) linux/swap.h, this would have resulted in a
collision with _PAGE_FILE.

Non-PAE 32-bit code gets similarly adjusted for its pte_to_pgoff() and
pgoff_to_pte() calculations.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:34:51 +01:00
Markus Metzger d072c25f53 x86, bts: correctly report invalid bts records
Impact: change the reporting of empty BTS records

Correctly report a cleared BTS record as invalid. Used to be reported
as branch from 0 to 0.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:28:37 +01:00
Markus Metzger cc1dc6d039 x86, bts: remove recursion from get_context
Impact: cleanup

Optimistically allocate a DS context. It is extremely unlikely that
one already existed. This simplifies the code a lot.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 18:27:25 +01:00
Ingo Molnar 9dfc3bc7d2 Merge branches 'tracing/fastboot', 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/hw-branch-tracing' into tracing/core 2008-12-16 12:03:38 +01:00
Ken Chen 205516c12d x86: convert rdtscll() to use __native_read_tsc
Impact: micro-optimization

Is there any reason why x86 rdtscll have to use the out of line
function instead of inline __native_read_tsc()?  native_read_tsc and
__native_read_tsc is essentially the same functions.

Patch to let x86 rdtscll() to use the inline version of read_tsc.

Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-16 10:17:02 +01:00
Zachary Amsden ae8d04e2ec x86 Fix VMI crash on boot in 2.6.28-rc8
VMI initialiation can relocate the fixmap, causing early_ioremap to
malfunction if it is initialized before the relocation.  To fix this,
VMI activation is split into two phases; the detection, which must
happen before setting up ioremap, and the activation, which must happen
after parsing early boot parameters.

This fixes a crash on boot when VMI is enabled under VMware.

Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-14 16:24:38 -08:00
Rusty Russell 968ea6d80e Merge ../linux-2.6-x86
Conflicts:

	arch/x86/kernel/io_apic.c
	kernel/sched.c
	kernel/sched_stats.h
2008-12-13 21:55:51 +10:30
Rusty Russell 320ab2b0b1 cpumask: convert struct clock_event_device to cpumask pointers.
Impact: change calling convention of existing clock_event APIs

struct clock_event_timer's cpumask field gets changed to take pointer,
as does the ->broadcast function.

Another single-patch change.  For safety, we BUG_ON() in
clockevents_register_device() if it's not set.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
2008-12-13 21:20:26 +10:30
Rusty Russell 0de26520c7 cpumask: make irq_set_affinity() take a const struct cpumask
Impact: change existing irq_chip API

Not much point with gentle transition here: the struct irq_chip's
setaffinity method signature needs to change.

Fortunately, not widely used code, but hits a few architectures.

Note: In irq_select_affinity() I save a temporary in by mangling
irq_desc[irq].affinity directly.  Ingo, does this break anything?

(Folded in fix from KOSAKI Motohiro)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: jeremy@xensource.com
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
2008-12-13 21:20:26 +10:30
Rusty Russell 29c0177e6a cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers.
Impact: change calling convention of existing cpumask APIs

Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.

These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com
2008-12-13 21:20:25 +10:30
Rusty Russell 98a79d6a50 cpumask: centralize cpu_online_map and cpu_possible_map
Impact: cleanup

Each SMP arch defines these themselves.  Move them to a central
location.

Twists:
1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
   CONFIG_INIT_ALL_POSSIBLE for this rather than break them.

2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
   Those archs simply have phys_cpu_present_map replaced everywhere.

3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
   so I just manipulate them both in sync.

4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
   declarations.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Travis <travis@sgi.com>
Cc: ink@jurassic.park.msu.ru
Cc: rmk@arm.linux.org.uk
Cc: starvik@axis.com
Cc: tony.luck@intel.com
Cc: takata@linux-m32r.org
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: paulus@samba.org
Cc: schwidefsky@de.ibm.com
Cc: lethal@linux-sh.org
Cc: wli@holomorphy.com
Cc: davem@davemloft.net
Cc: jdike@addtoit.com
Cc: mingo@redhat.com
2008-12-13 21:19:41 +10:30
Andi Kleen fd28a5b58d x86: remove simnow earlyprintk support
Impact: remove obsolete code

The later versions of SimNow! actually all have serial console
emulation, so the direct interface isn't needed anymore. So remove
the undocumented simnow earlyprintk console.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-12-12 17:02:21 +01:00
Dave Jones 9470565579 x86: remove init_mm export as planned for 2.6.26
Impact: remove deprecated export

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-12-12 16:59:05 +01:00
Ingo Molnar 8299608f14 Merge branches 'irq/sparseirq', 'x86/quirks' and 'x86/reboot' into cpus4096
We merge the irq/sparseirq, x86/quirks and x86/reboot trees into the
cpus4096 tree because the io-apic changes in the sparseirq change
conflict with the cpumask changes in the cpumask tree, and we
want to resolve those.
2008-12-12 13:49:24 +01:00
Ingo Molnar 45ab6b0c76 Merge branch 'sched/core' into cpus4096
Conflicts:
	include/linux/ftrace.h
	kernel/sched.c
2008-12-12 13:48:57 +01:00
Ingo Molnar 81444a7995 Merge branch 'tracing/fastboot' into cpus4096 2008-12-12 12:43:05 +01:00
Ingo Molnar 2bed844681 tracing/function-graph-tracer: add a new .irqentry.text section, fix
Impact: build fix

32-bit x86 needs this section too.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 12:14:05 +01:00
Hiroshi Shimamoto 915b0d0104 x86: hardirq: introduce inc_irq_stat()
Impact: cleanup

Introduce inc_irq_stat() macro and unify irq_stat accounting code.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 11:59:49 +01:00
Ingo Molnar fd10902797 Merge commit 'v2.6.28-rc8' into x86/irq 2008-12-12 11:59:39 +01:00
Hiroshi Shimamoto 8f2466f45f x86: kill #ifdef for exit_idle()
Impact: cleanup

Introduce helper inline function in arch/x86/include/asm/idle.h
to remove #ifdefs around exit_idle().

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 11:58:36 +01:00
Hiroshi Shimamoto 16855f878d x86: uaccess: return value of __{get|put}_user() can be int
Impact: cleanup

The type of return value of __{get|put}_user() can be int.
There is no user to refer the return value of __{get|put}_user() as long.
This reduces code size a bit on 64-bit.

 $ size vmlinux.*
     text	   data	    bss	    dec	    hex	filename
  4509265	 479988	 673588	5662841	 566879	vmlinux.new
  4511462	 479988	 673588	5665038	 56710e	vmlinux.old

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 11:54:43 +01:00
Ingo Molnar e18d7af852 Merge commit 'v2.6.28-rc8' into x86/mm 2008-12-12 11:53:43 +01:00
Frederic Weisbecker bcbc4f20b5 tracing/function-graph-tracer: annotate do_IRQ and smp_apic_timer_interrupt
Impact: move most important x86 irq entry-points to a separate subsection

Annotate do_IRQ and smp_apic_timer_interrupt to put them into the .irqentry.text
subsection. These function will so be recognized as hardirq entrypoints for the
function-graph-tracer. We could also annotate other irq entries but the others
are far less important but they can be added on request.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 11:14:08 +01:00
Frederic Weisbecker a0343e8231 tracing/function-graph-tracer: add a new .irqentry.text section
Impact: let the function-graph-tracer be aware of the irq entrypoints

Add a new .irqentry.text section to store the irq entrypoints functions
inside the same section. This way, the tracer will be able to signal
an interrupts triggering on output by recognizing these entrypoints.

Also, make this section recordable for dynamic tracing.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 11:14:07 +01:00
Ingo Molnar 85072bd552 x86, debug: remove EBDA debug printk
Remove leftover EBDA debug message.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 11:08:42 +01:00
Ingo Molnar c1dfdc7597 Merge commit 'v2.6.28-rc8' into sched/core 2008-12-12 10:29:35 +01:00
Ingo Molnar 8808500f26 x86: soften multi-BAR mapping sanity check warning message
Impact: make debug warning less scary

The ioremap() time multi-BAR map warning has been causing false
positives:

  http://lkml.org/lkml/2008/12/10/432
  http://lkml.org/lkml/2008/12/11/136

So make it less scary by making it once-per-boot, by making it KERN_INFO
and by adding this text:

  "Info: mapping multiple BARs. Your kernel is fine."

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 09:22:26 +01:00
Ingo Molnar ffc2238af8 x86, bts: fix build error
Impact: build fix

 arch/x86/kernel/ds.c: In function 'ds_request':
 arch/x86/kernel/ds.c:236: sorry, unimplemented: inlining failed in call to 'ds_get_context': recursive inlining

but the recursion here is scary ...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 08:21:19 +01:00
Markus Metzger c2724775ce x86, bts: provide in-kernel branch-trace interface
Impact: cleanup

Move the BTS bits from ptrace.c into ds.c.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 08:08:12 +01:00
Markus Metzger b0884e25fe x86, bts: turn BUG_ON into WARN_ON_ONCE
Impact: make the ds code more debuggable

Turn BUG_ON's into WARN_ON_ONCE.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12 08:08:10 +01:00
Robert Richter fe615cbf34 x86/oprofile: cleanup IBS init/exit functions in op_model_amd.c
Implementation of pairwise init/exit funcions for IBS and IBS NMI
setup. There are also some function renames and the removal of forward
function declarations.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-12-10 14:20:08 +01:00
Robert Richter 9fa6812dba x86/oprofile: reordering IBS code in op_model_amd.c
This is part of the cpu buffer rework.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-12-10 14:20:07 +01:00
Robert Richter cdc1834d1a oprofile: whitspace changes only
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-12-10 14:20:05 +01:00
Robert Richter fd13f6c851 oprofile: comment cleanup
This fixes the coding style of some comments.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-12-10 14:20:01 +01:00
Yinghai Lu 8a4830f889 sparseirq: fix !SMP && !PCI_MSI && !HT_IRQ build
Ingo Molnar wrote:

>>>  drivers/pci/intr_remapping.c: In function 'irq_2_iommu_alloc':
>>>  drivers/pci/intr_remapping.c:72: error: 'boot_cpu_id' undeclared (first use in this function)
>>>  drivers/pci/intr_remapping.c:72: error: (Each undeclared identifier is reported only once
>>>  drivers/pci/intr_remapping.c:72: error: for each function it appears in.)

sparseirq should only be used with SMP for now.
2008-12-09 21:02:19 +01:00
Ingo Molnar 50dd94e017 sparseirq: fix typo in !CONFIG_IO_APIC case
Impact: build fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 18:47:51 +01:00
Ingo Molnar 087052b02f x86: fix default_spin_lock_flags() prototype
these warnings:

  arch/x86/kernel/paravirt-spinlocks.c: In function ‘default_spin_lock_flags’:
  arch/x86/kernel/paravirt-spinlocks.c:12: warning: passing argument 1 of ‘__raw_spin_lock’ from incompatible pointer type
  arch/x86/kernel/paravirt-spinlocks.c: At top level:
  arch/x86/kernel/paravirt-spinlocks.c:11: warning: ‘default_spin_lock_flags’ defined but not used

showed that the prototype of default_spin_lock_flags() was confused about
what type spinlocks have.

the proper type on UP is raw_spinlock_t.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 16:08:29 +01:00
Hiroshi Shimamoto 4217458daf x86: signal: change type of paramter for sys_rt_sigreturn()
Impact: cleanup on 32-bit

Peter pointed this parameter can be changed.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 15:21:35 +01:00
Frederic Weisbecker 380c4b1411 tracing/function-graph-tracer: append the tracing_graph_flag
Impact: Provide a way to pause the function graph tracer

As suggested by Steven Rostedt, the previous patch that prevented from
spinlock function tracing shouldn't use the raw_spinlock to fix it.
It's much better to follow lockdep with normal spinlock, so this patch
adds a new flag for each task to make the function graph tracer able
to be paused. We also can send an ftrace_printk whithout worrying of
the irrelevant traced spinlock during insertion.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 15:11:45 +01:00
Frederic Weisbecker 8b96f01198 tracing/function-graph-tracer: introduce __notrace_funcgraph to filter special functions
Impact: trace more functions

When the function graph tracer is configured, three more files are not
traced to prevent only four functions to be traced. And this impacts the
normal function tracer too.

arch/x86/kernel/process_64/32.c:

I had crashes when I let this file traced. After some debugging, I saw
that the "current" task point was changed inside__swtich_to(), ie:
"write_pda(pcurrent, next_p);" inside process_64.c Since the tracer store
the original return address of the function inside current, we had
crashes. Only __switch_to() has to be excluded from tracing.

kernel/module.c and kernel/extable.c:

Because of a function used internally by the function graph tracer:
__kernel_text_address()

To let the other functions inside these files to be traced, this patch
introduces the __notrace_funcgraph function prefix which is __notrace if
function graph tracer is configured and nothing if not.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 15:11:44 +01:00
Yinghai Lu 69b88afa8d x86: clean up get_smp_config()
Impact: cleanup

reorder exit path in __get_smp_config().

also move two print outs to acpi_process_madt

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 15:08:28 +01:00
Ingo Molnar aa9c9b8c58 Merge branch 'linus' into x86/quirks 2008-12-08 15:07:49 +01:00
Joerg Roedel b8d9905d02 AMD IOMMU: __unmap_single: check for bad_dma_address instead of 0
Impact: minor fix

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-08 14:58:55 +01:00
Joerg Roedel 8ad909c4c1 AMD IOMMU: fix WARN_ON in dma_ops unmap path
Impact: minor fix

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-08 14:58:46 +01:00
Joerg Roedel 24f811603e AMD IOMMU: fix typo in comment
Impact: cleanup

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-08 14:58:39 +01:00
Joerg Roedel 3cc3d84bff AMD IOMMU: fix loop counter in free_pagetable function
Impact: bugfix

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-08 14:58:19 +01:00
Joerg Roedel bb9d4ff80b AMD IOMMU: fix iommu_map_page function
Impact: bugfix in iommu_map_page function

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-08 14:58:07 +01:00
Yinghai Lu 3145e941fc x86, MSI: pass irq_cfg and irq_desc
Impact: simplify code

Pass irq_desc and cfg around, instead of raw IRQ numbers - this way
we dont have to look it up again and again.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:59 +01:00
Yinghai Lu be5d5350a9 x86: MSI start irq numbering from nr_irqs_gsi
Impact: sanitize MSI irq number ordering from top-down to bottom-up

Increase new MSI IRQs starting from nr_irqs_gsi (which is somewhere below
256), instead of decreasing from NR_IRQS. (The latter method can result
in confusingly high IRQ numbers - if NR_CPUS is set to a high value and
NR_IRQS scales up to a high value.)

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:54 +01:00
Yinghai Lu 99d093d128 x86: use NR_IRQS_LEGACY
Impact: cleanup

Introduce NR_IRQS_LEGACY instead of hard coded number.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:52 +01:00
Yinghai Lu 0b8f1efad3 sparse irq_desc[] array: core kernel and x86 changes
Impact: new feature

Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
NR_CPUS set to large values. The goal is to be able to scale up to much
larger NR_IRQS value without impacting the (important) common case.

To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
irq_desc pointers.

When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
this also makes the IRQ descriptors NUMA-local (to the site that calls
request_irq()).

This gets rid of the irq_cfg[] static array on x86 as well: irq_cfg now
uses desc->chip_data for x86 to store irq_cfg.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 14:31:51 +01:00
Ingo Molnar 4d117c5c6b Merge branch 'sched/urgent' into sched/core 2008-12-08 13:52:00 +01:00
Rafael J. Wysocki 3e1e9002aa x86: change static allocation of trampoline area
Impact: fix trampoline sizing bug, save space

While debugging a suspend-to-RAM related issue it occured to me that
if the trampoline code had grown past 4 KB, we would have been
allocating too little memory for it, since the 4 KB size of the
trampoline is hardcoded into arch/x86/kernel/e820.c .  Change that
by making the kernel compute the trampoline size and allocate as much
memory as necessary.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-08 13:49:45 +01:00
Herton Ronaldo Krzesinski 8529154ec3 [CPUFREQ] Add Celeron Core support to p4-clockmod.
Add Celeron Core support to p4-clockmod.

Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-12-05 15:20:11 -05:00
Herton Ronaldo Krzesinski c60e19eb21 [CPUFREQ] add to speedstep-lib additional fsb values for core processors
Add additional fsb values to pentium_core_get_frequency, from latest edition
(September 2008) of Intel 64 and IA-32 Architectures Software Develper's Manual,
Volume 3B: System Programming Guide, Part 2. Values added are to detect 800,
1067 and 1333 FSB types.

Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-12-05 15:20:11 -05:00
Matthew Garrett e088e4c9cd [CPUFREQ] Disable sysfs ui for p4-clockmod.
p4-clockmod has a long history of abuse.   It pretends to be a CPU
frequency scaling driver, even though it doesn't actually change
the CPU frequency, but instead just modulates the frequency with
wait-states.
The biggest misconception is that when running at the lower 'frequency'
p4-clockmod is saving power.  This isn't the case, as workloads running
slower take longer to complete, preventing the CPU from entering deep C states.

However p4-clockmod does have a purpose.  It can prevent overheating.
Having it hooked up to the cpufreq interfaces is the wrong way to achieve
cooling however. It should instead be hooked up to ACPI.

This diff introduces a means for a cpufreq driver to register with the
cpufreq core, but not present a sysfs interface.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-12-05 15:20:10 -05:00
Dominik Brodowski 10db2e5cbd [CPUFREQ] p4-clockmod: reduce noise
On those CPUs which are SpeedStep (EST) capable, we do not care at all if
p4-clockmod does not work, since a technically superior CPU frequency
management technology is to be used.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-12-05 15:20:10 -05:00
Rusty Russell 9963d1aad4 [CPUFREQ] clean up speedstep-centrino and reduce cpumask_t usage
Impact: cleanup

1) The #ifdef CONFIG_HOTPLUG_CPU seems unnecessary these days.
2) The loop can simply skip over offline cpus, rather than creating a tmp mask.
3) set_mask is set to either a single cpu or all online cpus in a policy.
   Since it's just used for set_cpus_allowed(), any offline cpus in a policy
   don't matter, so we can just use cpumask_of_cpu() or the policy->cpus.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-12-05 15:20:10 -05:00
Ingo Molnar 970987beb9 Merge branches 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/urgent' into tracing/core 2008-12-05 14:45:22 +01:00
Michael Tokarev a0286c94f0 x86: fix missing space in printk, #2
Impact: clean up printk

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-05 14:01:26 +01:00
Michael Tokarev 55c395b470 x86: fix missing space in printk
Just come across this when booting on an old hw..
Looks somewhat ugly, that single missing space ;)

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-05 13:09:00 +01:00
Linus Torvalds e948990f95 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix early panic with boot option "nosmp"
  x86/oprofile: fix Intel cpu family 6 detection
  oprofile: fix CPU unplug panic in ppro_stop()
  AMD IOMMU: fix possible race while accessing iommu->need_sync
  AMD IOMMU: set device table entry for aliased devices
  AMD IOMMU: struct amd_iommu remove padding on 64 bit
  x86: fix broken flushing in GART nofullflush path
  x86: fix dma_mapping_error for 32bit x86
2008-12-04 21:40:08 -08:00
Linus Torvalds 2b218aea36 Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: don't export sched_mc_power_savings in laptops
2008-12-04 21:39:55 -08:00
Andi Kleen 9adc13867e x86: fix early panic with boot option "nosmp"
Impact: fix boot crash with numcpus=0 on certain systems

Fix early exception in __get_smp_config with nosmp.

Bail out early when there is no MP table.

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 16:33:51 +01:00
Joe Korty affa219b60 x86: change thread_info's flag field back to 32 bits
Impact: pack struct thread_info more tightly

Change x86_64's thread_info 'flags' field back to __u32.

This was changed to 'unsigned long' when the thread_info*.h
for i386 and x86_64 were merged.  Change it back.  We can
do this as only 27 bits of 'flags' are actually used.

This change actually packs down thread_info by 64 bits:
32 bits are saved by the smaller flags, and 32 bits are
saved by the following 'mm_segment_t field' becoming
naturally 64-bit aligned.

Signed-off-by: Joe Korty <joe.korty@ccur.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-04 11:06:03 +01:00
Ingo Molnar c0515566f3 Merge commit 'v2.6.28-rc7' into x86/cleanups 2008-12-04 11:05:26 +01:00
Ingo Molnar cb9c34e6d0 Merge commit 'v2.6.28-rc7' into core/locking 2008-12-04 08:52:14 +01:00
James Morris ec98ce480a Merge branch 'master' into next
Conflicts:
	fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris <jmorris@namei.org>
2008-12-04 17:16:36 +11:00
Ingo Molnar 66a05d6b47 Merge branch 'oprofile-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into x86/urgent 2008-12-03 18:52:46 +01:00
William Cohen 3d337c653c x86/oprofile: fix Intel cpu family 6 detection
Alan Jenkins wrote:
> This is on an EeePC 701, /proc/cpuinfo as attached.
>
> Is this expected?  Will the next release work?
>
> Thanks, Alan
>
> # opcontrol --setup --no-vmlinux
> cpu_type 'unset' is not valid
> you should upgrade oprofile or force the use of timer mode
>
> # opcontrol -v
> opcontrol: oprofile 0.9.4 compiled on Nov 29 2008 22:44:10
>
> # cat /dev/oprofile/cpu_type
> i386/p6
> # uname -r
> 2.6.28-rc6eeepc

Hi Alan,

Looking at the kernel driver code for oprofile it can return the "i386/p6" for
the cpu_type. However, looking at the user-space oprofile code there isn't the
matching entry in libop/op_cpu_type.c or the events/unit_mask files in
events/i386 directory.

The Intel AP-485 says this is a "Intel Pentium M processor model D". Seems like
the oprofile kernel driver should be identifying the processor as "i386/p6_mobile"

The driver identification code doesn't look quite right in nmi_init.c

http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=blob;f=arch/x86/oprofile/nmi_int.c;h=022cd41ea9b4106e5884277096e80e9088a7c7a9;hb=HEAD

has:

409         case 10 ... 13:
410                 *cpu_type = "i386/p6";
411                 break;

Referring to the Intel AP-485:
case 10 and 11 should produce "i386/piii"
case 13 should produce "i386/p6_mobile"

I didn't see anything for case 12.

Something like the attached patch. I don't have a celeron machine to verify that
changes in this area of the kernel fix thing.

-Will

Signed-off-by: William Cohen <wcohen@redhat.com>
Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-12-03 17:17:17 +01:00
Eric Dumazet 9ea84ad77d oprofile: fix CPU unplug panic in ppro_stop()
If oprofile statically compiled in kernel, a cpu unplug triggers
a panic in ppro_stop(), because a NULL pointer is dereferenced.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-12-03 15:58:51 +01:00
Ingo Molnar c36910c147 Merge branch 'iommu-fixes-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2008-12-03 12:54:45 +01:00
Joerg Roedel 09ee17eb8e AMD IOMMU: fix possible race while accessing iommu->need_sync
The access to the iommu->need_sync member needs to be protected by the
iommu->lock. Otherwise this is a possible race condition. Fix it with
this patch.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-03 12:20:46 +01:00
Joerg Roedel f91ba19064 AMD IOMMU: set device table entry for aliased devices
In some rare cases a request can arrive an IOMMU with its originial
requestor id even it is aliased. Handle this by setting the device table
entry to the same protection domain for the original and the aliased
requestor id.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-03 12:20:46 +01:00
Richard Kennedy eac9fbc6a9 AMD IOMMU: struct amd_iommu remove padding on 64 bit
Remove 16 bytes of padding from struct amd_iommu on 64bit builds
reducing its size to 120 bytes, allowing it to span one fewer
cachelines.

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-03 12:20:46 +01:00
Joerg Roedel 70d7d35757 x86: fix broken flushing in GART nofullflush path
Impact: remove stale IOTLB entries

In the non-default nofullflush case the GART is only flushed when
next_bit wraps around. But it can happen that an unmap operation unmaps
memory which is behind the current next_bit location. If these addresses
are reused it may result in stale GART IO/TLB entries. Fix this by
setting the GART next_bit always behind an unmapped location.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 10:02:41 +01:00
Steven Rostedt 62679efe0a ftrace: add checks on ret stack in function graph
Import: robustness checks

Add more checks in the function graph code to detect errors and
perhaps print out better information if a bug happens.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:27 +01:00
Steven Rostedt e49dc19c6a ftrace: function graph return for function entry
Impact: feature, let entry function decide to trace or not

This patch lets the graph tracer entry function decide if the tracing
should be done at the end as well. This requires all function graph
entry functions return 1 if it should trace, or 0 if the return should
not be traced.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:26 +01:00
Steven Rostedt 7ee991fbc6 ftrace: print real return in dumpstack for function graph
Impact: better dumpstack output

I noticed in my crash dumps and even in the stack tracer that a
lot of functions listed in the stack trace are simply
return_to_handler which is ftrace graphs way to insert its own
call into the return of a function.

But we lose out where the actually function was called from.

This patch adds in hooks to the dumpstack mechanism that detects
this and finds the real function to print. Both are printed to
let the user know that a hook is still in place.

This does give a funny side effect in the stack tracer output:

        Depth   Size      Location    (80 entries)
        -----   ----      --------
  0)     4144      48   save_stack_trace+0x2f/0x4d
  1)     4096     128   ftrace_call+0x5/0x2b
  2)     3968      16   mempool_alloc_slab+0x16/0x18
  3)     3952     384   return_to_handler+0x0/0x73
  4)     3568    -240   stack_trace_call+0x11d/0x209
  5)     3808     144   return_to_handler+0x0/0x73
  6)     3664    -128   mempool_alloc+0x4d/0xfe
  7)     3792     128   return_to_handler+0x0/0x73
  8)     3664     -32   scsi_sg_alloc+0x48/0x4a [scsi_mod]

As you can see, the real functions are now negative. This is due
to them not being found inside the stack.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:25 +01:00
Steven Rostedt 14a866c567 ftrace: add ftrace_graph_stop()
Impact: new ftrace_graph_stop function

While developing more features of function graph, I hit a bug that
caused the WARN_ON to trigger in the prepare_ftrace_return function.
Well, it was hard for me to find out that was happening because the
bug would not print, it would just cause a hard lockup or reboot.
The reason is that it is not safe to call printk from this function.

Looking further, I also found that it calls unregister_ftrace_graph,
which grabs a mutex and calls kstop machine. This would definitely
lock the box up if it were to trigger.

This patch adds a fast and safe ftrace_graph_stop() which will
stop the function tracer. Then it is safe to call the WARN ON.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:23 +01:00
Steven Rostedt bb4304c71c ftrace: have function graph use mcount caller address
Impact: consistency change for function graph

This patch makes function graph record the mcount caller address
the same way the function tracer does.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:22 +01:00
Steven Rostedt 347fdd9dd4 ftrace: clean up function graph asm
Impact: clean up

There exists macros for x86 asm to handle x86_64 and i386.
This patch updates function graph asm to use them.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:56:21 +01:00
Ingo Molnar dfdc5437bd Merge commit 'v2.6.28-rc7'; branch 'x86/dumpstack' into tracing/ftrace
Merge x86/dumpstack into tracing/ftrace because upcoming ftrace changes
depend on cleanups already in x86/dumpstack.

Also merge to latest upstream -rc.
2008-12-03 08:55:34 +01:00
FUJITA Tomonori 181de82ee3 x86: remove dead BIO_VMERGE_BOUNDARY definition
Impact: cleanup, remove dead code

The block layer dropped the virtual merge feature
(b8b3e16cfe).

BIO_VMERGE_BOUNDARY definition is meaningless now.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-03 08:26:40 +01:00
Ingo Molnar 6083aa485c Merge branch 'x86/io' into x86/iommu
Merge x86/io into x86/iommu due to a small patch conflict in io.h.
2008-12-03 08:25:59 +01:00
Linus Torvalds b7d6266062 Merge branch 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: MMU: avoid creation of unreachable pages in the shadow
  KVM: ppc: stop leaking host memory on VM exit
  KVM: MMU: fix sync of ptes addressed at owner pagetable
  KVM: ia64: Fix: Use correct calling convention for PAL_VPS_RESUME_HANDLER
  KVM: ia64: Fix incorrect kbuild CFLAGS override
  KVM: VMX: Fix interrupt loss during race with NMI
  KVM: s390: Fix problem state handling in guest sigp handler
2008-12-02 15:56:17 -08:00
Joerg Roedel dcb7731a18 x86: fix broken flushing in GART nofullflush path
Impact: remove stale IOTLB entries

In the non-default nofullflush case the GART is only flushed when
next_bit wraps around. But it can happen that an unmap operation unmaps
memory which is behind the current next_bit location. If these addresses
are reused it may result in stale GART IO/TLB entries. Fix this by
setting the GART next_bit always behind an unmapped location.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-02 20:25:52 +01:00
Ingo Molnar a64d31baed Merge branch 'linus' into cpus4096
Conflicts:
	kernel/trace/ring_buffer.c
2008-12-02 20:09:50 +01:00
Niels de Vos 8daa19051e x86, apm: remove CONFIG_APM_REAL_MODE_POWER_OFF in favor of a kernel parameter
Remove CONFIG_APM_REAL_MODE_POWER_OFF like CONFIG_APM_POWER_OFF which
has been done for linux-2.2.14pre8 (http://lkml.org/lkml/1999/11/23/3).

Re-introducing CONFIG_APM_POWER_OFF got nack-ed. Stephen didn't bother
to remove CONFIG_APM_REAL_MODE_POWER_OFF, let's get rid of it now.
	Reference: http://lkml.org/lkml/2008/5/7/97

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-02 11:28:49 +01:00
Frederic Weisbecker 48d68b20d0 tracing/function-graph-tracer: support for x86-64
Impact: extend and enable the function graph tracer to 64-bit x86

This patch implements the support for function graph tracer under x86-64.
Both static and dynamic tracing are supported.

This causes some small CPP conditional asm on arch/x86/kernel/ftrace.c I
wanted to use probe_kernel_read/write to make the return address
saving/patching code more generic but it causes tracing recursion.

That would be perhaps useful to implement a notrace version of these
function for other archs ports.

Note that arch/x86/process_64.c is not traced, as in X86-32. I first
thought __switch_to() was responsible of crashes during tracing because I
believed current task were changed inside but that's actually not the
case (actually yes, but not the "current" pointer).

So I will have to investigate to find the functions that harm here, to
enable tracing of the other functions inside (but there is no issue at
this time, while process_64.c stays out of -pg flags).

A little possible race condition is fixed inside this patch too. When the
tracer allocate a return stack dynamically, the current depth is not
initialized before but after. An interrupt could occur at this time and,
after seeing that the return stack is allocated, the tracer could try to
trace it with a random uninitialized depth. It's a prevention, even if I
hadn't problems with it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Bird <tim.bird@am.sony.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-02 09:47:48 +01:00
FUJITA Tomonori 50cec5c51c x86: fix dma_mapping_error for 32bit x86, cleanup
This removes ifdef CONFIG_X86_64 in dma_mapping_error():

1) Xen people plan to use swiotlb on X86_32 for Dom0 support. swiotlb
   uses ops->mapping_error so X86_32 also needs to check
   ops->mapping_error.

2) Removing #ifdef hack is almost always a good thing.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-01 20:36:17 +01:00
Ingo Molnar f6d2e6f57b Merge branch 'x86/urgent' into x86/iommu 2008-12-01 20:36:13 +01:00
Mahesh Salgaonkar 43714539ea sched: don't export sched_mc_power_savings in laptops
Impact: do not expose a control that has no effect

Fix to prevent sched_mc_power_saving from being exported through sysfs
on single-socket systems. (Say multicore single socket (Laptop))

CPU core map of the boot cpu should be equal to possible number
of cpus for single socket system.

This fix has been developed at FOSS.in kernel workout.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-01 08:44:00 +01:00
Linus Torvalds 72244c0e68 Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  irq.h: fix missing/extra kernel-doc
  genirq: __irq_set_trigger: change pr_warning to pr_debug
  irq: fix typo
  x86: apic honour irq affinity which was set in early boot
  genirq: fix the affinity setting in setup_irq
  genirq: keep affinities set from userspace across free/request_irq()
2008-11-30 13:06:20 -08:00
Linus Torvalds 66a45cc4cc Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: always define DECLARE_PCI_UNMAP* macros
  x86: fixup config space size of CPU functions for AMD family 11h
  x86, bts: fix wrmsr and spinlock over kmalloc
  x86, pebs: fix PEBS record size configuration
  x86, bts: turn macro into static inline function
  x86, bts: exclude ds.c from build when disabled
  arch/x86/kernel/pci-calgary_64.c: change simple_strtol to simple_strtoul
  x86: use limited register constraint for setnz
  xen: pin correct PGD on suspend
  x86: revert irq number limitation
  x86: fixing __cpuinit/__init tangle, xsave_cntxt_init()
  x86: fix __cpuinit/__init tangle in init_thread_xstate()
  oprofile: fix an overflow in ppro code
2008-11-30 13:01:04 -08:00
Linus Torvalds 8c7b905a2d Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] powernow-k8: ignore out-of-range PstateStatus value
  [CPUFREQ] Documentation: Add Blackfin to list of supported processors
2008-11-30 11:43:41 -08:00
Christoph Hellwig 96b8936a9e remove __ARCH_WANT_COMPAT_SYS_PTRACE
All architectures now use the generic compat_sys_ptrace, as should every
new architecture that needs 32bit compat (if we'll ever get another).

Remove the now superflous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also
kill a comment about __ARCH_SYS_PTRACE that was added after
__ARCH_SYS_PTRACE was already gone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 11:00:15 -08:00
Al Viro df6b07949b xen_play_dead() is __cpuinit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:38 -08:00
Al Viro 37af46efa5 xen_setup_vcpu_info_placement() is not init on x86
... so get xen-ops.h in agreement with xen/smp.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 23a14b9e9d kvm_setup_secondary_clock() is cpuinit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Al Viro 2236d252e0 enable_IR_x2apic() needs to be __init
calls __init, called only from __init

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-30 10:03:37 -08:00
Ingo Molnar 93093d099e x86: provide readq()/writeq() on 32-bit too, complete
if HAVE_READQ/HAVE_WRITEQ are defined, the full range of readq/writeq
APIs has to be provided to drivers:

 drivers/infiniband/hw/amso1100/c2.c: In function 'c2_tx_ring_alloc':
 drivers/infiniband/hw/amso1100/c2.c:133: error: implicit declaration of function '__raw_writeq'

So provide them on 32-bit as well. Also, map all the APIs to the
strongest ordering variant. It's way too easy to mess such details
up in drivers and the difference between "memory" and "" constrained
asm() constructs is in the noise range.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-30 10:26:18 +01:00
Ingo Molnar a0b1131e47 x86: provide readq()/writeq() on 32-bit too, cleanup
Impact: cleanup

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-30 09:38:19 +01:00
Hitoshi Mitake 2c5643b1c5 x86: provide readq()/writeq() on 32-bit too
Impact: add new API for drivers

Add implementation of readq/writeq to x86_32, and add config value to
the x86 architecture to determine existence of readq/writeq.

Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-30 09:38:16 +01:00
Jiri Slaby 4385cecf1f x86: intel_cacheinfo, minor show_type cleanup
Impact: cleanup

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-30 07:46:31 +01:00
Thomas Bogendoerfer 7b1dedca42 x86: fix dma_mapping_error for 32bit x86
Devices like b44 ethernet can't dma from addresses above 1GB. The driver
handles this cases by falling back to GFP_DMA allocation. But for detecting
the problem it needs to get an indication from dma_mapping_error.
The bug is triggered by using a VMSPLIT option of 2G/2G.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-29 21:00:38 +01:00
Pavel Machek 8caac56305 aperture_64.c: clarify that too small aperture is valid reason for this code
Impact: update comment

Clarify that too small aperture is valid reason for this code.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-28 15:24:39 +01:00
Ingo Molnar 5b3eec0c80 x86: ret_from_fork - get rid of jump back
Impact: remove dead code

If we take a closer look at the rff_trace/rff_action ret_from_fork code,
we have to realize that it does all the wrong things: for example it
checks the TIF flag - while later on jumping back to the ret-from-syscall
path - duplicating the check needlessly.

But checking for _TIF_SYSCALL_TRACE is completely unnecessary here because
we clear that flag for every freshly forked task. So the whole "tracing"
code here, for which there is a out of line jump optimization that makes
it even harder to read, is in reality completely dead code ...

Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
2008-11-28 15:01:46 +01:00
Ingo Molnar 3bdae4f464 Merge branch 'x86/debug' into x86/irq
We merge this branch because x86/debug touches code that we started
cleaning up in x86/irq. The two branches started out independent,
but as unexpected amount of activity went into x86/irq, they became
dependent. Resolve that by this cross-merge.
2008-11-28 15:00:48 +01:00
Cyrill Gorcunov 9f1e87ea3e x86: entry_64.S - trivial: space, comments fixup
Impact: cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-28 14:53:48 +01:00
Cyrill Gorcunov 5ae3a139cf x86: uv bau interrupt -- use proper interrupt number
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-28 14:17:25 +01:00
Joerg Roedel 1d9b16d169 x86: move GART specific stuff from iommu.h to gart.h
Impact: cleanup

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-28 13:06:27 +01:00
Cyrill Gorcunov c2c631e318 x86: entry_64.S - use ENTRY to define child_rip
child_rip is called not by its name but indirectly
rather so make it global and aligned.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 13:04:07 +01:00
gorcunov@gmail.com 33454539f3 x86: entry_64.S - use X86_EFLAGS_IF instead of hardcoded number
Impact: cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 13:00:32 +01:00
Joerg Roedel b627c8b17c x86: always define DECLARE_PCI_UNMAP* macros
Impact: fix boot crash on AMD IOMMU if CONFIG_GART_IOMMU is off

Currently these macros evaluate to a no-op except the kernel is compiled
with GART or Calgary support. But we also need these macros when we have
SWIOTLB, VT-d or AMD IOMMU in the kernel. Since we always compile at
least with SWIOTLB we can define these macros always.

This patch is also for stable backport for the same reason the SWIOTLB
default selection patch is.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 12:44:08 +01:00
Alexander van Heukelum d211af055d i386: get rid of the use of KPROBE_ENTRY / KPROBE_END
entry_32.S is now the only user of KPROBE_ENTRY / KPROBE_END,
treewide. This patch reorders entry_64.S and explicitly generates
a separate section for functions that need the protection. The
generated code before and after the patch is equal.

The KPROBE_ENTRY and KPROBE_END macro's are removed too.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 12:37:54 +01:00
Alexander van Heukelum ddeb8f2149 x86_64: get rid of the use of KPROBE_ENTRY / KPROBE_END
Impact: clean up assembly macros and annotations - with some object impact

entry_64.S is the only user of KPROBE_ENTRY / KPROBE_END on
x86_64. This patch reorders entry_64.S and explicitly generates
a separate section for functions that need the protection. The
generated code before and after the patch is equal.

Implicitly changing sections in assembly files makes it more
difficult to follow why the assembler is doing certain things.
For example,

.p2align 5
KPROBE_ENTRY(...)

was not doing what you would expect. Other section changes
(__ex_table, .fixup, .init.rodata) are done explicitly already.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-27 12:37:53 +01:00
Ingo Molnar c7cc773076 Merge branches 'tracing/blktrace', 'tracing/ftrace', 'tracing/function-graph-tracer' and 'tracing/power-tracer' into tracing/core 2008-11-27 10:56:13 +01:00
Marcelo Tosatti 6c475352e8 KVM: MMU: avoid creation of unreachable pages in the shadow
It is possible for a shadow page to have a parent link
pointing to a freed page. When zapping a high level table,
kvm_mmu_page_unlink_children fails to remove the parent_pte link.
For that to happen, the child must be unreachable via the shadow
tree, which can happen in shadow_walk_entry if the guest pte was
modified in between walk() and fetch(). Remove the parent pte
reference in such case.

Possible cause for oops in bug #2217430.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-26 12:34:27 +02:00
Hannes Eder 4db646b1af x86: microcode: fix sparse warnings
Impact: make global variables and a function static

Fix following sparse warnings:

  arch/x86/kernel/microcode_core.c:102:22: warning: symbol
  'microcode_ops' was not declared. Should it be static?
  arch/x86/kernel/microcode_core.c:206:24: warning: symbol
  'microcode_pdev' was not declared. Should it be static?
  arch/x86/kernel/microcode_core.c:322:6: warning: symbol
  'microcode_update_cpu' was not declared. Should it be static?
  arch/x86/kernel/microcode_intel.c:468:22: warning: symbol
  'microcode_intel_ops' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 08:49:19 +01:00
Arjan van de Ven f3f47a6768 tracing: add "power-tracer": C/P state tracer to help power optimization
Impact: new "power-tracer" ftrace plugin

This patch adds a C/P-state ftrace plugin that will generate
detailed statistics about the C/P-states that are being used,
so that we can look at detailed decisions that the C/P-state
code is making, rather than the too high level "average"
that we have today.

An example way of using this is:

 mount -t debugfs none /sys/kernel/debug
 echo cstate > /sys/kernel/debug/tracing/current_tracer
 echo 1 > /sys/kernel/debug/tracing/tracing_enabled
 sleep 1
 echo 0 > /sys/kernel/debug/tracing/tracing_enabled
 cat /sys/kernel/debug/tracing/trace | perl scripts/trace/cstate.pl > out.svg

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 08:29:32 +01:00
Steven Rostedt 5a45cfe1c6 ftrace: use code patching for ftrace graph tracer
Impact: more efficient code for ftrace graph tracer

This patch uses the dynamic patching, when available, to patch
the function graph code into the kernel.

This patch will ease the way for letting both function tracing
and function graph tracing run together.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 06:52:54 +01:00
Hiroshi Shimamoto 5ceb40da9b x86: signal: unify signal_{32|64}.c
Impact: cleanup

Unify signal_{32|64}.c! Mechanic unification - the two
files are the same.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 05:11:56 +01:00
Hiroshi Shimamoto e5fa2d063c x86: signal: unify signal_{32|64}.c, prepare
Impact: cleanup

Add #ifdef directive for 32-bit only code.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 05:11:54 +01:00
Hiroshi Shimamoto bfeb91a943 x86: signal: cosmetic unification of __setup_sigframe() and __setup_rt_sigframe()
Impact: cleanup

Add #ifdef directive to unify __setup_sigframe() and __setup_rt_sigframe().
Move them after {setup|restore}_sigcontext() declaration.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 05:11:53 +01:00
Hiroshi Shimamoto 2601657d22 x86: signal: move {setup|restore}_sigcontext()
Impact: cleanup

Move {setup|restore}_sigcontext() declaration onto head of file.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 05:11:51 +01:00
Andreas Herrmann ffd565a8b8 x86: fixup config space size of CPU functions for AMD family 11h
Impact: extend allowed configuration space access on 11h CPUs from 256 to 4K

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 03:53:27 +01:00
Ingo Molnar c2324b694f tracing: function graph tracer, fix
fix return-tracer => graph-tracer namespace rename fallout.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 03:10:01 +01:00
Frederic Weisbecker 287b6e68ca tracing/function-return-tracer: set a more human readable output
Impact: feature

This patch sets a C-like output for the function graph tracing.
For this aim, we now call two handler for each function: one on the entry
and one other on return. This way we can draw a well-ordered call stack.

The pid of the previous trace is loosely stored to be compared against
the one of the current trace to see if there were a context switch.

Without this little feature, the call tree would seem broken at
some locations.
We could use the sched_tracer to capture these sched_events but this
way of processing is much more simpler.

2 spaces have been chosen for indentation to fit the screen while deep
calls. The time of execution in nanosecs is printed just after closed
braces, it seems more easy this way to find the corresponding function.
If the time was printed as a first column, it would be not so easy to
find the corresponding function if it is called on a deep depth.

I plan to output the return value but on 32 bits CPU, the return value
can be 32 or 64, and its difficult to guess on which case we are.
I don't know what would be the better solution on X86-32: only print
eax (low-part) or even edx (high-part).

Actually it's thee same problem when a function return a 8 bits value, the
high part of eax could contain junk values...

Here is an example of trace:

sys_read() {
  fget_light() {
  } 526
  vfs_read() {
    rw_verify_area() {
      security_file_permission() {
        cap_file_permission() {
        } 519
      } 1564
    } 2640
    do_sync_read() {
      pipe_read() {
        __might_sleep() {
        } 511
        pipe_wait() {
          prepare_to_wait() {
          } 760
          deactivate_task() {
            dequeue_task() {
              dequeue_task_fair() {
                dequeue_entity() {
                  update_curr() {
                    update_min_vruntime() {
                    } 504
                  } 1587
                  clear_buddies() {
                  } 512
                  add_cfs_task_weight() {
                  } 519
                  update_min_vruntime() {
                  } 511
                } 5602
                dequeue_entity() {
                  update_curr() {
                    update_min_vruntime() {
                    } 496
                  } 1631
                  clear_buddies() {
                  } 496
                  update_min_vruntime() {
                  } 527
                } 4580
                hrtick_update() {
                  hrtick_start_fair() {
                  } 488
                } 1489
              } 13700
            } 14949
          } 16016
          msecs_to_jiffies() {
          } 496
          put_prev_task_fair() {
          } 504
          pick_next_task_fair() {
          } 489
          pick_next_task_rt() {
          } 496
          pick_next_task_fair() {
          } 489
          pick_next_task_idle() {
          } 489

------------8<---------- thread 4 ------------8<----------

finish_task_switch() {
} 1203
do_softirq() {
  __do_softirq() {
    __local_bh_disable() {
    } 669
    rcu_process_callbacks() {
      __rcu_process_callbacks() {
        cpu_quiet() {
          rcu_start_batch() {
          } 503
        } 1647
      } 3128
      __rcu_process_callbacks() {
      } 542
    } 5362
    _local_bh_enable() {
    } 587
  } 8880
} 9986
kthread_should_stop() {
} 669
deactivate_task() {
  dequeue_task() {
    dequeue_task_fair() {
      dequeue_entity() {
        update_curr() {
          calc_delta_mine() {
          } 511
          update_min_vruntime() {
          } 511
        } 2813

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 01:59:45 +01:00
Frederic Weisbecker fb52607afc tracing/function-return-tracer: change the name into function-graph-tracer
Impact: cleanup

This patch changes the name of the "return function tracer" into
function-graph-tracer which is a more suitable name for a tracing
which makes one able to retrieve the ordered call stack during
the code flow.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26 01:59:45 +01:00
Andreas Herrmann a266d9f125 [CPUFREQ] powernow-k8: ignore out-of-range PstateStatus value
A workaround for AMD CPU family 11h erratum 311 might cause that the
P-state Status Register shows a "current P-state" which is larger than
the "current P-state limit" in P-state Current Limit Register. For the
wrong P-state value there is no ACPI _PSS object defined and
powernow-k8/cpufreq can't determine the proper CPU frequency for that
state.

As a consequence this can cause a panic during boot (potentially with
all recent kernel versions -- at least I have reproduced it with
various 2.6.27 kernels and with the current .28 series), as an
example:

powernow-k8: Found 1 AMD Turion(tm)X2 Ultra DualCore Mobile ZM-82 processors (2 \
)
powernow-k8:    0 : pstate 0 (2200 MHz)
powernow-k8:    1 : pstate 1 (1100 MHz)
powernow-k8:    2 : pstate 2 (600 MHz)
BUG: unable to handle kernel paging request at ffff88086e7528b8
IP: [<ffffffff80486361>] cpufreq_stats_update+0x4a/0x5f
PGD 202063 PUD 0
Oops: 0002 [#1] SMP
last sysfs file:
CPU 1
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.28-rc3-dirty #16
RIP: 0010:[<ffffffff80486361>]  [<ffffffff80486361>] cpufreq_stats_update+0x4a/0\
f
Synaptics claims to have extended capabilities, but I'm not able to read them.<6\
6
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88006e7528c0
RDX: 00000000ffffffff RSI: ffff88006e54af00 RDI: ffffffff808f056c
RBP: 00000000fffee697 R08: 0000000000000003 R09: ffff88006e73f080
R10: 0000000000000001 R11: 00000000002191c0 R12: ffff88006fb83c10
R13: 00000000ffffffff R14: 0000000000000001 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88006fb50740(0000) knlGS:0000000000000000
Unable to initialize Synaptics hardware.
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffff88086e7528b8 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff88006fb82000, task ffff88006fb816d0)
Stack:
 ffff88006e74da50 0000000000000000 ffff88006e54af00 ffffffff804863c7
 ffff88006e74da50 0000000000000000 00000000ffffffff 0000000000000000
 ffff88006fb83c10 ffffffff8024b46c ffffffff808f0560 ffff88006fb83c10
Call Trace:
 [<ffffffff804863c7>] ? cpufreq_stat_notifier_trans+0x51/0x83
 [<ffffffff8024b46c>] ? notifier_call_chain+0x29/0x4c
 [<ffffffff8024b561>] ? __srcu_notifier_call_chain+0x46/0x61
 [<ffffffff8048496d>] ? cpufreq_notify_transition+0x93/0xa9
 [<ffffffff8021ab8d>] ? powernowk8_target+0x1e8/0x5f3
 [<ffffffff80486687>] ? cpufreq_governor_performance+0x1b/0x20
 [<ffffffff80484886>] ? __cpufreq_governor+0x71/0xa8
 [<ffffffff80484b21>] ? __cpufreq_set_policy+0x101/0x13e
 [<ffffffff80485bcd>] ? cpufreq_add_dev+0x3f0/0x4cd
 [<ffffffff8048577a>] ? handle_update+0x0/0x8
 [<ffffffff803c2062>] ? sysdev_driver_register+0xb6/0x10d
 [<ffffffff8056592c>] ? powernowk8_init+0x0/0x7e
 [<ffffffff8048604c>] ? cpufreq_register_driver+0x8f/0x140
 [<ffffffff80209056>] ? _stext+0x56/0x14f
 [<ffffffff802c2234>] ? proc_register+0x122/0x17d
 [<ffffffff802c23a0>] ? create_proc_entry+0x73/0x8a
 [<ffffffff8025c259>] ? register_irq_proc+0x92/0xaa
 [<ffffffff8025c2c8>] ? init_irq_proc+0x57/0x69
 [<ffffffff807fc85f>] ? kernel_init+0x116/0x169
 [<ffffffff8020cc79>] ? child_rip+0xa/0x11
 [<ffffffff807fc749>] ? kernel_init+0x0/0x169
 [<ffffffff8020cc6f>] ? child_rip+0x0/0x11
Code: 05 c5 83 36 00 48 c7 c2 48 5d 86 80 48 8b 04 d8 48 8b 40 08 48 8b 34 02 48\

RIP  [<ffffffff80486361>] cpufreq_stats_update+0x4a/0x5f
 RSP <ffff88006fb83b20>
CR2: ffff88086e7528b8
---[ end trace 0678bac75e67a2f7 ]---
Kernel panic - not syncing: Attempted to kill init!

In short, aftereffect of the wrong P-state is that
cpufreq_stats_update() uses "-1" as index for some array in

cpufreq_stats_update (unsigned int cpu)
{
...
     if (stat->time_in_state)
                stat->time_in_state[stat->last_index] =
                        cputime64_add(stat->time_in_state[stat->last_index],
                                      cputime_sub(cur_time, stat->last_time));
...
}

Fortunately, the wrong P-state value is returned only if the core is
in P-state 0. This fix solves the problem by detecting the
out-of-range P-state, ignoring it, and using "0" instead.

Cc: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-11-25 13:38:29 -05:00
Markus Metzger 1e9b51c283 x86, bts, ftrace: a BTS ftrace plug-in prototype
Impact: add new ftrace plugin

A prototype for a BTS ftrace plug-in.

The tracer collects branch trace in a cyclic buffer for each cpu.

The tracer is not configurable and the trace for each snapshot is
appended when doing cat /debug/tracing/trace.

This is a proof of concept that will be extended with future patches
to become a (hopefully) useful tool.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:31:13 +01:00
Markus Metzger 6abb11aecd x86, bts, ptrace: move BTS buffer allocation from ds.c into ptrace.c
Impact: restructure DS memory allocation to be done by the usage site of DS

Require pre-allocated buffers in ds.h.

Move the BTS buffer allocation for ptrace into ptrace.c.
The pointer to the allocated buffer is stored in the traced task's
task_struct together with the handle returned by ds_request_bts().

Removes memory accounting code.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:31:12 +01:00
Markus Metzger ca0002a179 x86, bts: base in-kernel ds interface on handles
Impact: generalize the DS code to shared buffers

Change the in-kernel ds.h interface to identify the tracer via a
handle returned on ds_request_~().

Tracers used to be identified via their task_struct.

The changes are required to allow DS to be shared between different
tasks, which is needed for perfmon2 and for ftrace.

For ptrace, the handle is stored in the traced task's task_struct.
This should probably go into a (arch-specific) ptrace context some
time.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:31:11 +01:00
Ingo Molnar 7d55718b0c Merge branches 'tracing/core', 'x86/urgent' and 'x86/ptrace' into tracing/hw-branch-tracing
This pulls together all the topic branches that are needed
for the DS/BTS/PEBS tracing work.
2008-11-25 17:30:30 +01:00
Markus Metzger de90add30e x86, bts: fix wrmsr and spinlock over kmalloc
Impact: fix sleeping-with-spinlock-held bugs/crashes

- Turn a wrmsr to write the DS_AREA MSR into a wrmsrl.
- Use irqsave variants of spinlocks.
- Do not allocate memory while holding spinlocks.

Reported-by: Stephane Eranian <eranian@googlemail.com>
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:29:02 +01:00
Markus Metzger c4858ffc8f x86, pebs: fix PEBS record size configuration
Impact: fix DS hw enablement on 64-bit x86

Fix the PEBS record size in the DS configuration.

Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:28:53 +01:00
Markus Metzger e5e8ca633b x86, bts: turn macro into static inline function
Impact: cleanup

Replace a macro with a static inline function.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:28:51 +01:00
Markus Metzger 292c669cd7 x86, bts: exclude ds.c from build when disabled
Impact: cleanup

Move the CONFIG guard from the .c file into the makefile.

Reported-by: Andi Kleen <andi-suse@firstfloor.org>
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 17:28:50 +01:00
Julia Lawall eff79aee91 arch/x86/kernel/pci-calgary_64.c: change simple_strtol to simple_strtoul
Impact: fix theoretical option string parsing overflow

Since bridge is unsigned, it would seem better to use simple_strtoul that
simple_strtol.

A simplified version of the semantic patch that makes this change is as
follows: (http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r2@
long e;
position p;
@@

e = simple_strtol@p(...)

@@
position p != r2.p;
type T;
T e;
@@

e =
- simple_strtol@p
+ simple_strtoul
  (...)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: muli@il.ibm.com
Cc: jdmason@kudzu.us
Cc: discuss@x86-64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 15:56:03 +01:00
Steven Rostedt 5cf02b7baf x86: use limited register constraint for setnz
Impact: build fix with certain compilers

GCC can decide to use %dil when "r" is used, which is not valid for
setnz.

This bug was brought out by Stephen Rothwell's merging of the
branch tracer into linux-next.

[ Thanks to Uros Bizjak for recommending 'q' over 'Q' ]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 15:38:03 +01:00
Ingo Molnar e951e4af2e x86: fix unused variable warning in arch/x86/kernel/hpet.c
Impact: fix build warning

this warning:

  arch/x86/kernel/hpet.c:36: warning: ‘hpet_num_timers’ defined but not used

Triggers because hpet_num_timers is unused in the !CONFIG_PCI_MSI case.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-25 09:03:43 +01:00
Ingo Molnar 943f3d0300 Merge branches 'sched/core', 'core/core' and 'tracing/core' into cpus4096 2008-11-24 17:46:57 +01:00
Ingo Molnar 6f893fb2e8 Merge branches 'tracing/branch-tracer', 'tracing/fastboot', 'tracing/ftrace', 'tracing/function-return-tracer', 'tracing/power-tracer', 'tracing/powerpc', 'tracing/ring-buffer', 'tracing/stack-tracer' and 'tracing/urgent' into tracing/core 2008-11-24 17:46:24 +01:00
Ingo Molnar b19b3c74c7 Merge branches 'core/debug', 'core/futexes', 'core/locking', 'core/rcu', 'core/signal', 'core/urgent' and 'core/xen' into core/core 2008-11-24 17:44:55 +01:00
Ingo Molnar ad07e914e6 x86 defconfig: increase CONFIG_LOG_BUF_SHIFT
Impact: double the defconfig printk buffer

Booting defconfigs produces more output than 128K so the output is
truncated - double it to 256K.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24 11:33:12 +01:00
Denis V. Lunev e45f2c0774 x86: correct link to HPET timer specification
Impact: update documentation / help text

Original link is dead.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24 10:05:12 +01:00
H. Peter Anvin b47b928842 x86: drop REBOOT_CF9_COND from reboot fallback chain
Impact: Reverts sequence of reboot fallbacks

Checkin 14d7ca5c57 changed the default
reboot method to "pci", a.k.a. port CF9.  Unfortunately this has been
shown to cause lockups on at least two systems for which REBOOT_KBD
worked, both Thinkpads with Intel chipsets.  Checkin
3889d0cea2 reverted the default, but did
not revert the fallback chain.  This checkin reverts the fallback
chain; port CF9 is now only done by explicit "reboot=pci" or a future
potential DMI key.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-24 00:50:09 -08:00
Hannes Eder 3b71e9e307 x86: HPET: fix sparse warning
Impact: make global variable static

Fix this sparse warning:

 arch/x86/kernel/hpet.c:36:18: warning: symbol 'hpet_num_timers' was
 not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 20:23:37 +01:00
jia zhang 5f5db59132 x86, debug: remove the confusing entry in call trace
Impact: improve backtrace quality

avoid the confusion in call trace because of the lack of padding at the
tail of function.

When do_exit gets called, the return address behind call instruction is
pushed into stack. If something get wrong in do_exit, for x86_64, the
entry "kernel_execve +0x00/0xXX" rather than "child_rip +0xYY/0xZZ" is
in the call trace.

That looks confusing, so add a u2d to make the return address still part
of the original call site. (This also catches any instances of us returning
from that function somehow.)

Signed-off-by: jia zhang <jia.zhang2008@gmail.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 20:03:36 +01:00
Hannes Eder a1a00b5885 x86: boot - fix sparse warnings
Impact: make global variables static

Fix these sparse warnings:

 arch/x86/boot/video.c:233:3: warning: symbol 'saved' was not declared. Should it be static?
 arch/x86/boot/video-vga.c:37:13: warning: symbol 'video_vga' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 19:58:58 +01:00
Cyrill Gorcunov 3b6c52b5b6 x86: introduce ENTRY(KPROBE_ENTRY)_X86 assembly helpers to catch unbalanced declaration v3
Impact: make ENTRY()/END() macros more capable

It's usefull to catch unbalanced or messed or mixed declarations of ENTRY and
KPROBES. These macros would help a bit.

For example the following code would compile without problems

        ENTRY_X86(mcount)
                retq
        END_X86(mcount)

But if you forget and mess the following form

        ENTRY_X86(mcount)
                retq
        END(mcount)

        ENTRY_X86(ftrace_caller)

The assembler will issue the following message:
Error: ENTRY_X86/KPROBE_X86 unbalanced,missed,mixed

Actually the checking is performed at every _X86 macro
so maybe it's good idea to put ENTRY_KPROBE_FINAL_X86
at the end of .S file to be sure you didn't miss anything.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Alexander van Heukelum <heukelum@mailshack.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 19:57:01 +01:00
Marcelo Tosatti 0c0f40bdbe KVM: MMU: fix sync of ptes addressed at owner pagetable
During page sync, if a pagetable contains a self referencing pte (that
points to the pagetable), the corresponding spte may be marked as
writable even though all mappings are supposed to be write protected.

Fix by clearing page unsync before syncing individual sptes.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-23 15:24:19 +02:00
Alexander van Heukelum 6efdcfaf16 x86: KPROBE_ENTRY should be paired wth KPROBE_END
Impact: move some code out of .kprobes.text

KPROBE_ENTRY switches code generation to .kprobes.text, and KPROBE_END
uses .popsection to get back to the previous section (.text, normally).
Also replace ENDPROC by END, for consistency.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 14:21:55 +01:00
Alexander van Heukelum 322648d1ba x86: include ENTRY/END in entry handlers in entry_64.S
Impact: cleanup of entry_64.S

Except for the order and the place of the functions, this
patch should not change the generated code.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 14:21:54 +01:00
Avi Kivity bd2b3ca768 KVM: VMX: Fix interrupt loss during race with NMI
If an interrupt cannot be injected for some reason (say, page fault
when fetching the IDT descriptor), the interrupt is marked for
reinjection.  However, if an NMI is queued at this time, the NMI
will be injected instead and the NMI will be lost.

Fix by deferring the NMI injection until the interrupt has been
injected successfully.

Analyzed by Jan Kiszka.

Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-23 14:52:29 +02:00
Hannes Eder 050dc6944b x86: remove duplicate #define from 'cpufeature.h'
Impact: cleanup

Remove duplicate #define from 'cpufeature.h'.

This also fixes the following sparse warning:

 arch/x86/kernel/cpu/capflags.c:54:3: warning: Initializer entry defined twice
 arch/x86/kernel/cpu/capflags.c:58:3:   also defined here

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 13:38:50 +01:00
Ian Campbell 86bbc2c235 xen: pin correct PGD on suspend
Impact: fix Xen guest boot failure

commit eefb47f6a1 ("xen: use
spin_lock_nest_lock when pinning a pagetable") changed xen_pgd_walk to
walk over mm->pgd rather than taking pgd as an argument.

This breaks xen_mm_(un)pin_all() because it makes init_mm.pgd readonly
instead of the pgd we are interested in and therefore the pin subsequently
fails.

(XEN) mm.c:2280:d15 Bad type (saw 00000000e8000001 != exp 0000000060000000) for mfn bc464 (pfn 21ca7)
(XEN) mm.c:2665:d15 Error while pinning mfn bc464

[   14.586913] 1 multicall(s) failed: cpu 0
[   14.586926] Pid: 14, comm: kstop/0 Not tainted 2.6.28-rc5-x86_32p-xenU-00172-gee2f6cc #200
[   14.586940] Call Trace:
[   14.586955]  [<c030c17a>] ? printk+0x18/0x1e
[   14.586972]  [<c0103df3>] xen_mc_flush+0x163/0x1d0
[   14.586986]  [<c0104bc1>] __xen_pgd_pin+0xa1/0x110
[   14.587000]  [<c015a330>] ? stop_cpu+0x0/0xf0
[   14.587015]  [<c0104d7b>] xen_mm_pin_all+0x4b/0x70
[   14.587029]  [<c022bcb9>] xen_suspend+0x39/0xe0
[   14.587042]  [<c015a330>] ? stop_cpu+0x0/0xf0
[   14.587054]  [<c015a3cd>] stop_cpu+0x9d/0xf0
[   14.587067]  [<c01417cd>] run_workqueue+0x8d/0x150
[   14.587080]  [<c030e4b3>] ? _spin_unlock_irqrestore+0x23/0x40
[   14.587094]  [<c014558a>] ? prepare_to_wait+0x3a/0x70
[   14.587107]  [<c0141918>] worker_thread+0x88/0xf0
[   14.587120]  [<c01453c0>] ? autoremove_wake_function+0x0/0x50
[   14.587133]  [<c0141890>] ? worker_thread+0x0/0xf0
[   14.587146]  [<c014509c>] kthread+0x3c/0x70
[   14.587157]  [<c0145060>] ? kthread+0x0/0x70
[   14.587170]  [<c0109d1b>] kernel_thread_helper+0x7/0x10
[   14.587181]   call  1/3: op=14 arg=[c0415000] result=0
[   14.587192]   call  2/3: op=14 arg=[e1ca2000] result=0
[   14.587204]   call  3/3: op=26 arg=[c1808860] result=-22

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 13:32:24 +01:00
Cyrill Gorcunov 8a2503fa4a x86: move dwarf2 related macro to dwarf2.h
Impact: cleanup

Move recently introduced dwarf2 macros to dwarf2.h file.
It allow us to not duplicate them in assembly files.

Active usage of _cfi macros don't make assembly files
more obvious to understand but we already have a lot of
macros there which requires to search the definitions
of them *anyway*. But at least it make every cfi usage
one line shorter.

Also some code alignment is done.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 13:20:52 +01:00
Ingo Molnar 3d994e1076 Merge branch 'oprofile-for-tip' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into x86/urgent 2008-11-23 12:16:57 +01:00
Thomas Gleixner a1967d6441 x86: revert irq number limitation
Impact: fix MSIx not enough irq numbers available regression

The manual revert of the sparse_irq patches missed to bring the number
of possible irqs back to the .27 status. This resulted in a regression
when two multichannel network cards were placed in a system with only
one IO_APIC - causing the networking driver to not have the right
IRQ and the device not coming up.

Remove the dynamic allocation logic leftovers and simply return
NR_IRQS in probe_nr_irqs() for now.

   Fixes: http://lkml.org/lkml/2008/11/19/354

Reported-by: Jesper Dangaard Brouer <hawk@diku.dk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jesper Dangaard Brouer <hawk@diku.dk>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:59:52 +01:00
Török Edwin 8d26487fd4 tracing/stack-tracer: introduce CONFIG_USER_STACKTRACE_SUPPORT
Impact: cleanup

User stack tracing is just implemented for x86, but it is not x86 specific.

Introduce a generic config flag, that is currently enabled only for x86.
When other arches implement it, they will have to
SELECT USER_STACKTRACE_SUPPORT.

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:53:50 +01:00
Török Edwin 8d7c6a9616 tracing/stack-tracer: fix style issues
Impact: cleanup

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:53:48 +01:00
Hannes Eder 4e42ebd57b x86: hypervisor - fix sparse warnings
Impact: fix sparse build warning

Fix the following sparse warnings:

  arch/x86/kernel/cpu/hypervisor.c:37:15: warning: symbol
  'get_hypervisor_tsc_freq' was not declared. Should it be static?
  arch/x86/kernel/cpu/hypervisor.c:53:16: warning: symbol
  'init_hypervisor' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Cc: "Alok N Kataria" <akataria@vmware.com>
Cc: "Dan Hecht" <dhecht@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:11:52 +01:00
Hannes Eder c450d7805b x86: vmware - fix sparse warnings
Impact: fix sparse build warning

Fix the following sparse warnings:

arch/x86/kernel/cpu/vmware.c:69:5: warning: symbol 'vmware_platform'
was not declared. Should it be static?
arch/x86/kernel/cpu/vmware.c:89:15: warning: symbol
'vmware_get_tsc_khz' was not declared. Should it be static?
arch/x86/kernel/cpu/vmware.c:107:16: warning: symbol
'vmware_set_feature_bits' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Cc: "Alok N Kataria" <akataria@vmware.com>
Cc: "Dan Hecht" <dhecht@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 11:02:36 +01:00
Ingo Molnar 9f14416442 Merge commit 'v2.6.28-rc6' into irq/urgent 2008-11-23 10:52:33 +01:00
Hiroshi Shimamoto 2456d738ef x86: signal: cosmetic unification of sys_rt_sigreturn()
Impact: cleanup

Add #ifdef directive for unification.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 10:50:59 +01:00
Hiroshi Shimamoto 666ac7be04 x86: signal: cosmetic unification of sys_sigaltstack()
Impact: cleanup

Add #ifdef directive for unification.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 10:50:58 +01:00
Hiroshi Shimamoto 5c9b3a0c7b x86: signal: cosmetic unification of including headers
Impact: cleanup

Make the headers portion of signal_32.c and signal_64.c the same.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 10:50:57 +01:00
Török Edwin 02b67518e2 tracing: add support for userspace stacktraces in tracing/iter_ctrl
Impact: add new (default-off) tracing visualization feature

Usage example:

 mount -t debugfs nodev /sys/kernel/debug
 cd /sys/kernel/debug/tracing
 echo userstacktrace >iter_ctrl
 echo sched_switch >current_tracer
 echo 1 >tracing_enabled
 .... run application ...
 echo 0 >tracing_enabled

Then read one of 'trace','latency_trace','trace_pipe'.

To get the best output you can compile your userspace programs with
frame pointers (at least glibc + the app you are tracing).

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 09:25:15 +01:00
Frederic Weisbecker f201ae2356 tracing/function-return-tracer: store return stack into task_struct and allocate it dynamically
Impact: use deeper function tracing depth safely

Some tests showed that function return tracing needed a more deeper depth
of function calls. But it could be unsafe to store these return addresses
to the stack.

So these arrays will now be allocated dynamically into task_struct of current
only when the tracer is activated.

Typical scheme when tracer is activated:
- allocate a return stack for each task in global list.
- fork: allocate the return stack for the newly created task
- exit: free return stack of current
- idle init: same as fork

I chose a default depth of 50. I don't have overruns anymore.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 09:17:26 +01:00
Ingo Molnar a0a70c735e Merge branches 'tracing/profiling', 'tracing/options' and 'tracing/urgent' into tracing/core 2008-11-23 09:10:32 +01:00
Ingo Molnar f377fa123d x86: clean up stack overflow debug check
Impact: cleanup

Simplify the irq-sampled stack overflow debug check:

 - eliminate an #idef

 - use WARN_ONCE() to emit a single warning (all bets are off
   after the first such warning anyway)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 09:04:39 +01:00
jia zhang 3aeb95d5b7 x86_64: fix the check in stack_overflow_check
Impact: make stack overflow debug check and printout narrower

stack_overflow_check() should consider the stack usage of pt_regs, and
thus it could warn us in advance. Additionally, it looks better for
the warning time to start at INITIAL_JIFFIES.

Assuming that rsp gets close to the check point before interrupt
arrives: when interrupt really happens, thread_info will be partly
overrode.

Signed-off-by: jia zhang <jia.zhang2008@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-23 08:57:42 +01:00
Ingo Molnar ca9eed7613 Merge commit 'v2.6.28-rc6' into x86/debug 2008-11-23 08:55:47 +01:00
H. Peter Anvin 3889d0cea2 x86: revert default reboot method to REBOOT_KBD
Impact: Reverts default reboot method.

Checkin 14d7ca5c57 changed the default
reboot method to "pci", a.k.a. port CF9.  Unfortunately this has been
shown to cause lockups on at least two systems for which REBOOT_KBD
worked, both Thinkpads with Intel chipsets.  This reverts the default
to REBOOT_KBD, while leaving the option to have "reboot=pci" specified
explicitly or via a DMI match.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-22 23:39:23 -08:00
Alexander van Heukelum c81084114f x86: split out some macro's and move common code to paranoid_exit, fix
Impact: fix bootup crash

Even though it tested fine for me, there was still a bug in the
first patch: I have overlooked a call to ptregscall_common. This
patch fixes that, I think, but the code is never executed for
me while running a debian install... (I tested this by putting
an "1:jmp 1b" in there.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-22 09:45:50 +01:00
Ingo Molnar 57550b27ff Merge commit 'v2.6.28-rc6' into x86/urgent 2008-11-21 20:55:09 +01:00
Alexander van Heukelum b8b1d08bf6 x86: entry_64.S: split out some macro's and move common code to paranoid_exit
Impact: cleanup

DISABLE_INTERRUPTS(CLBR_NONE)/TRACE_IRQS_OFF is now always
executed just before paranoid_exit. Move it there.

Split out paranoidzeroentry, paranoiderrorentry, and
paranoidzeroentry_ist to get more readable macro's.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-21 19:02:56 +01:00
Alexander van Heukelum e2f6bc25b9 x86: entry_64.S: factor out save_paranoid and paranoid_exit
Impact: cleanup, shrink kernel image size

Also expand the paranoid_exit0 macro into nmi_exit inside the
nmi stub in the case of enabled irq-tracing.

This gives a few hundred bytes code size reduction.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-21 19:02:55 +01:00
Alexander van Heukelum c002a1e6b6 x86: introduce save_rest and restructure the PTREGSCALL macro in entry_64.S
Impact: cleanup

The save_rest function completes a partial stack frame for use
by the PTREGSCALL macro. This also avoids the indirect call in
PTREGSCALLs.

This adds the macro movq_cfi_restore to hide the CFI_RESTORE
annotation when restoring a register from the stack frame.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-21 19:02:54 +01:00
Ingo Molnar 14ae22ba2b x86: entry_64.S: rename
Impact: cleanup

Rename:

   CFI_PUSHQ  =>  pushq_cfi
   CFI_POPQ   =>  popq_cfi
   CFI_MOVQ   =>  movq_cfi

To make it blend better into regular assembly code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-21 15:20:47 +01:00
Ingo Molnar e8a0e27662 x86: clean up after: move entry_64.S register saving out of the macros, fix
Impact: build fix

The break builds with older binutils (2.16.1):

 arch/x86/kernel/entry_64.S: Assembler messages:
 arch/x86/kernel/entry_64.S:282: Error: too many positional arguments
 arch/x86/kernel/entry_64.S:283: Error: too many positional arguments
 arch/x86/kernel/entry_64.S:284: Error: too many positional arguments
 arch/x86/kernel/entry_64.S:285: Error: too many positional arguments
 arch/x86/kernel/entry_64.S:286: Error: too many positional arguments
 arch/x86/kernel/entry_64.S:287: Error: too many positional arguments
 arch/x86/kernel/entry_64.S:288: Error: too many positional arguments
 arch/x86/kernel/entry_64.S:289: Error: too many positional arguments
 arch/x86/kernel/entry_64.S:290: Error: too many positional arguments

Took some time to figure out the detail that GAS chokes on: it's
negative offsets. Rearrange the calculations to make sure we never
go negative.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-21 15:12:28 +01:00
Ingo Molnar fc02e90c34 Merge commit 'v2.6.28-rc6' into sched/core 2008-11-21 08:57:04 +01:00
Hiroshi Shimamoto 3ddd972d97 x86: signal: rename COPY_SEG_STRICT to COPY_SEG_CPL3
Impact: cleanup

Rename macro COPY_SEG_STRICT to COPY_SEG_CPL3, as suggested by hpa.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-21 08:54:28 +01:00
Matthew Wilcox 0ca4b6b001 x86: Fix interrupt leak due to migration
When we migrate an interrupt from one CPU to another, we set the
move_in_progress flag and clean up the vectors later once they're not
being used.  If you're unlucky and call destroy_irq() before the vectors
become un-used, the move_in_progress flag is never cleared, which causes
the interrupt to become unusable.

This was discovered by Jesse Brandeburg for whom it manifested as an
MSI-X device refusing to use MSI-X mode when the driver was unloaded
and reloaded repeatedly.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-20 13:17:40 -08:00
Linus Torvalds 0260da162f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: uaccess_64: fix return value in __copy_from_user()
  x86: quirk for reboot stalls on a Dell Optiplex 330
2008-11-20 13:09:32 -08:00
Alexander van Heukelum dcd072e260 x86: clean up after: move entry_64.S register saving out of the macros
This add-on patch to x86: move entry_64.S register saving out
of the macros visually cleans up the appearance of the code by
introducing some basic helper macro's. It also adds some cfi
annotations which were missing.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-20 19:05:21 +01:00
Rakib Mullick bfe085f62f x86: fixing __cpuinit/__init tangle, xsave_cntxt_init()
Annotate xsave_cntxt_init() as "can be called outside of __init".

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-20 16:43:42 +01:00
Rakib Mullick 9bc646f163 x86: fix __cpuinit/__init tangle in init_thread_xstate()
Impact:	fix incorrect __init annotation

This patch removes the following section mismatch warning. A patch set
was send previously (http://lkml.org/lkml/2008/11/10/407). But
introduce some other problem, reported by Rufus
(http://lkml.org/lkml/2008/11/11/46). Then Ingo Molnar suggest that,
it's best to remove __init from xsave_cntxt_init(void). Which is the
second patch in this series. Now, this one removes the following
warning.

WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x2237): Section
mismatch in reference from the function cpu_init() to the function
.init.text:init_thread_xstate()
The function __cpuinit cpu_init() references
a function __init init_thread_xstate().
If init_thread_xstate is only used by cpu_init then
annotate init_thread_xstate with a matching annotation.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-20 16:43:41 +01:00
Alexander van Heukelum d99015b1ab x86: move entry_64.S register saving out of the macros
Here is a combined patch that moves "save_args" out-of-line for
the interrupt macro and moves "error_entry" mostly out-of-line
for the zeroentry and errorentry macros.

The save_args function becomes really straightforward and easy
to understand, with the possible exception of the stack switch
code, which now needs to copy the return address of to the
calling function. Normal interrupts arrive with ((~vector)-0x80)
on the stack, which gets adjusted in common_interrupt:

<common_interrupt>:
(5)  addq   $0xffffffffffffff80,(%rsp)		/* -> ~(vector) */
(4)  sub    $0x50,%rsp				/* space for registers */
(5)  callq  ffffffff80211290 <save_args>
(5)  callq  ffffffff80214290 <do_IRQ>
<ret_from_intr>:
     ...

An apic interrupt stub now look like this:

<thermal_interrupt>:
(5)  pushq  $0xffffffffffffff05			/* ~(vector) */
(4)  sub    $0x50,%rsp				/* space for registers */
(5)  callq  ffffffff80211290 <save_args>
(5)  callq  ffffffff80212b8f <smp_thermal_interrupt>
(5)  jmpq   ffffffff80211f93 <ret_from_intr>

Similarly the exception handler register saving function becomes
simpler, without the need of any parameter shuffling. The stub
for an exception without errorcode looks like this:

<overflow>:
(6)  callq  *0x1cad12(%rip)        # ffffffff803dd448 <pv_irq_ops+0x38>
(2)  pushq  $0xffffffffffffffff			/* no syscall */
(4)  sub    $0x78,%rsp				/* space for registers */
(5)  callq  ffffffff8030e3b0 <error_entry>
(3)  mov    %rsp,%rdi				/* pt_regs pointer */
(2)  xor    %esi,%esi				/* no error code */
(5)  callq  ffffffff80213446 <do_overflow>
(5)  jmpq   ffffffff8030e460 <error_exit>

And one for an exception with errorcode like this:

<segment_not_present>:
(6)  callq  *0x1cab92(%rip)        # ffffffff803dd448 <pv_irq_ops+0x38>
(4)  sub    $0x78,%rsp				/* space for registers */
(5)  callq  ffffffff8030e3b0 <error_entry>
(3)  mov    %rsp,%rdi				/* pt_regs pointer */
(5)  mov    0x78(%rsp),%rsi			/* load error code */
(9)  movq   $0xffffffffffffffff,0x78(%rsp)	/* no syscall */
(5)  callq  ffffffff80213209 <do_segment_not_present>
(5)  jmpq   ffffffff8030e460 <error_exit>

Unfortunately, this last type is more than 32 bytes. But the total space
savings due to this patch is about 2500 bytes on an smp-configuration,
and I think the code is clearer than it was before. The tested kernels
were non-paravirt ones (i.e., without the indirect call at the top of
the exception handlers).

Anyhow, I tested this patch on top of a recent -tip. The machine
was an 2x4-core Xeon at 2333MHz. Measured where the delays between
(almost-)adjacent rdtsc instructions. The graphs show how much
time is spent outside of the program as a function of the measured
delay. The area under the graph represents the total time spent
outside the program. Eight instances of the rdtsctest were
started, each pinned to a single cpu. The histogams are added.
For each kernel two measurements were done: one in mostly idle
condition, the other while running "bonnie++ -f", bound to cpu 0.
Each measurement took 40 minutes runtime. See the attached graphs
for the results. The graphs overlap almost everywhere, but there
are small differences.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-20 10:49:57 +01:00
Ingo Molnar c032a2de4c Merge branch 'x86/cleanups' into x86/irq
[ merged x86/cleanups into x86/irq to enable a wider IRQ entry code
  patch to be applied, which depends on a cleanup patch in x86/cleanups. ]
2008-11-20 10:48:31 +01:00
Yinghai Lu 87f7606591 x86: fix wakeup_cpu with numaq/es7000 v2 - call ->update_genapic()
Impact: fix boot crash on 32-bit

Hiroshi Shimamoto reported a boot failure on 32-bit x86.

The setting of x86_quirks.wakeup_cpu is missing (when
not passing in an explicit apic= boot parameter).

Reported-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-20 10:45:17 +01:00
Richard A. Holden III bb5574608a x86: fix arch/x86/kernel/setup.c build warning when !CONFIG_X86_RESERVE_LOW_64K
Impact: cleanup

Fix:

  arch/x86/kernel/setup.c:592: warning: 'dmi_low_memory_corruption' defined but not used

this is only used if CONFIG_X86_RESERVE_LOW_64K is defined.

Signed-off-by: Richard A. Holden III <aciddeath@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-20 09:04:25 +01:00
Ingo Molnar 90accd6fab Merge branch 'linus' into x86/memory-corruption-check 2008-11-20 09:03:38 +01:00
Richard A. Holden III 77be80e437 x86: fix arch/x86/kernel/genx2apic_uv_x.c build warning when !CONFIG_HOTPLUG_CPU
Impact: cleanup, reduce size of the kernel image a bit

Fix:

  arch/x86/kernel/genx2apic_uv_x.c:403: warning: 'uv_heartbeat_disable' defined but not used

the function is only used when CONFIG_HOTPLUG_CPU is defined.

Signed-off-by: Richard A. Holden III <aciddeath@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-20 09:03:04 +01:00
Ingo Molnar fbc2a06056 Merge branch 'linus' into x86/uv 2008-11-20 09:02:39 +01:00
Linus Torvalds 3108864e2d Merge branch 'x86/numa' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86/numa' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: make NUMA on 32-bit depend on EXPERIMENTAL again
  x86, hibernate: fix breakage on x86_32 with CONFIG_NUMA set
2008-11-19 18:53:02 -08:00
Linus Torvalds 4f7dbc7ff4 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: more general identifier for Phoenix BIOS
  AMD IOMMU: check for next_bit also in unmapped area
  AMD IOMMU: fix fullflush comparison length
  AMD IOMMU: enable device isolation per default
  AMD IOMMU: add parameter to disable device isolation
  x86, PEBS/DS: fix code flow in ds_request()
  x86: add rdtsc barrier to TSC sync check
  xen: fix scrub_page()
  x86: fix es7000 compiling
  x86, bts: fix unlock problem in ds.c
  x86, voyager: fix smp generic helper voyager breakage
  x86: move iomap.h to the new include location
2008-11-19 18:51:56 -08:00
Ulrich Drepper de11defebf reintroduce accept4
Introduce a new accept4() system call.  The addition of this system call
matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(),
inotify_init1(), epoll_create1(), pipe2()) which added new system calls
that differed from analogous traditional system calls in adding a flags
argument that can be used to access additional functionality.

The accept4() system call is exactly the same as accept(), except that
it adds a flags bit-mask argument.  Two flags are initially implemented.
(Most of the new system calls in 2.6.27 also had both of these flags.)

SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
for the new file descriptor returned by accept4().  This is a useful
security feature to avoid leaking information in a multithreaded
program where one thread is doing an accept() at the same time as
another thread is doing a fork() plus exec().  More details here:
http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling",
Ulrich Drepper).

The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
to be enabled on the new open file description created by accept4().
(This flag is merely a convenience, saving the use of additional calls
fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result.

Here's a test program.  Works on x86-32.  Should work on x86-64, but
I (mtk) don't have a system to hand to test with.

It tests accept4() with each of the four possible combinations of
SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
that the appropriate flags are set on the file descriptor/open file
description returned by accept4().

I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
and it passes according to my test program.

/* test_accept4.c

  Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk
       <mtk.manpages@gmail.com>

  Licensed under the GNU GPLv2 or later.
*/
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define PORT_NUM 33333

#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

/**********************************************************************/

/* The following is what we need until glibc gets a wrapper for
  accept4() */

/* Flags for socket(), socketpair(), accept4() */
#ifndef SOCK_CLOEXEC
#define SOCK_CLOEXEC    O_CLOEXEC
#endif
#ifndef SOCK_NONBLOCK
#define SOCK_NONBLOCK   O_NONBLOCK
#endif

#ifdef __x86_64__
#define SYS_accept4 288
#elif __i386__
#define USE_SOCKETCALL 1
#define SYS_ACCEPT4 18
#else
#error "Sorry -- don't know the syscall # on this architecture"
#endif

static int
accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
{
   printf("Calling accept4(): flags = %x", flags);
   if (flags != 0) {
       printf(" (");
       if (flags & SOCK_CLOEXEC)
           printf("SOCK_CLOEXEC");
       if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
           printf(" ");
       if (flags & SOCK_NONBLOCK)
           printf("SOCK_NONBLOCK");
       printf(")");
   }
   printf("\n");

#if USE_SOCKETCALL
   long args[6];

   args[0] = fd;
   args[1] = (long) sockaddr;
   args[2] = (long) addrlen;
   args[3] = flags;

   return syscall(SYS_socketcall, SYS_ACCEPT4, args);
#else
   return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
#endif
}

/**********************************************************************/

static int
do_test(int lfd, struct sockaddr_in *conn_addr,
       int closeonexec_flag, int nonblock_flag)
{
   int connfd, acceptfd;
   int fdf, flf, fdf_pass, flf_pass;
   struct sockaddr_in claddr;
   socklen_t addrlen;

   printf("=======================================\n");

   connfd = socket(AF_INET, SOCK_STREAM, 0);
   if (connfd == -1)
       die("socket");
   if (connect(connfd, (struct sockaddr *) conn_addr,
               sizeof(struct sockaddr_in)) == -1)
       die("connect");

   addrlen = sizeof(struct sockaddr_in);
   acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
                      closeonexec_flag | nonblock_flag);
   if (acceptfd == -1) {
       perror("accept4()");
       close(connfd);
       return 0;
   }

   fdf = fcntl(acceptfd, F_GETFD);
   if (fdf == -1)
       die("fcntl:F_GETFD");
   fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
              ((closeonexec_flag & SOCK_CLOEXEC) != 0);
   printf("Close-on-exec flag is %sset (%s); ",
           (fdf & FD_CLOEXEC) ? "" : "not ",
           fdf_pass ? "OK" : "failed");

   flf = fcntl(acceptfd, F_GETFL);
   if (flf == -1)
       die("fcntl:F_GETFD");
   flf_pass = ((flf & O_NONBLOCK) != 0) ==
              ((nonblock_flag & SOCK_NONBLOCK) !=0);
   printf("nonblock flag is %sset (%s)\n",
           (flf & O_NONBLOCK) ? "" : "not ",
           flf_pass ? "OK" : "failed");

   close(acceptfd);
   close(connfd);

   printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
   return fdf_pass && flf_pass;
}

static int
create_listening_socket(int port_num)
{
   struct sockaddr_in svaddr;
   int lfd;
   int optval;

   memset(&svaddr, 0, sizeof(struct sockaddr_in));
   svaddr.sin_family = AF_INET;
   svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
   svaddr.sin_port = htons(port_num);

   lfd = socket(AF_INET, SOCK_STREAM, 0);
   if (lfd == -1)
       die("socket");

   optval = 1;
   if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
                  sizeof(optval)) == -1)
       die("setsockopt");

   if (bind(lfd, (struct sockaddr *) &svaddr,
            sizeof(struct sockaddr_in)) == -1)
       die("bind");

   if (listen(lfd, 5) == -1)
       die("listen");

   return lfd;
}

int
main(int argc, char *argv[])
{
   struct sockaddr_in conn_addr;
   int lfd;
   int port_num;
   int passed;

   passed = 1;

   port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;

   memset(&conn_addr, 0, sizeof(struct sockaddr_in));
   conn_addr.sin_family = AF_INET;
   conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
   conn_addr.sin_port = htons(port_num);

   lfd = create_listening_socket(port_num);

   if (!do_test(lfd, &conn_addr, 0, 0))
       passed = 0;
   if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
       passed = 0;
   if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
       passed = 0;
   if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
       passed = 0;

   close(lfd);

   exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
}

[mtk.manpages@gmail.com: rewrote changelog, updated test program]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-19 18:49:57 -08:00
Ingo Molnar 9676e73a9e Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core
Conflicts:
	kernel/trace/ftrace.c

[ We conflicted here because we backported a few fixes to
  tracing/urgent - which has different internal APIs. ]
2008-11-19 10:04:25 +01:00
Ingo Molnar 3ac3ba0b39 Merge branch 'linus' into sched/core
Conflicts:
	kernel/Makefile
2008-11-19 09:44:37 +01:00
Hiroshi Shimamoto 20a4a236c7 x86: uaccess_64: fix return value in __copy_from_user()
__copy_from_user() will return invalid value 16 when it fails to
access user space and the size is 10.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 22:28:58 +01:00
Steve Conklin 093bac154c x86: quirk for reboot stalls on a Dell Optiplex 330
Dell Optiplex 330 appears to hang on reboot. This is resolved by adding
a quirk to set bios reboot.

Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
Signed-off-by: Steve Conklin <steve.conklin@canonical.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 22:22:29 +01:00
Yinghai Lu b5fe363b7d x86: use update_genapic to get rid of ES7000_CLUSTERED_APIC v2
Impact: clean up

We can autodetect those system that need cluster apic, and update genapic
accordingly.

We can also remove wakeup.h for e7000, because it's default one is now
the same as overall default mach_wakecpu.h

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 17:35:40 +01:00
Ingo Molnar f632ddcc07 x86: fix wakeup_cpu with numaq/es7000, v2, fix #2
Impact: fix boot crash

fix default_update_genapic().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 17:35:19 +01:00
Hiroshi Shimamoto 64977609e3 x86: ia32_signal: change order of storing in setup_sigcontext()
Impact: cleanup

Change order of storing to match the sigcontext_ia32.
And add casting to make this code same as arch/x86/kernel/signal_32.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 16:55:37 +01:00
Hiroshi Shimamoto 047ce93581 x86: ia32_signal: remove using temporary variable
Impact: cleanup

No need to use temporary variable.
Also rename the variable same as arch/x86/kernel/signal_32.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Reviewed-by: WANG Cong <wangcong@zeuux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 16:55:36 +01:00
Hiroshi Shimamoto 8c6e5ce0fd x86: ia32_signal: cleanup macro RELOAD_SEG
Impact: cleanup

Remove mask parameter because it's always 3.
Cleanup coding styles.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Reviewed-by: WANG Cong <wangcong@zeuux.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 16:55:35 +01:00
Hiroshi Shimamoto d71a68dca5 x86: ia32_signal: introduce COPY_SEG_CPL3
Impact: cleanup

Introduce COPY_SEG_CPL3 for ia32_restore_sigcontext().

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 16:55:34 +01:00
Hiroshi Shimamoto b78a5b5260 x86: ia32_signal: cleanup macro COPY
Impact: cleanup

No need to use temporary variable in this case.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 16:55:33 +01:00
Ingo Molnar 73f56c0d35 Merge branch 'iommu-fixes-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2008-11-18 16:48:49 +01:00
Philipp Kohlbecher 0af40a4b10 x86: more general identifier for Phoenix BIOS
Impact: widen the reach of the low-memory-protect DMI quirk

Phoenix BIOSes variously identify their vendor as "Phoenix Technologies,
LTD" or "Phoenix Technologies LTD" (without the comma.)

This patch makes the identification string in the bad_bios_dmi_table
more general (following a suggestion by Ingo Molnar), so that both
versions are handled.

Again, the patched file compiles cleanly and the patch has been tested
successfully on my machine.

Signed-off-by: Philipp Kohlbecher <xt28@gmx.de>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 16:11:36 +01:00
Joerg Roedel 8501c45cc3 AMD IOMMU: check for next_bit also in unmapped area
Impact: fix possible use of stale IO/TLB entries

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-11-18 15:44:43 +01:00
Joerg Roedel 695b5676c7 AMD IOMMU: fix fullflush comparison length
Impact: fix comparison length for 'fullflush'

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-11-18 15:44:42 +01:00
Joerg Roedel 3ce1f93c6d AMD IOMMU: enable device isolation per default
Impact: makes device isolation the default for AMD IOMMU

Some device drivers showed double-free bugs of DMA memory while testing
them with AMD IOMMU. If all devices share the same protection domain
this can lead to data corruption and data loss. Prevent this by putting
each device into its own protection domain per default.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-11-18 15:44:31 +01:00
Joerg Roedel e5e1f606ec AMD IOMMU: add parameter to disable device isolation
Impact: add a new AMD IOMMU kernel command line parameter

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-11-18 15:43:23 +01:00
Ingo Molnar cbe9ee00ce Merge branch 'x86/urgent' into x86/cleanups 2008-11-18 15:41:36 +01:00
Ingo Molnar 10db4ef7b9 x86, PEBS/DS: fix code flow in ds_request()
this compiler warning:

  arch/x86/kernel/ds.c: In function 'ds_request':
  arch/x86/kernel/ds.c:368: warning: 'context' may be used uninitialized in this function

Shows that the code flow in ds_request() is buggy - it goes into
the unlock+release-context path even when the context is not allocated
yet.

First allocate the context, then do the other checks.

Also, take care with GFP allocations under the ds_lock spinlock.

Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 15:34:36 +01:00
Joerg Roedel a1afd01c17 x86: default to SWIOTLB=y on x86_64
Impact: fixes korg bugzilla 11980

A kernel for a 64bit x86 system should always contain the swiotlb code
in case it is booted on a machine without any hardware IOMMU supported
by the kernel and more than 4GB of RAM. This patch changes Kconfig to
always compile swiotlb into the kernel for x86_64.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: stable@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 14:52:46 +01:00
Frederic Weisbecker 0231022cc3 tracing/function-return-tracer: add the overrun field
Impact: help to find the better depth of trace

We decided to arbitrary define the depth of function return trace as
"20". Perhaps this is not enough. To help finding an optimal depth, we
measure now the overrun: the number of functions that have been missed
for the current thread. By default this is not displayed, we have to
do set a particular flag on the return tracer: echo overrun >
/debug/tracing/trace_options And the overrun will be printed on the
right.

As the trace shows below, the current 20 depth is not enough.

update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838)
update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838)
do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838)
tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838)
tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838)
vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274)
vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274)
set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274)
con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274)
release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274)
release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274)
con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274)
con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274)
n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274)
irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274)
ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274)
ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274)
ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274)
hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 11:11:00 +01:00
James Morris f3a5c54701 Merge branch 'master' into next
Conflicts:
	fs/cifs/misc.c

Merge to resolve above, per the patch below.

Signed-off-by: James Morris <jmorris@namei.org>

diff --cc fs/cifs/misc.c
index ec36410,addd1dc..0000000
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@@ -347,13 -338,13 +338,13 @@@ header_assemble(struct smb_hdr *buffer
  		/*  BB Add support for establishing new tCon and SMB Session  */
  		/*      with userid/password pairs found on the smb session   */
  		/*	for other target tcp/ip addresses 		BB    */
 -				if (current->fsuid != treeCon->ses->linux_uid) {
 +				if (current_fsuid() != treeCon->ses->linux_uid) {
  					cFYI(1, ("Multiuser mode and UID "
  						 "did not match tcon uid"));
- 					read_lock(&GlobalSMBSeslock);
- 					list_for_each(temp_item, &GlobalSMBSessionList) {
- 						ses = list_entry(temp_item, struct cifsSesInfo, cifsSessionList);
+ 					read_lock(&cifs_tcp_ses_lock);
+ 					list_for_each(temp_item, &treeCon->ses->server->smb_ses_list) {
+ 						ses = list_entry(temp_item, struct cifsSesInfo, smb_ses_list);
 -						if (ses->linux_uid == current->fsuid) {
 +						if (ses->linux_uid == current_fsuid()) {
  							if (ses->server == treeCon->ses->server) {
  								cFYI(1, ("found matching uid substitute right smb_uid"));
  								buffer->Uid = ses->Suid;
2008-11-18 18:52:37 +11:00
Yinghai Lu 54ac14a8e9 x86: fix wakeup_cpu with numaq/es7000, v2, fix
Impact: fix wakeup_secondary_cpu with hotplug

We can not put that into x86_quirks, because that is __initdata.
So try to move that to genapic, and add update_genapic in x86_quirks.

later we even could use that stub to:

 1. autodetect CONFIG_ES7000_CLUSTERED_APIC
 2. more correct inquire_remote_apic with apic_verbosity setting.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 00:27:24 +01:00
Venki Pallipadi 93ce99e849 x86: add rdtsc barrier to TSC sync check
Impact: fix incorrectly marked unstable TSC clock

Patch (commit 0d12cdd "sched: improve sched_clock() performance") has
a regression on one of the test systems here.

With the patch, I see:

 checking TSC synchronization [CPU#0 -> CPU#1]:
 Measured 28 cycles TSC warp between CPUs, turning off TSC clock.
 Marking TSC unstable due to check_tsc_sync_source failed

Whereas, without the patch syncs pass fine on all CPUs:

 checking TSC synchronization [CPU#0 -> CPU#1]: passed.

Due to this, TSC is marked unstable, when it is not actually unstable.
This is because syncs in check_tsc_wrap() goes away due to this commit.

As per the discussion on this thread, correct way to fix this is to add
explicit syncs as below?

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-18 00:15:02 +01:00
Eric Dumazet a4a16beade oprofile: fix an overflow in ppro code
reset_value was changed from long to u64 in commit
b991702884 (oprofile: Implement Intel
architectural perfmon support)

But dynamic allocation of this array use a wrong type (long instead of
u64)

Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-11-17 18:47:36 +01:00
Yinghai Lu 569712b2b0 x86: fix wakeup_cpu with numaq/es7000, v2
Impact: fix secondary-CPU wakeup/init path with numaq and es7000

While looking at wakeup_secondary_cpu for WAKE_SECONDARY_VIA_NMI:

|#ifdef WAKE_SECONDARY_VIA_NMI
|/*
| * Poke the other CPU in the eye via NMI to wake it up. Remember that the normal
| * INIT, INIT, STARTUP sequence will reset the chip hard for us, and this
| * won't ... remember to clear down the APIC, etc later.
| */
|static int __devinit
|wakeup_secondary_cpu(int logical_apicid, unsigned long start_eip)
|{
|        unsigned long send_status, accept_status = 0;
|        int maxlvt;
|...
|        if (APIC_INTEGRATED(apic_version[phys_apicid])) {
|                maxlvt = lapic_get_maxlvt();

I noticed that there is no warning about undefined phys_apicid...

because WAKE_SECONDARY_VIA_NMI and WAKE_SECONDARY_VIA_INIT can not be
defined at the same time. So NUMAQ is using wrong wakeup_secondary_cpu.

WAKE_SECONDARY_VIA_NMI, WAKE_SECONDARY_VIA_INIT and
WAKE_SECONDARY_VIA_MIP are variants of a weird and fragile
preprocessor-driven "HAL" mechanisms to specify the kind of secondary-CPU
wakeup strategy a given x86 kernel will use.

The vast majority of systems want to use INIT for secondary wakeup - NUMAQ
uses an NMI, (old-style-) ES7000 uses 'MIP' (a firmware driven in-memory
flag to let secondaries continue).

So convert these mechanisms to x86_quirks and add a
->wakeup_secondary_cpu() method to specify the rare exception
to the sane default.

Extend genapic accordingly as well, for 32-bit.

While looking further, I noticed that functions in wakecup.h for numaq
and es7000 are different to the default in mach_wakecpu.h - but smpboot.c
will only use default mach_wakecpu.h with smphook.h.

So we need to add mach_wakecpu.h for mach_generic, to properly support
numaq and es7000, and vectorize the following SMP init methods:

	int trampoline_phys_low;
	int trampoline_phys_high;
	void (*wait_for_init_deassert)(atomic_t *deassert);
	void (*smp_callin_clear_local_apic)(void);
	void (*store_NMI_vector)(unsigned short *high, unsigned short *low);
	void (*restore_NMI_vector)(unsigned short *high, unsigned short *low);
	void (*inquire_remote_apic)(int apicid);

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-17 17:57:34 +01:00
Alexander van Heukelum 0bd7b79851 x86: entry_64.S: remove whitespace at end of lines
Impact: cleanup

All blame goes to: color white,red "[^[:graph:]]+$"
in .nanorc ;).

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-17 10:46:55 +01:00
Ingo Molnar 9dacc71ff3 Merge commit 'v2.6.28-rc5' into x86/cleanups 2008-11-17 10:46:18 +01:00
Yinghai Lu d3c6aa1e69 x86: fix es7000 compiling
Impact: fix es7000 build

  CC      arch/x86/kernel/es7000_32.o
arch/x86/kernel/es7000_32.c: In function find_unisys_acpi_oem_table:
arch/x86/kernel/es7000_32.c:255: error: implicit declaration of function acpi_get_table_with_size
arch/x86/kernel/es7000_32.c:261: error: implicit declaration of function early_acpi_os_unmap_memory
arch/x86/kernel/es7000_32.c: In function unmap_unisys_acpi_oem_table:
arch/x86/kernel/es7000_32.c:277: error: implicit declaration of function __acpi_unmap_table
make[1]: *** [arch/x86/kernel/es7000_32.o] Error 1

we applied one patch out of order...

| commit a73aaedd95
| Author: Yinghai Lu <yhlu.kernel@gmail.com>
| Date:   Sun Sep 14 02:33:14 2008 -0700
|
|    x86: check dsdt before find oem table for es7000, v2
|
|    v2: use __acpi_unmap_table()

that patch need:

	x86: use early_ioremap in __acpi_map_table
	x86: always explicitly map acpi memory
	acpi: remove final __acpi_map_table mapping before setting acpi_gbl_permanent_mmap
	acpi/x86: introduce __apci_map_table, v4

submitted to the ACPI tree but not upstream yet.

fix it until those patches applied, need to revert this one

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 10:05:07 +01:00
Markus Metzger d1f1e9c010 x86, bts: fix unlock problem in ds.c
Fix a problem where ds_request() returned an error without releasing the
ds lock.

Reported-by: Stephane Eranian <eranian@gmail.com>
Signed-off-by: Markus Metzger <markus.t.metzger@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 08:25:36 +01:00
Frederic Weisbecker e7d3737ea1 tracing/function-return-tracer: support for dynamic ftrace on function return tracer
This patch adds the support for dynamic tracing on the function return tracer.
The whole difference with normal dynamic function tracing is that we don't need
to hook on a particular callback. The only pro that we want is to nop or set
dynamically the calls to ftrace_caller (which is ftrace_return_caller here).

Some security checks ensure that we are not trying to launch dynamic tracing for
return tracing while normal function tracing is already running.

An example of trace with getnstimeofday set as a filter:

ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:57:38 +01:00
Frederic Weisbecker b01c746617 tracing/function-return-tracer: add a barrier to ensure return stack index is incremented in memory
Impact: fix possible race condition in ftrace function return tracer

This fixes a possible race condition if index incrementation
is not immediately flushed in memory.

Thanks for Andi Kleen and Steven Rostedt for pointing out this issue
and give me this solution.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:57:37 +01:00
Steven Rostedt 31e889098a ftrace: pass module struct to arch dynamic ftrace functions
Impact: allow archs more flexibility on dynamic ftrace implementations

Dynamic ftrace has largly been developed on x86. Since x86 does not
have the same limitations as other architectures, the ftrace interaction
between the generic code and the architecture specific code was not
flexible enough to handle some of the issues that other architectures
have.

Most notably, module trampolines. Due to the limited branch distance
that archs make in calling kernel core code from modules, the module
load code must create a trampoline to jump to what will make the
larger jump into core kernel code.

The problem arises when this happens to a call to mcount. Ftrace checks
all code before modifying it and makes sure the current code is what
it expects. Right now, there is not enough information to handle modifying
module trampolines.

This patch changes the API between generic dynamic ftrace code and
the arch dependent code. There is now two functions for modifying code:

  ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into
       a nop, where the original text is calling addr. (mod is the
       module struct if called by module init)

  ftrace_make_caller(rec, addr) - convert the code rec->ip that should
       be a nop into a caller to addr.

The record "rec" now has a new field called "arch" where the architecture
can add any special attributes to each call site record.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16 07:36:02 +01:00
David Woodhouse 52168e60f7 Revert "x86: blacklist DMAR on Intel G31/G33 chipsets"
This reverts commit e51af66308, which was
wrongly hoovered up and submitted about a month after a better fix had
already been merged.

The better fix is commit cbda1ba898
("PCI/iommu: blacklist DMAR on Intel G31/G33 chipsets"), where we do
this blacklisting based on the DMI identification for the offending
motherboard, since sometimes this chipset (or at least a chipset with
the same PCI ID) apparently _does_ actually have an IOMMU.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-15 11:37:16 -08:00
Alexander van Heukelum 722024dbb7 x86: irq: fix apicinterrupts on 64 bits
Impact: Fix interrupt via the apicinterrupt macro

Checkin 939b787130 changed the
"interrupt" macro, but the "interrupt" macro is also invoked
indirectly from the "apicinterrupt" macro.

The "apicinterrupt" macro probably should have its own collection of
systematic stubs for the same reason the main IRQ code does; as is it
is a huge amount of replicated code.

Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-13 17:28:38 -08:00
James Morris 2b82892565 Merge branch 'master' into next
Conflicts:
	security/keys/internal.h
	security/keys/process_keys.c
	security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 11:29:12 +11:00
David Howells a6f76f23d2 CRED: Make execve() take advantage of copy-on-write credentials
Make execve() take advantage of copy-on-write credentials, allowing it to set
up the credentials in advance, and then commit the whole lot after the point
of no return.

This patch and the preceding patches have been tested with the LTP SELinux
testsuite.

This patch makes several logical sets of alteration:

 (1) execve().

     The credential bits from struct linux_binprm are, for the most part,
     replaced with a single credentials pointer (bprm->cred).  This means that
     all the creds can be calculated in advance and then applied at the point
     of no return with no possibility of failure.

     I would like to replace bprm->cap_effective with:

	cap_isclear(bprm->cap_effective)

     but this seems impossible due to special behaviour for processes of pid 1
     (they always retain their parent's capability masks where normally they'd
     be changed - see cap_bprm_set_creds()).

     The following sequence of events now happens:

     (a) At the start of do_execve, the current task's cred_exec_mutex is
     	 locked to prevent PTRACE_ATTACH from obsoleting the calculation of
     	 creds that we make.

     (a) prepare_exec_creds() is then called to make a copy of the current
     	 task's credentials and prepare it.  This copy is then assigned to
     	 bprm->cred.

  	 This renders security_bprm_alloc() and security_bprm_free()
     	 unnecessary, and so they've been removed.

     (b) The determination of unsafe execution is now performed immediately
     	 after (a) rather than later on in the code.  The result is stored in
     	 bprm->unsafe for future reference.

     (c) prepare_binprm() is called, possibly multiple times.

     	 (i) This applies the result of set[ug]id binaries to the new creds
     	     attached to bprm->cred.  Personality bit clearance is recorded,
     	     but now deferred on the basis that the exec procedure may yet
     	     fail.

         (ii) This then calls the new security_bprm_set_creds().  This should
	     calculate the new LSM and capability credentials into *bprm->cred.

	     This folds together security_bprm_set() and parts of
	     security_bprm_apply_creds() (these two have been removed).
	     Anything that might fail must be done at this point.

         (iii) bprm->cred_prepared is set to 1.

	     bprm->cred_prepared is 0 on the first pass of the security
	     calculations, and 1 on all subsequent passes.  This allows SELinux
	     in (ii) to base its calculations only on the initial script and
	     not on the interpreter.

     (d) flush_old_exec() is called to commit the task to execution.  This
     	 performs the following steps with regard to credentials:

	 (i) Clear pdeath_signal and set dumpable on certain circumstances that
	     may not be covered by commit_creds().

         (ii) Clear any bits in current->personality that were deferred from
             (c.i).

     (e) install_exec_creds() [compute_creds() as was] is called to install the
     	 new credentials.  This performs the following steps with regard to
     	 credentials:

         (i) Calls security_bprm_committing_creds() to apply any security
             requirements, such as flushing unauthorised files in SELinux, that
             must be done before the credentials are changed.

	     This is made up of bits of security_bprm_apply_creds() and
	     security_bprm_post_apply_creds(), both of which have been removed.
	     This function is not allowed to fail; anything that might fail
	     must have been done in (c.ii).

         (ii) Calls commit_creds() to apply the new credentials in a single
             assignment (more or less).  Possibly pdeath_signal and dumpable
             should be part of struct creds.

	 (iii) Unlocks the task's cred_replace_mutex, thus allowing
	     PTRACE_ATTACH to take place.

         (iv) Clears The bprm->cred pointer as the credentials it was holding
             are now immutable.

         (v) Calls security_bprm_committed_creds() to apply any security
             alterations that must be done after the creds have been changed.
             SELinux uses this to flush signals and signal handlers.

     (f) If an error occurs before (d.i), bprm_free() will call abort_creds()
     	 to destroy the proposed new credentials and will then unlock
     	 cred_replace_mutex.  No changes to the credentials will have been
     	 made.

 (2) LSM interface.

     A number of functions have been changed, added or removed:

     (*) security_bprm_alloc(), ->bprm_alloc_security()
     (*) security_bprm_free(), ->bprm_free_security()

     	 Removed in favour of preparing new credentials and modifying those.

     (*) security_bprm_apply_creds(), ->bprm_apply_creds()
     (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds()

     	 Removed; split between security_bprm_set_creds(),
     	 security_bprm_committing_creds() and security_bprm_committed_creds().

     (*) security_bprm_set(), ->bprm_set_security()

     	 Removed; folded into security_bprm_set_creds().

     (*) security_bprm_set_creds(), ->bprm_set_creds()

     	 New.  The new credentials in bprm->creds should be checked and set up
     	 as appropriate.  bprm->cred_prepared is 0 on the first call, 1 on the
     	 second and subsequent calls.

     (*) security_bprm_committing_creds(), ->bprm_committing_creds()
     (*) security_bprm_committed_creds(), ->bprm_committed_creds()

     	 New.  Apply the security effects of the new credentials.  This
     	 includes closing unauthorised files in SELinux.  This function may not
     	 fail.  When the former is called, the creds haven't yet been applied
     	 to the process; when the latter is called, they have.

 	 The former may access bprm->cred, the latter may not.

 (3) SELinux.

     SELinux has a number of changes, in addition to those to support the LSM
     interface changes mentioned above:

     (a) The bprm_security_struct struct has been removed in favour of using
     	 the credentials-under-construction approach.

     (c) flush_unauthorized_files() now takes a cred pointer and passes it on
     	 to inode_has_perm(), file_has_perm() and dentry_open().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:39:24 +11:00
David Howells 350b4da71f CRED: Wrap task credential accesses in the x86 arch
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14 10:38:40 +11:00
Ingo Molnar 24de38620d Merge branches 'tracing/branch-tracer', 'tracing/fastboot', 'tracing/function-return-tracer' and 'tracing/urgent' into tracing/core 2008-11-13 09:48:03 +01:00
Rafael J. Wysocki 604d205548 x86: make NUMA on 32-bit depend on EXPERIMENTAL again
My previous patch to make CONFIG_NUMA on x86_32 depend on BROKEN
turned out to be unnecessary, after all, since the source of the
hibernation vs CONFIG_NUMA problem turned out to be the fact that
we didn't take the NUMA KVA remapping into account in the
hibernation code.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 23:28:52 +01:00
Rafael J. Wysocki 97a70e548b x86, hibernate: fix breakage on x86_32 with CONFIG_NUMA set
Impact: fix crash during hibernation on 32-bit NUMA

The NUMA code on x86_32 creates special memory mapping that allows
each node's pgdat to be located in this node's memory.  For this
purpose it allocates a memory area at the end of each node's memory
and maps this area so that it is accessible with virtual addresses
belonging to low memory.  As a result, if there is high memory,
these NUMA-allocated areas are physically located in high memory,
although they are mapped to low memory addresses.

Our hibernation code does not take that into account and for this
reason hibernation fails on all x86_32 systems with CONFIG_NUMA=y and
with high memory present.  Fix this by adding a special mapping for
the NUMA-allocated memory areas to the temporary page tables created
during the last phase of resume.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 23:28:51 +01:00
Frederic Weisbecker 1dc1c6adf3 tracing/function-return-tracer: call prepare_ftrace_return by registers
Impact: Optimize a bit the function return tracer

This patch changes the calling convention of prepare_ftrace_return to
pass its arguments by register. This will optimize it a bit and
prepare it to support dynamic tracing.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 23:15:43 +01:00
Frederic Weisbecker 62d59d17a5 tracing/function-return-tracer: make the function return tracer lockless
Impact: remove spinlocks and irq disabling in function return tracer.

I've tried to figure out all of the race condition that could happen
when the tracer pushes or pops a return address trace to/from the
current thread_info.

Theory:

_ One thread can only execute on one cpu at a time. So this code
  doesn't need to be SMP-safe. Just drop the spinlock.

_ The only race could happen between the current thread and an
  interrupt. If an interrupt is raised, it will increase the index of
  the return stack storage and then execute until the end of the
  tracing to finally free the index it used. We don't need to disable
  irqs.

This is theorical. In practice, I've tested it with a two-core SMP and
had no problem at all. Perhaps -tip testing could confirm it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 23:15:43 +01:00
Steven Rostedt 2ed84eeb88 trace: rename unlikely profiler to branch profiler
Impact: name change of unlikely tracer and profiler

Ingo Molnar suggested changing the config from UNLIKELY_PROFILE
to BRANCH_PROFILING. I never did like the "unlikely" name so I
went one step farther, and renamed all the unlikely configurations
to a "BRANCH" variant.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 22:27:58 +01:00
Prarit Bhargava 8652cb4b0d x86: warn of incorrect cpu_khz on AMD systems
Impact: add debug check

If none of the perfctrs are free when calculating cpu_khz we default to
using ctr 3 (ie, we just choose 3).  This may lead to an incorrect tsc
freq value which can cause the system to be unstable.

To aid in future debugging, WARN the user of a potential problem.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 19:57:35 +01:00
Linus Torvalds 5d2007ebc2 Merge branch 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: Fix pit memory leak if unable to allocate irq source id
  KVM: ia64: fix vmm_spin_{un}lock for !CONFIG_SMP
  KVM: VMX: Set IGMT bit in EPT entry
  KVM: Require the PCI subsystem
  x86: KVM guest: fix section mismatch warning in kvmclock.c
  KVM: ia64: Use guest signal mask when blocking
  KVM: MMU: increase per-vcpu rmap cache alloc size
2008-11-12 10:38:42 -08:00
H. Peter Anvin 8665596ec0 x86: fix up the new IRQ code for older versions of gas
Older versions of gas don't implement the C-style != operator, they
instead want the Pascal-style <> operator.  Change != to <> so we
don't break compilation with those old versions of gas.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-12 10:27:35 -08:00
Linus Torvalds 08c1184fa2 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (47 commits)
  ACPI: pci_link: remove acpi_irq_balance_set() interface
  fujitsu-laptop: Add DMI callback for Lifebook S6420
  ACPI: EC: Don't do transaction from GPE handler in poll mode.
  ACPI: EC: lower interrupt storm treshold
  ACPICA: Use spinlock for acpi_{en|dis}able_gpe
  ACPI: EC: restart failed command
  ACPI: EC: wait for last write gpe
  ACPI: EC: make kernel messages more useful when GPE storm is detected
  ACPI: EC: revert msleep patch
  thinkpad_acpi: fingers off backlight if video.ko is serving this functionality
  sony-laptop: fingers off backlight if video.ko is serving this functionality
  msi-laptop: fingers off backlight if video.ko is serving this functionality
  fujitsu-laptop: fingers off backlight if video.ko is serving this functionality
  eeepc-laptop: fingers off backlight if video.ko is serving this functionality
  compal: fingers off backlight if video.ko is serving this functionality
  asus-acpi: fingers off backlight if video.ko is serving this functionality
  Acer-WMI: fingers off backlight if video.ko is serving this functionality
  ACPI video: if no ACPI backlight support, use vendor drivers
  ACPI: video: Ignore devices that aren't present in hardware
  Delete an unwanted return statement at evgpe.c
  ...
2008-11-12 10:24:46 -08:00
Eduardo Habkost c415b3dce3 x86: disable IRQs before doing anything on nmi_shootdown_cpus()
Impact: make nmi_shootdown_cpus() callable from preemptible context

We need to know on which CPU we are running on, and we don't want to be
preempted while doing this.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 18:55:49 +01:00
Eduardo Habkost bb8dd270e6 x86: make nmi_shootdown_cpus() available on !SMP and !X86_LOCAL_APIC
Impact: widen nmi_shootdown_cpus() availability

The X86_LOCAL_APIC #ifdef was for kdump. For !SMP, the function simply
does nothing.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 18:55:48 +01:00
Eduardo Habkost 2ddded2138 x86: move nmi_shootdown_cpus() to reboot.c
Impact: make nmi_shootdown_cpus() available to the rest of the x86 platform

Now nmi_shootdown_cpus() is ready to be used by non-kdump code also.
Move it to reboot.c.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 18:55:47 +01:00
Eduardo Habkost c370e5e089 x86 kdump: make nmi_shootdown_cpus() non-static
Impact: make API available to the rest of x86 platform code

Add prototype to asm/reboot.h.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 18:55:46 +01:00
Eduardo Habkost 8e29478631 x86 kdump: make kdump_nmi_callback() a function ptr on crash_nmi_callback()
Impact: extend nmi_shootdown_cpus() with a callback

The reboot code will use a different function on crash_nmi_callback().
Adding a function pointer parameter to nmi_shootdown_cpus() for that.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 18:55:46 +01:00
Eduardo Habkost d1e7b91cfa x86 kdump: create kdump_nmi_shootdown_cpus()
Impact: cleanup

For the kdump-specific code that was living on nmi_shootdown_cpus().

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 18:55:45 +01:00
Eduardo Habkost b2bbe71b82 x86 kdump: move crashing_cpu assignment to nmi_shootdown_cpus()
Impact: cleanup

This variable will be moved to non-kdump-specific code.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 18:55:44 +01:00
Eduardo Habkost a7d41820f6 x86 kdump: extract kdump-specific code from crash_nmi_callback()
Impact: cleanup

The NMI CPU-halting code will be used on non-kdump cases, also
(e.g. emergency_reboot when virtualization is enabled).

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 18:55:43 +01:00
Ingo Molnar eb42c75878 Merge branch 'linus' into x86/crashdump 2008-11-12 15:43:39 +01:00
Ingo Molnar 2b7d0390a6 tracing: branch tracer, fix vdso crash
Impact: fix bootup crash

the branch tracer missed arch/x86/vdso/vclock_gettime.c from
disabling tracing, which caused such bootup crashes:

  [  201.840097] init[1]: segfault at 7fffed3fe7c0 ip 00007fffed3fea2e sp 000077

also clean up the ugly ifdefs in arch/x86/kernel/vsyscall_64.c by
creating DISABLE_UNLIKELY_PROFILE facility for code to turn off
instrumentation on a per file basis.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 13:26:38 +01:00
Ingo Molnar 708b8eae0f Merge branch 'linus' into core/locking 2008-11-12 12:39:21 +01:00
Hiroshi Shimamoto 9cc3c49ed1 x86: ia32_signal: remove unnecessary padding
Impact: reduce structure padding

Remove unnecessary paddings, this saves 4 bytes.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 12:28:02 +01:00
Hiroshi Shimamoto 4a61204856 x86: signal_32: introduce retcode and rt_retcode
Impact: cleanup

Introduce retcode and rt_retcode to replace setting up frame->retcode.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 12:28:01 +01:00
Steven Rostedt 1f0d69a9fc tracing: profile likely and unlikely annotations
Impact: new unlikely/likely profiler

Andrew Morton recently suggested having an in-kernel way to profile
likely and unlikely macros. This patch achieves that goal.

When configured, every(*) likely and unlikely macro gets a counter attached
to it. When the condition is hit, the hit and misses of that condition
are recorded. These numbers can later be retrieved by:

  /debugfs/tracing/profile_likely    - All likely markers
  /debugfs/tracing/profile_unlikely  - All unlikely markers.

# cat /debug/tracing/profile_unlikely | head
 correct incorrect  %        Function                  File              Line
 ------- ---------  -        --------                  ----              ----
    2167        0   0 do_arch_prctl                  process_64.c         832
       0        0   0 do_arch_prctl                  process_64.c         804
    2670        0   0 IS_ERR                         err.h                34
   71230     5693   7 __switch_to                    process_64.c         673
   76919        0   0 __switch_to                    process_64.c         639
   43184    33743  43 __switch_to                    process_64.c         624
   12740    64181  83 __switch_to                    process_64.c         594
   12740    64174  83 __switch_to                    process_64.c         590

# cat /debug/tracing/profile_unlikely | \
  awk '{ if ($3 > 25) print $0; }' |head -20
   44963    35259  43 __switch_to                    process_64.c         624
   12762    67454  84 __switch_to                    process_64.c         594
   12762    67447  84 __switch_to                    process_64.c         590
    1478      595  28 syscall_get_error              syscall.h            51
       0     2821 100 syscall_trace_leave            ptrace.c             1567
       0        1 100 native_smp_prepare_cpus        smpboot.c            1237
   86338   265881  75 calc_delta_fair                sched_fair.c         408
  210410   108540  34 calc_delta_mine                sched.c              1267
       0    54550 100 sched_info_queued              sched_stats.h        222
   51899    66435  56 pick_next_task_fair            sched_fair.c         1422
       6       10  62 yield_task_fair                sched_fair.c         982
    7325     2692  26 rt_policy                      sched.c              144
       0     1270 100 pre_schedule_rt                sched_rt.c           1261
    1268    48073  97 pick_next_task_rt              sched_rt.c           884
       0    45181 100 sched_info_dequeued            sched_stats.h        177
       0       15 100 sched_move_task                sched.c              8700
       0       15 100 sched_move_task                sched.c              8690
   53167    33217  38 schedule                       sched.c              4457
       0    80208 100 sched_info_switch              sched_stats.h        270
   30585    49631  61 context_switch                 sched.c              2619

# cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
   39900    36577  47 pick_next_task                 sched.c              4397
   20824    15233  42 switch_mm                      mmu_context_64.h     18
       0        7 100 __cancel_work_timer            workqueue.c          560
     617    66484  99 clocksource_adjust             timekeeping.c        456
       0   346340 100 audit_syscall_exit             auditsc.c            1570
      38   347350  99 audit_get_context              auditsc.c            732
       0   345244 100 audit_syscall_entry            auditsc.c            1541
      38     1017  96 audit_free                     auditsc.c            1446
       0     1090 100 audit_alloc                    auditsc.c            862
    2618     1090  29 audit_alloc                    auditsc.c            858
       0        6 100 move_masked_irq                migration.c          9
       1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
       2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
       0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
    4514     2090  31 __grab_cache_page              filemap.c            2149
   12882   228786  94 mapping_unevictable            pagemap.h            50
       4       11  73 __flush_cpu_slab               slub.c               1466
  627757   330451  34 slab_free                      slub.c               1731
    2959    61245  95 dentry_lru_del_init            dcache.c             153
     946     1217  56 load_elf_binary                binfmt_elf.c         904
     102       82  44 disk_put_part                  genhd.h              206
       1        1  50 dst_gc_task                    dst.c                82
       0       19 100 tcp_mss_split_point            tcp_output.c         1126

As you can see by the above, there's a bit of work to do in rethinking
the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
that were more than 25%.

Note:  After submitting my first version of this patch, Andrew Morton
  showed me a version written by Daniel Walker, where I picked up
  the following ideas from:

  1)  Using __builtin_constant_p to avoid profiling fixed values.
  2)  Using __FILE__ instead of instruction pointers.
  3)  Using the preprocessor to stop all profiling of likely
       annotations from vsyscall_64.c.

Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
for their feed back on this patch.

(*) Not ever unlikely is recorded, those that are used by vsyscalls
 (a few of them) had to have profiling disabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12 11:52:02 +01:00
Ingo Molnar 60a011c736 Merge branch 'tracing/function-return-tracer' into tracing/fastboot 2008-11-12 10:17:09 +01:00
Ingo Molnar d06bbd6695 Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core
Conflicts:
	kernel/trace/ring_buffer.c
2008-11-12 10:11:37 +01:00
Len Brown 3e0fe36483 Merge branch 'misc' into release 2008-11-11 21:14:11 -05:00
Bjorn Helgaas 32836259ff ACPI: pci_link: remove acpi_irq_balance_set() interface
This removes the acpi_irq_balance_set() interface from the PCI
interrupt link driver.

x86 used acpi_irq_balance_set() to tell the PCI interrupt link
driver to configure links to minimize IRQ sharing.  But the link
driver can easily figure out whether to turn on IRQ balancing
based on the IRQ model (PIC/IOAPIC/etc), so we can get rid of
that external interface.

It's better for the driver to figure this out at init-time.  If
we set it externally via the x86 code, the interface reduces
modularity, and we depend on the fact that acpi_process_madt()
happens before we process the kernel command line.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-11-11 21:12:05 -05:00
H. Peter Anvin 14d7ca5c57 x86: attempt reboot via port CF9 if we have standard PCI ports
Impact: Changes reboot behavior.

If port CF9 seems to be safe to touch, attempt it before trying the
keyboard controller.  Port CF9 is not available on all chipsets (a
significant but decreasing number of modern chipsets don't implement
it), but port CF9 itself should in general be safe to poke (no ill
effects if unimplemented) on any system which has PCI Configuration
Method #1 or #2, as it falls inside the PCI configuration port range
in both cases.  No chipset without PCI is known to have port CF9,
either, although an explicit "pci=bios" would mean we miss this and
therefore don't use port CF9.  An explicit "reboot=pci" can be used to
force the use of port CF9.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-11 16:19:48 -08:00
H. Peter Anvin 939b787130 x86: 64 bits: shrink and align IRQ stubs
Move the IRQ stub generation to assembly to simplify it and for
consistency with 32 bits.  Doing it in a C file with asm() statements
doesn't help clarity, and it prevents some optimizations.

Shrink the IRQ stubs down to just over four bytes per (we fit seven
into a 32-byte chunk.)  This shrinks the total icache consumption of
the IRQ stubs down to an even kilobyte, if all of them are in active
use.

The downside is that we end up with a double jump, which could have a
negative effect on some pipelines.  The double jump is always inside
the same cacheline on any modern chips.

To get the most effect, cache-align the IRQ stubs.

This makes the 64-bit code match changes already done to the 32-bit
code, and should open up irqinit*.c for unification.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-11 13:51:52 -08:00
H. Peter Anvin b7c6244f13 x86: 32 bits: shrink and align IRQ stubs
Shrink the IRQ stubs on 32 bits down to just over four bytes per (we
fit seven into a 32-byte chunk.)  This shrinks the total icache
consumption of the IRQ stubs down to an even kilobyte, if all of them
are in active use.

The downside is that we end up with a double jump, which could have a
negative effect on some pipelines.  The double jump is always inside
the same cacheline on any modern chips (the exception being
486/Elan/Geode which have only 16-byte cachelines, but are unlikely to
have too many interrupt sources.)

To get the most effect, cache-align the IRQ stubs.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-11 13:24:58 -08:00
H. Peter Anvin 4687518c4c x86: 32 bit: interrupt stub consistency with 64 bit
Don't generate interrupt stubs for interrupt vectors below
FIRST_EXTERNAL_VECTOR, and make the table of interrupt vectors
(interrupt[]) __initconst.  Both of these changes both conserve memory
and improve consistency with 64 bits.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-11-11 13:03:07 -08:00
Avi Kivity e17d1dc086 KVM: Fix pit memory leak if unable to allocate irq source id
Reported-By: Daniel Marjamäki <danielm77@spray.se>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-11-11 21:01:51 +02:00
Sheng Yang 928d4bf747 KVM: VMX: Set IGMT bit in EPT entry
There is a potential issue that, when guest using pagetable without vmexit when
EPT enabled, guest would use PAT/PCD/PWT bits to index PAT msr for it's memory,
which would be inconsistent with host side and would cause host MCE due to
inconsistent cache attribute.

The patch set IGMT bit in EPT entry to ignore guest PAT and use WB as default
memory type to protect host (notice that all memory mapped by KVM should be WB).

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-11 21:00:37 +02:00
Avi Kivity ca93e992fd KVM: Require the PCI subsystem
PCI device assignment makes calls to pci code, so require it to be built
into the kernel.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-11-11 20:56:13 +02:00
Rakib Mullick a29a2af378 x86: KVM guest: fix section mismatch warning in kvmclock.c
WARNING: arch/x86/kernel/built-in.o(.text+0x1722c): Section mismatch
in reference from the function kvm_setup_secondary_clock() to the
function .devinit.text:setup_secondary_APIC_clock()
The function kvm_setup_secondary_clock() references
the function __devinit setup_secondary_APIC_clock().
This is often because kvm_setup_secondary_clock lacks a __devinit
annotation or the annotation of setup_secondary_APIC_clock is wrong.

Signed-off-by: Md.Rakib H. Mullick <rakib.mullick@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-11 20:55:10 +02:00
Linus Torvalds f21f237cf5 Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context
  nohz: disable tick_nohz_kick_tick() for now
  irq: call __irq_enter() before calling the tick_idle_check
  x86: HPET: enter hpet_interrupt_handler with interrupts disabled
  x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMP
  x86: HPET: convert WARN_ON to WARN_ON_ONCE
2008-11-11 10:53:50 -08:00
Marcelo Tosatti c41ef344de KVM: MMU: increase per-vcpu rmap cache alloc size
The page fault path can use two rmap_desc structures, if:

- walk_addr's dirty pte update allocates one rmap_desc.
- mmu_lock is dropped, sptes are zapped resulting in rmap_desc being
freed.
- fetch->mmu_set_spte allocates another rmap_desc.

Increase to 4 for safety.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-11 20:53:34 +02:00
Ingo Molnar c280ea5e4c x86: fix documentation typo in arch/x86/Kconfig
Impact: documentation update

Chris Snook pointed out that it's Core i7, not Core 7i.

Reported-by: Chris Snook <csnook@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 19:30:29 +01:00
Thomas Gleixner a98f8fd24f x86: apic reset counter on shutdown
Impact: avoid spurious lapic timer events on shutdown

The apic timer might be close to firing when it is shutdown. We can
not really disable the timer - we just mask the interrupt. That way we
can get an extra interrupt when it is reenabled. Set the counter to
max on shutdown to avoid this.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 14:56:55 +01:00
Ivan Vecera d3ec5cae09 x86: call machine_shutdown and stop all CPUs in native_machine_halt
Impact: really halt all CPUs on halt

Function machine_halt (resp. native_machine_halt) is empty for x86
architectures. When command 'halt -f' is invoked, the message "System
halted." is displayed but this is not really true because all CPUs are
still running.

There are also similar inconsistencies for other arches (some uses
power-off for halt or forever-loop with IRQs enabled/disabled).

IMO there should be used the same approach for all architectures OR
what does the message "System halted" really mean?

This patch fixes it for x86.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 14:50:02 +01:00
James Bottomley 6cd10f8db3 x86, voyager: fix smp generic helper voyager breakage
Impact: build/boot fix for x86/Voyager

This change:

| commit 3d44223327
| Author: Jens Axboe <jens.axboe@oracle.com>
| Date:   Thu Jun 26 11:21:34 2008 +0200
|
|     Add generic helpers for arch IPI function calls

didn't wire up the voyager smp call function correctly, so do that
here.  Also make CONFIG_USE_GENERIC_SMP_HELPERS a def_bool y again,
since we now use the generic helpers for every x86 architecture.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jens Axboe <Jens.Axboe@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 12:08:53 +01:00
Ingo Molnar 19b3e9671c tracing: function return tracer, build fix
fix:

 arch/x86/kernel/ftrace.c: In function 'ftrace_return_to_handler':
 arch/x86/kernel/ftrace.c:112: error: implicit declaration of function 'cpu_clock'

cpu_clock() is implicitly included via a number of ways, but its real
location is sched.h. (Build failure is triggerable if enough other
kernel components are turned off.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 12:03:27 +01:00
Cliff Wickman a3d732f937 x86, UV: fix redundant creation of sgi_uv
Impact: fix double entry creation in /proc

There is a collision between two UV functions:
  both uv_ptc_init() and gru_proc_init() try to make /proc/sgi_uv

So move it's creation to a single place: uv_system_init()

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 11:38:50 +01:00
Ingo Molnar 867f7fb3eb tracing, x86: function return tracer, fix assembly constraints
fix:

 arch/x86/kernel/ftrace.c: Assembler messages:
 arch/x86/kernel/ftrace.c:140: Error: missing ')'
 arch/x86/kernel/ftrace.c:140: Error: junk `(%ebp))' after expression
 arch/x86/kernel/ftrace.c:141: Error: missing ')'
 arch/x86/kernel/ftrace.c:141: Error: junk `(%ebp))' after expression

the [parent_replaced] is used in an =rm fashion, so that constraint
is correct in isolation - but [parent_old] aliases register %0 and uses
it in an addressing mode that is only valid with registers - so change
the constraint from =rm to =r.

This fixes the build failure.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-11 11:12:18 +01:00