Plug a race in TSC synchronization
We need to do tsc_sync_wait() before the CPU is set online to prevent
multiple CPUs from doing it in parallel - which won't work because TSC
sync has global unprotected state.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oops. I knew I didn't have the physical versus logical cpu identifiers right
when I generated that patch. It's not nearly as bad as I feared at the time
though.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sync_tsc was using smp_call_function to ask the boot processor to report
its tsc value. smp_call_function performs an IPI_send_allbutself which is
a broadcast ipi. There is a window during processor startup in which
the target cpu has started but has not yet initialized its interrupt
vectors, so it cannot properly process an interrupt. Receiving an interrupt
during that window will triple fault the cpu and do other nasty things.
Why cli does not protect us from that is beyond me.
The simple fix is to match ia64 and provide a smp_call_function_single,
which avoids the broadcast and is more efficient.
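The difference between a broadcast and a single-target call can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct model_cpu, deliver_broadcast and deliver_single are hypothetical names, not kernel APIs, and a "fault" here simply counts an interrupt delivered to a CPU whose vectors are not set up yet.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a CPU during bringup (not a kernel structure). */
struct model_cpu {
    bool started;        /* CPU has left reset and is executing code */
    bool vectors_ready;  /* interrupt vectors are initialized */
    int  handled;        /* interrupts handled correctly */
    int  faults;         /* interrupts that arrived too early */
};

static void deliver(struct model_cpu *cpu)
{
    if (!cpu->started)
        return;                 /* nothing is running there yet */
    if (cpu->vectors_ready)
        cpu->handled++;
    else
        cpu->faults++;          /* the startup window described above */
}

/* Models an "all but self" broadcast: every other CPU is hit. */
static void deliver_broadcast(struct model_cpu *cpus, size_t n, size_t self)
{
    for (size_t i = 0; i < n; i++)
        if (i != self)
            deliver(&cpus[i]);
}

/* Models a single-target call: only `target` is hit. */
static void deliver_single(struct model_cpu *cpus, size_t n, size_t target)
{
    if (target < n)
        deliver(&cpus[target]);
}
```

In this model a broadcast from an already running CPU also reaches any CPU that is still inside the window, while the single-target variant only ever touches the boot processor it was aimed at.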
This certainly fixes the problem of getting stuck on boot which was
very easy to trigger on my SMP Hyperthreaded Xeon, and I think
it fixes it for the right reasons.
Minor changes by AK
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
No need to print kernel addresses there and clarify what the APIC-ID is.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Various code needs this information now before the actual SMP bootup. Instead
of computing it on the fly while booting the other CPUs, set it up now during
initial MPtable/MADT parsing.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Turn many #if $undefined_string into #ifdef $undefined_string to fix some
warnings after -Wundef was added to global CFLAGS
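A minimal sketch of the pattern, assuming a hypothetical option CONFIG_DEMO_FEATURE that is never defined. GCC's -Wundef warns about the #if form because the preprocessor silently evaluates an undefined identifier as 0; the #ifdef form asks only whether the macro exists.

```c
/* CONFIG_DEMO_FEATURE is a hypothetical option, deliberately left undefined. */

/*
 * Before: `#if CONFIG_DEMO_FEATURE` evaluates the undefined identifier as 0
 * and triggers a warning under -Wundef.
 *
 * After: #ifdef only tests whether the macro is defined, so no warning.
 */
#ifdef CONFIG_DEMO_FEATURE
static int demo_feature_enabled(void) { return 1; }
#else
static int demo_feature_enabled(void) { return 0; }
#endif
```

The two forms are not strictly equivalent: a macro defined as 0 is false for #if but true for #ifdef, so the conversion is only safe for options that are either undefined or defined to a non-zero value.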
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fixes boot up lockups on some machines where CPU APIC IDs don't start with 0.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Broadcast IPIs provide unexpected behaviour for cpu hotplug. CPUs in
offline state also end up receiving the IPI. Once the cpus become online they
receive these stale IPIs, which are bad and introduce unexpected behaviour.
This is easily avoided by not sending a broadcast and addressing just the
CPUs in the online map. Preliminary cycle counts show no big overhead: the
numbers are around 0x3000-0x3900 on average on x86 and x86_64 systems with
CPUs running at 3GHz, both for the broadcast and the mask versions of the
APIs.
The shortcuts are useful only for flat mode (where the perf shows no
degradation), and in cluster mode it's unicast anyway. It's simpler to just
not use broadcast anymore.
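The idea of addressing only the online CPUs can be sketched as a mask computation. The type demo_cpumask_t and the function ipi_targets_allbutself are hypothetical names for illustration, and the 64-CPU limit is an assumption of this sketch, not of the patch.

```c
#include <stdint.h>

/* Hypothetical 64-CPU bitmask type; bit n set means CPU n is in the set. */
typedef uint64_t demo_cpumask_t;

/* Targets for an "all but self" IPI when only online CPUs are addressed. */
static demo_cpumask_t ipi_targets_allbutself(demo_cpumask_t online_map,
                                             unsigned int self)
{
    demo_cpumask_t mask = online_map;

    if (self < 64)
        mask &= ~((demo_cpumask_t)1 << self);  /* never send to ourselves */
    return mask;
}
```

Because the target set is derived from the online map, an offline CPU never appears in it and so has no stale IPI waiting when it comes online later.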
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is a minor cleanup to the cpu sibling/core map. This setup is
required to happen at per-cpu bringup time.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Experimental CPU hotplug patch for x86_64
-----------------------------------------
This supports logical CPU online and offline.
- Test with maxcpus=1, and then kick other cpus off to test if init code
is all cleaned up. CONFIG_SCHED_SMT works as well.
- idle threads are forked on demand from keventd threads for clean startup
TBD:
1. Not tested on a real NUMA machine (tested with numa=fake=2)
2. Handle ACPI pieces for physical hotplug support.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds __cpuinit and __cpuinitdata sections that need to exist past
boot to support cpu hotplug.
Caveat: This is done *only* for EM64T CPU Hotplug support, on request from
Andi Kleen. Much of the generic hotplug code in the kernel, and all of the
other archs that support CPU hotplug today (i386, ia64, ppc64, s390 and
parisc), don't mark sections with __cpuinit, but only mark them as __devinit
and __devinitdata.
If someone is motivated to change generic code, we need to make sure all
existing hotplug code does not break on other archs that don't use __cpuinit
and __cpudevinit.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes the assumption that LAPIC entries contain the BSP as their
first entry. This is a slight improvement to the temporary fix submitted by
Suresh Siddha.
- Removes assumption that LAPIC entries contain BSP first.
- Builds x86_acpiid_to_apicid[] and bios_cpu_apicid[] properly with BSP as
first entry.
- Makes maxcpus=1 boot on these systems. The parsing earlier in
arch/x86_64/kernel/mpparse.c stopped after maxcpus entries, so other entries
were not processed, which caused the kernel not to boot on these systems.
TBD: x86_acpiid_to_apicid[] and bios_cpu_apicid[] seem to be exactly the
same. One of them could be removed, but that might need more work to clean up.
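The table-building idea above can be sketched in user space. This is a model under stated assumptions, not the code in mpparse.c: build_apicid_table and DEMO_BAD_APICID are hypothetical names, and the firmware entries are reduced to a plain array of APIC IDs.

```c
#include <stddef.h>

#define DEMO_BAD_APICID 0xFFu   /* hypothetical "no entry" marker */

/*
 * Fill `table` (capacity `max_cpus`) from `lapic_ids` in firmware order,
 * but always place the boot processor's APIC ID in slot 0.
 * Every firmware entry is examined even when max_cpus is small, so the
 * BSP is found wherever it appears. Returns the number of slots filled.
 */
static size_t build_apicid_table(const unsigned char *lapic_ids, size_t n,
                                 unsigned char bsp_apicid,
                                 unsigned char *table, size_t max_cpus)
{
    size_t filled = 0;
    size_t next_ap = 1;         /* slot 0 is reserved for the BSP */

    for (size_t i = 0; i < max_cpus; i++)
        table[i] = DEMO_BAD_APICID;
    if (max_cpus == 0)
        return 0;

    for (size_t i = 0; i < n; i++) {
        if (lapic_ids[i] == bsp_apicid) {
            table[0] = bsp_apicid;
            filled++;
        } else if (next_ap < max_cpus) {
            table[next_ap++] = lapic_ids[i];
            filled++;
        }
    }
    return filled;
}
```

With entries listed as 4, 6, 0, 2 and a BSP APIC ID of 0, the table comes out as 0, 4, 6, 2; with a capacity of one it still contains the BSP, which is the maxcpus=1 case described above.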
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Collected NMI watchdog fixes.
- Fix call of check_nmi_watchdog
- Remove earlier move of check_nmi_watchdog to later. It does not fix the
race it was supposed to fix fully.
- Remove unused P6 definitions
- Add support for performance counter based watchdog on P4 systems.
This allows it to run only once per second, which saves some CPU time.
Previously it would run at 1000Hz, which was too much.
Code ported from i386
Make this the default on Intel systems.
- Use check_nmi_watchdog with local APIC based nmi
- Fix race in touch_nmi_watchdog
- Fix bug that caused incorrect performance counters to be programmed in a
few cases on K8.
- Remove useless check for local APIC
- Use local_t and per_cpu variables for per CPU data.
- Keep other CPUs busy during check_nmi_watchdog to make sure they really
tick when in lapic mode.
- Only check CPUs that are actually online.
- Various other fixes.
- Fix fallback path when MSRs are unimplemented
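The saving from the performance counter based watchdog can be put in numbers with a small sketch. It assumes a counter that increments once per CPU cycle and raises the NMI when it overflows; watchdog_period_cycles and watchdog_counter_preload are hypothetical helper names, not functions from this patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of CPU cycles between watchdog NMIs when a
 * performance counter counting CPU cycles is used as the NMI source.
 */
static uint64_t watchdog_period_cycles(uint64_t cpu_khz, unsigned int nmi_hz)
{
    if (nmi_hz == 0)
        return 0;
    return cpu_khz * 1000u / nmi_hz;
}

/*
 * The counter counts upward and raises the NMI on overflow, so it is
 * preloaded with the negated period.
 */
static int64_t watchdog_counter_preload(uint64_t cpu_khz, unsigned int nmi_hz)
{
    return -(int64_t)watchdog_period_cycles(cpu_khz, nmi_hz);
}
```

For a 2 GHz CPU, one NMI per second means one interrupt every 2,000,000,000 cycles, compared with one every 2,000,000 cycles at 1000Hz.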
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The new TSC sync algorithm recently submitted did not work too well.
The result was that some MP machines where the TSC came out of the BIOS very
unsynchronized and that did not have HPET support were nearly unusable because
the time would jump forwards and backwards between CPUs.
After a lot of research ;-) and some more prototypes I ended up with just
using the one from IA64 which looks best. It has some internal self tuning
that should adapt to changing interconnect latencies. It holds up in my tests
so far.
I believe it was originally written by David Mosberger, I just ported it over
to x86-64. See the inline comment for a description.
This cleans up the code because it uses smp_call_function for syncing instead
of having custom hooks in SMP bootup.
Please note that the cycle numbers it outputs are too optimistic because they
do not take into account the latency of WRMSR and RDTSC, which can be hundreds
of cycles. It seems to be able to sync a dual Opteron to 200-300 cycles,
which is probably good enough.
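The kind of round-trip estimate such an algorithm relies on can be sketched as follows. This is a user-space illustration of the general technique (take the fastest of several round trips and assume the master read its counter at the midpoint), not a copy of the ported code; struct tsc_sample and estimate_tsc_delta are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One round trip: local TSC before asking, master's TSC, local TSC after. */
struct tsc_sample {
    uint64_t t0;   /* local TSC when the request was sent */
    uint64_t tm;   /* TSC value reported by the master */
    uint64_t t1;   /* local TSC when the reply arrived */
};

/*
 * Estimate (local - master) from the sample with the shortest round trip,
 * assuming the master read its TSC halfway between t0 and t1.
 * Stores that round-trip time in *rt. Requires n >= 1.
 */
static int64_t estimate_tsc_delta(const struct tsc_sample *s, size_t n,
                                  uint64_t *rt)
{
    size_t best = 0;

    for (size_t i = 1; i < n; i++)
        if (s[i].t1 - s[i].t0 < s[best].t1 - s[best].t0)
            best = i;

    *rt = s[best].t1 - s[best].t0;

    /* Midpoint of t0 and t1, computed without overflowing. */
    uint64_t center = s[best].t0 / 2 + s[best].t1 / 2;
    if (s[best].t0 % 2 + s[best].t1 % 2 == 2)
        center++;

    return (int64_t)(center - s[best].tm);
}
```

The midpoint assumption is where the optimism comes from: if the request and reply paths are not symmetric, or the reads and writes of the counter themselves take hundreds of cycles, the true offset can differ from the estimate by up to half the round-trip time.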
There is a timing window during AP bootup where interrupts can see
inconsistent time before the TSC is synced. It is hard to avoid unfortunately
because we can only do the TSC sync after some setup, and we need to enable
interrupts before that. I just ignored it for now.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- broken sibling_map setup in x86_64
- grouping all the core and HT related cpuinfo fields.
We are reasonably sure that adding new cpuinfo fields after the "siblings" field
will not cause any app failure. That's because today's /proc/cpuinfo
format is completely different on x86, x86_64 and we haven't heard of any
x86 app breakage because of this issue. Grouping these fields will
result in more or less common format on all architectures (ia64, x86 and
x86_64) and will cause less confusion.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This will allow hotplug CPU in the future and in general cleans up a lot of
crufty code. It also should plug some races that the old hackish way
introduces. Remove one old race workaround in NMI watchdog setup that is not
needed anymore.
I removed the old total sum of bogomips reporting code. The brag value of
BogoMips has been greatly devalued in the last years on the open market.
Real CPU hotplug will need some more work, but the infrastructure for it is
there now.
One drawback: the new TSC sync algorithm is less accurate than before. The
old way of zeroing TSCs is too intrusive to do later. Instead the TSC of the
BP is duplicated now, which is less accurate.
akpm:
- sync_tsc_bp_init seems to have the sense of `init' inverted.
- SPIN_LOCK_UNLOCKED is deprecated - use DEFINE_SPINLOCK.
Cc: <rusty@rustcorp.com.au>
Cc: <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The appended patch adds support for Intel dual-core detection and for displaying
the core related information in /proc/cpuinfo.
It adds two new fields "core id" and "cpu cores" to x86 /proc/cpuinfo and the
"core id" field for x86_64 (the "cpu cores" field is already present in x86_64).
The number of processor cores in a die is detected using cpuid(4); this is
documented in IA-32 Intel Architecture Software Developer's Manual (vol 2a)
(http://developer.intel.com/design/pentium4/manuals/index_new.htm#sdm_vol2a)
This patch also adds cpu_core_map similar to cpu_sibling_map.
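The cpuid(4) decoding can be sketched as a pure function on the EAX value, so it runs without executing the instruction. The name cores_from_cpuid4_eax is hypothetical, and the field layout is taken from the Intel manual referenced above rather than from this patch.

```c
#include <stdint.h>

/*
 * Decode the core count from the EAX value returned by CPUID leaf 4
 * (subleaf 0). Bits 4:0 are the cache type; zero means the leaf reports
 * no cache and the count field is not meaningful, so fall back to 1.
 * Bits 31:26 hold the maximum number of core IDs per package, minus one.
 */
static unsigned int cores_from_cpuid4_eax(uint32_t eax)
{
    if ((eax & 0x1f) == 0)
        return 1;
    return ((eax >> 26) & 0x3f) + 1;
}
```

The field reports the maximum number of addressable core IDs in the package, so it is an upper bound that matches the real core count on the dual-core parts this patch targets but need not do so on every processor.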
Slightly hacked by AK.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Port over an i386 kludge from rusty to x86-64
I don't think it is a full solution, but the upcoming smp bootup rewrite
will solve it.
This fixes BUGs at bootup on bigger x86-64 systems.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!