Impact: better error reporting
At present, if hw_perf_counter_init encounters an error, all it can do
is return NULL, which causes sys_perf_counter_open to return an EINVAL
error to userspace. This isn't very informative for userspace; it means
that userspace can't tell the difference between "sorry, oprofile is
already using the PMU" and "we don't support this CPU" and "this CPU
doesn't support the requested generic hardware event".
This commit uses the PTR_ERR/ERR_PTR/IS_ERR set of macros to let
hw_perf_counter_init return an error code on error rather than just NULL
if it wishes. If it does so, that error code will be returned from
sys_perf_counter_open to userspace. If it returns NULL, an EINVAL
error will be returned to userspace, as before.
This also adapts the powerpc hw_perf_counter_init to make use of this
to return ENXIO, EINVAL, EBUSY, or EOPNOTSUPP as appropriate. It would
be good to add extra error numbers in future to allow userspace to
distinguish the various errors that are currently reported as EINVAL,
i.e. irq_period < 0, too many events in a group, conflict between
exclude_* settings in a group, and PMU resource conflict in a group.
[ v2: fix a bug pointed out by Corey Ashford where error returns from
hw_perf_counter_init were not handled correctly in the case of
raw hardware events.]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Orig-LKML-Reference: <20090330171023.682428180@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cooperate with oprofile
At present, on PowerPC, if you have perf_counters compiled in, oprofile
doesn't work. There is code to allow the PMU to be shared between
competing subsystems, such as perf_counters and oprofile, but currently
the perf_counter subsystem reserves the PMU for itself at boot time,
and never releases it.
This makes perf_counter play nicely with oprofile. Now we keep a count
of how many perf_counter instances are counting hardware events, and
reserve the PMU when that count becomes non-zero, and release the PMU
when that count becomes zero. This means that it is possible to have
perf_counters compiled in and still use oprofile, as long as there are
no hardware perf_counters active. This also means that if oprofile is
active, sys_perf_counter_open will fail if the hw_event specifies a
hardware event.
To avoid races with other tasks creating and destroying perf_counters,
we use a mutex. We use atomic_inc_not_zero and atomic_add_unless to
avoid having to take the mutex unless there is a possibility of the
count going between 0 and 1.
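A sketch of the two paths, with illustrative names; the mutex is only
taken when the count may cross the 0/1 boundary:
    /* counter creation: reserve the PMU when the count goes 0 -> 1 */
    if (!atomic_inc_not_zero(&num_counters)) {
            mutex_lock(&pmc_reserve_mutex);
            if (atomic_read(&num_counters) == 0 &&
                reserve_pmc_hardware(perf_counter_interrupt))
                    err = -EBUSY;
            else
                    atomic_inc(&num_counters);
            mutex_unlock(&pmc_reserve_mutex);
    }

    /* counter destruction: release the PMU when the count goes 1 -> 0 */
    if (!atomic_add_unless(&num_counters, -1, 1)) {
            mutex_lock(&pmc_reserve_mutex);
            if (atomic_dec_return(&num_counters) == 0)
                    release_pmc_hardware();
            mutex_unlock(&pmc_reserve_mutex);
    }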
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Orig-LKML-Reference: <20090330171023.627912475@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While going over the wakeup code I noticed delayed wakeups only work
for hardware counters, but basically all software counters rely on
them.
This patch unifies and generalizes the delayed wakeup to fix this
issue.
Since we're dealing with NMI context bits here, use a cmpxchg() based
single link list implementation to track counters that have pending
wakeups.
[ This should really be generic code for delayed wakeups, but since we
cannot use cmpxchg()/xchg() in generic code, I've let it live in the
perf_counter code. -- Eric Dumazet could use it to aggregate the
network wakeups. ]
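The push side ends up looking roughly like this (names illustrative);
it must be safe against NMIs, hence cmpxchg() instead of a lock:
    static struct perf_counter *pending_head;

    static void perf_pending_queue(struct perf_counter *counter)
    {
            struct perf_counter *old;

            /* lock-free push: retry until we swing the head to us */
            do {
                    old = counter->wakeup_next = pending_head;
            } while (cmpxchg(&pending_head, old, counter) != old);
    }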
Furthermore, the x86 method of using TIF flags was flawed in that it's
quite possible to end up setting the bit on the idle task, losing the
wakeup.
The powerpc method uses per-cpu storage and does appear to be
sufficient.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090330171023.153932974@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new functionality
Currently, if there are more counters enabled than can fit on the CPU,
the kernel will multiplex the counters on to the hardware using
round-robin scheduling. That isn't too bad for sampling counters, but
for counting counters it means that the value read from a counter
represents some unknown fraction of the true count of events that
occurred while the counter was enabled.
This remedies the situation by keeping track of how long each counter
is enabled for, and how long it is actually on the cpu and counting
events. These times are recorded in nanoseconds using the task clock
for per-task counters and the cpu clock for per-cpu counters.
These values can be supplied to userspace on a read from the counter.
Userspace requests that they be supplied after the counter value by
setting the PERF_FORMAT_TOTAL_TIME_ENABLED and/or
PERF_FORMAT_TOTAL_TIME_RUNNING bits in the hw_event.read_format field
when creating the counter. (There is no way to change the read format
after the counter is created, though it would be possible to add some
way to do that.)
Using this information it is possible for userspace to scale the count
it reads from the counter to get an estimate of the true count:
true_count_estimate = count * total_time_enabled / total_time_running
This also lets userspace detect the situation where the counter never
got to go on the cpu: total_time_running == 0.
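For example, userspace could read and scale a counter roughly like
this (a sketch; the layout is implied by the read_format bits chosen
above):
    struct {
            unsigned long long count;
            unsigned long long time_enabled; /* ..._TOTAL_TIME_ENABLED */
            unsigned long long time_running; /* ..._TOTAL_TIME_RUNNING */
    } r;
    unsigned long long estimate = 0;

    if (read(fd, &r, sizeof(r)) == sizeof(r) && r.time_running)
            estimate = r.count * r.time_enabled / r.time_running;
    /* r.time_running == 0: the counter never got to go on the cpu */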
This functionality has been requested by the PAPI developers, and will
be generally needed for interpreting the count values from counting
counters correctly.
In the implementation, this keeps 5 time values (in nanoseconds) for
each counter: total_time_enabled and total_time_running are used when
the counter is in state OFF or ERROR and for reporting back to
userspace. When the counter is in state INACTIVE or ACTIVE, it is the
tstamp_enabled, tstamp_running and tstamp_stopped values that are
relevant, and total_time_enabled and total_time_running are determined
from them. (tstamp_stopped is only used in INACTIVE state.) The
reason for doing it like this is that it means that only counters
being enabled or disabled at sched-in and sched-out time need to be
updated. There are no new loops that iterate over all counters to
update total_time_enabled or total_time_running.
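In sketch form, the derivation for a counter in the INACTIVE or ACTIVE
state is roughly ('now' being the task clock or cpu clock as
appropriate):
    run_end = (counter->state == PERF_COUNTER_STATE_ACTIVE) ?
            now : counter->tstamp_stopped;
    total_time_enabled = now - counter->tstamp_enabled;
    total_time_running = run_end - counter->tstamp_running;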
This also keeps separate child_total_time_running and
child_total_time_enabled fields that get added in when reporting the
totals to userspace. They are separate fields so that they can be
atomic. We don't want to use atomics for total_time_running,
total_time_enabled etc., because then we would have to use atomic
sequences to update them, which are slower than regular arithmetic and
memory accesses.
It is possible to measure total_time_running by adding a task_clock
counter to each group of counters, and total_time_enabled can be
measured approximately with a top-level task_clock counter (though
inaccuracies will creep in if you need to disable and enable groups
since it is not possible in general to disable/enable the top-level
task_clock counter simultaneously with another group). However, that
adds extra overhead - I measured around 15% increase in the context
switch latency reported by lat_ctx (from lmbench) when a task_clock
counter was added to each of 2 groups, and around 25% increase when a
task_clock counter was added to each of 4 groups. (In both cases a
top-level task-clock counter was also added.)
In contrast, the code added in this commit gives better information
with no overhead that I could measure (in fact in some cases I
measured lower times with this code, but the differences were all less
than one standard deviation).
[ v2: address review comments by Andrew Morton. ]
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Orig-LKML-Reference: <18890.6578.728637.139402@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Rework the perfcounter output ABI
Use sys_read() only for instant data and provide mmap() output for all
async overflow data.
The first mmap() determines the size of the output buffer. The mmap()
size must be PAGE_SIZE * (1 + pages), where pages must be a power of 2
or 0. Further mmap()s of the same fd must have the same size. Once all
maps are gone, you can again mmap() with a new size.
In case of 0 extra pages there is no data output and the first page
only contains meta data.
When there are data pages, a poll() event will be generated for each
full page of data. Furthermore, the output is circular. This means
that although 1 page is a valid configuration, it's useless, since
we'll start overwriting it the instant we report a full page.
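For illustration, a mapping with 8 data pages could be set up like
this (sketch):
    size_t pagesz = sysconf(_SC_PAGESIZE);
    int pages = 8;  /* power of 2, or 0 for the meta data page only */
    void *buf = mmap(NULL, (1 + pages) * pagesz, PROT_READ,
                     MAP_SHARED, fd, 0);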
Future work will focus on the output format (currently maintained)
where we'll likely want each entry denoted by a header which includes a
type and length.
Further future work will allow splice()ing the fd, also carrying the
async overflow data -- splice() would be mutually exclusive with
mmap() of the data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090323172417.470536358@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: new feature giving performance improvement
This adds the ability for userspace to do an mmap on a hardware counter
fd and get access to a read-only page that contains the information
needed to translate a hardware counter value to the full 64-bit
counter value that would be returned by a read on the fd. This is
useful on architectures that allow user programs to read the hardware
counters, such as PowerPC.
The mmap will only succeed if the counter is a hardware counter
monitoring the current process.
On my quad 2.5GHz PowerPC 970MP machine, userspace can read a counter
and translate it to the full 64-bit value in about 30ns using the
mmapped page, compared to about 830ns for the read syscall on the
counter, so this does give a significant performance improvement.
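The userspace read side is expected to be a retry loop against a
generation count in the page, roughly as follows (field names
approximate, read_hw_counter() standing in for e.g. an mfspr of a PMC):
    struct perf_counter_mmap_page *pc = mapped_page;
    unsigned int seq;
    unsigned long long value;

    do {
            seq = pc->lock;                 /* generation count */
            barrier();
            value = pc->offset;             /* kernel-maintained base */
            if (pc->index)                  /* hw counter + 1, 0 = none */
                    value += read_hw_counter(pc->index - 1);
            barrier();
    } while (pc->lock != seq);              /* retry if kernel updated */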
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Orig-LKML-Reference: <20090323172417.297057964@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since the bitfields turned into a bit of a mess, remove them and rely on
good old masks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <20090323172417.059499915@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix for powerpc
Commit db3a944aca35ae61 ("perf_counter: revamp syscall input ABI")
expanded the hw_event.type field into a union of structs containing
bitfields. In particular it introduced a type field and a raw_type
field, with the intention that the 1-bit raw_type field should
overlay the most-significant bit of the 8-bit type field, and in fact
perf_counter_alloc() now assumes that (or at least, assumes that
raw_type doesn't overlay any of the bits that are 1 in the values of
PERF_TYPE_{HARDWARE,SOFTWARE,TRACEPOINT}).
Unfortunately this is not true on big-endian systems such as PowerPC,
where bitfields are laid out from left to right, i.e. from most
significant bit to least significant. This means that setting
hw_event.type = PERF_TYPE_SOFTWARE will set hw_event.raw_type to 1.
This fixes it by making the layout depend on whether or not
__BIG_ENDIAN_BITFIELD is defined. It's a bit ugly, but that's what
we get for using bitfields in a user/kernel ABI.
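Illustratively, the layout becomes something like this (field widths
schematic, not the exact ABI):
    #ifdef __BIG_ENDIAN_BITFIELD
            __u64   raw_type :  1,  /* MSB allocated first */
                    type     :  7,
                    event_id : 56;
    #else
            __u64   event_id : 56,  /* LSB allocated first */
                    type     :  7,
                    raw_type :  1;
    #endif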
Also, that commit didn't fix up some places in
arch/powerpc/kernel/perf_counter.c where hw_event.raw and
hw_event.event_id were used. This fixes them too.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: cleanup
This updates the powerpc perf_counter_interrupt following on from the
"perf_counter: unify irq output code" patch. Since we now use the
generic perf_counter_output code, which sets the perf_counter_pending
flag directly, we no longer need the need_wakeup variable.
This removes need_wakeup and makes perf_counter_interrupt use
get_perf_counter_pending() instead.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194234.024464535@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
Having 3 slightly different copies of the same code around does nobody
any good. First step in revamping the output format.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194233.929962222@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: modify ABI
The hardware/software classification in hw_event->type became a little
strained due to the addition of tracepoint tracing.
Instead split up the field and provide a type field to explicitly specify
the counter type, while using the event_id field to specify which event to
use.
Raw counters still work as before, only the raw config now goes into
raw_event.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Orig-LKML-Reference: <20090319194233.836807573@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: build fix for powerpc
Commit bd753921015e7905 ("perf_counter: software counter event
infrastructure") introduced a use of TIF_PERF_COUNTERS into the core
perfcounter code. This breaks the build on powerpc because we use
a flag in a per-cpu area to signal wakeups on powerpc rather than
a thread_info flag, since the thread_info flags have to be
manipulated with atomic operations and are thus slower than per-cpu
flags.
This fixes the build by changing the core to use an abstracted
set_perf_counter_pending() function, which is defined on x86 to set
the TIF_PERF_COUNTERS flag and on powerpc to set the per-cpu flag
(paca->perf_counter_pending). It changes the previous powerpc
definition of set_perf_counter_pending to not take an argument and
adds a clear_perf_counter_pending, so as to simplify the definition
on x86.
On x86, set_perf_counter_pending() is defined as a macro. Defining
it as a static inline in arch/x86/include/asm/perf_counters.h causes
compile failures because <asm/perf_counters.h> gets included early in
<linux/sched.h>, and the definitions of set_tsk_thread_flag etc. are
therefore not available in <asm/perf_counters.h>. (On powerpc this
problem is avoided by defining set_perf_counter_pending etc. in
<asm/hw_irq.h>.)
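Sketched out, the two definitions look roughly like this:
    /* x86: a macro, since <asm/perf_counters.h> is included too early
       for set_thread_flag() to be in scope */
    #define set_perf_counter_pending() \
            set_thread_flag(TIF_PERF_COUNTERS)

    /* powerpc, in <asm/hw_irq.h>: a per-cpu flag in the paca */
    static inline void set_perf_counter_pending(void)
    {
            get_paca()->perf_counter_pending = 1;
    }
    static inline void clear_perf_counter_pending(void)
    {
            get_paca()->perf_counter_pending = 0;
    }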
Signed-off-by: Paul Mackerras <paulus@samba.org>
Merge reason: we have gathered quite a few conflicts, need to merge upstream
Conflicts:
arch/powerpc/kernel/Makefile
arch/x86/ia32/ia32entry.S
arch/x86/include/asm/hardirq.h
arch/x86/include/asm/unistd_32.h
arch/x86/include/asm/unistd_64.h
arch/x86/kernel/cpu/common.c
arch/x86/kernel/irq.c
arch/x86/kernel/syscall_table_32.S
arch/x86/mm/iomap_32.c
include/linux/sched.h
kernel/Makefile
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits)
tracing, net: fix net tree and tracing tree merge interaction
tracing, powerpc: fix powerpc tree and tracing tree interaction
ring-buffer: do not remove reader page from list on ring buffer free
function-graph: allow unregistering twice
trace: make argument 'mem' of trace_seq_putmem() const
tracing: add missing 'extern' keywords to trace_output.h
tracing: provide trace_seq_reserve()
blktrace: print out BLK_TN_MESSAGE properly
blktrace: extract duplidate code
blktrace: fix memory leak when freeing struct blk_io_trace
blktrace: fix blk_probes_ref chaos
blktrace: make classic output more classic
blktrace: fix off-by-one bug
blktrace: fix the original blktrace
blktrace: fix a race when creating blk_tree_root in debugfs
blktrace: fix timestamp in binary output
tracing, Text Edit Lock: cleanup
tracing: filter fix for TRACE_EVENT_FORMAT events
ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release()
x86: kretprobe-booster interrupt emulation code fix
...
Fix up trivial conflicts in
arch/parisc/include/asm/ftrace.h
include/linux/memory.h
kernel/extable.c
kernel/module.c
It is a fairly common operation to have a pointer to a work_struct and
to need a pointer to the delayed_work it is contained in. In
particular, all delayed works which want to rearm themselves will have
to do that. So it would seem fair to offer a helper function for this
operation.
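The helper is presumably just a container_of() wrapper along these
lines, so a rearming handler can do something like
schedule_delayed_work(to_delayed_work(work), HZ):
    static inline struct delayed_work *to_delayed_work(struct work_struct *work)
    {
            return container_of(work, struct delayed_work, work);
    }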
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PowerPC has been a long-time user of the generic RTC abstraction, so hook up
rtc-generic:
- Create the "rtc-generic" platform device if ppc_md.get_rtc_time is set,
- Kill rtc-ppc, as rtc-generic offers the same functionality in a more
generic way, and supports autoloading through udev.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Today's linux-next build (powerpc allyesconfig) failed like this:
arch/powerpc/kernel/ftrace.c: In function 'prepare_ftrace_return':
arch/powerpc/kernel/ftrace.c:612: warning: passing argument 3 of 'ftrace_push_return_trace' makes pointer from integer without a cast
arch/powerpc/kernel/ftrace.c:612: error: too many arguments to function 'ftrace_push_return_trace'
Caused by commit 5d1a03dc54
("function-graph: moved the timestamp from arch to generic code") from
the tracing tree (which removed an argument from
ftrace_push_return_trace()) interacting with commit
6794c78243 ("powerpc64: port of the
function graph tracer") from the powerpc tree.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: <linuxppc-dev@ozlabs.org>
LKML-Reference: <20090327230834.93d0221d.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
PCI: fix HT MSI mapping fix
PCI: don't enable too much HT MSI mapping
x86/PCI: make pci=lastbus=255 work when acpi is on
PCI: save and restore PCIe 2.0 registers
PCI: update fakephp for bus_id removal
PCI: fix kernel oops on bridge removal
PCI: fix conflict between SR-IOV and config space sizing
powerpc/PCI: include pci.h in powerpc MSI implementation
PCI Hotplug: schedule fakephp for feature removal
PCI Hotplug: rename legacy_fakephp to fakephp
PCI Hotplug: restore fakephp interface with complete reimplementation
PCI: Introduce /sys/bus/pci/devices/.../rescan
PCI: Introduce /sys/bus/pci/devices/.../remove
PCI: Introduce /sys/bus/pci/rescan
PCI: Introduce pci_rescan_bus()
PCI: do not enable bridges more than once
PCI: do not initialize bridges more than once
PCI: always scan child buses
PCI: pci_scan_slot() returns newly found devices
PCI: don't scan existing devices
...
Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy,
as correctly noted at bug #12454. Someone can look up an entry with a
NULL ->owner, thus not pinning anything, and release it later,
resulting in a module refcount underflow.
We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.
But this leaves ->owner as an easily-manipulated field (just one C
assignment) and somebody will forget to unpin the previous module or
pin the current one when switching ->owner. ->proc_fops is declared
"const", which should give one pause.
->read_proc/->write_proc were just fixed to not require ->owner for
protection.
rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.
Removing ->owner will also make PDE smaller.
So, let's nuke it.
Kudos to Jeff Layton for reminding about this, let's say, oversight.
http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Rusty's patch to change our sysfs access to various registers
to use smp_call_function_single() introduced a whole bunch of
warnings. This fixes them. This version also fixes an actual
bug where it did mtspr instead of mfspr when reading
the files.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On powerpc64 machines running 32-bit userspace, we can get garbage bits in the
stack pointer passed into the kernel. Most places handle this correctly, but
the signal handling code uses the passed value directly for allocating signal
stack frames.
This fixes the issue by introducing a get_clean_sp function that returns a
sanitized stack pointer. For 32-bit tasks on a 64-bit kernel, the stack
pointer is masked correctly. In all other cases, the stack pointer is simply
returned.
Additionally, we pass an 'is_32' parameter to get_sigframe now in order to
get the properly sanitized stack. The callers are known to be 32- or
64-bit statically.
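A sketch of what get_clean_sp amounts to on a 64-bit kernel:
    static inline unsigned long get_clean_sp(struct pt_regs *regs, int is_32)
    {
            if (is_32)
                    /* a 32-bit task may have garbage in the top bits */
                    return regs->gpr[1] & 0xffffffffUL;
            return regs->gpr[1];
    }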
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
x86: disable __do_IRQ support
sparseirq, powerpc/cell: fix unused variable warning in interrupt.c
genirq: deprecate obsolete typedefs and defines
genirq: deprecate __do_IRQ
genirq: add doc to struct irqaction
genirq: use kzalloc instead of explicit zero initialization
genirq: make irqreturn_t an enum
genirq: remove redundant if condition
genirq: remove unused hw_irq_controller typedef
irq: export remove_irq() and setup_irq() symbols
irq: match remove_irq() args with setup_irq()
irq: add remove_irq() for freeing of setup_irq() irqs
genirq: assert that irq handlers are indeed running in hardirq context
irq: name 'p' variables a bit better
irq: further clean up the free_irq() code flow
irq: refactor and clean up the free_irq() code flow
irq: clean up manage.c
irq: use GFP_KERNEL for action allocation in request_irq()
kernel/irq: fix sparse warning: make symbol static
irq: optimize init_kstat_irqs/init_copy_kstat_irqs
...
This file uses PCI MSI defines and so needs pci.h.
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This moves some MMU related init code out of setup_64.c into hash_utils_64.c
and calls it early_init_mmu() and early_init_mmu_secondary(). This will
make it easier to plug in a new MMU type.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Complete workaround for DTLB errata in e300c2/c3/c4 processors.
Due to the bug, the hardware-implemented LRU algorithm always goes to
way 1 of the TLB. This fix implements the proposed software workaround
in the form of an LRW table for choosing the TLB way.
Based on patch from David Jander <david@protonic.nl>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that r0 is free we can keep the value of I/DMISS in r3 and not reload
it before doing the tlbli/d. This saves us a few cycles in the fast path
case.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Long ago we had some code that actually used the CTR in the SW TLB
miss handlers (603/e300). Since we don't use it, there's no reason to
waste cycles saving it off and restoring it (we actually didn't restore
it in the fast path case).
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since a number of powerpc chips are SoCs we end up having dma-able
devices that are registered as platform or of_platform devices. We need
to hook the archdata to setup proper dma_ops for these devices.
Rather than having to add a bus_notify to each platform we add a default
one at the highest priority (called first) to set the default dma_ops for
of_platform and platform devices to dma_direct_ops. This allows platform
code to override the ops by providing their own notifier call back.
In the future to enable >4G DMA support on ppc32 we can hook swiotlb ops.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This will allow us to remove the ppc32 specific checks in get_dma_ops()
that defaults to dma_direct_ops if the archdata is NULL. We really
should always have archdata set to something going forward.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: performance improvement
This fixes 'powerpc: avoid cpumask games in arch/powerpc/kernel/sysfs.c'
which talked about using smp_call_function_single, but actually used
work_on_cpu (an older version of the patch).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit e7943fbbfd broke ppc32 systems using the
Open Firmware client interface due to using the wrong relocation
macro when accessing the variable "linux_banner".
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Grant picked up the wrong version of "Respect _PAGE_COHERENT on classic
ppc32 SW" (commit a4bd6a93c3)
It was missing the code to actually deal with the fixup of
_PAGE_COHERENT based on the CPU feature.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Add the new API pci_enable_msi_block() to allow drivers to
request multiple MSI and reimplement pci_enable_msi in terms of
pci_enable_msi_block. Ensure that the architecture back ends don't
have to know about multiple MSI.
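A driver might use the new call roughly like this (a sketch; a
positive return value is taken to mean only that many vectors could
be allocated):
    int nvec = 4;                   /* want up to 4 vectors */
    int rc;

    rc = pci_enable_msi_block(pdev, nvec);
    if (rc > 0)                     /* retry with what's possible */
            rc = pci_enable_msi_block(pdev, rc);
    if (rc)
            dev_warn(&pdev->dev, "falling back to legacy interrupts\n");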
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since we now set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing
it out before we setup the SW TLB. Today all the SW TLB machines
(603/e300) that we support are non-SMP, however there are some errata on
some devices that cause us to set _PAGE_COHERENT via CPU_FTR_NEED_COHERENT.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
CONFIG_PPC_MULTIPLATFORM is a remnant of the pre-powerpc days and isn't
really meaningful anymore. It was basically equivalent to PPC64 || 6xx.
This removes it along with the following changes:
- 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely
on 6xx which is what they want anyway.
- A new symbol, PPC_BOOK3S, is defined that represents compliance with
the "Server" variant of the architecture. This is set when either 6xx
or PPC64 is set, and opens the door for future BOOK3E 64-bit.
- 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use
PPC64 && PPC_BOOK3S
- A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now
used to control the use of prom_init.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: cleanup
Convert the last remaining users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When the console is on a serial port to be driven by serial8250, a
character can be lost from the end of the first line in the two-line
sequence
serial8250.0: ttyS0 at MMIO 0xe0004500 (irq = 42) is a 16550A
console handover: boot [udbg0] -> real [ttyS0]
This happens because udbg_puts or udbg_write stuff the last byte of
the line into the Tx FIFO and return, whereupon the serial8250
initialization code immediately empties that FIFO. The fix: udbg_puts
and udbg_write now wait for the Tx FIFO to clear before returning.
This delays the system by one additional serial frame time for each
line written by udbg, but the effect is not noticeable, a cumulative
17 milliseconds for 200 lines of early printk output at 115200 baud.
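The wait boils down to polling the line status register, roughly
(accessors and LSR bits as already used in udbg_16550.c):
    /* wait until the transmitter is completely empty */
    while ((in_8(&udbg_comport->lsr) & LSR_TEMT) == 0)
            /* spin */ ;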
Also, the routines in udbg_16550.c now emit CRLF instead of LFCR.
Linux makes a point of emitting CRLF because, when serial output is
captured to a file, LFCR sequences can confuse text editors. See
http://lkml.org/lkml/2006/2/4/50 for some history.
Signed-off-by: Andrew Klossner <andrew@cesa.opbu.xerox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix typo: s/resouces/resources/ in a pr_debug
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
So at least you can see what kernel you're booting if you die
before the kernel prints it mid-way through start_kernel().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch enables oprofile for all 3 FX variants and the GX variant
of the 750 processor.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When identify_cpu() is called a second time with a logical PVR, it
only copies a subset of the cpu_spec fields so as to avoid overwriting
the performance monitor fields that were initialized based on the
real PVR.
However some of the other, non-performance-monitor-related fields are
also not copied:
* pvr_mask
* pvr_value
* mmu_features
* machine_check
The fact that pvr_mask is not copied can result in show_cpuinfo()
showing the cpu as "unknown", if we override an unknown PVR with a
logical one - as reported by Shaggy.
So change the logic to copy all fields, and then put back the PMC
related ones in the case that we're overwriting a real PVR with a
logical one.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The for-loop body of identify_cpu() has gotten a little big, so move the
loop body logic into a separate function. No other changes.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: __per_cpu_load available on all SMP capable archs
Percpu now requires three symbols to be defined - __per_cpu_load,
__per_cpu_start and __per_cpu_end. There were three archs which
didn't have it. Update them as follows.
* powerpc: can use generic PERCPU() macro. Compile tested for
powerpc32, compile/boot tested for powerpc64.
* ia64: can use generic PERCPU_VADDR() macro. __phys_per_cpu_start is
identical to __per_cpu_load. Compile tested and symbol table looks
identical after the change except for the additional __per_cpu_load.
* arm: added explicit __per_cpu_load definition. Currently uses
unified .init output section so it can't use the generic macro. I
don't know whether the unified .init output section is required by an
arch peculiarity, so I left it alone. Please break it up and use
PERCPU() if possible.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Pat Gefre <pfg@sgi.com>
Cc: Russell King <rmk@arm.linux.org.uk>
The e500mc core supports the new tlbilx instructions that do core
local invalidates and also provide us the ability to take down
all TLB entries matching a given PID.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Impact: more hardware support
This adds the back-end for the PMU on the POWER4 and POWER4+ processors
(GP and GQ). This is quite similar to the PPC970, with 8 PMCs, but has
fewer events than the PPC970.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: more hardware support
This adds the back-end for the PMU on the POWER5+ processors (i.e. GS,
including GS DD3 aka POWER5++). This doesn't use the fixed-function
PMC5 and PMC6 since they don't respect the freeze conditions and don't
generate interrupts, as on POWER6.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: fix oops-causing bug
This fixes a bug in the powerpc hw_perf_counter_init where the code
didn't initialize ctrs[n] before passing the ctrs array to check_excludes,
leading to possible oopses and other incorrect behaviour. This fixes it
by initializing ctrs[n] correctly.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the back-end for the PMU on the POWER5 processor. This knows
how to use the fixed-function PMC5 and PMC6 (instructions completed and
run cycles). Unlike POWER6, PMC5/6 obey the freeze conditions and can
generate interrupts, so their use doesn't impose any extra restrictions.
POWER5+ is different and is not supported by this patch.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, setting hw_event.exclude_kernel does nothing on the PPC970
variants used in Apple G5 machines, because they have the HV (hypervisor)
bit in the MSR forced to 1, so as far as the PMU is concerned, the
kernel runs in hypervisor mode. Thus we have to use the MMCR0_FCHV
(freeze counters in hypervisor mode) bit rather than the MMCR0_FCS
(freeze counters in supervisor mode) bit.
This checks the MSR.HV bit at startup, and if it is set, we set the
freeze_counters_kernel variable to MMCR0_FCHV (it was initialized to
MMCR0_FCS). We then use that whenever we need to exclude kernel events.
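In sketch form:
    /* at boot: decide which freeze bit 'kernel' maps to */
    if (mfmsr() & MSR_HV)
            freeze_counters_kernel = MMCR0_FCHV;
    /* (it stays at its MMCR0_FCS initializer otherwise) */

    /* when programming MMCR0 for a counter group: */
    if (counter->hw_event.exclude_kernel)
            mmcr0 |= freeze_counters_kernel;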
Signed-off-by: Paul Mackerras <paulus@samba.org>
Randomise ELF_ET_DYN_BASE, which is used when loading position independent
executables.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Randomise the lower bits of the stack address. More randomisation is good for
security but the scatter can also help with SMT threads that share an L1. A
quick test case shows this working:
#include <stdio.h>

int main(void)
{
	int sp;
	printf("%lx\n", (unsigned long)&sp & 4095);
	return 0;
}
before:
80
80
80
80
80
after:
610
490
300
6b0
d80
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move is_32bit_task into asm/thread_info.h, that allows us to test for
32/64bit tasks without an ugly CONFIG_PPC64 ifdef.
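A minimal sketch of the definition (assuming a TIF_32BIT thread flag,
as on other 64-bit architectures):
    /* in asm/thread_info.h */
    #ifdef CONFIG_PPC64
    #define is_32bit_task() (test_thread_flag(TIF_32BIT))
    #else
    #define is_32bit_task() (1)
    #endif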
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct. Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.
The patch below fixes this and merges the little- and big-endian cases.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
lfiwzx is a new floating point load instruction in 2.06 that needs an
alignment handler for Linux.
It turns out to be the world's easiest handler to add.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While testing partition migration with heavy CPU load using
shared processors, it was observed that sometimes the migration
would never complete and would appear to hang. Currently, the
migration code assumes that if H_SUCCESS is returned from the H_JOIN
then the migration is complete and the processor is waking up on
the target system. If there was an outstanding PROD to the processor
when the H_JOIN is called, however, it will return H_SUCCESS on the
source system, causing the migration to hang or, in some scenarios,
causing the kernel to crash on the complete call waking the caller
of rtas_percpu_suspend_me. Fix this by calling H_JOIN multiple times
if necessary during the migration.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The e500mc supports the new msgsnd/doorbell mechanisms that were added in
the Power ISA 2.05 architecture. We use the normal level doorbell for
doing SMP IPIs at this point.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Old OF variants used to create a 'dummy' parent node "multifunc-device"
for devices with more than one PCI function. Our code that matches OF
nodes to PCI devices dealt with that in one place but not in another;
this fixes it.
This has the practical effect of fixing interrupt routing of multifunction
PCI cards on some older PowerMac machines.
Signed-off-by: Tom Arbuckle <tom.d.arbuckle@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Create a new header that becomes a single location for defining PowerPC
opcodes used by code that is either generating instructions
at runtime (fixups, debug, etc.), emulating instructions, or just
compiling instructions old assemblers don't know about.
We currently don't handle the floating point emulation or alignment decode
as both are better handled by the specific decode support they already
have.
Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions
since older assemblers don't know about them.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: clean up, remove duplicate code
When ftrace was first ported to PowerPC, there existed a
create_function_call that would create the instruction to make a call
to a given address. Unfortunately, this call expected to write to
the address it was given, and since it used the address to calculate
the offset, it could not be faked.
ftrace needed a way to create the instruction without actually writing
that instruction to the text section. So ftrace had to implement its
own code.
Now we have create_branch in the code patching library, which does
exactly what ftrace needs. This patch replaces ftrace's implementation
with the library function.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The original port of ftrace to PowerPC kept a lot of the code used
by x86. Some of this code was to handle x86's 5 byte instruction.
This was handled by using character arrays to manipulate the
code.
PowerPC has a consistent 4 byte instruction. Using unsigned ints
makes the code more efficient as well as more readable.
By converting to use unsigned ints to represent instructions,
I was able to remove the side effects that were needed for
manipulating character strings, i.e. memcpy and memcmp.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch gets function graph tracing working with dynamic function
tracer on PowerPC32.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch ports the function graph tracer for PowerPC, but only
for static function tracing.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: clean up
Use a macro to save and restore the registers for PowerPC32,
since that code is duplicated.
This is similar to the work done by Cyrill Gorcunov for the
mcount code in x86_64.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The TOCs used by modules are different from the one used by
the core kernel code. The function graph tracer must save and
restore the TOC whenever it traces a module call. But this
is an added overhead to burden the majority of core kernel
code being traced.
Benjamin Herrenschmidt suggested testing the entry address of
the call to tell whether it is a core kernel function or a module
function. He recommended using the REGION_ID() macro to perform this test.
This patch implements Benjamin's idea, and uses a different
return_to_handler routine depending on whether the entry is a core
kernel function or not. The module version saves the TOC, whereas
the core kernel version does not.
Geoff Levand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is the port of the function graph tracer to PowerPC with
dynamic tracing.
Geoff Levand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a port of the function graph tracer that was written by
Frederic Weisbecker for the x86.
This only works for PPC64 at the moment and only for static tracing.
PPC32 and dynamic function graph tracing support will come later.
The trace produces a visual calling of functions:
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
0) 2.224 us | }
0) ! 271.024 us | }
0) ! 320.080 us | }
0) ! 324.656 us | }
0) ! 329.136 us | }
0) | .put_prev_task_fair() {
0) | .update_curr() {
0) 2.240 us | .update_min_vruntime();
0) 6.512 us | }
0) 2.528 us | .__enqueue_entity();
0) + 15.536 us | }
0) | .pick_next_task_fair() {
0) 2.032 us | .__pick_next_entity();
0) 2.064 us | .__clear_buddies();
0) | .set_next_entity() {
0) 2.672 us | .__dequeue_entity();
0) 6.864 us | }
Geoff Levand tested on PS3.
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Neuling reported a compile bug when dynamic ftrace was
configured in and modules were not. This was due to the ftrace
code referencing module specific structures.
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: cleanup
The PowerPC ftrace code uses a hacked up DEBUGP macro for prints.
This patch converts it to the standard pr_debug.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds support for 256KB pages on ppc44x-based boards.
For simplification of implementation with 256KB pages we still assume
2-level paging. As a side effect this leads to wasting extra memory space
reserved for PTE tables: only 1/4 of pages allocated for PTEs are
actually used. But this may be an acceptable trade-off to achieve the
high performance we have with big PAGE_SIZEs in some applications (e.g.
RAID).
Also with 256KB PAGE_SIZE we increase THREAD_SIZE up to 32KB to minimize
the risk of stack overflows in the case of on-stack arrays whose size
depends on the page size (e.g. multipage BIOs, NTFS, etc.).
With 256KB PAGE_SIZE we need to decrease PKMAP_ORDER at least down
to 9, otherwise all high memory (2^10 * PAGE_SIZE == 256MB) will be
occupied by PKMAP addresses, leaving no place for vmalloc. We do not
separate PKMAP_ORDER for 256K from 16K/64K PAGE_SIZE here; actually the
value of 10 in the 16K/64K support had been selected rather intuitively.
Thus now for all cases of PAGE_SIZE on ppc44x (including the default, 4KB,
one) we have 512 pages for PKMAP.
Because the ELF standard supports page sizes only up to 64K, you should
use binutils newer than 2.17.50.0.3 with '-zmax-page-size' set to 256K
for building applications which are to be run with the 256KB-page sized
kernel. If using older binutils, patch them as follows:
--- binutils/bfd/elf32-ppc.c.orig
+++ binutils/bfd/elf32-ppc.c
-#define ELF_MAXPAGESIZE 0x10000
+#define ELF_MAXPAGESIZE 0x40000
One more restriction we currently have with 256KB page sizes is the
inability to use shmem safely, so, for now, 256KB is available only if
you turn the CONFIG_SHMEM option off (another variant is to use BROKEN).
Though, if you need shmem with 256KB pages, you can always remove the
!SHMEM dependency in 'config PPC_256K_PAGES' and use the workaround
available here:
http://lkml.org/lkml/2008/12/19/20
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Fix the VSX alignment handler for VSX registers 32-63: these are stored
in the VMX part of the thread_struct, not the FPR part.
Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: stable@kernel.org (2.6.27 & .28 please)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The Power ISA 2.06 spec introduces a standard MMU programming model that
is based on the Freescale Book-E MMU programming model. The Freescale
version is pretty backward compatible with the ISA 2.06 definition, so
we are starting to refactor some of the Freescale code so it can be
easily shared.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The Power ISA 2.06 added power of two page sizes to the embedded MMU
architecture. It's done in such a way as to be code compatible with the
existing HW. This makes the minor code changes to support both
power-of-two and power-of-four page sizes, adds some new MAS bits and
macros that are defined as part of the 2.06 ISA, and renames some things
to use the 'Book-3e' concept to convey the new MMU that is based on the
Freescale Book-E MMU programming model.
Note, it's still invalid to try to use a page size that isn't supported
by the CPU.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Move the definition of hose_list next to its hotplug spinlock.
Create pcibios_io_size to encapsulate the ifdef in the existing
pci-common function pcibios_vaddr_is_ioport.
Move pci_address_to_pio to pci-common, using the new pcibios_io_size,
and protect this GPL-exported function against concurrent hotplug
removal.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: new perf_counter feature
This extends the perf_counter_hw_event struct with bits that specify
that events in user, kernel and/or hypervisor mode should not be
counted (i.e. should be excluded), and adds code to program the PMU
mode selection bits accordingly on x86 and powerpc.
For software counters, we don't currently have the infrastructure to
distinguish which mode an event occurs in, so we currently fail the
counter initialization if the setting of the hw_event.exclude_* bits
would require us to distinguish. Context switches and CPU migrations
are currently considered to occur in kernel mode.
On x86, this changes the previous policy that only root can count
kernel events. Now non-root users can count kernel events or exclude
them. Non-root users still can't use NMI events, though. On x86 we
don't appear to have any way to control whether hypervisor events are
counted or not, so hw_event.exclude_hv is ignored.
On powerpc, the selection of whether to count events in user, kernel
and/or hypervisor mode is PMU-wide, not per-counter, so this adds a
check that the hw_event.exclude_* settings are the same as other events
on the PMU. Counters being added to a group have to have the same
settings as the other hardware counters in the group. Counters and
groups can only be enabled in hw_perf_group_sched_in or power_perf_enable
if they have the same settings as any other counters already on the
PMU. If we are not running on a hypervisor, the exclude_hv setting
is ignored (by forcing it to 0) since we can't ever get any
hypervisor events.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The lmb debugging can be turned on at boottime with lmb=debug on the
command line. However on powerpc that doesn't work, because we don't
necessarily call lmb_dump_all().
So always call lmb_dump_all() after lmb_analyze(); no output is
generated unless lmb=debug is found on the command line.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The new legacy_mem file in sysfs is causing problems with X on machines
that don't support legacy memory access. The way I initially implemented
it, we would fail with -ENXIO when trying to mmap it, thus exposing to
X that we do support the API but there is no legacy memory.
Unfortunately, X's poor error handling is causing it to fail to start
when it gets this error.
This implements a workaround hack that maps anonymous memory instead
(using shmem if VM_SHARED is set, just like /dev/zero does).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: fix dynamic ftrace with large modules in PPC64
The math to calculate the offset into the TOC that is taken from reading
the trampoline is incorrect. The bottom half of the offset is a
sign-extended short. The current code was using an OR to create the offset
when it should have been using an addition.
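Illustratively, with hi and lo being the two 16-bit halves read from
the trampoline (names made up):
    /* buggy: OR cannot subtract when the low half is negative */
    offset = (hi << 16) | lo;

    /* fixed: the low half is a sign-extended 16-bit immediate
       (as produced by an addis/addi pair), so it must be added */
    offset = (hi << 16) + (short)lo;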
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Recently, a patch left DEBUG enabled in the powerpc common PCI code,
causing an old bug in a pr_debug() statement to show up and cause
a NULL dereference on some machines.
This fixes the pr_debug() statement and reverts to DEBUG not being
force-enabled in that file.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We currently have a few variants of fsl-booke processors (e500v1, e500v2,
e500mc, and e200). They all have minor differences that we had previously
been handling via ifdefs.
To move towards having this support the following changes have been made:
* PID1, PID2 only exist on e500v1 & e500v2 and should not be accessed on
e500mc or e200. We use MMUCFG[NPIDS] to determine which case we are
in, since we only touch PID1/2 in extremely early init code.
* Not all IVORs exist on all the processors, so introduce cpu_setup
functions for each variant to set up the proper IVORs that are either
unique or exist but have some variations between the processors.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
In the VIO bus code the wrappers for dma alloc_coherent and free_coherent
calls are rounding to IOMMU_PAGE_SIZE. Looking at the underlying
calls, the actual mapping is promoted to PAGE_SIZE. Changing the
rounding in these two functions fixes under-reporting of the entitlement
used by the system. Without this change, the system could run out of
entitlement before it believes it has, incurring mapping failures at the
firmware level.
Also in the VIO bus code, the wrapper for dma map_sg is not exiting in
an error path where it should. Rather than fall through to code for the
success case, this patch adds the return that is needed in the error path.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Conflicts:
arch/x86/include/asm/pda.h
We merge tip/core/percpu into tip/perfcounters/core because of a
semantic and contextual conflict: the former eliminates the PDA,
while the latter extends it with apic_perf_irqs field.
Resolve the conflict by moving the new field to the irq_cpustat
structure on 64-bit too.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arm, arm/mach-integrator and powerpc were missing
.data.percpu.page_aligned in their percpu output section definitions.
Add it.
Signed-off-by: Tejun Heo <tj@kernel.org>
The PAPR says that the property for specifying the number of SLBs should
be called "slb-size". We currently only look for "ibm,slb-size" because
this is what firmware actually presents.
This patch makes us look for the "slb-size" property as well and in
preference to the "ibm,slb-size". This should future-proof us if
firmware changes to match PAPR.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: New perf_counter features
A pinned counter group is one that the user wants to have on the CPU
whenever possible, i.e. whenever the associated task is running, for
a per-task group, or always for a per-cpu group. If the system
cannot satisfy that, it puts the group into an error state where
it is not scheduled any more and reads from it return EOF (i.e. 0
bytes read). The group can be released from error state and made
readable again using prctl(PR_TASK_PERF_COUNTERS_ENABLE). When we
have finer-grained enable/disable controls on counters we'll be able
to reset the error state on individual groups.
An exclusive group is one that the user wants to be the only group
using the CPU performance monitor hardware whenever it is on. The
counter group scheduler will not schedule an exclusive group if there
are already other groups on the CPU and will not schedule other groups
onto the CPU if there is an exclusive group scheduled (that statement
does not apply to groups containing only software counters, which can
always go on and which do not prevent an exclusive group from going on).
With an exclusive group, we will be able to let users program PMU
registers at a low level without the concern that those settings will
perturb other measurements.
Along the way this reorganizes things a little:
- is_software_counter() is moved to perf_counter.h.
- cpuctx->active_oncpu now records the number of hardware counters on
the CPU, i.e. it now excludes software counters. Nothing was reading
cpuctx->active_oncpu before, so this change is harmless.
- A new cpuctx->exclusive field records whether we currently have an
exclusive group on the CPU.
- counter_sched_out moves higher up in perf_counter.c and gets called
from __perf_counter_remove_from_context and __perf_counter_exit_task,
where we used to have essentially the same code.
- __perf_counter_sched_in now goes through the counter list twice, doing
the pinned counters in the first loop and the non-pinned counters in
the second loop, in order to give the pinned counters the best chance
to be scheduled in.
Note that only a group leader can be exclusive or pinned, and that
attribute applies to the whole group. This avoids some awkwardness in
some corner cases (e.g. where a group leader is closed and the other
group members get added to the context list). If we want to relax that
restriction later, we can, and it is easier to relax a restriction than
to apply a new one.
This doesn't yet handle the case where a pinned counter is inherited
and goes into error state in the child - the error state is not
propagated up to the parent when the child exits, and arguably it
should.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes sure that we call the platform-specific ppc_md.enable_pmcs
function on each CPU before we try to use the PMU on that CPU. If the
CPU goes off-line and then on-line, we need to do the enable_pmcs call
again, so we use the hw_perf_counter_setup hook to ensure that. It gets
called as each CPU comes online, but it isn't called on the CPU that is
coming up, so this adds the CPU number as an argument to it (there were
no non-empty instances of hw_perf_counter_setup before).
This also arranges to set the pmcregs_in_use field of the lppaca (data
structure shared with the hypervisor) on each CPU when we are using the
PMU and clear it when we are not. This allows the hypervisor to optimize
partition switches by not saving/restoring the PMU registers when we
aren't using the PMU.
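On pSeries this boils down to flipping a byte in the shared lppaca,
roughly (helper name illustrative):
    /* tell the hypervisor whether it must save/restore the PMU */
    static inline void ppc_set_pmu_inuse(int inuse)
    {
            get_lppaca()->pmcregs_in_use = inuse;
    }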
Signed-off-by: Paul Mackerras <paulus@samba.org>
We use doorbell interrupts for IPIs and thus we need to make sure we
aren't interrupted while processing the IPI.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Dave Liu <daveliu@freescale.com>
The PowerMac kernel occasionally fails to bring up the secondary CPUs
on SMP; the trigger factor seems to be fairly random and related to the
location of code and data.
This appears to be due to the initial loading of the TOC value by the
secondary processor which now happens before we clear HID4:RM_CI (Real
Mode Cache Invalidate). This bit should really be cleared before we do
any load or store other than fetching code.
This fix works based on the assumption that all SMP 64-bit PowerMacs
use variants of the 970, which fortunately is true, by explicitly
clearing that bit, adding an slbia for good measure as RM_CI mode is
known to create bogus ERAT entries.
I also removed some spurious debug output that was left enabled by
mistake while at it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The per_cpu__ prefix on DECLARE_PER_CPU'd variables is going away;
rename cache_dir to cache_dir_pcpu.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Convert arch/powerpc/ over to long long based u64:
-#ifdef __powerpc64__
-# include <asm-generic/int-l64.h>
-#else
-# include <asm-generic/int-ll64.h>
-#endif
+#include <asm-generic/int-ll64.h>
This will avoid recurring spurious warnings in core kernel code that
come up when people test on their own hardware (i.e. x86 in ~98% of
the cases). This is what x86 uses and it generally helps keep 64-bit
code 32-bit clean too.
[Adjusted to not impact user mode (from paulus) - sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Enforce that the crash kernel region never overlaps the current kernel,
as it will be written directly on kexec load.
Also, default to the previous KDUMP_KERNELBASE if the start is 0.
Other architectures (x86, ia64) state that specifying the start address
0 (or omitting it) will result in the kernel allocating it. Before the
relocatable patch in 2.6.28, powerpc would adjust any other start value
to the hardcoded KDUMP_KERNELBASE of 32M.
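The rule amounts to something like the following sketch (crashk_res,
_stext and _end are the usual kernel symbols; the exact check in the
patch may differ):

	if (crashk_res.start == 0)
		crashk_res.start = KDUMP_KERNELBASE;	/* default to 32M */

	/* The region is written directly on kexec load, so it must not
	 * overlap the running kernel image. */
	if (crashk_res.start <= __pa(_end) && crashk_res.end >= __pa(_stext))
		return;		/* refuse to set up the crash kernel region */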
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We are declaring the dummy section (used to work around a binutils
bug) as PT_NOTE, but we don't have enough bytes for it to be a valid
note header, and kexec userspace complains:
Warning: Elf Note name is not null terminated
Warning: append= option is not passed. Using the first kernel root partition
Warning: Elf Note name is not null terminated
Instead of using the arbitrary value 0xf177 (aka "fill"), declare a
no-name no-description note of type 0.
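In C terms, a minimal valid note is just three zero words; the actual
change lives in the linker script/assembly, so this is only an
illustration:

struct min_elf_note {
	unsigned int n_namesz;	/* 0: no name, so nothing to terminate */
	unsigned int n_descsz;	/* 0: no description */
	unsigned int n_type;	/* 0 */
} dummy_note = { 0, 0, 0 };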
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: cleanup, update to new cpumask API
Irq_desc.affinity and irq_desc.pending_mask are now cpumask_var_t's
so access to them should be using the new cpumask API.
Signed-off-by: Mike Travis <travis@sgi.com>
Impact: build fix
Ingo Molnar wrote:
> tip/arch/blackfin/kernel/irqchip.c: In function 'show_interrupts':
> tip/arch/blackfin/kernel/irqchip.c:85: error: 'struct kernel_stat' has no member named 'irqs'
> make[2]: *** [arch/blackfin/kernel/irqchip.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
>
So move the kstat_irqs array into the irq_desc struct.
(s390, m68k and sparc are not touched yet, because they don't support
genirq.)
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This adds the back-end for the PMU on the POWER6 processor.
Fortunately, the event selection hardware is somewhat simpler on
POWER6 than on other POWER family processors, so the constraints
fit into only 32 bits.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds the back-end for the PMU on the PPC970 family.
The PPC970 allows events from the ISU to be selected in two different
ways. Rather than use alternative event codes to express this, we
instead use a single encoding for ISU events and express the
resulting constraint (that you can't select events from all three
of FPU/IFU/VPU, ISU and IDU/STS at the same time, since they all come
in through only 2 multiplexers) using a NAND constraint field, and
work out which multiplexer is used for ISU events at compute_mmcr
time.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This provides the architecture-specific functions needed to access
PMU hardware on the 64-bit PowerPC processors. It has been designed
for the IBM POWER family (POWER 4/4+/5/5+/6 and PPC970) but will
hopefully also suit other 64-bit PowerPC machines (although probably
not Cell given how different it is in this area). This doesn't
include back-ends for any specific processors.
This implements a system which allows back-ends to express the
constraints that their hardware has on what events can be counted
simultaneously. The constraints are expressed as a 64-bit mask +
64-bit value for each event, and the encoding is capable of
expressing the constraints arising from having a set of multiplexers
feeding an event bus, with some events being available through
multiple multiplexer settings, such as we get on POWER4 and PPC970.
Furthermore, the back-end can supply alternative event codes for
each event, and the constraint checking code will try all possible
combinations of alternative event codes to try to find a combination
that will fit.
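The core of the mask + value idea can be illustrated as follows (a
simplification: the real checker also has to handle select-field
arithmetic and iterate over the alternative event codes):

/* Two events are compatible when they do not demand different values
 * in any constraint bit that both of them care about. */
static int events_compatible(u64 mask_a, u64 value_a,
			     u64 mask_b, u64 value_b)
{
	return ((value_a ^ value_b) & (mask_a & mask_b)) == 0;
}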
Signed-off-by: Paul Mackerras <paulus@samba.org>
Because 64-bit powerpc uses lazy (soft) interrupt disabling, it is
possible for a performance monitor exception to come in when the
kernel thinks interrupts are disabled (i.e. when they are
soft-disabled but hard-enabled). In such a situation the performance
monitor exception handler might have some processing to do (such as
process wakeups) which can't be done in what is effectively an NMI
handler.
This provides a way to defer that work until interrupts get enabled,
either in raw_local_irq_restore() or by returning from an interrupt
handler to code that had interrupts enabled. We have a per-processor
flag that indicates that there is work pending to do when interrupts
subsequently get re-enabled. This flag is checked in the interrupt
return path and in raw_local_irq_restore(), and if it is set,
perf_counter_do_pending() is called to do the pending work.
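The check in the interrupt-enable paths ends up looking roughly like
this (test/clear_perf_counter_pending() are assumed names for the
per-processor flag accessors):

	/* In raw_local_irq_restore() and the interrupt return path: */
	if (test_perf_counter_pending()) {
		clear_perf_counter_pending();
		perf_counter_do_pending();	/* e.g. process wakeups */
	}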
Signed-off-by: Paul Mackerras <paulus@samba.org>
The Freescale PowerPC-specific gianfar driver (gig-e) uses
cacheable_memzero for performance reasons. We need to export the
symbol to allow the driver to be built as a module.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
tce_entryp is a "u64 *" not an "unsigned long *".
[Split from a large patch -sfr]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
of_get_flat_dt_prop() returns a "void *", so we don't need to cast when
assigning its result to a pointer variable.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
X has been failing to start on my quad G5 powermac since commit
1fd0f52583 ("powerpc: Fix domain numbers
in /proc on 64-bit") went in. The reason is that the change allows X
to see the PCI-PCI bridge above the video card (previously it was
obscured by the fact that there were two "00" directories in
/proc/bus/pci), and the pciconfig_iobase system call on the bridge is
failing because of a hack that we have to return information about the
AGP bus when X asks about bus 0. This machine doesn't have an AGP bus
(it has PCI Express) and so the pciconfig_iobase call is returning -1,
which ultimately causes X to fail to start.
This fixes it by checking that we have an AGP bridge before
redirecting the pciconfig_iobase call to return information about the
AGP bus. With this, X starts successfully both on a quad G5 with
PCI Express and on an older dual G5 with AGP.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The current code for providing processor cache information in sysfs
has the following deficiencies:
- several complex functions that are hard to understand
- implicit recursion (cache_desc_release -> kobject_put -> cache_desc_release)
- explicit recursion (create_cache_index_info)
- use of two per-cpu arrays when one would suffice
- duplication of work on systems where CPUs share cache
Also, when I looked at implementing support for a shared_cpu_map
attribute, it was pretty much impossible to handle hotplug without
checking every single online CPU's cache_desc list and fixing things
up... not that this is a hot path, but it would have introduced
O(n^2)-ish behavior during boot. Addressing this involved rethinking
the core data structures used, which didn't lend itself to an
incremental approach.
This implementation maintains a "forest" (potentially more than one
tree) of cache objects which reflects the system's cache topology.
Cache objects are instantiated as needed as CPUs come online. A
per-cpu array is used mainly for sysfs-related bookkeeping; the
objects in the array just point to the appropriate points in the
forest.
This maintains compatibility with the existing code and includes some
enhancements:
- Implement the shared_cpu_map attribute, which is essential for
enabling userspace to discover the system's overall cache topology.
- Use cache-block-size properties if cache-line-size is not available.
I chose to place this implementation in a new file since it would have
roughly doubled the size of sysfs.c, which is already kind of messy.
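The shape of the data structure is roughly the following (the field
names are illustrative, not necessarily those used in the new file):

struct cache {
	struct device_node *ofnode;	/* OF node describing this cache */
	int level;			/* 1 = L1, 2 = L2, ... */
	struct cache *next_local;	/* next cache level towards memory */
	struct list_head list;		/* links all caches in the system */
};

/* Per-cpu bookkeeping just points into the forest. */
static DEFINE_PER_CPU(struct cache *, cpu_cache);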
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There's a problem on some embedded platforms when we re-assign
everything on PCI, such as 44x. The generic code tries to avoid
assigning devices to addresses overlapping the low legacy
addresses such as VGA hard decoded areas using constants that
are unfortunately no good for us, as they don't take into account
the address translation we do to access PCI busses.
Thus we end up allocating things like IO BARs to 0, which is
technically legal, but will shadow hard decoded ports for use
by things like VGA cards.
This works around it by attempting to reserve legacy regions
before we try to assign addresses.
NOTE: This may have nasty side effects in cases I haven't tested
yet:
- We try to use FW mappings (ie. powermac) and the FW has allocated
a conflicting address over those legacy regions. This will typically
happen. I would expect the new code to just fail with an informative
message without harm but I haven't had a chance to test that scenario
yet.
- A device with fixed BARs overlapping those legacy addresses such
as an IDE controller in legacy mode is in the system. I don't know
for sure yet what will happen there, I have to test :-)
Ideally, we should change PCIBIOS_MIN_IO/MIN_MEM across the board
to take a bus pointer so they can provide appropriate per-bus translated
values to the generic code but that's a more invasive patch. I will
do that in the future, but in the meantime, this fixes the problem
locally.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (98 commits)
PCI PM: Put PM callbacks in the order of execution
PCI PM: Run default PM callbacks for all devices using new framework
PCI PM: Register power state of devices during initialization
PCI PM: Call pci_fixup_device from legacy routines
PCI PM: Rearrange code in pci-driver.c
PCI PM: Avoid touching devices behind bridges in unknown state
PCI PM: Move pci_has_legacy_pm_support
PCI PM: Power-manage devices without drivers during suspend-resume
PCI PM: Add suspend counterpart of pci_reenable_device
PCI PM: Fix poweroff and restore callbacks
PCI: Use msleep instead of cpu_relax during ASPM link retraining
PCI: PCIe portdrv: Add kerneldoc comments to remining core funtions
PCI: PCIe portdrv: Rearrange code so that related things are together
PCI: PCIe portdrv: Fix suspend and resume of PCI Express port services
PCI: PCIe portdrv: Add kerneldoc comments to some core functions
x86/PCI: Do not use interrupt links for devices using MSI-X
net: sfc: Use pci_clear_master() to disable bus mastering
PCI: Add pci_clear_master() as opposite of pci_set_master()
PCI hotplug: remove redundant test in cpq hotplug
PCI: pciehp: cleanup register and field definitions
...
This is a global variable defined in fsl_booke_mmu.c with a value that gets
initialized in assembly code in head_fsl_booke.S.
It's never used.
If some code ever does want to know the number of entries in TLB1, then
"numcams = mfspr(SPRN_TLB1CFG) & 0xfff", is a whole lot simpler than a
global initialized during kernel boot from assembly.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Some assembly code in head_fsl_booke.S hard-coded the size of struct tlbcam
to 20 when it indexed the TLBCAM table. Anyone changing the size of struct
tlbcam would not know to expect that.
The kernel already has a system to get the size of C structures into
assembly language files, asm-offsets, so let's use it.
The definition of the struct gets moved to a header, so that asm-offsets.c
can include it.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
trivial: chack -> check typo fix in main Makefile
trivial: Add a space (and a comma) to a printk in 8250 driver
trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
trivial: Fix misspelling of "firmware" in powerpc Makefile
trivial: Fix misspelling of "firmware" in usb.c
trivial: Fix misspelling of "firmware" in qla1280.c
trivial: Fix misspelling of "firmware" in a100u2w.c
trivial: Fix misspelling of "firmware" in megaraid.c
trivial: Fix misspelling of "firmware" in ql4_mbx.c
trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
trivial: Fix misspelling of "firmware" in ipw2100.c
trivial: Fix misspelling of "firmware" in atmel.c
trivial: Fix misspelled firmware in Kconfig
trivial: fix an -> a typos in documentation and comments
trivial: fix then -> than typos in comments and documentation
trivial: update Jesper Juhl CREDITS entry with new email
trivial: fix singal -> signal typo
trivial: Fix incorrect use of "loose" in event.c
trivial: printk: fix indentation of new_text_line declaration
trivial: rtc-stk17ta8: fix sparse warning
...
Use the generic pci_swizzle_interrupt_pin() instead of arch-specific code.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add kprobe_insn_mutex for protecting kprobe_insn_pages hlist, and remove
kprobe_mutex from architecture dependent code.
This allows us to call arch_remove_kprobe() (and free_insn_slot) while
holding kprobe_mutex.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
x86: export vector_used_by_percpu_irq
x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
sched: nominate preferred wakeup cpu, fix
x86: fix lguest used_vectors breakage, -v2
x86: fix warning in arch/x86/kernel/io_apic.c
sched: fix warning in kernel/sched.c
sched: move test_sd_parent() to an SMP section of sched.h
sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
sched: activate active load balancing in new idle cpus
sched: bias task wakeups to preferred semi-idle packages
sched: nominate preferred wakeup cpu
sched: favour lower logical cpu number for sched_mc balance
sched: framework for sched_mc/smt_power_savings=N
sched: convert BALANCE_FOR_xx_POWER to inline functions
x86: use possible_cpus=NUM to extend the possible cpus allowed
x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
x86: update io_apic.c to the new cpumask code
x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
x86: xen: use smp_call_function_many()
x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
...
Fixed up trivial conflict in kernel/time/tick-sched.c manually
* 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (140 commits)
KVM: MMU: handle large host sptes on invlpg/resync
KVM: Add locking to virtual i8259 interrupt controller
KVM: MMU: Don't treat a global pte as such if cr4.pge is cleared
MAINTAINERS: Maintainership changes for kvm/ia64
KVM: ia64: Fix kvm_arch_vcpu_ioctl_[gs]et_regs()
KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI
KVM: VMX: Fix pending NMI-vs.-IRQ race for user space irqchip
KVM: fix handling of ACK from shared guest IRQ
KVM: MMU: check for present pdptr shadow page in walk_shadow
KVM: Consolidate userspace memory capability reporting into common code
KVM: Advertise the bug in memory region destruction as fixed
KVM: use cpumask_var_t for cpus_hardware_enabled
KVM: use modern cpumask primitives, no cpumask_t on stack
KVM: Extract core of kvm_flush_remote_tlbs/kvm_reload_remote_mmus
KVM: set owner of cpu and vm file operations
anon_inodes: use fops->owner for module refcount
x86: KVM guest: kvm_get_tsc_khz: return khz, not lpj
KVM: MMU: prepopulate the shadow on invlpg
KVM: MMU: skip global pgtables on sync due to cr3 switch
KVM: MMU: collapse remote TLB flushes on root sync
...
Existing KVM statistics are either just counters (kvm_stat) reported for
KVM generally or trace-based approaches like kvm_trace.
For KVM on powerpc we had the need to track the timings of the different exit
types. While this could be achieved parsing data created with a kvm_trace
extension this adds too much overhead (at least on embedded PowerPC) slowing
down the workloads we wanted to measure.
Therefore this patch adds an in-kernel exit timing statistic to the
powerpc kvm code. This statistic is available per vm and vcpu under
the kvm debugfs directory. As this statistic has low, but still some,
overhead, it can be enabled via a .config entry and should be off by
default.
Since this patch touched all powerpc kvm_stat code anyway this code is now
merged and simplified together with the exit timing statistic code (still
working with exit timing disabled in .config).
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Formerly, we used to maintain a per-vcpu shadow TLB and on every entry to the
guest would load this array into the hardware TLB. This consumed 1280 bytes of
memory (64 entries of 16 bytes plus a struct page pointer each), and also
required some assembly to loop over the array on every entry.
Instead of saving a copy in memory, we can just store shadow mappings directly
into the hardware TLB, accepting that the host kernel will clobber these as
part of the normal 440 TLB round robin. When we do that we need less than half
the memory, and we have decreased the exit handling time for all guest exits,
at the cost of an increased number of TLB misses, because the host overwrites some
guest entries.
These savings will be increased on processors with larger TLBs or which
implement intelligent flush instructions like tlbivax (which will avoid the
need to walk arrays in software).
In addition to that and to the code simplification, we have a greater chance of
leaving other host userspace mappings in the TLB, instead of forcing all
subsequent tasks to re-fault all their mappings.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch doesn't yet move all 44x-specific data into the new structure, but
is the first step down that path. In the future we may also want to create a
struct kvm_vcpu_booke.
Based on patch from Liu Yu <yu.liu@freescale.com>.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This will ease ports to other cores.
Also remove unused "struct kvm_tlb" while we're at it.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The cpu time spent by the idle process actually doing something is
currently accounted as idle time. This is plain wrong, the architectures
that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
time spent doing nothing and the time spent by idle doing work. The first
is accounted with account_idle_time and the second with account_system_time.
The architectures that use the account_xxx_time interface directly and not
the account_xxx_ticks interface now need to do the check for the idle
process in their arch code. In particular to improve the system vs true
idle time accounting the arch code needs to measure the true idle time
instead of just testing for the idle process.
To improve the tick based accounting as well we would need an architecture
primitive that can tell us if the pt_regs of the interrupted context
points to the magic instruction that halts the cpu.
In addition, idle time is no longer added to the stime of the idle process.
This field now contains the system time of the idle process as it should
be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
every tick that occurs while idle is running will be accounted as idle
time.
This patch contains the necessary common code changes to be able to
distinguish idle system time and true idle time. The architectures with
support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
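For an architecture using the account_xxx_time interface directly,
the new obligation looks roughly like this (idle_task() is the generic
scheduler accessor; the accounting signatures are assumed from this
series):

	/* Only genuinely idle time may be accounted as idle; work done
	 * by the idle task is system time. */
	if (tsk == idle_task(smp_processor_id()))
		account_idle_time(cputime);
	else
		account_system_time(tsk, hardirq_offset, cputime, cputime_scaled);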
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The utimescaled / stimescaled fields in the task structure and the
global cpustat should be set on all architectures. On s390 the calls
to account_user_time_scaled and account_system_time_scaled never have
been added. In addition system time that is accounted as guest time
to the user time of a process is accounted to the scaled system time
instead of the scaled user time.
To fix the bugs and to prevent future forgetfulness this patch merges
account_system_time_scaled into account_system_time and
account_user_time_scaled into account_user_time.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Michael Neuling <mikey@neuling.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits)
powerpc/44x: Support 16K/64K base page sizes on 44x
powerpc: Force memory size to be a multiple of PAGE_SIZE
powerpc/32: Wire up the trampoline code for kdump
powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
powerpc/32: Setup OF properties for kdump
powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
powerpc: Prepare xmon_save_regs for use with kdump
powerpc: Remove default kexec/crash_kernel ops assignments
powerpc: Make default kexec/crash_kernel ops implicit
powerpc: Setup OF properties for ppc32 kexec
powerpc/pseries: Fix cpu hotplug
powerpc: Fix KVM build on ppc440
powerpc/cell: add QPACE as a separate Cell platform
powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
powerpc/mpc5200: fix error paths in PSC UART probe function
powerpc/mpc5200: add rts/cts handling in PSC UART driver
powerpc/mpc5200: Make PSC UART driver update serial errors counters
powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver
powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
...
Fix trivial conflict in drivers/char/Makefile as per Paul's directions
This adds support for 16k and 64k page sizes on PowerPC 44x processors.
The PGDIR table is much smaller than a page when using 16k or 64k
pages (512 and 32 bytes respectively) so we allocate the PGDIR with
kzalloc() instead of __get_free_pages().
One PTE table covers rather a large memory area when using 16k or 64k
pages (32MB or 512MB respectively), so we can easily put FIXMAP and
PKMAP in the area covered by one PTE table.
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Vladimir Panfilov <pvr@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Ensure that total memory size is page-aligned, because otherwise
mark_bootmem() gets upset.
This error case was triggered by using 64 KiB pages in the kernel
while arch/powerpc/boot/4xx.c arbitrarily reduced the amount of memory
by 4096 (to work around a chip bug that affects the last 256 bytes of
physical memory).
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
arch_setup_additional_pages currently gets two arguments, the binary
format description and an indication of whether the process uses an
executable stack or not. The second argument is not used by anybody;
it could be removed without replacement.
What actually does make sense is to pass an indication of whether the
process uses the elf interpreter. The glibc code will not use anything
from the vdso if the process does not use the dynamic linker, so for
statically linked binaries the architecture backend can choose not
to map the vdso.
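The resulting interface, as a sketch (map_vdso() stands in for
whatever the architecture backend actually does):

int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
{
	if (!uses_interp)
		return 0;	/* static binary: glibc won't touch the vdso */
	return map_vdso(bprm);	/* hypothetical arch-specific helper */
}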
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Wire up the trampoline code for ppc32 to relay exceptions from the
vectors at address 0 to vectors at address 32MB, and modify Kconfig
to enable Kdump support for all classic powerpcs.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add the ability for a classic ppc kernel to be loaded at an address
of 32MB. This is done by fixing a few places that assume we are loaded
at address 0, and by changing several uses of KERNELBASE to use
PAGE_OFFSET, instead.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Refactor the setting of kdump OF properties, moving the common code
from machine_kexec_64.c to machine_kexec.c where it can be used on
both ppc64 and ppc32. This will be needed for kdump to work on ppc32
platforms.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This replaces the dummy crash_setup_regs function with full-fledged
crash_setup_regs implementation. On PPC32 we simply use the new
ppc_save_regs function to dump the registers.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Today the arch/powerpc/xmon/setjmp.S file contains only the
xmon_save_regs function. We want to use it for kdump purposes, so
let's move the file into arch/powerpc/kernel/ and give the function a
more generic name (ppc_save_regs).
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This removes the need for each platform to specify default kexec and
crash kernel ops, thus effectively adds a working kexec support for
most 6xx/7xx/7xxx-based boards.
Platforms that can't cope with default ops will explode in some weird
way (a hang or reboot is most likely), which means that the board's
kexec support should be fixed or blacklisted via a dummy _prepare
callback returning -ENOSYS.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Refactor the setting of kexec OF properties, moving the common code
from machine_kexec_64.c to machine_kexec.c where it can be used on
both ppc64 and ppc32. This is needed for kexec to work on ppc32
platforms.
Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
After discussing with chip designers, it appears that it's not
necessary to set G everywhere on 440 cores. The various core
errata related to prefetch should be sorted out by firmware by
disabling icache prefetching in CCR0. We add the workaround to
the kernel anyway, just in case old firmware doesn't do it.
This is valid for -all- 4xx core variants. Later ones hard wire
the absence of prefetch, but it does no harm to clear the bits
in CCR0 (they should already be cleared anyway).
We still leave G=1 on the linear mapping for now, we need to
stop over-mapping RAM to be able to remove it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in
in the hash code based on some CPU feature bit. We also manipulate
_PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.
This changes the logic so that instead, the PTE now contains
_PAGE_COHERENT for all normal RAM pages that have I = 0 on platforms
that need it. The hash code clears it if the feature bit is not set.
It also adds some clean accessors to setup various valid combinations
of access flags and change various bits of code to use them instead.
This should help having the PTE actually containing the bit
combinations that we really want.
I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
set it explicitly from the TLB miss. I will ultimately remove it
completely as it appears that it might not be needed after all
but in the meantime, having it in the TLB miss makes things a
lot easier.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes the MMU context code used for CPUs with no hash table
(except 603) dynamically allocate the various maps used to track
the state of contexts.
Only the main free map and CPU 0 stale map are allocated at boot
time. Other CPU maps are allocated when those CPUs are brought up
and freed if they are unplugged.
This also moves the initialization of the MMU context management
slightly later during the boot process, which should be fine as
it's really only needed when userland is first started anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, the various forms of low level TLB invalidations are all
implemented in misc_32.S for 32-bit processors, in a fairly scary
mess of #ifdef's and with interesting duplication such as a whole
bunch of code for FSL _tlbie and _tlbia which are no longer used.
This moves things around such that _tlbie is now defined in
hash_low_32.S and is only used by the 32-bit hash code, and all
nohash CPUs use the various _tlbil_* forms that are now moved to
a new file, tlb_nohash_low.S.
I moved all the definitions for that stuff out of
include/asm/tlbflush.h, as they are really internal mm stuff, and into
mm/mmu_decl.h.
The code should have no functional changes. I kept some variants
inline for trivial forms on things like 40x and 8xx.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This commit moves the whole no-hash TLB handling out of line into a
new tlb_nohash.c file, and implements some basic SMP support using
IPIs and/or broadcast tlbivax instructions.
Note that I'm using local invalidations for D->I cache coherency.
At worst, if another processor is trying to execute the same and
has the old entry in its TLB, it will just take a fault and re-do
the TLB flush locally (it won't re-do the cache flush in any case).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We're soon running out of CPU features and I need to add some new
ones for various MMU related bits, so this patch separates the MMU
features from the CPU features. I moved over the 32-bit MMU related
ones, added base features for MMU type families, but didn't move
over any 64-bit only feature yet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This splits the mmu_context handling between 32-bit hash based
processors, 64-bit hash based processors and everybody else. This is
preliminary work for adding SMP support for BookE processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds support for the "extended" DCR addressing via the indirect
mfdcrx/mtdcrx instructions supported by some 4xx cores (440H6 and
later).
I enabled the feature for now only on AMCC 460 chips.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Using the common code means that more complete cache information will
be provided in sysfs on platforms that don't use the l2-cache property
convention.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The smp code uses cache information to populate cpu_core_map; change
it to use common code for cache lookup.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We have more than one piece of code that looks up cache nodes manually
using the "l2-cache" property. Add a common helper routine which does
this and handles ePAPR's "next-level-cache" property as well as
powermac.
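Callers can then walk from a CPU node towards memory along these
lines (the helper name follows the description above; the loop itself
is illustrative):

	struct device_node *node = of_get_cpu_node(cpu, NULL);

	while (node) {
		struct device_node *next = of_find_next_cache_node(node);
		/* ... examine this cache level ... */
		of_node_put(node);
		node = next;	/* NULL once there is no next level */
	}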
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since "Factor out cpu joining/unjoining the GIQ"
(b4963255ad) the WARN_ON in
xics_set_cpu_giq() is being triggered during boot on JS20 because the
GIQ indicator is not available on that platform. While the warning is
harmless and the system runs normally, it's nicer to check for the
existence of the indicator before trying to manipulate it.
Implement rtas_indicator_present(), which searches the
/rtas/rtas-indicators property for the given indicator token, and use
this function in xics_set_cpu_giq().
Also use a WARN statement in xics_set_cpu_giq to get better
information on failure.
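The guarded version looks roughly like this (GLOBAL_INTERRUPT_QUEUE
is the existing xics token; the second argument of
rtas_indicator_present() and the giq_index value are assumptions):

	if (!rtas_indicator_present(GLOBAL_INTERRUPT_QUEUE, NULL))
		return;	/* no GIQ indicator on this platform (e.g. JS20) */

	rc = rtas_set_indicator_fast(GLOBAL_INTERRUPT_QUEUE, giq_index, join);
	WARN(rc, "GIQ indicator update failed: %d\n", rc);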
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
smp_hw_index isn't used on 64-bit, so move it from smp.c to
setup_32.c.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The `have_of' variable is a relic from the arch/ppc days; it isn't
useful nowadays.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
An example calling sequence which we did see:
copy_user_highpage -> kmap_atomic -> flush_tlb_page -> _tlbil_va
We got interrupted after setting up the MAS registers before the
tlbwe and the interrupt handler that caused the interrupt also did
a kmap_atomic (ide code) and thus on returning from the interrupt
the MAS registers no longer contained the proper values.
Since we don't save/restore MAS registers for normal interrupts we
need to disable interrupts in _tlbil_va to ensure atomicity.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Impact: change calling convention of existing clock_event APIs
struct clock_event_timer's cpumask field gets changed to take pointer,
as does the ->broadcast function.
Another single-patch change. For safety, we BUG_ON() in
clockevents_register_device() if it's not set.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
Impact: change existing irq_chip API
Not much point with gentle transition here: the struct irq_chip's
setaffinity method signature needs to change.
Fortunately, not widely used code, but hits a few architectures.
Note: In irq_select_affinity() I save a temporary in by mangling
irq_desc[irq].affinity directly. Ingo, does this break anything?
(Folded in fix from KOSAKI Motohiro)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: jeremy@xensource.com
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Impact: cleanup
Each SMP arch defines these themselves. Move them to a central
location.
Twists:
1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
CONFIG_INIT_ALL_POSSIBLE for this rather than break them.
2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
Those archs simply have phys_cpu_present_map replaced everywhere.
3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
so I just manipulate them both in sync.
4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
declarations.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Tony Luck <tony.luck@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Mike Travis <travis@sgi.com>
Cc: ink@jurassic.park.msu.ru
Cc: rmk@arm.linux.org.uk
Cc: starvik@axis.com
Cc: tony.luck@intel.com
Cc: takata@linux-m32r.org
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: paulus@samba.org
Cc: schwidefsky@de.ibm.com
Cc: lethal@linux-sh.org
Cc: wli@holomorphy.com
Cc: davem@davemloft.net
Cc: jdike@addtoit.com
Cc: mingo@redhat.com
The 440x5 core in the Virtex5 uses the 440A type machine check
(ie, they have MCSRR0/MCSRR1). They thus need to call the
appropriate fixup function to hook the right variant of the
exception.
Without this, all machine checks become fatal due to loss
of context when entering the exception handler.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Added an 85xx-specific smp_ops structure. We use ePAPR-style boot release
and the MPIC for IPIs at this point.
Additionally added routines for secondary cpu entry and initialization.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The initial TLB mapping for the kernel boot didn't set the memory coherent
attribute, MAS2[M], in SMP mode.
If this code supported booting a secondary processor, which it doesn't yet,
but if it did, then when a secondary processor boots, it would probably signal
the primary processor by setting a variable called something like
__secondary_hold_acknowledge. However, due to the lack of the M bit, the
primary processor would not snoop the transaction (even if a transaction were
broadcast). If the primary CPU's L1 D-cache had a copy, it would not
be flushed and the CPU would never see the ack, which would have
resulted in the primary CPU spinning for a long time, perhaps a full
second before giving up, waiting for an ack from the secondary CPU
that it could never see because of the stale cache.
The value of MAS2 for the boot page TLB1 entry is a compile time constant,
so there is no need to calculate it in powerpc assembly language.
Also, from the MPC8572 manual section 6.12.5.3, "Bits that represent
offsets within a page are ignored and should be cleared." Existing code
didn't clear them, this code does.
The same applies when the page of KERNELBASE is found; we don't need to use asm to
mask the lower 12 bits off.
In the code that computes the address to rfi from, don't hard code the
offset to 24 bytes, but have the assembler figure that out for us.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This patch adds the handlers for SPE/EFP exceptions.
The code is used to emulate floating-point arithmetic when MSR[SPE]
is enabled and an EFP data interrupt or EFP round interrupt is
received. This patch has no conflict with or dependence on FP
math-emu. The code has been tested with TestFloat.
The code doesn't yet support emulation of SPE/EFP instructions (it
won't be called on a program interrupt), but that could easily be
added.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
ibmebus_free_irq() frees the IRQ but does not remove its mapping, which
results in stale entries in the map.
This fixes it by adding a call to irq_dispose_mapping() in
ibmebus_free_irq().
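The fix is essentially one added line; a sketch of the resulting
function:

void ibmebus_free_irq(u32 ist, void *dev_id)
{
	unsigned int irq = irq_find_mapping(NULL, ist);

	free_irq(irq, dev_id);
	irq_dispose_mapping(irq);	/* drop the now-stale map entry */
}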
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As noted by Akinobu Mita in commit b1fceac2 ("x86: remove unnecessary
memset and NULL check after alloc_bootmem()"), alloc_bootmem and
related functions never return NULL and always return a zeroed region
of memory. Thus a NULL test or memset after calls to these functions
is unnecessary.
This was fixed using the following semantic patch.
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
expression E;
statement S;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\|alloc_bootmem_node\|alloc_bootmem_low_pages_node\|alloc_bootmem_pages_node\)(...)
... when != E
(
- BUG_ON (E == NULL);
|
- if (E == NULL) S
)
@@
expression E,E1;
@@
E = \(alloc_bootmem\|alloc_bootmem_low\|alloc_bootmem_pages\|alloc_bootmem_low_pages\|alloc_bootmem_node\|alloc_bootmem_low_pages_node\|alloc_bootmem_pages_node\)(...)
... when != E
- memset(E,0,E1);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to swap these out once we start using swiotlb, so add
them to dma_ops. Create CONFIG_PPC_NEED_DMA_SYNC_OPS Kconfig
option; this is currently enabled automatically if we're
CONFIG_NOT_COHERENT_CACHE. In the future, this will also
be enabled for builds that need swiotlb. If PPC_NEED_DMA_SYNC_OPS
is not defined, the dma_sync_*_for_* ops compile to nothing.
Otherwise, they access the dma_ops pointers for the sync ops.
This patch also changes dma_sync_single_range_* to actually
sync the range - previously it was using a generous
dma_sync_single. dma_sync_single_* is now implemented
as a dma_sync_single_range with an offset of 0.
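A sketch of the resulting wrappers (the member names on
dma_mapping_ops are assumed to mirror the generic sync hooks):

static inline void dma_sync_single_for_cpu(struct device *dev,
		dma_addr_t handle, size_t size,
		enum dma_data_direction dir)
{
#ifdef CONFIG_PPC_NEED_DMA_SYNC_OPS
	struct dma_mapping_ops *ops = get_dma_ops(dev);

	/* dma_sync_single_* is now a range sync with offset 0 */
	ops->sync_single_range_for_cpu(dev, handle, 0, size, dir);
#endif
	/* compiles to nothing if PPC_NEED_DMA_SYNC_OPS is not set */
}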
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On my screen, when something crashes, I only have space for maybe 16
functions of the stack trace before the information above it scrolls
off the screen. It's easy to hack the kernel to print out only that
much, but it's harder to remember to do it. This introduces a config
option for it so that I can keep the setting in my config.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On PowerPC 4xx or other non cache-coherent platforms, we lost the
appropriate cache flushing in dma_map_sg() when merging the 32 and
64-bit DMA code (commit 4fc665b88a,
"powerpc: Merge 32 and 64-bit dma code"). This restores it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
attr_smt_snooze_delay is only defined for CONFIG_PPC64, so protect the
attribute removal with the same condition. This fixes this build error
on 32-bit SMP configurations:
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c: In function ‘unregister_cpu_online’:
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: ‘attr_smt_snooze_delay’ undeclared (first use in this function)
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: (Each undeclared identifier is reported only once
/data/home/miltonm/next.git/arch/powerpc/kernel/sysfs.c:722: error: for each function it appears in.)
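The shape of the fix is simply to guard the removal the same way the
attribute definition is guarded (sketch):

#ifdef CONFIG_PPC64
	sysdev_remove_file(s, &attr_smt_snooze_delay);
#endif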
Signed-off-by: Paul Mackerras <paulus@samba.org>
It turns out that on Cell, on a kernel with CONFIG_VIRT_CPU_ACCOUNTING
= y, if a program sets the SO (summary overflow) bit in the XER and
then does a system call, the SO bit in CR0 will be set on return
regardless of whether the system call detected an error. Since CR0.SO
is used as the error indication from the system call, this means that
all system calls appear to fail.
The reason is that the workaround for the timebase bug on Cell uses a
compare instruction. With CONFIG_VIRT_CPU_ACCOUNTING = y, the
ACCOUNT_CPU_USER_ENTRY macro reads the timebase, so we end up doing a
compare instruction, which copies XER.SO to CR0.SO. Since we were
doing this in the system call entry patch after clearing CR0.SO but
before saving the CR, this meant that the saved CR image had CR0.SO
set if XER.SO was set on entry.
This fixes it by moving the clearing of CR0.SO to after the
ACCOUNT_CPU_USER_ENTRY call in the system call entry path.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, some PCIe devices on POWER6 machines do not get interrupts
assigned correctly. The problem is that OF doesn't create an
"interrupt" property for them. The fix is for of_irq_map_pci to fall
back to using the value in the PCI interrupt-pin register in config
space, as we do when there is no OF device-tree node for the device.
I have verified that this works fine with a pair of Squib-E SAS
adapter on a P6-570.
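The fallback amounts to reading the pin register from config space,
roughly:

	u8 pin;

	/* No "interrupt" property in the device tree: fall back to the
	 * PCI interrupt-pin register, as for devices without OF nodes. */
	if (pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin) || pin == 0)
		return -ENODEV;	/* device has no interrupt pin */
	/* ... then map the pin via the standard swizzling rules ... */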
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Impact: fix for PowerPC 32 code
There were some early init code that was not safe for static
ftrace to boot on my PowerBook. This code must only use relative
addressing, and static mcount performs a compare of the
ftrace_trace_function pointer, and gets that with an absolute address.
In the early init boot up code, this will cause a fault.
This patch removes tracing from the files containing the offending
functions.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up
Paul Mackerras pointed out that the code to determine if the branch
can reach the destination is incorrect. Michael Ellerman suggested
to pull out the code from create_branch and use that.
Simply using create_branch is probably the best.
Reported-by: Michael Ellerman <michael@ellerman.id.au>
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to PowerPC code modification
After modifying code it is essential to flush the icache. This patch
adds the missing flush.
Reported-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up and robustness addition
This patch addresses the comments made by Paul Mackerras.
It removes the type casting between unsigned int and unsigned char
pointers, and replaces them with a use of all unsigned int.
Verification that the jump is indeed made to a trampoline has also
been added.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: quicken mcount calls that are not replaced by dyn ftrace
Dynamic ftrace no longer does on the fly recording of mcount locations.
The mcount locations are now found at compile time. The mcount
function no longer needs to store registers and call a stub function.
It can now just simply return.
Since there are some functions that do not get converted to a nop
(.init sections and other code that may disappear), this patch should
help speed up that code.
Also, the stub for mcount on PowerPC 32 cannot be a simple branch to
the link register like it is on PowerPC 64. According to the ABI
specification:
"The _mcount routine is required to restore the link register from
the stack so that the profiling code can be inserted transparently,
whether or not the profiled function saves the link register itself."
This means that we must restore the link register that was used
to make the call to mcount. The minimal mcount function for PPC32
ends up being:
mcount:
mflr r0
mtctr r0
lwz r0, 4(r1)
mtlr r0
bctr
Where we move the link register used to call mcount into the
ctr register, and then restore the link register from the stack.
Then we use the ctr register to jump back to the mcount caller.
The r0 register is free for us to use.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add ability to trace modules on 32 bit PowerPC
This patch performs the necessary trampoline calls to handle
modules with dynamic ftrace on 32 bit PowerPC.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: Allow 64 bit PowerPC to trace modules with dynamic ftrace
This adds code to handle the PPC64 module trampolines, and allows for
PPC64 to use dynamic ftrace.
Thanks to Paul Mackerras for these updates:
- fix the mod and rec->arch.mod NULL checks.
- fix to is_bl_op compare.
Thanks to Milton Miller for:
- finding the nasty race with using two nops, and recommending
instead that I use a branch 8 forward.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: use cleaner probe_kernel API over assembly
Using probe_kernel_read/write interface is a much cleaner approach
than the current assembly version.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: update to PowerPC ftrace arch API
This patch converts PowerPC to use the new dynamic ftrace arch API.
Thanks to Paul Mackerras for pointing out the mistakes of my original
test_24bit_addr function.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: fix for irq off latency tracer
When idle is called, interrupts are disabled, but the idle function
will still wake up on an interrupt. The problem is that the interrupt
disabled latency tracer will take this call to idle as a latency.
This patch disables the latency tracing when going into idle.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
With the new generic smp call function helpers, I noticed the code in
smp_message_recv was a single function call in many cases. While
getting the message number from the ipi data is easy, we can reduce
the path length by a function call and a data-dependent switch by
registering separate IPI actions for these simple calls.
Originally I left the ipi action array exposed, but then I realized the
registration code should be common too.
The three users each had their own name array, so I made a fourth
to convert all users to use a common one.
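Registration for the simple messages then reduces to something like
the following (smp_request_message_ipi() is the common helper; the
ipi_virqs array and the exact signature are assumptions):

	int msg;

	for (msg = 0; msg < PPC_MSG_DEBUGGER_BREAK; msg++)
		smp_request_message_ipi(ipi_virqs[msg], msg);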
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Those cores use the 440A type machine check (ie, they have
MCSRR0/MCSRR1). They thus need to call the appropriate fixup
function to hook the right variant of the exception.
Without this, all machine checks become fatal due to loss
of context when entering the exception handler.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The new context may not be 16-byte aligned, so the real address of the
mcontext structure should be read from the uc_regs pointer instead of
directly using the (unaligned) uc_mcontext field.
Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since we started using the generic timekeeping code, we haven't had a
powerpc-specific version of do_gettimeofday, and hence there is now
nothing that reads the do_gtod variable in arch/powerpc/kernel/time.c.
This therefore removes it and the code that sets it.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently the clock_gettime implementation in the VDSO produces a
result with microsecond resolution for the cases that are handled
without a system call, i.e. CLOCK_REALTIME and CLOCK_MONOTONIC. The
nanoseconds field of the result is obtained by computing a
microseconds value and multiplying by 1000.
This changes the code in the VDSO to do the computation for
clock_gettime with nanosecond resolution. That means that the
resolution of the result will ultimately depend on the timebase
frequency.
Because the timestamp in the VDSO datapage (stamp_xsec, the real time
corresponding to the timebase count in tb_orig_stamp) is in units of
2^-20 seconds, it doesn't have sufficient resolution for computing a
result with nanosecond resolution. Therefore this adds a copy of
xtime to the VDSO datapage and updates it in update_gtod() along with
the other time-related fields.
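A C model of the new computation (the real code is vdso assembler;
apart from tb_orig_stamp and the xtime copy, the datapage field names
here are assumptions, and the real code uses a wider intermediate for
the multiply):

static u64 vdso_monotonic_ns(const struct vdso_data *vd)
{
	u64 tb_delta = get_tb() - vd->tb_orig_stamp;

	/* Scale timebase ticks straight to nanoseconds instead of going
	 * through microseconds and multiplying by 1000. */
	return vd->xtime_copy_nsec +
	       ((tb_delta * vd->tb_to_ns_scale) >> vd->tb_to_ns_shift);
}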
Signed-off-by: Paul Mackerras <paulus@samba.org>
This does a few cosmetic cleanups, moving a couple of things around
but without actually changing what the code does.
(There is a minor change in ordering of operations in
pcibios_setup_bus_devices but it should have no impact).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The pseries PCI hotplug code has a number of issues, ranging from
incorrect resource setup to crashes, depending on what is added,
when, whether it contains a bridge, etc etc....
This fixes a whole bunch of these, while actually simplifying the code
a bit, using more generic code in the process and factoring out common
code between adding of a PHB, a slot or a device.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
To properly fix PCI hotplug, it's useful to be able to make the fixup
passes on all devices whether they were just hot plugged or already
there.
However, pcibios_allocate_bus_resources() wouldn't cope well with
being called twice for a given bus. This makes it ignore resources
that have already been allocated, along with adding a bit of debug
output.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, our PCI code uses the pcibios_fixup_bus() callback, which
is called by the generic code when probing PCI buses, for two
different things.
One is to set up things related to the bus itself, such as reading
bridge resources for P2P bridges, fixing them up, or setting up the
iommu's associated with bridges on some platforms.
The other is some setup for each individual device under that bridge,
mostly setting up DMA mappings and interrupts.
The problem is that this approach doesn't work well with PCI hotplug
when an existing bus is re-probed for new children. We fix this
problem by splitting pcibios_fixup_bus into two routines:
- pcibios_setup_bus_self() is now called to set up the bus itself
- pcibios_setup_bus_devices() is now called to set up the devices
pcibios_fixup_bus() is then modified to call these two after reading the
bridge bases, and the OF based PCI probe is modified to avoid calling
into the first one when rescanning an existing bridge.
[paulus@samba.org - fixed eeh.h for 32-bit compile now that pci-common.c
is including it unconditionally.]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The function pcibios_do_bus_setup() was used by pcibios_fixup_bus()
to perform setup that is different between the 32-bit and 64-bit
code. This difference no longer exists, thus the function is removed
and the setup now done directly from pci-common.c.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 32-bit and 64-bit powerpc PCI code used to set up the resource
pointers of the root bus of a given PHB in completely different
places.
This unifies this in large part, by making 32-bit use a routine very
similar to what 64-bit does when initially scanning the PCI busses.
The actual setup of the PHB resources itself is then moved to a
common function in pci-common.c.
This should cause no functional change on 64-bit. On 32-bit, the
effect is that the PHB resources are going to be setup a bit earlier,
instead of being setup from pcibios_fixup_bus().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This removes the various DBG() macro from the powerpc PCI code and
makes it use the standard pr_debug instead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A new field has been added to the VPA as a method for the client OS to
communicate to firmware the number of page-ins it is performing when
running collaborative memory overcommit. The hypervisor will use this
information to better determine if a partition is experiencing memory
pressure and needs more memory allocated to it.
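In code terms this amounts to something like the sketch below; the
field name and the call site are assumptions based on the description,
not the verbatim patch:

	/* Sketch: account a page-in in the VPA so the hypervisor can
	 * see it. Assumes a page_ins counter in the lppaca and that
	 * this runs in the fault path on CMO (shared memory)
	 * partitions. */
	if (firmware_has_feature(FW_FEATURE_CMO))
		get_lppaca()->page_ins++;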
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When no hardware method is provided to sync the timebase registers
across the machine, and the platform doesn't sync them for us, then we
use a generic software implementation. Currently, the code for that
has many printks, and they don't have log levels. Most of the printks
are only useful for debugging the code, and since we haven't had any
problems with it for years, this turns them into pr_debug.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The code to properly expose domain numbers in /proc is somewhat
bogus on ppc64 as it depends on the "buid" field being non-0,
but that field is really pseries specific.
This removes that code and makes ppc64 use the same code as 32-bit,
which decides whether to expose domains based on the ppc_pci_flags
set by the platform. It also sets the default for 64-bit to enable
domains and enable compatibility for domain 0 (which strips the
domain number from domain 0 to help with X servers).
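The shared decision then looks roughly like this (a sketch; the flag
spellings are assumptions based on the description):

	/* Sketch: domains show up in /proc only if the platform set
	 * the corresponding ppc_pci_flags bit; domain 0 can be
	 * stripped for compatibility with older userspace. */
	int pci_proc_domain(struct pci_bus *bus)
	{
		struct pci_controller *hose = pci_bus_to_host(bus);

		if (!(ppc_pci_flags & PPC_PCI_ENABLE_PROC_DOMAINS))
			return 0;
		if (ppc_pci_flags & PPC_PCI_COMPAT_DOMAIN_0)
			return hose->global_number != 0;
		return 1;
	}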
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (23 commits)
Revert "powerpc: Sync RPA note in zImage with kernel's RPA note"
powerpc: Fix compile errors with CONFIG_BUG=n
powerpc: Fix format string warning in arch/powerpc/boot/main.c
powerpc: Fix bug in kernel copy of libfdt's fdt_subnode_offset_namelen()
powerpc: Remove duplicate DMA entry from mpc8313erdb device tree
powerpc/cell/OProfile: Fix on-stack array size in activate spu profiling function
powerpc/mpic: Fix regression caused by change of default IRQ affinity
powerpc: Update remaining dma_mapping_ops to use map/unmap_page
powerpc/pci: Fix unmapping of IO space on 64-bit
powerpc/pci: Properly allocate bus resources for hotplug PHBs
OF-device: Don't overwrite numa_node in device registration
powerpc: Fix swapcontext system for VSX + old ucontext size
powerpc: Fix compiler warning for the relocatable kernel
powerpc: Work around ld bug in older binutils
powerpc/ppc64/kdump: Better flag for running relocatable
powerpc: Use is_kdump_kernel()
powerpc: Kexec exit should not use magic numbers
powerpc/44x: Update 44x defconfigs
powerpc/40x: Update 40x defconfigs
powerpc: enable heap randomization for linkstations
...
This reverts commit 91a0030295, plus
commit 0dcd440120 ("powerpc: Revert CHRP
boot wrapper to real-base = 12MB on 32-bit") which depended on it.
Commit 91a00302 was causing NVRAM corruption on some pSeries machines,
for as-yet unknown reasons, so this reverts it until the cause is
identified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
After the merge of the 32 and 64bit DMA code, dma_direct_ops lost
their map/unmap_single() functions but gained map/unmap_page(). This
caused a problem for Cell because Cell's dma_iommu_fixed_ops called
the dma_direct_ops if the fixed linear mapping was to be used or the
iommu ops if the dynamic window was to be used. So in order to fix
this problem we need to update the 64bit DMA code to use
map/unmap_page.
First, we update the generic IOMMU code so that iommu_map_single()
becomes iommu_map_page() and iommu_unmap_single() becomes
iommu_unmap_page(). Then we propagate these changes up through all
the callers of these two functions and in the process update all the
dma_mapping_ops so that they have map/unmap_page rather than
map/unmap_single. We can do this because on 64bit there is no HIGHMEM
memory so map/unmap_page ends up performing exactly the same function
as map/unmap_single, just taking different arguments.
This has no effect on drivers because dma_map_single_attrs() just
ends up calling the map_page() function of the appropriate
dma_mapping_ops, and similarly dma_unmap_single_attrs() calls
unmap_page().
This fixes an oops on Cell blades, which would otherwise oops on boot
because they call dma_direct_ops.map_single, which is now NULL.
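The driver-facing wrapper ends up doing roughly the following (a
sketch of the shape described, with the struct and hook signatures of
that era assumed):

	/* Sketch: dma_map_single_attrs() forwards to the ops'
	 * map_page hook. With no HIGHMEM on 64-bit, splitting the
	 * address into a page and an offset is lossless, so this
	 * behaves exactly like the old map_single. */
	static inline dma_addr_t dma_map_single_attrs(struct device *dev,
						      void *cpu_addr, size_t size,
						      enum dma_data_direction dir,
						      struct dma_attrs *attrs)
	{
		struct dma_mapping_ops *ops = get_dma_ops(dev);

		return ops->map_page(dev, virt_to_page(cpu_addr),
				     (unsigned long)cpu_addr & ~PAGE_MASK,
				     size, dir, attrs);
	}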
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
A typo/thinko made us pass the wrong argument to __flush_hash_table_range
when unplugging bridges, thus not flushing all the translations for
the IO space on unplug. The third parameter to __flush_hash_table_range
is `end', not `size'.
This causes the hypervisor to refuse unplugging slots.
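In other words, the bug was of this shape (a sketch with assumed
local variable names):

	/* __flush_hash_table_range(mm, start, end) takes an end
	 * address as its third argument, not a length. */
	unsigned long start = res->start + _IO_BASE;
	unsigned long size  = res->end - res->start + 1;

	/* Buggy: 'size' passed where an end address is expected, so
	 * only part of the IO space translations were flushed. */
	__flush_hash_table_range(&init_mm, start, size);

	/* Fixed: pass the actual end of the range. */
	__flush_hash_table_range(&init_mm, start, start + size);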
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Resources for PHB's that are dynamically added to a system are not
properly allocated in the resource tree.
Not having these resources allocated causes an oops when removing
the PHB when we try to release them.
The diff appears a bit messy; this is mainly due to moving everything
one tab to the left in the pcibios_allocate_bus_resources() routine.
The functionality change in this routine is only that the
list_for_each_entry() loop is pulled out and moved to the necessary
calling routine.
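After the change, the walk over the root buses lives in the caller,
roughly as follows (sketch; pci_root_buses is the generic list of
root buses):

	/* Sketch: pcibios_allocate_bus_resources() now handles one
	 * bus, so hotplug code can call it for just the new PHB. */
	list_for_each_entry(bus, &pci_root_buses, node)
		pcibios_allocate_bus_resources(bus);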
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, the numa_node of OF-devices will be overwritten during
device_register, which simply sets the node to -1. On cell machines,
this means that devices can't find their IOMMU, which is referenced
through the device's numa node.
Set the numa node for OF devices with no parent, and use the
lower-level device_initialize and device_add functions, so that the
node is preserved.
We can remove the call to set_dev_node in of_device_alloc, as it
will be overwritten during registration.
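The registration path then looks roughly like this sketch, following
the description above (of_node_to_nid() is the assumed node lookup):

	/* Sketch: split device_register() into initialize + add so
	 * the numa node can be set in between and survives
	 * registration. */
	int of_device_register(struct of_device *ofdev)
	{
		device_initialize(&ofdev->dev);

		/* device_add() assumes the node is already set, so
		 * for devices with no parent take it from the OF
		 * node now. */
		if (!ofdev->dev.parent)
			set_dev_node(&ofdev->dev,
				     of_node_to_nid(ofdev->node));

		return device_add(&ofdev->dev);
	}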
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since VSX support was added, we now have two sizes of ucontext_t;
the older, smaller size without the extra VSX state, and the new
larger size with the extra VSX state. A program using the
sys_swapcontext system call and supplying smaller ucontext_t
structures will currently get an EINVAL error if the task has
used VSX (e.g. because of calling library code that uses VSX) and
the old_ctx argument is non-NULL (i.e. the program is asking for
its current context to be saved). Thus the program will start
getting EINVAL errors on calls that previously worked.
This commit changes this behaviour so that we don't send an EINVAL in
this case. It will now return the smaller context but the VSX MSR bit
will always be cleared to indicate that the ucontext_t doesn't include
the extra VSX state, even if the task has executed VSX instructions.
Both 32-bit and 64-bit cases are updated.
[paulus@samba.org - also fix some access_ok() and get_user() calls]
Thanks to Ben Herrenschmidt for noticing this problem.
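Schematically, the fix behaves like this sketch (variable names and
the surrounding copy-out logic are assumptions):

	/* Sketch: when the caller supplies the old, smaller
	 * ucontext_t, save the context without VSX state and clear
	 * MSR_VSX in the saved MSR so userspace can tell the VSX
	 * registers are not included. */
	if (old_ctx != NULL && ctx_size < sizeof(struct ucontext)) {
		/* old-style context: no room for the VSX state */
		saved_msr &= ~MSR_VSX;
		/* ... copy out the smaller context with saved_msr ... */
	}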
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fixes this warning:
arch/powerpc/kernel/setup_64.c:447:5: warning: "kernstart_addr" is not defined
which arises because PHYSICAL_START is no longer a constant when
CONFIG_RELOCATABLE=y.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 549e8152de ("powerpc: Make the
64-bit kernel as a position-independent executable") added lines to
vmlinux.lds.S to add the extra sections needed to implement a
relocatable kernel. However, those lines seem to trigger a bug in
older versions of GNU ld (such as 2.16.1) when building a
non-relocatable kernel. Since ld 2.16.1 is still a popular choice for
cross-toolchains, this adds an #ifdef to vmlinux.lds.S so the added
lines are only included when building a relocatable kernel.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The __kdump_flag ABI is overly constraining for future development.
As of 2.6.27, the kernel entry point has 4 constraints: Offset 0 is
the starting point for the master (boot) cpu (entered with r3 pointing
to the device tree structure), offset 0x60 is code for the slave cpus
(entered with r3 set to their device tree physical id), offset 0x20 is
used by the iseries hypervisor, and secondary cpus must be well behaved
when the first 256 bytes are copied to address 0.
Placing the __kdump_flag at 0x18 is bad because:
- It was taking the last 8 bytes before the iseries hypervisor data.
- It used 8 bytes for what is just a boolean flag.
- It had no way of identifying that the flag was present.
- It does not leave any room for the master to add any additional code
  before branching, which hurts debugging.
- It makes it unnecessarily hard for 32-bit code to be common (8 bytes).
Now that we have eliminated the use of __kdump_flag in favor of
the standard is_kdump_kernel(), this flag only controls whether the
kernel runs without being relocated to PHYSICAL_START (0), so rename
it __run_at_load.
Move the flag to 0x5c, 1 word before the secondary cpu entry point at
0x60. Initialize it with "run0" to say it will run at 0 unless it is
set to 1. It only exists if we are relocatable.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
linux/crash_dump.h defines is_kdump_kernel() to be used by code that
needs to know if the previous kernel crashed instead of a (clean) boot
or reboot.
This updates the just-added powerpc code to use it. This is needed
for the next commit, which will remove __kdump_flag.
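Usage is straightforward; a sketch of the kind of conversion involved
(the function name here is hypothetical):

	#include <linux/crash_dump.h>

	/* Sketch: code that must behave differently when the previous
	 * kernel crashed now tests the generic helper instead of an
	 * arch-private flag. is_kdump_kernel() keys off the ELF core
	 * header address the crashed kernel passed to this one. */
	static void __init setup_after_boot(void)
	{
		if (is_kdump_kernel()) {
			/* booted as a kdump capture kernel */
		} else {
			/* normal (clean) boot or reboot */
		}
	}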
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 54622f10a6 ("powerpc: Support for
relocatable kdump kernel") added a magic flag value in a register to
tell purgatory that it should be a panic kernel. This part is wrong
and is reverted by this commit.
The kernel gets a list of memory blocks and an entry point from user space.
Its job is to copy the blocks into place and then branch to the designated
entry point (after turning "off" the mmu).
The user space tool inserts a trampoline, called purgatory, that runs
before the user supplied code. Its job is to establish the entry
environment for the new kernel or other application based on the contents
of memory. The purgatory code is compiled and embedded in the tool,
where it is later patched via its ELF symbol table.
Since the tool knows it is creating a purgatory that will run after a
kernel crash, it should just patch purgatory (or the kernel directly)
if something needs to happen.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>