This patch adds a cpufreq governor that takes the number of running spus
into account. It's very similar to the ondemand governor, but not as complex.
Instead of hacking spu load into the ondemand governor it might be easier to
have cpufreq accepting multiple governors per cpu in future.
Don't know if this is the right way, but it would keep the governors simple.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It is okay for both _PAGE_GUARDED and _PAGE_COHERENT (G and M) to be set
in the same pte. In fact, even if that were not the case, there doesn't
seem to be any place where G is set without also setting I (_PAGE_NO_CACHE),
so the test for I is sufficient as a condition to clear _PAGE_COHERENT
when filling the hash table.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make cell_dma_dev_setup_iommu() return a pointer to the struct iommu_table
(or NULL if no table can be found) rather than putting this pointer into
dev->archdata.dma_data (let the caller do that), and rename this function
to cell_get_iommu_table() to reflect this change.
This will allow us to get the iommu table for a device that doesn't have
the table in the archdata.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As nr_active counter includes also spus waiting for syscalls to return
we need a seperate counter that only counts spus that are currently running
on spu side. This counter shall be used by a cpufreq governor that targets
a frequency dependent from the number of running spus.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, the .ctx debug file in spu context directories is always
present.
We'd prefer to prevent users from relying on this file, so add a
"debug" mount option to spufs. The .ctx file will only be added to
the context directories when this option is present.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Populate the size member of a few context files. Leave out files that
have different semantics with read vs mmap, or contain a
variable-length hex string.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, spufs never specifies the i_size for the files in context
directories, so stat() always reports 0-byte files.
This change adds allows the spufs_dir_(nosched_)contents arrays to
specify a file size. This allows stat() to report correct file sizes,
and makes SEEK_END work.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
An spu context shouldn't get an extra tick if the time slice code
couldn't find something else to run. This means contexts that are not
within spu_run (ie, SPU_SCHED_SPU_RUN is cleared) will not receive
extra ticks while we have no other contexts waiting.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Add a ctxt file to spufs that shows spu context information that is used
in scheduling. This info can be used for debugging spufs scheduler
issues, and to isolate between application and spufs problems as it
shows a lot of state such as priorities and dispatch counts.
This file contains internal spufs state and is subject to change at any
time, and therefore no applications should depend on it. The file is
intended for the use of spufs kernel developers.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We need to disable ptcal before starting a new kernel after a crash,
in order to avoid overwriting data in the kdump kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This converts ppc to use the new helpers for smp_call_function() and
friends, and adds support for smp_call_function_single().
ppc loses the timeout functionality of smp_call_function_mask() with
this change, as the generic code does not provide that.
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There is a delay in the transition to the stopped state for class 2
interrupts. In some cases, the controlling thread detects the state of
the spu as running, and goes back to sleep resulting in a hung
application as the event is missed.
This change detects the stop condition and re-generates the wakeup event
after a context save.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Time slicing can occur at the same time as spu exception handling
resulting in the wakeup of the wrong thread.
This change uses the the spu's register_lock to enforce synchronization
between bind/unbind and spu exception handling so that they are
mutually exclusive.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
According to the CBEA, the SPU dsisr is not updated for class 0
exceptions.
spu_stopped() is testing the dsisr that was passed to it from the class
0 exception handler, so we return a false positive here.
This patch cleans up the interrupt handler and erroneous tests in
spu_stopped. It also removes the fields from the csa since it is not
needed to process class 0 events.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
If the spu is stopping (ie, the SPU_STATUS_RUNNING bit is still set),
re-read the register to get the final stopped value.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
When I changed irq_alloc_host() to take an of_node
(52964f87c64e6c6ea671b5bf3030fb1494090a48: "Add an optional
device_node pointer to the irq_host"), I botched the reference
counting semantics.
Stephen pointed out that it's irq_alloc_host()'s business if
it needs to take an additional reference to the device_node,
the caller shouldn't need to care.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If we do the call to irq_of_parse_and_map() first, then we don't
need to worry about freeing the irq_host.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds some debugging code to the Axon MSI driver. It creates a
file per MSIC in /sys/kernel/debug/powerpc, which allows the user to
trigger a fake MSI interrupt by writing to the file.
This can be used to test some of the MSI generation path. In
particular, that the MSIC recognises a write to the MSI address,
generates an interrupt and writes the MSI packet into the ring buffer.
All the code is inside #ifdef DEBUG so it causes no harm unless it's
enabled.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix following warnings:
WARNING: arch/powerpc/platforms/cell/built-in.o(.devinit.text+0x9c): Section mismatch in reference from the function .cell_setup_phb() to the function .init.text:.iowa_register_bus()
WARNING: arch/powerpc/platforms/cell/built-in.o(.devinit.text+0xa4): Section mismatch in reference from the function .cell_setup_phb() to the function .init.text:.io_workaround_init()
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Paul Mackerras <paulus@samba.org>
With CONFIG_VIRT_CPU_ACCOUNTING disabled, I got the following error:
linux-2.6/arch/powerpc/platforms/cell/spufs/file.c: In function 'spu_switch_log_notify':
linux-2.6/arch/powerpc/platforms/cell/spufs/file.c:2542: error: implicit declaration of function 'get_tb'
make[4]: *** [arch/powerpc/platforms/cell/spufs/file.o] Error 1
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If victim (not ctx) is in spu_run, add victim to rq.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to acquire the parent i_mutex with I_MUTEX_PARENT to keep
lockdep happy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We should not requeue the victim context in find_victim if the owner is
not in spu_run. It's first not needed because leaving the context on
the spu is an optimization and second is harmful because it means the
owner could re-enter spu_run when the context is on the runqueue and
trip the BUG_ON in __spu_update_sched_info.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Creating a spufs context or gand using spu_create should send an inotify
event so that things like performance monitors have an easy way to find
out about newly created contexts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, page fault handlers don't issue a mfc restart if the context
switch pending flag is set, which can leave us with a hanging DMA after
a context restore.
This patch introduces fault pending flag that is set by the fault
handler and read by the context switch code, so that the latter can add
the restart bit at the right spot, after it has successfuly saved the
state of the mfc control register.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
SPU class 0 & 1 exceptions may occur in parallel, so we may end up
overwriting csa.dsisr.
This change adds dedicated fields for each class to the spu and the spu
context so that fault data is not overwritten.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we re-route SPU interrupts to the current cpu, which may be
on a remote node. In the case of time slicing, all spu interrupts will
end up routed to the same cpu, where the spusched_tick occurs.
This change routes mfc interrupts to the cpu where the controlling
thread last ran, provided that cpu is on the same node as the spu
(otherwise don't reroute interrupts).
This should improve performance and provide a more predictable
environment for processing spu exceptions. In the past we have seen
concurrent delivery of spu exceptions to two cpus. This eliminates that
concern.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
synchronize_irq() provides the serialization for
SPU_CONTEXT_SWITCH_PENDING which is read with a simple load. This
routine guarantees that the relevant interrupt handlers are not running,
so that the next time they do run they will see the update
memory value.
This must be done correctly so that exception handling code does not
restart the mfc in the middle of a context switch while we are trying
to atomically stop it and save state.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There's currently no way to tell if spu_process_callback has
returned with the state mutex held, as -EINTR may be returned
by either the syscall or the spu_acquire fail case.
Instead, just do a non-interruptible mutex_lock here.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we update the SPU master run control bit (ie,
spu_enable_spu) in spufs_run_spu before we grab the context mutex. This
can result in races with other processes accessing this context's
resources.
This change moves the spu_enable_spu to after we have acquired the
context lock.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We currently have two issues with the MFC save code:
* save_mfc_decr doesn't handle a transition of 1 -> 0 of the Ds bit
* The Q bit may be stale in the CSA
This change fixes the first issue by clearing the relevant bits from
the MFC_CNTL value in the CSA before or-ing in the updated status.
Also, we add the Q bit to the updated status.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we can introduce invalid entries into the MFC queues:
1) context starts a DMA
2) context gets scheduled out during a DMA
- kernel saves MFC queue to CSA
- kernel saves 0x0 in csa->mfc_control_RW
3) context gets scheduled in
- csa->mfc_control[Q] ('queues empty') isn't set, so DMA queues are
restored from the CSA
4) context's DMA is completed
5) context gets scheduled out again, no DMA occuring this time
- kernel sees that MFC_CNTL[Q] ('queues empty') is set, so doesn't
touch saved queue data in CSA
- kernel saves 0x0 in csa->mfc_control_RW
6) context gets scheduled in
- csa->mfc_control[Q] ('queues empty') isn't set (we saved is as 0!),
so DMA queues are restored from the CSA
In this last restore, we've restored the queue status from step 2,
which are now invalid.
This change makes save_mfc_cntl() closer to the save/restore sequence,
as specified in the CBE handbook.
With changes from Luke Browning.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
When we issue a MFC purge request, we may inadvertantly clear the
suspended status.
This change adds the MFC_CNTL_SUSPEND_MASK when we issue a purge
request, so that the suspend bit is masked out.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We may currently lose interrupts during SPE context switch, as we alter
the INT_Route register. Because the IIC uses a per-thread priority
status, changing the interrupt routing to a different thread means that
the IRQ is no longer masked by the priority status, so we end up with
two fasteoi IRQ handlers executing for the one irq_desc. The fasteoi
handler doesn't handle multiple IRQs, so drops the second one.
Fix this by using our own flow handler. This is based on
handle_edge_irq, but issues an eoi after IRQs are handled, and doesn't
do any mask/unmasking.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The sputrace module contained a trace entry for spu_acquire_saved, but
this marker was not placed anywhere. Fix this by adding a marker to the
routine.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Fix a typo in the marker for the find_victim function, which prevented
it from being traced. It previously read find_vitim.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The sputrace module contained a reference to a marker for
destroy_spu_context, but this marker did not appear in the code. Fix
this by adding a marker in the function.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The markers facility defines the marker parameters to be of the form
'name %format'. Add parameter names to sputrace, to specify the context
and %spu paramerters, instead of just specifying the '%format' part.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There are userspace instrumentation tools that need to monitor spu
context switches. This patch adds a new file called 'switch_log' to
each spufs context directory that can be used to monitor the context
switches.
Context switch in/out and exit from spu_run are monitored after the
file was first opened and can be read from it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.
Add correct ->owner to proc_fops to fix reading/module unloading race.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds support for PCI Express port on Celleb. I/O space of this
PCI Express port is not mapped in memory space. So we use the
io-workaround mechanism to make accesses indirect.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves miscellaneous files for Beat into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves SPU support code on Beat into platforms/cell/.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves files for mmu and iommu on Beat into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves files for Beat hvcall interfaces into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the SCC (Super Companion Chip) related code for celleb
into platforms/cell/.
All files in this patch are used by celleb-beat and celleb-native
commonly.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the base code for celleb support into platforms/cell/.
All files in this patch are used by celleb-beat and celleb-native
commonly.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now, we can use generic io-workarounds mechanism and the workaround
code for spider-pci. This changes Celleb PCI code to use spider-pci
code.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This splits cell io-workaround code into spider-pci dependent code and
a generic part, and also moves io-workarounds initialization into
cell_setup_phb.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace two open-coded occurences of the of_get_next_parent() logic.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
[POWERPC] Fix compile breakage for 64-bit UP configs
[POWERPC] Define copy_siginfo_from_user32
[POWERPC] Add compat handler for PTRACE_GETSIGINFO
[POWERPC] i2c: Fix build breakage introduced by OF helpers
[POWERPC] Optimize fls64() on 64-bit processors
[POWERPC] irqtrace support for 64-bit powerpc
[POWERPC] Stacktrace support for lockdep
[POWERPC] Move stackframe definitions to common header
[POWERPC] Fix device-tree locking vs. interrupts
[POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
[POWERPC] Remove unused __max_memory variable
[POWERPC] Simplify xics direct/lpar irq_host setup
[POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
[POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
[POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
[POWERPC] Use asm-generic/bitops/find.h in bitops.h
[POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
[POWERPC] 85xx: Fix the size of qe muram for MPC8568E
[POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
[POWERPC] 86xx: mark functions static, other minor cleanups
...
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, ppu-gdb can't trace spu infomation with coredump generated
by the kernel. While the core dumps notes have correct contents, they
have the wrong names, as the file descriptors used to generate the note
names are off-by-one. An application that opens a SPE context as fd 3,
the current core dump code will generate notes like:
SPU/4/mem
SPU/4/regs
etc.
This confuses GDB, which knows it is looking for SPE context 3 (from
parsing the spu_context_run system call arguments), and cannot find
any notes that match context 3.
This change corrects the file descriptor counting, to only increment
the fd until after we've written the note name.
Signed-off-by: Gerhard Stenzel <stenzel@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
During the context save process, we currently save the MFC command
channel after purging the MFC queues. This causes a systemsim warning,
as the command channel may be in an unknown state after the purge.
This change does the save before purging the MFC queues.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
During spu_process callback, we release then acquire the SPU, but keep a
pointer to the local store memory. Since the context may have been
scheduled out during the callback, the ls pointer may become invalid.
This change reacquires the pointer to the context local store after
spu_acquire()-ing, so that it isn't invalidated by a context switch.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
All of the single-value files in spufs are terminated by a newline,
except for signal1_type and signal2_type.
This change adds a trailing newline to these two files.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The PCI bridge representing the PCIE root complex on Axon, contains
device BARs for a memory range and ROM that define inbound accesses.
This confuses the kernel resource management code -- the resources
need to be hidden when Axon is a host bridge.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The cell IOMMU code to parse the dma-ranges properties, used for the fixed
mapping, was broken in two ways for some devices.
Firstly it didn't cope with empty dma-ranges properties. An empty property
implies no translation so can be safely skipped.
The code also wrongly assumed it would be looking at PCI devices, and hard
coded the number of address and size cells.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, we can hit the BUG_ON in __spu_update_sched_info by reading
the regs file of a context between two calls to spu_run. The
spu_release_saved called by spufs_regs_read() is resulting in the (now
non-runnable) context being placed back on the run queue, so the next
call to spu_run ends up in the bug condition.
This change uses the SPU_SCHED_SPU_RUN flag to only reschedule a context
if it's still in spu_run().
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
commit 4ef11014 introduced a usage of SCHED_IDLE to detect when
a context is within spu_run.
Instead of SCHED_IDLE (which has other meaning), add a flag to
sched_flags to tell if a context should be running.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The only tricky part is we need to adjust the PTE insertion loop to
cater for holes in the page table. The PTEs for each segment start on
a 4K boundary, so with 16M pages we have 16 PTEs per segment and then
a gap to the next 4K page boundary.
It might be possible to allocate the PTEs for each segment separately,
saving the memory currently filling the gaps. However we'd need to
check that's OK with the hardware, and that it actually saves memory.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Make some preliminary changes to cell_iommu_alloc_ptab() to allow it to
take the page size as a parameter rather than assuming IOMMU_PAGE_SIZE.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
We use n_pte_pages to calculate the stride through the page tables, but
we also use it to set the NPPT value in the segment table entry. That is
defined as the number of 4K pages per segment, so we should calculate
it as such regardless of the IOMMU page size.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently the cell IOMMU code allocates the entire IOMMU page table in a
contiguous chunk. This is nice and tidy, but for machines with larger
amounts of RAM the page table allocation can fail due to it simply being
too large.
So split the segment table and page table setup routine, and arrange to
have the dynamic and fixed page tables allocated separately.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
There's no need to allocate the pad page unless we're going to actually
use it - so move the allocation to where we know we're going to use it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The cell IOMMU code no longer needs to save the pte_offset variable
separately, it is incorporated into tbl->it_offset.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The cell IOMMU tce build and free routines use pte_offset to convert
the index passed from the generic IOMMU code into a page table offset.
This takes into account the SPIDER_DMA_OFFSET which sets the top bit
of every DMA address.
However it doesn't cater for the IOMMU window starting at a non-zero
address, as the base of the window is not incorporated into pte_offset
at all.
As it turns out tbl->it_offset already contains the value we need, it
takes into account the base of the window and also pte_offset. So use
it instead!
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
It's called the fixed mapping, not the static mapping.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Ulrich Weigand has found that the hardware watchpoints on cell were not
working back in November :
http://ozlabs.org/pipermail/linuxppc-dev/2007-November/046135.html
This patch sets them during initialization.
Signed-off-by: Jens Osterkamp <jens@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The spu_runcntl_RW register is restored within spu_restore function.
So, at the end of spu_bind_context, the SPU context is not just loaded,
but running.
This change corrects the state switch to account the time as USER.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There is a potential race between flushes of the entire SLB in the MFC
and the point where new entries are being established. The problem is
that we might put a ESID entry into the MFC SLB when the VSID entry has
just been cleared by the global flush.
This can be circumvented by holding the register_lock throughout both
the flushing and the creation of SLB entries.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
When we replace an SLB entry in the MFC after using up all the available
entries, there is a short window in which an incorrect entry is marked
as valid.
The problem is that the 'valid' bit is stored in the ESID, which is
always written after the VSID. Overwriting the VSID first will make the
original ESID entry point to the new VSID, which means that any
concurrent DMA accessing the old ESID ends up being redirected to the
new virtual address. A few cycles later, we write the new ESID and
everything is fine again.
That race can be closed by writing a zero entry to the ESID first, which
makes sure that the VSID is not accessed until we write the new ESID.
Note that we don't actually need to invalidate the SLB entry using the
invalidation register, which would also flush any ERAT entries for that
segment, because the segment translation does not become invalid but is
only removed from the SLB cache.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There is a small race between the context save procedure
and the SPU interrupt handling, where we expect all interrupt
processing to have finished after disabling them, while
an interrupt is still being processed on another CPU.
The obvious fix is to call synchronize_irq() after disabling
the interrupts at the start of the context save procedure
to make sure we never access the SPU any more during an
ongoing save or even after that.
Thanks to Benjamin Herrenschmidt for pointing this out.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we get the following output from sputrace:
[5.097935954] 1606: spufs_ps_nopfn__enter (thread = 1605, spu = -1)
[5.097958164] 1606: spufs_ps_nopfn__insert (thread = 1605, spu = 15)
[5.097973529] 1607: spufs_ps_nopfn__enter (thread = 1605, spu = -1)
[5.097989174] 1607: spufs_ps_nopfn__insert (thread = 1605, spu = 14)
Which leads me to believe that 160[67] is the current thread ID, and
1605 is the context backing the psmap.
However, the 'current' and 'owner' tids are reversed - the 'current'
tid is on the right. This change puts the current thread ID in the
left-hand column instead, and renames the right to 'ctxthread'.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
At present, we have a situation where a context with no owner is
re-scheduled by spu_forget:
Thread 1: reading regs file Thread 2: context owner
spu_forget()
- ctx->owner = NULL
- set SPU_SCHED_WAS_ACTIVE
spu_acquire_saved()
- context is in saved state
spu_release_saved()
- SPU_SCHED_WAS_ACTIVE is set,
so spu_activate() the context,
which now has no owner
In spu_forget(), we shouldn't be requesting a re-schedule by setting
SPU_SCHED_WAS_ACTIVE. This change removes the set_bit in spu_forget(),
so that spu_release_saved() doesn't reinsert this destroyed context on
to the run queue.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We have a small window where a spu context may be destroyed while
we're servicing a page fault (from another thread) to the context's
problem state mapping.
After we up_read() the mmap_sem, it's possible that the context is
destroyed by its owning thread, and so the later references to ctx
are invalid. This can maifest as a deadlock on the (now free()-ed)
context state mutex.
This change adds a reference to the context before we release the
mmap_sem, so that the context cannot be destroyed.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
At present, the __spufs_trap_data_map and __spu_trap_data_seq functions
exit if spu->flags has the SPU_CONTEXT_SWITCH_ACTIVE set. This was
resulting in suprious returns from these functions, as they may be
legitimately called when we have this bit set.
We only use it in these two sanity checks, so this change removes the
flag completely. This fixes hangs in the page-fault path of SPE apps.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
2.6.25 has a regression where we can starve the scheduler by creating
(N_SPES+1) contexts, then running them one at a time.
The final context will never be run, as the other contexts are loaded on
the SPEs, none of which are repoted as free (ie, spu->alloc_state !=
SPU_FREE), so spu_get_idle() doesn't give us a spu to run on. Because
all of the contexts are stopped, none are descheduled by the scheduler
tick, as spusched_tick returns if spu_stopped(ctx).
This change replaces the spu_stopped() check with checking for SCHED_IDLE
in ctx->policy. We set a context's policy to SCHED_IDLE when we're not
in spu_run(). We also favour SCHED_IDLE contexts when looking for contexts
to unbind, but leave their timeslice intact for later resumption.
This patch fixes the following test in the spufs-testsuite:
tests/20-scheduler/02-yield-starvation
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Remove unused CONFIG_WANT_DEVICE_TREE
[POWERPC] Cell RAS: Remove DEBUG, and add license and copyright
[POWERPC] hvc_rtas_init() must be __init
[POWERPC] free_property() must not be __init
[POWERPC] vdso_do_func_patch{32,64}() must be __init
[POWERPC] Remove generated files on make clean
[POWERPC] Fix arch/ppc compilation - add typedef for pgtable_t
[POWERPC] Wire up new timerfd syscalls
[POWERPC] PS3: Update sys-manager button events
[POWERPC] PS3: Sys-manager code cleanup
[POWERPC] PS3: Use system reboot on restart
[POWERPC] PS3: Fix bootwrapper hang bug
[POWERPC] PS3: Fix reading pm interval in logical performance monitor
[POWERPC] PS3: Fix setting bookmark in logical performance monitor
[POWERPC] Fix DEBUG_PREEMPT warning when warning
* Add path_put() functions for releasing a reference to the dentry and
vfsmount of a struct path in the right order
* Switch from path_release(nd) to path_put(&nd->path)
* Rename dput_path() to path_put_conditional()
[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.
Together with the other patches of this series
- it enforced the correct order of getting/releasing the reference count on
<dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
struct path in every place where the stack can be traversed
- it reduces the overall code size:
without patch series:
text data bss dec hex filename
5321639 858418 715768 6895825 6938d1 vmlinux
with patch series:
text data bss dec hex filename
5320026 858418 715768 6894212 693284 vmlinux
This patch:
Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/powerpc/platforms/cell/ras.c still has DEBUG #defined, which is no
longer necessary. Disable it - this disables two pr_debugs().
While we're there this file should have a copyright notice and license,
so add both.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
RCU style multiple probes support for the Linux Kernel Markers. Common case
(one probe) is still fast and does not require dynamic allocation or a
supplementary pointer dereference on the fast path.
- Move preempt disable from the marker site to the callback.
Since we now have an internal callback, move the preempt disable/enable to the
callback instead of the marker site.
Since the callback change is done asynchronously (passing from a handler that
supports arguments to a handler that does not setup the arguments is no
arguments are passed), we can safely update it even if it is outside the
preempt disable section.
- Move probe arm to probe connection. Now, a connected probe is automatically
armed.
Remove MARK_MAX_FORMAT_LEN, unused.
This patch modifies the Linux Kernel Markers API : it removes the probe
"arm/disarm" and changes the probe function prototype : it now expects a
va_list * instead of a "...".
If we want to have more than one probe connected to a marker at a given
time (LTTng, or blktrace, ssytemtap) then we need this patch. Without it,
connecting a second probe handler to a marker will fail.
It allow us, for instance, to do interesting combinations :
Do standard tracing with LTTng and, eventually, to compute statistics
with SystemTAP, or to have a special trigger on an event that would call
a systemtap script which would stop flight recorder tracing.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Mason <mmlnx@us.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: David Smith <dsmith@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Add arch-specific walk_memory_remove() for 64-bit powerpc
[POWERPC] Enable hotplug memory remove for 64-bit powerpc
[POWERPC] Add remove_memory() for 64-bit powerpc
[POWERPC] Make cell IOMMU fixed mapping printk more useful
[POWERPC] Fix potential cell IOMMU bug when switching back to default DMA ops
[POWERPC] Don't enable cell IOMMU fixed mapping if there are no dma-ranges
[POWERPC] Fix cell IOMMU null pointer explosion on old firmwares
[POWERPC] spufs: Fix timing dependent false return from spufs_run_spu
[POWERPC] spufs: No need to have a runnable SPU for libassist update
[POWERPC] spufs: Update SPU_Status[CISHP] in backing runcntl write
[POWERPC] spufs: Fix state_mutex leaks
[POWERPC] Disable G5 NAP mode during SMU commands on U3
Add a .show_options super operation to spufs.
Use generic_show_options() and save the complete option string in
spufs_fill_super().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
simple_attr_close implementes ->release so it should be named accordingly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <stefano.brivio@polimi.it>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sometimes simple attributes might need to return an error, e.g. for
acquiring a mutex interruptibly. In fact we have that situation in
spufs already which is the original user of the simple attributes. This
patch merged the temporarily forked attributes in spufs back into the
main ones and allows to return errors.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: <stefano.brivio@polimi.it>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg KH <greg@kroah.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the cell IOMMU fixed mapping just printks that it's been setup,
which is not particularly useful. Much more interesting is the address
ranges for the different windows. This adds one line to dmesg on a blade.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If we get a 64-bit dma mask we switch to the fixed ops and call
cell_dma_dev_setup(). If the driver then switches back to a 32-bit dma
mask for any reason we don't call cell_dma_dev_setup() again, which
has the potential to leave bogus data in dev->archdata.dma_data.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In order for the cell IOMMU fixed mapping to work we need "dma-ranges"
properties in the device tree. If there are none then there's no point
enabling the fixed mapping support.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The cell IOMMU fixed mapping support has a null pointer bug if you run
it on older firmwares that don't contain the "dma-ranges" properties.
Fix it and convert to using of_get_next_parent() while we're there.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Stop bits are only valid when the running bit is not set. Status bits
carry over from one invocation of spufs_run_spu() to another, so the
RUNNING bit gets added to the previous state of the register which may
have been a remote library call. In this case, it looks like another
library routine should be invoked, but the spe is actually running.
This fixes a problem with a testcase that exercises the scheduler.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We don't need to update the libassist statistic with the context in a
runnable state, so do it after spu_disable_spu().
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently, the kernel may fail to restart a SPE context which
has stopped and been swapped out.
This changes spu_backing_runcntl_write to emulate the real
SPU_Status register exactly. When the SPU Run Control register
is written with SPU_RunCntl[Run] set to '1', the physical SPU
automatically sets SPU_Status[R] and clears SPU_Status[CISHP].
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix various state_mutex leaks. The worst one was introduced by the
interrutible state_mutex conversion but there've been a few before
too. Notably spufs_wait now returns without the state_mutex held
when returning an error, which actually cleans up some code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
I got this warning from gcc:
arch/powerpc/platforms/cell/axon_msi.c:118: warning: 'tmp' may be used uninitialized in this function
Which turns out to be a false positive, but pointed out that it was
possible for the error path in find_msi_translator() to do an extra
of_node_put on a node. This fixes it by localising the ref counting
a bit. As a side effect, the warning goes away.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There's a brown-paper-bag bug in axon_msi, we pass the address of our
FIFO directly to the hardware, without DMA mapping it. This leads to
DMA exceptions if you enable MSI & the IOMMU.
The fix is to correctly DMA map the fifo, dma_alloc_coherent() does
what we want - and we need to track the virt & phys addresses.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now that we create of_platform devices earlier on cell, we can make the
axon_msi driver an of_platform driver. This makes the code cleaner in
several ways, and most importantly means we have a struct device.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently cell publishes OF devices at device_initcall() time, which
means the earliest a driver can bind to a device is also device_initcall()
time. We have a driver we want to register before other devices, so
publish the devices at subsys_initcall() time.
This should not cause any behaviour change for existing drivers, as they
are still bound at device_initcall() time.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Reference count for the "neighbor" spu context was not
being correctly decremented after usage.
So, contexts used as reference during SPU affinity setup
were not being deallocated, leading to a memory leak.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we only catch debug events through the 0x3fff status;
spufs_run_spu doesn't handle single-step SPE events.
This change adds a handler for conditions where the SPE is stopped due
to single-step-mode.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds markers two important points in the spufs code and a new
module (sputrace.ko) that allows reading these out through a proc file.
Long-term I'd rather see something like lttng extended to use the spufs
instrumentation, but for now I think this is a good enough quick
solution. We'll probably want to add various addition event in addition
to that ones I have already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds support for setting up a fixed IOMMU mapping on certain
cell machines. For 64-bit devices this avoids the performance overhead of
mapping and unmapping pages at runtime. 32-bit devices are unable to use
the fixed mapping.
The fixed mapping is established at boot, and maps all of physical memory
1:1 into device space at some offset. On machines with < 30 GB of memory
we setup the fixed mapping immediately above the normal IOMMU window.
For example a machine with 4GB of memory would end up with the normal
IOMMU window from 0-2GB and the fixed mapping window from 2GB to 6GB. In
this case a 64-bit device wishing to DMA to 1GB would be told to DMA to
3GB, plus any offset required by firmware. The firmware offset is encoded
in the "dma-ranges" property.
On machines with 30GB or more of memory, we are unable to place the fixed
mapping above the normal IOMMU window as we would run out of address space.
Instead we move the normal IOMMU window to coincide with the hash page
table, this region does not need to be part of the fixed mapping as no
device should ever be DMA'ing to it. We then setup the fixed mapping
from 0 to 32GB.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split out the ioid fetching and checking logic so we can use it elsewhere
in a subsequent patch.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add support to cell_iommu_setup_page_tables() for handling two windows,
the dynamic window and the fixed window. A fixed window size of 0
indicates that there is no fixed window at all.
Currently there are no callers who pass a non-zero fixed window, but the
upcoming fixed IOMMU mapping patch will change that.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split the IOMMU logic out from cell_dma_dev_setup() into a separate
function. If we're not using dma_direct_ops or dma_iommu_ops we don't
know what the hell's going on, so BUG.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split cell_iommu_setup_hardware() into two parts. Split the page table
setup into cell_iommu_setup_page_tables() and the bits that kick the
hardware into cell_iommu_enable_hardware().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Split out the logic that allocates a struct iommu into a separate
function. This can fail however the calling code has never cared - so
just return if we can't allocate an iommu.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently the IOMMU code allocates one page for the segment table, that
isn't safe if we have more than 132 GB of RAM.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Rather than using the global variable, have cell use its own variable
to store the direct DMA offset.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Store the direct_dma_offset in each device's dma_data in the case
where we're using the direct DMA ops.
We need to make sure we setup the ppc_md.pci_dma_dev_setup() callback
if we're using a non-zero offset.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Commit aed3a8c9bb introduced a
definition of notify_spus_active in .../cell/spu_syscalls.c, and
another definition under #ifndef MODULE in .../cell/spufs/sched.c.
The latter is not necessary and causes the build to fail when
CONFIG_SPU_FS=y, so this removes it. It also removes the export
of do_notify_spus_active, which is unnecessary.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
This removes an OProfile dependency on the spufs module. This
dependency was causing a problem for multiplatform systems that are
built with support for Oprofile on Cell but try to load the oprofile
module on a non-Cell system.
Signed-off-by: Bob Nelson <rrnelson@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Based on an original patch from Arnd Bergmann
<arnd.bergmann@de.ibm.com>
If there's no entry in the mailbox, then a read on the _info file will
return data from an uninitialised variable.
This change returns EOF if there's no mailbox info available instead.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes the behavior of spufs when a spu tries a DMA operation
based on a wrong / unavailable address.
Instead of just generating a SIGBUS signal, spufs now
generates a SIGSEGV signal and restarts the problematic DMA operation
after the execution of the application's signal handler. This allows
applications to employ user-level paging systems.
Although the restart_dma function is called before the application's
signal handler, the operation is not actually performed at this time,
since the spu context is already stopped. The operation only takes
place when spu_run is restarted (which happens automatically).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The original spusched_timer was designed to take effect only when
a context is waiting in the runqueue.
This change adds an additional lower-freq timer has been added to
purely handle the spu_load updates. The new timer will be triggered
per LOAD_FREQ ticks.
Signed-off-by: Aegis Lin <aegislin@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Make most places that use spu_acquire/spu_acquire_saved interruptible,
this allows getting out of the spufs code when e.g. pressing ctrl+c.
There are a few places where we get called e.g. from spufs teardown
routines were we can't simply err out so these are left with a comment.
For now I've also not touched the poll routines because it's open what
libspe would expect in terms of interrupted system calls.
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The simple attr macros currently used by spufs can't deal with the
handlers returning errors, which is required to make the state_mutex
interruptible. This adds a local copy that allows for an error
return from the get/set handlers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Change spufs_spu_run so that the context is queued directly to the
scheduler and the controlling thread advances directly to spufs_wait()
for spe errors and exceptions.
nosched contexts are treated the same as before.
Fixes from Christoph Hellwig <hch@lst.de>
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This changes the spu context switch code to not write to reserved bits
of spu interrupt status register.
The architecture book says the reserved fields should be set to zero.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Need to re-check priority after dropping lock. Otherwise, a
more favored context may be preempted.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This cleans up spu_run_init so that it does all of the spu
initialization for spufs_run_spu. It initializes the spu context as
much as possible before it activates the spu and writes the runcntl
register.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Based on original patches from
Arnd Bergmann <arnd.bergman@de.ibm.com>; and
Luke Browning <lukebr@linux.vnet.ibm.com>
Currently, spu contexts need to be loaded to the SPU in order to take
class 0 and class 1 exceptions.
This change makes the actual interrupt-handlers much simpler (ie,
set the exception information in the context save area), and defers the
handling code to the spufs_handle_class[01] functions, called from
spufs_run_spu.
This should improve the concurrency of the spu scheduling leading to
greater SPU utilization when SPUs are overcommited.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a few #defines for the class 0, 1 and 2 interrupt status bits, and
use them instead of magic numbers when we're setting or checking for
these interrupts.
Also, add a #define for the class 2 mailbox threshold interrupt mask.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
When doing a poll on the mbox stat file of a swapped-out context, we
clear the class 0 interrupt status, rather than the class 2 interrupt
status.
This change corrects the poll operation to clear the correct interrupt.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This change encapsulates the spu_privcntl_RW register so that it can
be written through backing ops. This is necessary so that spu contexts
can be initialized and queued to the scheduler in spufs_run_spu.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This change disables the logic that faults-in spu contexts under the
covers from the page fault handler. When a fault requires a runnable
context, the handler will block until the context is scheduled by
other means.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>