Based on original patches from
Arnd Bergmann <arnd.bergman@de.ibm.com>; and
Luke Browning <lukebr@linux.vnet.ibm.com>
Currently, spu contexts need to be loaded to the SPU in order to take
class 0 and class 1 exceptions.
This change makes the actual interrupt-handlers much simpler (ie,
set the exception information in the context save area), and defers the
handling code to the spufs_handle_class[01] functions, called from
spufs_run_spu.
This should improve the concurrency of the spu scheduling leading to
greater SPU utilization when SPUs are overcommited.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add a few #defines for the class 0, 1 and 2 interrupt status bits, and
use them instead of magic numbers when we're setting or checking for
these interrupts.
Also, add a #define for the class 2 mailbox threshold interrupt mask.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We're currently getting a warning from not checking the result of
sysfs_create_group, which is declared as __must_check.
This change introduces appropriate error-handling for
spu_add_sysdev_attr_group()
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently, we have a possibilty that the SLBs setup during context
switch don't cover the entirety of the necessary lscsa and code
regions, if these regions cross a segment boundary.
This change checks the start and end of each region, and inserts a SLB
entry for each, if unique. We also remove the assumption that the
spu_save_code and spu_restore_code reside in the same segment, by using
the specific code array for save and restore.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Add a function spu_64k_pages_available(), so that we can abstract the
explicity use of mmu_psize_defs() in lssca_alloc.c
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Now that we have a helper function to setup a SPU SLB, use it for
__spu_trap_data_seq.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently, the SPU context switch code (spufs/switch.c) sets up the
SPU's SLBs directly, which requires some low-level mm stuff.
This change moves the kernel SLB setup to spu_base.c, by exposing
a function spu_setup_kernel_slbs() to do this setup. This allows us
to remove the low-level mm code from switch.c, making it possible
to later move switch.c to the spufs module.
Also, add a struct spu_slb for the cases where we need to deal with
SLB entries.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Export force_sig_info to allow signals to be sent from a modular spufs.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This makes the kernel use 1TB segments for all kernel mappings and for
user addresses of 1TB and above, on machines which support them
(currently POWER5+, POWER6 and PA6T).
We detect that the machine supports 1TB segments by looking at the
ibm,processor-segment-sizes property in the device tree.
We don't currently use 1TB segments for user addresses < 1T, since
that would effectively prevent 32-bit processes from using huge pages
unless we also had a way to revert to using 256MB segments. That
would be possible but would involve extra complications (such as
keeping track of which segment size was used when HPTEs were inserted)
and is not addressed here.
Parts of this patch were originally written by Ben Herrenschmidt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
There are a few symbols used only in one file within spufs; this change
makes them static where suitable.
Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The Cell BE Architecture spec states that the SPU MFC Class 0 interrupt
is edge-triggered. The current spu interrupt handler assumes this
behavior and does not clear the interrupt status.
The PS3 hypervisor visualizes all SPU interrupts as level, and on return
from the interrupt handler the hypervisor will deliver a new virtual
interrupt for any unmasked interrupts which for which the status has not
been cleared. This fix clears the interrupt status in the interrupt
handler.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch moves affinity initialization code from spu_base.c to a
new spu_management_of_ops function (init_affinity), which is empty
in the case of PS3. This fixes a linking problem that was happening
when compiling for PS3.
Also, some small code style changes were made.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This sorts out the various lists and related locks in the spu code.
In detail:
- the per-node free_spus and active_list are gone. Instead struct spu
gained an alloc_state member telling whether the spu is free or not
- the per-node spus array is now locked by a per-node mutex, which
takes over from the global spu_lock and the per-node active_mutex
- the spu_alloc* and spu_free function are gone as the state change is
now done inline in the spufs code. This allows some more sharing of
code for the affinity vs normal case and more efficient locking
- some little refactoring in the affinity code for this locking scheme
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Sort out the locking mess in spu_base and document the current rules.
As an added benefit spu_alloc* and spu_free don't block anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch links spus according to their physical position using
information provided by the firmware through a special vicinity
device-tree property. This property is present in current version
of Malta firmware.
Example of vicinity properties for a node in Malta:
Node: Vicinity property contains phandles of:
spe@0 [ spe@100000 , mic-tm@50a000 ]
spe@100000 [ spe@0 , spe@200000 ]
spe@200000 [ spe@100000 , spe@300000 ]
spe@300000 [ spe@200000 , bif0@512000 ]
spe@80000 [ spe@180000 , mic-tm@50a000 ]
spe@180000 [ spe@80000 , spe@280000 ]
spe@280000 [ spe@180000 , spe@380000 ]
spe@380000 [ spe@280000 , bif0@512000 ]
Only spe@* have a vicinity property (e.g., bif0@512000 and
mic-tm@50a000 do not have it).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch makes the scheduller honor affinity information for each
context being scheduled. If the context has no affinity information,
behaviour is unchanged. If there are affinity information, context is
schedulled to be run on the exact spu recommended by the affinity
placement algorithm.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch allows the use of spu affinity on QS20, whose
original FW does not provide affinity information.
This is done through two hardcoded arrays, and by reading the reg
property from each spu.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds affinity data to each spu instance.
A doubly linked list is created, meant to connect the spus
in the physical order they are placed in the BE. SPUs
near to memory should be marked as having memory affinity.
Adjustments of the fields acording to FW properties is done
in separate patches, one for CPBW, one for Malta (patch for
Malta under testing).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Addition of a spufs-global "cbe_info" array. Each entry contains information
about one Cell/B.E. node, namelly:
* list of spus (both free and busy spus are in this list);
* list of free spus (replacing the static spu_list from spu_base.c)
* number of spus;
* number of reserved (non scheduleable) spus.
SPE affinity implementation actually requires only access to one spu per
BE node (since it implements its own pointer to walk through the other spus
of the ring) and the number of scheduleable spus (n_spus - non_sched_spus)
However having this more general structure can be useful for other
functionalities, concentrating per-cbe statistics / data.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch exports per-context statistics in spufs as long as spu
statistics in sysfs.
It was formed by merging:
"spufs: add spu stats in sysfs" From: Christoph Hellwig
"spufs: add stat file to spufs" From: Christoph Hellwig
"spufs: fix libassist accounting" From: Jeremy Kerr
"spusched: fix spu utilization statistics" From: Luke Browning
And some adjustments by myself, after suggestions on cbe-oss-dev.
Having separate patches was making the review process harder
than it should, as we end up integrating spus and ctx statistics
accounting much more than it was on the first implementation.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds support for investigating spus information after a
kernel crash event, through kdump vmcore file.
Implementation is based on xmon code, but the new functionality was
kept independent from xmon.
Signed-off-by: Lucio Jose Herculano Correia <luciojhc@br.ibm.com>
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Let spu_management_ops.enumerate_spus() return the number of found SPEs
and use that information to draw some little helper penguin logos.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-By: James Simmons <jsimmons@infradead.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a shutdown method to spu_sysdev_class to allow proper spu resource
cleanup on system shutdown. This is needed to support kexec on the PS3
platform.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The basic issue is to be able to do what hugetlbfs does but with
different page sizes for some other special filesystems; more
specifically, my need is:
- Huge pages
- SPE local store mappings using 64K pages on a 4K base page size
kernel on Cell
- Some special 4K segments in 64K-page kernels for mapping a dodgy
type of powerpc-specific infiniband hardware that requires 4K MMU
mappings for various reasons I won't explain here.
The main issues are:
- To maintain/keep track of the page size per "segment" (as we can
only have one page size per segment on powerpc, which are 256MB
divisions of the address space).
- To make sure special mappings stay within their allotted
"segments" (including MAP_FIXED crap)
- To make sure everybody else doesn't mmap/brk/grow_stack into a
"segment" that is used for a special mapping
Some of the necessary mechanisms to handle that were present in the
hugetlbfs code, but mostly in ways not suitable for anything else.
The patch relies on some changes to the generic get_unmapped_area()
that just got merged. It still hijacks hugetlb callbacks here or
there as the generic code hasn't been entirely cleaned up yet but
that shouldn't be a problem.
So what is a slice ? Well, I re-used the mechanism used formerly by our
hugetlbfs implementation which divides the address space in
"meta-segments" which I called "slices". The division is done using
256MB slices below 4G, and 1T slices above. Thus the address space is
divided currently into 16 "low" slices and 16 "high" slices. (Special
case: high slice 0 is the area between 4G and 1T).
Doing so simplifies significantly the tracking of segments and avoids
having to keep track of all the 256MB segments in the address space.
While I used the "concepts" of hugetlbfs, I mostly re-implemented
everything in a more generic way and "ported" hugetlbfs to it.
Slices can have an associated page size, which is encoded in the mmu
context and used by the SLB miss handler to set the segment sizes. The
hash code currently doesn't care, it has a specific check for hugepages,
though I might add a mechanism to provide per-slice hash mapping
functions in the future.
The slice code provide a pair of "generic" get_unmapped_area() (bottomup
and topdown) functions that should work with any slice size. There is
some trickiness here so I would appreciate people to have a look at the
implementation of these and let me know if I got something wrong.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use DEFINE_SPINLOCK instead of initializing spinlocks to
SPIN_LOCK_UNLOCKED, since DEFINE_SPINLOCK is better for lockdep.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This change fixes the case where spu_base and spufs are initialised on a
system with no SPEs - unconditionally create the spu_lists so spu_alloc
doesn't explode, and check for spu_management ops before starting spufs.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
arch/powerpc/platforms/cell/spu_base.c | 7 ++++---
arch/powerpc/platforms/cell/spufs/inode.c | 5 +++++
2 files changed, 9 insertions(+), 3 deletions(-)
spu_base.c is always built into the kernel image, so there is no need
for a cleanup function. And some of the things it does are in the
way for my following patches, so I'd rather get rid of it ASAP.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Until now, we have always entered the spu page fault handler
with a mutex for the spu context held. This has multiple
bad side-effects:
- it becomes impossible to suspend the context during
page faults
- if an spu program attempts to access its own mmio
areas through DMA, we get an immediate livelock when
the nopage function tries to acquire the same mutex
This patch makes the page fault logic operate on a
struct spu_context instead of a struct spu, and moves it
from spu_base.c to a new file fault.c inside of spufs.
We now also need to copy the dar and dsisr contents
of the last fault into the saved context to have it
accessible in case we schedule out the context before
activating the page fault handler.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
There is no reason to execute spu_init_channels under spu_mutex
after the spu has been taken off the freelist it's ours.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The SPU code doesn't properly invalidate SPUs SLBs when necessary,
for example when changing a segment size from the hugetlbfs code. In
addition, it saves and restores the SLB content on context switches
which makes it harder to properly handle those invalidations.
This patch removes the saving & restoring for now, something more
efficient might be found later on. It also adds a spu_flush_all_slbs(mm)
that can be used by the core mm code to flush the SLBs of all SPEs that
are running a given mm at the time of the flush.
In order to do that, it adds a spinlock to the list of all SPEs and move
some bits & pieces from spufs to spu_base.c
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
spu->register_lock should be held before accessing registers.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds a platform specific spu management abstraction and the coresponding
routines to support the IBM Cell Blade. It also removes the hypervisor only
resources that were included in struct spu.
Three new platform specific routines are introduced, spu_enumerate_spus(),
spu_create_spu() and spu_destroy_spu(). The underlying design uses a new
type, struct spu_management_ops, to hold function pointers that the platform
setup code is expected to initialize to instances appropriate to that platform.
For the IBM Cell Blade support, I put the hypervisor only resources that were
in struct spu into a platform specific data structure struct spu_pdata.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
When we attempt an MFC DMA to an unmapped address, the event
returned from spu_run should be SPE_EVENT_SPE_DATA_STORAGE,
not SPE_EVENT_INVALID_DMA.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace the use of the platform specific variable spu.nid with the
platform independednt variable spu.node.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This fixes a typo in the "new style" code for mapping SPE resources,
which causes it to try to map the same resource 4 times.
It also adds some pr_debug's that are useful to track down issues with
the firmware when bringinh up new machines.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds support for stopping, and restarting, spus
from xmon. We use the spu master runcntl bit to stop execution,
this is apparently the "right" way to control spu execution and
spufs will be changed in the future to use this bit.
Testing has shown that to restart execution we have to turn the
master runcntl bit on and also rewrite the spu runcntl bit, even
if it is already set to 1 (running).
Stopping spus is triggered by the xmon command 'ss' - "spus stop"
perhaps. Restarting them is triggered via 'sr'. Restart doesn't
start execution on spus unless they were running prior to being
stopped by xmon.
Walking the spu->full_list in xmon after a panic, would mean
corruption of any spu struct would make all the others
inaccessible. To avoid this, and also to make the next patch
easier, we cache pointers to all spus during boot.
We attempt to catch and recover from errors while stopping and
restarting the spus, but as with most xmon functionality there are
no guarantees that performing these operations won't crash xmon
itself.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
In order to add sysfs attributes to all spu's, there is a
need for a list of all available spu's. Adding the device_node
makes also sense, as it is needed for proper register access.
This patch also adds two functions to create and remove sysfs
attributes and attribute_groups to all spus.
That allows to group spu attributes in a subdirectory like:
/sys/devices/system/spu/spuX/group_name/what_ever
This will be used by cbe_thermal to group all attributes dealing with
thermal support in one directory.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds general support for isolated mode SPE apps.
Isolated apps are started indirectly, by a dedicated loader "kernel".
This patch starts the loader when spe_create is invoked with the
ISOLATE flag. We do this at spe_create time to allow libspe to pass the
isolated app in before calling spe_run.
The loader is read from the device tree, at the location
"/spu-isolation/loader". If the loader is not present, an attempt to
start an isolated SPE binary will fail with -ENODEV.
Update: loader needs to be correctly aligned - copy to a kmalloced buf.
Update: remove workaround for systemsim/spurom 'L-bit' bug, which has
been fixed.
Update: don't write to runcntl on spu_run_init: SPU is already running.
Update: do spu_setup_isolated earlier
Tested on systemsim.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove the mostly unused variable isrc from struct spu and a forgotten
function declaration.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
SPRN_SDR1 and the SPE's MFC SDR are hypervisor resources and
are not accessible from a logical partition. This change adds an
access wrapper.
When running on bare H/W, the spufs needs to only set the SPE's MFC SDR
to the value of the PPE's SPRN_SDR1 once at SPE initialization, so this
change renames mfc_sdr_set() to mfc_sdr_setup() and moves the
access of SPRN_SDR1 into the mmio wrapper. It also removes the now
unneeded member mfc_sdr_RW from struct spu_priv1_collapsed.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
--
Signed-off-by: Paul Mackerras <paulus@samba.org>
The SPU code will crash if CONFIG_NUMA is not set and SPUs are found on
a non-0 node. This workaround will ignore those SPEs and just print an
message in the kernel log.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The properties we used traditionally in the device tree are somewhat
nonstandard. This adds support for a more conventional format using
'interrupts' and 'reg' properties.
The interrupts are specified in three cells (class 0, 1 and 2) and
registered at the interrupt-parent.
The reg property contains either three or four register areas in the
order 'local-store', 'problem', 'priv2', and 'priv1', so the priv1 one
can be left out in case of hypervisor driven systems that access these
through hcalls.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Any firmware that still uses the 'spc' nodes already
stopped running for other reasons, so let's get rid of this.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This tries to fix spufs so we have an interface closer to what is
specified in the man page for events returned in the third argument of
spu_run.
Fortunately, libspe has never been using the returned contents of that
register, as they were the same as the return code of spu_run (duh!).
Unlike the specification that we never implemented correctly, we now
require a SPU_CREATE_EVENTS_ENABLED flag passed to spu_create, in
order to get the new behavior. When this flag is not passed, spu_run
will simply ignore the third argument now.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch adds NUMA support to the the spufs scheduler.
The new arch/powerpc/platforms/cell/spufs/sched.c is greatly
simplified, in an attempt to reduce complexity while adding
support for NUMA scheduler domains. SPUs are allocated starting
from the calling thread's node, moving to others as supported by
current->cpus_allowed. Preemption is gone as it was buggy, but
should be re-enabled in another patch when stable.
The new arch/powerpc/platforms/cell/spu_base.c maintains idle
lists on a per-node basis, and allows caller to specify which
node(s) an SPU should be allocated from, while passing -1 tells
spu_alloc() that any node is allowed.
Since the patch removes the currently implemented preemptive
scheduling, it is technically a regression, but practically
all users have since migrated to this version, as it is
part of the IBM SDK and the yellowdog distribution, so there
is not much point holding it back while the new preemptive
scheduling patch gets delayed further.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>