Currently, an empty spufs root inode has nlink count of 1. However,
the directory has two links; / -> spu and /spu/ -> .
This change increments the link count of the root inode in spufs.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Use the struct device's numa_node instead; use accessor functions
to get/set numa_node.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
We currently have a race when scheduling a context to a SPE -
after we have found a runnable context in spusched_tick, the same
context may have been scheduled by spu_activate().
This may result in a panic if we try to unschedule a context that has
been freed in the meantime.
This change exits spu_schedule() if the context has already been
scheduled, so we don't end up scheduling it twice.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We currently have a race for a free SPE. With one thread doing a
spu_yield(), and another doing a spu_activate():
thread 1 thread 2
spu_yield(oldctx) spu_activate(ctx)
__spu_deactivate(oldctx)
spu_unschedule(oldctx, spu)
spu->alloc_state = SPU_FREE
spu = spu_get_idle(ctx)
- searches for a SPE in
state SPU_FREE, gets
the context just
freed by thread 1
spu_schedule(ctx, spu)
spu->alloc_state = SPU_USED
spu_schedule(newctx, spu)
- assumes spu is still free
- tries to schedule context on
already-used spu
This change introduces a 'free_spu' flag to spu_unschedule, to indicate
whether or not the function should free the spu after descheduling the
context. We only set this flag if we're not going to re-schedule
another context on this SPU.
Add a comment to document this behaviour.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Commit 8d5636fbca introduced a reference
count on SPU contexts during find_victim, but this may cause a leak in
the reference count if we later find a better contender for a context to
unschedule.
Change the reference to after we've found our victim context, so we
don't do the extra get_spu_context().
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Based on an original patch from Christoph Hellwig <hch@lst.de>.
Currently, there is a possible reference-after-free in the spusched
code - contexts may be freed after we have released their state_mutex
in spusched_tick and find_victim.
This change takes a reference to the context before releasing the
mutex, so that the context doesn't get destroyed.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, spu_run ignores the npc argument for contexts created with
SPU_CREATE_NOSCHED. While this is correct for isolated contexts,
there's no need to enforce the npc restriction on non-isolated NOSCHED
contexts.
This means that NOSCHED contexts can only ever run with an entry point
of 0x0.
This change to spu_run_init allows setting of the npc (and, while we're
at it, the privcntl) for non-isolated NOSCHED contexts. This allows
us to run NOSCHED contexts from any entry point.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Ingo Molnar provided a fix to not call _PPC at processor driver
initialization time in "[PATCH] ACPI: fix cpufreq regression" (git
commit e4233dec74)
But it can still happen that _PPC is called at processor driver
initialization time.
This patch should make sure that this is not possible anymore.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.
Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c
This is flag day, yes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:
This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.
If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.
The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.
The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.
This patch:
dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.
Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
argument.
[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To support Cooperative Memory Overcommitment (CMO), we need to check
for failure from some of the tce hcalls.
These changes for the pseries platform affect the powerpc architecture;
patches for the other affected platforms are included in this patch.
pSeries platform IOMMU code changes:
* platform TCE functions must handle H_NOT_ENOUGH_RESOURCES errors and
return an error.
Architecture IOMMU code changes:
* Calls to ppc_md.tce_build need to check return values and return
DMA_MAPPING_ERROR for transient errors.
Architecture changes:
* struct machdep_calls for tce_build*_pSeriesLP functions need to change
to indicate failure.
* all other platforms will need updates to iommu functions to match the new
calling semantics; they will return 0 on success. The other platforms
default configs have been built, but no further testing was performed.
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: Olof Johansson <olof@lixom.net>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At the moment the fixed mapping is by default strongly ordered (the
iommu_fixed=weak boot option must be used to make the fixed mapping weakly
ordered). If we're on a setup where the southbridge is being used in
endpoint mode (triblade and CAB boards) the default should be a weakly
ordered fixed mapping.
This adds a check so that if a node of type pcie-endpoint can be found in
the device tree the fixed mapping is set to be weak by default (but can be
overridden using iommu_fixed=strong).
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This uses the new vm_ops->access to allow gdb to access the SPU local
store. We currently prevent access to problem state registers, this can
be done later if really needed but it's safer not to.
[akpm@linux-foundation.org: fix typo]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adjusts the placement of a reference context from
a spu affinity chain. The reference context can now be placed
only on nodes that have enough spus not intended to be used by
another gang (already running on the node).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currenlt,, it is possible to lock aff_mutex and
cbe_spu_info[n].list_mutex in different orders, allowing a deadlock to
occur. With this change, aff_mutex is not taken within a list_mutex
critical section anymore.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
kcalloc is supposed to be called with the count as its first argument and
the element size as the second.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (49 commits)
powerpc: Fix build bug with binutils < 2.18 and GCC < 4.2
powerpc/eeh: Don't panic when EEH_MAX_FAILS is exceeded
fbdev: Teaches offb about palette on radeon r5xx/r6xx
powerpc/cell/edac: Log a syndrome code in case of correctable error
powerpc/cell: Add DMA_ATTR_WEAK_ORDERING dma attribute and use in Cell IOMMU code
powerpc: Indicate which oprofile counters to use while in compat mode
powerpc/boot: Change spaces to tabs
powerpc: Remove duplicate 6xx option in Kconfig
powerpc: Use PPC_LONG and PPC_LONG_ALIGN in lib/string.S
powerpc: Use PPC_LONG_ALIGN in uaccess.h
powerpc: Add a #define for aligning to a long-sized boundary
powerpc: Fix OF parsing of 64 bits PCI addresses
powerpc: Use WARN_ON(1) instead of __WARN()
powerpc: Fix support for latencytop
powerpc/ps3: Update ps3_defconfig
powerpc/ps3: Add a sub-match id to ps3_system_bus
powerpc: Add a 6xx defconfig
powerpc/dma: Use the struct dma_attrs in iommu code
powerpc/cell: Add support for power button of future IBM cell blades
powerpc/cell: Cleanup sysreset_hack for IBM cell blades
...
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.
I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.
Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Introduce a new dma attriblue DMA_ATTR_WEAK_ORDERING to use weak ordering
on DMA mappings in the Cell processor. Add the code to the Cell's IOMMU
implementation to use this code.
Dynamic mappings can be weakly or strongly ordered on an individual basis
but the fixed mapping has to be either completely strong or completely weak.
This is currently decided by a kernel boot option (pass iommu_fixed=weak
for a weakly ordered fixed linear mapping, strongly ordered is the default).
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Update iommu_alloc() to take the struct dma_attrs and pass them on to
tce_build(). This change propagates down to the tce_build functions of
all the platforms.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds support for the power button on future IBM cell blades.
It actually doesn't shut down the machine. Instead it exposes an
input device /dev/input/event0 to userspace which sends KEY_POWER
if power button has been pressed.
haldaemon actually recognizes the button, so a plattform independent acpid
replacement should handle it correctly.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds a config option for the sysreset_hack used for
IBM Cell blades. The code is moves from pervasive.c into ras.c and
gets it's own init method.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch adds a cpufreq governor that takes the number of running spus
into account. It's very similar to the ondemand governor, but not as complex.
Instead of hacking spu load into the ondemand governor it might be easier to
have cpufreq accepting multiple governors per cpu in future.
Don't know if this is the right way, but it would keep the governors simple.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
It is okay for both _PAGE_GUARDED and _PAGE_COHERENT (G and M) to be set
in the same pte. In fact, even if that were not the case, there doesn't
seem to be any place where G is set without also setting I (_PAGE_NO_CACHE),
so the test for I is sufficient as a condition to clear _PAGE_COHERENT
when filling the hash table.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Make cell_dma_dev_setup_iommu() return a pointer to the struct iommu_table
(or NULL if no table can be found) rather than putting this pointer into
dev->archdata.dma_data (let the caller do that), and rename this function
to cell_get_iommu_table() to reflect this change.
This will allow us to get the iommu table for a device that doesn't have
the table in the archdata.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
As nr_active counter includes also spus waiting for syscalls to return
we need a seperate counter that only counts spus that are currently running
on spu side. This counter shall be used by a cpufreq governor that targets
a frequency dependent from the number of running spus.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently, the .ctx debug file in spu context directories is always
present.
We'd prefer to prevent users from relying on this file, so add a
"debug" mount option to spufs. The .ctx file will only be added to
the context directories when this option is present.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Populate the size member of a few context files. Leave out files that
have different semantics with read vs mmap, or contain a
variable-length hex string.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, spufs never specifies the i_size for the files in context
directories, so stat() always reports 0-byte files.
This change adds allows the spufs_dir_(nosched_)contents arrays to
specify a file size. This allows stat() to report correct file sizes,
and makes SEEK_END work.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
An spu context shouldn't get an extra tick if the time slice code
couldn't find something else to run. This means contexts that are not
within spu_run (ie, SPU_SCHED_SPU_RUN is cleared) will not receive
extra ticks while we have no other contexts waiting.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Add a ctxt file to spufs that shows spu context information that is used
in scheduling. This info can be used for debugging spufs scheduler
issues, and to isolate between application and spufs problems as it
shows a lot of state such as priorities and dispatch counts.
This file contains internal spufs state and is subject to change at any
time, and therefore no applications should depend on it. The file is
intended for the use of spufs kernel developers.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We need to disable ptcal before starting a new kernel after a crash,
in order to avoid overwriting data in the kdump kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This converts ppc to use the new helpers for smp_call_function() and
friends, and adds support for smp_call_function_single().
ppc loses the timeout functionality of smp_call_function_mask() with
this change, as the generic code does not provide that.
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
There is a delay in the transition to the stopped state for class 2
interrupts. In some cases, the controlling thread detects the state of
the spu as running, and goes back to sleep resulting in a hung
application as the event is missed.
This change detects the stop condition and re-generates the wakeup event
after a context save.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Time slicing can occur at the same time as spu exception handling
resulting in the wakeup of the wrong thread.
This change uses the the spu's register_lock to enforce synchronization
between bind/unbind and spu exception handling so that they are
mutually exclusive.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
According to the CBEA, the SPU dsisr is not updated for class 0
exceptions.
spu_stopped() is testing the dsisr that was passed to it from the class
0 exception handler, so we return a false positive here.
This patch cleans up the interrupt handler and erroneous tests in
spu_stopped. It also removes the fields from the csa since it is not
needed to process class 0 events.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
If the spu is stopping (ie, the SPU_STATUS_RUNNING bit is still set),
re-read the register to get the final stopped value.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
When I changed irq_alloc_host() to take an of_node
(52964f87c64e6c6ea671b5bf3030fb1494090a48: "Add an optional
device_node pointer to the irq_host"), I botched the reference
counting semantics.
Stephen pointed out that it's irq_alloc_host()'s business if
it needs to take an additional reference to the device_node,
the caller shouldn't need to care.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If we do the call to irq_of_parse_and_map() first, then we don't
need to worry about freeing the irq_host.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds some debugging code to the Axon MSI driver. It creates a
file per MSIC in /sys/kernel/debug/powerpc, which allows the user to
trigger a fake MSI interrupt by writing to the file.
This can be used to test some of the MSI generation path. In
particular, that the MSIC recognises a write to the MSI address,
generates an interrupt and writes the MSI packet into the ring buffer.
All the code is inside #ifdef DEBUG so it causes no harm unless it's
enabled.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Fix following warnings:
WARNING: arch/powerpc/platforms/cell/built-in.o(.devinit.text+0x9c): Section mismatch in reference from the function .cell_setup_phb() to the function .init.text:.iowa_register_bus()
WARNING: arch/powerpc/platforms/cell/built-in.o(.devinit.text+0xa4): Section mismatch in reference from the function .cell_setup_phb() to the function .init.text:.io_workaround_init()
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Signed-off-by: Paul Mackerras <paulus@samba.org>
With CONFIG_VIRT_CPU_ACCOUNTING disabled, I got the following error:
linux-2.6/arch/powerpc/platforms/cell/spufs/file.c: In function 'spu_switch_log_notify':
linux-2.6/arch/powerpc/platforms/cell/spufs/file.c:2542: error: implicit declaration of function 'get_tb'
make[4]: *** [arch/powerpc/platforms/cell/spufs/file.o] Error 1
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If victim (not ctx) is in spu_run, add victim to rq.
Signed-off-by: Luke Browning <lukebrowning@us.ibm.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We need to acquire the parent i_mutex with I_MUTEX_PARENT to keep
lockdep happy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We should not requeue the victim context in find_victim if the owner is
not in spu_run. It's first not needed because leaving the context on
the spu is an optimization and second is harmful because it means the
owner could re-enter spu_run when the context is on the runqueue and
trip the BUG_ON in __spu_update_sched_info.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Creating a spufs context or gand using spu_create should send an inotify
event so that things like performance monitors have an easy way to find
out about newly created contexts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, page fault handlers don't issue a mfc restart if the context
switch pending flag is set, which can leave us with a hanging DMA after
a context restore.
This patch introduces fault pending flag that is set by the fault
handler and read by the context switch code, so that the latter can add
the restart bit at the right spot, after it has successfuly saved the
state of the mfc control register.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
SPU class 0 & 1 exceptions may occur in parallel, so we may end up
overwriting csa.dsisr.
This change adds dedicated fields for each class to the spu and the spu
context so that fault data is not overwritten.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we re-route SPU interrupts to the current cpu, which may be
on a remote node. In the case of time slicing, all spu interrupts will
end up routed to the same cpu, where the spusched_tick occurs.
This change routes mfc interrupts to the cpu where the controlling
thread last ran, provided that cpu is on the same node as the spu
(otherwise don't reroute interrupts).
This should improve performance and provide a more predictable
environment for processing spu exceptions. In the past we have seen
concurrent delivery of spu exceptions to two cpus. This eliminates that
concern.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
synchronize_irq() provides the serialization for
SPU_CONTEXT_SWITCH_PENDING which is read with a simple load. This
routine guarantees that the relevant interrupt handlers are not running,
so that the next time they do run they will see the update
memory value.
This must be done correctly so that exception handling code does not
restart the mfc in the middle of a context switch while we are trying
to atomically stop it and save state.
Signed-off-by: Luke Browning <lukebr@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There's currently no way to tell if spu_process_callback has
returned with the state mutex held, as -EINTR may be returned
by either the syscall or the spu_acquire fail case.
Instead, just do a non-interruptible mutex_lock here.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we update the SPU master run control bit (ie,
spu_enable_spu) in spufs_run_spu before we grab the context mutex. This
can result in races with other processes accessing this context's
resources.
This change moves the spu_enable_spu to after we have acquired the
context lock.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We currently have two issues with the MFC save code:
* save_mfc_decr doesn't handle a transition of 1 -> 0 of the Ds bit
* The Q bit may be stale in the CSA
This change fixes the first issue by clearing the relevant bits from
the MFC_CNTL value in the CSA before or-ing in the updated status.
Also, we add the Q bit to the updated status.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Currently, we can introduce invalid entries into the MFC queues:
1) context starts a DMA
2) context gets scheduled out during a DMA
- kernel saves MFC queue to CSA
- kernel saves 0x0 in csa->mfc_control_RW
3) context gets scheduled in
- csa->mfc_control[Q] ('queues empty') isn't set, so DMA queues are
restored from the CSA
4) context's DMA is completed
5) context gets scheduled out again, no DMA occuring this time
- kernel sees that MFC_CNTL[Q] ('queues empty') is set, so doesn't
touch saved queue data in CSA
- kernel saves 0x0 in csa->mfc_control_RW
6) context gets scheduled in
- csa->mfc_control[Q] ('queues empty') isn't set (we saved is as 0!),
so DMA queues are restored from the CSA
In this last restore, we've restored the queue status from step 2,
which are now invalid.
This change makes save_mfc_cntl() closer to the save/restore sequence,
as specified in the CBE handbook.
With changes from Luke Browning.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
When we issue a MFC purge request, we may inadvertantly clear the
suspended status.
This change adds the MFC_CNTL_SUSPEND_MASK when we issue a purge
request, so that the suspend bit is masked out.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
We may currently lose interrupts during SPE context switch, as we alter
the INT_Route register. Because the IIC uses a per-thread priority
status, changing the interrupt routing to a different thread means that
the IRQ is no longer masked by the priority status, so we end up with
two fasteoi IRQ handlers executing for the one irq_desc. The fasteoi
handler doesn't handle multiple IRQs, so drops the second one.
Fix this by using our own flow handler. This is based on
handle_edge_irq, but issues an eoi after IRQs are handled, and doesn't
do any mask/unmasking.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The sputrace module contained a trace entry for spu_acquire_saved, but
this marker was not placed anywhere. Fix this by adding a marker to the
routine.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Fix a typo in the marker for the find_victim function, which prevented
it from being traced. It previously read find_vitim.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The sputrace module contained a reference to a marker for
destroy_spu_context, but this marker did not appear in the code. Fix
this by adding a marker in the function.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The markers facility defines the marker parameters to be of the form
'name %format'. Add parameter names to sputrace, to specify the context
and %spu paramerters, instead of just specifying the '%format' part.
Signed-off-by: Julio M. Merino Vidal <jmerino@ac.upc.edu>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
There are userspace instrumentation tools that need to monitor spu
context switches. This patch adds a new file called 'switch_log' to
each spufs context directory that can be used to monitor the context
switches.
Context switch in/out and exit from spu_run are monitored after the
file was first opened and can be read from it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data
be setup before gluing PDE to main tree.
Add correct ->owner to proc_fops to fix reading/module unloading race.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds support for PCI Express port on Celleb. I/O space of this
PCI Express port is not mapped in memory space. So we use the
io-workaround mechanism to make accesses indirect.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves miscellaneous files for Beat into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves SPU support code on Beat into platforms/cell/.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves files for mmu and iommu on Beat into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves files for Beat hvcall interfaces into platforms/cell/.
All files in this patch are used by celleb-beat only.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the SCC (Super Companion Chip) related code for celleb
into platforms/cell/.
All files in this patch are used by celleb-beat and celleb-native
commonly.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the base code for celleb support into platforms/cell/.
All files in this patch are used by celleb-beat and celleb-native
commonly.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Now, we can use generic io-workarounds mechanism and the workaround
code for spider-pci. This changes Celleb PCI code to use spider-pci
code.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This splits cell io-workaround code into spider-pci dependent code and
a generic part, and also moves io-workarounds initialization into
cell_setup_phb.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace two open-coded occurences of the of_get_next_parent() logic.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
[POWERPC] Fix compile breakage for 64-bit UP configs
[POWERPC] Define copy_siginfo_from_user32
[POWERPC] Add compat handler for PTRACE_GETSIGINFO
[POWERPC] i2c: Fix build breakage introduced by OF helpers
[POWERPC] Optimize fls64() on 64-bit processors
[POWERPC] irqtrace support for 64-bit powerpc
[POWERPC] Stacktrace support for lockdep
[POWERPC] Move stackframe definitions to common header
[POWERPC] Fix device-tree locking vs. interrupts
[POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
[POWERPC] Remove unused __max_memory variable
[POWERPC] Simplify xics direct/lpar irq_host setup
[POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
[POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
[POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
[POWERPC] Use asm-generic/bitops/find.h in bitops.h
[POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
[POWERPC] 85xx: Fix the size of qe muram for MPC8568E
[POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
[POWERPC] 86xx: mark functions static, other minor cleanups
...
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, ppu-gdb can't trace spu infomation with coredump generated
by the kernel. While the core dumps notes have correct contents, they
have the wrong names, as the file descriptors used to generate the note
names are off-by-one. An application that opens a SPE context as fd 3,
the current core dump code will generate notes like:
SPU/4/mem
SPU/4/regs
etc.
This confuses GDB, which knows it is looking for SPE context 3 (from
parsing the spu_context_run system call arguments), and cannot find
any notes that match context 3.
This change corrects the file descriptor counting, to only increment
the fd until after we've written the note name.
Signed-off-by: Gerhard Stenzel <stenzel@de.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
During the context save process, we currently save the MFC command
channel after purging the MFC queues. This causes a systemsim warning,
as the command channel may be in an unknown state after the purge.
This change does the save before purging the MFC queues.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
During spu_process callback, we release then acquire the SPU, but keep a
pointer to the local store memory. Since the context may have been
scheduled out during the callback, the ls pointer may become invalid.
This change reacquires the pointer to the context local store after
spu_acquire()-ing, so that it isn't invalidated by a context switch.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
All of the single-value files in spufs are terminated by a newline,
except for signal1_type and signal2_type.
This change adds a trailing newline to these two files.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
The PCI bridge representing the PCIE root complex on Axon, contains
device BARs for a memory range and ROM that define inbound accesses.
This confuses the kernel resource management code -- the resources
need to be hidden when Axon is a host bridge.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The cell IOMMU code to parse the dma-ranges properties, used for the fixed
mapping, was broken in two ways for some devices.
Firstly it didn't cope with empty dma-ranges properties. An empty property
implies no translation so can be safely skipped.
The code also wrongly assumed it would be looking at PCI devices, and hard
coded the number of address and size cells.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At present, we can hit the BUG_ON in __spu_update_sched_info by reading
the regs file of a context between two calls to spu_run. The
spu_release_saved called by spufs_regs_read() is resulting in the (now
non-runnable) context being placed back on the run queue, so the next
call to spu_run ends up in the bug condition.
This change uses the SPU_SCHED_SPU_RUN flag to only reschedule a context
if it's still in spu_run().
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
commit 4ef11014 introduced a usage of SCHED_IDLE to detect when
a context is within spu_run.
Instead of SCHED_IDLE (which has other meaning), add a flag to
sched_flags to tell if a context should be running.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>