Using ata_link_warn() instead of ata_link_printk().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Using ata_dev_info() instead of ata_dev_printk().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Using ata_<foo>_<level>() instead of ata_<foo>_printk().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
These structs are used only for ahci_platform.c, so they should be
static. Thanks to Fengguang for the (automated) suggestion.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This reverts commit 1645bf1b51.
Brian Norris writes:
> David Daney writes:
> I can seem to find it. Without knowing what that does, I would be inclined
> to NACK the whole thing.
A NACK is probably the right thing. I was mostly converting a few
other drivers which used some simple, common patterns to use my new
common code, but this driver was missing it altogether. It looks like
there may be bigger issues, though, as you point out.
> This patch is likely to be incomplete as the driver is also missing the
> module_exit() things.
>
> It might be simpler to just make the driver "bool" instead of "tristate" in
> the Kconfig.
As noted earlier, I don't have much interest in this driver. I agree
that there are some other issues with the driver; I think it leaks
memory if it is ever allowed to unload, for one. Feel free to submit
an alternative patch to prevent this driver from being built as a
module.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This reverts commit de90cd71f6.
Shane Huang writes:
Please suspend this patch because I just received two new
DevSlp drives but found word 78 bit 5 is _not_ set.
I'm checking with the drive vendor whether he gave me
the wrong information. If bit 5 is not the necessary and
sufficient condition, I will implement another patch to
replace ata_device->sata_settings into ->devslp_timing.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
I failed to include <linux/libata.h>, causing this error:
drivers/ata/pata_of_platform.c:93: error: 'ata_platform_remove_one' undeclared here (not in a function)
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This driver does not detach and remove its ata_host properly on device
removal. Add the common .remove helper.
Note: I do not know this driver well enough to ensure this is the right
thing to do. Merge this patch with caution.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
All users of __pata_platform_remove() have been converted to utilize the
common ata_platform_remove_one().
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This relatively simple boiler-plate code is repeated in several platform
drivers. We should implement a common version in libata.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
AHCI platform devices may provide an exit() routine, via
ahci_platform_data, that powers off the SATA core. Such a routine should
be executed from the ata_port_operations host_stop() hook. That way, the
ATA subsystem can perform any last-minute hardware cleanup (via devres,
for example), then trigger the power-off at the appropriate time.
This patch fixes bus errors triggered during module removal or device
unbinding, seen on an SoC SATA core.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The ahci_platform driver can now use the module_platform_driver() macro.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
platform_driver_probe() should be used for registering this driver only
if we want to
"...remove its run-once probe() infrastructure from memory after the
driver has bound to the device."
However, we may want to leave the probe infrastructure in place in order
to support binding/unbinding a device dynamically. This is useful, for
instance, as a power management mechanism, where a device can be totally
powered down when unbound (whereas with runtime power management,
powering down the SATA core would incur unacceptable loss of
functionality).
Thus, convert this driver to use platform_driver_register().
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
ata_device->dma_mode's initial value is zero, which is not a valid dma
mode, but ata_dma_enabled will return true for this value. This patch
sets dma_mode to 0xff in reset function, so that ata_dma_enabled will
not return true for this case, or it will cause problem for pata_acpi.
The corrsponding bugzilla page is at:
https://bugzilla.kernel.org/show_bug.cgi?id=49151
Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Tested-by: Szymon Janc <szymon@janc.net.pl>
Tested-by: Dutra Julio <dutra.julio@gmail.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
NCQ capability was used to check availability of SATA Settings page
from Identify Device Data Log, which contains DevSlp timing variables.
It does not work on some HDDs and leads to error messages.
IDENTIFY word 78 bit 5(Hardware Feature Control) should be used.
Quoting SATA spec 3.1:
If Hardware Feature Control is supported, then:
a) IDENTIFY DEVICE data word 78 bit 5 (see 13.2.1.18) shall be
set to one;
b) the SET FEATURES Select Hardware Feature Control subcommand
shall be supported (see 13.3.8);
c) page 08h of the Identify Device Data log (see 13.7.7) shall
be supported;
This patch is not tested on SATA HDD with DevSlp supported.
Reported-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Shane Huang <shane.huang@amd.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Commit 66fa7f215 "libata-acpi: improve ACPI disabling" introdcued the
behaviour of disabling ATA ACPI if ata_acpi_on_devcfg failed the 2nd
time, but commit 30dcf76ac dropped this behaviour and this caused
problem for Dimitris Damigos, where his laptop can not resume correctly.
The bugzilla page for it is:
https://bugzilla.kernel.org/show_bug.cgi?id=49331
The problem is, ata_dev_push_id will fail the 2nd time it is invoked,
and due to disabling ACPI code is dropped, ata_acpi_on_devcfg which
calls ata_dev_push_id will keep failing and eventually made the device
disabled.
This patch restores the original behaviour, if acpi failed the 2nd time,
disable acpi functionality for the device(and we do not event need to
add a debug message for this as it is still there ;-).
Reported-by: Dimitris Damigos <damigos@freemail.gr>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
dev_<level> calls take less code than dev_printk(KERN_<LEVEL>
and reducing object size is good.
Coalesce formats for easier grep.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
'acdev->qc', 'acdev->qc->ap', and 'acdev->qc->tf' expressions are used multiple
times in this function, so it makes sense to use the local variables for them.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
ahci_highbank_hardreset() uses bare number for the BSY bit of the ATA status
register, despite it is #define'd in <linux/ata.h>
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
... because those functions don't use this parameter.
While at it, correctly align 'total_len' parameter.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The variable addr is initialized but never used
otherwise, so remove the unused variable.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The variable addr is initialized but never used
otherwise, so remove the unused variable.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The variable port_flags is initialized but never used
otherwise, so remove the unused variable.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
I am working on a device which uses the cs5536 pata driver. There
are some broken hardware revisions out in the field, which can be
detected via DMI. On older versions with an embedded BIOS I
used libata.dma=0 to disable dma completely.
Now we are switching to a coreboot/seabios based BIOS where we
have DMI support and so I think its a good idea to get rid of
all those hacky kernel parameters as the same image
is used other devices where libata.dma=0 is not a good idea.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
An earlier commit cd006086fa ("ata_piix:
defer disks to the Hyper-V drivers by default") broke MS Virtual PC
guests. Hyper-V guests and Virtual PC guests have nearly identical DMI
info. As a result the driver does currently ignore the emulated hardware
in Virtual PC guests and defers the handling to hv_blkvsc. Since Virtual
PC does not offer paravirtualized drivers no disks will be found in the
guest.
One difference in the DMI info is the product version. This patch adds a
match for MS Virtual PC 2007 and "unignores" the emulated hardware.
This was reported for openSuSE 12.1 in bugzilla:
https://bugzilla.novell.com/show_bug.cgi?id=737532
Here is a detailed list of DMI info from example guests:
hwinfo --bios:
virtual pc guest:
System Info: #1
Manufacturer: "Microsoft Corporation"
Product: "Virtual Machine"
Version: "VS2005R2"
Serial: "3178-9905-1533-4840-9282-0569-59"
UUID: undefined, but settable
Wake-up: 0x06 (Power Switch)
Board Info: #2
Manufacturer: "Microsoft Corporation"
Product: "Virtual Machine"
Version: "5.0"
Serial: "3178-9905-1533-4840-9282-0569-59"
Chassis Info: #3
Manufacturer: "Microsoft Corporation"
Version: "5.0"
Serial: "3178-9905-1533-4840-9282-0569-59"
Asset Tag: "7188-3705-6309-9738-9645-0364-00"
Type: 0x03 (Desktop)
Bootup State: 0x03 (Safe)
Power Supply State: 0x03 (Safe)
Thermal State: 0x01 (Other)
Security Status: 0x01 (Other)
win2k8 guest:
System Info: #1
Manufacturer: "Microsoft Corporation"
Product: "Virtual Machine"
Version: "7.0"
Serial: "9106-3420-9819-5495-1514-2075-48"
UUID: undefined, but settable
Wake-up: 0x06 (Power Switch)
Board Info: #2
Manufacturer: "Microsoft Corporation"
Product: "Virtual Machine"
Version: "7.0"
Serial: "9106-3420-9819-5495-1514-2075-48"
Chassis Info: #3
Manufacturer: "Microsoft Corporation"
Version: "7.0"
Serial: "9106-3420-9819-5495-1514-2075-48"
Asset Tag: "7076-9522-6699-1042-9501-1785-77"
Type: 0x03 (Desktop)
Bootup State: 0x03 (Safe)
Power Supply State: 0x03 (Safe)
Thermal State: 0x01 (Other)
Security Status: 0x01 (Other)
win2k12 guest:
System Info: #1
Manufacturer: "Microsoft Corporation"
Product: "Virtual Machine"
Version: "7.0"
Serial: "8179-1954-0187-0085-3868-2270-14"
UUID: undefined, but settable
Wake-up: 0x06 (Power Switch)
Board Info: #2
Manufacturer: "Microsoft Corporation"
Product: "Virtual Machine"
Version: "7.0"
Serial: "8179-1954-0187-0085-3868-2270-14"
Chassis Info: #3
Manufacturer: "Microsoft Corporation"
Version: "7.0"
Serial: "8179-1954-0187-0085-3868-2270-14"
Asset Tag: "8374-0485-4557-6331-0620-5845-25"
Type: 0x03 (Desktop)
Bootup State: 0x03 (Safe)
Power Supply State: 0x03 (Safe)
Thermal State: 0x01 (Other)
Security Status: 0x01 (Other)
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
sata_promise's pdc_hard_reset_port() needs to serialize because it
flips a port-specific bit in controller register that's shared by
all ports. The code takes the ata host lock for this, but that's
broken because an interrupt may arrive on our irq during the hard
reset sequence, and that too will take the ata host lock. With
lockdep enabled a big nasty warning is seen.
Fixed by adding private state to the ata host structure, containing
a second lock used only for serializing the hard reset sequences.
This eliminated the lockdep warnings both on my test rig and on
the original reporter's machine.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Tested-by: Adko Branil <adkobranil@yahoo.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Merge misc fixes from Andrew Morton:
"8 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (8 patches)
futex: avoid wake_futex() for a PI futex_q
watchdog: using u64 in get_sample_period()
writeback: put unused inodes to LRU after writeback completion
mm: vmscan: check for fatal signals iff the process was throttled
Revert "mm: remove __GFP_NO_KSWAPD"
proc: check vma->vm_file before dereferencing
UAPI: strip the _UAPI prefix from header guards during header installation
include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALID
Here is a single fix for a reported regression in 3.7-rc5 for the tty
layer. This fix has been in the linux-next tree and solves the reported
problem.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlCz9SgACgkQMUfUDdst+ylwJACgrm6WSMdZy+Dcg+4lY+zLftUq
UhQAn1Y00nwte19cNvJLqWYgJlqkcUBw
=NAlw
-----END PGP SIGNATURE-----
Merge tag 'tty-3.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull TTY fix from Greg Kroah-Hartman:
"Here is a single fix for a reported regression in 3.7-rc5 for the tty
layer. This fix has been in the linux-next tree and solves the
reported problem.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'tty-3.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty vt: Fix a regression in command line edition
We have:
- A twl fix preventing a buffer overflow.
- A wm5102 register patch fix.
- A wm5110 error misreport fix.
- Arizona fixes: Use the right array size when adding subdevices, correctly
report underclocked events, synchronize register cache after reset.
- A twl4030 fix for preventing the system to hang from an interrupt flood.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQs33KAAoJEIqAPN1PVmxKlI8P/2wz99BLHEGG4uEZyRprOnA1
B1I0TYmrHidz9XqX9rB/ytWnyVg3ZumpwJOWETGcpGE842/jw9yKtYcra9R2JZ7H
IdobU2dQxlB+7J5Yoj7SuK9gY4eAN3/krKb7F0x8YKYjHleAwWwjzBVHXsK4I0oJ
wPuN5NnmEs7wqT9y74WgLwZCZmbvjDYBFNW3ZrwD28KJAHKak1FE2RxtiTuwUv4o
21j4s9XKlyI0mNgNPXQ0v5P6KZdceGBwXrjDa1+1onEuOsdXQF0ZHXfIfU24j/TB
rRzcWo8sycAf35HvPdfHOd6vwdmIEx36i9nuJLVUJEAxkkEGKar0ZfIhDXuvGFCm
bVS05+ZdZTbEWgiRRivpOuiL9P5KFElG5N/QhVhpNRkcOCKo3eGn/NDjIuv7hl/p
Y+nJFrqSmdzvdN2MUCh+dXoe4LTb3jdB1qRpNr3VMlyL0I/8w7YW9ypB2bitN+IE
0rZ4oK/BDHHRvQLIr4+frNbhG4OwH7XZmmtdiB10GMLw12akOkO2L0Zm2TTG/68g
asVECywpqEGLBvyn96n0PYO489Obk72VU9oUz1KftIdRTxd9xj+Eo0v9W4lIMb47
rscgjCrWTZs/kar52Em3Wc3q6eHfTG/h8uqN/R/ONwQzPwBXonHk1UxJI5+a1VXK
tvHiz+RvjfH0UW58Bc5w
=tHAv
-----END PGP SIGNATURE-----
Merge tag 'mfd-for-linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
Pull MFD fixes from Samuel Ortiz:
- A twl fix preventing a buffer overflow.
- A wm5102 register patch fix.
- A wm5110 error misreport fix.
- Arizona fixes: Use the right array size when adding subdevices,
correctly report underclocked events, synchronize register cache
after reset.
- A twl4030 fix for preventing the system to hang from an interrupt
flood.
* tag 'mfd-for-linus-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
mfd: twl4030: Fix chained irq handling on resume from suspend
mfd: arizona: Sync regcache after reset
mfd: arizona: Correctly report when AIF2/AIF1 is underclocked
mfd: arizona: Use correct array for ARRAY_SIZE in mfd_add_devices call
mfd: wm5110: Disable control interface error report for WM5110 rev B
mfd: wm5102: Update register patch for latest evaluation
mfd: twl-core: Fix chip ID for the twl6030-pwm module
Pull ARM fixes from Russell King:
"Not much here, just a couple minor/cosmetic fixes and a patch for the
decompressor which fixes problems with modern GCC and CPUs."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7583/1: decompressor: Enable unaligned memory access for v6 and above
ARM: 7572/1: proc-v6.S: fix comment
ARM: 7570/1: quiet down the non make -s output
Pull ext3 regression fix from Jan Kara:
"Fix an ext3 regression introduced during 3.7 merge window. It leads
to deadlock if you stress the filesystem in the right way (luckily
only if blocksize < pagesize)."
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
jbd: Fix lock ordering bug in journal_unmap_buffer()
Dave Jones reported a bug with futex_lock_pi() that his trinity test
exposed. Sometime between queue_me() and taking the q.lock_ptr, the
lock_ptr became NULL, resulting in a crash.
While futex_wake() is careful to not call wake_futex() on futex_q's with
a pi_state or an rt_waiter (which are either waiting for a
futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and
futex_requeue() do not perform the same test.
Update futex_wake_op() and futex_requeue() to test for q.pi_state and
q.rt_waiter and abort with -EINVAL if detected. To ensure any future
breakage is caught, add a WARN() to wake_futex() if the same condition
is true.
This fix has seen 3 hours of testing with "trinity -c futex" on an
x86_64 VM with 4 CPUS.
[akpm@linux-foundation.org: tidy up the WARN()]
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Reported-by: Dave Jones <davej@redat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In get_sample_period(), unsigned long is not enough:
watchdog_thresh * 2 * (NSEC_PER_SEC / 5)
case1:
watchdog_thresh is 10 by default, the sample value will be: 0xEE6B2800
case2:
set watchdog_thresh is 20, the sample value will be: 0x1 DCD6 5000
In case2, we need use u64 to express the sample period. Otherwise,
changing the threshold thru proc often can not be successful.
Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 169ebd9013 ("writeback: Avoid iput() from flusher thread")
removed iget-iput pair from inode writeback. As a side effect, inodes
that are dirty during iput_final() call won't be ever added to inode LRU
(iput_final() doesn't add dirty inodes to LRU and later when the inode
is cleaned there's noone to add the inode there). Thus inodes are
effectively unreclaimable until someone looks them up again.
The practical effect of this bug is limited by the fact that inodes are
pinned by a dentry for long enough that the inode gets cleaned. But
still the bug can have nasty consequences leading up to OOM conditions
under certain circumstances. Following can easily reproduce the
problem:
for (( i = 0; i < 1000; i++ )); do
mkdir $i
for (( j = 0; j < 1000; j++ )); do
touch $i/$j
echo 2 > /proc/sys/vm/drop_caches
done
done
then one needs to run 'sync; ls -lR' to make inodes reclaimable again.
We fix the issue by inserting unused clean inodes into the LRU after
writeback finishes in inode_sync_complete().
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <stable@vger.kernel.org> [3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 5515061d22 ("mm: throttle direct reclaimers if PF_MEMALLOC
reserves are low and swap is backed by network storage") introduced a
check for fatal signals after a process gets throttled for network
storage. The intention was that if a process was throttled and got
killed that it should not trigger the OOM killer. As pointed out by
Minchan Kim and David Rientjes, this check is in the wrong place and too
broad. If a system is in am OOM situation and a process is exiting, it
can loop in __alloc_pages_slowpath() and calling direct reclaim in a
loop. As the fatal signal is pending it returns 1 as if it is making
forward progress and can effectively deadlock.
This patch moves the fatal_signal_pending() check after throttling to
throttle_direct_reclaim() where it belongs. If the process is killed
while throttled, it will return immediately without direct reclaim
except now it will have TIF_MEMDIE set and will use the PFMEMALLOC
reserves.
Minchan pointed out that it may be better to direct reclaim before
returning to avoid using the reserves because there may be pages that
can easily reclaim that would avoid using the reserves. However, we do
no such targetted reclaim and there is no guarantee that suitable pages
are available. As it is expected that this throttling happens when
swap-over-NFS is used there is a possibility that the process will
instead swap which may allocate network buffers from the PFMEMALLOC
reserves. Hence, in the swap-over-nfs case where a process can be
throtted and be killed it can use the reserves to exit or it can
potentially use reserves to swap a few pages and then exit. This patch
takes the option of using the reserves if necessary to allow the process
exit quickly.
If this patch passes review it should be considered a -stable candidate
for 3.6.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following
Hmm, so it's just took longer to hit the problem and observe
kswapd0 spinning on my CPU again - it's not as endless like before -
but still it easily eats minutes - it helps to turn off Firefox
or TB (memory hungry apps) so kswapd0 stops soon - and restart
those apps again. (And I still have like >1GB of cached memory)
kswapd0 R running task 0 30 2 0x00000000
Call Trace:
preempt_schedule+0x42/0x60
_raw_spin_unlock+0x55/0x60
put_super+0x31/0x40
drop_super+0x22/0x30
prune_super+0x149/0x1b0
shrink_slab+0xba/0x510
The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction. That is one part of the
problem but not the root cause as file-backed pages could also be
reclaimed.
The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.
If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided. However, if there
are a storm of THP requests that are simply rejected, it will still be
the the case that kswapd is awake for a prolonged period of time as
pgdat->kswapd_max_order is updated each time. This is noticed by the
main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead
it will loopp, shrinking a small number of pages and calling
shrink_slab() on each iteration.
The temptation is to supply a patch that checks if kswapd was woken for
THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
backed up by proper testing. As 3.7 is very close to release and this
is not a bug we should release with, a safer path is to revert "mm:
remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing
out the balance_pgdat() logic in general.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 7b540d0646 ("proc_map_files_readdir(): don't bother with
grabbing files") switched proc_map_files_readdir() to use @f_mode
directly instead of grabbing @file reference, but same time the test for
@vm_file presence was lost leading to nil dereference. The patch brings
the test back.
The all proc_map_files feature is CONFIG_CHECKPOINT_RESTORE wrapped
(which is set to 'n' by default) so the bug doesn't affect regular
kernels.
The regression is 3.7-rc1 only as far as I can tell.
[gorcunov@openvz.org: provided changelog]
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Strip the _UAPI prefix from header guards during header installation so
that any userspace dependencies aren't affected. glibc, for example,
checks for linux/types.h, linux/kernel.h, linux/compiler.h and
linux/list.h by their guards - though the last two aren't actually
exported.
libtool: compile: gcc -std=gnu99 -DHAVE_CONFIG_H -I. -Wall -Werror -Wformat -Wformat-security -D_FORTIFY_SOURCE=2 -fno-delete-null-pointer-checks -fstack-protector -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -c child.c -fPIC -DPIC -o .libs/child.o
In file included from cli.c:20:0:
common.h:152:8: error: redefinition of 'struct sysinfo'
In file included from /usr/include/linux/kernel.h:4:0,
from /usr/include/linux/sysctl.h:25,
from /usr/include/sys/sysctl.h:43,
from common.h:50,
from cli.c:20:
/usr/include/linux/sysinfo.h:7:8: note: originally defined here
Reported-by: Tomasz Torcz <tomek@pipebreaker.pl>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit baf05aa927 ("bug: introduce BUILD_BUG_ON_INVALID() macro")
introduces this macro only when _CHECKER_ is not defined. Define a
silent macro in the else condition to fix following sparse warning:
mm/filemap.c:395:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
mm/filemap.c:396:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
mm/filemap.c:397:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
include/linux/mm.h:419:9: error: undefined identifier 'BUILD_BUG_ON_INVALID'
include/linux/mm.h:419:9: error: not a function <noident>
Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>