This patch fixes the mmap/truncate race that was fixed for delayed
allocation by merging ext4_{journalled,normal,da}_writepage() into
ext4_writepage().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
It is possible to see buffer_heads which are not mapped in the
writepage callback in the following scneario (where the fs blocksize
is 1k and the page size is 4k):
1) truncate(f, 1024)
2) mmap(f, 0, 4096)
3) a[0] = 'a'
4) truncate(f, 4096)
5) writepage(...)
Now if we get a writepage callback immediately after (4) and before an
attempt to write at any other offset via mmap address (which implies we
are yet to get a pagefault and do a get_block) what we would have is the
page which is dirty have first block allocated and the other three
buffer_heads unmapped.
In the above case the writepage should go ahead and try to write the
first blocks and clear the page_dirty flag. Further attempts to write
to the page will again create a fault and result in allocating blocks
and marking page dirty. If we don't write any other offset via mmap
address we would still have written the first block to the disk and
rest of the space will be considered as a hole.
So to address this, we change all of the places where we look for
delayed, unmapped, or unwritten buffer heads, and only check for
delayed or unwritten buffer heads instead.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Buffer heads outside i_size will be unmapped. So when we
are doing "walk_page_buffers" limit ourself to i_size.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Josef Bacik <jbacik@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
----
The goal inode is specificed by inode number which belongs
to [1; s_inodes_count].
Signed-off-by: Johann Lombardi <johann@sun.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If there is no journal, ext4_should_writeback_data() should return
TRUE. This will fix ext4_set_aops() to set ext4_da_ops in the case of
delayed allocation; otherwise ext4_journaled_aops gets used by
default, which doesn't handle delayed allocation properly.
The advantage of using ext4_should_writeback_data() approach is that
it should handle nobh better as well.
Thanks to Curt Wohlgemuth for investigating this problem, and Aneesh
Kumar for suggesting this approach.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When we have space in the extent tree leaf node we should be able to
insert the extent with much less journal credits. The code was doing
proper calculation but missed a return statement.
Reported-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Contents of long symlinks is written via standard write methods. So
when the write fails, we add inode to orphan list. But symlinks don't
have .truncate method defined so nobody properly removes them from the
on disk orphan list.
Fix this by calling ext4_truncate() directly instead of calling
vmtruncate() (which is saner anyway since we don't need anything
vmtruncate() does except from calling .truncate in these paths). We
also add inode to orphan list only if ext4_can_truncate() is true
(currently, it can be false for symlinks when there are no blocks
allocated) - otherwise orphan list processing will complain and
ext4_truncate() will not remove inode from on-disk orphan list.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The following race can happen:
CPU1 CPU2
checkpointing code checks the buffer, adds
it to an array for writeback
do_get_write_access()
...
lock_buffer()
unlock_buffer()
flush_batch() submits the buffer for IO
__jbd2_journal_file_buffer()
So a buffer under writeout is returned from
do_get_write_access(). Since the filesystem code relies on the fact
that journaled buffers cannot be written out, it does not take the
buffer lock and so it can modify buffer while it is under
writeout. That can lead to a filesystem corruption if we crash at the
right moment.
We fix the problem by clearing the buffer dirty bit under buffer_lock
even if the buffer is on BJ_None list. Actually, we clear the dirty
bit regardless the list the buffer is in and warn about the fact if
the buffer is already journalled.
Thanks for spotting the problem goes to dingdinghua <dingdinghua85@gmail.com>.
Reported-by: dingdinghua <dingdinghua85@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The ext4 module uses rcu_call() thus it should use rcu_barrier()on
module unload.
The kmem cache ext4_pspace_cachep is sometimes free'ed using
call_rcu() callbacks. Thus, we must wait for completion of call_rcu()
before doing kmem_cache_destroy().
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Ted noticed a stack-deep callchain through
writepages->ext4_mb_regular_allocator->ext4_mb_init_cache->submit_bh ...
With all the static functions in mballoc.c, gcc helpfully
inlines for us, and we get something like this:
ext4_mb_regular_allocator (232 bytes stack)
ext4_mb_init_cache (232 bytes stack)
submit_bh (starts 464 deeper)
the 2 ext4 functions here get several others inlined; by telling
gcc not to inline them, we can save stack space for when we
head off into submit_bh land and associated block layer callchains.
The following noinlined functions are only called once, so this
won't impact any other callchains:
ext4_mb_regular_allocator (104) (was 232)
ext4_mb_find_by_goal (56) (noinlined)
ext4_mb_init_group (24) (noinlined)
ext4_mb_init_cache (136) (was 232)
ext4_mb_generate_buddy (88) (noinlined)
ext4_mb_generate_from_pa (40) (noinlined)
submit_bh
ext4_mb_simple_scan_group (24) (noinlined)
ext4_mb_scan_aligned (56) (noinlined)
ext4_mb_complex_scan_group (40) (noinlined)
ext4_mb_try_best_found (24) (noinlined)
now when we head off into submit_bh() we're only 264 bytes deeper
in stack than when we entered ext4_mb_regular_allocator()
(vs. 464 bytes before). Every 200 bytes helps. :)
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Commit 537a1bf059 (fbdev: add mutex for
fb_mmap locking) introduces a ->mm_lock mutex for protecting smem
assignments. Unfortunately in the case of sm501fb these happen quite
early in the initialization code, well before the mutex_init() that takes
place in register_framebuffer(), leading to:
Badness at kernel/mutex.c:207
Pid : 1, Comm: swapper
CPU : 0 Not tainted (2.6.31-rc1-00284-g529ba0d-dirty #2273)
PC is at __mutex_lock_slowpath+0x72/0x1bc
PR is at __mutex_lock_slowpath+0x66/0x1bc
...
matroxfb appears to have the same issue and has solved it with an early
mutex_init(), so we do the same for sm501fb.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Cc: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
MIPS: Fix CONFIG_FLATMEM version of pfn_valid()
MIPS: Reorganize Cavium OCTEON PCI support.
Update Yoichi Yuasa's e-mail address
MIPS: Allow suspend and hibernation again on uniprocessor kernels.
MIPS: 64-bit: Fix o32 core dump
MIPS: BC47xx: Fix SSB irq setup
MIPS: CMP: Update sync-r4k for current kernel
MIPS: CMP: Move gcmp_probe to before the SMP ops
MIPS: CMP: activate CMP support
MIPS: CMP: Extend IPI handling to CPU number
MIPS: CMP: Extend the GIC IPI interrupts beyond 32
MIPS: Define __arch_swab64 for all mips r2 cpus
MIPS: Update VR41xx GPIO driver to use gpiolib
MIPS: Hookup new syscalls sys_rt_tgsigqueueinfo and sys_perf_counter_open.
MIPS: Malta: Remove unnecessary function prototypes
MIPS: MT: Remove unnecessary semicolons
MIPS: Add support for Texas Instruments AR7 System-on-a-Chip
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
sound: do not set DEVNAME for OSS devices
ALSA: hda - Add sanity check in PCM open callback
ALSA: hda - Call snd_pcm_lib_hw_rates() again after codec open callback
ALSA: hda - Avoid invalid formats and rates with shared SPDIF
ALSA: hda - Improve ASUS eeePC 1000 mixer
ALSA: hda - Add GPIO1 control at muting with HP laptops
ALSA: usx2y - reparent sound device
ALSA: snd_usb_caiaq: reparent sound device
sound: virtuoso: fix Xonar D1/DX silence after resume
ASoC: Only disable pxa2xx-i2s clocks if we enabled them
ALSA: hda - Add quirk for HP 6930p
ALSA: hda - Add missing static to patch_ca0110()
ASoC: OMAP: fix OMAP1510 broken PCM pointer callback
ASoC: remove BROKEN from Efika and pcm030 fabric drivers
ASoC: Fix typo in MPC5200 PSC AC97 driver Kconfig
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
kbuild: finally remove the obsolete variable $TOPDIR
gitignore: ignore scripts/ihex2fw
Kbuild: Disable the -Wformat-security gcc flag
gitignore: ignore gcov output files
kbuild: deb-pkg ship changelog
Add new __init_task_data macro to be used in arch init_task.c files.
asm-generic/vmlinux.lds.h: shuffle INIT_TASK* macro names in vmlinux.lds.h
Add new macros for page-aligned data and bss sections.
asm-generic/vmlinux.lds.h: Fix up RW_DATA_SECTION definition.
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: don't merge requests of different failfast settings
cciss: Ignore stale commands after reboot
* fix/hda:
ALSA: hda - Add sanity check in PCM open callback
ALSA: hda - Call snd_pcm_lib_hw_rates() again after codec open callback
ALSA: hda - Avoid invalid formats and rates with shared SPDIF
ALSA: hda - Improve ASUS eeePC 1000 mixer
ALSA: hda - Add GPIO1 control at muting with HP laptops
Add some sanity checks of struct snd_pcm_hardware fields in the PCM
open callback of hda driver. This makes a bit easier to debug any PCM
setup errors in the codec side.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The PCM rates bit field may have been changed by the codec open callback.
In that case, we need to reset rate_min and rate_max. So, simply call
snd_pcm_lib_hw_rates() again after the codec open callback.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Check whether formats and rates don't result in zero due to the
restriction of SPDIF sharing. If any of them can be zero, disable
the SPDIF sharing mode instead. Otherwise it will lead to a PCM
configuration error.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Block layer used to merge requests and bios with different failfast
settings. This caused regular IOs to fail prematurely when they were
merged into failfast requests for readahead.
Niel Lambrechts could trigger the problem semi-reliably on ext4 when
resuming from STR. ext4 uses readahead when reading inodes and
combined with the deterministic extra SATA PHY exception cycle during
resume on the specific configuration, non-readahead inode read would
fail causing ext4 errors. Please read the following thread for
details.
http://lkml.org/lkml/2009/5/23/21
This patch makes block layer reject merging if the failfast settings
don't match. This is correct but likely to lower IO performance by
preventing regular IOs from mingling into surrounding readahead
requests. Changes to allow such mixed merges and handle errors
correctly will be added later.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Niel Lambrechts <niel.lambrechts@gmail.com>
Cc: Theodore Tso <tytso@mit.edu>
Signed-off-by: Jens Axboe <axboe@carl.(none)>
When doing an unexpected shutdown like kexec the cciss
firmware might still have some commands in flight, which
it is trying to complete.
The driver is doing it's best on resetting the HBA,
but sadly there's a firmware issue causing the firmware
_not_ to abort or drop old commands.
So the firmware will send us commands which we haven't
accounted for, causing the driver to panic.
With this patch we're just ignoring these commands as
there is nothing we could be doing with them anyway.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Acked-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Jens Axboe <axboe@carl.(none)>
For systems which do not define PHYS_OFFSET as 0 pfn_valid() may falsely
have returned 0 on most configurations. Bug introduced by commit
752fbeb2e3555c0d236e992f1195fd7ce30e728d (linux-mips.org) rsp.
6f284a2ce7 (kernel.org) titled "[MIPS]
FLATMEM: introduce PHYS_OFFSET."
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Move the cavium PCI files to the arch/mips/pci directory. Also cleanup
comment formatting and code layout. Code from pci-common.c, was moved
into other files.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
If an o32 process generates a core dump on a 64 bit kernel, the core file
will not be correctly recognized. This is because ELF_CORE_COPY_REGS and
ELF_CORE_COPY_TASK_REGS are not correctly defined for o32 and will use
the default register set which would be CONFIG_64BIT in asm/elf.h.
So we'll switch to use the right register defines in this situation by
checking for WANT_COMPAT_REG_H and use the right defines of
ELF_CORE_COPY_REGS and ELF_CORE_COPY_TASK_REGS.
[Ralf: made ELF_CORE_COPY_TASK_REGS() bullet-proof against funny arguments.]
Signed-off-by: Yong Zhang <yong.zhang@windriver.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The current ssb irq setup in ssb_mipscore_init has the problem that it
configures some device on some irq without checking that the irq is not
taken by an other device.
For example in my case PCI host is on irq 0 and IPSEC on irq 3.
The current code:
- store in dev->irq that IPSEC irq is 3 + 2
- do a set_irq 0->3 on PCI host
But now IPSEC irq is not routed anymore to the mips code and dev->irq is
wrong. This causes a problem described in [1].
This patch tries to solve the problem by making set_irq configure the
device we want to take the irq on the shared irq0. The previous example
becomes:
- store in dev->irq that IPSEC irq is 3 + 2
- do a set_irq 0->3 on PCI host:
- irq 3 is already taken by IPSEC. do a set_irq 3->0 on IPSEC
I also added some code to print the irq configuration after irq setup to
allow easier debugging. And I add extra checking in ssb_mips_irq to report
device without irq or device with not routed irq.
[1] http://www.danm.de/files/src/bcm5365p/REPORTED_DEVICES
Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr>
Acked-by : Michael Buesch <mb@bu3sch.de>
Tested-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This revises the sync-4k so it will boot and operate since the removal of
expirelo from the timer code.
Signed-off-by: Tim Anderson <tanderson@mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This is to move the gcmp_probe call to before the use of and selection of
the smp_ops functions. This allows malta with 1004K to work.
Signed-off-by: Tim Anderson <tanderson@mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Most of the CMP support was added before, this mostly correct compile
problems but adds a platform specific translation for the interrupt number
based on cpu number.
Signed-off-by: Tim Anderson <tanderson@mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This takes the current IPI interrupt assignment from the fix number of 4
to the number of CPUs defined in the system.
Signed-off-by: Tim Anderson <tanderson@mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch extends the GIC interrupt handling beyond the current 32 bit
range as well as extending the number of interrupts based on the number
of CPUs.
Signed-off-by: Tim Anderson <tanderson@mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Some CPUs implement mipsr2, but because they are a super-set of mips64r2 do
not define CONFIG_CPU_MIPS64_R2. Cavium OCTEON falls into this category.
We would still like to use the optimized implementation, so since we have
already checked for CONFIG_CPU_MIPSR2, checking for CONFIG_64BIT instead of
CONFIG_CPU_MIPS64_R2 is sufficient.
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[Ralf: I fixed up the numbering in the comment in scall64-n32.S.]
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch adds support for the Texas Instruments AR7 System-on-a-Chip.
It supports the TNETD7100, 7200 and 7300 versions of the SoC.
Signed-off-by: Matteo Croce <matteo@openwrt.org>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Eugene Konev <ejka@openwrt.org>
Signed-off-by: Nicolas Thill <nico@openwrt.org>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
nfsd_open() gets an unrefcounted pointer to the current process's effective
credentials at the top of the function, then calls nfsd_setuser() via
fh_verify() - which may replace and destroy the current process's effective
credentials - and then passes the unrefcounted pointer to dentry_open() - but
the credentials may have been destroyed by this point.
Instead, the value from current_cred() should be passed directly to
dentry_open() as one of its arguments, rather than being cached in a variable.
Possibly fh_verify() should return the creds to use.
This is a regression introduced by
745ca2475a "CRED: Pass credentials through
dentry_open()".
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-and-Verified-By: Steve Dickson <steved@redhat.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>