This patch lets ext4_group_add_blocks() return an error code if it
fails, so that upper functions can handle error correctly.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
A filesystem with errors is not allowed to being resized, otherwise,
it is easy to destroy the filesystem.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Before this patch, parallel resizers are allowed and protected by a
mutex lock, actually, there is no need to support parallel resizer, so
this patch prevents parallel resizers by atmoic bit ops, like
lock_page() and unlock_page() do.
To do this, the patch removed the mutex lock s_resize_lock from struct
ext4_sb_info and added a unsigned long field named s_resize_flags
which inidicates if there is a resizer.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
There are two wrapper functions which do exactly the same thing:
ext4_journal_release_buffer(), and ext4_handle_release_buffer(). In
addition, ext4_xattr_block_set() calls jbd2_journal_release_buffer()
directly.
Unify all of the code to use ext4_handle_release_buffer(), and get rid
of ext4_journal_release_buffer().
Signed-off-by: Amir Goldstein <amir73il@users.sf.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Compile 2.6.38-rc1 with turning EXT4FS_DEBUG on,
we get following compile warnings. This patch fixes them.
CC fs/ext4/hash.o
CC fs/ext4/resize.o
fs/ext4/resize.c: In function 'setup_new_group_blocks':
fs/ext4/resize.c:233:2: warning: format '%#04llx' expects type 'long long
unsigned int', but argument 3 has type 'long unsigned int'
fs/ext4/resize.c:251:2: warning: format '%#04llx' expects type 'long long
unsigned int', but argument 3 has type 'long unsigned int'
CC fs/ext4/extents.o
CC fs/ext4/ext4_jbd2.o
CC fs/ext4/migrate.o
Reported-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
ext4: fix trimming starting with block 0 with small blocksize
ext4: revert buggy trim overflow patch
ext4: don't pass entire map to check_eofblocks_fl
ext4: fix memory leak in ext4_free_branches
ext4: remove ext4_mb_return_to_preallocation()
ext4: flush the i_completed_io_list during ext4_truncate
ext4: add error checking to calls to ext4_handle_dirty_metadata()
ext4: fix trimming of a single group
ext4: fix uninitialized variable in ext4_register_li_request
ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary
ext4: drop i_state_flags on architectures with 64-bit longs
ext4: reorder ext4_inode_info structure elements to remove unneeded padding
ext4: drop ec_type from the ext4_ext_cache structure
ext4: use ext4_lblk_t instead of sector_t for logical blocks
ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED
ext4: fix 32bit overflow in ext4_ext_find_goal()
ext4: add more error checks to ext4_mkdir()
ext4: ext4_ext_migrate should use NULL not 0
ext4: Use ext4_error_file() to print the pathname to the corrupted inode
ext4: use IS_ERR() to check for errors in ext4_error_file
...
Call ext4_std_error() in various places when we can't bail out
cleanly, so the file system can be marked as in error.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
https://bugzilla.kernel.org/show_bug.cgi?id=25352
This regression was caused by commit a31437b85: "ext4: use
sb_issue_zeroout in setup_new_group_blocks", by accidentally dropping
the code which reserved the block group descriptor and inode table
blocks.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Use sb_issue_zeroout to zero out inode table and descriptor table
blocks instead of old approach which involves journaling.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
No real bugs found, just removed some dead code.
Found by gcc 4.6's new warnings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We don't need to set s_dirt in most of the ext4 code when journaling
is enabled. In ext3/4 some of the summary statistics for # of free
inodes, blocks, and directories are calculated from the per-block
group statistics when the file system is mounted or unmounted. As a
result the superblock doesn't have to be updated, either via the
journal or by setting s_dirt. There are a few exceptions, most
notably when resizing the file system, where the superblock needs to
be modified --- and in that case it should be done as a journalled
operation if possible, and s_dirt set only in no-journal mode.
This patch will optimize out some unneeded disk writes when using ext4
with a journal.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out,
and every other access to this first tests s_log_groups_per_flex;
same thing needs to happen in resize or we'll wander off into
a null pointer when doing an online resize of the file system.
Thanks to Christoph Biedl, who came up with the trivial testcase:
# truncate --size 128M fsfile
# mkfs.ext3 -F fsfile
# tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile
# e2fsck -yDf -C0 fsfile
# truncate --size 132M fsfile
# losetup /dev/loop0 fsfile
# mount /dev/loop0 mnt
# resize2fs -p /dev/loop0
https://bugzilla.kernel.org/show_bug.cgi?id=13549
Reported-by: Alessandro Polverini <alex@nibbles.it>
Test-case-by: Christoph Biedl <bugzilla.kernel.bpeb@manchmal.in-ulm.de>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Just a pet peeve of mine; we had a mishash of calls with either __func__
or "function_name" and the latter tends to get out of sync.
I think it's easier to just hide the __func__ in a macro, and it'll
be consistent from then on.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We don't need to take the alloc_sem lock when we are adding new
groups, since mballoc won't see the new group added until we bump
sbi->s_groups_count.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Follow-up to "block: enable by default support for large devices
and files on 32-bit archs".
Rename CONFIG_LBD to CONFIG_LBDAF to:
- allow update of existing [def]configs for "default y" change
- reflect that it is used also for large files support nowadays
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Use a separate lock to protect s_groups_count and the other block
group descriptors which get changed via an on-line resize operation,
so we can stop overloading the use of lock_super().
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reduce pressure on the sb_bgl_lock family of locks by using atomic_t's
to track the number of free blocks and inodes in each flex_group.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Make sure all of the fields of the group descriptor are properly
initialized. Previously, we allowed bg_flags field to be contain
random garbage, which could trigger non-deterministic behavior,
including a kernel OOPS.
http://bugzilla.kernel.org/show_bug.cgi?id=12433
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
We need to mark the block/inode bitmap beyond the end of the group
with '1'.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Rename the lower bits with suffix _lo and add helper
to access the values. Also rename bg_itable_unused_hi
to bg_pad as in e2fsprogs.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The new groups added during resize are flagged as
need_init group. Make sure we properly initialize these
groups. When we have block size < page size and we are adding
new groups the page may still be marked uptodate even though
we haven't initialized the group. While forcing the init
of buddy cache we need to make sure other groups part of the
same page of buddy cache is not using the cache.
group_info->alloc_sem is added to ensure the same.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
cc: stable@kernel.org
With this change new blocks added during resize
are marked as free in the block bitmap and the
group is flagged with EXT4_GROUP_INFO_NEED_INIT_BIT
flag. This makes sure when mballoc tries to allocate
blocks from the new group we would reload the
buddy information using the bitmap present in the disk.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
Nearly all places in the ext3/4 code which uses "unsigned long" is
probably a bug, since on 32-bit systems a ulong a 32-bits, which means
we are wasting stack space on 64-bit systems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This removes annoying blank syslog entries emitted by ext4_error() or
ext4_warning(), since these functions add their own newline.
Signed-off-by: Nick Warne <nick@ukfsn.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
A few weeks ago I posted a patch for discussion that allowed ext4 to run
without a journal. Since that time I've integrated the excellent
comments from Andreas and fixed several serious bugs. We're currently
running with this patch and generating some performance numbers against
both ext2 (with backported reservations code) and ext4 with and without
a journal. It just so happens that running without a journal is
slightly faster for most everything.
We did
iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2
which creates 4 threads, each of which create and do reads and writes on
a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens
to bypass the page cache. Results:
ext2 ext4, default ext4, no journal
initial writes 13.0 MB/s 15.4 MB/s 15.7 MB/s
rewrites 13.1 MB/s 15.6 MB/s 15.9 MB/s
reads 15.2 MB/s 16.9 MB/s 17.2 MB/s
re-reads 15.3 MB/s 16.9 MB/s 17.2 MB/s
random readers 5.6 MB/s 5.6 MB/s 5.7 MB/s
random writers 5.1 MB/s 5.3 MB/s 5.4 MB/s
So it seems that, so far, this was a useful exercise.
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The inode table has been zeroed in setup_new_group_blocks(). Mark it as
such in ext4_group_add(). Since we are currently clearing inode table
for the new block group, we should set the EXT4_BG_INODE_ZEROED flag.
If at some point in the future we don't immediately zero out the inode
table as part of the resize operation, then obviously we shouldn't do
this.
Signed-off-by: Solofo.Ramangalahy@bull.net
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This fixes a bug which prevented the newly created inodes after a
resize from being used on filesystems with flex_bg.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When trying to resize an ext4 fs and you run out of reserved gdt blocks,
you get an error that doesn't actually tell you what went wrong, it just
says that the gdb it picked is not correct, which is the case since you
don't have any reserved gdt blocks left. This patch adds a check to make
sure you have reserved gdt blocks to use, and if not prints out a more
relevant error.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Andreas Dilger <adilger@sun.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Update group infos when updating a group's descriptor.
Add group infos when adding a group's descriptor.
Refresh cache pages used by mb_alloc when changes occur.
This will probably need modifications when META_BG resizing will be allowed.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
This is the patch for the group descriptor table corruption during
online resize pointed out by Theodore Tso. The problem was caused by
the fact that the ext4 group descriptor can be either 32 or 64 bytes
long. Only the 64 bytes structure was taken into account.
Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
There is a bug when we are trying to verify that the reserve inode's
double indirect blocks point back to the primary gdt blocks. The fix is
obvious, we need to mod the gdb count by the addr's per block. This was
verified using the same testcase as with the ext3 equivalent of this
patch.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Move ext4 headers out of include/linux. This is just the trivial move,
there's some more thing that could be done later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This fixes the allocations with GFP_KERNEL while under a transaction problems
in ext4. This patch is the same as its ext3 counterpart, just switches these
to GFP_NOFS.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add missing ext4_journal_stop() in error handling.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Stephen Tweedie <sct@redhat.com>
Cc: adilger@clusterfs.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mingming Cao <cmm@us.ibm.com>
Stop the EXT4 filesystem from using iget() and read_inode(). Replace
ext4_read_inode() with ext4_iget(), and call that instead of iget().
ext4_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
ext4_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail
without these changes. Clean up some format warnings too.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
In many places variables for block group are of type int, which limits the
maximum number of block groups to 2^31. Each block group can have up to
2^15 blocks, with a 4K block size, and the max filesystem size is limited to
2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB
This patch introduces a new type ext4_group_t, of type unsigned long, to
represent block group numbers in ext4.
All occurrences of block group variables are converted to type ext4_group_t.
Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
When resizing online, setup_new_group_blocks attempts to reserve a
potentially very large transaction, depending on the current filesystem
geometry. For some journal sizes, there may not be enough room for this
transaction, and the online resize will fail.
The patch below resizes & restarts the transaction as necessary while
setting up the new group, and should work with even the smallest journal.
Tested with something like:
[root@newbox ~]# dd if=/dev/zero of=fsfile bs=1024 count=32768
[root@newbox ~]# mkfs.ext3 -b 1024 fsfile 16384
[root@newbox ~]# mount -o loop fsfile mnt/
[root@newbox ~]# resize2fs /dev/loop0
resize2fs 1.40.2 (12-Jul-2007)
Filesystem at /dev/loop0 is mounted on /root/mnt; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/loop0 to 32768 (1k) blocks.
resize2fs: No space left on device While trying to add group #2
[root@newbox ~]# dmesg | tail -n 1
JBD: resize2fs wants too many credits (258 > 256)
[root@newbox ~]#
With the below change, it works.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Acked-by: Andreas Dilger <adilger@clusterfs.com>
setup_new_group_blocks() manipulates the group descriptor block bh
under the block_bitmap bh's lock. It shouldn't matter since nobody
but resize should be touching these blocks, but it's worth fixing up.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>