WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Theodore Ts'o	aeb1e5d69a	ext4: fix possible use after free with metadata csum Commit `fa77dcfafe` introduces block bitmap checksum calculation into ext4_new_inode() in the case that block group was uninitialized. However we brelse() the bitmap buffer before we attempt to checksum it so we have no guarantee that the buffer is still there. Fix this by releasing the buffer after the possible checksum computation. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: stable@vger.kernel.org	2012-11-29 21:21:22 -05:00
Eric Sandeen	ffb5387e85	ext4: fix unjournaled inode bitmap modification commit `119c0d4460` changed ext4_new_inode() such that the inode bitmap was being modified outside a transaction, which could lead to corruption, and was discovered when journal_checksum found a bad checksum in the journal during log replay. Nix ran into this when using the journal_async_commit mount option, which enables journal checksumming. The ensuing journal replay failures due to the bad checksums led to filesystem corruption reported as the now infamous "Apparent serious progressive ext4 data corruption bug" [ Changed by tytso to only call ext4_journal_get_write_access() only when we're fairly certain that we're going to allocate the inode. ] I've tested this by mounting with journal_checksum and running fsstress then dropping power; I've also tested by hacking DM to create snapshots w/o first quiescing, which allows me to test journal replay repeatedly w/o actually power-cycling the box. Without the patch I hit a journal checksum error every time. With this fix it survives many iterations. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-10-28 22:24:57 -04:00
Tao Ma	79f1ba4956	ext4: Checksum the block bitmap properly with bigalloc enabled In mke2fs, we only checksum the whole bitmap block and it is right. While in the kernel, we use EXT4_BLOCKS_PER_GROUP to indicate the size of the checksumed bitmap which is wrong when we enable bigalloc. The right size should be EXT4_CLUSTERS_PER_GROUP and this patch fixes it. Also as every caller of ext4_block_bitmap_csum_set and ext4_block_bitmap_csum_verify pass in EXT4_BLOCKS_PER_GROUP(sb)/8, we'd better removes this parameter and sets it in the function itself. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org	2012-10-22 00:34:32 -04:00
Yongqiang Yang	f2a09af645	ext4: check free inode count before allocating an inode Recently, I ecountered some corrupted filesystems in which some groups' free inode counts were 65535, it seemed that free inode count was overflow. This patch teaches ext4 to check free inode count before allocaing an inode. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-09-23 23:16:03 -04:00
Jan Kara	97a7406880	ext4: remove useless marking of superblock dirty Commit `a0375156` properly notes that superblock doesn't need to be marked as dirty when only number of free inodes / blocks / number of directories changes since that is recomputed on each mount anyway. However that comment leaves some unnecessary markings as dirty in place. Remove these. Artem: tested using xfstests for both journalled and non-journalled ext4. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>	2012-07-22 20:29:31 -04:00
Theodore Ts'o	f6fb99cadc	ext4: pass a char * to ext4_count_free() instead of a buffer_head ptr Make it possible for ext4_count_free to operate on buffers and not just data in buffer_heads. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2012-06-30 19:14:57 -04:00
Linus Torvalds	4edebed866	Ext4 updates for 3.5 The major new feature added in this update is Darrick J. Wong's metadata checksum feature, which adds crc32 checksums to ext4's metadata fields. There is also the usual set of cleanups and bug fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABCAAGBQJPyNleAAoJENNvdpvBGATwtLMP/i3WsPyTvxmYP6HttHXQb8Jk GYCoTQ5bZMuTbOwOGg3w137cXWBv5uuPpxIk79YVLHSWx6HuanlGIa7/VnPKIaLu 2ihuvVfnrDqpwQ4MJaSq4R1Eka9JCwZ7HbYYo+fYOVobxgw588JVV9VVI9EdKRGz z11UkW8iHE0f6Xa5gOhdAMkR0uaPnxwJX/qHZYiHuognRivuwMglqWJSiMr8nQmo A2GmeoLehhW+k65IqgTCmSW6ZgFTvZdk6bskQIij3fOYHW3hHn/gcLFtmLTIZ/B5 LZdg/lngPYve+R/UyypliGKi+pv1qNEiTiBm0rrBgsdZFkBdGj0soSvGZzeK+Mp4 Q1vAmOBPYPFzs6nVzPst2n/osryyykFCK6TgSGZ50dosJ0NO8cBeDdX/gh9JKD2R yQUMUltOCCSj/eWU4iwqZ0T3FXRiH/+S3XMHznoKJiwUyGDBNQy4+Yg2k2WzUXrz Cu5t5BwNG2WNP7y5Et/wmUIzpC7VPId4qYmGyHe7OwTxSJgW+6f7GVkHfjWcDMuv pGgEUiInbMmLajP3v2/LKfVU4hXLZy4uJbhoBgDdeIpZrnPifJG/MwDOS4W+dLVT tDzgO1SAh3/E4jATreZ5bjzD/HGsfe1OX09UH3Pbc1EcgkrLnyrQXFwdHshdVu4A cxMoKNPVCQJySb1UrLkO =SdJJ -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull Ext4 updates from Theodore Ts'o: "The major new feature added in this update is Darrick J Wong's metadata checksum feature, which adds crc32 checksums to ext4's metadata fields. There is also the usual set of cleanups and bug fixes." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits) ext4: hole-punch use truncate_pagecache_range jbd2: use kmem_cache_zalloc wrapper instead of flag ext4: remove mb_groups before tearing down the buddy_cache ext4: add ext4_mb_unload_buddy in the error path ext4: don't trash state flags in EXT4_IOC_SETFLAGS ext4: let getattr report the right blocks in delalloc+bigalloc ext4: add missing save_error_info() to ext4_error() ext4: add debugging trigger for ext4_error() ext4: protect group inode free counting with group lock ext4: use consistent ssize_t type in ext4_file_write() ext4: fix format flag in ext4_ext_binsearch_idx() ext4: cleanup in ext4_discard_allocated_blocks() ext4: return ENOMEM when mounts fail due to lack of memory ext4: remove redundundant "(char *) bh->b_data" casts ext4: disallow hard-linked directory in ext4_lookup ext4: fix potential integer overflow in alloc_flex_gd() ext4: remove needs_recovery in ext4_mb_init() ext4: force ro mount if ext4_setup_super() fails ext4: fix potential NULL dereference in ext4_free_inodes_counts() ext4/jbd2: add metadata checksumming to the list of supported features ...	2012-06-01 10:12:15 -07:00
Tao Ma	6f2e9f0e7d	ext4: protect group inode free counting with group lock Now when we set the group inode free count, we don't have a proper group lock so that multiple threads may decrease the inode free count at the same time. And e2fsck will complain something like: Free inodes count wrong for group #1 (1, counted=0). Fix? no Free inodes count wrong for group #2 (3, counted=0). Fix? no Directories count wrong for group #2 (780, counted=779). Fix? no Free inodes count wrong for group #3 (2272, counted=2273). Fix? no So this patch try to protect it with the ext4_lock_group. btw, it is found by xfstests test case 269 and the volume is mkfsed with the parameter "-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr" and I have run it 100 times and the error in e2fsck doesn't show up again. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-05-28 18:20:59 -04:00
Dan Carpenter	bb3d132a24	ext4: fix potential NULL dereference in ext4_free_inodes_counts() The ext4_get_group_desc() function returns NULL on error, and ext4_free_inodes_count() function dereferences it without checking. There is a check on the next line, but it's too late. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2012-05-28 14:16:57 -04:00
Eric W. Biederman	08cefc7ab8	userns: Convert ext4 to user kuid/kgid where appropriate Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-15 14:59:27 -07:00
Darrick J. Wong	feb0ab32a5	ext4: make block group checksums use metadata_csum algorithm metadata_csum supersedes uninit_bg. Convert the ROCOMPAT uninit_bg flag check to a helper function that covers both, and make the checksum calculation algorithm use either crc16 or the metadata_csum chosen algorithm depending on which flag is set. Print a warning if we try to mount a filesystem with both feature flags set. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-04-29 18:45:10 -04:00
Darrick J. Wong	fa77dcfafe	ext4: calculate and verify block bitmap checksum Compute and verify the checksum of the block bitmap; this checksum is stored in the block group descriptor. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-04-29 18:35:10 -04:00
Darrick J. Wong	41a246d1ff	ext4: calculate and verify checksums for inode bitmaps Compute and verify the checksum of the inode bitmap; the checkum is stored in the block group descriptor. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-04-29 18:33:10 -04:00
Darrick J. Wong	814525f4df	ext4: calculate and verify inode checksums This patch introduces to ext4 the ability to calculate and verify inode checksums. This requires the use of a new ro compatibility flag and some accompanying e2fsprogs patches to provide the relevant features in tune2fs and e2fsck. The inode generation changes have been integrated into this patch. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-04-29 18:31:10 -04:00
Theodore Ts'o	92b9781658	ext4: change some printk() calls to use ext4_msg() instead Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-19 23:41:49 -04:00
Theodore Ts'o	1084f252e3	ext4: remove trailing newlines from ext4_msg() and ext4_error() messages The functions ext4_msg() and ext4_error() already tack on a trailing newline, so remove the unnecessary extra newline. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-03-19 23:13:43 -04:00
Theodore Ts'o	813e57276f	ext4: fix race when setting bitmap_uptodate flag In ext4_read_{inode,block}_bitmap() we were setting bitmap_uptodate() before submitting the buffer for read. The is bad, since we check bitmap_uptodate() without locking the buffer, and so if another process is racing with us, it's possible that they will think the bitmap is uptodate even though the read has not completed yet, resulting in inodes and blocks potentially getting allocated more than once if we get really unlucky. Addresses-Google-Bug: 2828254 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-02-20 17:52:46 -05:00
Theodore Ts'o	119c0d4460	ext4: fold ext4_claim_inode into ext4_new_inode The function ext4_claim_inode() is only called by one function, ext4_new_inode(), and by folding the functionality into ext4_new_inode(), we can remove almost 50 lines of code, and put all of the logic of allocating a new inode into a single place. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-02-06 20:12:03 -05:00
Theodore Ts'o	ff9cb1c4ee	Merge branch 'for_linus' into for_linus_merged Conflicts: fs/ext4/ioctl.c	2012-01-10 11:54:07 -05:00
Al Viro	dcca3fec9f	ext4: propagate umode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:54:59 -05:00
Akinobu Mita	597d508c17	ext4: use proper little-endian bitops ext4_{set,clear}_bit() is defined as __test_and_{set,clear}_bit_le() for ext4. Only two ext4_{set,clear}_bit() calls check the return value. The rest of calls ignore the return value and they can be replaced with __{set,clear}_bit_le(). This changes ext4_{set,clear}_bit() from __test_and_{set,clear}_bit_le() to __{set,clear}_bit_le() and introduces ext4_test_and_{set,clear}_bit() for the two places where old bit needs to be returned. This ext4_{set,clear}_bit() change is considered safe, because if someone uses these macros without noticing the change, new ext4_{set,clear}_bit don't have return value and causes compiler errors where the return value is used. This also removes unused ext4_find_first_zero_bit(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-12-28 20:32:07 -05:00
Theodore Ts'o	14c83c9fdd	ext4: avoid counting the number of free inodes twice in find_group_orlov() Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-12-28 20:25:13 -05:00
Jan Kara	acd6ad8351	ext4: fix error handling on inode bitmap corruption When insert_inode_locked() fails in ext4_new_inode() it most likely means inode bitmap got corrupted and we allocated again inode which is already in use. Also doing unlock_new_inode() during error recovery is wrong since the inode does not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:) which declares filesystem error and does not call unlock_new_inode(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-12-18 17:37:02 -05:00
Linus Torvalds	d211858837	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: vfs: add d_prune dentry operation vfs: protect i_nlink filesystems: add set_nlink() filesystems: add missing nlink wrappers logfs: remove unnecessary nlink setting ocfs2: remove unnecessary nlink setting jfs: remove unnecessary nlink setting hypfs: remove unnecessary nlink setting vfs: ignore error on forced remount readlinkat: ensure we return ENOENT for the empty pathname for normal lookups vfs: fix dentry leak in simple_fill_super()	2011-11-02 11:41:01 -07:00
Miklos Szeredi	6d6b77f163	filesystems: add missing nlink wrappers Replace direct i_nlink updates with the respective updater function (inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2011-11-02 12:53:43 +01:00
Eryu Guan	4af8350899	ext4: remove comments about extent mount option in ext4_new_inode() Remove comments about 'extent' mount option in ext4_new_inode(), since it's no longer exists. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-10-31 18:21:29 -04:00
Dmitry Monakhov	5cb81dabcc	ext4: fix quota accounting during migration The tmp_inode should have same uid/gid as the original inode. Otherwise new metadata blocks will be accounted to wrong quota-id, which will result in a quota leak after the inode migration is completed. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-10-29 09:05:00 -04:00
H Hartley Sweeten	e0cbee3e14	ext4: functions should not be declared extern The function declarations in ext4.h are already marked extern, so it's not necessary to do so in the .c files. This quiets the sparse noise: warning: function 'ext4_flush_completed_IO' with external linkage has definition warning: function 'ext4_init_inode_table' with external linkage has definition Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-10-18 10:57:51 -04:00
Lukas Czerner	4113c4caa4	ext4: remove deprecated oldalloc For a long time now orlov is the default block allocator in the ext4. It performs better than the old one and no one seems to claim otherwise so we can safely drop it and make oldalloc and orlov mount option deprecated. This is a part of the effort to reduce number of ext4 options hence the test matrix. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-10-08 14:34:47 -04:00
Theodore Ts'o	cff1dfd767	ext4: rename ext4_free_blocks_after_init() to ext4_free_clusters_after_init() This function really returns the number of clusters after initializing an uninitalized block bitmap has been initialized. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-09-09 19:12:51 -04:00
Theodore Ts'o	021b65bb1e	ext4: Rename ext4_free_blks_{count,set}() to refer to clusters The field bg_free_blocks_count_{lo,high} in the block group descriptor has been repurposed to hold the number of free clusters for bigalloc functions. So rename the functions so it makes it easier to read and audit the block allocation and block freeing code. Note: at this point in bigalloc development we doesn't support online resize, so this also makes it really obvious all of the places we need to fix up to add support for online resize. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-09-09 19:08:51 -04:00
Theodore Ts'o	24aaa8ef4e	ext4: convert the free_blocks field in s_flex_groups to be free_clusters Convert the free_blocks to be free_clusters to make the final revised bigalloc changes easier to read/understand. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-09-09 18:58:51 -04:00
Theodore Ts'o	5704265188	ext4: convert s_{dirty,free}blocks_counter to s_{dirty,free}clusters_counter Convert the percpu counters s_dirtyblocks_counter and s_freeblocks_counter in struct ext4_super_info to be s_dirtyclusters_counter and s_freeclusters_counter. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-09-09 18:56:51 -04:00
Theodore Ts'o	fd034a84e1	ext4: split out ext4_free_blocks_after_init() The function ext4_free_blocks_after_init() used to be a #define of ext4_init_block_bitmap(). This actually made it difficult to understand how the function worked, and made it hard make changes to support clusters. So as an initial cleanup, I've separated out the functionality of initializing block bitmap from calculating the number of free blocks in the new block group. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-09-09 18:42:51 -04:00
Yongqiang Yang	33853a0dde	ext4: use the correct error exit path in ext4_init_inode_table() This patch lets ext4_init_inode_table() handle errors right. ext4_init_inode_table() should down_write() alloc_sem which has been up_write()ed and stop the started journal handle. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-08-01 06:32:19 -04:00
Linus Torvalds	ae005cbed1	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits) ext4: fix a BUG in mb_mark_used during trim. ext4: unused variables cleanup in fs/ext4/extents.c ext4: remove redundant set_buffer_mapped() in ext4_da_get_block_prep() ext4: add more tracepoints and use dev_t in the trace buffer ext4: don't kfree uninitialized s_group_info members ext4: add missing space in printk's in __ext4_grp_locked_error() ext4: add FITRIM to compat_ioctl. ext4: handle errors in ext4_clear_blocks() ext4: unify the ext4_handle_release_buffer() api ext4: handle errors in ext4_rename jbd2: add COW fields to struct jbd2_journal_handle jbd2: add the b_cow_tid field to journal_head struct ext4: Initialize fsync transaction ids in ext4_new_inode() ext4: Use single thread to perform DIO unwritten convertion ext4: optimize ext4_bio_write_page() when no extent conversion is needed ext4: skip orphan cleanup if fs has unknown ROCOMPAT features ext4: use the nblocks arg to ext4_truncate_restart_trans() ext4: fix missing iput of root inode for some mount error paths ext4: make FIEMAP and delayed allocation play well together ext4: suppress verbose debugging information if malloc-debug is off ... Fi up conflicts in fs/ext4/super.c due to workqueue changes	2011-03-25 09:57:41 -07:00
Jiaying Zhang	0562e0bad4	ext4: add more tracepoints and use dev_t in the trace buffer - Add more ext4 tracepoints. - Change ext4 tracepoints to use dev_t field with MAJOR/MINOR macros so that we can save 4 bytes in the ring buffer on some platforms. - Add sync_mode to ext4_da_writepages, ext4_da_write_pages, and ext4_da_writepages_result tracepoints. Also remove for_reclaim field from ext4_da_writepages since it is usually not very useful. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-03-21 21:38:05 -04:00
Theodore Ts'o	688f869ce3	ext4: Initialize fsync transaction ids in ext4_new_inode() When allocating a new inode, we need to make sure i_sync_tid and i_datasync_tid are initialized. Otherwise, one or both of these two values could be left initialized to zero, which could potentially result in BUG_ON in jbd2_journal_commit_transaction. (This could happen by having journal->commit_request getting set to zero, which could wake up the kjournald process even though there is no running transaction, which then causes a BUG_ON via the J_ASSERT(j_ruinning_transaction != NULL) statement. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-03-16 17:16:31 -04:00
James Morris	fe3fa43039	Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next	2011-03-08 11:38:10 +11:00
Peter Huewe	7dc576158d	ext4: Fix sparse warning: Using plain integer as NULL pointer This patch fixes the warning "Using plain integer as NULL pointer", generated by sparse, by replacing the offending 0s with NULL. Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-02-21 21:01:42 -05:00
Eric Paris	2a7dba391e	fs/vfs/security: pass last path component to LSM on inode creation SELinux would like to implement a new labeling behavior of newly created inodes. We currently label new inodes based on the parent and the creating process. This new behavior would also take into account the name of the new object when deciding the new label. This is not the (supposed) full path, just the last component of the path. This is very useful because creating /etc/shadow is different than creating /etc/passwd but the kernel hooks are unable to differentiate these operations. We currently require that userspace realize it is doing some difficult operation like that and than userspace jumps through SELinux hoops to get things set up correctly. This patch does not implement new behavior, that is obviously contained in a seperate SELinux patch, but it does pass the needed name down to the correct LSM hook. If no such name exists it is fine to pass NULL. Signed-off-by: Eric Paris <eparis@redhat.com>	2011-02-01 11:12:29 -05:00
Theodore Ts'o	353eb83c14	ext4: drop i_state_flags on architectures with 64-bit longs We can store the dynamic inode state flags in the high bits of EXT4_I(inode)->i_flags, and eliminate i_state_flags. This saves 8 bytes from the size of ext4_inode_info structure, which when multiplied by the number of the number of in the inode cache, can save a lot of memory. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2011-01-10 12:18:25 -05:00
Theodore Ts'o	a107e5a3a4	Merge branch 'next' into upstream-merge Conflicts: fs/ext4/inode.c fs/ext4/mballoc.c include/trace/events/ext4.h	2010-10-27 23:44:47 -04:00
Theodore Ts'o	61d08673de	ext4: rename mark_bitmap_end() to ext4_mark_bitmap_end() Fix a namespace leak from fs/ext4 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:15 -04:00
Theodore Ts'o	1f109d5a17	ext4: make various ext4 functions be static These functions have no need to be exported beyond file context. No functions needed to be moved for this commit; just some function declarations changed to be static and removed from header files. (A similar patch was submitted by Eric Sandeen, but I wanted to handle code movement in separate patches to make sure code changes didn't accidentally get dropped.) Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:14 -04:00
Lukas Czerner	857ac889cc	ext4: add interface to advertise ext4 features in sysfs User-space should have the opportunity to check what features doest ext4 support in each particular copy. This adds easy interface by creating new "features" directory in sys/fs/ext4/. In that directory files advertising feature names can be created. Add lazy_itable_init to the feature list. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:05 -04:00
Lukas Czerner	bfff68738f	ext4: add support for lazy inode table initialization When the lazy_itable_init extended option is passed to mke2fs, it considerably speeds up filesystem creation because inode tables are not zeroed out. The fact that parts of the inode table are uninitialized is not a problem so long as the block group descriptors, which contain information regarding how much of the inode table has been initialized, has not been corrupted However, if the block group checksums are not valid, e2fsck must scan the entire inode table, and the the old, uninitialized data could potentially cause e2fsck to report false problems. Hence, it is important for the inode tables to be initialized as soon as possble. This commit adds this feature so that mke2fs can safely use the lazy inode table initialization feature to speed up formatting file systems. This is done via a new new kernel thread called ext4lazyinit, which is created on demand and destroyed, when it is no longer needed. There is only one thread for all ext4 filesystems in the system. When the first filesystem with inititable mount option is mounted, ext4lazyinit thread is created, then the filesystem can register its request in the request list. This thread then walks through the list of requests picking up scheduled requests and invoking ext4_init_inode_table(). Next schedule time for the request is computed by multiplying the time it took to zero out last inode table with wait multiplier, which can be set with the (init_itable=n) mount option (default is 10). We are doing this so we do not take the whole I/O bandwidth. When the thread is no longer necessary (request list is empty) it frees the appropriate structures and exits (and can be created later later by another filesystem). We do not disturb regular inode allocations in any way, it just do not care whether the inode table is, or is not zeroed. But when zeroing, we have to skip used inodes, obviously. Also we should prevent new inode allocations from the group, while zeroing is on the way. For that we take write alloc_sem lock in ext4_init_inode_table() and read alloc_sem in the ext4_claim_inode, so when we are unlucky and allocator hits the group which is currently being zeroed, it just has to wait. This can be suppresed using the mount option no_init_itable. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-10-27 21:30:05 -04:00
Linus Torvalds	5f248c9c25	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits) no need for list_for_each_entry_safe()/resetting with superblock list Fix sget() race with failing mount vfs: don't hold s_umount over close_bdev_exclusive() call sysv: do not mark superblock dirty on remount sysv: do not mark superblock dirty on mount btrfs: remove junk sb_dirt change BFS: clean up the superblock usage AFFS: wait for sb synchronization when needed AFFS: clean up dirty flag usage cifs: truncate fallout mbcache: fix shrinker function return value mbcache: Remove unused features add f_flags to struct statfs(64) pass a struct path to vfs_statfs update VFS documentation for method changes. All filesystems that need invalidate_inode_buffers() are doing that explicitly convert remaining ->clear_inode() to ->evict_inode() Make ->drop_inode() just return whether inode needs to be dropped fs/inode.c:clear_inode() is gone fs/inode.c:evict() doesn't care about delete vs. non-delete paths now ... Fix up trivial conflicts in fs/nilfs2/super.c	2010-08-10 11:26:52 -07:00
Al Viro	0930fcc1ee	convert ext4 to ->evict_inode() pretty much brute-force... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-08-09 16:48:30 -04:00
Theodore Ts'o	a0375156ca	ext4: Clean up s_dirt handling We don't need to set s_dirt in most of the ext4 code when journaling is enabled. In ext3/4 some of the summary statistics for # of free inodes, blocks, and directories are calculated from the per-block group statistics when the file system is mounted or unmounted. As a result the superblock doesn't have to be updated, either via the journal or by setting s_dirt. There are a few exceptions, most notably when resizing the file system, where the superblock needs to be modified --- and in that case it should be done as a journalled operation if possible, and s_dirt set only in no-journal mode. This patch will optimize out some unneeded disk writes when using ext4 with a journal. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-06-11 23:14:04 -04:00
Linus Torvalds	e4ce30f377	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: Make fsync sync new parent directories in no-journal mode ext4: Drop whitespace at end of lines ext4: Fix compat EXT4_IOC_ADD_GROUP ext4: Conditionally define compat ioctl numbers tracing: Convert more ext4 events to DEFINE_EVENT ext4: Add new tracepoints to track mballoc's buddy bitmap loads ext4: Add a missing trace hook ext4: restart ext4_ext_remove_space() after transaction restart ext4: Clear the EXT4_EOFBLOCKS_FL flag only when warranted ext4: Avoid crashing on NULL ptr dereference on a filesystem error ext4: Use bitops to read/modify i_flags in struct ext4_inode_info ext4: Convert calls of ext4_error() to EXT4_ERROR_INODE() ext4: Convert callers of ext4_get_blocks() to use ext4_map_blocks() ext4: Add new abstraction ext4_map_blocks() underneath ext4_get_blocks() ext4: Use our own write_cache_pages() ext4: Show journal_checksum option ext4: Fix for ext4_mb_collect_stats() ext4: check for a good block group before loading buddy pages ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate ext4: Remove extraneous newlines in ext4_msg() calls ... Fixed up trivial conflict in fs/ext4/fsync.c	2010-05-27 10:26:37 -07:00
Dmitry Monakhov	b10b852090	ext4: replace inode uid,gid,mode init with helper Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-05-21 18:31:24 -04:00
Dmitry Monakhov	12e9b89200	ext4: Use bitops to read/modify i_flags in struct ext4_inode_info At several places we modify EXT4_I(inode)->i_flags without holding i_mutex (ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_flags. So convert handling of i_flags to use bitops which are atomic. https://bugzilla.kernel.org/show_bug.cgi?id=15792 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 22:00:00 -04:00
Dmitry Monakhov	d17413c08c	ext4: clean up inode bitmaps manipulation in ext4_free_inode - Reorganize locking scheme to batch two atomic operation in to one. This also allow us to state what healthy group must obey following rule ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM); - Fix possible undefined pointer dereference. - Even if group descriptor stats aren't accessible we have to update inode bitmaps. - Move non-group members update out of group_lock. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-05-16 07:00:00 -04:00
Eric Sandeen	c4caae2518	ext4: Fixed inode allocator to correctly track a flex_bg's used_dirs When used_dirs was introduced for the flex_groups struct, it looks like the accounting was not put into place properly, in some places manipulating free_inodes rather than used_dirs. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-03-23 21:32:00 -04:00
Linus Torvalds	e213e26ab3	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits) quota: stop using QUOTA_OK / NO_QUOTA dquot: cleanup dquot initialize routine dquot: move dquot initialization responsibility into the filesystem dquot: cleanup dquot drop routine dquot: move dquot drop responsibility into the filesystem dquot: cleanup dquot transfer routine dquot: move dquot transfer responsibility into the filesystem dquot: cleanup inode allocation / freeing routines dquot: cleanup space allocation / freeing routines ext3: add writepage sanity checks ext3: Truncate allocated blocks if direct IO write fails to update i_size quota: Properly invalidate caches even for filesystems with blocksize < pagesize quota: generalize quota transfer interface quota: sb_quota state flags cleanup jbd: Delay discarding buffers in journal_unmap_buffer ext3: quota_write cross block boundary behaviour quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota quota: split out compat_sys_quotactl support from quota.c quota: split out netlink notification support from quota.c quota: remove invalid optimization from quota_sync_all ... Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c	2010-03-05 13:20:53 -08:00
Christoph Hellwig	871a293155	dquot: cleanup dquot initialize routine Get rid of the initialize dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_initialize helper to __dquot_initialize and vfs_dq_init to dquot_initialize to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	9f75475802	dquot: cleanup dquot drop routine Get rid of the drop dquot operation - it is now always called from the filesystem and if a filesystem really needs it's own (which none currently does) it can just call into it's own routine directly. Rename the now static low-level dquot_drop helper to __dquot_drop and vfs_dq_drop to dquot_drop to have a consistent namespace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:30 +01:00
Christoph Hellwig	63936ddaa1	dquot: cleanup inode allocation / freeing routines Get rid of the alloc_inode and free_inode dquot operations - they are always called from the filesystem and if a filesystem really needs their own (which none currently does) it can just call into it's own routine directly. Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always call the lowlevel dquot_alloc_inode / dqout_free_inode routines directly, which now lose the number argument which is always 1. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2010-03-05 00:20:28 +01:00
Curt Wohlgemuth	73b50c1c92	ext4: Fix BUG_ON at fs/buffer.c:652 in no journal mode Calls to ext4_handle_dirty_metadata should only pass in an inode pointer for inode-specific metadata, and not for shared metadata blocks such as inode table blocks, block group descriptors, the superblock, etc. The BUG_ON can get tripped when updating a special device (such as a block device) that is opened (so that i_mapping is set in fs/block_dev.c) and the file system is mounted in no journal mode. Addresses-Google-Bug: #2404870 Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-02-16 15:06:29 -05:00
Eric Sandeen	12062dddda	ext4: move __func__ into a macro for ext4_warning, ext4_error Just a pet peeve of mine; we had a mishash of calls with either __func__ or "function_name" and the latter tends to get out of sync. I think it's easier to just hide the __func__ in a macro, and it'll be consistent from then on. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-02-15 14:19:27 -05:00
Theodore Ts'o	19f5fb7ad6	ext4: Use bitops to read/modify EXT4_I(inode)->i_state At several places we modify EXT4_I(inode)->i_state without holding i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage, ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_state. So convert handling of i_state to use bitops which are atomic. Cc: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2010-01-24 14:34:07 -05:00
Peng Tao	785b4b3a5a	ext4: fix build warning when EXT4FS_DEBUG is on When compiling with EXT4FS_DEBUG on, gcc will complain with following warnings: linux-2.6/fs/ext4/ialloc.c: In function ‘ext4_count_free_inodes’: linux-2.6/fs/ext4/ialloc.c:1192: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 2 has type ‘ext4_group_t’ So add a type cast to suppress it. Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-27 21:44:40 -04:00
Johann Lombardi	e6462869e4	ext4: Fix goal inum check in the inode allocator The goal inode is specificed by inode number which belongs to [1; s_inodes_count]. Signed-off-by: Johann Lombardi <johann@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-07-05 23:45:11 -04:00
Andreas Dilger	11013911da	ext4: teach the inode allocator to use a goal inode number Enhance the inode allocator to take a goal inode number as a paremeter; if it is specified, it takes precedence over Orlov or parent directory inode allocation algorithms. The extents migration function uses the goal inode number so that the extent trees allocated the migration function use the correct flex_bg. In the future, the goal inode functionality will also be used to allocate an adjacent inode for the extended attributes. Also, for testing purposes the goal inode number can be specified via /sys/fs/{dev}/inode_goal. This can be useful for testing inode allocation beyond 2^32 blocks on very large filesystems. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 11:45:35 -04:00
Theodore Ts'o	f157a4aa98	ext4: Use a hash of the topdir directory name for the Orlov parent group Instead of using a random number to determine the goal parent grop for the Orlov top directories, use a hash of the directory name. This allows for repeatable results when trying to benchmark filesystem layout algorithms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-13 11:09:42 -04:00
Theodore Ts'o	9bffad1ed2	ext4: convert instrumentation from markers to tracepoints Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-06-17 11:48:11 -04:00
Theodore Ts'o	88b6edd17c	ext4: Clean up calls to ext4_get_group_desc() If the caller isn't planning on modifying the block group descriptors, there's no need to pass in a pointer to a struct buffer_head. Nuking this saves a tiny amount of CPU time and stack space usage. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-25 11:50:39 -04:00
Aneesh Kumar K.V	955ce5f5be	ext4: Convert ext4_lock_group to use sb_bgl_lock We have sb_bgl_lock() and ext4_group_info.bb_state bit spinlock to protech group information. The later is only used within mballoc code. Consolidate them to use sb_bgl_lock(). This makes the mballoc.c code much simpler and also avoid confusion with two locks protecting same info. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-02 20:35:09 -04:00
Theodore Ts'o	bb23c20a85	ext4: Move fs/ext4/group.h into ext4.h Move the function prototypes in group.h into ext4.h so they are all defined in one place. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 19:44:44 -04:00
Theodore Ts'o	8df9675f8b	ext4: Avoid races caused by on-line resizing and SMP memory reordering Ext4's on-line resizing adds a new block group and then, only at the last step adjusts s_groups_count. However, it's possible on SMP systems that another CPU could see the updated the s_group_count and not see the newly initialized data structures for the just-added block group. For this reason, it's important to insert a SMP read barrier after reading s_groups_count and before reading any (for example) the new block group descriptors allowed by the increased value of s_groups_count. Unfortunately, we rather blatently violate this locking protocol documented in fs/ext4/resize.c. Fortunately, (1) on-line resizes happen relatively rarely, and (2) it seems rare that the filesystem code will immediately try to use just-added block group before any memory ordering issues resolve themselves. So apparently problems here are relatively hard to hit, since ext3 has been vulnerable to the same issue for years with no one apparently complaining. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-05-01 08:50:38 -04:00
Theodore Ts'o	b5451f7b26	ext4: Fix potential inode allocation soft lockup in Orlov allocator If the Orlov allocator is having trouble finding an appropriate block group, the fallback code could loop forever, causing a soft lockup warning in find_group_orlov(): BUG: soft lockup - CPU#0 stuck for 61s! [cp:11728] ... Pid: 11728, comm: cp Not tainted (2.6.30-rc1-dirty #77) Lenovo EIP: 0060:[<c021650e>] EFLAGS: 00000246 CPU: 0 EIP is at ext4_get_group_desc+0x54/0x9d ... Call Trace: [<c0218021>] find_group_orlov+0x2ee/0x334 [<c0120a5f>] ? sched_clock+0x8/0xb [<c02188e3>] ext4_new_inode+0x2cf/0xb1a Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-22 21:00:36 -04:00
Chuck Ebbert	6b82f3cb2d	ext4: really print the find_group_flex fallback warning only once Missing braces caused the warning to print more than once. Signed-Off-By: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-04-14 07:37:40 -04:00
Theodore Ts'o	7d39db14a4	ext4: Use struct flex_groups to calculate get_orlov_stats() Instead of looping over all of the block groups in a flex group summing their summary statistics, start tracking used_dirs in struct flex_groups, and use struct flex_groups instead. This should save a bit of CPU for mkdir-heavy workloads. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-04 19:31:53 -05:00
Theodore Ts'o	9f24e4208f	ext4: Use atomic_t's in struct flex_groups Reduce pressure on the sb_bgl_lock family of locks by using atomic_t's to track the number of free blocks and inodes in each flex_group. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-04 19:09:10 -05:00
Theodore Ts'o	a4912123b6	ext4: New inode/block allocation algorithms for flex_bg filesystems The find_group_flex() inode allocator is now only used if the filesystem is mounted using the "oldalloc" mount option. It is replaced with the original Orlov allocator that has been updated for flex_bg filesystems (it should behave the same way if flex_bg is disabled). The inode allocator now functions by taking into account each flex_bg group, instead of each block group, when deciding whether or not it's time to allocate a new directory into a fresh flex_bg. The block allocator has also been changed so that the first block group in each flex_bg is preferred for use for storing directory blocks. This keeps directory blocks close together, which is good for speeding up e2fsck since large directories are more likely to look like this: debugfs: stat /home/tytso/Maildir/cur Inode: 1844562 Type: directory Mode: 0700 Flags: 0x81000 Generation: 1132745781 Version: 0x00000000:0000ad71 User: 15806 Group: 15806 Size: 1060864 File ACL: 0 Directory ACL: 0 Links: 2 Blockcount: 2072 Fragment: Address: 0 Number: 0 Size: 0 ctime: 0x499c0ff4:164961f4 -- Wed Feb 18 08:41:08 2009 atime: 0x499c0ff4:00000000 -- Wed Feb 18 08:41:08 2009 mtime: 0x49957f51:00000000 -- Fri Feb 13 09:10:25 2009 crtime: 0x499c0f57:00d51440 -- Wed Feb 18 08:38:31 2009 Size of extra inode fields: 28 BLOCKS: (0):7348651, (1-258):7348654-7348911 TOTAL: 259 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-12 12:18:34 -04:00
Duane Griffin	2dc6b0d48c	ext4: tighten restrictions on inode flags At the moment there are few restrictions on which flags may be set on which inodes. Specifically DIRSYNC may only be set on directories and IMMUTABLE and APPEND may not be set on links. Tighten that to disallow TOPDIR being set on non-directories and only NODUMP and NOATIME to be set on non-regular file, non-directories. Introduces a flags masking function which masks flags based on mode and use it during inode creation and when flags are set via the ioctl to facilitate future consistency. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-15 18:09:20 -05:00
Duane Griffin	8fa43a81b9	ext4: don't inherit inappropriate inode flags from parent At present INDEX and EXTENTS are the only flags that new ext4 inodes do NOT inherit from their parent. In addition prevent the flags DIRTY, ECOMPR, IMAGIC, TOPDIR, HUGE_FILE and EXT_MIGRATE from being inherited. List inheritable flags explicitly to prevent future flags from accidentally being inherited. This fixes the TOPDIR flag inheritance bug reported at http://bugzilla.kernel.org/show_bug.cgi?id=9866. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-15 18:57:26 -05:00
Jan Kara	a269eb1829	ext4: Use lowercase names of quota functions Use lowercase names of quota functions instead of old uppercase ones. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Mingming Cao <cmm@us.ibm.com> CC: linux-ext4@vger.kernel.org	2009-03-26 02:18:36 +01:00
Theodore Ts'o	2842c3b544	ext4: Print the find_group_flex() warning only once This is a short-term warning, and even printk_ratelimit() can result in too much noise in system logs. So only print it once as a warning. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-12 12:20:01 -04:00
Eric Sandeen	7ce9d5d1f3	ext4: fix ext4_free_inode() vs. ext4_claim_inode() race I was seeing fsck errors on inode bitmaps after a 4 thread dbench run on a 4 cpu machine: Inode bitmap differences: -50736 -(50752--50753) etc... I believe that this is because ext4_free_inode() uses atomic bitops, and although ext4_new_inode() used to also use atomic bitops for synchronization, commit `393418676a` changed this to use the sb_bgl_lock, so that we could also synchronize against read_inode_bitmap and initialization of uninit inode tables. However, that change left ext4_free_inode using atomic bitops, which I think leaves no synchronization between setting & unsetting bits in the inode table. The below patch fixes it for me, although I wonder if we're getting at all heavy-handed with this spinlock... Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-03-04 18:38:18 -05:00
Theodore Ts'o	05bf9e839d	ext4: Add fallback for find_group_flex This is a workaround for find_group_flex() which badly needs to be replaced. One of its problems (besides ignoring the Orlov algorithm) is that it is a bit hyperactive about returning failure under suspicious circumstances. This can lead to spurious ENOSPC failures even when there are inodes still available. Work around this for now by retrying the search using find_group_other() if find_group_flex() returns -1. If find_group_other() succeeds when find_group_flex() has failed, log a warning message. A better block/inode allocator that will fix this problem for real has been queued up for the next merge window. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-02-21 12:13:24 -05:00
Theodore Ts'o	83982b6f47	ext4: Remove "extents" mount option This mount option is largely superfluous, and in fact the way it was implemented was buggy; if a filesystem which did not have the extents feature flag was mounted -o extents, the filesystem would attempt to create and use extents-based file even though the extents feature flag was not eabled. The simplest thing to do is to nuke the mount option entirely. It's not all that useful to force the non-creation of new extent-based files if the filesystem can support it. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-06 14:53:16 -05:00
Theodore Ts'o	ba80b1019a	ext4: Add markers for better debuggability Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-03 20:03:21 -05:00
Aneesh Kumar K.V	648f5879f5	ext4: mark the blocks/inode bitmap beyond end of group as used We need to mark the block/inode bitmap beyond the end of the group with '1'. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-01-05 21:46:04 -05:00
Aneesh Kumar K.V	2ccb5fb9f1	ext4: Use new buffer_head flag to check uninit group bitmaps initialization For uninit block group, the on-disk bitmap is not initialized. That implies we cannot depend on the uptodate flag on the bitmap buffer_head to find bitmap validity. Use a new buffer_head flag which would be set after we properly initialize the bitmap. This also prevents (re-)initializing the uninit group bitmap every time we call ext4_read_block_bitmap(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-01-05 21:49:55 -05:00
Aneesh Kumar K.V	393418676a	ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() We need to make sure we update the inode bitmap and clear EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide whether to initialize the inode bitmap each time it is called. (introduced by commit c806e68f.) ext4_read_inode_bitmap does: spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group)); if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) { ext4_init_inode_bitmap(sb, bh, block_group, desc); and ext4_new_inode does if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group), ino, inode_bitmap_bh->b_data)) ...... ... spin_lock(sb_bgl_lock(sbi, group)); gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT); i.e., on allocation we update the bitmap then we take the sb_bgl_lock and clear the EXT4_BG_INODE_UNINIT flag. What can happen is a parallel ext4_read_inode_bitmap can zero out the bitmap in between the above ext4_set_bit_atomic and spin_lock(sb_bg_lock..) The race results in below user visible errors EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449 EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ... EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ... # ls -al /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71 ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2009-01-05 21:38:14 -05:00
Aneesh Kumar K.V	3300beda52	ext4: code cleanup Rename some variables. We also unlock locks in the reverse order we acquired as a part of cleanup. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-03 22:33:39 -05:00
Aneesh Kumar K.V	560671a0d3	ext4: Use high 16 bits of the block group descriptor's free counts fields Rename the lower bits with suffix _lo and add helper to access the values. Also rename bg_itable_unused_hi to bg_pad as in e2fsprogs. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-05 22:20:24 -05:00
Theodore Ts'o	a9df9a4910	ext4: Make ext4_group_t be an unsigned int Nearly all places in the ext3/4 code which uses "unsigned long" is probably a bug, since on 32-bit systems a ulong a 32-bits, which means we are wasting stack space on 64-bit systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-05 22:18:16 -05:00
Theodore Ts'o	fde4d95ad8	ext4: remove extraneous newlines from calls to ext4_error() and ext4_warning() This removes annoying blank syslog entries emitted by ext4_error() or ext4_warning(), since these functions add their own newline. Signed-off-by: Nick Warne <nick@ukfsn.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-05 22:17:35 -05:00
Frank Mayhar	0390131ba8	ext4: Allow ext4 to run without a journal A few weeks ago I posted a patch for discussion that allowed ext4 to run without a journal. Since that time I've integrated the excellent comments from Andreas and fixed several serious bugs. We're currently running with this patch and generating some performance numbers against both ext2 (with backported reservations code) and ext4 with and without a journal. It just so happens that running without a journal is slightly faster for most everything. We did iozone -T -t 4 s 2g -r 256k -T -I -i0 -i1 -i2 which creates 4 threads, each of which create and do reads and writes on a 2G file, with a buffer size of 256K, using O_DIRECT for all file opens to bypass the page cache. Results: ext2 ext4, default ext4, no journal initial writes 13.0 MB/s 15.4 MB/s 15.7 MB/s rewrites 13.1 MB/s 15.6 MB/s 15.9 MB/s reads 15.2 MB/s 16.9 MB/s 17.2 MB/s re-reads 15.3 MB/s 16.9 MB/s 17.2 MB/s random readers 5.6 MB/s 5.6 MB/s 5.7 MB/s random writers 5.1 MB/s 5.3 MB/s 5.4 MB/s So it seems that, so far, this was a useful exercise. Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2009-01-07 00:06:22 -05:00
Al Viro	6b38e842bb	nfsd race fixes: ext4 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-12-31 18:07:44 -05:00
James Morris	2b82892565	Merge branch 'master' into next Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 11:29:12 +11:00
David Howells	4c9c544e49	CRED: Wrap task credential accesses in the Ext4 filesystem Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(\|e\|s\|fs)[ug]id to current_(\|e\|s\|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: adilger@sun.com Cc: linux-ext4@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:38:51 +11:00
Frederic Bohe	23712a9c28	ext4: add checksum calculation when clearing UNINIT flag in ext4_new_inode When initializing an uninitialized block group in ext4_new_inode(), its block group checksum must be re-calculated. This fixes a race when several threads try to allocate a new inode in an UNINIT'd group. There is some question whether we need to be initializing the block bitmap in ext4_new_inode() at all, but for now, if we are going to init the block group, let's eliminate the race. Signed-off-by: Frederic Bohe <frederic.bohe@bull.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-11-07 09:21:01 -05:00
Frederic Bohe	c806e68f56	ext4: fix initialization of UNINIT bitmap blocks This fixes a bug which caused on-line resizing of filesystems with a 1k blocksize to fail. The root cause of this bug was the fact that if an uninitalized bitmap block gets read in by userspace (which e2fsprogs does try to avoid, but can happen when the blocksize is less than the pagesize and an adjacent blocks is read into memory) ext4_read_block_bitmap() was erroneously depending on the buffer uptodate flag to decide whether it needed to initialize the bitmap block in memory --- i.e., to set the standard set of blocks in use by a block group (superblock, bitmaps, inode table, etc.). Essentially, ext4_read_block_bitmap() assumed it was the only routine that might try to read a block containing a block bitmap, which is simply not true. To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap() must always initialize uninitialized bitmap blocks. Once a block or inode is allocated out of that bitmap, it will be marked as initialized in the block group descriptor, so in general this won't result any extra unnecessary work. Signed-off-by: Frederic Bohe <frederic.bohe@bull.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-10 08:09:18 -04:00
Theodore Ts'o	c2ea3fde61	ext4: Remove old legacy block allocator Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-10-10 09:40:52 -04:00
Theodore Ts'o	af5bc92dde	ext4: Fix whitespace checkpatch warnings/errors Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-09-08 22:25:24 -04:00
Theodore Ts'o	4776004f54	ext4: Add printk priority levels to clean up checkpatch warnings Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-09-08 23:00:52 -04:00
Eric Sandeen	c001077f40	ext4: Fix bug where we return ENOSPC even though we have plenty of inodes The find_group_flex() function starts with best_flex as the parent_fbg_group, which happens to have 0 inodes free. Some of the flex groups searched have free blocks and free inodes, but the flex_freeb_ratio is < 10, so they're skipped. Then when a group is compared to the current "best" flex group, it does not have more free blocks than "best", so it is skipped as well. This continues until no flex group with free inodes is found which has a proper ratio or which has more free blocks than the "best" group, and we're left with a "best" group that has 0 inodes free, and we return -ENOSPC. We fix this by changing the logic so that if the current "best" flex group has no inodes free, and the current one does have room, it is promoted to the next "best." Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-08-19 22:19:50 -04:00
Eric Sandeen	b5f10eed81	ext4: lock block groups when initializing I noticed when filling a 1T filesystem with 4 threads using the fs_mark benchmark: fs_mark -d /mnt/test -D 256 -n 100000 -t 4 -s 20480 -F -S 0 that I occasionally got checksum mismatch errors: EXT4-fs error (device sdb): ext4_init_inode_bitmap: Checksum bad for group 6935 etc. I'd reliably get 4-5 of them during the run. It appears that the problem is likely a race to init the bg's when the uninit_bg feature is enabled. With the patch below, which adds sb_bgl_locking around initialization, I was able to complete several runs with no errors or warnings. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-08-02 21:21:08 -04:00
Eric Sandeen	e29d1cde63	ext4: sync up block and inode bitmap reading functions ext4_read_block_bitmap and read_inode_bitmap do essentially the same thing, and yet they are structured quite differently. I came across this difference while looking at doing bg locking during bg initialization. This patch: * removes unnecessary casts in the error messages * renames read_inode_bitmap to ext4_read_inode_bitmap * and more substantially, restructures the inode bitmap reading function to be more like the block bitmap counterpart. The change to the inode bitmap reader simplifies the locking to be applied in the next patch. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-08-02 21:21:02 -04:00
Eric Sandeen	e4079a11f5	ext4: do not set extents feature from the kernel We've talked for a while about getting rid of any feature- setting from the kernel; this gets rid of the code which would set the INCOMPAT_EXTENTS flag on the first file write when mounted as ext4[dev]. With this patch, if the extents feature is not already set on disk, then mounting as ext4 will fall back to noextents with a warning, and if -o extents is explicitly requested, the mount will fail, also with warning. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Jose R. Santos	772cb7c83b	ext4: New inode allocation for FLEX_BG meta-data groups. This patch mostly controls the way inode are allocated in order to make ialloc aware of flex_bg block group grouping. It achieves this by bypassing the Orlov allocator when block group meta-data are packed toghether through mke2fs. Since the impact on the block allocator is minimal, this patch should have little or no effect on other block allocation algorithms. By controlling the inode allocation, it can basically control where the initial search for new block begins and thus indirectly manipulate the block allocator. This allocator favors data and meta-data locality so the disk will gradually be filled from block group zero upward. This helps improve performance by reducing seek time. Since the group of inode tables within one flex_bg are treated as one giant inode table, uninitialized block groups would not need to partially initialize as many inode table as with Orlov which would help fsck time as the filesystem usage goes up. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Theodore Ts'o	574ca174c9	ext4: Rename read_block_bitmap() to ext4_read_block_bitmap() Since this a non-static function, make it be ext4 specific to avoid conflicts with potentially other filesystems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Duane Griffin	91ef4caf80	ext4: handle corrupted orphan list at mount If the orphan node list includes valid, untruncatable nodes with nlink > 0 the ext4_orphan_cleanup loop which attempts to delete them will not do so, causing it to loop forever. Fix by checking for such nodes in the ext4_orphan_get function. This patch fixes the second case (image hdb.20000009.softlockup.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2008-07-11 19:27:31 -04:00
Aneesh Kumar K.V	8753e88f1b	ext4: mark inode dirty after initializing the extent tree We should mark the inode dirty only after initializing the extent tree. Also if we fail during extent initialization we need to call DQUOT_FREE_INODE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-04-29 22:00:36 -04:00
Christoph Hellwig	3dcf54515a	ext4: move headers out of include/linux Move ext4 headers out of include/linux. This is just the trivial move, there's some more thing that could be done later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-04-29 18:13:32 -04:00
Harvey Harrison	46e665e9d2	ext4: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-04-17 10:38:59 -04:00
Marcin Slusarz	e8546d0615	ext4: le*_add_cpu conversion replace all: little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) + expression_in_cpu_byteorder); with: leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder); generated with semantic patch Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-ext4@vger.kernel.org Cc: sct@redhat.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: adilger@clusterfs.com Cc: Mingming Cao <cmm@us.ibm.com>	2008-04-17 10:38:59 -04:00
Aneesh Kumar K.V	e65187e6d0	ext4: Enable extent format for symlinks. This patch enables extent-formatted normal symlinks. Using extents format allows a symlink to refer to a block number larger than 2^32 on large filesystems. We still don't enable extent format for fast symlinks, which are contained in the inode itself. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-04-29 08:11:12 -04:00
Benoit Boissinot	1cc8dcf569	ext*: spelling fix prefered -> preferred Spelling fix: prefered -> preferred Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>	2008-04-21 22:45:55 +00:00
Aneesh Kumar K.V	42bf0383d1	ext4: set EXT4_EXTENTS_FL only for directory and regular files In addition, don't inherit EXT4_EXTENTS_FL from parent directory. If we have a directory with extent flag set and later mount the file system with -o noextents, the files created in that directory will also have extent flag set but we would not have called ext4_ext_tree_init for them. This will cause error later when we are verifying the extent header Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-02-25 16:38:03 -05:00
David Howells	1d1fe1ee02	iget: stop EXT4 from using iget() and read_inode() Stop the EXT4 filesystem from using iget() and read_inode(). Replace ext4_read_inode() with ext4_iget(), and call that instead of iget(). ext4_iget() then uses iget_locked() directly and returns a proper error code instead of an inode in the event of an error. ext4_fill_super() returns any error incurred when getting the root inode instead of EINVAL. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-02-07 08:42:27 -08:00
Eric Sandeen	c549a95d40	ext4: fix up EXT4FS_DEBUG builds Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail without these changes. Clean up some format warnings too. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	a48380f769	ext4: Rename i_dir_acl to i_size_high Rename ext4_inode.i_dir_acl to i_size_high drop ext4_inode_info.i_dir_acl as it is not used Rename ext4_inode.i_size to ext4_inode.i_size_lo Add helper function for accessing the ext4_inode combined i_size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Aneesh Kumar K.V	99e6f829a8	ext4: Introduce ext4_update__feature Introduce ext4_update__feature and use them instead of opencoding. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2008-01-28 23:58:27 -05:00
Avantika Mathur	2aa9fc4c40	ext4: fixes block group number being set to a negative value This patch fixes various places where the group number is set to a negative value. Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2008-01-28 23:58:27 -05:00
Avantika Mathur	fd2d42912f	ext4: add ext4_group_t, and change all group variables to this type. In many places variables for block group are of type int, which limits the maximum number of block groups to 2^31. Each block group can have up to 2^15 blocks, with a 4K block size, and the max filesystem size is limited to 2^31 * (2^15 * 2^12) = 2^58 -- or 256 PB This patch introduces a new type ext4_group_t, of type unsigned long, to represent block group numbers in ext4. All occurrences of block group variables are converted to type ext4_group_t. Signed-off-by: Avantika Mathur <mathur@us.ibm.com>	2008-01-28 23:58:27 -05:00
Andreas Dilger	717d50e497	Ext4: Uninitialized Block Groups In pass1 of e2fsck, every inode table in the fileystem is scanned and checked, regardless of whether it is in use. This is this the most time consuming part of the filesystem check. The unintialized block group feature can greatly reduce e2fsck time by eliminating checking of uninitialized inodes. With this feature, there is a a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation. The feature is enabled through a mkfs option mke2fs /dev/ -O uninit_groups A patch adding support for uninitialized block groups to e2fsprogs tools has been posted to the linux-ext4 mailing list. The patches have been stress tested with fsstress and fsx. In performance tests testing e2fsck time, we have seen that e2fsck time on ext3 grows linearly with the total number of inodes in the filesytem. In ext4 with the uninitialized block groups feature, the e2fsck time is constant, based solely on the number of used inodes rather than the total inode count. Since typical ext4 filesystems only use 1-10% of their inodes, this feature can greatly reduce e2fsck time for users. With performance improvement of 2-20 times, depending on how full the filesystem is. The attached graph shows the major improvements in e2fsck times in filesystems with a large total inode count, but few inodes in use. In each group descriptor if we have EXT4_BG_INODE_UNINIT set in bg_flags: Inode table is not initialized/used in this group. So we can skip the consistency check during fsck. EXT4_BG_BLOCK_UNINIT set in bg_flags: No block in the group is used. So we can skip the block bitmap verification for this group. We also add two new fields to group descriptor as a part of uninitialized group patch. __le16 bg_itable_unused; /* Unused inodes count / __le16 bg_checksum; / crc16(sb_uuid+group+desc) */ bg_itable_unused: If we have EXT4_BG_INODE_UNINIT not set in bg_flags then bg_itable_unused will give the offset within the inode table till the inodes are used. This can be used by fsck to skip list of inodes that are marked unused. bg_checksum: Now that we depend on bg_flags and bg_itable_unused to determine the block and inode usage, we need to make sure group descriptor is not corrupt. We add checksum to group descriptor to detect corruption. If the descriptor is found to be corrupt, we mark all the blocks and inodes in the group used. Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:00 -04:00
Coly Li	f077d0d7ea	ext4: Remove (partial, never completed) fragment support Fragment support in ext2/3/4 was never implemented, and it probably will never be implemented. So remove it from ext4. Signed-off-by: Coly Li <coyli@suse.de> Acked-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-10-17 18:49:59 -04:00
Eric Sandeen	ef2fb67989	remove unused bh in calls to ext234_get_group_desc ext[234]_get_group_desc never tests the bh argument, and only sets it if it is passed in; it is perfectly happy with a NULL bh argument. But, many callers send one in and never use it. May as well call with NULL like other callers who don't use the bh. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:49 -07:00
Kalpak Shah	ef7f38359e	ext4: Add nanosecond timestamps This patch adds nanosecond timestamps for ext4. This involves adding *time_extra fields to the ext4_inode to extend the timestamps to 64-bits. Creation time is also added by this patch. These extended fields will fit into an inode if the filesystem was formatted with large inodes (-I 256 or larger) and there are currently no EAs consuming all of the available space. For new inodes we always reserve enough space for the kernel's known extended fields, but for inodes created with an old kernel this might not have been the case. So this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature flag(ro-compat so that older kernels can't create inodes with a smaller extra_isize). which indicates if the fields fitting inside s_min_extra_isize are available or not. If the expansion of inodes if unsuccessful then this feature will be disabled. This feature is only enabled if requested by the sysadmin. None of the extended inode fields is critical for correct filesystem operation. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Kalpak Shah <kalpak@clusterfs.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-07-18 09:15:20 -04:00
Andrew Morton	f4e5bc244f	[PATCH] ext4 64 bit divide fix With CONFIG_LBD=n, sector_div() expands to a plain old divide. But ext4 is _not_ passing in a sector_t as the first argument, so... fs/built-in.o: In function `ext4_get_group_no_and_offset': fs/ext4/balloc.c:39: undefined reference to `__umoddi3' fs/ext4/balloc.c:41: undefined reference to `__udivdi3' fs/built-in.o: In function `find_group_orlov': fs/ext4/ialloc.c:278: undefined reference to `__udivdi3' fs/built-in.o: In function `ext4_fill_super': fs/ext4/super.c:1488: undefined reference to `__udivdi3' fs/ext4/super.c:1488: undefined reference to `__umoddi3' fs/ext4/super.c:1594: undefined reference to `__udivdi3' fs/ext4/super.c:1601: undefined reference to `__umoddi3' Fix that up by calling do_div() directly. Also cast the arg to u64. do_div() is only defined on u64, and ext4_fsblk_t is supposed to be opaque. Note especially the changes to find_group_orlov(). It was attempting to do do_div(int, unsigned long long); which is royally screwed up. Switched it to plain old divide. Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:18 -07:00
Alexandre Ratchov	8fadc14323	[PATCH] ext4: move block number hi bits move '_hi' bits of block numbers in the larger part of the block group descriptor structure Signed-off-by: Alexandre Ratchov <alexandre.ratchov@bull.net> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:18 -07:00
Laurent Vivier	bd81d8eec0	[PATCH] ext4: 64bit metadata In-kernel super block changes to support >32 bit free blocks numbers. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Alexandre Ratchov <alexandre.ratchov@bull.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:17 -07:00
Mingming Cao	3a5b2ecdd1	[PATCH] ext4: switch fsblk to sector_t Redefine ext3 in-kernel filesystem block type (ext3_fsblk_t) from unsigned long to sector_t, to allow kernel to handle >32 bit ext3 blocks. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:16 -07:00
Alex Tomas	a86c618126	[PATCH] ext3: add extent map support On disk extents format: /* * this is extent on-disk structure * it's used at the bottom of the tree / struct ext3_extent { __le32 ee_block; / first logical block extent covers / __le16 ee_len; / number of blocks covered by extent / __le16 ee_start_hi; / high 16 bits of physical block / __le32 ee_start; / low 32 bigs of physical block */ }; Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:16 -07:00
Mingming Cao	dab291af8d	[PATCH] jbd2: enable building of jbd2 and have ext4 use it rather than jbd Reworked from a patch by Mingming Cao and Randy Dunlap Signed-off-By: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:16 -07:00
Mingming Cao	617ba13b31	[PATCH] ext4: rename ext4 symbols to avoid duplication of ext3 symbols Mingming Cao originally did this work, and Shaggy reproduced it using some scripts from her. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:15 -07:00
Dave Kleikamp	ac27a0ec11	[PATCH] ext4: initial copy of files from ext3 Start of the ext4 patch series. See Documentation/filesystems/ext4.txt for details. This is a simple copy of the files in fs/ext3 to fs/ext4 and /usr/incude/linux/ext3* to /usr/include/ex4* Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-11 11:14:15 -07:00

1 2 3 4 5

232 Коммитов