WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Bob Peterson	9862ca056e	GFS2: Switch tr_touched to flag in transaction This patch eliminates the int variable tr_touched in favor of a new flag in the transaction. This is a step toward reducing contention on the gfs2_log_lock spin_lock. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2017-01-27 08:20:13 -05:00
Bob Peterson	b63f5e8482	GFS2: Wake up io waiters whenever a flush is done Before this patch, if a process called function gfs2_log_reserve to reserve some journal blocks, but the journal not enough blocks were free, it would call io_schedule. However, in the log flush daemon, it woke up the waiters only if an gfs2_ail_flush was no longer required. This resulted in situations where processes would wait forever because the number of blocks required was so high that it pushed the journal into a perpetual state of flush being required. This patch changes the logd daemon so that it wakes up io waiters every time the log is actually flushed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2017-01-06 22:14:28 -05:00
Bob Peterson	f07b352021	GFS2: Made logd daemon take into account log demand Before this patch, the logd daemon only tried to flush things when the log blocks pinned exceeded a certain threshold. But when we're deleting very large files, it may require a huge number of journal blocks, and that, in turn, may exceed the threshold. This patch factors that into account. Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2017-01-05 16:01:45 -05:00
Christoph Hellwig	70fd76140a	block,fs: use REQ_* flags directly Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-01 09:43:26 -06:00
Mike Christie	e1b1afa6f8	gfs2: use bio op accessors Separate the op from the rq_flag_bits and have gfs2 set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Benjamin Marzinski	400ac52e80	gfs2: clear journal live bit in gfs2_log_flush When gfs2 was unmounting filesystems or changing them to read-only it was clearing the SDF_JOURNAL_LIVE bit before the final log flush. This caused a race. If an inode glock got demoted in the gap between clearing the bit and the shutdown flush, it would be unable to reserve log space to clear out the active items list in inode_go_sync, causing an error in inode_go_inval because the glock was still dirty. To solve this, the SDF_JOURNAL_LIVE bit is now cleared inside the shutdown log flush. This means that, because of the locking on the log blocks, either inode_go_sync will be able to reserve space to clean the glock before the shutdown flush, or the shutdown flush will clean the glock itself, before inode_go_sync fails to reserve the space. Either way, the glock will be clean before inode_go_inval. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2015-12-14 12:19:41 -06:00
Benjamin Marzinski	2e60d7683c	GFS2: update freeze code to use freeze/thaw_super on all nodes The current gfs2 freezing code is considerably more complicated than it should be because it doesn't use the vfs freezing code on any node except the one that begins the freeze. This is because it needs to acquire a cluster glock before calling the vfs code to prevent a deadlock, and without the new freeze_super and thaw_super hooks, that was impossible. To deal with the issue, gfs2 had to do some hacky locking tricks to make sure that a frozen node couldn't be holding on a lock it needed to do the unfreeze ioctl. This patch makes use of the new hooks to simply the gfs2 locking code. Now, all the nodes in the cluster freeze and thaw in exactly the same way. Every node in the cluster caches the freeze glock in the shared state. The new freeze_super hook allows the freezing node to grab this freeze glock in the exclusive state without first calling the vfs freeze_super function. All the nodes in the cluster see this lock change, and call the vfs freeze_super function. The vfs locking code guarantees that the nodes can't get stuck holding the glocks necessary to unfreeze the system. To unfreeze, the freezing node uses the new thaw_super hook to drop the freeze glock. Again, all the nodes notice this, reacquire the glock in shared mode and call the vfs thaw_super function. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-11-17 10:36:39 +00:00
Benjamin Marzinski	24972557b1	GFS2: remove transaction glock GFS2 has a transaction glock, which must be grabbed for every transaction, whose purpose is to deal with freezing the filesystem. Aside from this involving a large amount of locking, it is very easy to make the current fsfreeze code hang on unfreezing. This patch rewrites how gfs2 handles freezing the filesystem. The transaction glock is removed. In it's place is a freeze glock, which is cached (but not held) in a shared state by every node in the cluster when the filesystem is mounted. This lock only needs to be grabbed on freezing, and actions which need to be safe from freezing, like recovery. When a node wants to freeze the filesystem, it grabs this glock exclusively. When the freeze glock state changes on the nodes (either from shared to unlocked, or shared to exclusive), the filesystem does a special log flush. gfs2_log_flush() does all the work for flushing out the and shutting down the incore log, and then it tries to grab the freeze glock in a shared state again. Since the filesystem is stuck in gfs2_log_flush, no new transaction can start, and nothing can be written to disk. Unfreezing the filesytem simply involes dropping the freeze glock, allowing gfs2_log_flush() to grab and then release the shared lock, so it is cached for next time. However, in order for the unfreezing ioctl to occur, gfs2 needs to get a shared lock on the filesystem root directory inode to check permissions. If that glock has already been grabbed exclusively, fsfreeze will be unable to get the shared lock and unfreeze the filesystem. In order to allow the unfreeze, this patch makes gfs2 grab a shared lock on the filesystem root directory during the freeze, and hold it until it unfreezes the filesystem. The functions which need to grab a shared lock in order to allow the unfreeze ioctl to be issued now use the lock grabbed by the freeze code instead. The freeze and unfreeze code take care to make sure that this shared lock will not be dropped while another process is using it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-05-14 10:04:34 +01:00
Bob Peterson	428fd95d85	GFS2: Re-add a call to log_flush_wait when flushing the journal Upstream commit `34cc178` changed a line of code from calling function log_flush_commit to calling log_write_header. This had the effect of eliminating a call to function log_flush_wait. That causes the journal to skip over log headers, which results in multiple wrap points, which itself leads to infinite loops in journal replay, both in the kernel code and fsck.gfs2 code. This patch re-adds that call. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-03-12 14:46:29 +00:00
Steven Whitehouse	b1ab1e44b4	GFS2: Remove extra "if" in gfs2_log_flush() By reordering some of the assignments in gfs2_log_flush() it is possible to remove one of the "if" statements as it can be merged with one higher up the function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-02-25 11:52:20 +00:00
Steven Whitehouse	022ef4feed	GFS2: Move log buffer accounting to transaction Now we have a master transaction into which other transactions are merged, the accounting can be done using this master transaction. We no longer require the superblock fields which were being used for this function. In addition, this allows for a clean up in calc_reserved() making it rather easier understand. Also, by reducing the number of variables used to track the buffers being added and removed from the journal, a number of error checks are now no longer required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-02-24 19:49:12 +00:00
Steven Whitehouse	d69a3c6561	GFS2: Move log buffer lists into transaction Over time, we hope to be able to improve the concurrency available in the log code. This is one small step towards that, by moving the buffer lists from the super block, and into the transaction structure, so that each transaction builds its own buffer lists. At transaction commit time, the buffer lists are merged into the currently accumulating transaction. That transaction then is passed into the before and after commit functions at journal flush time. Thus there should be no change in overall behaviour yet. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-02-24 16:54:54 +00:00
Steven Whitehouse	885bceca7f	GFS2: Plug on AIL flush When we do a flush of the AIL list, we are writing out what is likely to be a lot of small I/Os, which are possibly in an order which is not ideal performance-wise. Since this is done by calling filemap_fdatatwrite for each individual inode's address space there is no overall plugging going on. In addition to that, we do not always wait for AIL i/o when we flush it, so that it is possible for things to get left behind on the queue. By adding explicit plugging here, we reduce the chances of this being an issues. A quick test using the AIL flush tracepoint shows a small, but measurable improvement. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2014-02-03 09:57:29 +00:00
Bob Peterson	9290a9a7c0	GFS2: Fix use-after-free race when calling gfs2_remove_from_ail Function gfs2_remove_from_ail drops the reference on the bh via brelse. This patch fixes a race condition whereby bh is deferenced after the brelse when setting bd->bd_blkno = bh->b_blocknr; Under certain rare circumstances, bh might be gone or reused, and bd->bd_blkno is set to whatever that memory happens to be, which is often 0. Later, in gfs2_trans_add_unrevoke, that bd fails the test "bd->bd_blkno >= blkno" which causes it to never be freed. The end result is that the bd is never freed from the bufdata cache, which results in this error: slab error in kmem_cache_destroy(): cache `gfs2_bufdata': Can't free all objects Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-12-13 21:42:23 +00:00
Benjamin Marzinski	5d054964f5	GFS2: aggressively issue revokes in gfs2_log_flush This patch looks at all the outstanding blocks in all the transactions on the log, and moves the completed ones to the ail2 list. Then it issues revokes for these blocks. This will hopefully speed things up in situations where there is a lot of contention for glocks, especially if they are acquired serially. revoke_lo_before_commit will issue at most one log block's full of these preemptive revokes. The amount of reserved log space that gfs2_log_reserve() ignores has been incremented to allow for this extra block. This patch also consolidates the common revoke instructions into one function, gfs2_add_revoke(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-06-19 09:41:59 +01:00
Benjamin Marzinski	16ca9412d8	GFS2: replace gfs2_ail structure with gfs2_trans In order to allow transactions and log flushes to happen at the same time, gfs2 needs to move the transaction accounting and active items list code into the gfs2_trans structure. As a first step toward this, this patch removes the gfs2_ail structure, and handles the active items list in the gfs_trans structure. This keeps gfs2 from allocating an ail structure on log flushes, and gives us a struture that can later be used to store the transaction accounting outside of the gfs2 superblock structure. With this patch, at the end of a transaction, gfs2 will add the gfs2_trans structure to the superblock if there is not one already. This structure now has the active items fields that were previously in gfs2_ail. This is not necessary in the case where the transaction was simply used to add revokes, since these are never written outside of the journal, and thus, don't need an active items list. Also, in order to make sure that the transaction structure is not removed while it's still in use by gfs2_trans_end, unlocking the sd_log_flush_lock has to happen slightly later in ending the transaction. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-04-08 08:46:22 +01:00
Steven Whitehouse	4513899092	GFS2: Use ->writepages for ordered writes Instead of using a list of buffers to write ahead of the journal flush, this now uses a list of inodes and calls ->writepages via filemap_fdatawrite() in order to achieve the same thing. For most use cases this results in a shorter ordered write list, as well as much larger i/os being issued. The ordered write list is sorted by inode number before writing in order to retain the disk block ordering between inodes as per the previous code. The previous ordered write code used to conflict in its assumptions about how to write out the disk blocks with mpage_writepages() so that with this updated version we can also use mpage_writepages() for GFS2's ordered write, writepages implementation. So we will also send larger i/os from writeback too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2013-01-29 10:29:17 +00:00
Bob Peterson	c0752aa7e4	GFS2: eliminate log elements and simplify This patch eliminates the gfs2_log_element data structure and rolls its two components into the gfs2_bufdata. This makes the code easier to understand and makes it easier to migrate to a rbtree to keep the list sorted. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-05-02 09:14:36 +01:00
Steven Whitehouse	144a4c2ff7	GFS2: Log code fixes This patch removes a log lock from around atomic operation where it is not needed, removes an unused variable, and also changes a void pointer used incorrectly to a struct page pointer. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-24 16:44:38 +01:00
Steven Whitehouse	c50b91c4bd	GFS2: Remove bd_list_tr This is another clean up in the logging code. This per-transaction list was largely unused. Its main function was to ensure that the number of buffers in a transaction was correct, however that counter was only used to check the number of buffers in the bd_list_tr, plus an assert at the end of each transaction. With the assert now changed to use the calculated buffer counts, we can remove both bd_list_tr and its associated counter. This should make the code easier to understand as well as shrinking a couple of structures. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-24 16:44:36 +01:00
Steven Whitehouse	e8c92ed769	GFS2: Clean up log write code path Prior to this patch, we have two ways of sending i/o to the log. One of those is used when we need to allocate both the data to be written itself and also a buffer head to submit it. This is done via sb_getblk and friends. This is used mostly for writing log headers. The other method is used when writing blocks which have some in-place counterpart. This is the case for all the metadata blocks which are journalled, and when journaled data is in use, for unescaped journalled data blocks. This patch replaces both of those two methods, and about half a dozen separate i/o submission points with a single i/o submission function. We also go direct to bio rather than using buffer heads, since this allows us to build i/o requests of the maximum size for the block device in question. It also reduces the memory required for flushing the log, which can be very useful in low memory situations. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-24 16:44:34 +01:00
Steven Whitehouse	fdb76a4228	GFS2: Drop "pull" argument from log_write_header() The "pull" argument to log_write_header() is only used for debug purposes and it is not really needed any more. There are other tests for this particular problem, so I think we can dispose of it in order to simplify the code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-04-24 16:44:24 +01:00
Steven Whitehouse	34cc1781c2	GFS2: Clean up log flush header writing We already send both a pre and post flush to the block device when writing a journal header. There is no need to wait for the previous I/O specifically when we do this, unless we've turned "barriers" off. As a side effect, this also cleans up the code path for flushing the journal and makes it more readable. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-03-09 14:07:06 +00:00
Steven Whitehouse	08728f2d8b	GFS2: Make bd_cmp() static Add missing static to bd_cmp() Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-02-28 17:11:27 +00:00
Bob Peterson	4a36d08d0d	GFS2: Sort the ordered write list This patch sorts the ordered write list for GFS2 writes. This increases the throughput for simultaneous writes. For example, if you have ten processes, all doing: dd if=/dev/zero of=/mnt/gfs2/fileX on different files, the throughput will be much better. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-02-28 17:10:53 +00:00
Steven Whitehouse	47ac5537a7	GFS2: Move two functions from log.c to lops.c gfs2_log_get_buf() and gfs2_log_fake_buf() are both used only in lops.c, so move them next to their callers and they can then become static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-02-28 17:09:59 +00:00
Linus Torvalds	eb59c505f8	Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm * 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits) PM / Hibernate: Implement compat_ioctl for /dev/snapshot PM / Freezer: fix return value of freezable_schedule_timeout_killable() PM / shmobile: Allow the A4R domain to be turned off at run time PM / input / touchscreen: Make st1232 use device PM QoS constraints PM / QoS: Introduce dev_pm_qos_add_ancestor_request() PM / shmobile: Remove the stay_on flag from SH7372's PM domains PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode PM: Drop generic_subsys_pm_ops PM / Sleep: Remove forward-only callbacks from AMBA bus type PM / Sleep: Remove forward-only callbacks from platform bus type PM: Run the driver callback directly if the subsystem one is not there PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412. PM / Sleep: Merge internal functions in generic_ops.c PM / Sleep: Simplify generic system suspend callbacks PM / Hibernate: Remove deprecated hibernation snapshot ioctls PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled() ARM: S3C64XX: Implement basic power domain support PM / shmobile: Use common always on power domain governor ... Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused XBT_FORCE_SLEEP bit	2012-01-08 13:10:57 -08:00
Tejun Heo	a0acae0e88	freezer: unexport refrigerator() and update try_to_freeze() slightly There is no reason to export two functions for entering the refrigerator. Calling refrigerator() instead of try_to_freeze() doesn't save anything noticeable or removes any race condition. * Rename refrigerator() to __refrigerator() and make it return bool indicating whether it scheduled out for freezing. * Update try_to_freeze() to return bool and relay the return value of __refrigerator() if freezing(). * Convert all refrigerator() users to try_to_freeze(). * Update documentation accordingly. * While at it, add might_sleep() to try_to_freeze(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@infradead.org>	2011-11-21 12:32:22 -08:00
Steven Whitehouse	20ed0535d3	GFS2: Fix up REQ flags Christoph has split up REQ_PRIO from REQ_META. That means that we can drop REQ_PRIO from places where is it not needed. I'm not at all sure that the combination WRITE_FLUSH_FUA \| REQ_PRIO makes any kind of sense, anyway. In addition, I've added REQ_META to one place in the code where it was missing. REQ_PRIO has been left for read/writes triggered by glock acquisition and writeback only. We can adjust it again if required, but these are the most important points from a performance perspective. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org>	2011-11-08 09:51:53 +00:00
Christoph Hellwig	65299a3b78	block: separate priority boosting from REQ_META Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule, and lave REQ_META purely for marking requests as metadata in blktrace. All existing callers of REQ_META except for XFS are updated to also set REQ_PRIO for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-08-23 14:50:29 +02:00
Steven Whitehouse	380f7c65a7	GFS2: Resolve inode eviction and ail list interaction bug This patch contains a few misc fixes which resolve a recently reported issue. This patch has been a real team effort and has received a lot of testing. The first issue is that the ail lock needs to be held over a few more operations. The lock thats added into gfs2_releasepage() may possibly be a candidate for replacing with RCU at some future point, but at this stage we've gone for the obvious fix. The second issue is that gfs2_write_inode() can end up calling a glock recursively when called from gfs2_evict_inode() via the syncing code, so it needs a guard added. The third issue is that we either need to not truncate the metadata pages of inodes which have zero link count, but which we cannot deallocate due to them still being in use by other nodes, or we need to ensure that those pages have all made it through the journal and ail lists first. This patch takes the former approach, but the latter has also been tested and there is nothing to choose between them performance-wise. So again, we could revise that decision in the future. Also, the inode eviction process is now better documented. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Reported-by: Barry J. Marson <bmarson@redhat.com> Reported-by: David Teigland <teigland@redhat.com>	2011-07-14 08:59:44 +01:00
Steven Whitehouse	26b06a6958	GFS2: Wait properly when flushing the ail list The ail flush code has always relied upon log flushing to prevent it from spinning needlessly. This fixes it to wait on the last I/O request submitted (we don't need to wait for all of it) instead of either spinning with io_schedule or sleeping. As a result cpu usage of gfs2_logd is much reduced with certain workloads. Reported-by: Abhijith Das <adas@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-21 19:21:07 +01:00
Steven Whitehouse	4f1de01821	GFS2: Fix ail list traversal In the recent patches to update the AIL list code, I managed to forget that the ail list lock got dropped, even though I added a comment specifically to remind myself :( Reported-by: Barry Marson <bmarson@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-05-03 11:48:07 +01:00
Steven Whitehouse	c83ae9cad8	GFS2: Add an AIL writeback tracepoint Add a tracepoint for monitoring writeback of the AIL. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-20 09:01:58 +01:00
Steven Whitehouse	4667a0ec32	GFS2: Make writeback more responsive to system conditions This patch adds writeback_control to writing back the AIL list. This means that we can then take advantage of the information we get in ->write_inode() in order to set off some pre-emptive writeback. In addition, the AIL code is cleaned up a bit to make it a bit simpler to understand. There is still more which can usefully be done in this area, but this is a good start at least. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-20 09:01:37 +01:00
Steven Whitehouse	5ac048bb7e	GFS2: Use filemap_fdatawrite() to write back the AIL In order to ensure that the mapping stats (and thus the bdi) are correctly updated, this patch changes the AIL writeback to use the filemap_datawrite function. This helps prevent stalls in balance_dirty_pages() due to large amounts of dirty metadata when there is little or no dirty data around. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-04-20 08:59:25 +01:00
Linus Torvalds	6c51038900	Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}	2011-03-24 10:16:26 -07:00
Steven Whitehouse	c618e87a5f	GFS2: Update to AIL list locking The previous patch missed a couple of places where the AIL list needed locking, so this fixes up those places, plus a comment is corrected too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Chinner <dchinner@redhat.com>	2011-03-14 12:40:29 +00:00
Dave Chinner	d6a079e82e	GFS2: introduce AIL lock The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward. This reduces the impact of log pushing on sequential write throughput. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2011-03-11 11:52:25 +00:00
Jens Axboe	721a9602e6	block: kill off REQ_UNPLUG With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2011-03-10 08:52:27 +01:00
Jens Axboe	fa251f8990	Merge branch 'v2.6.36-rc8' into for-2.6.37/barrier Conflicts: block/blk-core.c drivers/block/loop.c mm/swapfile.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-10-19 09:13:04 +02:00
Steven Whitehouse	5f4874903d	GFS2: gfs2_logd should be using interruptible waits Looks like this crept in, in a recent update. Reported-by: Krzysztof Urbaniak <urban@bash.org.pl> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-09-17 14:00:10 +01:00
Christoph Hellwig	f1e4d518c3	gfs2: replace barriers with explicit flush / FUA usage Switch to the WRITE_FLUSH_FUA flag for log writes, remove the EOPNOTSUPP detection for barriers and stop setting the barrier flag for discards. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-09-10 12:35:39 +02:00
Christoph Hellwig	7b6d91daee	block: unify flags for struct bio and struct request Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:20:39 +02:00
Christoph Hellwig	41f2df6289	block: BARRIER request should imply SYNC A barrier request should by defintion have priority in get_request and let the queue be unplugged immediately as it's blocking all forward progress due to the queue draining. Most filesystems already get this implicitly by the way how submit_bh treats the buffer_ordered flag, and gfs2 sets it explicitly. But btrfs and XFS are still forgetting to set the flag, as is blkdev_issue_flush and some places in DM/MD. For XFS on metadata heavy workloads this gives a consistent speedup in the 2-3% range. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>	2010-08-07 18:15:44 +02:00
Bob Peterson	ed4878e8a4	GFS2: Rework reclaiming unlinked dinodes The previous patch I wrote for reclaiming unlinked dinodes had some shortcomings and did not prevent all hangs. This version is much cleaner and more logical, and has passed very difficult testing. Sorry for the churn. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-21 16:11:36 +01:00
Steven Whitehouse	913a71d250	GFS2: Add some useful messages The following patch adds a message to indicate when barriers have been disabled due to a block device which doesn't support them. You could already tell this via the mount options in /proc/mounts, but all the other filesystems also log a message at the same time. Also, the same mechanisms are used to indicate when the lock demote interface has been used (only ever used for debugging) which is a request from our support team. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-06 11:03:29 +01:00
Benjamin Marzinski	5e687eac1b	GFS2: Various gfs2_logd improvements This patch contains various tweaks to how log flushes and active item writeback work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits for gfs2_logd to do the log flushing. Multiple functions were rewritten to remove the need to call gfs2_log_lock(). Instead of using one test to see if gfs2_logd had work to do, there are now seperate tests to check if there are two many buffers in the incore log or if there are two many items on the active items list. This patch is a port of a patch Steve Whitehouse wrote about a year ago, with some minor changes. Since gfs2_ail1_start always submits all the active items, it no longer needs to keep track of the first ai submitted, so this has been removed. In gfs2_log_reserve(), the order of the calls to prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has been switched. If it called wake_up first there was a small window for a race, where logd could run and return before gfs2_log_reserve was ready to get woken up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve() would be left waiting for gfs2_logd to eventualy run because it timed out. Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times out, and flushes the log, can now be set on mount with ar_commit. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-05-05 09:39:18 +01:00
Benjamin Marzinski	2e95e3f668	GFS2: Allow the number of committed revokes to temporarily be negative GFS2 tracks the number of revokes and unrevokes that are part of committed transactions via sd_log_commited_revoke. It is possible for one process to add revokes during its transaction, while another process unrevokes them during its transaction. If the second process finishes its transaction first, sd_log_commited_revoke will be decremented by the number of unrevokes that the second process did, without first being incremented by the number of revokes the first process did. This is fine, since all started transactions must be completed before the journal can be flushed. However, sd_log_commited_revoke is an unsigned integer, and log_refund() causes an assertion failure if it would go negative at the end of a transaction. This patch makes sd_log_commited_revoke a signed integer and allows it to go negative. __gfs2_log_flush() still checks that it mataches the actual number of revokes. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2010-03-11 09:50:46 +00:00
Steven Whitehouse	0ab7d13fcb	GFS2: Tag all metadata with jid There are two spare field in the header common to all GFS2 metadata. One is just the right size to fit a journal id in it, and this patch updates the journal code so that each time a metadata block is modified, we tag it with the journal id of the node which is performing the modification. The reason for this is that it should make it much easier to debug issues which arise if we can tell which node was the last to modify a particular metadata block. Since the field is updated before the block is written into the journal, each journal should only contain metadata which is tagged with its own journal id. The one exception to this is the journal header block, which might have a different node's id in it, if that journal was recovered by another node in the cluster. Thus each journal will contain a record of which nodes recovered it, via the journal header. The other field in the metadata header could potentially be used to hold information about what kind of operation was performed, but for the time being we just zero it on each transaction so that if we use it for that in future, we'll know that the information (where it exists) is reliable. I did consider using the other field to hold the journal sequence number, however since in GFS2's journaling we write the modified data into the journal and not the original data, this gives no information as to what action caused the modification, so I think we can probably come up with a better use for those 64 bits in the future. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-12-03 11:58:47 +00:00
Steven Whitehouse	63997775b7	GFS2: Add tracepoints This patch adds the ability to trace various aspects of the GFS2 filesystem. The trace points are divided into three groups, glocks, logging and bmap. These points have been chosen because they allow inspection of the major internal functions of GFS2 and they are also generic enough that they are unlikely to need any major changes as the filesystem evolves. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-06-12 08:49:20 +01:00
Christoph Hellwig	b7d245de25	gfs2: remove ->write_super and stop maintaining ->s_dirt Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-11 21:36:05 -04:00
Steven Whitehouse	c969f58ca4	GFS2: Update the rw flags After Jens recent updates: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a1f242524c3c1f5d40f1c9c343427e34d1aadd6e et al. this is a patch to bring gfs2 uptodate with the core code. Also I've managed to squash another call to ll_rw_block() along the way. There is still one part of the GFS2 I/O paths which are not correctly annotated and that is due to the sharing of the writeback code between the data and metadata address spaces. I would like to change that too, but this patch is still worth doing on its own, I think. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-05-11 12:36:41 +01:00
Steven Whitehouse	f057f6cdf6	GFS2: Merge lock_dlm module into GFS2 This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2009-03-24 11:21:14 +00:00
Steven Whitehouse	254db57f9b	GFS2: Support for I/O barriers This patch adds barrier support to GFS2. There is not a lot of change really... we just add the barrier flag when we write journal header blocks. If the underlying device refuses to support them, we fall back to the previous way of doing things (wait for the I/O and hope) since there is nothing else we can do. There is no user configuration, barriers will always be on unless the device refuses to support them. This seems a reasonable solution to me since this is a correctness issue. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-09-26 10:23:22 +01:00
Harvey Harrison	2d81afb879	[GFS2] trivial sparse lock annotations Annotate the &sdp->sd_log_lock. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-06-27 09:39:31 +01:00
Roel Kluin	62be1f7167	[GFS2] fix assertion in log_refund() since unsigned, unused >= 0 is always true. Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-04-18 08:36:09 +01:00
Bob Peterson	d0109bfa84	[GFS2] Only do lo_incore_commit once This patch is performance related. When we're doing a log flush, I noticed we were calling buf_lo_incore_commit twice: once for data bufs and once for metadata bufs. Since this is the same function and does the same thing in both cases, there should be no reason to call it twice. Since we only need to call it once, we can also make it faster by removing it from the generic "lops" code and making it a stand-along static function. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:39:54 +01:00
Steven Whitehouse	ac39aadd04	[GFS2] Fix assert in log code Although the values were all being calculated correctly, there was a race in the assert due to the way it was using atomic variables. This changes the value we assert on so that we get the same effect by testing a different variable. This prevents the assert triggering when it shouldn't. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:18:03 +00:00
Steven Whitehouse	ff91cc9bb4	[GFS2] Fix log block mapper A missing offset in the calculation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:15:58 +00:00
Bob Peterson	da6dd40d59	[GFS2] Journal extent mapping This patch saves a little time when gfs2 writes to the journals by keeping a mapping between logical and physical blocks on disk. That's better than constantly looking up indirect pointers in buffers, when the journals are several levels of indirection (which they typically are). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:11:46 +00:00
Bob Peterson	e9e1ef2b6e	[GFS2] Remove function gfs2_get_block This patch is just a cleanup. Function gfs2_get_block() just calls function gfs2_block_map reversing the last two parameters. By reversing the parameters, gfs2_block_map() may be called directly and function gfs2_get_block may be eliminated altogether. Since this function is done for every block operation, this streamlines the code and makes it a little bit more efficient. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:25 +00:00
Fabio Massimo Di Nitto	1a2781cfa5	[GFS2] Fix runtime issue with UP kernels The issue is indeed UP vs SMP and it is totally random. spin_is_locked() is a bad assertion because there is no correct answer on UP. on UP spin_is_locked() has to return either one value or another, always. This means that in my setup I am lucky enough to trigger the issue and your you are lucky enough not to. the patch in attachment removes the bogus calls to BUG_ON and according to David (in CC and thanks for the long explanation on the problem) we can rely upon things like lockdep to find problem that might be trying to catch. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:08:06 +00:00
Steven Whitehouse	e35b921185	[GFS2] Don't periodically update the jindex We only care about the content of the jindex in two cases, one is when we mount the fs and the other is when we need to recover another journal. In both cases we have to update the jindex anyway, so there is no point in updating it periodically between times, so this removes it to simplify gfs2_logd. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:59 +00:00
Steven Whitehouse	ec69b18883	[GFS2] Move gfs2_logd into log.c This means that we can mark gfs2_ail1_empty static and prepares the way for further changes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:56 +00:00
Steven Whitehouse	fd041f0b40	[GFS2] Use atomic_t for journal free blocks counter This patch changes the counter which keeps track of the free blocks in the journal to an atomic_t in preparation for the following patch which will update the log reservation code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:54 +00:00
Steven Whitehouse	2bcd610d2f	[GFS2] Don't add glocks to the journal The only reason for adding glocks to the journal was to keep track of which locks required a log flush prior to release. We add a flag to the glock to allow this check to be made in a simpler way. This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64) and means that we can avoid extra work during the journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:52 +00:00
Steven Whitehouse	b8e7cbb65b	[GFS2] Add writepages for GFS2 jdata This patch resolves a lock ordering issue where we had been getting a transaction lock in the wrong order with respect to the page lock. By using writepages rather than just writepage, it is then possible to start a transaction before locking the page, and thus matching the locking order elsewhere in the code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:28 +00:00
Steven Whitehouse	f91a0d3e24	[GFS2] Remove useless i_cache from inodes The i_cache was designed to keep references to the indirect blocks used during block mapping so that they didn't have to be looked up continually. The idea failed because there are too many places where the i_cache needs to be freed, and this has in the past been the cause of many bugs. In addition there was no performance benefit being gained since the disk blocks in question were cached anyway. So this patch removes it in order to simplify the code to prepare for other changes which would otherwise have had to add further support for this feature. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-01-25 08:07:16 +00:00
Steven Whitehouse	5a60c532c9	[GFS2] Get superblock a different way The mapping may be NULL by the time the I/O has completed, so we now get the superblock by a different route (via the bd and glock) to avoid this problem. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Wendy Cheng <wcheng@redhat.com>	2007-10-10 08:56:34 +01:00
Steven Whitehouse	16615be18c	[GFS2] Clean up journaled data writing This patch cleans up the code for writing journaled data into the log. It also removes the need to allocate a small "tag" structure for each block written into the log. Instead we just keep count of the outstanding I/O so that we can be sure that its all been written at the correct time. Another result of this patch is that a number of ll_rw_block() calls have become submit_bh() calls, closing some races at the same time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:56:24 +01:00
Steven Whitehouse	1ad38c437f	[GFS2] Clean up gfs2_trans_add_revoke() The following alters gfs2_trans_add_revoke() to take a struct gfs2_bufdata as an argument. This eliminates the memory allocation which was previously required by making use of the already existing struct gfs2_bufdata. It makes some sanity checks to ensure that the gfs2_bufdata has been removed from all the lists before its recycled as a revoke structure. This saves one memory allocation and one free per revoke structure. Also as a result, and to simplify the locking, since there is no longer any blocking code in gfs2_trans_add_revoke() we must hold the log lock whenever this function is called. This reduces the amount of times we take and unlock the log lock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:56:12 +01:00
Steven Whitehouse	d7b616e252	[GFS2] Clean up ordered write code The following patch removes the ordered write processing from databuf_lo_before_commit() and moves it to log.c. This has the effect of greatly simplyfying databuf_lo_before_commit() and well as potentially making the ordered write code more efficient. As a side effect of this, its now possible to remove ordered buffers from the ordered buffer list at any time, so we now make use of this in invalidatepage and releasepage to ensure timely release of these buffers. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:56:03 +01:00
Steven Whitehouse	1e1a3d03e9	[GFS2] Introduce gfs2_remove_from_ail This collects together the operations required to remove a gfs2_bufdata from the ail lists. Its only called from two places to start with, but expect to see more of this function in future. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:55:55 +01:00
Steven Whitehouse	bb3b0e3df5	[GFS2] Clean up invalidatepage/releasepage This patch fixes some bugs relating to journaled data files by cleaning up the gfs2_invalidatepage() and gfs2_releasepage() functions. We now never block during gfs2_releasepage(), instead we always either release or refuse to release depending on the status of the buffers. This fixes Red Hat bugzillas #248969 and #252392. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>	2007-10-10 08:55:29 +01:00
Bob Peterson	0f8468c8be	[GFS2] Detach buf data during in-place writeback This is patch 5 of 5 for bug #248176 Metadata corruption was occurring because page references weren't being removed in all cases. I previously added a function called detach_bufdata, but I discovered there already WAS a function out there to do the job. It's called gfs2_meta_cache_flush. So I added a call to that to remove the page references. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:55:01 +01:00
Bob Peterson	693ddeabbb	[GFS2] Revert part of earlier log.c changes This is patch 2 of 5 for bug #248176. The list_move code previously concocted in log.c for bug #238162 (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238162#c23) never runs as bh can now never be NULL at this point. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-10-10 08:54:53 +01:00
Steven Whitehouse	a0a24741ca	[GFS2] Small fixes to logging code This reverts part of an earlier patch which tried to reclaim gfs2_bufdata structures too early and resulted in a "use after free" case (this bit from me). Also a change to not write out log headers unless we really need to (in the case of flushing nothing we don't need a header) from Bob. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2007-07-09 15:43:07 +01:00
Robert Peterson	2332c4435b	[GFS2] assertion failure after writing to journaled file, umount This patch passes all my nasty tests that were causing the code to fail under one circumstance or another. Here is a complete summary of all changes from today's git tree, in order of appearance: 1. There are now separate variables for metadata buffer accounting. 2. Variable sd_log_num_hdrs is no longer needed, since the header accounting is taken care of by the reserve/refund sequence. 3. Fixed a tiny grammatical problem in a comment. 4. Added a new function "calc_reserved" to calculate the reserved log space. This isn't entirely necessary, but it has two benefits: First, it simplifies the gfs2_log_refund function greatly. Second, it allows for easier debugging because I could sprinkle the code with calls to this function to make sure the accounting is proper (by adding asserts and printks) at strategic point of the code. 5. In log_pull_tail there apparently was a kludge to fix up the accounting based on a "pull" parameter. The buffer accounting is now done properly, so the kludge was removed. 6. File sync operations were making a call to gfs2_log_flush that writes another journal header. Since that header was unplanned for (reserved) by the reserve/refund sequence, the free space had to be decremented so that when log_pull_tail gets called, the free space is be adjusted properly. (Did I hear you call that a kludge? well, maybe, but a lot more justifiable than the one I removed). 7. In the gfs2_log_shutdown code, it optionally syncs the log by specifying the PULL parameter to log_write_header. I'm not sure this is necessary anymore. It just seems to me there could be cases where shutdown is called while there are outstanding log buffers. 8. In the (data)buf_lo_before_commit functions, I changed some offset values from being calculated on the fly to being constants. That simplified some code and we might as well let the compiler do the calculation once rather than redoing those cycles at run time. 9. This version has my rewritten databuf_lo_add function. This version is much more like its predecessor, buf_lo_add, which makes it easier to understand. Again, this might not be necessary, but it seems as if this one works as well as the previous one, maybe even better, so I decided to leave it in. 10. In databuf_lo_before_commit, a previous data corruption problem was caused by going off the end of the buffer. The proper solution is to have the proper limit in place, rather than stopping earlier. (Thus my previous attempt to fix it is wrong). If you don't wrap the buffer, you're stopping too early and that causes more log buffer accounting problems. 11. In lops.h there are two new (previously mentioned) constants for figuring out the data offset for the journal buffers. 12. There are also two new functions, buf_limit and databuf_limit to calculate how many entries will fit in the buffer. 13. In function gfs2_meta_wipe, it needs to distinguish between pinned metadata buffers and journaled data buffers for proper journal buffer accounting. It can't use the JDATA gfs2_inode flag because it's sometimes passed the "real" inode and sometimes the "metadata inode" and the inode flags will be random bits in a metadata gfs2_inode. It needs to base its decision on which was passed in. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:47 +01:00
Robert Peterson	8fb68595d5	[GFS2] Journaled file write/unstuff bug This patch is for bugzilla bug 283162, which uncovered a number of bugs pertaining to writing to files that have the journaled bit on. These bugs happen most often when writing to the meta_fs because the files are always journaled. So operations like gfs2_grow were particularly vulnerable, although many of the problems could be recreated with normal files after setting the journaled bit on. The problems fixed are: -GFS2 wasn't ever writing unstuffed journaled data blocks to their in-place location on disk. Now it does. -If you unmounted too quickly after doing IO to a journaled file, GFS2 was crashing because you would discard a buffer whose bufdata was still on the active items list. GFS2 now deals with this gracefully. -GFS2 was losing track of the bufdata for journaled data blocks, and it wasn't getting freed, causing an error when you tried to unmount the module. GFS2 now frees all the bufdata structures. -There was a memory corruption occurring because GFS2 wrote twice as many log entries for journaled buffers. -It was occasionally trying to write journal headers in buffers that weren't currently mapped. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:40 +01:00
Benjamin Marzinski	ddf4b426aa	[GFS2] fix jdata issues This is a patch for the first three issues of RHBZ #238162 The first issue is that when you allocate a new page for a file, it will not start off uptodate. This makes sense, since you haven't written anything to that part of the file yet. Unfortunately, gfs2_pin() checks to make sure that the buffers are uptodate. The solution to this is to mark the buffers uptodate in gfs2_commit_write(), after they have been zeroed out and have the data written into them. I'm pretty confident with this fix, although it's not completely obvious that there is no problem with marking the buffers uptodate here. The second issue is simply that you can try to pin a data buffer that is already on the incore log, and thus, already pinned. This patch checks to see if this buffer is already on the log, and exits databuf_lo_add() if it is, just like buf_lo_add() does. The third issue is that gfs2_log_flush() doesn't do it's block accounting correctly. Both metadata and journaled data are logged, but gfs2_log_flush() only compares the number of metadata blocks with the number of blocks to commit to the ondisk journal. This patch also counts the journaled data blocks. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:08 +01:00
Steven Whitehouse	89918647a4	[GFS2] Make the log reserved blocks depend on block size The number of blocks which we reserve in the log at the start of each transaction needs to depends upon the block size since the overhead is related to the number of "pointers" which can be fitted into a single block. This relates to Red Hat bz #240435 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-07-09 08:23:03 +01:00
Ryusuke Konishi	aed3255f22	[GFS2] fs/gfs2/log.c:log_bmap() fix printk format warning Fix a printk format warning in fs/gfs2/log.c: fs/gfs2/log.c:322: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'sector_t' Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-11-30 10:37:04 -05:00
Steven Whitehouse	a25311c8e0	[GFS2] Move gfs2_meta_syncfs() into log.c By moving gfs2_meta_syncfs() into log.c, gfs2_ail1_start() can be made static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-11-30 10:36:45 -05:00
Steven Whitehouse	b004157ab5	[GFS2] Fix journal flush problem This fixes a bug which resulted in poor performance due to flushing the journal too often. The code path in question was via the inode_go_sync() function in glops.c. The solution is not to flush the journal immediately when inodes are ejected from memory, but batch up the work for glockd to deal with later on. This means that glocks may now live on beyond the end of the lifetime of their inodes (but not very much longer in the normal case). Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in calculation of the number of free journal blocks. The gfs2_logd process has been altered to be more responsive to the journal filling up. We now wake it up when the number of uncommitted journal blocks has reached the threshold level rather than trying to flush directly at the end of each transaction. This again means doing fewer, but larger, log flushes in general. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-11-30 10:36:42 -05:00
Steven Whitehouse	23591256d6	[GFS2] Fix bmap to map extents properly This fix means that bmap will map extents of the length requested by the VFS rather than guessing at it, or just mapping one block at a time. The other callers of gfs2_block_map are audited to ensure they send the correct max extent lengths (i.e. set bh->b_size correctly). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-10-20 09:13:40 -04:00
Steven Whitehouse	fe1a698ffe	[GFS2] Fix bug where lock not held The log lock needs to be held when manipulating the counter for the number of free journal blocks. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-10-12 17:10:55 -04:00
Steven Whitehouse	ddacfaf76d	[GFS2] Move logging code into log.c (mostly) This moves the logging code from meta_io.c into log.c and glops.c. As a result the routines can now be static and all the logging code is together in log.c, leaving meta_io.c with just metadata i/o code in it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-10-03 11:10:41 -04:00
Steven Whitehouse	74669416f7	[GFS2] Use list_for_each_entry_safe_reverse in gfs2_ail1_start() This is an attempt to fix Red Hat bz 204364. I don't hit it all the time, but with these changes, running postmark which used to trigger it on a regular basis no longer appears to. So I'm not saying that its 100% certain that its fixed, but it does look promising at the moment. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-19 11:17:38 -04:00
Fabio Massimo Di Nitto	7d308590ae	[GFS2] Export lm_interface to kernel headers lm_interface.h has a few out of the tree clients such as GFS1 and userland tools. Right now, these clients keeps a copy of the file in their build tree that can go out of sync. Move lm_interface.h to include/linux, export it to userland and clean up fs/gfs2 to use the new location. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-19 08:45:18 -04:00
Steven Whitehouse	7a6bbacbb8	[GFS2] Map multiple blocks at once where possible This is a tidy up of the GFS2 bmap code. The main change is that the bh is passed to gfs2_block_map allowing the flags to be set directly rather than having to repeat that code several times in ops_address.c. At the same time, the extent mapping code from gfs2_extent_map has been moved into gfs2_block_map. This allows all calls to gfs2_block_map to map extents in the case that no allocation is taking place. As a result reads and non-allocating writes should be faster. A quick test with postmark appears to support this. There is a limit on the number of blocks mapped in a single bmap call in that it will only ever map blocks which are pointed to from a single pointer block. So in other words, it will never try to do additional i/o in order to satisfy read-ahead. The maximum number of blocks is thus somewhat less than 512 (the GFS2 4k block size minus the header divided by sizeof(u64)). I've further limited the mapping of "normal" blocks to 32 blocks (to avoid extra work) since readpages() will currently read a maximum of 32 blocks ahead (128k). Some further work will probably be needed to set a suitable value for DIO as well, but for now thats left at the maximum 512 (see ops_address.c:gfs2_get_block_direct). There is probably a lot more that can be done to improve bmap for GFS2, but this is a good first step. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-18 17:18:23 -04:00
Steven Whitehouse	faa31ce85f	[GFS2] Tidy up log.c Based upon previous feedback from lkml and also removing some commented out debugging which is no longer needed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-13 11:13:27 -04:00
Jan Engelhardt	c53921248c	[GFS2] More style changes Remove redundant brackets Signed-off-by: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-07 09:42:56 -04:00
Steven Whitehouse	a67cdbd457	[GFS2] Style changes in logging code As per Jan Engelhardt's comments, removed some unused code and removed some brackets which were not required. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-05 14:41:30 -04:00
Steven Whitehouse	cd915493fc	[GFS2] Change all types to uX style This makes all fixed size types have consistent names. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-04 12:49:07 -04:00
Steven Whitehouse	e9fc2aa091	[GFS2] Update copyright, tidy up incore.h As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this updates the copyright message to say "version" in full rather than "v.2". Also incore.h has been updated to remove forward structure declarations which are not required. The gfs2_quota_lvb structure has now had endianess annotations added to it. Also quota.c has been updated so that we now store the lvb data locally in endian independant format to avoid needing a structure in host endianess too. As a result the endianess conversions are done as required at various points and thus the conversion routines in lvb.[ch] are no longer required. I've moved the one remaining constant in lvb.h thats used into lm.h and removed the unused lvb.[ch]. I have not changed the HIF_ constants. That is left to a later patch which I hope will unify the gh_flags and gh_iflags fields of the struct gfs2_holder. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-09-01 11:05:15 -04:00
Benjamin Marzinski	5dc39fe621	[GFS2] Fix journal off-by-one error log_refund() incorrectly assumed that if a transaction had been touched, it always committed buffers to the incore log. Thus, when you got around to flushing the log, you would need one more block than you committed, to account for the header. So it automatically set reserved to 1, which had the effect of making sdp->sd_log_blks_reserved one greater when you got to gfs2_log_flush(). However, if you don't actually commit anything to the incore log between flushes, you don't need the header, because you aren't writing anything out. With this patch, log_refund() only increments reservered to account for the header if something has been committed since the last flush. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-08-25 09:57:41 -04:00
Steven Whitehouse	59a1cc6bda	[GFS2] Fix lock ordering bug in page fault path Mmapped files were able to trigger a lock ordering bug. Private maps do not need to take the glock so early on. Shared maps do unfortunately, however we can get around that by adding a flag into the flags for the struct gfs2_file. This only works because we are taking an exclusive lock at this point, so we know that nobody else can be racing with us. Fixes Red Hat bugzilla: #201196 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-08-04 15:41:22 -04:00
Steven Whitehouse	e0f2bf780a	[GFS2] Fix endian conversion bug Fix an endian coversion bug in log.c spotted by Kevin Anderson. Cc: Kevin Anderson <kanderso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-07-17 09:36:28 -04:00
Steven Whitehouse	fd4de2d41a	[GFS2] Add cast for printk Cast a uint64_t to unsigned long long for a printk. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-07-05 13:14:59 -04:00
Steven Whitehouse	feaa7bba02	[GFS2] Fix unlinked file handling This patch fixes the way we have been dealing with unlinked, but still open files. It removes all limits (other than memory for inodes, as per every other filesystem) on numbers of these which we can support on GFS2. It also means that (like other fs) its the responsibility of the last process to close the file to deallocate the storage, rather than the person who did the unlinking. Note that with GFS2, those two events might take place on different nodes. Also there are a number of other changes: o We use the Linux inode subsystem as it was intended to be used, wrt allocating GFS2 inodes o The Linux inode cache is now the point which we use for local enforcement of only holding one copy of the inode in core at once (previous to this we used the glock layer). o We no longer use the unlinked "special" file. We just ignore it completely. This makes unlinking more efficient. o We now use the 4th block allocation state. The previously unused state is used to track unlinked but still open inodes. o gfs2_inoded is no longer needed o Several fields are now no longer needed (and removed) from the in core struct gfs2_inode o Several fields are no longer needed (and removed) from the in core superblock There are a number of future possible optimisations and clean ups which have been made possible by this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-06-14 15:32:57 -04:00
Steven Whitehouse	3a8a9a1034	[GFS2] Update copyright date to 2006 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-05-18 15:09:15 -04:00
Steven Whitehouse	bd8968010a	[GFS2] Remove semaphore.h from C files We no longer use semaphores, everything has been converted to mutex or rwsem, so we don't need to include this header any more. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-05-18 14:54:58 -04:00
Steven Whitehouse	fd88de569b	[GFS2] Readpages support This adds readpages support (and also corrects a small bug in the readpage error path at the same time). Hopefully this will improve performance by allowing GFS to submit larger lumps of I/O at a time. In order to simplify the setting of BH_Boundary, it currently gets set when we hit the end of a indirect pointer block. There is always a boundary at this point with the current allocation code. It doesn't get all the boundaries right though, so there is still room for improvement in this. See comments in fs/gfs2/ops_address.c for further information about readpages with GFS2. Signed-off-by: Steven Whitehouse	2006-05-05 16:59:11 -04:00
Steven Whitehouse	a74604bee2	[GFS2] sem -> mutex conversion in locking.c Convert a semaphore to a mutex in locking.c and also tidy up one or two loose ends. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-04-21 15:10:46 -04:00
Steven Whitehouse	190562bd84	[GFS2] Fix a bug: scheduling under a spinlock At some stage, a mutex was added to gfs2_glock_put() without checking all its call sites. Two of them were called from under a spinlock causing random delays at various points and crashes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-04-20 16:57:23 -04:00
Steven Whitehouse	f4154ea039	[GFS2] Update journal accounting code. A small update to the journaling code to change the way that the "extra" blocks are accounted for in the journal. These are used at a rate of one per 503 metadata blocks or one per 251 journaled data blocks (or just one if the total number of journaled blocks in the transaction is smaller). Since we are using them at two different rates the old method of accounting for them no longer works and we count them up as required. Since the "per transaction" accounting can't handle this (there is no fixed number of header blocks per transaction) we have to account for it in the general journal code. We now require that each transaction reserves more blocks than it actually needs to take account of the possible extra blocks. Also a final fix to dir.c to ensure that all ref counts are handled correctly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-04-11 14:49:06 -04:00
Steven Whitehouse	ed3865079b	[GFS2] Finally get ref counting correct The last patch missed some other instances of incorrect ref counting, this fixes all of those too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-04-07 16:28:07 -04:00
Steven Whitehouse	b09e593d79	[GFS2] Fix a ref count bug and other clean ups This fixes a ref count bug that sometimes showed up a umount time (causing it to hang) but it otherwise mostly harmless. At the same time there are some clean ups including making the log operations structures const, moving a memory allocation so that its not done in the fast path of checking to see if there is an outstanding transaction related to a particular glock. Removes the sd_log_wrap varaible which was updated, but never actually used anywhere. Updates the gfs2 ioctl() to run without the kernel lock (which it never needed anyway). Removes the "invalidate inodes" loop from GFS2's put_super routine. This is done in kill super anyway so we don't need to do it here. The loop was also bogus in that if there are any inodes "stuck" at this point its a bug and we need to know about it rather than hide it by hanging forever. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-04-07 11:17:32 -04:00
Steven Whitehouse	e3167ded1f	[GFS] Fix bug in endian conversion for metadata header In some cases 16 bit functions were being used rather than 32 bit functions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-03-30 15:46:23 -05:00
Steven Whitehouse	484adff8a0	[GFS2] Update locking in log.c Replace the lock_for_trans()/lock_for_flush() functions with an rwsem. In fact the sd_log_flush_lock becomes an rwsem (the write part of it) and is extended slightly to cover everything that the lock_for_flush() used to cover. The read part of the lock is instead of lock_for_trans(). This corrects the races in the original code and reduces the code size. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-03-29 09:12:12 -05:00
Steven Whitehouse	71b86f562b	[GFS2] Further updates to dir and logging code This reduces the size of the directory code by about 3k and gets readdir() to use the functions which were introduced in the previous directory code update. Two memory allocations are merged into one. Eliminates zeroing of some buffers which were never used before they were initialised by other data. There is still scope for further improvement in the directory code. On the logging side, a hand created mutex has been replaced by a standard Linux mutex in the log allocation code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-03-28 14:14:04 -05:00
Steven Whitehouse	b4dc72911d	[GFS2] Fix some bugs Fix a bug I introduced earlier with a kfree() and usage of a structure in the wrong order. Also try and get the counts of the journaled data buffers "more correct". Still some work to do in this area though. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-03-01 17:41:58 -05:00
Steven Whitehouse	5c676f6d35	[GFS2] Macros removal in gfs2.h As suggested by Pekka Enberg <penberg@cs.helsinki.fi>. The DIV_RU macro is renamed DIV_ROUND_UP and and moved to kernel.h The other macros are gone from gfs2.h as (although not requested by Pekka Enberg) are a number of included header file which are now included individually. The inode number comparison function is now an inline function. The DT2IF and IF2DT may be addressed in a future patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-02-27 17:23:27 -05:00
Steven Whitehouse	568f4c9659	[GFS2] 80 Column audit of GFS2 Requested by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-02-27 12:00:42 -05:00
David Teigland	6a6b3d018f	[GFS2] Patch to remove stats gathering from GFS2 Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-02-23 10:11:47 +00:00
Steven Whitehouse	f55ab26a8f	[GFS2] Use mutices rather than semaphores As well as a number of minor bug fixes, this patch changes GFS to use mutices rather than semaphores. This results in better information in case there are any locking problems. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-02-21 12:51:39 +00:00
Steven Whitehouse	7359a19cc7	[GFS2] Fix for root inode ref count bug Umount is now working correctly again. The bug was due to not getting an extra ref count when mounting the fs. We should have bumped it by two (once for the internal pointer to the root inode from the super block and once for the inode hanging off the dcache entry for root). Also this patch tidys up the code dealing with looking up and creating inodes. We now pass Linux inodes (with gfs2_inodes attached) rather than the other way around and this reduces code duplication in various places. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-02-13 12:27:43 +00:00
Steven Whitehouse	18ec7d5c3f	[GFS2] Make journaled data files identical to normal files on disk This is a very large patch, with a few still to be resolved issues so you might want to check out the previous head of the tree since this is known to be unstable. Fixes for the various bugs will be forthcoming shortly. This patch removes the special data format which has been used up till now for journaled data files. Directories still retain the old format so that they will remain on disk compatible with earlier releases. As a result you can now do the following with journaled data files: 1) mmap them 2) export them over NFS 3) convert to/from normal files whenever you want to (the zero length restriction is gone) In addition the level at which GFS' locking is done has changed for all files (since they all now use the page cache) such that the locking is done at the page cache level rather than the level of the fs operations. This should mean that things like loopback mounts and other things which touch the page cache directly should now work. Current known issues: 1. There is a lock mode inversion problem related to the resource group hold function which needs to be resolved. 2. Any significant amount of I/O causes an oops with an offset of hex 320 (NULL pointer dereference) which appears to be related to a journaled data buffer appearing on a list where it shouldn't be. 3. Direct I/O writes are disabled for the time being (will reappear later) 4. There is probably a deadlock between the page lock and GFS' locks under certain combinations of mmap and fs operation I/O. 5. Issue relating to ref counting on internally used inodes causes a hang on umount (discovered before this patch, and not fixed by it) 6. One part of the directory metadata is different from GFS1 and will need to be resolved before next release. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-02-08 11:50:51 +00:00
Steven Whitehouse	4f3df04137	[GFS2] Change memory allocations to GFP_NOFS I'd like to be rid of these memory allocations entirely so far as is possible. For the moment though, mark them GFP_NOFS to make them less harmful. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-01-18 13:20:16 +00:00
David Teigland	b3b94faa5f	[GFS2] The core of GFS2 This patch contains all the core files for GFS2. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2006-01-16 16:50:04 +00:00

1 2 3 4 5

221 Коммитов