WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Chris Mason	247e743cbe	Btrfs: Use async helpers to deal with pages that have been improperly dirtied Higher layers sometimes call set_page_dirty without asking the filesystem to help. This causes many problems for the data=ordered and cow code. This commit detects pages that haven't been properly setup for IO and kicks off an async helper to deal with them. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	e6dcd2dc9c	Btrfs: New data=ordered implementation The old data=ordered code would force commit to wait until all the data extents from the transaction were fully on disk. This introduced large latencies into the commit and stalled new writers in the transaction for a long time. The new code changes the way data allocations and extents work: * When delayed allocation is filled, data extents are reserved, and the extent bit EXTENT_ORDERED is set on the entire range of the extent. A struct btrfs_ordered_extent is allocated an inserted into a per-inode rbtree to track the pending extents. * As each page is written EXTENT_ORDERED is cleared on the bytes corresponding to that page. * When all of the bytes corresponding to a single struct btrfs_ordered_extent are written, The previously reserved extent is inserted into the FS btree and into the extent allocation trees. The checksums for the file data are also updated. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	1b1e2135dc	Btrfs: Add a per-inode csum mutex to avoid races creating csum items Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:04 -04:00
Chris Mason	89ce8a63d0	Add btrfs_end_transaction_throttle to force writers to wait for pending commits The existing throttle mechanism was often not sufficient to prevent new writers from coming in and making a given transaction run forever. This adds an explicit wait at the end of most operations so they will allow the current transaction to close. There is no wait inside file_write, inode updates, or cow filling, all which have different deadlock possibilities. This is a temporary measure until better asynchronous commit support is added. This code leads to stalls as it waits for data=ordered writeback, and it really needs to be fixed. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	594a24eb0e	Fix btrfs_del_ordered_inode to allow forcing the drop during unlinks This allows us to delete an unlinked inode with dirty pages from the list instead of forcing commit to write these out before deleting the inode. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	a213501153	Btrfs: Replace the big fs_mutex with a collection of other locks Extent alloctions are still protected by a large alloc_mutex. Objectid allocations are covered by a objectid mutex Other btree operations are protected by a lock on individual btree nodes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	925baeddc5	Btrfs: Start btree concurrency work. The allocation trees and the chunk trees are serialized via their own dedicated mutexes. This means allocation location is still not very fine grained. The main FS btree is protected by locks on each block in the btree. Locks are taken top / down, and as processing finishes on a given level of the tree, the lock is released after locking the lower level. The end result of a search is now a path where only the lowest level is locked. Releasing or freeing the path drops any locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Christoph Hellwig	f46b5a66b3	Btrfs: split out ioctl.c Split the ioctl handling out of inode.c into a file of it's own. Also fix up checkpatch.pl warnings for the moved code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	8b71284292	Btrfs: Add async worker threads for pre and post IO checksumming Btrfs has been using workqueues to spread the checksumming load across other CPUs in the system. But, workqueues only schedule work on the same CPU that queued the work, giving them a limited benefit for systems with higher CPU counts. This code adds a generic facility to schedule work with pools of kthreads, and changes the bio submission code to queue bios up. The queueing is important to make sure large numbers of procs on the system don't turn streaming workloads into random workloads by sending IO down concurrently. The end result of all of this is much higher performance (and CPU usage) when doing checksumming on large machines. Two worker pools are created, one for writes and one for endio processing. The two could deadlock if we tried to service both from a single pool. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Sage Weil	6bf13c0cc8	Btrfs: transaction ioctls These ioctls let a user application hold a transaction open while it performs a series of operations. A final ioctl does a sync on the fs (closing the current transaction). This is the main requirement for Ceph's OSD to be able to keep the data it's storing in a btrfs volume consistent, and AFAICS it works just fine. The application would do something like fd = ::open("some/file", O_RDONLY); ::ioctl(fd, BTRFS_IOC_TRANS_START); /* do a bunch of stuff */ ::ioctl(fd, BTRFS_IOC_TRANS_END); or just ::close(fd); And to ensure it commits to disk, ::ioctl(fd, BTRFS_IOC_SYNC); When a transaction is held open, the trans_handle is attached to the struct file (via private_data) so that it will get cleaned up if the process dies unexpectedly. A held transaction is also ended on fsync() to avoid a deadlock. A misbehaving application could also deliberately hold a transaction open, effectively locking up the FS, so it may make sense to restrict something like this to root or something. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Sven Wegener	3b96362cc8	Btrfs: Invalidate dcache entry after creating snapshot and We need to invalidate an existing dcache entry after creating a new snapshot or subvolume, because a negative dache entry will stop us from accessing the new snapshot or subvolume. --- ctree.h \| 23 +++++++++++++++++++++++ inode.c \| 4 ++++ transaction.c \| 4 ++++ 3 files changed, 31 insertions(+) Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Mingming	e1b81e6761	btrfs delete ordered inode handling fix Use btrfs_release_file instead of a put_inode call Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	211c17f51f	Fix corners in writepage and btrfs_truncate_page The extent_io writepage calls needed an extra check for discarding pages that started on th last byte in the file. btrfs_truncate_page needed checks to make sure the page was still part of the file after reading it, and most importantly, needed to wait for all IO to the page to finish before freeing the corresponding extents on disk. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	1259ab75c6	Btrfs: Handle write errors on raid1 and raid10 When duplicate copies exist, writes are allowed to fail to one of those copies. This changeset includes a few changes that allow the FS to continue even when some IOs fail. It also adds verification of the parent generation number for btree blocks. This generation is stored in the pointer to a block, and it ensures that missed writes to are detected. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:03 -04:00
Chris Mason	bbaf549e0c	Btrfs: A number of nodatacow fixes Once part of a delalloc request fails the cow checks, just cow the entire range It is possible for the back references to all be from the same root, but still have snapshots against an extent. The checks are now more strict, forcing cow any time there are multiple refs against the data extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	a68d5933a0	Btrfs: Update nodatacow mode to support cloned single files and resizing Before, nodatacow only checked to make sure multiple roots didn't have references on a single extent. This check makes sure that multiple inodes don't have references. nodatacow needed an extra check to see if the block group was currently readonly. This way cows forced by the chunk relocation code are honored. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	a061fc8da7	Btrfs: Add support for online device removal This required a few structural changes to the code that manages bdev pointers: The VFS super block now gets an anon-bdev instead of a pointer to the lowest bdev. This allows us to avoid swapping the super block bdev pointer around at run time. The code to read in the super block no longer goes through the extent buffer interface. Things got ugly keeping the mapping constant. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	5d9cd9ecbf	Btrfs: Fix clone ioctl to not hold the path over inserts Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	b9d86667c9	Btrfs: Silence bogus inode.c compiler warnings Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Sage Weil	f2eb0a241f	Btrfs: Clone file data ioctl Add a new ioctl to clone file data Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	ec44a35cbe	Btrfs: Add balance ioctl to restripe the chunks Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	788f20eb5a	Btrfs: Add new ioctl to add devices Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	8e7bf94fd5	Btrfs: Do more optimal file RA during shrinking and defrag Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	8f18cf1339	Btrfs: Make the resizer work based on shrinking and growing devices Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	81d7ed29ff	Btrfs: Throttle file_write when data=ordered is flushing the inode Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	bcbfce8abd	Btrfs: Fix the unplug_io_fn to grab a consistent copy of page->mapping Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	e1c4b7451e	Fix btrfs_get_extent and get_block corner cases, and disable O_DIRECT reads The generic O_DIRECT code assumes all the bios have the same bdev, which isn't true for multi-device btrfs. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	f2d8d74d78	Btrfs: Make an unplug function that doesn't unplug every spindle Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	4ef64eae28	Btrfs: Remove debugging statements from the invalidatepage calls Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	9ad6b7bc2e	Force page->private removal in btrfs_invalidatepage btrfs_invalidatepage is not allowed to leave pages around on the lru. Any such pages will trigger an oops later on because the VM will see page->private and assume it is a buffer head. This also forces extra flushes of the async work queues before dropping all the pages on the btree inode during unmount. Left over items on the work queues are one possible cause of busy state ranges during truncate_inode_pages. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:02 -04:00
Chris Mason	3b951516ed	Btrfs: Use the extent map cache to find the logical disk block during data retries The data read retry code needs to find the logical disk block before it can resubmit new bios. But, finding this block isn't allowed to take the fs_mutex because that will deadlock with a number of different callers. This changes the retry code to use the extent map cache instead, but that requires the extent map cache to have the extent we're looking for. This is a problem because btrfs_drop_extent_cache just drops the entire extent instead of the little tiny part it is invalidating. The bulk of the code in this patch changes btrfs_drop_extent_cache to invalidate only a portion of the extent cache, and changes btrfs_get_extent to deal with the results. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	699122f559	Btrfs: Don't wait on tree block writeback before freeing them anymore This isn't required anymore because we don't reallocate blocks that have already been written in this transaction. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	e015640f9c	Btrfs: Write bio checksumming outside the FS mutex This significantly improves streaming write performance by allowing concurrency in the data checksumming. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	44b8bd7edd	Btrfs: Create a work queue for bio writes This allows checksumming to happen in parallel among many cpus, and keeps us from bogging down pdflush with the checksumming code. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	98d20f67cf	Add a min size parameter to btrfs_alloc_extent On huge machines, delayed allocation may try to allocate massive extents. This change allows btrfs_alloc_extent to return something smaller than the caller asked for, and the data allocation routines will loop over the allocations until it fills the whole delayed alloc. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	587f77043a	Btrfs: Fixup a few u64<->pointer casts for 32 bit Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	1643298592	Btrfs: Add O_DIRECT read and write (writes == buffered + cache flush) This adds basic O_DIRECT read and write support. In the write case, we just do a normal buffered write followed by a cache flush. O_DIRECT + O_SYNC are required to trigger metadata syncs. In the read case, there is a basic btrfs_get_block call for use by the generic O_DIRECT code. This does honor multi-volume mapping rules but it skips all checksumming. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	7e38326f5b	Btrfs: Handle checksumming errors while reading data blocks Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	f188591e98	Btrfs: Retry metadata reads in the face of checksum failures Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	22c599485b	Btrfs: Handle data block end_io through the async work queue Before it was done by the bio end_io routine, the work queue code is able to scale much better with faster IO subsystems. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	cea9e4452e	Change btrfs_map_block to return a structure with mappings for all stripes Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	8790d502e4	Btrfs: Add support for mirroring across drives Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	0416008814	Create a btrfs backing dev info This allows intelligent versions of unplug and congestion functions Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	593060d756	Btrfs: Implement raid0 when multiple devices are present Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	239b14b32d	Btrfs: Bring back mount -o ssd optimizations Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	6324fbf334	Btrfs: Dynamic chunk and block group allocation Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:01 -04:00
Chris Mason	0b86a832a1	Btrfs: Add support for multiple devices per filesystem Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	6885f308b5	Btrfs: Misc 2.6.25 updates Remove the btrfs read_inode method, and use save_mount_options Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Chris Mason	065631f6dc	Btrfs: checksum file data at bio submission time instead of during writepage When we checkum file data during writepage, the checksumming is done one page at a time, making it difficult to do bulk metadata modifications to insert checksums for large ranges of the file at once. This patch changes btrfs to checksum on a per-bio basis instead. The bios are checksummed before they are handed off to the block layer, so each bio is contiguous and only has pages from the same inode. Checksumming on a bio basis allows us to insert and modify the file checksum items in large groups. It also allows the checksumming to be done more easily by async worker threads. Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00
Yan Zheng	5e591a0703	Btrfs: Fix looping on readdir of the subvol roots Signed-off-by: Chris Mason <chris.mason@oracle.com>	2008-09-25 11:04:00 -04:00

1 2 3 4

158 Коммитов