WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Yan, Zheng	687265e5a8	ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL GFP_NOFS memory allocation is required for page writeback path. But there is no need to use GFP_NOFS in syscall path and readpage path Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:31 +03:00
Yan, Zheng	f66fd9f095	ceph: pre-allocate data structure that tracks caps flushing Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:31 +03:00
Yan, Zheng	e548e9b93d	ceph: re-send flushing caps (which are revoked) in reconnect stage if flushing caps were revoked, we should re-send the cap flush in client reconnect stage. This guarantees that MDS processes the cap flush message before issuing the flushing caps to other client. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:31 +03:00
Yan, Zheng	8310b08913	ceph: track pending caps flushing globally So we know TID of the oldest pending caps flushing. Later patch will send this information to MDS, so that MDS can trim its completed caps flush list. Tracking pending caps flushing globally also simplifies syncfs code. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:31 +03:00
Yan, Zheng	553adfd941	ceph: track pending caps flushing accurately Previously we do not trace accurate TID for flushing caps. when MDS failovers, we have no choice but to re-send all flushing caps with a new TID. This can cause problem because MDS can has already flushed some caps and has issued the same caps to other client. The re-sent cap flush has a new TID, which makes MDS unable to detect if it has already processed the cap flush. This patch adds code to track pending caps flushing accurately. When re-sending cap flush is needed, we use its original flush TID. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:30 +03:00
Yan, Zheng	3e0708b990	ceph: ratelimit warn messages for MDS closes session Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:30 +03:00
Ilya Dryomov	5be7303477	ceph: simplify two mount_timeout sites No need to bifurcate wait now that we've got ceph_timeout_jiffies(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:29 +03:00
Ilya Dryomov	a319bf56a6	libceph: store timeouts in jiffies, verify user input There are currently three libceph-level timeouts that the user can specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of these are in seconds and no checking is done on user input: negative values are accepted, we multiply them all by HZ which may or may not overflow, arbitrarily large jiffies then get added together, etc. There is also a bug in the way mount_timeout=0 is handled. It's supposed to mean "infinite timeout", but that's not how wait.h APIs treat it and so __ceph_open_session() for example will busy loop without much chance of being interrupted if none of ceph-mons are there. Fix all this by verifying user input, storing timeouts capped by msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies() helper for all user-specified waits to handle infinite timeouts correctly. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-06-25 11:49:29 +03:00
Yan, Zheng	e8a7b8b12b	ceph: exclude setfilelock requests when calculating oldest tid setfilelock requests can block for a long time, which can prevent client from advancing its oldest tid. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:29 +03:00
Yan, Zheng	745a8e3bcc	ceph: don't pre-allocate space for cap release messages Previously we pre-allocate cap release messages for each caps. This wastes lots of memory when there are large amount of caps. This patch make the code not pre-allocate the cap release messages. Instead, we add the corresponding ceph_cap struct to a list when releasing a cap. Later when flush cap releases is needed, we allocate the cap release messages dynamically. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:29 +03:00
Yan, Zheng	affbc19a68	ceph: make sure syncfs flushes all cap snaps Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:29 +03:00
Yan, Zheng	622f3e250f	ceph: don't trim auth cap when there are cap snaps Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:28 +03:00
Yan, Zheng	10183a6955	ceph: check OSD caps before read/write Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-06-25 11:49:28 +03:00
Linus Torvalds	9ec3a646fe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only	2015-04-26 17:22:07 -07:00
Yan, Zheng	c0bd50e2ee	ceph: fix null pointer dereference in send_mds_reconnect() sb->s_root can be null when umounting Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-22 18:33:31 +03:00
Yan, Zheng	1c841a96b5	ceph: cleanup unsafe requests when reconnecting is denied Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 18:55:37 +03:00
Yan, Zheng	a9f6eb6185	ceph: don't zero i_wrbuffer_ref when reconnecting is denied remove_session_caps_cb() does not truncate dirty data in page cache, but zeros i_wrbuffer_ref/i_wrbuffer_ref_head. This will result negtive i_wrbuffer_ref/i_wrbuffer_ref_head Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 18:55:36 +03:00
Yan, Zheng	571ade336a	ceph: don't mark dirty caps when there is no auth cap No i_auth_cap means reconnecting to MDS was denied. So don't add new dirty caps. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 18:55:36 +03:00
Nicholas Mc Guire	3563dbdd99	ceph: use msecs_to_jiffies for time conversion This is only an API consolidation and should make things more readable it replaces var * HZ / 1000 by msecs_to_jiffies(var). Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 17:30:22 +03:00
Yan, Zheng	6e6f09231a	ceph: drop cap releases in requests composed before cap reconnect These cap releases are stale because MDS will re-establish client caps according to the cap reconnect messages. Note: MDS can detect stale cap messages, so these stale cap releases are harmless even we don't drop them. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-04-20 17:30:22 +03:00
David Howells	2b0143b5c9	VFS: normal filesystems (and lustre): d_inode() annotations that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-15 15:06:57 -04:00
Linus Torvalds	4533f6e27a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "On the RBD side, there is a conversion to blk-mq from Christoph, several long-standing bug fixes from Ilya, and some cleanup from Rickard Strandqvist. On the CephFS side there is a long list of fixes from Zheng, including improved session handling, a few IO path fixes, some dcache management correctness fixes, and several blocking while !TASK_RUNNING fixes. The core code gets a few cleanups and Chaitanya has added support for TCP_NODELAY (which has been used on the server side for ages but we somehow missed on the kernel client). There is also an update to MAINTAINERS to fix up some email addresses and reflect that Ilya and Zheng are doing most of the maintenance for RBD and CephFS these days. Do not be surprised to see a pull request come from one of them in the future if I am unavailable for some reason" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits) MAINTAINERS: update Ceph and RBD maintainers libceph: kfree() in put_osd() shouldn't depend on authorizer libceph: fix double __remove_osd() problem rbd: convert to blk-mq ceph: return error for traceless reply race ceph: fix dentry leaks ceph: re-send requests when MDS enters reconnecting stage ceph: show nocephx_require_signatures and notcp_nodelay options libceph: tcp_nodelay support rbd: do not treat standalone as flatten ceph: fix atomic_open snapdir ceph: properly mark empty directory as complete client: include kernel version in client metadata ceph: provide seperate {inode,file}_operations for snapdir ceph: fix request time stamp encoding ceph: fix reading inline data when i_size > PAGE_SIZE ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions) ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps) ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) rbd: fix error paths in rbd_dev_refresh() ...	2015-02-19 14:14:42 -08:00
Yan, Zheng	3de22be677	ceph: re-send requests when MDS enters reconnecting stage So that MDS can check if any request is already completed and process completed requests in clientreplay stage. When completed requests are processed in clientreplay stage, MDS can avoid sending traceless replies. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:40 +03:00
Yan, Zheng	a6a5ce4f0d	client: include kernel version in client metadata Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:39 +03:00
Yan, Zheng	1f041a89b4	ceph: fix request time stamp encoding struct timespec uses 'long' to present second and nanosecond. 'long' is 64 bits on 64bits machine. ceph MDS expects time stamp to be encoded as struct ceph_timespec, which uses 'u32' to present second and nanosecond. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:39 +03:00
Yan, Zheng	86d8f67b26	ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions) use an atomic variable to track number of sessions, this can avoid block operation inside wait loops. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:38 +03:00
Yan, Zheng	d3383a8e37	ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) check_cap_flush() calls mutex_lock(), which may block. So we can't use it as condition check function for wait_event(); Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:38 +03:00
Yan, Zheng	982d6011bc	ceph: improve reference tracking for snaprealm When snaprealm is created, its initial reference count is zero. But in some rare cases, the newly created snaprealm is not referenced by anyone. This causes snaprealm with zero reference count not freed. The fix is set reference count of newly snaprealm to 1. The reference is return the function who requests to create the snaprealm. When the function finishes its job, it releases the reference. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:38 +03:00
Yan, Zheng	03f4fcb028	ceph: handle SESSION_FORCE_RO message mark session as readonly and wake up all cap waiters. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2015-02-19 13:31:37 +03:00
Jeff Layton	c362781cad	ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks There is only a single call site for each of these functions, and the caller takes the i_lock prior to calling them and drops it just afterward. Move the spinlocking into the functions instead. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Acked-by: Christoph Hellwig <hch@lst.de>	2015-01-16 15:09:25 -05:00
Yan, Zheng	fb01d1f8b0	ceph: parse inline data in MClientReply and MClientCaps Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
John Spray	7cfa0313d0	ceph: message versioning fixes There were two places we were assigning version in host byte order instead of network byte order. Also in MSG_CLIENT_SESSION we weren't setting compat_version in the header to reflect continued compatability with older MDSs. Fixes: http://tracker.ceph.com/issues/9945 Signed-off-by: John Spray <john.spray@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-12-17 20:09:51 +03:00
Yan, Zheng	33d0733796	libceph: message signature support Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:50 +03:00
SF Markus Elfring	e96a650a81	ceph, rbd: delete unnecessary checks before two function calls The functions ceph_put_snap_context() and iput() test whether their argument is NULL and then return immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [idryomov@redhat.com: squashed rbd.c hunk, changelog] Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2014-12-17 20:09:50 +03:00
Yan, Zheng	9280be24dc	ceph: fix file lock interruption When a lock operation is interrupted, current code sends a unlock request to MDS to undo the lock operation. This method does not work as expected because the unlock request can drop locks that have already been acquired. The fix is use the newly introduced CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR requests to interrupt blocked file lock request. These requests do not drop locks that have alread been acquired, they only interrupt blocked file lock request. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:49 +03:00
John Spray	a687ecaf50	ceph: export ceph_session_state_name function ...so that it can be used from the ceph debugfs code when dumping session info. Signed-off-by: John Spray <john.spray@redhat.com>	2014-10-14 12:56:50 -07:00
Yan, Zheng	25e6bae356	ceph: use pagelist to present MDS request data Current code uses page array to present MDS request data. Pages in the array are allocated/freed by caller of ceph_mdsc_do_request(). If request is interrupted, the pages can be freed while they are still being used by the request message. The fix is use pagelist to present MDS request data. Pagelist is reference counted. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:56:49 -07:00
Yan, Zheng	e4339d28f6	libceph: reference counting pagelist this allow pagelist to present data that may be sent multiple times. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:56:48 -07:00
John Spray	dbd0c8bf79	ceph: send client metadata to MDS Implement version 2 of CEPH_MSG_CLIENT_SESSION syntax, which includes additional client metadata to allow the MDS to report on clients by user-sensible names like hostname. Signed-off-by: John Spray <john.spray@redhat.com> Reviewed-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 12:56:47 -07:00
Yan, Zheng	6cd3bcad0d	ceph: move ceph_find_inode() outside the s_mutex ceph_find_inode() may wait on freeing inode, using it inside the s_mutex may cause deadlock. (the freeing inode is waiting for OSD read reply, but dispatch thread is blocked by the s_mutex) Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 21:03:39 +04:00
Yan, Zheng	03974e8177	ceph: make sure request isn't in any waiting list when kicking request. we may corrupt waiting list if a request in the waiting list is kicked. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 21:03:24 +04:00
Yan, Zheng	656e438294	ceph: protect kick_requests() with mdsc->mutex Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 21:03:24 +04:00
Yan, Zheng	5d23371fdb	ceph: trim unused inodes before reconnecting to recovering MDS So the recovering MDS does not need to fetch these ununsed inodes during cache rejoin. This may reduce MDS recovery time. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-10-14 21:03:22 +04:00
Yan, Zheng	282c105225	ceph: fix kick_requests() __do_request() may unregister the request. So we should update iterator 'p' before calling __do_request() Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-08-07 14:30:00 +04:00
Yan, Zheng	51da8e8c6f	ceph: reset r_resend_mds after receiving -ESTALE this makes __choose_mds() choose mds according caps Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-14 10:49:15 +08:00
Yan, Zheng	c5c9a0bf1b	ceph: include time stamp in replayed MDS requests Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-07-08 15:08:46 +04:00
Linus Torvalds	6d87c225f5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "This has a mix of bug fixes and cleanups. Alex's patch fixes a rare race in RBD. Ilya's patches fix an ENOENT check when a second rbd image is mapped and a couple memory leaks. Zheng fixes several issues with fragmented directories and multiple MDSs. Josh fixes a spin/sleep issue, and Josh and Guangliang's patches fix setting and unsetting RBD images read-only. Naturally there are several other cleanups mixed in for good measure" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) rbd: only set disk to read-only once rbd: move calls that may sleep out of spin lock range rbd: add ioctl for rbd ceph: use truncate_pagecache() instead of truncate_inode_pages() ceph: include time stamp in every MDS request rbd: fix ida/idr memory leak rbd: use reference counts for image requests rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync() rbd: make sure we have latest osdmap on 'rbd map' libceph: add ceph_monc_wait_osdmap() libceph: mon_get_version request infrastructure libceph: recognize poolop requests in debugfs ceph: refactor readpage_nounlock() to make the logic clearer mds: check cap ID when handling cap export message ceph: remember subtree root dirfrag's auth MDS ceph: introduce ceph_fill_fragtree() ceph: handle cap import atomically ceph: pre-allocate ceph_cap struct for ceph_add_cap() ceph: update inode fields according to issued caps rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO ...	2014-06-12 23:06:23 -07:00
Fabian Frederick	f3ae1b97be	fs/ceph: replace pr_warning by pr_warn Update the last pr_warning callsites in fs branch Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Sage Weil <sage@inktank.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-06-06 16:08:06 -07:00
Sage Weil	b8e69066d8	ceph: include time stamp in every MDS request We recently modified the client/MDS protocol to include a timestamp in the client request. This allows ctime updates to follow the client's clock in most cases, which avoids subtle problems when clocks are out of sync and timestamps are updated sometimes by the MDS clock (for most requests) and sometimes by the client clock (for cap writeback). Signed-off-by: Sage Weil <sage@inktank.com>	2014-06-06 09:30:00 +08:00
Yan, Zheng	a56371d9d9	ceph: flush cap release queue when trimming session caps Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>	2014-04-04 21:08:26 -07:00

1 2 3 4 5

225 Коммитов