Make sure nfs server errors out if request contains more ops
than channel allows.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
[bfields@redhat.com: use helper function]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
RFC 5661 Section 18.11.3
The clientid field of the owner MAY be set to any value by the client
and MUST be ignored by the server. The reason the server MUST ignore
the clientid field is that the server MUST derive the client ID from
the session ID from the SEQUENCE operation of the COMPOUND request.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Teach the NFS server to reject invalid create_session flags.
Also do some minor formatting adjustments.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The NFS server uses nfsd_create_v3 to handle EXCLUSIVE4_1 opens, but
that function is not prepared to handle them.
Rename nfsd_create_v3() to do_nfsd_create(), and add handling of
EXCLUSIVE4_1.
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
When PUTFH is followed by an operation that uses the filehandle, and
when the current client is using a security flavor that is inconsistent
with the given filehandle, we have a choice: we can return WRONGSEC
either when the current filehandle is set using the PUTFH, or when the
filehandle is first used by the following operation.
Follow the recommendations of RFC 5661 in making this choice.
(Our current behavior prevented the client from doing security
negotiation by returning WRONGSEC on PUTFH+SECINFO_NO_NAME.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Most of the NFSD_MAY_* flags actually request permissions, but over the
years we've accreted a few that modify the behavior of the permission or
open code in other ways.
Distinguish the two cases a little more. In particular, allow the
shortcut at the start of nfsd_permission to ignore the
non-permission-requesting bits.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Currently when there's some failure to receive a callback (because we
couldn't find a matching xid, for example), we exit svc_recv with
sk_tcplen still set but without any pages saved with the socket. This
will cause a crash later in svc_tcp_restore_pages.
Instead, make sure we reset that tcp information whether the callback
received failed or succeeded.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Allow the NFSv4 server to make use of TCP autotuning behaviour, which
was previously disabled by setting the sk_userlocks variable.
Set the receive buffers to be big enough to receive the whole RPC
request, and set this for the listening socket, not the accept socket.
Remove the code that readjusts the receive/send buffer sizes for the
accepted socket. Previously this code was used to influence the TCP
window management behaviour, which is no longer needed when autotuning
is enabled.
This can improve IO bandwidth on networks with high bandwidth-delay
products, where a large tcp window is required. It also simplifies
performance tuning, since getting adequate tcp buffers previously
required increasing the number of nfsd threads.
Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Cc: Jim Rees <rees@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Ensure that we immediately read and buffer data from the incoming TCP
stream so that we grow the receive window quickly, and don't deadlock on
large READ or WRITE requests.
Also do some minor exit cleanup.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's much simpler just to copy the cb reply data than to play tricks
with pages. Callback replies will typically be very small (at least
until we implement cb_getattr, in which case files with very long ACLs
could pose a problem), so there's no loss in efficiency.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If the client sents a record too short to contain even the beginning of
the rpc header, then just close the connection.
The current code drops the record data and continues. I don't see the
point. It's a hopeless situation and simpler just to cut off the
connection completely.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Minor cleanup in preparation for later patches.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Don't requeue the socket in some cases where we know it's unnecessary.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-2.6-block:
ide: always ensure that blk_delay_queue() is called if we have pending IO
block: fix request sorting at unplug
dm: improve block integrity support
fs: export empty_aops
ide: ide_requeue_and_plug() reinstate "always plug" behaviour
blk-throttle: don't call xchg on bool
ufs: remove unessecary blk_flush_plug
block: make the flush insertion use the tail of the dispatch list
block: get rid of elv_insert() interface
block: dump request state on seeing a corrupted request completion
On an error path in inotify_init1 a normal user can trigger a double
free of struct user. This is a regression introduced by a2ae4cc9a1
("inotify: stop kernel memory leak on file creation failure").
We fix this by making sure that if a group exists the user reference is
dropped when the group is cleaned up. We should not explictly drop the
reference on error and also drop the reference when the group is cleaned
up.
The new lifetime rules are that an inotify group lives from
inotify_new_group to the last fsnotify_put_group. Since the struct user
and inotify_devs are directly tied to this lifetime they are only
changed/updated in those two locations. We get rid of all special
casing of struct user or user->inotify_devs.
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: stable@kernel.org (2.6.37 and up)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just because we are not requeuing a request does not mean that
some aren't pending. So always issue a blk_delay_queue() if
either we are requeueing OR there's pending IO.
This fixes a boot problem for some IDE boxes.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Comparison function for list_sort() must be anticommutative,
otherwise it is not sorting in ordinary meaning.
But fortunately list_sort() always check ((*cmp)(priv, a, b) <= 0)
it not distinguish negative and zero, so comparison function can
implement only less-or-equal instead of full three-way comparison.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The current block integrity (DIF/DIX) support in DM is verifying that
all devices' integrity profiles match during DM device resume (which
is past the point of no return). To some degree that is unavoidable
(stacked DM devices force this late checking). But for most DM
devices (which aren't stacking on other DM devices) the ideal time to
verify all integrity profiles match is during table load.
Introduce the notion of an "initialized" integrity profile: a profile
that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
template. Add blk_integrity_is_initialized() to allow checking if a
profile was initialized.
Update DM integrity support to:
- check all devices with _initialized_ integrity profiles match
during table load; uninitialized profiles (e.g. for underlying DM
device(s) of a stacked DM device) are ignored.
- disallow a table load that would result in an integrity profile that
conflicts with a DM device's existing (in-use) integrity profile
- avoid clearing an existing integrity profile
- validate all integrity profiles match during resume; but if they
don't all we can do is report the mismatch (during resume we're past
the point of no return)
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
With the ->sync_page() hook gone, we have a few users that
add their own static address_space_operations without any
functions defined.
fs/inode.c already has an empty_aops that it uses for init
purposes. Lets export that and use it in the places where
an otherwise empty aops was defined.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
We see stalls if we don't always ensure that the queue gets run
again. Even if rq == NULL, we could have other pending requests
in the queue.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
xchg does not work portably with smaller than 32bit types.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
We already flush the per-process plugging list when context switching,
so a blk_flush_plug call just before a yield() is not needed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
It's not a preempt type request, in fact we have to insert it
behind requests that do specify INSERT_FRONT.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Merge it with __elv_add_request(), it's pretty pointless to
have a function with only two callers. The main interface
is elv_add_request()/__elv_add_request().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Currently we just dump a non-informative 'request botched' message.
Lets actually try and print something sane to help debug issues
around this.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/pseries: Fix build without CONFIG_HOTPLUG_CPU
powerpc: Set nr_cpu_ids early and use it to free PACAs
powerpc/pseries: Don't register global initcall
powerpc/kexec: Fix mismatched ifdefs for PPC64/SMP.
edac/mpc85xx: Limit setting/clearing of HID1[RFXE] to e500v1/v2 cores
powerpc/85xx: Update dts for PCIe memory maps to match u-boot of Px020RDB
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: don't warn in btrfs_add_orphan
Btrfs: fix free space cache when there are pinned extents and clusters V2
Btrfs: Fix uninitialized root flags for subvolumes
btrfs: clear __GFP_FS flag in the space cache inode
Btrfs: fix memory leak in start_transaction()
Btrfs: fix memory leak in btrfs_ioctl_start_sync()
Btrfs: fix subvol_sem leak in btrfs_rename()
Btrfs: Fix oops for defrag with compression turned on
Btrfs: fix /proc/mounts info.
Btrfs: fix compiler warning in file.c
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
ipv6: Don't pass invalid dst_entry pointer to dst_release().
mlx4: fix kfree on error path in new_steering_entry()
tcp: len check is unnecessarily devastating, change to WARN_ON
sctp: malloc enough room for asconf-ack chunk
sctp: fix auth_hmacs field's length of struct sctp_cookie
net: Fix dev dev_ethtool_get_rx_csum() for forced NETIF_F_RXCSUM
usbnet: use eth%d name for known ethernet devices
starfire: clean up dma_addr_t size test
iwlegacy: fix bugs in change_interface
carl9170: Fix tx aggregation problems with some clients
iwl3945: disable hw scan by default
wireless: rt2x00: rt2800usb.c add and identify ids
iwl3945: do not deprecate software scan
mac80211: fix aggregation frame release during timeout
cfg80211: fix BSS double-unlinking (continued)
cfg80211:: fix possible NULL pointer dereference
mac80211: fix possible NULL pointer dereference
mac80211: fix NULL pointer dereference in ieee80211_key_alloc()
ath9k: fix a chip wakeup related crash in ath9k_start
mac80211: fix a crash in minstrel_ht in HT mode with no supported MCS rates
...
This is a revert of 428d2e828c.
This is broken in the same manner as for VGA: trying to write to an
invalid address on the (currently 7-bit) i2c bus.
One notable failure appears to be for MacBooks. The scary part was that
it gave the appearance of working (i.e. reporting the absence of the
panel) on various all-in-one machines with ghost LVDS panels and not
failing for laptops.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Dave Airlie <airlied@linux.ie>
Signed-off-by: Keith Packard <keithp@keithp.com>
This is a moral revert of 6ec3d0c0e9.
Following the fix to reset the GMBUS controller after a NAK, we finally
utilize the 0xa0 probe for a CRT connection. And discover that the code
is broken. Shock.
There are a number of issues, but following a key insight from Dave
Airlie, that 0xA0 is an invalid address on a 7-bit bus (though not if we
were to enable 10-bit addressing), and would look like the EDID port
0x50, it is possible to see where the confusion starts.
In short, a write to 0xA0 is accepted by the GMBUS controller which we
interpreted as meaning the existence of a connection (a slave on the
other end of the wire ACKing the write). That was false.
During testing with a broken GMBUS implementation, which never reset an
earlier NAK, this test always reported a NAK and so we proceeded on to
the next test.
Reported-and-tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=35904
Reported-and-tested-by: Riccardo Magliocchetti <riccardo.magliocchetti@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=32612
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Dave Airlie <airlied@linux.ie>
Signed-off-by: Keith Packard <keithp@keithp.com>
Commit b3df895aeb "powerpc/kexec: Add support for FSL-BookE"
introduced the original PPC_STD_MMU_64 checks around the function
crash_kexec_wait_realmode(). Then commit c2be05481f
"powerpc: Fix default_machine_crash_shutdown #ifdef botch" changed
the ifdef around the calling site to add a check on SMP, but the
ifdef around the function itself was left unchanged, leaving an
unused function for PPC_STD_MMU_64=y and SMP=n
Rather than have two ifdefs that can get out of sync like this,
simply put the corrected conditional around the function and use
a stub to get rid of one set of ifdefs completely.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When I moved the orphan adding to btrfs_truncate I missed the fact that during
orphan cleanup we just add the orphan items to the orphan list without going
through btrfs_orphan_add, which results in lots of warnings on mount if you have
any orphan items that need to be truncated. Just remove this warning since it's
ok, this will allow all of the normal space accounting take place. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
I noticed a huge problem with the free space cache that was presenting
as an early ENOSPC. Turns out when writing the free space cache out I
forgot to take into account pinned extents and more importantly
clusters. This would result in us leaking free space everytime we
unmounted the filesystem and remounted it.
I fix this by making sure to check and see if the current block group
has a cluster and writing out any entries that are in the cluster to the
cache, as well as writing any pinned extents we currently have to the
cache since those will be available for us to use the next time the fs
mounts.
This patch also adds a check to the end of load_free_space_cache to make
sure we got the right amount of free space cache, and if not make sure
to clear the cache and re-cache the old fashioned way.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
root_item->flags and root_item->byte_limit are not initialized when
a subvolume is created. This bug is not revealed until we added
readonly snapshot support - now you mount a btrfs filesystem and you
may find the subvolumes in it are readonly.
To work around this problem, we steal a bit from root_item->inode_item->flags,
and use it to indicate if those fields have been properly initialized.
When we read a tree root from disk, we check if the bit is set, and if
not we'll set the flag and initialize the two fields of the root item.
Reported-by: Andreas Philipp <philipp.andreas@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Andreas Philipp <philipp.andreas@gmail.com>
cc: stable@kernel.org
Signed-off-by: Chris Mason <chris.mason@oracle.com>
the object id of the space cache inode's key is allocated from the relative
root, just like the regular file. So we can't identify space cache inode by
checking the object id of the inode's key, and we have to clear __GFP_FS flag
at the time we look up the space cache inode.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Free btrfs_trans_handle when join_transaction() fails
in start_transaction()
Signed-off-by: Yoshinori Sano <yoshinori.sano@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btrfs_rename() does not release the subvol_sem if the transaction failed to start.
Signed-off-by: Johann Lombardi <johann@whamcloud.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>