Граф коммитов

519790 Коммитов

Автор SHA1 Сообщение Дата
Mike Snitzer b61d950962 dm cache: prefix all DMERR and DMINFO messages with cache device name
Having the DM device name associated with the ERR or INFO message is
very helpful.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-06-11 17:13:01 -04:00
Joe Thornber 028ae9f76f dm cache: add fail io mode and needs_check flag
If a cache metadata operation fails (e.g. transaction commit) the
cache's metadata device will abort the current transaction, set a new
needs_check flag, and the cache will transition to "read-only" mode.  If
aborting the transaction or setting the needs_check flag fails the cache
will transition to "fail-io" mode.

Once needs_check is set the cache device will not be allowed to
activate.  Activation requires write access to metadata.  Future work is
needed to add proper support for running the cache in read-only mode.

Once in fail-io mode the cache will report a status of "Fail".

Also, add commit() wrapper that will disallow commits if in read_only or
fail mode.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-06-11 17:13:00 -04:00
Joe Thornber 88bf5184fa dm cache: wake the worker thread every time we free a migration object
When the cache is idle, writeback work was only being issued every
second.  With this change outstanding writebacks are streamed
constantly.  This offers a writeback performance improvement.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-06-11 17:13:00 -04:00
Joe Thornber 66a6363566 dm cache: add stochastic-multi-queue (smq) policy
The stochastic-multi-queue (smq) policy addresses some of the problems
with the current multiqueue (mq) policy.

Memory usage
------------

The mq policy uses a lot of memory; 88 bytes per cache block on a 64
bit machine.

SMQ uses 28bit indexes to implement it's data structures rather than
pointers.  It avoids storing an explicit hit count for each block.  It
has a 'hotspot' queue rather than a pre cache which uses a quarter of
the entries (each hotspot block covers a larger area than a single
cache block).

All these mean smq uses ~25bytes per cache block.  Still a lot of
memory, but a substantial improvement nontheless.

Level balancing
---------------

MQ places entries in different levels of the multiqueue structures
based on their hit count (~ln(hit count)).  This means the bottom
levels generally have the most entries, and the top ones have very
few.  Having unbalanced levels like this reduces the efficacy of the
multiqueue.

SMQ does not maintain a hit count, instead it swaps hit entries with
the least recently used entry from the level above.  The over all
ordering being a side effect of this stochastic process.  With this
scheme we can decide how many entries occupy each multiqueue level,
resulting in better promotion/demotion decisions.

Adaptability
------------

The MQ policy maintains a hit count for each cache block.  For a
different block to get promoted to the cache it's hit count has to
exceed the lowest currently in the cache.  This means it can take a
long time for the cache to adapt between varying IO patterns.
Periodically degrading the hit counts could help with this, but I
haven't found a nice general solution.

SMQ doesn't maintain hit counts, so a lot of this problem just goes
away.  In addition it tracks performance of the hotspot queue, which
is used to decide which blocks to promote.  If the hotspot queue is
performing badly then it starts moving entries more quickly between
levels.  This lets it adapt to new IO patterns very quickly.

Performance
-----------

In my tests SMQ shows substantially better performance than MQ.  Once
this matures a bit more I'm sure it'll become the default policy.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-06-11 17:12:59 -04:00
Joe Thornber 40775257b9 dm cache: boost promotion of blocks that will be overwritten
When considering whether to move a block to the cache we already give
preferential treatment to discarded blocks, since they are cheap to
promote (no read of the origin required since the data is junk).

The same is true of blocks that are about to be completely
overwritten, so we likewise boost their promotion chances.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:07 -04:00
Joe Thornber 651f5fa2a3 dm cache: defer whole cells
Currently individual bios are deferred to the worker thread if they
cannot be processed immediately (eg, a block is in the process of
being moved to the fast device).

This patch passes whole cells across to the worker.  This saves
reaquiring the cell, and also collects bios destined for the same block
together, which allows them to be mapped with a single look up to the
policy.  This reduces the overhead of using dm-cache.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:06 -04:00
Joe Thornber 3cdf93f9d8 dm bio prison: add dm_cell_promote_or_release()
Rather than always releasing the prisoners in a cell, the client may
want to promote one of them to be the new holder.  There is a race here
though between releasing an empty cell, and other threads adding new
inmates.  So this function makes the decision with its lock held.

This function can have two outcomes:
i)  An inmate is promoted to be the holder of the cell (return value of 0).
ii) The cell has no inmate for promotion and is released (return value of 1).

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:06 -04:00
Joe Thornber 451b9e0071 dm cache: pull out some bitset utility functions for reuse
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:05 -04:00
Joe Thornber 20f6814b94 dm cache: pass a new 'critical' flag to the policies when requesting writeback work
We only allow non critical writeback if the origin is idle.  It is up
to the policy to decide what writeback work is critical.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:04 -04:00
Joe Thornber 066dbaa386 dm cache: track IO to the origin device using io_tracker
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:04 -04:00
Joe Thornber 77289d3207 dm cache: add io_tracker
A little class that keeps track of the volume of io that is in flight,
and the length of time that a device has been idle for.

FIXME: rather than jiffes, may be best to use ktime_t (to support faster
devices).

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:03 -04:00
Joe Thornber fb4100ae7f dm cache: fix race when issuing a POLICY_REPLACE operation
There is a race between a policy deciding to replace a cache entry,
the core target writing back any dirty data from this block, and other
IO threads doing IO to the same block.

This sort of problem is avoided most of the time by the core target
grabbing a bio prison cell before making the request to the policy.
But for a demotion the core target doesn't know which block will be
demoted, so can't do this in advance.

Fix this demotion race by introducing a callback to the policy interface
that allows the policy to grab the cell on behalf of the core target.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-05-29 14:19:03 -04:00
Milan Broz 54cea3f668 dm crypt: add comments to better describe crypto processing logic
A crypto driver can process requests synchronously or asynchronously
and can use an internal driver queue to backlog requests.
Add some comments to clarify internal logic and completion return codes.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:02 -04:00
Lidong Zhong ed63287dd6 dm raid1: keep issuing IO after leg failure
Currently if there is a leg failure, the bio will be put into the hold
list until userspace does a remove/replace on the leg.  Doing so in a
cluster config (clvmd) is problematic because there may be a temporary
path failure that results in cluster raid1 remove/replace.  Such
recovery takes a long time due to a full resync.

Update dm-raid1 to optionally ignore these failures so bios continue
being issued without interrupton.  To enable this feature userspace
must pass "keep_log" when creating the dm-raid1 device.

Signed-off-by: Lidong Zhong <lzhong@suse.com>
Tested-by: Liuhua Wang <lwang@suse.com>
Acked-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:02 -04:00
Geert Uytterhoeven f4ad317aed dm log writes: use ULL suffix for 64-bit constants
On 32-bit:
drivers/md/dm-log-writes.c: In function ‘log_super’:
drivers/md/dm-log-writes.c:323: warning: integer constant is too large for ‘long’ type

Add a ULL suffix to WRITE_LOG_MAGIC to fix this.
Also add a ULL suffix to WRITE_LOG_VERSION as it's stored in a __le64
field.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:01 -04:00
Luis Henriques e223e1de4f dm stripe: drop useless exit point from dm_stripe_init()
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:01 -04:00
Heinz Mauelshagen 0cf4503174 dm raid: add support for the MD RAID0 personality
Add dm-raid access to the MD RAID0 personality to enable single zone
striping.

The following changes enable that access:
- add type definition to raid_types array
- make bitmap creation conditonal in super_validate(), because
  bitmaps are not allowed in raid0
- set rdev->sectors to the data image size in super_validate()
  to allow the raid0 personality to calculate the MD array
  size properly
- use mdddev(un)lock() functions instead of direct mutex_(un)lock()
  (wrapped in here because it's a trivial change)
- enhance raid_status() to always report full sync for raid0
  so that userspace checks for 100% sync will succeed and allow
  for resize (and takeover/reshape once added in future paches)
- enhance raid_resume() to not load bitmap in case of raid0
- add merge function to avoid data corruption (seen with readahead)
  that resulted from bio payloads that grew too large.  This problem
  did not occur with the other raid levels because it either did not
  apply without striping (raid1) or was avoided via stripe caching.
- raise version to 1.7.0 because of the raid0 API change

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:00 -04:00
Heinz Mauelshagen c76d53f43e dm raid: a few cleanups
- ensure maximum device limit in superblock
- rename DMPF_* (print flags) to CTR_FLAG_* (constructor flags)
  and their respective struct raid_set member
- use strcasecmp() in raid10_format_to_md_layout() as in the constructor

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:00 -04:00
Heinz Mauelshagen 0f4106b32f dm raid: fixup documentation for discard support
Remove comment above parse_raid_params() that claims
"devices_handle_discard_safely" is a table line argument when it is
actually is a module parameter.

Also, backfill dm-raid target version 1.6.0 documentation.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:18:59 -04:00
Mike Snitzer 49f154c732 dm thin metadata: remove in-core 'read_only' flag
Leverage the block manager's read_only flag instead of duplicating it;
access with new dm_bm_is_read_only() method.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:18:59 -04:00
Mike Snitzer f8ae75253e dm thin: cleanup schedule_zero() to read more logically
The overwrite has only ever about optimizing away the need to zero a
block if the entire block was being overwritten.  As such it is only
relevant when zeroing is enabled.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
2015-05-29 14:18:58 -04:00
Mike Snitzer 8b908f8e94 dm thin: cleanup overwrite's endio restore to be centralized
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:18:58 -04:00
Mike Snitzer 0f20972f7b dm: factor out a common cleanup_mapped_device()
Introduce a single common method for cleaning up a DM device's
mapped_device.  No functional change, just eliminates duplication of
delicate mapped_device cleanup code.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:18:58 -04:00
Mike Snitzer 2d76fff18f dm: cleanup methods that requeue requests
More often than not a request that is requeued _is_ mapped (meaning the
clone request is allocated and clone->q is initialized).  Rename
dm_requeue_unmapped_original_request() to avoid potential confusion due
to function name containing "unmapped".

Also, remove dm_requeue_unmapped_request() since callers can easily call
the dm_requeue_original_request() directly.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:18:57 -04:00
Mike Snitzer cbc4e3c135 dm: do not allocate any mempools for blk-mq request-based DM
Do not allocate the io_pool mempool for blk-mq request-based DM
(DM_TYPE_MQ_REQUEST_BASED) in dm_alloc_rq_mempools().

Also refine __bind_mempools() to have more precise awareness of which
mempools each type of DM device uses -- avoids mempool churn when
reloading DM tables (particularly for DM_TYPE_REQUEST_BASED).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:18:57 -04:00
Mike Snitzer 183f7802e7 Merge remote-tracking branch 'jens/for-4.2/core' into dm-4.2 2015-05-29 14:17:16 -04:00
Joe Thornber 1c220c69ce dm: fix casting bug in dm_merge_bvec()
dm_merge_bvec() was originally added in f6fccb ("dm: introduce
merge_bvec_fn").  In that commit a value in sectors is converted to
bytes using << 9, and then assigned to an int.  This code made
assumptions about the value of BIO_MAX_SECTORS.

A later commit 148e51 ("dm: improve documentation and code clarity in
dm_merge_bvec") was meant to have no functional change but it removed
the use of BIO_MAX_SECTORS in favor of using queue_max_sectors().  At
this point the cast from sector_t to int resulted in a zero value.  The
fallout being dm_merge_bvec() would only allow a single page to be added
to a bio.

This interim fix is minimal for the benefit of stable@ because the more
comprehensive cleanup of passing a sector_t to all DM targets' merge
function will impact quite a few DM targets.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.19+
2015-05-29 13:41:16 -04:00
Junichi Nomura 15b94a6904 dm: fix reload failure of 0 path multipath mapping on blk-mq devices
dm-multipath accepts 0 path mapping.

  # echo '0 2097152 multipath 0 0 0 0' | dmsetup create newdev

Such a mapping can be used to release underlying devices while still
holding requests in its queue until working paths come back.

However, once the multipath device is created over blk-mq devices,
it rejects reloading of 0 path mapping:

  # echo '0 2097152 multipath 0 0 1 1 queue-length 0 1 1 /dev/sda 1' \
      | dmsetup create mpath1
  # echo '0 2097152 multipath 0 0 0 0' | dmsetup load mpath1
  device-mapper: reload ioctl on mpath1 failed: Invalid argument
  Command failed

With following kernel message:
  device-mapper: ioctl: can't change device type after initial table load.

DM tries to inherit the current table type using dm_table_set_type()
but it doesn't work as expected because of unnecessary check about
whether the target type is hybrid or not.

Hybrid type is for targets that work as either request-based or bio-based
and not required for blk-mq or non blk-mq checking.

Fixes: 65803c2059 ("dm table: train hybrid target type detection to select blk-mq if appropriate")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 13:41:16 -04:00
Mike Snitzer e5d8de32cc dm: fix false warning in free_rq_clone() for unmapped requests
When stacking request-based dm device on non blk-mq device and
device-mapper target could not map the request (error target is used,
multipath target with all paths down, etc), the WARN_ON_ONCE() in
free_rq_clone() will trigger when it shouldn't.

The warning was added by commit aa6df8d ("dm: fix free_rq_clone() NULL
pointer when requeueing unmapped request").  But free_rq_clone() with
clone->q == NULL is valid usage for the case where
dm_kill_unmapped_request() initiates request cleanup.

Fix this false warning by just removing the WARN_ON -- it only generated
false positives and was never useful in catching the intended case
(completing clone request not being mapped e.g. clone->q being NULL).

Fixes: aa6df8d ("dm: fix free_rq_clone() NULL pointer when requeueing unmapped request")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reported-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 11:07:36 -04:00
Mike Snitzer 45714fbed4 dm: requeue from blk-mq dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY
Use BLK_MQ_RQ_QUEUE_BUSY to requeue a blk-mq request directly from the
DM blk-mq device's .queue_rq.  This cleans up the previous convoluted
handling of request requeueing that would return BLK_MQ_RQ_QUEUE_OK
(even though it wasn't) and then run blk_mq_requeue_request() followed
by blk_mq_kick_requeue_list().

Also, document that DM blk-mq ontop of old request_fn devices cannot
fail in clone_rq() since the clone request is preallocated as part of
the pdu.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-27 17:37:23 -04:00
Mike Snitzer 4c6dd53dd3 dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path
Otherwise kmemleak reported:

unreferenced object 0xffff88009b14e2b0 (size 16):
  comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s)
  hex dump (first 16 bytes):
    40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00  @...............
  backtrace:
    [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
    [<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160
    [<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20
    [<ffffffff8111cb37>] mempool_alloc+0x57/0x150
    [<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath]
    [<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
    [<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod]
    [<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod]
    [<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380
    [<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100
    [<ffffffff812ce350>] blk_sq_make_request+0x240/0x300
    [<ffffffff812c0f30>] generic_make_request+0xc0/0x110
    [<ffffffff812c0ff2>] submit_bio+0x72/0x150
    [<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0
    [<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40
    [<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390

Fixes: e5863d9ad ("dm: allocate requests in target when stacking on blk-mq devices")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.0+
2015-05-27 17:37:22 -04:00
Junichi Nomura 3a1407559a dm: fix NULL pointer when clone_and_map_rq returns !DM_MAPIO_REMAPPED
When stacking request-based DM on blk_mq device, request cloning and
remapping are done in a single call to target's clone_and_map_rq().
The clone is allocated and valid only if clone_and_map_rq() returns
DM_MAPIO_REMAPPED.

The "IS_ERR(clone)" check in map_request() does not cover all the
!DM_MAPIO_REMAPPED cases that are possible (E.g. if underlying devices
are not ready or unavailable, clone_and_map_rq() may return
DM_MAPIO_REQUEUE without ever having established an ERR_PTR).  Fix this
by explicitly checking for a return that is not DM_MAPIO_REMAPPED in
map_request().

Without this fix, DM core may call setup_clone() for a NULL clone
and oops like this:

   BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
   IP: [<ffffffff81227525>] blk_rq_prep_clone+0x7d/0x137
   ...
   CPU: 2 PID: 5793 Comm: kdmwork-253:3 Not tainted 4.0.0-nm #1
   ...
   Call Trace:
    [<ffffffffa01d1c09>] map_tio_request+0xa9/0x258 [dm_mod]
    [<ffffffff81071de9>] kthread_worker_fn+0xfd/0x150
    [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
    [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
    [<ffffffff81071fdd>] kthread+0xe6/0xee
    [<ffffffff81093a59>] ? put_lock_stats+0xe/0x20
    [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b
    [<ffffffff814c2d98>] ret_from_fork+0x58/0x90
    [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b

Fixes: e5863d9ad ("dm: allocate requests in target when stacking on blk-mq devices")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.0+
2015-05-27 09:48:51 -04:00
Julia Lawall f6454b049d block: fix returnvar.cocci warnings
Remove unneeded variable used to store return value.

Generated by: scripts/coccinelle/misc/returnvar.cocci

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-26 14:02:45 -06:00
Junichi Nomura 4ae9944d13 dm: run queue on re-queue
Without kicking queue, requeued request may stay forever in
the queue if there are no other I/O activities to the device.

The original error had been in v2.6.39 with commit 7eaceaccab
("block: remove per-queue plugging"), which replaced conditional
plugging by periodic runqueue.

Commit 9d1deb83d4 in v4.1-rc1 removed the periodic runqueue
and the problem started to manifest.

Fixes: 9d1deb83d4 ("dm: don't schedule delayed run of the queue if nothing to do")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-26 09:57:36 -04:00
Linus Torvalds ba155e2d21 Linux 4.1-rc5 2015-05-24 18:22:35 -07:00
Linus Torvalds 5b13966693 SCSI fixes on 20150523
This is a set of five fixes: Two MAINTAINER email updates (urgent because the
 non-avagotech emails will start bouncing) an lpfc big endian oops fix, a 256
 byte sector hang fix (to eliminate 256 byte sectors) and a storvsc fix which
 could cause test unit ready failures on bringup.
 
 Signed-off-by: James Bottomley <JBottomley@Odin.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJVYPemAAoJEDeqqVYsXL0MK8UIAJJ8ViLPTfhweb8QgHsUjc4S
 5x2jPUD69XT93G81NWKvFhYBnlAvktBZuy6uON7SkLWkbwgiu3pB1Iowqie+Z/h8
 1C5VwC8oaT6GZPuakRRxnYmoAaCX7kbdMsibSvc5ouTgvM5FdailhJpkDKcFQlkn
 t1GqmRQyH1P54O73af2d+z3AfNfivjoFFXz1YHcUOouiKBVyXzqk1c3CIwTQMelg
 B547Rz2cqgdDtgQzZzPkfust/GOYs0ZQMU3rhjeNyIvw/5xa0/Ta2m/v16lckV2I
 AvjK2Ht5XSr/Hc9m4yvr5vYCT5+mVrebYMghnUqpHjx1OGTBSU9lPT0exbCCJ/o=
 =lXOs
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of five fixes: Two MAINTAINER email updates (urgent
  because the non-avagotech emails will start bouncing) an lpfc big
  endian oops fix, a 256 byte sector hang fix (to eliminate 256 byte
  sectors) and a storvsc fix which could cause test unit ready failures
  on bringup"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  MAINTAINERS: Revise lpfc maintainers for Avago Technologies ownership of Emulex
  MAINTAINERS, be2iscsi: change email domain
  sd: Disable support for 256 byte/sector disks
  lpfc: Fix breakage on big endian kernels
  storvsc: Set the SRB flags correctly when no data transfer is needed
2015-05-24 11:15:28 -07:00
Linus Torvalds c5db6a3bde Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
 "One more fix from the timer departement:

    - Handle division of negative nanosecond values proper on 32bit.

      A recent cleanup wrecked the sign handling of the dividend and
      dropped the check for negative divisors"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ktime: Fix ktime_divns to do signed division
2015-05-23 17:57:40 -07:00
Linus Torvalds 9bd4459a3d Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irqchip fix from Thomas Gleixner:
 "A fix for a GIC-V3 irqchip regression which prevents some systems from
  booting"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gicv3-its: ITS table size should not be smaller than PSZ
2015-05-23 17:33:10 -07:00
Linus Torvalds 086e8ddb56 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull two Ceph fixes from Sage Weil:
 "These fix an issue with the RBD notifications when there are topology
  changes in the cluster"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  Revert "libceph: clear r_req_lru_item in __unregister_linger_request()"
  libceph: request a new osdmap if lingering request maps to no osd
2015-05-23 11:28:25 -07:00
Linus Torvalds 7ce14f6ff2 Merge branch 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "I fixed up a regression from 4.0 where conversion between different
  raid levels would sometimes bail out without converting.

  Filipe tracked down a race where it was possible to double allocate
  chunks on the drive.

  Mark has a fix for fiemap.  All three will get bundled off for stable
  as well"

* 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix regression in raid level conversion
  Btrfs: fix racy system chunk allocation when setting block group ro
  btrfs: clear 'ret' in btrfs_check_shared() loop
2015-05-23 11:14:10 -07:00
Linus Torvalds cf539cbd8a Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Radeon has two displayport fixes, one for a regression.

  i915 regression flicker fix needed so 4.0 can get fixed.

  A bunch of msm fixes and a bunch of exynos fixes, these two are
  probably a bit larger than I'd like, but most of them seems pretty
  good"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (29 commits)
  drm/radeon: fix error flag checking in native aux path
  drm/radeon: retry dcpd fetch
  drm/msm/mdp5: fix incorrect parameter for msm_framebuffer_iova()
  drm/exynos: dp: Lower level of EDID read success message
  drm/exynos: cleanup exynos_drm_plane
  drm/exynos: 'win' is always unsigned
  drm/exynos: mixer: don't dump registers under spinlock
  drm/exynos: Consolidate return statements in fimd_bind()
  drm/exynos: Constify exynos_drm_crtc_ops
  drm/exynos: Fix build breakage on !DRM_EXYNOS_FIMD
  drm/exynos: mixer: Constify platform_device_id
  drm/exynos: mixer: cleanup pixelformat handling
  drm/exynos: mixer: also allow NV21 for the video processor
  drm/exynos: mixer: remove buffer count handling in vp_video_buffer()
  drm/exynos: plane: honor buffer offset for dma_addr
  drm/exynos: fb: use drm_format_num_planes to get buffer count
  drm/i915: fix screen flickering
  drm/msm: fix locking inconsistencies in gpu->destroy()
  drm/msm/dsi: Simplify the code to get the number of read byte
  drm/msm: Attach assigned encoder to eDP and DSI connectors
  ...
2015-05-22 17:34:24 -07:00
Linus Torvalds 0b6280c620 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Don't leak ipvs->sysctl_tbl, from Tommi Rentala.

 2) Fix neighbour table entry leak in rocker driver, from Ying Xue.

 3) Do not emit bonding notifications for unregistered interfaces, from
    Nicolas Dichtel.

 4) Set ipv6 flow label properly when in TIME_WAIT state, from Florent
    Fourcot.

 5) Fix regression in ipv6 multicast filter test, from Henning Rogge.

 6) do_replace() in various footables netfilter modules is missing a
    check for 0 counters in the datastructure provided by the user.  Fix
    from Dave Jones, and found with trinity.

 7) Fix RCU bug in packet scheduler classifier module unloads, from
    Daniel Borkmann.

 8) Avoid deadlock in tcp_get_info() by using u64_sync.  From Eric
    Dumzaet.

 9) Input packet processing can race with inetdev_destroy() teardown,
    fix potential OOPS in ip_error() by explicitly testing whether the
    inetdev is still attached.  From Eric W Biederman.

10) MLDv2 parser in bridge multicast code breaks too early while
    parsing.  Fix from Thadeu Lima de Souza Cascardo.

11) Asking for settings on non-zero PHYID doesn't work because we do not
    import the command structure from the user and use the PHYID
    provided there.  Fix from Arun Parameswaran.

12) Fix UDP checksums with IPV6 RAW sockets, from Vlad Yasevich.

13) Missing NF_TABLES depends for TPROXY etc can cause build failures,
    fix from Florian Westphal.

14) Fix netfilter conntrack to handle RFC5961 challenge ACKs properly,
    from Jesper Dangaard Brouer.

15) If netlink autobind retry fails, we have to reset the sockets portid
    back to zero.  From Herbert Xu.

16) VXLAN netns exit code unregisters using wrong device, from John W
    Linville.

17) Add some USB device IDs to ath3k and btusb bluetooth drivers, from
    Dmitry Tunin and Wen-chien Jesse Sung.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  bridge: fix lockdep splat
  net: core: 'ethtool' issue with querying phy settings
  bridge: fix parsing of MLDv2 reports
  ARM: zynq: DT: Use the zynq binding with macb
  net: macb: Disable half duplex gigabit on Zynq
  net: macb: Document zynq gem dt binding
  ipv4: fill in table id when replacing a route
  cdc_ncm: Fix tx_bytes statistics
  ipv4: Avoid crashing in ip_error
  tcp: fix a potential deadlock in tcp_get_info()
  net: sched: fix call_rcu() race on classifier module unloads
  net: phy: Make sure phy_start() always re-enables the phy interrupts
  ipv6: fix ECMP route replacement
  ipv6: do not delete previously existing ECMP routes if add fails
  Revert "netfilter: bridge: query conntrack about skb dnat"
  netfilter: ensure number of counters is >0 in do_replace()
  netfilter: nfnetlink_{log,queue}: Register pernet in first place
  tcp: don't over-send F-RTO probes
  tcp: only undo on partial ACKs in CA_Loss
  net/ipv6/udp: Fix ipv6 multicast socket filter regression
  ...
2015-05-22 15:44:50 -07:00
Linus Torvalds 1c8df7bd48 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Three small fixes that have been picked up the last few weeks.
  Specifically:

   - Fix a memory corruption issue in NVMe with malignant user
     constructed request.  From Christoph.

   - Kill (now) unused blk_queue_bio(), dm was changed to not need this
     anymore.  From Mike Snitzer.

   - Always use blk_schedule_flush_plug() from the io_schedule() path
     when flushing a plug, fixing a !TASK_RUNNING warning with md.  From
     Shaohua"

* 'for-linus' of git://git.kernel.dk/linux-block:
  sched: always use blk_schedule_flush_plug in io_schedule_out
  nvme: fix kernel memory corruption with short INQUIRY buffers
  block: remove export for blk_queue_bio
2015-05-22 15:15:30 -07:00
Linus Torvalds a30ec4b347 md fixes for 4.1-rc4
- one serious RAID0 data corruption - caused by recent bugfix that wasn't
   reviewed properly.
 - one raid5 fix in new code (a couple more of those to come).
 - one little fix to stop static analysis complaining about silly rcu
   annotation.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIVAwUAVV6rmDnsnt1WYoG5AQICpQ//VleOWnZERlVQN1nYzGHqIyPUz+dV6x8W
 RfzccMzUWqaHpZY/BaTMqJxPWgt6lxqqtN5F3u+XMdM+L2Pl0NXoRTqB+FzpLucR
 d0p76FwbW2IWnycxXUZtGjm8xT942KPdqXIP+LQRKaIhguV9P9th9WbPRvlXFfB6
 pO1h1AdxVlsfn6qgBpe49pYoUuNmpwSTTJ8Gvb7YRsHXnmB/lsjQvVsWLiMqPwug
 +bR0gM07OvTfYenrY82ztu42+xAoVoeu/XhHmPPLEV3S/SA/9kVLgBcWjq2bBjw0
 REzR2i+exEtCRe2yvetDX8320IKZcZSOE8BwBB2iUm8OVjplPxw83Bk/kB2+Zr8n
 VxAa9/vCZPLNCWyzOSi2V471NeFBys9PVkVIroHT5nQZGadkw0JYt1cEbcHSBKS/
 NYmJCu1IM0o9DPzn8jW7sXjDcB0WE8VwFMJ09QSiEYQu2bAb7j2f1pcoweN2MeNF
 qOitGj/luygXg2TBvema9qPL533DUj+eSYyyO7OQBsfkfZSDM5di4bGoZBVw118Q
 kARaJ3IZffT8gsVpgSTCsviDv5QUcs0izVS1sMKB6/SDJjtLXBMERW+LkrmjBsnB
 6+CMmwNZUIzugzBgSPBVozdNQQYjAXL7LNV6Q9YQhE0RK0+10fLLxmpovJR62bRf
 PWlg23+n4+E=
 =jXKM
 -----END PGP SIGNATURE-----

Merge tag 'md/4.1-rc4-fixes' of git://neil.brown.name/md

Pull md bugfixes from Neil Brown:
 "I have a few more raid5 bugfixes pending, but I want them to get a bit
  more review first.  In the meantime:

   - one serious RAID0 data corruption - caused by recent bugfix that
     wasn't reviewed properly.

   - one raid5 fix in new code (a couple more of those to come).

   - one little fix to stop static analysis complaining about silly rcu
     annotation"

* tag 'md/4.1-rc4-fixes' of git://neil.brown.name/md:
  md/bitmap: remove rcu annotation from pointer arithmetic.
  md/raid0: fix restore to sector variable in raid0_make_request
  raid5: fix broken async operation chain
2015-05-22 15:10:07 -07:00
Linus Torvalds 1d82b0baf9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "Updates for the input subsystem.

  The main change is that we tell joydev not to touch "absolute mice",
  such as VMware virtual mouse, as that produced bad result (cursor
  stuck in upper right corner) with games"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: smtpe-ts - wait 50mS until polling for pen-up
  Input: smtpe-ts - use msecs_to_jiffies() instead of HZ
  Input: joydev - don't classify the vmmouse as a joystick
  Input: vmmouse - do not reference non-existing version of X driver
  Input: alps - fix finger jumps on lifting 2 fingers on v7 touchpad
  Input: elantech - fix semi-mt protocol for v3 HW
  Input: sx8654 - fix memory allocation check
2015-05-22 14:49:55 -07:00
Linus Torvalds 2a058f388d Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull another crypto fix from Herbert Xu:
 "Fix ICV corruption in s390/ghash when the same tfm is used by more
  than one thread"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: s390/ghash - Fix incorrect ghash icv buffer handling.
2015-05-22 14:26:36 -07:00
Eric Dumazet 93a33a584e bridge: fix lockdep splat
Following lockdep splat was reported :

[   29.382286] ===============================
[   29.382315] [ INFO: suspicious RCU usage. ]
[   29.382344] 4.1.0-0.rc0.git11.1.fc23.x86_64 #1 Not tainted
[   29.382380] -------------------------------
[   29.382409] net/bridge/br_private.h:626 suspicious
rcu_dereference_check() usage!
[   29.382455]
               other info that might help us debug this:

[   29.382507]
               rcu_scheduler_active = 1, debug_locks = 0
[   29.382549] 2 locks held by swapper/0/0:
[   29.382576]  #0:  (((&p->forward_delay_timer))){+.-...}, at:
[<ffffffff81139f75>] call_timer_fn+0x5/0x4f0
[   29.382660]  #1:  (&(&br->lock)->rlock){+.-...}, at:
[<ffffffffa0450dc1>] br_forward_delay_timer_expired+0x31/0x140
[bridge]
[   29.382754]
               stack backtrace:
[   29.382787] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
4.1.0-0.rc0.git11.1.fc23.x86_64 #1
[   29.382838] Hardware name: LENOVO 422916G/LENOVO, BIOS A1KT53AUS 04/07/2015
[   29.382882]  0000000000000000 3ebfc20364115825 ffff880666603c48
ffffffff81892d4b
[   29.382943]  0000000000000000 ffffffff81e124e0 ffff880666603c78
ffffffff8110bcd7
[   29.383004]  ffff8800785c9d00 ffff88065485ac58 ffff880c62002800
ffff880c5fc88ac0
[   29.383065] Call Trace:
[   29.383084]  <IRQ>  [<ffffffff81892d4b>] dump_stack+0x4c/0x65
[   29.383130]  [<ffffffff8110bcd7>] lockdep_rcu_suspicious+0xe7/0x120
[   29.383178]  [<ffffffffa04520f9>] br_fill_ifinfo+0x4a9/0x6a0 [bridge]
[   29.383225]  [<ffffffffa045266b>] br_ifinfo_notify+0x11b/0x4b0 [bridge]
[   29.383271]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383320]  [<ffffffffa0450de8>]
br_forward_delay_timer_expired+0x58/0x140 [bridge]
[   29.383371]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383416]  [<ffffffff8113a033>] call_timer_fn+0xc3/0x4f0
[   29.383454]  [<ffffffff81139f75>] ? call_timer_fn+0x5/0x4f0
[   29.383493]  [<ffffffff8110a90f>] ? lock_release_holdtime.part.29+0xf/0x200
[   29.383541]  [<ffffffffa0450d90>] ? br_hold_timer_expired+0x70/0x70 [bridge]
[   29.383587]  [<ffffffff8113a6a4>] run_timer_softirq+0x244/0x490
[   29.383629]  [<ffffffff810b68cc>] __do_softirq+0xec/0x670
[   29.383666]  [<ffffffff810b70d5>] irq_exit+0x145/0x150
[   29.383703]  [<ffffffff8189f506>] smp_apic_timer_interrupt+0x46/0x60
[   29.383744]  [<ffffffff8189d523>] apic_timer_interrupt+0x73/0x80
[   29.383782]  <EOI>  [<ffffffff816f131f>] ? cpuidle_enter_state+0x5f/0x2f0
[   29.383832]  [<ffffffff816f131b>] ? cpuidle_enter_state+0x5b/0x2f0

Problem here is that br_forward_delay_timer_expired() is a timer
handler, calling br_ifinfo_notify() which assumes either rcu_read_lock()
or RTNL are held.

Simplest fix seems to add rcu read lock section.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Reported-by: Dominick Grift <dac.override@gmail.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22 16:23:56 -04:00
Arun Parameswaran f96dee13b8 net: core: 'ethtool' issue with querying phy settings
When trying to configure the settings for PHY1, using commands
like 'ethtool -s eth0 phyad 1 speed 100', the 'ethtool' seems to
modify other settings apart from the speed of the PHY1, in the
above case.

The ethtool seems to query the settings for PHY0, and use this
as the base to apply the new settings to the PHY1. This is
causing the other settings of the PHY 1 to be wrongly
configured.

The issue is caused by the '_ethtool_get_settings()' API, which
gets called because of the 'ETHTOOL_GSET' command, is clearing
the 'cmd' pointer (of type 'struct ethtool_cmd') by calling
memset. This clears all the parameters (if any) passed for the
'ETHTOOL_GSET' cmd. So the driver's callback is always invoked
with 'cmd->phy_address' as '0'.

The '_ethtool_get_settings()' is called from other files in the
'net/core'. So the fix is applied to the 'ethtool_get_settings()'
which is only called in the context of the 'ethtool'.

Signed-off-by: Arun Parameswaran <aparames@broadcom.com>
Reviewed-by: Ray Jui <rjui@broadcom.com>
Reviewed-by: Scott Branden <sbranden@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22 16:14:17 -04:00
Thadeu Lima de Souza Cascardo 47cc84ce0c bridge: fix parsing of MLDv2 reports
When more than a multicast address is present in a MLDv2 report, all but
the first address is ignored, because the code breaks out of the loop if
there has not been an error adding that address.

This has caused failures when two guests connected through the bridge
tried to communicate using IPv6. Neighbor discoveries would not be
transmitted to the other guest when both used a link-local address and a
static address.

This only happens when there is a MLDv2 querier in the network.

The fix will only break out of the loop when there is a failure adding a
multicast address.

The mdb before the patch:

dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp

After the patch:

dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::fb temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
dev ovirtmgmt port bond0.86 grp ff02::d temp
dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp
dev ovirtmgmt port bond0.86 grp ff02::16 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp
dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp
dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp

Fixes: 08b202b672 ("bridge br_multicast: IPv6 MLD support.")
Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22 15:08:20 -04:00
Nathan Sullivan 9eeb516139 ARM: zynq: DT: Use the zynq binding with macb
Use the new zynq binding for macb ethernet, since it will disable half
duplex gigabit like the Zynq TRM says to do.

Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22 15:04:33 -04:00