Граф коммитов

535482 Коммитов

Автор SHA1 Сообщение Дата
Roopa Prabhu 3dcb615e68 af_mpls: add null dev check in find_outdev
This patch adds null dev check for the 'cfg->rc_via_table ==
NEIGH_LINK_TABLE or dev_get_by_index() failed' case

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:03:58 -07:00
David S. Miller 3c62181840 Merge branch 'test-bpf-next'
Nicolas Schichan says:

====================
test_bpf improvements

Please find below the patch series with my latest changes to test_bpf.

The first patch checks for unexpected NULL generated skbs before
running the filter.

The second patch adds the possibility for tests to generate fragmented
skbs.

The third patch tests LD_ABS and LD_IND on fragmented skbs.

The fourth patch adds the possibility to restrict the tests being run
by specifying the name/id/range of the test(s) to run via module
parameters.

The fifth patch tests LD_ABS and LD_IND on non fragmented skbs with
various sizes and alignments.

The sixth and final patch checks that the interpreter or JIT correctly
resets A and X to 0.

This serie is against today's net-next tree.

Changes in V2:

* move declaration of 'ptr' in if() block in patch 2/6.

* fix various typos in patch 4/6

* rework default init of test_range array and cleanup exclude_test()
  return condition in patch 4/6.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:02:32 -07:00
Nicolas Schichan 86bf1721b2 test_bpf: add tests checking that JIT/interpreter sets A and X to 0.
It is mandatory for the JIT or interpreter to reset the A and X
registers to 0 before running the filter. Check that it is the case on
various ALU and JMP instructions.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:02:32 -07:00
Nicolas Schichan 08fcb08fc0 test_bpf: add more tests for LD_ABS and LD_IND.
This exerces the LD_ABS and LD_IND instructions for various sizes and
alignments. This also checks that X when used as an offset to a
BPF_IND instruction first in a filter is correctly set to 0.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:02:32 -07:00
Nicolas Schichan d2648d4e26 test_bpf: add module parameters to filter the tests to run.
When developping on the interpreter or a particular JIT, it can be
interesting to restrict the tests list to a specific test or a
particular range of tests.

This patch adds the following module parameters to the test_bpf module:

* test_name=<string>: only the specified named test will be run.

* test_id=<number>: only the test with the specified id will be run
  (see the output of test_bpf without parameters to get the test id).

* test_range=<number>,<number>: only the tests within IDs in the
  specified id range are run (see the output of test_bpf without
  parameters to get the test ids).

Any invalid range, test id or test name will result in -EINVAL being
returned and no tests being run.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:02:32 -07:00
Nicolas Schichan 2cf1ad7593 test_bpf: test LD_ABS and LD_IND instructions on fragmented skbs.
These new tests exercise various load sizes and offsets crossing the
head/fragment boundary.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:02:32 -07:00
Nicolas Schichan bac142acb9 test_bpf: allow tests to specify an skb fragment.
This introduce a new test->aux flag (FLAG_SKB_FRAG) to tell the
populate_skb() function to add a fragment to the test skb containing
the data specified in test->frag_data).

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:02:31 -07:00
Nicolas Schichan e34684f88e test_bpf: avoid oopsing the kernel when generate_test_data() fails.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:02:31 -07:00
David S. Miller c71b5ad06e Merge branch 'mlx5e-next'
Amir Vadai says:

====================
net/mlx5e: Driver updates 04-Aug-2015

This patchset introduces two features to the ConnectX-4 driver: Patch 8/8
("Support physical port counters") exposes some hardware counters through
ethtool. Rest of the patches are preparation and usage of what we call
light-weight netdev open/close. Some flows that used to be in the ndo_open/stop
are moved to the PCI probe/remove flows - i.e. we will make the netdev
open/close operations more "light-weight".

The benefits of this change are:
1) Reduce the execution time of the stop/open operations.
2) Avoid saving SW shadows of resource configurations that must
   persist through stop/open operations (e.g flow table steering
   rules), and avoid deleting/applying them from/to the device upon
   netdev stop/open.
3) Avoid synchronizing threads that access those resources with the
   netdev stop/open threads.

Instead of create/destroy the resource during netdev open/stop, This patchset
changes the behavior such that upon netdev stop, traffic is redirected to a
"Drop RQ" (a RQ that silently drops, at the NIC HW level all incoming traffic).
After redirecting the traffic, RX/TX software resources could be destroyed.
During netdev open, the RX/TX rings are created and traffic is redirected to
the RX rings.

Patchset was applied and tested over commit ba7591d ("ebpf: add skb->hash to
offset map for usage in {cls, act}_bpf or filters")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:00:59 -07:00
Gal Pressman efea389d3c net/mlx5_core: Support physical port counters
Added physical port counters in the following standard formats to
ethtool statistics:
  - IEEE 802.3
  - RFC2863
  - RFC2819

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:00:59 -07:00
Achiad Shochat 9b37b07fcb net/mlx5e: Take advantage of the light-weight netdev open/stop
Now that TIRs, TISs and flow tables are kept alive while the netdev is
stopped (after executing ndo_stop()) we can do the following
improvements:

- Obsolete the active_vlans SW shadow.
- Do not delete/add flow table rules upon ndo_stop/open.
  In addition to simplifying the flow, this change also fastens
  the ndo_open/close operations.
- Obsolete synchronization of threads accessing the flow tables
  with the netdev stop/open threads.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:00:59 -07:00
Achiad Shochat 1cefa326ff net/mlx5e: Disable async events before unregister_netdev()
It does not make sense to allow events while the netdev is
unregistered.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:00:58 -07:00
Achiad Shochat 40ab6a6ebe net/mlx5e: Rename/move functions following the ndo_stop flow change
Rename some functions that used to be invoked upon ndo_open/stop and
are now invoked upon create/destroy_netdev() in order to better hint
their place in the flow.

Change some functions location in the file so that functions involved
in ndo_open/stop flow will not be interleaved with other functions.

This is a cosmetic change, no logical change here.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:00:58 -07:00
Achiad Shochat 5c50368f38 net/mlx5e: Light-weight netdev open/stop
Create/destroy TIRs, TISs and flow tables upon PCI probe/remove rather
than upon the netdev ndo_open/stop.

Upon ndo_stop(), redirect all RX traffic to the (lately introduced)
"Drop RQ" and then close only the RX/TX rings, leaving the TIRs,
TISs and flow tables alive.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:00:58 -07:00
Achiad Shochat d9eea403ca net/mlx5_core: Introduce access function to modify RSS/LRO params
To be used by the mlx5 Eth driver in following commit.

This is in preparation for netdev "light-weight" open/stop flow
change described in previous commit.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:00:58 -07:00
Achiad Shochat 50cfa25aba net/mlx5e: Introduce the "Drop RQ"
RX traffic routed to this RQ will be silently dropped, at the NIC HW
level.

This is in preparation for netdev "light-weight" open/stop flow
change described in previous commit.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:00:58 -07:00
Achiad Shochat 4cbeaff54f net/mlx5e: Unify the RX flow
Generally an RX packet flows through the following objects:
Flow table --> TIR --> RQT --> RQ

Where:
- TIR stands for "Transport Interface Receive", defining the RSS and
  LRO paramaters.
- RQT stands for "RQ Table", implementing the RSS indirection table.
- RQ stands for "Receive Queue"

For flows that do not need LRO, nor RSS, the driver made a shortcut to
the above RX flow by pointing to the RQ directly from the TIR, yielding
this flow:
Flow table --> TIR --> RQ

In this commit we remove this shortcut by "inserting" a single-RQ RQT
between the TIR and the RQ, i.e RX packets will reach the same RQ but
will go through an RQT of size 1, pointing to just a single RQ.

This way the RX traffic re-direction to/from the "Drop RQ" will be more
uniform (AKA "one flow"), as it will involve only RQTs re-direction and
no TIRs re-direction.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 22:00:58 -07:00
David S. Miller adc4cc99b7 Merge branch 'cpsw-next'
Mugunthan V N says:

====================
CPSW interrupt handling cleanup and performance improvement

This patch series removes the irq controller disable interrupt and
adding a napi for tx event handling which improves the performance by
~180Mbps on dra7-evm

[  5] local 192.168.10.116 port 5001 connected with 192.168.10.165 port 44176
[  5]  0.0-60.0 sec  1.48 GBytes   210 Mbits/sec
[  4] local 192.168.10.116 port 5001 connected with 192.168.10.165 port 33257
[  4]  0.0-60.0 sec  2.71 GBytes   386 Mbits/sec

Changes from initial version:
* Added a patch to have napi only for first interface as there is
  no use of having seperate napis for each interface as the
  interrupt is shared by both interface and only one napi is
  scheduled for each interrupt.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:59:27 -07:00
Mugunthan V N 32a7432c0f drivers: net: cpsw: add separate napi for tx
Instead of processing tx events in isr adding separate napi for
tx which improves performance by ~180Mbps with
omap2plus_defconfig on DRA74x platform. Also cleaning up rx napis
by renaming to napi_rx for better understanding the code.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:59:27 -07:00
Mugunthan V N d354eb85d6 drivers: net: cpsw: dual_emac: simplify napi usage
Since interrupt is shared between the two ethernet interface and
in isr only one napi is scheduled at an instance so having two
napis doesn't make any difference. So making napi also as a
common resource for the dual ethernet interfaces.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:59:27 -07:00
Mugunthan V N 870915feab drivers: net: cpsw: remove disable_irq/enable_irq as irq can be masked from cpsw itself
CPSW interrupts can be disabled by masking CPSW interrupts and
clearing interrupt by writing appropriate EOI. So removing all
disable_irq/enable_irq as discussed in [1]

[1] http://patchwork.ozlabs.org/patch/492741/

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:59:27 -07:00
Dan Carpenter 5a9348b54d mpls: small cleanup in inet/inet6_fib_lookup_dev()
We recently changed this code from returning NULL to returning ERR_PTR.
There are some left over NULL assignments which we can remove.  We can
preserve the error code from ip_route_output() instead of always
returning -ENODEV.  Also these functions use a mix of gotos and direct
returns.  There is no cleanup necessary so I changed the gotos to
direct returns.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:56:55 -07:00
Herbert Xu a0a2a66024 net: Fix skb_set_peeked use-after-free bug
The commit 738ac1ebb9 ("net: Clone
skb before setting peeked flag") introduced a use-after-free bug
in skb_recv_datagram.  This is because skb_set_peeked may create
a new skb and free the existing one.  As it stands the caller will
continue to use the old freed skb.

This patch fixes it by making skb_set_peeked return the new skb
(or the old one if unchanged).

Fixes: 738ac1ebb9 ("net: Clone skb before setting peeked flag")
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Brenden Blanco <bblanco@plumgrid.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:55:47 -07:00
David S. Miller 02b5242847 Merge branch 'bnx2x-cnic-bnx2fc-bd-support'
Yuval Mintz says:

====================
bnx2x, cnic, bnx2fc: add support for BD

Commit 230d00eb4b ("bnx2x: new Multi-function mode - BD") added support
for a new multi-function mode, but it added only the support required by
bnx2x for L2 interfaces.

This adds the required changes to support the new multi-function mode in
the offloaded storage protocols.

Dave,

Please consider applying this series to `net-next'.

Do notice that this involves non-networking driver changes -
but sending this as a single series seemed like the best approach as
we had to have bnx2x changes to support the new functionality.
If this is problematic, please tell us what's the preferred solution here.

Changes from previous versions
------------------------------

 - From v1 - no actual changes; v1 failed to reach netdev so in order to
   keep things in line I've termed this one v2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:54:13 -07:00
Joe Carnuccio 2971ff67bd bnx2fc: Read npiv table from nvram and create vports.
Signed-off-by: Joe Carnuccio <joe.carnuccio@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:54:13 -07:00
Yuval Mintz 97ac4ef78e bnx2x: Add BD support for storage
Commit 230d00eb4b ("bnx2x: new Multi-function mode - BD") adds support
for the new mode in bnx2x. This expands this support by implementing
APIs required by our storage drivers to support that mode.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:54:13 -07:00
Adheer Chandravanshi 9b8d504402 cnic: Add the interfaces to get FC-NPIV table.
Signed-off-by: Adheer Chandravanshi <adheer.chandravanshi@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:54:12 -07:00
Tej Parkash eddb755428 cnic: Populate upper layer driver state in MFW
Signed-off-by: Tej Parkash <tej.parkash@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:54:12 -07:00
Scott Feldman ff14702844 rocker: use netdev_err after register_netdev
After successful register_netdev, we can use netdev_err rather the more
generic dev_err.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:47:57 -07:00
Scott Feldman 6c4f7780a5 rocker: NULL port if port probe fails
Set port to NULL if port probe fails so we don't try to remove partially
initialized port on port probe err cleanup path.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 21:47:57 -07:00
Linus Torvalds 49d7c6559b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fix from David Miller:
 "FPU register corruption bug fix"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix userspace FPU register corruptions.
2015-08-07 05:28:24 +03:00
Linus Torvalds 8664b90bae Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "21 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
  writeback: fix initial dirty limit
  mm/memory-failure: set PageHWPoison before migrate_pages()
  mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*
  mm/memory-failure: give up error handling for non-tail-refcounted thp
  mm/memory-failure: fix race in counting num_poisoned_pages
  mm/memory-failure: unlock_page before put_page
  ipc: use private shmem or hugetlbfs inodes for shm segments.
  mm: initialize hotplugged pages as reserved
  ocfs2: fix shift left overflow
  kthread: export kthread functions
  fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()
  lib/iommu-common.c: do not use 0xffffffffffffffffl for computing align_mask
  mm/slub: allow merging when SLAB_DEBUG_FREE is set
  signalfd: fix information leak in signalfd_copyinfo
  signal: fix information leak in copy_siginfo_to_user
  signal: fix information leak in copy_siginfo_from_user32
  ocfs2: fix BUG in ocfs2_downconvert_thread_do_work()
  fs, file table: reinit files_stat.max_files after deferred memory initialisation
  mm, meminit: replace rwsem with completion
  mm, meminit: allow early_pfn_to_nid to be used during runtime
  ...
2015-08-07 05:20:40 +03:00
David S. Miller 44922150d8 sparc64: Fix userspace FPU register corruptions.
If we have a series of events from userpsace, with %fprs=FPRS_FEF,
like follows:

ETRAP
	ETRAP
		VIS_ENTRY(fprs=0x4)
		VIS_EXIT
		RTRAP (kernel FPU restore with fpu_saved=0x4)
	RTRAP

We will not restore the user registers that were clobbered by the FPU
using kernel code in the inner-most trap.

Traps allocate FPU save slots in the thread struct, and FPU using
sequences save the "dirty" FPU registers only.

This works at the initial trap level because all of the registers
get recorded into the top-level FPU save area, and we'll return
to userspace with the FPU disabled so that any FPU use by the user
will take an FPU disabled trap wherein we'll load the registers
back up properly.

But this is not how trap returns from kernel to kernel operate.

The simplest fix for this bug is to always save all FPU register state
for anything other than the top-most FPU save area.

Getting rid of the optimized inner-slot FPU saving code ends up
making VISEntryHalf degenerate into plain VISEntry.

Longer term we need to do something smarter to reinstate the partial
save optimizations.  Perhaps the fundament error is having trap entry
and exit allocate FPU save slots and restore register state.  Instead,
the VISEntry et al. calls should be doing that work.

This bug is about two decades old.

Reported-by: James Y Knight <jyknight@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 19:13:25 -07:00
Lucas Stach 14d2b7c1a9 net: fec: fix initial runtime PM refcount
The clocks are initially active and thus the device is marked active.
This still keeps the PM refcount at 0, the pm_runtime_put_autosuspend()
call at the end of probe then leaves us with an invalid refcount of -1,
which in turn leads to the device staying in suspended state even though
netdev open had been called.

Fix this by initializing the refcount to be coherent with the initial
device status.

Fixes:
8fff755e9f (net: fec: Ensure clocks are enabled while using mdio bus)

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-06 18:53:25 -07:00
Linus Torvalds a58997e1a6 Merge branch 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux
Pull amdgpu fixes from Alex Deucher:
 "Just a few amdgpu fixes to make sure we report the proper firmware
  information and number of render buffers to userspace and a typo in a
  debugging function"

[ Pulling directly from Alex since Dave Airlie is on vacation  - Linus ]

* 'drm-fixes-4.2' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: set fw_version and feature_version for smu fw loading
  drm/amdgpu: add feature version for SDMA ucode
  drm/amdgpu: add feature version for RLC and MEC v2
  drm/amdgpu: increment queue when iterating on this variable.
  drm/amdgpu: fix rb setting for CZ
2015-08-07 04:51:14 +03:00
Linus Torvalds ebc90be6b9 Merge branch 'drm-tda998x-fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull TDA998x i2c driver fixes from Russell King:
 "This fixes the double-checksumming of the AVI infoframe which was
  resulting in the checksum always being zero.  It went unnoticed as
  none of my HDMI devices had a problem with this"

[ Pulling directly from rmk since Dave Airlie is on vacation  - Linus ]

* 'drm-tda998x-fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  drm/i2c: tda998x: fix bad checksum of the HDMI AVI infoframe
2015-08-07 04:48:46 +03:00
Rabin Vincent a50fcb512d writeback: fix initial dirty limit
The initial value of global_wb_domain.dirty_limit set by
writeback_set_ratelimit() is zeroed out by the memset in
wb_domain_init().

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:42 +03:00
Naoya Horiguchi 4491f71260 mm/memory-failure: set PageHWPoison before migrate_pages()
Now page freeing code doesn't consider PageHWPoison as a bad page, so by
setting it before completing the page containment, we can prevent the
error page from being reused just after successful page migration.

I added TTU_IGNORE_HWPOISON for try_to_unmap() to make sure that the
page table entry is transformed into migration entry, not to hwpoison
entry.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:42 +03:00
Naoya Horiguchi f4c18e6f7b mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_*
The race condition addressed in commit add05cecef ("mm: soft-offline:
don't free target page in successful page migration") was not closed
completely, because that can happen not only for soft-offline, but also
for hard-offline.  Consider that a slab page is about to be freed into
buddy pool, and then an uncorrected memory error hits the page just
after entering __free_one_page(), then VM_BUG_ON_PAGE(page->flags &
PAGE_FLAGS_CHECK_AT_PREP) is triggered, despite the fact that it's not
necessary because the data on the affected page is not consumed.

To solve it, this patch drops __PG_HWPOISON from page flag checks at
allocation/free time.  I think it's justified because __PG_HWPOISON
flags is defined to prevent the page from being reused, and setting it
outside the page's alloc-free cycle is a designed behavior (not a bug.)

For recent months, I was annoyed about BUG_ON when soft-offlined page
remains on lru cache list for a while, which is avoided by calling
put_page() instead of putback_lru_page() in page migration's success
path.  This means that this patch reverts a major change from commit
add05cecef about the new refcounting rule of soft-offlined pages, so
"reuse window" revives.  This will be closed by a subsequent patch.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:42 +03:00
Naoya Horiguchi 98ed2b0052 mm/memory-failure: give up error handling for non-tail-refcounted thp
"non anonymous thp" case is still racy with freeing thp, which causes
panic due to put_page() for refcount-0 page.  It seems that closing up
this race might be hard (and/or not worth doing,) so let's give up the
error handling for this case.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Naoya Horiguchi a209ef09af mm/memory-failure: fix race in counting num_poisoned_pages
When memory_failure() is called on a page which are just freed after
page migration from soft offlining, the counter num_poisoned_pages is
raised twi= ce.  So let's fix it with using TestSetPageHWPoison.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Naoya Horiguchi a09233f3e1 mm/memory-failure: unlock_page before put_page
Recently I addressed a few of hwpoison race problems and the patches are
merged on v4.2-rc1.  It made progress, but unfortunately some problems
still remain due to less coverage of my testing.  So I'm trying to fix
or avoid them in this series.

One point I'm expecting to discuss is that patch 4/5 changes the page
flag set to be checked on free time.  In current behavior, __PG_HWPOISON
is not supposed to be set when the page is freed.  I think that there is
no strong reason for this behavior, and it causes a problem hard to fix
only in error handler side (because __PG_HWPOISON could be set at
arbitrary timing.) So I suggest to change it.

With this patchset, hwpoison stress testing in official mce-test
testsuite (which previously failed) passes.

This patch (of 5):

In "just unpoisoned" path, we do put_page and then unlock_page, which is
a wrong order and causes "freeing locked page" bug.  So let's fix it.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dean Nelson <dnelson@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Stephen Smalley e1832f2923 ipc: use private shmem or hugetlbfs inodes for shm segments.
The shm implementation internally uses shmem or hugetlbfs inodes for shm
segments.  As these inodes are never directly exposed to userspace and
only accessed through the shm operations which are already hooked by
security modules, mark the inodes with the S_PRIVATE flag so that inode
security initialization and permission checking is skipped.

This was motivated by the following lockdep warning:

  ======================================================
   [ INFO: possible circular locking dependency detected ]
   4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G        W
  -------------------------------------------------------
   httpd/1597 is trying to acquire lock:
   (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130
   but task is already holding lock:
   (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180
   which lock already depends on the new lock.
   the existing dependency chain (in reverse order) is:
   -> #3 (&mm->mmap_sem){++++++}:
        lock_acquire+0xc7/0x270
        __might_fault+0x7a/0xa0
        filldir+0x9e/0x130
        xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
        xfs_readdir+0x1b4/0x330 [xfs]
        xfs_file_readdir+0x2b/0x30 [xfs]
        iterate_dir+0x97/0x130
        SyS_getdents+0x91/0x120
        entry_SYSCALL_64_fastpath+0x12/0x76
   -> #2 (&xfs_dir_ilock_class){++++.+}:
        lock_acquire+0xc7/0x270
        down_read_nested+0x57/0xa0
        xfs_ilock+0x167/0x350 [xfs]
        xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
        xfs_attr_get+0xbd/0x190 [xfs]
        xfs_xattr_get+0x3d/0x70 [xfs]
        generic_getxattr+0x4f/0x70
        inode_doinit_with_dentry+0x162/0x670
        sb_finish_set_opts+0xd9/0x230
        selinux_set_mnt_opts+0x35c/0x660
        superblock_doinit+0x77/0xf0
        delayed_superblock_init+0x10/0x20
        iterate_supers+0xb3/0x110
        selinux_complete_init+0x2f/0x40
        security_load_policy+0x103/0x600
        sel_write_load+0xc1/0x750
        __vfs_write+0x37/0x100
        vfs_write+0xa9/0x1a0
        SyS_write+0x58/0xd0
        entry_SYSCALL_64_fastpath+0x12/0x76
  ...

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reported-by: Morten Stevens <mstevens@fedoraproject.org>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Mel Gorman e298ff75f1 mm: initialize hotplugged pages as reserved
Commit 92923ca3aa ("mm: meminit: only set page reserved in the
memblock region") broke memory hotplug which expects the memmap for
newly added sections to be reserved until onlined by
online_pages_range().  This patch marks hotplugged pages as reserved
when adding new zones.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: David Vrabel <david.vrabel@citrix.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Joseph Qi 32e5a2a2be ocfs2: fix shift left overflow
When using a large volume, for example 9T volume with 2T already used,
frequent creation of small files with O_DIRECT when the IO is not
cluster aligned may clear sectors in the wrong place.  This will cause
filesystem corruption.

This is because p_cpos is a u32.  When calculating the corresponding
sector it should be converted to u64 first, otherwise it may overflow.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
David Kershner 18896451ea kthread: export kthread functions
The s-Par visornic driver, currently in staging, processes a queue being
serviced by the an s-Par service partition.  We can get a message that
something has happened with the Service Partition, when that happens, we
must not access the channel until we get a message that the service
partition is back again.

The visornic driver has a thread for processing the channel, when we get
the message, we need to be able to park the thread and then resume it
when the problem clears.

We can do this with kthread_park and unpark but they are not exported
from the kernel, this patch exports the needed functions.

Signed-off-by: David Kershner <david.kershner@unisys.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Weinberger <richard.weinberger@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Jan Kara 8f2f3eb59d fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()
fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
drops mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and thus the next
entry pointer we have cached may become stale and we dereference free
memory.

Fix the problem by first moving marks to free to a special private list
and then always free the first entry in the special list.  This method
is safe even when entries from the list can disappear once we drop the
lock.

Signed-off-by: Jan Kara <jack@suse.com>
Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Ashish Sangwan <a.sangwan@samsung.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Sowmini Varadhan 447f6a95a9 lib/iommu-common.c: do not use 0xffffffffffffffffl for computing align_mask
Using a 64 bit constant generates "warning: integer constant is too
large for 'long' type" on 32 bit platforms.  Instead use ~0ul and
BITS_PER_LONG.

Detected by Andrew Morton on ARMD.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Konstantin Khlebnikov 3e810ae2db mm/slub: allow merging when SLAB_DEBUG_FREE is set
This patch fixes creation of new kmem-caches after enabling
sanity_checks for existing mergeable kmem-caches in runtime: before that
patch creation fails because unique name in sysfs already taken by
existing kmem-cache.

Unlike other debug options this doesn't change object layout and could
be enabled and disabled at any time.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00
Amanieu d'Antras 3ead7c52bd signalfd: fix information leak in signalfd_copyinfo
This function may copy the si_addr_lsb field to user mode when it hasn't
been initialized, which can leak kernel stack data to user mode.

Just checking the value of si_code is insufficient because the same
si_code value is shared between multiple signals.  This is solved by
checking the value of si_signo in addition to si_code.

Signed-off-by: Amanieu d'Antras <amanieu@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00