Граф коммитов

87384 Коммитов

Автор SHA1 Сообщение Дата
Andrey Vagin c62cce2cae net: add an ioctl to get a socket network namespace
Each socket operates in a network namespace where it has been created,
so if we want to dump and restore a socket, we have to know its network
namespace.

We have a socket_diag to get information about sockets, it doesn't
report sockets which are not bound or connected.

This patch introduces a new socket ioctl, which is called SIOCGSKNS
and used to get a file descriptor for a socket network namespace.

A task must have CAP_NET_ADMIN in a target network namespace to
use this ioctl.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-31 10:56:36 -04:00
David S. Miller 0a6ce1e3c1 Mellanox ConnectX-4/Connect-IB shared code (IB & ETH part)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYFfxWAAoJEORje4g2clinr4MQAMUMoO4akLhppyUuLiT2ZZHc
 KshFwUF5RnZq6c2qXSxTVhfajyG6q71EwQNGaAMeGsjoCXM+u99Adgp/lYkXxbzY
 e4W3gCvjGWIik5d3JRVq07XutVpeJIc/Qvvc5zc1lUYNR5f71iBw538eG9ic4PXi
 4/CpRvcsa8Z9sbtKPcjHwQQRd4ewx/KAD6QyOsVz9GgkBeNMYag3SO731DYSkjRC
 MYK85arNC1JUE/MHQKIfYQvjiJVfEyt2FvC8v9tW+bhzP6dAzxRY0yd8ZFJtCiYH
 GFGy8vdeCA/0dFRD5cYKPKBiFwUbRC8bt2lLC5ZoUic2nZ23LlO67uDkaPDRYckt
 oyhErFRX6Q/goqKFCI4tLUoSBF1bhy9EnbWyOWmcW7qpXRD3VCclS0Ctr++yJnv2
 bhhlID56f+dX+rnW/OAERrk8MdVHo5xBUzQ8ZAAF3WDP9LqW+qlYVrEvrFqFIeFM
 OCGUbW2xsZaHMZRyx0K6068hy8O4EujjgC9PARi65rrZAAwxlDm4ElJEYvzXZVXA
 YoMeXiZGrhoj7+h8OorV0TyB+7mUgxFNlq0tCoi193QS+zIuQqf3XYNmuQGCUflF
 YdzZs7/9LpANN4e5yDTCq3CIr8yYv9sRrdnSX0iShvbFBAKClLxpjj2xkpayHFVR
 8CRvlb5O1v1fIwSn2z/I
 =D5Ix
 -----END PGP SIGNATURE-----

Merge tag 'shared-for-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma

Saeed Mahameed says:

====================
Mellanox mlx5 core driver updates 2016-10-25

This series contains some updates and fixes of mlx5 core and
IB drivers with the addition of two features that demand
new low level commands and infrastructure updates.
 - SRIOV VF max rate limit support
 - mlx5e tc support for FWD rules with counter.

Needed for both net and rdma subsystems.

Updates and Fixes:
From Saeed Mahameed (2):
  - mlx5 IB: Skip handling unknown mlx5 events
  - Add ConnectX-5 PCIe 4.0 VF device ID

From Artemy Kovalyov (2):
  - Update struct mlx5_ifc_xrqc_bits
  - Ensure SRQ physical address structure endianness

From Eugenia Emantayev (1):
  - Fix length of async_event_mask

New Features:
From Mohamad Haj Yahia (3): mlx5 SRIOV VF max rate limit support
  - Introduce TSAR manipulation firmware commands
  - Introduce E-switch QoS management
  - Add SRIOV VF max rate configuration support

From Mark Bloch (7): mlx5e Tc support for FWD rule with counter
  - Don't unlock fte while still using it
  - Use fte status to decide on firmware command
  - Refactor find_flow_rule
  - Group similar rules under the same fte
  - Add multi dest support
  - Add option to add fwd rule with counter
  - mlx5e tc support for FWD rule with counter
  Mark here fixed two trivial issues with the flow steering core, and did
  some refactoring in the flow steering API to support adding mulit destination
  rules to the same hardware flow table entry at once.  In the last two patches
  added the ability to populate a flow rule with a flow counter to the same flow entry.

V2: Dropped some patches that added new structures without adding any usage of them.
    Added SRIOV VF max rate configuration support patch that introduces
    the usage of the TSAR infrastructure.
    Added flow steering fixes and refactoring in addition to mlx5 tc
    support for forward rule with counter.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-30 17:31:12 -04:00
David S. Miller 27058af401 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mostly simple overlapping changes.

For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-30 12:42:58 -04:00
Mark Bloch 74491de937 net/mlx5: Add multi dest support
Currently when calling mlx5_add_flow_rule we accept
only one flow destination, this commit allows to pass
multiple destinations.

This change forces us to change the return structure to a more
flexible one. We introduce a flow handle (struct mlx5_flow_handle),
it holds internally the number for rules created and holds an array
where each cell points the to a flow rule.

From the consumers (of mlx5_add_flow_rule) point of view this
change is only cosmetic and requires only to change the type
of the returned value they store.

From the core point of view, we now need to use a loop when
allocating and deleting rules (e.g given to us a flow handler).

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-10-30 15:43:17 +02:00
Mohamad Haj Yahia 813f854053 net/mlx5: Introduce TSAR manipulation firmware commands
TSAR (stands for Transmit Scheduling ARbiter) is a hardware component
that is responsible for selecting the next entity to serve on the
transmit path.
The arbitration defines the QoS policy between the agents connected to
the TSAR.
The TSAR is a consist two main features:
1) BW Allocation between agents:
The TSAR implements a defecit weighted round robin between the agents.
Each agent attached to the TSAR is assigned with a weight and it is
awarded transmission tokens according to this weight.
2) Rate limer per agent:
Each agent attached to the TSAR is (optionally) assigned with a rate
limit.
TSAR will not allow scheduling for an agent exceeding its defined rate
limit.

In this patch we implement the API of manipulating the TSAR.

Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-10-30 15:43:12 +02:00
Artemy Kovalyov dd257efb1e net/mlx5: Ensure SRQ physical address structure endianness
SRQ physical address structure field should be in big-endian format.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-10-30 15:43:10 +02:00
Artemy Kovalyov 5579e1519b net/mlx5: Update struct mlx5_ifc_xrqc_bits
Update struct mlx5_ifc_xrqc_bits according to last specification

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2016-10-30 15:43:02 +02:00
Linus Torvalds 2a26d99b25 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Lots of fixes, mostly drivers as is usually the case.

   1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
      Khoroshilov.

   2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
      Pedersen.

   3) Don't put aead_req crypto struct on the stack in mac80211, from
      Ard Biesheuvel.

   4) Several uninitialized variable warning fixes from Arnd Bergmann.

   5) Fix memory leak in cxgb4, from Colin Ian King.

   6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.

   7) Several VRF semantic fixes from David Ahern.

   8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.

   9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.

  10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.

  11) Fix stale link state during failover in NCSCI driver, from Gavin
      Shan.

  12) Fix netdev lower adjacency list traversal, from Ido Schimmel.

  13) Propvide proper handle when emitting notifications of filter
      deletes, from Jamal Hadi Salim.

  14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.

  15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.

  16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.

  17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.

  18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
      Leitner.

  19) Revert a netns locking change that causes regressions, from Paul
      Moore.

  20) Add recursion limit to GRO handling, from Sabrina Dubroca.

  21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.

  22) Avoid accessing stale vxlan/geneve socket in data path, from
      Pravin Shelar"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
  geneve: avoid using stale geneve socket.
  vxlan: avoid using stale vxlan socket.
  qede: Fix out-of-bound fastpath memory access
  net: phy: dp83848: add dp83822 PHY support
  enic: fix rq disable
  tipc: fix broadcast link synchronization problem
  ibmvnic: Fix missing brackets in init_sub_crq_irqs
  ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
  Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
  arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
  net/mlx4_en: Save slave ethtool stats command
  net/mlx4_en: Fix potential deadlock in port statistics flow
  net/mlx4: Fix firmware command timeout during interrupt test
  net/mlx4_core: Do not access comm channel if it has not yet been initialized
  net/mlx4_en: Fix panic during reboot
  net/mlx4_en: Process all completions in RX rings after port goes up
  net/mlx4_en: Resolve dividing by zero in 32-bit system
  net/mlx4_core: Change the default value of enable_qos
  net/mlx4_core: Avoid setting ports to auto when only one port type is supported
  net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
  ...
2016-10-29 20:33:20 -07:00
pravin shelar c6fcc4fc5f vxlan: avoid using stale vxlan socket.
When vxlan device is closed vxlan socket is freed. This
operation can race with vxlan-xmit function which
dereferences vxlan socket. Following patch uses RCU
mechanism to avoid this situation.

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 20:56:31 -04:00
David S. Miller 32ab0a38f0 Among various cleanups and improvements, we have the following:
* client FILS authentication support in mac80211 (Jouni)
  * AP/VLAN multicast improvements (Michael Braun)
  * config/advertising support for differing beacon intervals on
    multiple virtual interfaces (Purushottam Kushwaha, myself)
  * deprecate the old WDS mode for cfg80211-based drivers, the
    mode is hardly usable since it doesn't support any "modern"
    features like WPA encryption (2003), HT (2009) or VHT (2014),
    I'm not even sure WEP (introduced in 1997) could be done.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYEy/9AAoJEGt7eEactAAd/V0P/0FmHGS8HjlSjm+1p6sbWKbt
 5v8bb3cuKHQiYiUM6euIXql2OYuOEHVAQEpNoPXN9CsfKFYgbIH6yW6d8HtKNedV
 n9lmMy/U6yJX9nYt7yMIQ3kLkbEg+YU58B9Hf47waWXLLSNVumS8rfNBn43EoNQf
 VKWYPWpetsCRIWJ1fnLuxvMHCtOOYCxH+491BUonof32+DKPEAsAbnszZ2ElufTR
 7KNyA3K6leOtTd5Ml52dvLOGNc+h2C83VAMxiShq/6r8OnlX5tPifaubzd9n3m41
 jiJJH/92ESrtF2AaWEm8slcgtcfHS/O7y/FSoV4r0PMSvPTBdjwQ9nqCsbONd831
 vjj6c6YWNxgHPcISX0XcWz+FHnLJdUGaDUtHjAJYw4oH4gaRXwfSw0U+jvdlSMUf
 2CBUArk5f0OEguzwa/5X4Jio3OPPIj4jY/lKplcpLOUu8K2FWTLDuIlww/FHXovs
 rDzTLQeXZkx+MkszkTJN42qSEfOFly91J6OA2Wju+emBqrLIbkAGmvyLVg8U8BQd
 gG7oltgmZ6Xg6fEnUQqpIDO7UJlQ+GXAU04SpNDMv1j/ueUJskxlr3hYM7E9ueQv
 LJcZcVV0RAwNRw52cEsdcYCMLuSMYRrO1OHlkl0wd+x2hFrUCWnVzUgLEUhNBV+c
 ICmNMr96nKhrZI217yzF
 =SjfQ
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2016-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Among various cleanups and improvements, we have the following:
 * client FILS authentication support in mac80211 (Jouni)
 * AP/VLAN multicast improvements (Michael Braun)
 * config/advertising support for differing beacon intervals on
   multiple virtual interfaces (Purushottam Kushwaha, myself)
 * deprecate the old WDS mode for cfg80211-based drivers, the
   mode is hardly usable since it doesn't support any "modern"
   features like WPA encryption (2003), HT (2009) or VHT (2014),
   I'm not even sure WEP (introduced in 1997) could be done.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 17:28:45 -04:00
Eugenia Emantayev 6f2e0d2c3b net/mlx4: Fix firmware command timeout during interrupt test
Currently interrupt test that is part of ethtool selftest runs the
check over all interrupt vectors of the device.
In mlx4_en package part of interrupt vectors are uninitialized since
mlx4_ib doesn't exist. This causes NOP FW command to time out.
Change logic to test current port interrupt vectors only.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 16:23:48 -04:00
Thomas Graf ebb676daa1 bpf: Print function name in addition to function id
The verifier currently prints raw function ids when printing CALL
instructions or when complaining:

	5: (85) call 23
	unknown func 23

print a meaningful function name instead:

	5: (85) call bpf_redirect#23
	unknown func bpf_redirect#23

Moves the function documentation to a single comment and renames all
helpers names in the list to conform to the bpf_ prefix notation so
they can be greped in the kernel source.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 15:55:13 -04:00
David S. Miller 880b583ce1 Just two fixes:
* a fix to process all events while suspending, so any
    potential calls into the driver are done before it is
    suspended
  * small markup fixes for the sphinx documentation conversion
    that's coming into the tree via the doc tree
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYEbSGAAoJEGt7eEactAAdOo4P/A1RyOl2uTspwB4lZFMMA5Tp
 IIwaB+WlNcjGq05C8kemdQJ30AKJUy6aKMTb5FIuFXWLRK6u5uIoQsYe6cRI8WBs
 bgTpo5RMcNGx6dUgFS/XwM2bLXUNXAXOt+43c2kWSjnqQZpQuSXh8/4R8ybZxyZB
 SNyKCzkx1LYPox4Qnufmi+pz/hI8hHRY7gMdyhUWU2uUk9u/nuY0Zxyvj4OFgyow
 SOGasCPoDzT9556hUGyG9M8HmwBiRZUzfzo5mSR2V0a9Tij4ZOvoSkcAUDbj38fZ
 0UPJ48xwUDMF88W+NEO+rrd1IQXtlsHJk6x/mAQpSkEvzymb5BEvraVeTV3+chgo
 fn45ZDP/GbTwqJ3YacSmKgyGwaGPF9XN2+Iuxotp0UeuYxvxwnkWNr0JbRecDPau
 ZL/P2qV7Y3gjy0+6nHafrYIYPuPFZjwPIJ7k6x7mVtO6zfJhtKF57YP/YoFYD78l
 TtCEE5o7lAYkdWKa2nm4Fg16NM897PS/GTM9dQCFtnYLV7ynZPY2TGf4TelqJ3cz
 FU1/vlM0t/jRkkRKIhlmy8re4ASjTEccf//dllz1CDyPdxHY1NmGftftz539s7qW
 NpCvvO0IXDchBT1jeoSMwcMzlZSfQeIKtkp1P0z4lyPiMcmekdgUQ7sb84FhwSa+
 SYRpiZ/Xv6xa5Djhonci
 =toQY
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2016-10-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just two fixes:
 * a fix to process all events while suspending, so any
   potential calls into the driver are done before it is
   suspended
 * small markup fixes for the sphinx documentation conversion
   that's coming into the tree via the doc tree
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 15:54:16 -04:00
Florian Westphal b917783c7b flow_dissector: __skb_get_hash_symmetric arg can be const
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 15:10:21 -04:00
Eric Dumazet 5ea8ea2cb7 tcp/dccp: drop SYN packets if accept queue is full
Per listen(fd, backlog) rules, there is really no point accepting a SYN,
sending a SYNACK, and dropping the following ACK packet if accept queue
is full, because application is not draining accept queue fast enough.

This behavior is fooling TCP clients that believe they established a
flow, while there is nothing at server side. They might then send about
10 MSS (if using IW10) that will be dropped anyway while server is under
stress.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 15:09:21 -04:00
Stephen Hemminger e934f68485 Revert "hv_netvsc: report vmbus name in ethtool"
This reverts commit e3f74b841d
("hv_netvsc: report vmbus name in ethtool")'
because of problem introduced by commit f9a56e5d6a0ba
("Drivers: hv: make VMBus bus ids persistent").
This changed the format of the vmbus name and this new format is too
long to fit in the bus_info field of ethtool.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 15:03:14 -04:00
Thomas Graf b15ca182ed netlink: Add nla_memdup() to wrap kmemdup() use on nlattr
Wrap several common instances of:
	kmemdup(nla_data(attr), nla_len(attr), GFP_KERNEL);

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 14:57:42 -04:00
Mohamad Haj Yahia 04c0c1ab38 net/mlx5: PCI error recovery health care simulation
In case that the kernel PCI error handlers are not called, we will
trigger our own recovery flow.

The health work will give priority to the kernel pci error handlers to
recover the PCI by waiting for a small period, if the pci error handlers
are not triggered the manual recovery flow will be executed.

We don't save pci state in case of manual recovery because it will ruin the
pci configuration space and we will lose dma sync.

Fixes: 89d44f0a6c ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 12:00:39 -04:00
Mohamad Haj Yahia 05ac2c0b74 net/mlx5: Fix race between PCI error handlers and health work
Currently there is a race between the health care work and the kernel
pci error handlers because both of them detect the error, the first one
to be called will do the error handling.
There is a chance that health care will disable the pci after resuming
pci slot.
Also create a separate WQ because now we will have two types of health
works, one for the error detection and one for the recovery.

Fixes: 89d44f0a6c ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 12:00:39 -04:00
Daniel Jurgens b47bd6ea40 {net, ib}/mlx5: Make cache line size determination at runtime.
ARM 64B cache line systems have L1_CACHE_BYTES set to 128.
cache_line_size() will return the correct size.

Fixes: cf50b5efa2fe('net/mlx5_core/ib: New device capabilities
handling.')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 12:00:39 -04:00
Linus Torvalds c067affcd3 ACPI fixes for v4.9-rc3
Specifics:
 
  - Fix three ACPICA issues related to the interpreter locking and
    introduced by recent changes in that area (Lv Zheng).
 
  - Fix a PCI IRQ management regression introduced during the 4.7
    cycle and related to the configuration of shared IRQs on systems
    with an ISA bus (Sinan Kaya).
 
  - Fix up a return value of one function in the APEI code (Punit
    Agrawal).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJYE+wwAAoJEILEb/54YlRxe0wQAKIyO3ktsxxbz2iACxFPZGmn
 ML1+OTBIGQKYDSGCINhMV5PGd98IMBaVCB9RllG/B9iALb8VCGiJ6AuJKoR7q2pZ
 6mr7ioXNfTNlLISykt63cD/Lp/YobZMG6WhoNWzoKslVUQrSWAISV+wGpBxoj08i
 8X7t/QtvRIVWfy4H4reDgpQMIKUeDhk6REeb8FESiXYboOvNhbXZpPS+bv8XXEfD
 bu/ASQIZs3Z9YB2uTij16Tx95eJETHhr9zYIxbi848YDxjelpZNs1QuVoYxq3GO2
 S7vbGHZITMLSEz4jD0w98YvDcb0jywfSXX53NBMaSJqOAleVKNH1rE8KvywG6H0s
 2298yBc0o0ldBb+4nszoL7NyGQKDrmxMEGBRFlk67gZ1LA4cpk+Usv9Q0GmNx38H
 KQfuA144n9ICaM9Kw9CKD8xrQ+PtpoTIBXzKGsdqIwHsS+2XSwzgp4IWEuMRfZNu
 5ermtO2tRz47UDnQ5UxdweB0n2pEMrZXDDzBONdqp3ds4aOvOvVqQP3OB+iMaMrT
 rPvVlYLr2Q+ekIzHPCpB7uBwI//bJ3L2cLqHxp9hlbNeFQWdv/NlMlmbWgYVVpn3
 tCvqqpH+jtof8lnmWZtBdc4M0GhwM+CP6TYd65n2yDqVCfPraXokE0UdNBW+je4Z
 BRuhjVEi0vBsrc3M4IC9
 =wjnL
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix recent ACPICA regressions, an older PCI IRQ management
  regression, and an incorrect return value of a function in the APEI
  code.

  Specifics:

   - Fix three ACPICA issues related to the interpreter locking and
     introduced by recent changes in that area (Lv Zheng).

   - Fix a PCI IRQ management regression introduced during the 4.7 cycle
     and related to the configuration of shared IRQs on systems with an
     ISA bus (Sinan Kaya).

   - Fix up a return value of one function in the APEI code (Punit
     Agrawal)"

* tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region()
  ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method()
  ACPICA: Dispatcher: Fix order issue of method termination
  ACPI / APEI: Fix incorrect return value of ghes_proc()
  ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs
  ACPI/PCI: pci_link: penalize SCI correctly
  ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages
2016-10-28 18:34:19 -07:00
Linus Torvalds b49c3170bf Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc kernel fixes: a virtualization environment related fix, an uncore
  PMU driver removal handling fix, a PowerPC fix and new events for
  Knights Landing"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors
  perf/powerpc: Don't call perf_event_disable() from atomic context
  perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
  perf/x86/intel/cstate: Add C-state residency events for Knights Landing
2016-10-28 16:27:16 -07:00
Linus Torvalds bdb520845b patches to fix a regression in 4.9-rc1 on x86 PAT
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYEFHpAAoJEAx081l5xIa+tyYP/0xq+ZqHwS90k1mge/2uWYB3
 sVQvFFIV55r6siOjIdDek+dsHq7IGFOChbsxegGyGvfwYVjzSmdoBwO1aMTV+Ii9
 OoqLS/53kts9jHOVm1UNsbxW1lzJVWoFWpMY57KDodWsWxVbd0NuP9mfTRIH2Sfj
 MmymKigXgwHSndn07+2xp9jI9Y5krtOLl+4YDsly7JF2IR7UBRRoW8n/WHR75lny
 MNn2Vtn9NBwxDieFQc/KQGUQ1nC8wB0c3wtGDDQIux0gp6IVW+pQoCLo6CMtgHXB
 IXGDojVA9KpcyEUz5RkBsVHYmvZR1PoS+nrnEE6b/C8p7UDuyCrk1Zfy0ZTGV/hq
 LKmfRKB3NWbgKnBbqOdFYhsh/iyVjqoNdGYqfR4qJx5JGIltVWbjYwlwUpImlrIY
 gKqtAdVFaFuoJs8MpFharxFlBf/o9DPDTPTWPQxGI16y7poH+86v7QmAJT9dJHRE
 pf3oyYI3eHTeIQb42f7PHSp4hsVJMX5Awkm9a8b9PhNlu/3cHUOYkCT060ripMBc
 ZksIUqKFzuk+TDRTnQrCQjaC4vJ6s8XUwntFhfHCZUmnnH8YhYpryDwdyzavcUvX
 or8rkKsO/+Jxa1kRr8d2c1JYi2FIMHBP0srAu43WeYyAsSPFIL/9l5flIeHi2Ow3
 tSHbCo4W5YRbQaVcBzxG
 =prah
 -----END PGP SIGNATURE-----

Merge tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux

Pull drm x86/pat regression fixes from Dave Airlie:
 "This is a standalone pull request for the fix for a regression
  introduced in -rc1 by a change to vm_insert_mixed to start using the
  PAT range tracking to validate page protections. With this fix in
  place, all the VRAM mappings for GPU drivers ended up at UC instead of
  WC.

  There are probably better ways to fix this long term, but nothing I'd
  considered for -fixes that wouldn't need more settling in time. So
  I've just created a new arch API that the drivers can reserve all
  their VRAM aperture ranges as WC"

* tag 'drm-x86-pat-regression-fix' of git://people.freedesktop.org/~airlied/linux:
  drm/drivers: add support for using the arch wc mapping API.
  x86/io: add interface to reserve io memtype for a resource range. (v1.1)
2016-10-28 09:36:07 -07:00
Jiri Olsa 5aab90ce1e perf/powerpc: Don't call perf_event_disable() from atomic context
The trinity syscall fuzzer triggered following WARN() on powerpc:

  WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
  ...
  NIP [c00000000093aedc] .hw_breakpoint_handler+0x28c/0x2b0
  LR [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0
  Call Trace:
  [c0000002f7933580] [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0 (unreliable)
  [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
  [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
  [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
  [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
  [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48

Followed by a lockdep warning:

  ===============================
  [ INFO: suspicious RCU usage. ]
  4.8.0-rc5+ #7 Tainted: G        W
  -------------------------------
  ./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side critical section!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 0
  2 locks held by ls/2998:
   #0:  (rcu_read_lock){......}, at: [<c0000000000f6a00>] .__atomic_notifier_call_chain+0x0/0x1c0
   #1:  (rcu_read_lock){......}, at: [<c00000000093ac50>] .hw_breakpoint_handler+0x0/0x2b0

  stack backtrace:
  CPU: 9 PID: 2998 Comm: ls Tainted: G        W       4.8.0-rc5+ #7
  Call Trace:
  [c0000002f7933150] [c00000000094b1f8] .dump_stack+0xe0/0x14c (unreliable)
  [c0000002f79331e0] [c00000000013c468] .lockdep_rcu_suspicious+0x138/0x180
  [c0000002f7933270] [c0000000001005d8] .___might_sleep+0x278/0x2e0
  [c0000002f7933300] [c000000000935584] .mutex_lock_nested+0x64/0x5a0
  [c0000002f7933410] [c00000000023084c] .perf_event_ctx_lock_nested+0x16c/0x380
  [c0000002f7933500] [c000000000230a80] .perf_event_disable+0x20/0x60
  [c0000002f7933580] [c00000000093aeec] .hw_breakpoint_handler+0x29c/0x2b0
  [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
  [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
  [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
  [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
  [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48

While it looks like the first WARN() is probably valid, the other one is
triggered by disabling event via perf_event_disable() from atomic context.

The event is disabled here in case we were not able to emulate
the instruction that hit the breakpoint. By disabling the event
we unschedule the event and make sure it's not scheduled back.

But we can't call perf_event_disable() from atomic context, instead
we need to use the event's pending_disable irq_work method to disable it.

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161026094824.GA21397@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-28 11:06:25 +02:00
Linus Torvalds 14970f204b Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "20 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk
  cris/arch-v32: cryptocop: print a hex number after a 0x prefix
  ipack: print a hex number after a 0x prefix
  block: DAC960: print a hex number after a 0x prefix
  fs: exofs: print a hex number after a 0x prefix
  lib/genalloc.c: start search from start of chunk
  mm: memcontrol: do not recurse in direct reclaim
  CREDITS: update credit information for Martin Kepplinger
  proc: fix NULL dereference when reading /proc/<pid>/auxv
  mm: kmemleak: ensure that the task stack is not freed during scanning
  lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
  latent_entropy: raise CONFIG_FRAME_WARN by default
  kconfig.h: remove config_enabled() macro
  ipc: account for kmem usage on mqueue and msg
  mm/slab: improve performance of gathering slabinfo stats
  mm: page_alloc: use KERN_CONT where appropriate
  mm/list_lru.c: avoid error-path NULL pointer deref
  h8300: fix syscall restarting
  kcov: properly check if we are in an interrupt
  mm/slab: fix kmemcg cache creation delayed issue
2016-10-27 19:58:39 -07:00
Masahiro Yamada c0a0aba8e4 kconfig.h: remove config_enabled() macro
The use of config_enabled() is ambiguous.  For config options,
IS_ENABLED(), IS_REACHABLE(), etc.  will make intention clearer.
Sometimes config_enabled() has been used for non-config options because
it is useful to check whether the given symbol is defined or not.

I have been tackling on deprecating config_enabled(), and now is the
time to finish this work.

Some new users have appeared for v4.9-rc1, but it is trivial to replace
them:

 - arch/x86/mm/kaslr.c
  replace config_enabled() with IS_ENABLED() because
  CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean.

 - include/asm-generic/export.h
  replace config_enabled() with __is_defined().

Then, config_enabled() can be removed now.

Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config
options, and __is_defined() for non-config symbols.

Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
David Ahern d5d32e4b76 net: ipv6: Do not consider link state for nexthop validation
Similar to IPv4, do not consider link state when validating next hops.

Currently, if the link is down default routes can fail to insert:
 $ ip -6 ro add vrf blue default via 2100:2::64 dev eth2
 RTNETLINK answers: No route to host

With this patch the command succeeds.

Fixes: 8c14586fc3 ("net: ipv6: Use passed in table for nexthop lookups")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:33:12 -04:00
David Ahern 830218c1ad net: ipv6: Fix processing of RAs in presence of VRF
rt6_add_route_info and rt6_add_dflt_router were updated to pull the FIB
table from the device index, but the corresponding rt6_get_route_info
and rt6_get_dflt_router functions were not leading to the failure to
process RA's:

    ICMPv6: RA: ndisc_router_discovery failed to add default route

Fix the 'get' functions by using the table id associated with the
device when applicable.

Also, now that default routes can be added to tables other than the
default table, rt6_purge_dflt_routers needs to be updated as well to
look at all tables. To handle that efficiently, add a flag to the table
denoting if it is has a default route via RA.

Fixes: ca254490c8 ("net: Add VRF support to IPv6 stack")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:30:52 -04:00
Johannes Berg 56989f6d85 genetlink: mark families as __ro_after_init
Now genl_register_family() is the only thing (other than the
users themselves, perhaps, but I didn't find any doing that)
writing to the family struct.

In all families that I found, genl_register_family() is only
called from __init functions (some indirectly, in which case
I've add __init annotations to clarifly things), so all can
actually be marked __ro_after_init.

This protects the data structure from accidental corruption.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:16:09 -04:00
Johannes Berg 2ae0f17df1 genetlink: use idr to track families
Since generic netlink family IDs are small integers, allocated
densely, IDR is an ideal match for lookups. Replace the existing
hand-written hash-table with IDR for allocation and lookup.

This lets the families only be written to once, during register,
since the list_head can be removed and removal of a family won't
cause any writes.

It also slightly reduces the code size (by about 1.3k on x86-64).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:16:09 -04:00
Johannes Berg 489111e5c2 genetlink: statically initialize families
Instead of providing macros/inline functions to initialize
the families, make all users initialize them statically and
get rid of the macros.

This reduces the kernel code size by about 1.6k on x86-64
(with allyesconfig).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:16:09 -04:00
Johannes Berg a07ea4d994 genetlink: no longer support using static family IDs
Static family IDs have never really been used, the only
use case was the workaround I introduced for those users
that assumed their family ID was also their multicast
group ID.

Additionally, because static family IDs would never be
reserved by the generic netlink code, using a relatively
low ID would only work for built-in families that can be
registered immediately after generic netlink is started,
which is basically only the control family (apart from
the workaround code, which I also had to add code for so
it would reserve those IDs)

Thus, anything other than GENL_ID_GENERATE is flawed and
luckily not used except in the cases I mentioned. Move
those workarounds into a few lines of code, and then get
rid of GENL_ID_GENERATE entirely, making it more robust.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:16:09 -04:00
Johannes Berg c90c39dab3 genetlink: introduce and use genl_family_attrbuf()
This helper function allows family implementations to access
their family's attrbuf. This gets rid of the attrbuf usage
in families, and also adds locking validation, since it's not
valid to use the attrbuf with parallel_ops or outside of the
dumpit callback.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:16:08 -04:00
Antonio Quartulli 4fe77d82ef skbedit: allow the user to specify bitmask for mark
The user may want to use only some bits of the skb mark in
his skbedit rules because the remaining part might be used by
something else.

Introduce the "mask" parameter to the skbedit actor in order
to implement such functionality.

When the mask is specified, only those bits selected by the
latter are altered really changed by the actor, while the
rest is left untouched.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-27 16:07:25 -04:00
Linus Torvalds e890038e6a xfs: updates for 4.9-rc3
Changes in this update:
 o iomap page offset masking fix for page faults
 o add IOMAP_REPORT to distinguish between read and fiemap map requests
 o cleanups to new shared data extent code
 o fix mount active status on failed log recovery
 o fix broken dquots in a buffer calculation
 o fix locking order issues and merge xfs_reflink_remap_range and
   xfs_file_share_range
 o rework unmapping of CoW extents and remove now unused functions
 o clean state when CoW is done.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYEfdWAAoJEK3oKUf0dfoddg4P/0Tl/i58sBL/Um90kSGOjxjI
 yaOKuFImS3MFSYDwiYADnXdhq6BgVLUJWS07t9/P6Nn3OZr1wBCZDZdyRS1+JwAA
 qOui4sp/v21HprydscN+BAdxyYmuo4yFgu9lkFFSM55yiaAb5C8hsYKF42Gja1+m
 gS40/Lsa5nauSz58UOZ5oEljAvBldAdyMlk8rVSGXVm7+pqs7Lxmhjif/Y8y/Y+i
 097auIrGk+oRDukXqhtZyCQ7VP99WzM+ksajtrNwVOOzSMhrcDCHKuLe0i4LsyjN
 UTx1ioY/AD8PUYhSmLqALD9vtFHnJbx50/MQFHNLc+hDQb2jb/jQmqx9LyEYDt38
 sw/Wy55hh9PylILdE//bWH0vSgqmnNCWviBUzjDtAJ9FKfv19slFlwtu2K4lOHoq
 C6Q2uh2mB7BC6efksk9DeA6/N9tFQuiXa48sN5+D2zMfZAmdkgzDCKfGrpRnS1Yl
 4h+sfiK/DTf11Q2nTaPAHylt02SmHsikQWvb5Fxu76UI8k4RsjCZc3ep/NUNJBlU
 E8f+cdNlAF5k/AWBY7107N1iUqL/vS2wXLdburJkckmQqRcI5WuRaLhi9g4tFjFI
 o+m9EM1WuOP6jeOuVImwgCRJoLVnTVKwee/d4J8y9Ad//Rs6B9pB0SIDfxJa9LY6
 B1XjT8z/NVyK6GsfP1Qs
 =LDDu
 -----END PGP SIGNATURE-----

Merge tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs fixes from Dave Chinner:
 "This update contains fixes for most of the outstanding regressions
  introduced with the 4.9-rc1 XFS merge. There is also a fix for an
  iomap bug, too.

  This is a quite a bit larger than I'd prefer for a -rc3, but most of
  the change comes from cleaning up the new reflink copy on write code;
  it's much simpler and easier to understand now. These changes fixed
  several bugs in the new code, and it wasn't clear that there was an
  easier/simpler way to fix them. The rest of the fixes are the usual
  size you'd expect at this stage.

  I've left the commits to soak in linux-next for a some extra time
  because of the size before asking you to pull, no new problems with
  them have been reported so I think it's all OK.

  Summary:
   - iomap page offset masking fix for page faults
   - add IOMAP_REPORT to distinguish between read and fiemap map
     requests
   - cleanups to new shared data extent code
   - fix mount active status on failed log recovery
   - fix broken dquots in a buffer calculation
   - fix locking order issues and merge xfs_reflink_remap_range and
     xfs_file_share_range
   - rework unmapping of CoW extents and remove now unused functions
   - clean state when CoW is done"

* tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (25 commits)
  xfs: clear cowblocks tag when cow fork is emptied
  xfs: fix up inode cowblocks tracking tracepoints
  fs: Do to trim high file position bits in iomap_page_mkwrite_actor
  xfs: remove xfs_bunmapi_cow
  xfs: optimize xfs_reflink_end_cow
  xfs: optimize xfs_reflink_cancel_cow_blocks
  xfs: refactor xfs_bunmapi_cow
  xfs: optimize writes to reflink files
  xfs: don't bother looking at the refcount tree for reads
  xfs: handle "raw" delayed extents xfs_reflink_trim_around_shared
  xfs: add xfs_trim_extent
  iomap: add IOMAP_REPORT
  xfs: merge xfs_reflink_remap_range and xfs_file_share_range
  xfs: remove xfs_file_wait_for_io
  xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range
  xfs: fix the same_inode check in xfs_file_share_range
  xfs: remove the same fs check from xfs_file_share_range
  libxfs: v3 inodes are only valid on crc-enabled filesystems
  libxfs: clean up _calc_dquots_per_chunk
  xfs: unset MS_ACTIVE if mount fails
  ...
2016-10-27 12:34:50 -07:00
Linus Torvalds 9dcb8b685f mm: remove per-zone hashtable of bitlock waitqueues
The per-zone waitqueues exist because of a scalability issue with the
page waitqueues on some NUMA machines, but it turns out that they hurt
normal loads, and now with the vmalloced stacks they also end up
breaking gfs2 that uses a bit_wait on a stack object:

     wait_on_bit(&gh->gh_iflags, HIF_WAIT, TASK_UNINTERRUPTIBLE)

where 'gh' can be a reference to the local variable 'mount_gh' on the
stack of fill_super().

The reason the per-zone hash table breaks for this case is that there is
no "zone" for virtual allocations, and trying to look up the physical
page to get at it will fail (with a BUG_ON()).

It turns out that I actually complained to the mm people about the
per-zone hash table for another reason just a month ago: the zone lookup
also hurts the regular use of "unlock_page()" a lot, because the zone
lookup ends up forcing several unnecessary cache misses and generates
horrible code.

As part of that earlier discussion, we had a much better solution for
the NUMA scalability issue - by just making the page lock have a
separate contention bit, the waitqueue doesn't even have to be looked at
for the normal case.

Peter Zijlstra already has a patch for that, but let's see if anybody
even notices.  In the meantime, let's fix the actual gfs2 breakage by
simplifying the bitlock waitqueues and removing the per-zone issue.

Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 09:27:57 -07:00
vamsi krishna 088e8df82f cfg80211: Add support to update connection parameters
Add functionality to update the connection parameters when in connected
state, so that driver/firmware uses the updated parameters for
subsequent roaming. This is for drivers that support internal BSS
selection and roaming. The new command does not change the current
association state, i.e., it can be used to update IE contents for future
(re)associations without causing an immediate disassociation or
reassociation with the current BSS.

This commit implements the required functionality for updating IEs for
(Re)Association Request frame only. Other parameters can be added in
future when required.

Signed-off-by: vamsi krishna <vamsin@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 16:03:28 +02:00
Michael Braun ce0ce13a1c cfg80211: configure multicast to unicast for AP interfaces
Add the ability to configure if an AP (and associated VLANs) will
do multicast-to-unicast conversion for ARP, IPv4 and IPv6 frames
(possibly within 802.1Q). If enabled, such frames are to be sent
to each station separately, with the DA replaced by their own MAC
address rather than the group address.

Note that this may break certain expectations of the receiver,
such as the ability to drop unicast IP packets received within
multicast L2 frames, or the ability to not send ICMP destination
unreachable messages for packets received in L2 multicast (which
is required, but the receiver can't tell the difference if this
new option is enabled.)

This also doesn't implement the 802.11 DMS (directed multicast
service).

Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
[fix disabling, add better documentation & commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 16:03:27 +02:00
Jouni Malinen 348bd45669 cfg80211: Add KEK/nonces for FILS association frames
The new nl80211 attributes can be used to provide KEK and nonces to
allow the driver to encrypt and decrypt FILS (Re)Association
Request/Response frames in station mode.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 16:03:24 +02:00
Jouni Malinen 631810603a cfg80211: Add Fast Initial Link Setup (FILS) auth algs
This defines authentication algorithms for FILS (IEEE 802.11ai).

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 16:03:23 +02:00
Jouni Malinen 3f817fe718 cfg80211: Define IEEE P802.11ai (FILS) information elements
Define the Element IDs and Element ID Extensions from IEEE
P802.11ai/D11.0. In addition, add a new cfg80211_find_ext_ie() wrapper
to make it easier to find information elements that used the Element ID
Extension field.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 16:03:23 +02:00
Jouni Malinen 60b8084e84 cfg80211: Add feature flag for Fast Initial Link Setup (FILS) as STA
This defines a feature flag that drivers can use to indicate that they
support FILS authentication/association (IEEE 802.11ai) when using user
space SME (NL80211_CMD_AUTHENTICATE) in station mode.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 16:03:22 +02:00
Jouni Malinen 11b6b5a4ce cfg80211: Rename SAE_DATA to more generic AUTH_DATA
This adds defines and nl80211 extensions to allow FILS Authentication to
be implemented similarly to SAE. FILS does not need the special rules
for the Authentication transaction number and Status code fields, but it
does need to add non-IE fields. The previously used
NL80211_ATTR_SAE_DATA can be reused for this to avoid having to
duplicate that implementation. Rename that attribute to more generic
NL80211_ATTR_AUTH_DATA (with backwards compatibility define for
NL80211_SAE_DATA).

Also document the special rules related to the Authentication
transaction number and Status code fiels.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 16:03:20 +02:00
Johannes Berg 4c8dea638c cfg80211: validate beacon int as part of iface combinations
Remove the pointless checking against interface combinations in
the initial basic beacon interval validation, that currently isn't
taking into account radar detection or channels properly. Instead,
just validate the basic range there, and then delay real checking
to the interface combination validation that drivers must do.

This means that drivers wanting to use the beacon_int_min_gcd will
now have to pass the new_beacon_int when validating the AP/mesh
start.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 09:18:07 +02:00
Arend Van Spriel 73c7da3dae cfg80211: add generic helper to check interface is running
Add a helper using wdev to check if interface is running. This
deals with both non-netdev and netdev interfaces. In struct
wireless_dev replace 'p2p_started' and 'nan_started' by
'is_running' as those are mutually exclusive anyway, and unify
all the code to use wdev_running().

Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-27 09:08:44 +02:00
Eric Dumazet 10df8e6152 udp: fix IP_CHECKSUM handling
First bug was added in commit ad6f939ab1 ("ip: Add offset parameter to
ip_cmsg_recv") : Tom missed that ipv4 udp messages could be received on
AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by
ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr));

Then commit e6afc8ace6 ("udp: remove headers from UDP packets before
queueing") forgot to adjust the offsets now UDP headers are pulled
before skb are put in receive queue.

Fixes: ad6f939ab1 ("ip: Add offset parameter to ip_cmsg_recv")
Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sam Kumar <samanthakumar@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Tested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-26 17:33:22 -04:00
Stephen Hemminger 293de7dee4 doc: update docbook annotations for socket and skb
The skbuff and sock structure both had missing parameter annotation
values.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-26 17:31:23 -04:00
Xo Wang d92ead16be net: phy: broadcom: Add support for BCM54612E
This PHY has internal delays enabled after reset. This clears the
internal delay enables unless the interface specifically requests them.

Signed-off-by: Xo Wang <xow@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-26 17:15:26 -04:00
Xo Wang 3cf25904fe net: phy: broadcom: Update Auxiliary Control Register macros
Add the RXD-to-RXC skew (delay) time bit in the Miscellaneous Control
shadow register and a mask for the shadow selector field.

Remove a re-definition of MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL.

Signed-off-by: Xo Wang <xow@google.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-26 17:15:26 -04:00
Jani Nikula b4f7f4ad42 mac80211: fix some sphinx warnings
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-26 08:01:07 +02:00
Dave Airlie 8ef4227615 x86/io: add interface to reserve io memtype for a resource range. (v1.1)
A recent change to the mm code in:
87744ab383 mm: fix cache mode tracking in vm_insert_mixed()

started enforcing checking the memory type against the registered list for
amixed pfn insertion mappings. It happens that the drm drivers for a number
of gpus relied on this being broken. Currently the driver only inserted
VRAM mappings into the tracking table when they came from the kernel,
and userspace mappings never landed in the table. This led to a regression
where all the mapping end up as UC instead of WC now.

I've considered a number of solutions but since this needs to be fixed
in fixes and not next, and some of the solutions were going to introduce
overhead that hadn't been there before I didn't consider them viable at
this stage. These mainly concerned hooking into the TTM io reserve APIs,
but these API have a bunch of fast paths I didn't want to unwind to add
this to.

The solution I've decided on is to add a new API like the arch_phys_wc
APIs (these would have worked but wc_del didn't take a range), and
use them from the drivers to add a WC compatible mapping to the table
for all VRAM on those GPUs. This means we can then create userspace
mapping that won't get degraded to UC.

v1.1: use CONFIG_X86_PAT + add some comments in io.h

Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: x86@kernel.org
Cc: mcgrof@suse.com
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-26 15:45:38 +10:00
Linus Torvalds b5cd891716 This is the first batch of clk driver fixes for this release. We have a handful
of fixes for the uniphier clk driver that was introduced recently, as well as
 Kconfig option hiding, module autoloading markings, and a few fixes for clk_hw
 based registration patches that went in this merge window.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJYDnAWAAoJEK0CiJfG5JUl0swQAIsxEXNsm/IBV5HP6NfFEYdK
 zPQbHCiHKdvPXJyyTXKnGYq4k/6xCpXgRWX7nxgEqSGZIteBbyQ7vd1++yBUwWTq
 hJaitk4euoR4XGia1kKsxs4pgzAOAE6V9eMV9Is2P+Mm0c2EDH6yXYlsqCk26aYM
 M/AG0jMhbvWIL7t34MN8L4cd7o6S5PdFpFGOV+b1aPoTiDijejIj8ew9C2hsotd4
 sh6DH9BDVqbJGdLE7gas/7rpO1lNZ5PgVBVOd/RP9cbqgP5BPUmr6SEy0AhC0SHu
 T+b72eg459gWg8OEvVmLm2aUmyBznQvfYV14Zdkqx3ncI68F+v4TPu2u8m5QIwqF
 khKqyfdvUQv5orufzHibJLwkd8elEPkxvg0xZPV1EKPVPSQ16Eu/3aWFlhHjMYFR
 v9zxYGZ5/tmImjaSaEGU2B5Cd6MTmJew+aP6hg9dI4GkR6qtccKfp7T+SMXK7Mja
 GA6y0GNBq0V+Fpn7LsoJwZoo/r6RbIyzJW00YR1Yr5cJEXffKVP25UTHXr8M/4Uj
 FqpNoje8vepmO4b+0pEYM5x46U7rXnyEyfznVSK+SRhTujCVJ+WOUM5QxM+4wze9
 247JzNEdx0ED5ahYmpWTb+JApQ6rkmnBmhsFjP50aae1zWGqNaE9UbzOmEZmpdjN
 b2uXD2blbZ93Oqn6vrHh
 =mWXv
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "This is the first batch of clk driver fixes for this release.

  We have a handful of fixes for the uniphier clk driver that was
  introduced recently, as well as Kconfig option hiding, module
  autoloading markings, and a few fixes for clk_hw based registration
  patches that went in this merge window"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: at91: Fix a return value in case of error
  clk: uniphier: rename MIO clock to SD clock for Pro5, PXs2, LD20 SoCs
  clk: uniphier: fix memory overrun bug
  clk: hi6220: use CLK_OF_DECLARE_DRIVER for sysctrl and mediactrl clock init
  clk: mvebu: armada-37xx-periph: Fix the clock gate flag
  clk: bcm2835: Clamp the PLL's requested rate to the hardware limits.
  clk: max77686: fix number of clocks setup for clk_hw based registration
  clk: mvebu: armada-37xx-periph: Fix the clock provider registration
  clk: core: add __init decoration for CLK_OF_DECLARE_DRIVER function
  clk: mediatek: Add hardware dependency
  clk: samsung: clk-exynos-audss: Fix module autoload
  clk: uniphier: fix type of variable passed to regmap_read()
  clk: uniphier: add system clock support for sLD3 SoC
2016-10-24 21:30:19 -07:00
Lorenzo Stoakes 0d73175982 mm: unexport __get_user_pages()
This patch unexports the low-level __get_user_pages() function.

Recent refactoring of the get_user_pages* functions allow flags to be
passed through get_user_pages() which eliminates the need for access to
this function from its one user, kvm.

We can see that the two calls to get_user_pages() which replace
__get_user_pages() in kvm_main.c are equivalent by examining their call
stacks:

  get_user_page_nowait():
    get_user_pages(start, 1, flags, page, NULL)
    __get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL,
			    false, flags | FOLL_TOUCH)
    __get_user_pages(current, current->mm, start, 1,
		     flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL)

  check_user_page_hwpoison():
    get_user_pages(addr, 1, flags, NULL, NULL)
    __get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL,
			    false, flags | FOLL_TOUCH)
    __get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL,
		     NULL, NULL)

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-24 19:13:20 -07:00
Sinan Kaya f1caa61df2 ACPI/PCI: pci_link: penalize SCI correctly
Ondrej reported that IRQs stopped working in v4.7 on several
platforms.  A typical scenario, from Ondrej's VT82C694X/694X, is:

ACPI: Using PIC for interrupt routing
ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 7 10 *11 12 14 15)
ACPI: No IRQ available for PCI Interrupt Link [LNKA]
8139too 0000:00:0f.0: PCI INT A: no GSI

We're using PIC routing, so acpi_irq_balance == 0, and LNKA is already
active at IRQ 11. In that case, acpi_pci_link_allocate() only tries
to use the active IRQ (IRQ 11) which also happens to be the SCI.

We should penalize the SCI by PIRQ_PENALTY_PCI_USING, but
irq_get_trigger_type(11) returns something other than
IRQ_TYPE_LEVEL_LOW, so we penalize it by PIRQ_PENALTY_ISA_ALWAYS
instead, which makes acpi_pci_link_allocate() assume the IRQ isn't
available and give up.

Add acpi_penalize_sci_irq() so platforms can tell us the SCI IRQ,
trigger, and polarity directly and we don't have to depend on
irq_get_trigger_type().

Fixes: 103544d869 (ACPI,PCI,IRQ: reduce resource requirements)
Link: http://lkml.kernel.org/r/201609251512.05657.linux@rainbow-software.org
Reported-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Tested-by: Jonathan Liu <net147@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-24 14:18:14 +02:00
Linus Torvalds a55da8a0dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Here are the outstanding target-pending fixes for v4.9-rc2.

  This includes:

   - Fix v4.1.y+ reference leak regression with concurrent TMR
     ABORT_TASK + session shutdown. (Vaibhav Tandon)

   - Enable tcm_fc w/ SCF_USE_CPUID to avoid host exchange timeouts
     (Hannes)

   - target/user error sense handling fixes. (Andy + MNC + HCH)

   - Fix iscsi-target NOP_OUT error path iscsi_cmd descriptor leak
     (Varun)

   - Two EXTENDED_COPY SCSI status fixes for ESX VAAI (Dinesh Israni +
     Nixon Vincent)

   - Revert a v4.8 residual overflow change, that breaks sg_inq with
     small allocation lengths.

  There are a number of folks stress testing the v4.1.y regression fix
  in their environments, and more folks doing iser-target I/O stress
  testing atop recent v4.x.y code.

  There is also one v4.2.y+ RCU conversion regression related to
  explicit NodeACL configfs changes, that is still being tracked down"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target/tcm_fc: use CPU affinity for responses
  target/tcm_fc: Update debugging statements to match libfc usage
  target/tcm_fc: return detailed error in ft_sess_create()
  target/tcm_fc: print command pointer in debug message
  target: fix potential race window in target_sess_cmd_list_waiting()
  Revert "target: Fix residual overflow handling in target_complete_cmd_with_length"
  target: Don't override EXTENDED_COPY xcopy_pt_cmd SCSI status code
  target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE
  target: Re-add missing SCF_ACK_KREF assignment in v4.1.y
  iscsi-target: fix iscsi cmd leak
  iscsi-target: fix spelling mistake "Unsolicitied" -> "Unsolicited"
  target/user: Fix comments to not refer to data ring
  target/user: Return an error if cmd data size is too large
  target/user: Use sense_reason_t in tcmu_queue_cmd_ring
2016-10-23 16:37:58 -07:00
Cyrill Gorcunov 432490f9d4 net: ip, diag -- Add diag interface for raw sockets
In criu we are actively using diag interface to collect sockets
present in the system when dumping applications. And while for
unix, tcp, udp[lite], packet, netlink it works as expected,
the raw sockets do not have. Thus add it.

v2:
 - add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
 - implement @destroy for diag requests (by dsa@)

v3:
 - add export of raw_abort for IPv6 (by dsa@)
 - pass net-admin flag into inet_sk_diag_fill due to
   changes in net-next branch (by dsa@)

v4:
 - use @pad in struct inet_diag_req_v2 for raw socket
   protocol specification: raw module carries sockets
   which may have custom protocol passed from socket()
   syscall and sole @sdiag_protocol is not enough to
   match underlied ones
 - start reporting protocol specifed in socket() call
   when sockets are raw ones for the same reason: user
   space tools like ss may parse this attribute and use
   it for socket matching

v5 (by eric.dumazet@):
 - use sock_hold in raw_sock_get instead of atomic_inc,
   we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock
   when looking up so counter won't be zero here.

v6:
 - use sdiag_raw_protocol() helper which will access @pad
   structure used for raw sockets protocol specification:
   we can't simply rename this member without breaking uapi

v7:
 - sine sdiag_raw_protocol() helper is not suitable for
   uapi lets rather make an alias structure with proper
   names. __check_inet_diag_req_raw helper will catch
   if any of structure unintentionally changed.

CC: David S. Miller <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Ahern <dsa@cumulusnetworks.com>
CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
CC: James Morris <jmorris@namei.org>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Andrey Vagin <avagin@openvz.org>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23 19:35:24 -04:00
Linus Torvalds 5766e9d25f A small bug fix and a new driver for acting as an IPMI device.
I was on vacation during the merge window (a long vacation)
 but this is a bug fix that should go in and a new driver that shouldn't
 hurt anything.
 
 This has been in linux-next for a month or so.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iEYEABECAAYFAlgKVpoACgkQIXnXXONXERdGBACeONBS0wOf4Rv+bxSOdeJcTwLJ
 rmoAoJ2R0BpWE1imvcC+AqXOoqg2c48k
 =P4Dz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.9-2' of git://git.code.sf.net/p/openipmi/linux-ipmi

Pull IPMI updates from Corey Minyard:
 "A small bug fix and a new driver for acting as an IPMI device.

  I was on vacation during the merge window (a long vacation) but this
  is a bug fix that should go in and a new driver that shouldn't hurt
  anything.

  This has been in linux-next for a month or so"

* tag 'for-linus-4.9-2' of git://git.code.sf.net/p/openipmi/linux-ipmi:
  ipmi: fix crash on reading version from proc after unregisted bmc
  ipmi/bt-bmc: remove redundant return value check of platform_get_resource()
  ipmi/bt-bmc: add a dependency on ARCH_ASPEED
  ipmi: Fix ioremap error handling in bt-bmc
  ipmi: add an Aspeed BT IPMI BMC driver
2016-10-23 15:56:23 -07:00
Thomas Graf f76a9db351 lwt: Remove unused len field
The field is initialized by ILA and MPLS but never used. Remove it.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-23 17:45:01 -04:00
Sudarsana Reddy Kalluru 0e19182738 qed*: Reduce the memory footprint for Rx path
With the current default values for Rx path i.e., 8 queues of 8Kb entries
each with 4Kb size, interface will consume 256Mb for Rx. The default values
causing the driver probe to fail when the system memory is low. Based on
the perforamnce results, rx-ring count value of 1Kb gives the comparable
performance with Rx coalesce timeout of 12 seconds. Updating the default
values.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 17:08:07 -04:00
Daniel Borkmann 2d0e30c30f bpf: add helper for retrieving current numa node id
Use case is mainly for soreuseport to select sockets for the local
numa node, but since generic, lets also add this for other networking
and tracing program types.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 17:05:52 -04:00
Paolo Abeni f970bd9e3a udp: implement memory accounting helpers
Avoid using the generic helpers.
Use the receive queue spin lock to protect the memory
accounting operation, both on enqueue and on dequeue.

On dequeue perform partial memory reclaiming, trying to
leave a quantum of forward allocated memory.

On enqueue use a custom helper, to allow some optimizations:
- use a plain spin_lock() variant instead of the slightly
  costly spin_lock_irqsave(),
- avoid dst_force check, since the calling code has already
  dropped the skb dst
- avoid orphaning the skb, since skb_steal_sock() already did
  the work for us

The above needs custom memory reclaiming on shutdown, provided
by the udp_destruct_sock().

v5 -> v6:
  - don't orphan the skb on enqueue

v4 -> v5:
  - replace the mem_lock with the receive queue spin lock
  - ensure that the bh is always allowed to enqueue at least
    a skb, even if sk_rcvbuf is exceeded

v3 -> v4:
  - reworked memory accunting, simplifying the schema
  - provide an helper for both memory scheduling and enqueuing

v1 -> v2:
  - use a udp specific destrctor to perform memory reclaiming
  - remove a couple of helpers, unneeded after the above cleanup
  - do not reclaim memory on dequeue if not under memory
    pressure
  - reworked the fwd accounting schema to avoid potential
    integer overflow

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 17:05:05 -04:00
Paolo Abeni f8c3bf00d4 net/socket: factor out helpers for memory and queue manipulation
Basic sock operations that udp code can use with its own
memory accounting schema. No functional change is introduced
in the existing APIs.

v4 -> v5:
  - avoid whitespace changes

v2 -> v4:
  - avoid exporting __sock_enqueue_skb

v1 -> v2:
  - avoid export sock_rmem_free

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-22 17:05:05 -04:00
Linus Torvalds 0c2b6dc4fd Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "This updates contains:

   - A revert which addresses a boot failure on ARM Sun5i platforms

   - A new clocksource driver, which has been delayed beyond rc1 due to
     an interrupt driver issue which was unearthed by this driver. The
     debugging of that issue and the discussion about the proper
     solution made this driver miss the merge window. There is no point
     in delaying it for a full cycle as it completes the basic mainline
     support for the new JCore platform and does not create any risk
     outside of that platform"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "clocksource/drivers/timer_sun5i: Replace code by clocksource_mmio_init"
  clocksource: Add J-Core timer/clocksource driver
  of: Add J-Core timer bindings
2016-10-22 10:23:15 -07:00
Linus Torvalds 3e9679a365 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Three fixes, a hw-enablement and a cross-arch fix/enablement change:

   - SGI/UV fix for older platforms

   - x32 signal handling fix

   - older x86 platform bootup APIC fix

   - AVX512-4VNNIW (Neural Network Instructions) and AVX512-4FMAPS
     (Multiply Accumulation Single precision instructions) enablement.

   - move thread_info back into x86 specific code, to make life easier
     for other architectures trying to make use of
     CONFIG_THREAD_INFO_IN_TASK_STRUCT=y"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/smp: Don't try to poke disabled/non-existent APIC
  sched/core, x86: Make struct thread_info arch specific again
  x86/signal: Remove bogus user_64bit_mode() check from sigaction_compat_abi()
  x86/platform/UV: Fix support for EFI_OLD_MEMMAP after BIOS callback updates
  x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS features
  x86/vmware: Skip timer_irq_works() check on VMware
2016-10-22 09:58:49 -07:00
Linus Torvalds 86c5bf7101 Merge branch 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull vmap stack fixes from Ingo Molnar:
 "This is fallout from CONFIG_HAVE_ARCH_VMAP_STACK=y on x86: stack
  accesses that used to be just somewhat questionable are now totally
  buggy.

  These changes try to do it without breaking the ABI: the fields are
  left there, they are just reporting zero, or reporting narrower
  information (the maps file change)"

* 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  mm: Change vm_is_stack_for_task() to vm_is_stack_for_current()
  fs/proc: Stop trying to report thread stacks
  fs/proc: Stop reporting eip and esp in /proc/PID/stat
  mm/numa: Remove duplicated include from mprotect.c
2016-10-22 09:39:10 -07:00
Linus Torvalds bfb7bfef6f Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
 "Mostly irqchip driver fixes, plus a symbol export"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kernel/irq: Export irq_set_parent()
  irqchip/gic: Add missing \n to CPU IF adjustment message
  irqchip/jcore: Don't show Kconfig menu item for driver
  irqchip/eznps: Drop pointless static qualifier in nps400_of_init()
  irqchip/gic-v3-its: Fix entry size mask for GITS_BASER
  irqchip/gic-v3-its: Fix 64bit GIC{R,ITS}_TYPER accesses
2016-10-22 09:33:51 -07:00
Linus Torvalds 43ef55daa7 ACPI fixes for v4.9-rc2
Specifics:
 
  - Update the ACPI WDAT-based watchdog driver to ping the hardware
    during system resume to prevent a reset from occurring after the
    resume is complete (Mika Westerberg).
 
  - Fix the return value of the pcc_mbox_request_channel() stub for
    CONFIG_PCC unset (Hoan Tran).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJYCn8AAAoJEILEb/54YlRxo7YQAIeJ0Sx6wdHrgpFVKuajhTtv
 eJ2j21ttMZODk6iWX0Xfr1pT5GYteKoBXQvc2qHdtGnN9H9RbBJ/Ai0qL4XrnSGP
 TuaTBuePapogt02eK8mVb3mrFQrjOJvsWhE4hQSfHvCylAAM0w682APIQshjY8xY
 RtYWCoDHupFm5CGp4bN0X+AzPorB+BX+cg8bGqFDnk/b1pCVwCuW3VTq0Ni6SIJZ
 k+eDCT8K+r04pcBo+dMHx2RKRR9ZaGavm++u+5tnWoxRy7rhfj2txZyRnmd2znlq
 brKm4Tv1cnnCo8FLCuRNuqJ9OcYMpUPVtWN2ESzE/k6WPjzs2jEWEoSdmW/JB6dM
 4a8LsjRQLjMw3IS6sonAS6wo1DO8Tb+RcZH1MeDWMm3K6rqnYIcPknldoWDaoHYz
 9uNDMWeHsaYKtvAXZXHkaOd3cl1ESY1E9jGO0xR/CBikaRKBAAHzCIHPIUvJzIRm
 3KLWpMIiT++AcSrUQxVLgEHjkERh+Ph9LpnEdVOqcB+r7O/p3zr7aFO4iZCgZ9kH
 LZXfVI53YbX7vnCzMHgUToutZPkHsqzxathAsJnVS83GgxTsciX+O5/k7lHgCb7G
 gl9ReeweWpVKqrT/o4wStbTB56UYjOWzZcAsA0pWeKUiqUycTSVfiad1tr7lM+cs
 lJb5i8Pl6bJu6+nWm4lN
 =kG+5
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix an issue related to system resume in the new WDAT-based
  watchdog driver and a return value of a stub function in the ACPI CPPC
  framework.

  Specifics:

   - Update the ACPI WDAT-based watchdog driver to ping the hardware
     during system resume to prevent a reset from occurring after the
     resume is complete (Mika Westerberg).

   - Fix the return value of the pcc_mbox_request_channel() stub for
     CONFIG_PCC unset (Hoan Tran)"

* tag 'acpi-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  watchdog: wdat_wdt: Ping the watchdog on resume
  mailbox: PCC: Fix return value of pcc_mbox_request_channel()
2016-10-21 15:54:45 -07:00
Rafael J. Wysocki 956c8974da Merge branches 'acpi-wdat' and 'acpi-cppc'
* acpi-wdat:
  watchdog: wdat_wdt: Ping the watchdog on resume

* acpi-cppc:
  mailbox: PCC: Fix return value of pcc_mbox_request_channel()
2016-10-21 22:24:23 +02:00
Thomas Gleixner a442950d4a GIC updates for Linux 4.9-rc2
- Fix for 32bit accesses that should be 64bit on 64bit machines
 - Fix for a field decoding macro
 - Beautify a warning message
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYCkl/AAoJECPQ0LrRPXpDpJwQAMOOccZo7T2Ekt41R2kup1lg
 l1OWvJ5N22uTxnS6+fNp+JhHdwzVk2TJeYFkoxYfbSPUtEVJYEKm4ZPdlIG58bhE
 N/Yh5QIzOcHUWRr000PaTgap9G0HUNSm0RJfLlH/zbscpiDWuJQLZ/OK2uWtRn1+
 hIQd8Poz85jfCZIb/AnSdA4YKJfww7JwviA3UuQjatqpkoIlfz/3CvJbx4DXPcrT
 EBC3vEeCYiALYQGImwY/P6VzYriS/fU+QWHkE1er5PPq+PRak1cmPAS15FBvQwVy
 wCI3zo3aSD+Y7tbKa1AvcOK6xIZq4UJaVX+sWd2pb0oz5T17dYjyXbzm+VRblT+b
 CNTrRZvFOxJH4ImHM4FT9ioRUme65XiUOQ9g3LrMTJGI4nfKmRaF4tOzlTsfHJxT
 HiBXWV6s98maK3ltfqKdEY4qhsNMdi3bZNnu8fTVhZEUugMu4ywLXyNqfISdJ+MG
 gvbDojXkvMtFGPZHg1vuHuEqdBHH+5Wbh5TCbWykMkhGbYEkLT23YZgLRo5TbN1E
 OMLVhx+dblmaMn+Npyo2BtST9E2Akzym9R+hx4rqNrRH5NQvGrAlPAIFA/1ommYI
 qdC5vJNYGH+B+uPRSKUUx9kwl+NUTPDLSzYSrIAHzTyYnenJ+QvBE2Zd3z6NplCn
 NtctzCoOEjN2Q6WsSGVI
 =DL0w
 -----END PGP SIGNATURE-----

Merge tag 'gic-fixes-for-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

Pull GIC updates from Marc Zyngier:

 - Fix for 32bit accesses that should be 64bit on 64bit machines
 - Fix for a field decoding macro
 - Beautify a warning message
2016-10-21 21:40:29 +02:00
Linus Torvalds ecd06f2883 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "A set of fixes that missed the merge window, mostly due to me being
  away around that time.

  Nothing major here, a mix of nvme cleanups and fixes, and one fix for
  the badblocks handling"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvmet: use symbolic constants for CNS values
  nvme: use symbolic constants for CNS values
  nvme.h: add an enum for cns values
  nvme.h: don't use uuid_be
  nvme.h: resync with nvme-cli
  nvme: Add tertiary number to NVME_VS
  nvme : Add sysfs entry for NVMe CMBs when appropriate
  nvme: don't schedule multiple resets
  nvme: Delete created IO queues on reset
  nvme: Stop probing a removed device
  badblocks: fix overlapping check for clearing
2016-10-21 10:54:01 -07:00
WANG Cong 8651be8f14 ipv6: fix a potential deadlock in do_ipv6_setsockopt()
Baozeng reported this deadlock case:

       CPU0                    CPU1
       ----                    ----
  lock([  165.136033] sk_lock-AF_INET6);
                               lock([  165.136033] rtnl_mutex);
                               lock([  165.136033] sk_lock-AF_INET6);
  lock([  165.136033] rtnl_mutex);

Similar to commit 87e9f03159
("ipv4: fix a potential deadlock in mcast getsockopt() path")
this is due to we still have a case, ipv6_sock_mc_close(),
where we acquire sk_lock before rtnl_lock. Close this deadlock
with the similar solution, that is always acquire rtnl lock first.

Fixes: baf606d9c9 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: Baozeng Ding <sploving1@gmail.com>
Tested-by: Baozeng Ding <sploving1@gmail.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-21 11:29:02 -04:00
Linus Torvalds 6f33d6458e Power management fix for v4.9-rc2
This fixes pointer arithmetics mess-up in the cpufreq core introduced
 by one of recent commits and leading to all kinds of breakage from
 kernel crashes to incorrect governor decisions (Sergey Senozhatsky).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJYCT7eAAoJEILEb/54YlRxHnMP/3HOV7MsdthmGCl3y5GLVpcA
 KUbLd5rIpfAYmT6nMIrU0CzddROdcP1XD/dHpoEvq693GG8bjS9K0rTiBAI/xlAd
 osx9wTI0nX6XZpQvWY0UwJ8EVOgbX/cwcDMdMLbT7sVS4jCcwAfqopFNT9vRHVv+
 YVzw/hsGTa1YmFlRbraAtgl2MDRjTcqPKYf6HkhrW+IW8yvI3mEuCF0hsx31bBE/
 4MxTZrJuScE3f7FcfpPNr67yCSeDJEZ3ZqZq5ANqpiKdsV2PFf2YIhkAfsuCjSIO
 aBKpbxifXsV9klzIQCDkPOK/fz+5od1NzOJiJ5ZgC80iV7NNWtWLWBo/75ABPeAk
 8sx1+EIjtUZ9lbr3KmErZzebZpMIdB9r+kHndBWVDt4RXiZZnUPykuyuRDzRW465
 o4CA8foANDFQwjHHzgSkilKJVjhN23R5PpJHM8GhI1jAkrPXevOxOvTdgz2yeAfN
 rL8Zw56L7oXJ0EjTdh93OV26udk7eZyMZl/MS/D4/bSlyF7Ma8/9gqmh/pkQj1yc
 yNbucupdkEc5YDUbytfqHkcbu2IULZSnNkkNdYtDDkJ89r0LEsVMhQFapwRqU6o1
 StKvdWEFmgutS4CG4zyzVYw7m/R1kw1p9T8ae6Dvesqn3egd+LwTOFu6Dvp1IsMC
 ONb3WBd4t88MRJSeAZLa
 =zILz
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "This fixes the pointer arithmetics mess-up in the cpufreq core
  introduced by one of recent commits and leading to all kinds of
  breakage from kernel crashes to incorrect governor decisions (Sergey
  Senozhatsky)"

* tag 'pm-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: fix overflow in cpufreq_table_find_index_dl()
2016-10-20 15:32:51 -07:00
Rafael J. Wysocki 350d32395b Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: fix overflow in cpufreq_table_find_index_dl()
2016-10-20 23:24:58 +02:00
Jarod Wilson b3e3893e12 net: use core MTU range checking in misc drivers
firewire-net:
- set min/max_mtu
- remove fwnet_change_mtu

nes:
- set max_mtu
- clean up nes_netdev_change_mtu

xpnet:
- set min/max_mtu
- remove xpnet_dev_change_mtu

hippi:
- set min/max_mtu
- remove hippi_change_mtu

batman-adv:
- set max_mtu
- remove batadv_interface_change_mtu
- initialization is a little async, not 100% certain that max_mtu is set
  in the optimal place, don't have hardware to test with

rionet:
- set min/max_mtu
- remove rionet_change_mtu

slip:
- set min/max_mtu
- streamline sl_change_mtu

um/net_kern:
- remove pointless ndo_change_mtu

hsi/clients/ssi_protocol:
- use core MTU range checking
- remove now redundant ssip_pn_set_mtu

ipoib:
- set a default max MTU value
- Note: ipoib's actual max MTU can vary, depending on if the device is in
  connected mode or not, so we'll just set the max_mtu value to the max
  possible, and let the ndo_change_mtu function continue to validate any new
  MTU change requests with checks for CM or not. Note that ipoib has no
  min_mtu set, and thus, the network core's mtu > 0 check is the only lower
  bounds here.

mptlan:
- use net core MTU range checking
- remove now redundant mpt_lan_change_mtu

fddi:
- min_mtu = 21, max_mtu = 4470
- remove now redundant fddi_change_mtu (including export)

fjes:
- min_mtu = 8192, max_mtu = 65536
- The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to
  get past the core net MTU range checks so fjes_change_mtu can validate a
  new MTU against what it supports (see fjes_support_mtu in fjes_hw.c)

hsr:
- min_mtu = 0 (calls ether_setup, max_mtu is 1500)

f_phonet:
- min_mtu = 6, max_mtu = 65541

u_ether:
- min_mtu = 14, max_mtu = 15412

phonet/pep-gprs:
- min_mtu = 576, max_mtu = 65530
- remove redundant gprs_set_mtu

CC: netdev@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: Stefan Richter <stefanr@s5r6.in-berlin.de>
CC: Faisal Latif <faisal.latif@intel.com>
CC: linux-rdma@vger.kernel.org
CC: Cliff Whickman <cpw@sgi.com>
CC: Robin Holt <robinmholt@gmail.com>
CC: Jes Sorensen <jes@trained-monkey.org>
CC: Marek Lindner <mareklindner@neomailbox.ch>
CC: Simon Wunderlich <sw@simonwunderlich.de>
CC: Antonio Quartulli <a@unstable.cc>
CC: Sathya Prakash <sathya.prakash@broadcom.com>
CC: Chaitra P B <chaitra.basappa@broadcom.com>
CC: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
CC: MPT-FusionLinux.pdl@broadcom.com
CC: Sebastian Reichel <sre@kernel.org>
CC: Felipe Balbi <balbi@kernel.org>
CC: Arvid Brodin <arvid.brodin@alten.se>
CC: Remi Denis-Courmont <courmisch@gmail.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:10 -04:00
Jarod Wilson 8b6b4135e4 net: use core MTU range checking in WAN drivers
- set min/max_mtu in all hdlc drivers, remove hdlc_change_mtu
- sent max_mtu in lec driver, remove lec_change_mtu
- set min/max_mtu in x25_asy driver

CC: netdev@vger.kernel.org
CC: Krzysztof Halasa <khc@pm.waw.pl>
CC: Krzysztof Halasa <khalasa@piap.pl>
CC: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
CC: Francois Romieu <romieu@fr.zoreil.com>
CC: Kevin Curtis <kevin.curtis@farsite.co.uk>
CC: Zhao Qiang <qiang.zhao@nxp.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:09 -04:00
Jarod Wilson d894be57ca ethernet: use net core MTU range checking in more drivers
Somehow, I missed a healthy number of ethernet drivers in the last pass.
Most of these drivers either were in need of an updated max_mtu to make
jumbo frames possible to enable again. In a few cases, also setting a
different min_mtu to match previous lower bounds. There are also a few
drivers that had no upper bounds checking, so they're getting a brand new
ETH_MAX_MTU that is identical to IP_MAX_MTU, but accessible by includes
all ethernet and ethernet-like drivers all have already.

acenic:
- min_mtu = 0, max_mtu = 9000

amazon/ena:
- min_mtu = 128, max_mtu = adapter->max_mtu

amd/xgbe:
- min_mtu = 0, max_mtu = 9000

sb1250:
- min_mtu = 0, max_mtu = 1518

cxgb3:
- min_mtu = 81, max_mtu = 65535

cxgb4:
- min_mtu = 81, max_mtu = 9600

cxgb4vf:
- min_mtu = 81, max_mtu = 65535

benet:
- min_mtu = 256, max_mtu = 9000

ibmveth:
- min_mtu = 68, max_mtu = 65535

ibmvnic:
- min_mtu = adapter->min_mtu, max_mtu = adapter->max_mtu
- remove now redundant ibmvnic_change_mtu

jme:
- min_mtu = 1280, max_mtu = 9202

mv643xx_eth:
- min_mtu = 64, max_mtu = 9500

mlxsw:
- min_mtu = 0, max_mtu = 65535
- Basically bypassing the core checks, and instead relying on dynamic
  checks in the respective switch drivers' ndo_change_mtu functions

ns83820:
- min_mtu = 0
- remove redundant ns83820_change_mtu, only checked for mtu > 1500

netxen:
- min_mtu = 0, max_mtu = 8000 (P2), max_mtu = 9600 (P3)

qlge:
- min_mtu = 1500, max_mtu = 9000
- driver only supports setting mtu to 1500 or 9000, so the core check only
  rules out < 1500 and > 9000, qlge_change_mtu still needs to check that
  the value is 1500 or 9000

qualcomm/emac:
- min_mtu = 46, max_mtu = 9194

xilinx_axienet:
- min_mtu = 64, max_mtu = 9000

Fixes: 61e84623ac ("net: centralize net_device min/max MTU checking")
CC: netdev@vger.kernel.org
CC: Jes Sorensen <jes@trained-monkey.org>
CC: Netanel Belgazal <netanel@annapurnalabs.com>
CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: Santosh Raspatur <santosh@chelsio.com>
CC: Hariprasad S <hariprasad@chelsio.com>
CC: Sathya Perla <sathya.perla@broadcom.com>
CC: Ajit Khaparde <ajit.khaparde@broadcom.com>
CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
CC: Somnath Kotur <somnath.kotur@broadcom.com>
CC: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
CC: John Allen <jallen@linux.vnet.ibm.com>
CC: Guo-Fu Tseng <cooldavid@cooldavid.org>
CC: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Ido Schimmel <idosch@mellanox.com>
CC: Manish Chopra <manish.chopra@qlogic.com>
CC: Sony Chacko <sony.chacko@qlogic.com>
CC: Rajesh Borundia <rajesh.borundia@qlogic.com>
CC: Timur Tabi <timur@codeaurora.org>
CC: Anirudha Sarangi <anirudh@xilinx.com>
CC: John Linn <John.Linn@xilinx.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:51:08 -04:00
Eric Dumazet 286c72deab udp: must lock the socket in udp_disconnect()
Baozeng Ding reported KASAN traces showing uses after free in
udp_lib_get_port() and other related UDP functions.

A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash.

I could write a reproducer with two threads doing :

static int sock_fd;
static void *thr1(void *arg)
{
	for (;;) {
		connect(sock_fd, (const struct sockaddr *)arg,
			sizeof(struct sockaddr_in));
	}
}

static void *thr2(void *arg)
{
	struct sockaddr_in unspec;

	for (;;) {
		memset(&unspec, 0, sizeof(unspec));
	        connect(sock_fd, (const struct sockaddr *)&unspec,
			sizeof(unspec));
        }
}

Problem is that udp_disconnect() could run without holding socket lock,
and this was causing list corruptions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:45:52 -04:00
Sabrina Dubroca fcd91dd449 net: add recursion limit to GRO
Currently, GRO can do unlimited recursion through the gro_receive
handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem.  Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.

This patch adds a recursion counter to the GRO layer to prevent stack
overflow.  When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally.  This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.

Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.

Fixes: CVE-2016-7039
Fixes: 9b174d88c2 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 14:32:22 -04:00
Rich Felker 9995f4f184 clocksource: Add J-Core timer/clocksource driver
At the hardware level, the J-Core PIT is integrated with the interrupt
controller, but it is represented as its own device and has an
independent programming interface. It provides a 12-bit countdown
timer, which is not presently used, and a periodic timer. The interval
length for the latter is programmable via a 32-bit throttle register
whose units are determined by a bus-period register. The periodic
timer is used to implement both periodic and oneshot clock event
modes; in oneshot mode the interrupt handler simply disables the timer
as soon as it fires.

Despite its device tree node representing an interrupt for the PIT,
the actual irq generated is programmable, not hard-wired. The driver
is responsible for programming the PIT to generate the hardware irq
number that the DT assigns to it.

On SMP configurations, J-Core provides cpu-local instances of the PIT;
no broadcast timer is needed. This driver supports the creation of the
necessary per-cpu clock_event_device instances.

A nanosecond-resolution clocksource is provided using the J-Core "RTC"
registers, which give a 64-bit seconds count and 32-bit nanoseconds
that wrap every second. The driver converts these to a full-range
32-bit nanoseconds count.

Signed-off-by: Rich Felker <dalias@libc.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: devicetree@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Link: http://lkml.kernel.org/r/b591ff12cc5ebf63d1edc98da26046f95a233814.1476393790.git.dalias@libc.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-20 20:10:17 +02:00
Sergey Senozhatsky c6fe46a79e cpufreq: fix overflow in cpufreq_table_find_index_dl()
'best' is always less or equals to 'pos', so `best - pos' returns
a negative value which is then getting casted to `unsigned int'
and passed to __cpufreq_driver_target()->acpi_cpufreq_target()
for policy->freq_table selection. This results in

 BUG: unable to handle kernel paging request at ffff881019b469f8
 IP: [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
 PGD 267f067
 PUD 0

 Oops: 0000 [#1] PREEMPT SMP
 CPU: 6 PID: 70 Comm: kworker/6:1 Not tainted 4.9.0-rc1-next-20161017-dbg-dirty
 Workqueue: events dbs_work_handler
 task: ffff88041b808000 task.stack: ffff88041b810000
 RIP: 0010:[<ffffffffa00356c1>]  [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
 RSP: 0018:ffff88041b813c60  EFLAGS: 00010282
 RAX: ffff880419b46a00 RBX: ffff88041b848400 RCX: ffff880419b20f80
 RDX: 00000000001dff38 RSI: 00000000ffffffff RDI: ffff88041b848400
 RBP: ffff88041b813cb0 R08: 0000000000000006 R09: 0000000000000040
 R10: ffffffff8207f9e0 R11: ffffffff8173595b R12: 0000000000000000
 R13: ffff88041f1dff38 R14: 0000000000262900 R15: 0000000bfffffff4
 FS:  0000000000000000(0000) GS:ffff88041f000000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffff881019b469f8 CR3: 000000041a2d3000 CR4: 00000000001406e0
 Stack:
  ffff88041b813cb0 ffffffff813347f9 ffff88041b813ca0 ffffffff81334663
  ffff88041f1d4bc0 ffff88041b848400 0000000000000000 0000000000000000
  0000000000262900 0000000000000000 ffff88041b813d00 ffffffff813355dc
 Call Trace:
  [<ffffffff813347f9>] ? cpufreq_freq_transition_begin+0xf1/0xfc
  [<ffffffff81334663>] ? get_cpu_idle_time+0x97/0xa6
  [<ffffffff813355dc>] __cpufreq_driver_target+0x3b6/0x44e
  [<ffffffff81336ca3>] cs_dbs_timer+0x11a/0x135
  [<ffffffff81336fda>] dbs_work_handler+0x39/0x62
  [<ffffffff81057823>] process_one_work+0x280/0x4a5
  [<ffffffff81058719>] worker_thread+0x24f/0x397
  [<ffffffff810584ca>] ? rescuer_thread+0x30b/0x30b
  [<ffffffff81418380>] ? nl80211_get_key+0x29/0x36a
  [<ffffffff8105d2b7>] kthread+0xfc/0x104
  [<ffffffff8107ceea>] ? put_lock_stats.isra.9+0xe/0x20
  [<ffffffff8105d1bb>] ? kthread_create_on_node+0x3f/0x3f
  [<ffffffff814b2092>] ret_from_fork+0x22/0x30
 Code: 56 4d 6b ff 0c 41 55 41 54 53 48 83 ec 28 48 8b 15 ad 1e 00 00 44 8b 41
 08 48 8b 87 c8 00 00 00 49 89 d5 4e 03 2c c5 80 b2 78 81 <46> 8b 74 38 04 45
 3b 75 00 75 11 31 c0 83 39 00 0f 84 1c 01 00
 RIP  [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq]
  RSP <ffff88041b813c60>
 CR2: ffff881019b469f8
 ---[ end trace 16d9fc7a17897d37 ]---

[ rjw: In some cases this bug may also cause incorrect frequencies to
  be selected by cpufreq governors. ]

Fixes: 899bb6642f (cpufreq: skip invalid entries when searching the frequency)
Link: http://marc.info/?l=linux-kernel&m=147672030714331&w=2
Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 4.8+ <stable@vger.kernel.org> # 4.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-20 16:35:50 +02:00
Heiko Carstens c8061485a0 sched/core, x86: Make struct thread_info arch specific again
The following commit:

  c65eacbe29 ("sched/core: Allow putting thread_info into task_struct")

... made 'struct thread_info' a generic struct with only a
single ::flags member, if CONFIG_THREAD_INFO_IN_TASK_STRUCT=y is
selected.

This change however seems to be quite x86 centric, since at least the
generic preemption code (asm-generic/preempt.h) assumes that struct
thread_info also has a preempt_count member, which apparently was not
true for x86.

We could add a bit more #ifdefs to solve this problem too, but it seems
to be much simpler to make struct thread_info arch specific
again. This also makes the conversion to THREAD_INFO_IN_TASK_STRUCT a
bit easier for architectures that have a couple of arch specific stuff
in their thread_info definition.

The arch specific stuff _could_ be moved to thread_struct. However
keeping them in thread_info makes it easier: accessing thread_info
members is simple, since it is at the beginning of the task_struct,
while the thread_struct is at the end. At least on s390 the offsets
needed to access members of the thread_struct (with task_struct as
base) are too large for various asm instructions.  This is not a
problem when keeping these members within thread_info.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: keescook@chromium.org
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1476901693-8492-2-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 13:27:47 +02:00
Andy Lutomirski d17af5056c mm: Change vm_is_stack_for_task() to vm_is_stack_for_current()
Asking for a non-current task's stack can't be done without races
unless the task is frozen in kernel mode.  As far as I know,
vm_is_stack_for_task() never had a safe non-current use case.

The __unused annotation is because some KSTK_ESP implementations
ignore their parameter, which IMO is further justification for this
patch.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Jann Horn <jann@thejh.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux API <linux-api@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Link: http://lkml.kernel.org/r/4c3f68f426e6c061ca98b4fc7ef85ffbb0a25b0c.1475257877.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-20 09:21:41 +02:00
Christoph Hellwig d33fd776f9 iomap: add IOMAP_REPORT
This allows the file system to tell a FIEMAP from a read operation, and thus
avoids the need to report flags that aren't actually used in the read path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-10-20 15:51:28 +11:00
Nicholas Bellinger 449a137846 target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE
This patch addresses a bug where EXTENDED_COPY across multiple LUNs
results in a CHECK_CONDITION when the source + destination are not
located on the same physical node.

ESX Host environments expect sense COPY_ABORTED w/ COPY TARGET DEVICE
NOT REACHABLE to be returned when this occurs, in order to signal
fallback to local copy method.

As described in section 6.3.3 of spc4r22:

  "If it is not possible to complete processing of a segment because the
   copy manager is unable to establish communications with a copy target
   device, because the copy target device does not respond to INQUIRY,
   or because the data returned in response to INQUIRY indicates
   an unsupported logical unit, then the EXTENDED COPY command shall be
   terminated with CHECK CONDITION status, with the sense key set to
   COPY ABORTED, and the additional sense code set to COPY TARGET DEVICE
   NOT REACHABLE."

Tested on v4.1.y with ESX v5.5u2+ with BlockCopy across multiple nodes.

Reported-by: Nixon Vincent <nixon.vincent@calsoftinc.com>
Tested-by: Nixon Vincent <nixon.vincent@calsoftinc.com>
Cc: Nixon Vincent <nixon.vincent@calsoftinc.com>
Tested-by: Dinesh Israni <ddi@datera.io>
Signed-off-by: Dinesh Israni <ddi@datera.io>
Cc: Dinesh Israni <ddi@datera.io>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2016-10-19 21:22:32 -07:00
Christoph Hellwig 329dd7681c nvme.h: add an enum for cns values
Ported over from nvme-cli.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19 11:36:22 -06:00
Christoph Hellwig 8d63687afd nvme.h: don't use uuid_be
This makes life easier for nvme-cli and we don't really need the uuid
type anyway to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19 11:36:22 -06:00
Christoph Hellwig a446c0840e nvme.h: resync with nvme-cli
Import a few updates to nvme.h from nvme-cli.  This mostly includes a few
new fields and error codes, but also a few renames that so far are only
used in user space.  Also one field is moved from an array of two le64
values to one of 16 u8 values so that we can more easily access it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19 11:36:22 -06:00
Gabriel Krisman Bertazi 8ef2074d28 nvme: Add tertiary number to NVME_VS
NVMe 1.2.1 specification adds a tertiary element to the version number.
This updates the macro and its callers to include the final number and
fixup a single place in nvmet where the version was generated manually.

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19 11:36:22 -06:00
Linus Torvalds 63ae602cea Merge branch 'gup_flag-cleanups'
Merge the gup_flags cleanups from Lorenzo Stoakes:
 "This patch series adjusts functions in the get_user_pages* family such
  that desired FOLL_* flags are passed as an argument rather than
  implied by flags.

  The purpose of this change is to make the use of FOLL_FORCE explicit
  so it is easier to grep for and clearer to callers that this flag is
  being used.  The use of FOLL_FORCE is an issue as it overrides missing
  VM_READ/VM_WRITE flags for the VMA whose pages we are reading
  from/writing to, which can result in surprising behaviour.

  The patch series came out of the discussion around commit 38e0885465
  ("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
  which addressed a BUG_ON() being triggered when a page was faulted in
  with PROT_NONE set but having been overridden by FOLL_FORCE.
  do_numa_page() was run on the assumption the page _must_ be one marked
  for NUMA node migration as an actual PROT_NONE page would have been
  dealt with prior to this code path, however FOLL_FORCE introduced a
  situation where this assumption did not hold.

  See

      https://marc.info/?l=linux-mm&m=147585445805166

  for the patch proposal"

Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
FOLL_WRITE by me.

[ This branch was rebased recently to add a few more acked-by's and
  reviewed-by's ]

* gup_flag-cleanups:
  mm: replace access_process_vm() write parameter with gup_flags
  mm: replace access_remote_vm() write parameter with gup_flags
  mm: replace __access_remote_vm() write parameter with gup_flags
  mm: replace get_user_pages_remote() write/force parameters with gup_flags
  mm: replace get_user_pages() write/force parameters with gup_flags
  mm: replace get_vaddr_frames() write/force parameters with gup_flags
  mm: replace get_user_pages_locked() write/force parameters with gup_flags
  mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
  mm: remove write/force parameters from __get_user_pages_unlocked()
  mm: remove write/force parameters from __get_user_pages_locked()
  mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
2016-10-19 08:39:47 -07:00
Lorenzo Stoakes f307ab6dce mm: replace access_process_vm() write parameter with gup_flags
This removes the 'write' argument from access_process_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:31:25 -07:00
Lorenzo Stoakes 6347e8d5bc mm: replace access_remote_vm() write parameter with gup_flags
This removes the 'write' argument from access_remote_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:12:14 -07:00
Lorenzo Stoakes 9beae1ea89 mm: replace get_user_pages_remote() write/force parameters with gup_flags
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:12:02 -07:00
Lorenzo Stoakes 768ae309a9 mm: replace get_user_pages() write/force parameters with gup_flags
This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:11:43 -07:00
Lorenzo Stoakes 7f23b3504a mm: replace get_vaddr_frames() write/force parameters with gup_flags
This removes the 'write' and 'force' from get_vaddr_frames() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:11:24 -07:00
Lorenzo Stoakes 3b913179c3 mm: replace get_user_pages_locked() write/force parameters with gup_flags
This removes the 'write' and 'force' use from get_user_pages_locked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-19 08:11:05 -07:00
Thomas Graf 57a09bf0a4 bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers
A BPF program is required to check the return register of a
map_elem_lookup() call before accessing memory. The verifier keeps
track of this by converting the type of the result register from
PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE after a conditional
jump ensures safety. This check is currently exclusively performed
for the result register 0.

In the event the compiler reorders instructions, BPF_MOV64_REG
instructions may be moved before the conditional jump which causes
them to keep their type PTR_TO_MAP_VALUE_OR_NULL to which the
verifier objects when the register is accessed:

0: (b7) r1 = 10
1: (7b) *(u64 *)(r10 -8) = r1
2: (bf) r2 = r10
3: (07) r2 += -8
4: (18) r1 = 0x59c00000
6: (85) call 1
7: (bf) r4 = r0
8: (15) if r0 == 0x0 goto pc+1
 R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp
9: (7a) *(u64 *)(r4 +0) = 0
R4 invalid mem access 'map_value_or_null'

This commit extends the verifier to keep track of all identical
PTR_TO_MAP_VALUE_OR_NULL registers after a map_elem_lookup() by
assigning them an ID and then marking them all when the conditional
jump is observed.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19 11:09:28 -04:00
Jiri Pirko 85dda4e5b0 rtnetlink: Add rtnexthop offload flag to compare mask
The offload flag is a status flag and should not be used by
FIB semantics for comparison.

Fixes: 37ed949369 ("rtnetlink: add RTNH_F_EXTERNAL flag for fib offload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19 11:07:57 -04:00
Ido Schimmel e4961b0768 net: core: Correctly iterate over lower adjacency list
Tamir reported the following trace when processing ARP requests received
via a vlan device on top of a VLAN-aware bridge:

 NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[...]
 CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.8.0-rc7 #1
 Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
 task: ffff88017edfea40 task.stack: ffff88017ee10000
 RIP: 0010:[<ffffffff815dcc73>]  [<ffffffff815dcc73>] netdev_all_lower_get_next_rcu+0x33/0x60
[...]
 Call Trace:
  <IRQ>
  [<ffffffffa015de0a>] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
  [<ffffffffa016f1b0>] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
  [<ffffffff810ad07a>] notifier_call_chain+0x4a/0x70
  [<ffffffff810ad13a>] atomic_notifier_call_chain+0x1a/0x20
  [<ffffffff815ee77b>] call_netevent_notifiers+0x1b/0x20
  [<ffffffff815f2eb6>] neigh_update+0x306/0x740
  [<ffffffff815f38ce>] neigh_event_ns+0x4e/0xb0
  [<ffffffff8165ea3f>] arp_process+0x66f/0x700
  [<ffffffff8170214c>] ? common_interrupt+0x8c/0x8c
  [<ffffffff8165ec29>] arp_rcv+0x139/0x1d0
  [<ffffffff816e505a>] ? vlan_do_receive+0xda/0x320
  [<ffffffff815e3794>] __netif_receive_skb_core+0x524/0xab0
  [<ffffffff815e6830>] ? dev_queue_xmit+0x10/0x20
  [<ffffffffa06d612d>] ? br_forward_finish+0x3d/0xc0 [bridge]
  [<ffffffffa06e5796>] ? br_handle_vlan+0xf6/0x1b0 [bridge]
  [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
  [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
  [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
  [<ffffffffa06d7856>] br_pass_frame_up+0xc6/0x160 [bridge]
  [<ffffffffa06d63d7>] ? deliver_clone+0x37/0x50 [bridge]
  [<ffffffffa06d656c>] ? br_flood+0xcc/0x160 [bridge]
  [<ffffffffa06d7b14>] br_handle_frame_finish+0x224/0x4f0 [bridge]
  [<ffffffffa06d7f94>] br_handle_frame+0x174/0x300 [bridge]
  [<ffffffff815e3599>] __netif_receive_skb_core+0x329/0xab0
  [<ffffffff81374815>] ? find_next_bit+0x15/0x20
  [<ffffffff8135e802>] ? cpumask_next_and+0x32/0x50
  [<ffffffff810c9968>] ? load_balance+0x178/0x9b0
  [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
  [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
  [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
  [<ffffffffa01544a1>] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
  [<ffffffffa005c9f7>] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
  [<ffffffffa007332a>] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
  [<ffffffff81091986>] tasklet_action+0xf6/0x110
  [<ffffffff81704556>] __do_softirq+0xf6/0x280
  [<ffffffff8109213f>] irq_exit+0xdf/0xf0
  [<ffffffff817042b4>] do_IRQ+0x54/0xd0
  [<ffffffff8170214c>] common_interrupt+0x8c/0x8c

The problem is that netdev_all_lower_get_next_rcu() never advances the
iterator, thereby causing the loop over the lower adjacency list to run
forever.

Fix this by advancing the iterator and avoid the infinite loop.

Fixes: 7ce856aaaf ("mlxsw: spectrum: Add couple of lower device helper functions")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Tamir Winetroub <tamirw@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-19 10:38:08 -04:00
Ilan Peer 0711d63878 cfg80211: allow aborting in-progress connection atttempts
On a disconnect request from userspace, cfg80211 currently calls
called rdev_disconnect() only in case that 'current_bss' was set,
i.e. connection had been established.

Change this to allow the userspace call to succeed and call the
driver's disconnect() method also while the connection attempt is
in progress, to be able to abort attempts.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
[change commit subject/message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-19 12:15:38 +02:00
Emmanuel Grumbach f438ceb81d mac80211: uapsd_queues is in QoS IE order
The uapsd_queue field is in QoS IE order and not in
IEEE80211_AC_*'s order.
This means that mac80211 would get confused between
BK and BE which is certainly not such a big deal but
needs to be fixed.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-10-19 12:13:54 +02:00