Граф коммитов

782224 Коммитов

Автор SHA1 Сообщение Дата
Alexey Kodanev 93bbadd6e0 ipv6: don't get lwtstate twice in ip6_rt_copy_init()
Commit 80f1a0f4e0 ("net/ipv6: Put lwtstate when destroying fib6_info")
partially fixed the kmemleak [1], lwtstate can be copied from fib6_info,
with ip6_rt_copy_init(), and it should be done only once there.

rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function
ip6_rt_copy_init(), so there is no need to get it again at the end.

With this patch, lwtstate also isn't copied from RTF_REJECT routes.

[1]:
unreferenced object 0xffff880b6aaa14e0 (size 64):
  comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s)
  hex dump (first 32 bytes):
    01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000018664623>] lwtunnel_build_state+0x1bc/0x420
    [<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0
    [<00000000ee2c5d1f>] ip6_route_add+0x14/0x70
    [<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0
    [<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0
    [<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0
    [<000000004c893c76>] netlink_unicast+0x417/0x5a0
    [<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30
    [<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0
    [<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950
    [<000000001e7426c8>] __sys_sendmsg+0xde/0x170
    [<00000000fe411443>] do_syscall_64+0x9f/0x4a0
    [<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000006d21f353>] 0xffffffffffffffff

Fixes: 6edb3c96a5 ("net/ipv6: Defer initialization of dst to data path")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-01 17:42:12 -07:00
Samuel Neves e78e5a9145 x86/vdso: Fix lsl operand order
In the __getcpu function, lsl is using the wrong target and destination
registers. Luckily, the compiler tends to choose %eax for both variables,
so it has been working so far.

Fixes: a582c540ac ("x86/vdso: Use RDPID in preference to LSL when available")
Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180901201452.27828-1-sneves@dei.uc.pt
2018-09-01 23:01:56 +02:00
Linus Torvalds 360bd62dc4 linux-watchdog 4.19-rc2 tag
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iEYEABECAAYFAluKTH8ACgkQ+iyteGJfRspLOgCgybxs7ktaE4RFal8KM7X8g5sB
 wVEAoMQ1IlQFkuxhJmHt8YemhRL3JqWC
 =cMKd
 -----END PGP SIGNATURE-----

Merge tag 'linux-watchdog-4.19-rc2' of git://www.linux-watchdog.org/linux-watchdog

Pull watchdog fixlet from Wim Van Sebroeck:
 "Document support for r8a774a1"

* tag 'linux-watchdog-4.19-rc2' of git://www.linux-watchdog.org/linux-watchdog:
  dt-bindings: watchdog: renesas-wdt: Document r8a774a1 support
2018-09-01 13:17:15 -07:00
Linus Torvalds b18ed664c2 Two small fixes, one for the x86 Stoney SoC to get a more accurate clk
frequency and the other to fix a bad allocation in the Nuvoton NPCM7XX
 driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAluJ96QRHHNib3lkQGtl
 cm5lbC5vcmcACgkQrQKIl8bklSVxShAAwmk9fLKBRHQAwmmdO5zNL5hM0VAFaSkA
 xaKTRWEv38FGCiILg9HaXqlzzBA0QqZyXvNFq1ucTFZ7z1yqK3LBI8XLB/IPIh+2
 dI6VInfi2IwQIQyNjplMaJu0V3R6qH7+xuwQNux17PMDR5Jb1VoiX4nwbqS/50Tv
 poqe4xNuedav2QTFm684HDN8PPZ/3wrior/xfzPEzOZjWlXDeKO2DUKoYALRzQcn
 9+IZatKNjwvC0ZFV60EDtIVOYmE15vFtxJfgTDcSZKL7BAEnmS/s8N0J6NvlQlKQ
 PfRO5WozupYhC7FtYjRb8nHsgp+jkz5A1VBnUfdukL2MPZhMev41eYQMdY+q3ryv
 U9RzzzFSGy5SE7jMEq64S14u8ql7zun9ahpUST+SZoL3tq+N8iWoRkY4zD0QaPQf
 K8jw6gYecltYsuwDtJgGIE0K6QEpuCeHG42vQE+wwti9rU0NjSB7LTKMbDJVkRCD
 7HiHqCbfwkj3fVGFDgxsU0d8vQs9Qd/OZRIEmPK3EtJY7+TwjsUrcrjR7+NyG3MQ
 VoEldFSVu8wpNua/CsfrnGbcjALna4mhHT4BsihwmyHJ3l8e2aDc7oCK5SdBKVYd
 bGq2ddueuGKUnCLSuMtQksYJ3EJ8QafVXGJnMAge1UHFV7BQI7XtrUzv0uslh6cL
 TUrILnKKsb0=
 =b74w
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "Two small fixes, one for the x86 Stoney SoC to get a more accurate clk
  frequency and the other to fix a bad allocation in the Nuvoton NPCM7XX
  driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: x86: Set default parent to 48Mhz
  clk: npcm7xx: fix memory allocation
2018-09-01 13:03:32 -07:00
Kees Cook 9b25436662 random: make CPU trust a boot parameter
Instead of forcing a distro or other system builder to choose
at build time whether the CPU is trusted for CRNG seeding via
CONFIG_RANDOM_TRUST_CPU, provide a boot-time parameter for end users to
control the choice. The CONFIG will set the default state instead.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-09-01 12:51:54 -04:00
Christoph Hellwig c1d0af1a1d kernel/dma/direct: take DMA offset into account in dma_direct_supported
When a device has a DMA offset the dma capable result will change due
to the difference between the physical and DMA address.  Take that into
account.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-09-01 15:42:28 +02:00
LuckTony c7486104a5 x86/mce: Fix set_mce_nospec() to avoid #GP fault
The trick with flipping bit 63 to avoid loading the address of the 1:1
mapping of the poisoned page while the 1:1 map is updated used to work when
unmapping the page. But it falls down horribly when attempting to directly
set the page as uncacheable.

The problem is that when the cache mode is changed to uncachable, the pages
needs to be flushed from the cache first. But the decoy address is
non-canonical due to bit 63 flipped, and the CLFLUSH instruction throws a
#GP fault.

Add code to change_page_attr_set_clr() to fix the address before calling
flush.

Fixes: 284ce4011b ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Link: https://lkml.kernel.org/r/20180831165506.GA9605@agluck-desk
2018-09-01 14:59:19 +02:00
Thomas Falcon f611a5b4a5 ibmvnic: Include missing return code checks in reset function
Check the return codes of these functions and halt reset
in case of failure. The driver will remain in a dormant state
until the next reset event, when device initialization will be
re-attempted.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:15:50 -07:00
Sabrina Dubroca c81c7012e0 selftests: pmtu: detect correct binary to ping ipv6 addresses
Some systems don't have the ping6 binary anymore, and use ping for
everything. Detect the absence of ping6 and try to use ping instead.

Fixes: d1f1b9cbf3 ("selftests: net: Introduce first PMTU test")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:14:20 -07:00
Sabrina Dubroca 902b5417f2 selftests: pmtu: maximum MTU for vti4 is 2^16-1-20
Since commit 82612de1c9 ("ip_tunnel: restore binding to ifaces with a
large mtu"), the maximum MTU for vti4 is based on IP_MAX_MTU instead of
the mysterious constant 0xFFF8.  This makes this selftest fail.

Fixes: 82612de1c9 ("ip_tunnel: restore binding to ifaces with a large mtu")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:14:20 -07:00
Florian Westphal 63cc357f7b tcp: do not restart timewait timer on rst reception
RFC 1337 says:
 ''Ignore RST segments in TIME-WAIT state.
   If the 2 minute MSL is enforced, this fix avoids all three hazards.''

So with net.ipv4.tcp_rfc1337=1, expected behaviour is to have TIME-WAIT sk
expire rather than removing it instantly when a reset is received.

However, Linux will also re-start the TIME-WAIT timer.

This causes connect to fail when tying to re-use ports or very long
delays (until syn retry interval exceeds MSL).

packetdrill test case:
// Demonstrate bogus rearming of TIME-WAIT timer in rfc1337 mode.
`sysctl net.ipv4.tcp_rfc1337=1`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
0.000 bind(3, ..., ...) = 0
0.000 listen(3, 1) = 0

0.100 < S 0:0(0) win 29200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>
0.200 < . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4

// Receive first segment
0.310 < P. 1:1001(1000) ack 1 win 46

// Send one ACK
0.310 > . 1:1(0) ack 1001

// read 1000 byte
0.310 read(4, ..., 1000) = 1000

// Application writes 100 bytes
0.350 write(4, ..., 100) = 100
0.350 > P. 1:101(100) ack 1001

// ACK
0.500 < . 1001:1001(0) ack 101 win 257

// close the connection
0.600 close(4) = 0
0.600 > F. 101:101(0) ack 1001 win 244

// Our side is in FIN_WAIT_1 & waits for ack to fin
0.7 < . 1001:1001(0) ack 102 win 244

// Our side is in FIN_WAIT_2 with no outstanding data.
0.8 < F. 1001:1001(0) ack 102 win 244
0.8 > . 102:102(0) ack 1002 win 244

// Our side is now in TIME_WAIT state, send ack for fin.
0.9 < F. 1002:1002(0) ack 102 win 244
0.9 > . 102:102(0) ack 1002 win 244

// Peer reopens with in-window SYN:
1.000 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>

// Therefore, reply with ACK.
1.000 > . 102:102(0) ack 1002 win 244

// Peer sends RST for this ACK.  Normally this RST results
// in tw socket removal, but rfc1337=1 setting prevents this.
1.100 < R 1002:1002(0) win 244

// second syn. Due to rfc1337=1 expect another pure ACK.
31.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>
31.0 > . 102:102(0) ack 1002 win 244

// .. and another RST from peer.
31.1 < R 1002:1002(0) win 244
31.2 `echo no timer restart;ss -m -e -a -i -n -t -o state TIME-WAIT`

// third syn after one minute.  Time-Wait socket should have expired by now.
63.0 < S 1000:1000(0) win 9200 <mss 1460,nop,nop,sackOK,nop,wscale 7>

// so we expect a syn-ack & 3whs to proceed from here on.
63.0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7>

Without this patch, 'ss' shows restarts of tw timer and last packet is
thus just another pure ack, more than one minute later.

This restores the original code from commit 283fd6cf0be690a83
("Merge in ANK networking jumbo patch") in netdev-vger-cvs.git .

For some reason the else branch was removed/lost in 1f28b683339f7
("Merge in TCP/UDP optimizations and [..]") and timer restart became
unconditional.

Reported-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:10:35 -07:00
Pavel Machek b0e0b0abbd net/rds: RDS is not Radio Data System
Getting prompt "The RDS Protocol" (RDS) is not too helpful, and it is
easily confused with Radio Data System (which we may want to support
in kernel, too).

Signed-off-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:09:53 -07:00
Dexuan Cui e04e7a7bbd hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()
This patch fixes the race between netvsc_probe() and
rndis_set_subchannel(), which can cause a deadlock.

These are the related 3 paths which show the deadlock:

path #1:
    Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
    Call Trace:
     schedule
     schedule_preempt_disabled
     __mutex_lock
     __device_attach
     bus_probe_device
     device_add
     vmbus_device_register
     vmbus_onoffer
     vmbus_onmessage_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path #2:
    schedule
     schedule_preempt_disabled
     __mutex_lock
     netvsc_probe
     vmbus_probe
     really_probe
     __driver_attach
     bus_for_each_dev
     driver_attach_async
     async_run_entry_fn
     process_one_work
     worker_thread
     kthread
     ret_from_fork

path #3:
    Workqueue: events netvsc_subchan_work [hv_netvsc]
    Call Trace:
     schedule
     rndis_set_subchannel
     netvsc_subchan_work
     process_one_work
     worker_thread
     kthread
     ret_from_fork

Before path #1 finishes, path #2 can start to run, because just before
the "bus_probe_device(dev);" in device_add() in path #1, there is a line
"object_uevent(&dev->kobj, KOBJ_ADD);", so systemd-udevd can
immediately try to load hv_netvsc and hence path #2 can start to run.

Next, path #2 offloads the subchannal's initialization to a workqueue,
i.e. path #3, so we can end up in a deadlock situation like this:

Path #2 gets the device lock, and is trying to get the rtnl lock;
Path #3 gets the rtnl lock and is waiting for all the subchannel messages
to be processed;
Path #1 is trying to get the device lock, but since #2 is not releasing
the device lock, path #1 has to sleep; since the VMBus messages are
processed one by one, this means the sub-channel messages can't be
procedded, so #3 has to sleep with the rtnl lock held, and finally #2
has to sleep... Now all the 3 paths are sleeping and we hit the deadlock.

With the patch, we can make sure #2 gets both the device lock and the
rtnl lock together, gets its job done, and releases the locks, so #1
and #3 will not be blocked for ever.

Fixes: 8195b1396e ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:07:54 -07:00
Jakub Kicinski 9ad716b95f nfp: wait for posted reconfigs when disabling the device
To avoid leaking a running timer we need to wait for the
posted reconfigs after netdev is unregistered.  In common
case the process of deinitializing the device will perform
synchronous reconfigs which wait for posted requests, but
especially with VXLAN ports being actively added and removed
there can be a race condition leaving a timer running after
adapter structure is freed leading to a crash.

Add an explicit flush after deregistering and for a good
measure a warning to check if timer is running just before
structures are freed.

Fixes: 3d780b926a ("nfp: add async reconfiguration mechanism")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:01:30 -07:00
Eric Dumazet 3a7ad0634f Revert "packet: switch kvzalloc to allocate memory"
This reverts commit 71e4128620.

mmap()/munmap() can not be backed by kmalloced pages :

We fault in :

    VM_BUG_ON_PAGE(PageSlab(page), page);

    unmap_single_vma+0x8a/0x110
    unmap_vmas+0x4b/0x90
    unmap_region+0xc9/0x140
    do_munmap+0x274/0x360
    vm_munmap+0x81/0xc0
    SyS_munmap+0x2b/0x40
    do_syscall_64+0x13e/0x1c0
    entry_SYSCALL_64_after_hwframe+0x42/0xb7

Fixes: 71e4128620 ("packet: switch kvzalloc to allocate memory")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: John Sperbeck <jsperbeck@google.com>
Bisected-by: John Sperbeck <jsperbeck@google.com>
Cc: Zhang Yu <zhangyu31@baidu.com>
Cc: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:00:28 -07:00
Guoqing Jiang 41a9504112 md-cluster: release RESYNC lock after the last resync message
All the RESYNC messages are sent with resync lock held, the only
exception is resync_finish which releases resync_lockres before
send the last resync message, this should be changed as well.
Otherwise, we can see deadlock issue as follows:

clustermd2-gqjiang2:~ # cat /proc/mdstat
Personalities : [raid10] [raid1]
md0 : active raid1 sdg[0] sdf[1]
      134144 blocks super 1.2 [2/2] [UU]
      [===================>.]  resync = 99.6% (134144/134144) finish=0.0min speed=26K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
clustermd2-gqjiang2:~ # ps aux|grep md|grep D
root     20497  0.0  0.0      0     0 ?        D    16:00   0:00 [md0_raid1]
clustermd2-gqjiang2:~ # cat /proc/20497/stack
[<ffffffffc05ff51e>] dlm_lock_sync+0x8e/0xc0 [md_cluster]
[<ffffffffc05ff7e8>] __sendmsg+0x98/0x130 [md_cluster]
[<ffffffffc05ff900>] sendmsg+0x20/0x30 [md_cluster]
[<ffffffffc05ffc35>] resync_info_update+0xb5/0xc0 [md_cluster]
[<ffffffffc0593e84>] md_reap_sync_thread+0x134/0x170 [md_mod]
[<ffffffffc059514c>] md_check_recovery+0x28c/0x510 [md_mod]
[<ffffffffc060c882>] raid1d+0x42/0x800 [raid1]
[<ffffffffc058ab61>] md_thread+0x121/0x150 [md_mod]
[<ffffffff9a0a5b3f>] kthread+0xff/0x140
[<ffffffff9a800235>] ret_from_fork+0x35/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

clustermd-gqjiang1:~ # ps aux|grep md|grep D
root     20531  0.0  0.0      0     0 ?        D    16:00   0:00 [md0_raid1]
root     20537  0.0  0.0      0     0 ?        D    16:00   0:00 [md0_cluster_rec]
root     20676  0.0  0.0      0     0 ?        D    16:01   0:00 [md0_resync]
clustermd-gqjiang1:~ # cat /proc/mdstat
Personalities : [raid10] [raid1]
md0 : active raid1 sdf[1] sdg[0]
      134144 blocks super 1.2 [2/2] [UU]
      [===================>.]  resync = 97.3% (131072/134144) finish=8076.8min speed=0K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
clustermd-gqjiang1:~ # cat /proc/20531/stack
[<ffffffffc080974d>] metadata_update_start+0xcd/0xd0 [md_cluster]
[<ffffffffc079c897>] md_update_sb.part.61+0x97/0x820 [md_mod]
[<ffffffffc079f15b>] md_check_recovery+0x29b/0x510 [md_mod]
[<ffffffffc0816882>] raid1d+0x42/0x800 [raid1]
[<ffffffffc0794b61>] md_thread+0x121/0x150 [md_mod]
[<ffffffff9e0a5b3f>] kthread+0xff/0x140
[<ffffffff9e800235>] ret_from_fork+0x35/0x40
[<ffffffffffffffff>] 0xffffffffffffffff
clustermd-gqjiang1:~ # cat /proc/20537/stack
[<ffffffffc0813222>] freeze_array+0xf2/0x140 [raid1]
[<ffffffffc080a56e>] recv_daemon+0x41e/0x580 [md_cluster]
[<ffffffffc0794b61>] md_thread+0x121/0x150 [md_mod]
[<ffffffff9e0a5b3f>] kthread+0xff/0x140
[<ffffffff9e800235>] ret_from_fork+0x35/0x40
[<ffffffffffffffff>] 0xffffffffffffffff
clustermd-gqjiang1:~ # cat /proc/20676/stack
[<ffffffffc080951e>] dlm_lock_sync+0x8e/0xc0 [md_cluster]
[<ffffffffc080957f>] lock_token+0x2f/0xa0 [md_cluster]
[<ffffffffc0809622>] lock_comm+0x32/0x90 [md_cluster]
[<ffffffffc08098f5>] sendmsg+0x15/0x30 [md_cluster]
[<ffffffffc0809c0a>] resync_info_update+0x8a/0xc0 [md_cluster]
[<ffffffffc08130ba>] raid1_sync_request+0xa9a/0xb10 [raid1]
[<ffffffffc079b8ea>] md_do_sync+0xbaa/0xf90 [md_mod]
[<ffffffffc0794b61>] md_thread+0x121/0x150 [md_mod]
[<ffffffff9e0a5b3f>] kthread+0xff/0x140
[<ffffffff9e800235>] ret_from_fork+0x35/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2018-08-31 17:38:10 -07:00
Xiao Ni 1d0ffd2642 RAID10 BUG_ON in raise_barrier when force is true and conf->barrier is 0
In raid10 reshape_request it gets max_sectors in read_balance. If the underlayer disks
have bad blocks, the max_sectors is less than last. It will call goto read_more many
times. It calls raise_barrier(conf, sectors_done != 0) every time. In this condition
sectors_done is not 0. So the value passed to the argument force of raise_barrier is
true.

In raise_barrier it checks conf->barrier when force is true. If force is true and
conf->barrier is 0, it panic. In this case reshape_request submits bio to under layer
disks. And in the callback function of the bio it calls lower_barrier. If the bio
finishes before calling raise_barrier again, it can trigger the BUG_ON.

Add one pair of raise_barrier/lower_barrier to fix this bug.

Signed-off-by: Xiao Ni <xni@redhat.com>
Suggested-by: Neil Brown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2018-08-31 17:38:10 -07:00
Shaohua Li e254de6bcf md/raid5-cache: disable reshape completely
We don't support reshape yet if an array supports log device. Previously we
determine the fact by checking ->log. However, ->log could be NULL after a log
device is removed, but the array is still marked to support log device. Don't
allow reshape in this case too. User can disable log device support by setting
'consistency_policy' to 'resync' then do reshape.

Reported-by: Xiao Ni <xni@redhat.com>
Tested-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
2018-08-31 17:38:09 -07:00
Dennis Zhou (Facebook) 3111885015 blkcg: use tryget logic when associating a blkg with a bio
There is a very small change a bio gets caught up in a really
unfortunate race between a task migration, cgroup exiting, and itself
trying to associate with a blkg. This is due to css offlining being
performed after the css->refcnt is killed which triggers removal of
blkgs that reach their blkg->refcnt of 0.

To avoid this, association with a blkg should use tryget and fallback to
using the root_blkg.

Fixes: 08e18eab0c ("block: add bi_blkg to the bio for cgroups")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-31 14:48:58 -06:00
Dennis Zhou (Facebook) 59b57717ff blkcg: delay blkg destruction until after writeback has finished
Currently, blkcg destruction relies on a sequence of events:
  1. Destruction starts. blkcg_css_offline() is called and blkgs
     release their reference to the blkcg. This immediately destroys
     the cgwbs (writeback).
  2. With blkgs giving up their reference, the blkcg ref count should
     become zero and eventually call blkcg_css_free() which finally
     frees the blkcg.

Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
on the completion of all writeback associated with the blkcg. A count of
the number of cgwbs is maintained and once that goes to zero, blkg
destruction can follow. This should prevent premature blkg destruction
related to writeback.

The new process for blkcg cleanup is as follows:
  1. Destruction starts. blkcg_css_offline() is called which offlines
     writeback. Blkg destruction is delayed on the cgwb_refcnt count to
     avoid punting potentially large amounts of outstanding writeback
     to root while maintaining any ongoing policies. Here, the base
     cgwb_refcnt is put back.
  2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
     and handles destruction of blkgs. This is where the css reference
     held by each blkg is released.
  3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
     This finally frees the blkg.

It seems in the past blk-throttle didn't do the most understandable
things with taking data from a blkg while associating with current. So,
the simplification and unification of what blk-throttle is doing caused
this.

Fixes: 08e18eab0c ("block: add bi_blkg to the bio for cgroups")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-31 14:48:56 -06:00
Dennis Zhou (Facebook) 6b06546206 Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"
This reverts commit 4c6994806f.

Destroying blkgs is tricky because of the nature of the relationship. A
blkg should go away when either a blkcg or a request_queue goes away.
However, blkg's pin the blkcg to ensure they remain valid. To break this
cycle, when a blkcg is offlined, blkgs put back their css ref. This
eventually lets css_free() get called which frees the blkcg.

The above commit (4c6994806f) breaks this order of events by trying to
destroy blkgs in css_free(). As the blkgs still hold references to the
blkcg, css_free() is never called.

The race between blkcg_bio_issue_check() and cgroup_rmdir() will be
addressed in the following patch by delaying destruction of a blkg until
all writeback associated with the blkcg has been finished.

Fixes: 4c6994806f ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-08-31 14:48:54 -06:00
Eugeniy Paltsev 678c8110d2 ARC: dma [IOC]: mark DMA devices connected as dma-coherent
Mark DMA devices on AXS103 and HSDK boards connected through IOC
port as dma-coherent.

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2018-08-31 12:47:26 -07:00
Will Deacon 3fcbb8260a ARC: atomics: unbork atomic_fetch_##op()
In 4.19-rc1, Eugeniy reported weird boot and IO errors on ARC HSDK

| INFO: task syslogd:77 blocked for more than 10 seconds.
|       Not tainted 4.19.0-rc1-00007-gf213acea4e88 #40
| "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
| message.
| syslogd         D    0    77     76 0x00000000
|
| Stack Trace:
|  __switch_to+0x0/0xac
|  __schedule+0x1b2/0x730
|  io_schedule+0x5c/0xc0
|  __lock_page+0x98/0xdc
|  find_lock_entry+0x38/0x100
|  shmem_getpage_gfp.isra.3+0x82/0xbfc
|  shmem_fault+0x46/0x138
|  handle_mm_fault+0x5bc/0x924
|  do_page_fault+0x100/0x2b8
|  ret_from_exception+0x0/0x8

He bisected to 84c6591103 ("locking/atomics,
asm-generic/bitops/lock.h: Rewrite using atomic_fetch_*()")

This commit however only unmasked the real issue introduced by commit
4aef66c8ae ("locking/atomic, arch/arc: Fix build") which missed the
retry-if-scond-failed branch in atomic_fetch_##op() macros.

The bisected commit started using atomic_fetch_##op() macros for building
the rest of atomics.

Fixes: 4aef66c8ae ("locking/atomic, arch/arc: Fix build")
Reported-by: Eugeniy Paltsev <paltsev@synopsys.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: wrote changelog]
2018-08-31 10:14:51 -07:00
Paul Burton 0f02cfbc3d
MIPS: VDSO: Match data page cache colouring when D$ aliases
When a system suffers from dcache aliasing a user program may observe
stale VDSO data from an aliased cache line. Notably this can break the
expectation that clock_gettime(CLOCK_MONOTONIC, ...) is, as its name
suggests, monotonic.

In order to ensure that users observe updates to the VDSO data page as
intended, align the user mappings of the VDSO data page such that their
cache colouring matches that of the virtual address range which the
kernel will use to update the data page - typically its unmapped address
within kseg0.

This ensures that we don't introduce aliasing cache lines for the VDSO
data page, and therefore that userland will observe updates without
requiring cache invalidation.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Reported-by: Hauke Mehrtens <hauke@hauke-m.de>
Reported-by: Rene Nielsen <rene.nielsen@microsemi.com>
Reported-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Patchwork: https://patchwork.linux-mips.org/patch/20344/
Tested-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Tested-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org # v4.4+
2018-08-31 10:07:21 -07:00
Lukas Bulwahn bc8d2e20a3 kconfig: remove a spurious self-assignment
The self assignment was probably introduced by an automated code
refactoring in
commit 694c49a7c0 ("kconfig: drop localization support").

The issue was identified by a self-assign warning when running
make menuconfig with clang.

Fixes: 694c49a7c0 ("kconfig: drop localization support")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-09-01 01:21:42 +09:00
Genki Sky 6147b1cf19 scripts/setlocalversion: git: Make -dirty check more robust
$(git diff-index) relies on the index being refreshed. This refreshing
of the index used to happen, but was removed in cdf2bc632e
("scripts/setlocalversion on write-protected source tree", 2013-06-14)
due to issues with a read-only filesystem.

If the index is not refreshed, one runs into problems. E.g. as
described in [0], git stores the uid in its index, so even if just the
uid has changed (or git is tricked into thinking so), then we will
think the tree is dirty. So as in [1], if you package linux-git with a
system that uses fakeroot(1), you get a "-dirty" version. Unless you
manually $(git update-index --refresh) themselves.

The simplest solution seems to be $(git status --porcelain), with an
additional flag saying "ignore untracked files". It seems clearer
about what it does, and avoids issues regarding cached indexes and
writable filesystems, but still has stable output for scripting.

[0]: https://public-inbox.org/git/0190ae30-b6c8-2a8b-b1fb-fd9d84e6dfdf@oracle.com/
[1]: https://bbs.archlinux.org/viewtopic.php?id=236702

Signed-off-by: Genki Sky <sky@genki.is>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-09-01 01:21:42 +09:00
Linus Torvalds 420f51f4ab A handful of arm64 fixes
- Fix typos in SVE documentation
 - Fix type-checking and implicit truncation for SMCCC calls
 - Force CONFIG_HOLES_IN_ZONE=y so that SLAB doesn't fall over NOMAP regions
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJbiSyOAAoJELescNyEwWM0Jc4IAMV6UIPNnARqKESMMGI8CPW3
 +b75RKJvOz06wIsd/ko+at+4SU4om/qr5k1Yx6F2s9t1y7+1RokkP1ZXOivsOegp
 KBtbDEzvwYWuePdMtZmXMMLOIOVzLC2UlqVGqdEBLNxYqfdS6H7IwgPlaXpu1GIu
 n4F0d6oEKY3hTmFrmH9FN68ZrTpx8S2MZYIApokhBrNIaSyr7x8bUj8/v9OoaJsO
 TwlG0y7W252alGni97WnX6gw0eM0HQ6yg8h+zNVmwksjUY+ZCS3w4ib3H8sS2FBH
 vzr3XkgEPeWR1oSYO7P7Vv7erMQUCnS+q7UjQ09TVvHTcXGb3A+iqP+w3rXMbyo=
 =gy5J
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "A few arm64 fixes came in this week, specifically fixing some nasty
  truncation of return values from firmware calls and resolving a
  VM_BUG_ON due to accessing uninitialised struct pages corresponding to
  NOMAP pages.

  Summary:

   - Fix typos in SVE documentation

   - Fix type-checking and implicit truncation for SMCCC calls

   - Force CONFIG_HOLES_IN_ZONE=y so that SLAB doesn't fall over NOMAP
     regions"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: mm: always enable CONFIG_HOLES_IN_ZONE
  arm/arm64: smccc-1.1: Handle function result as parameters
  arm/arm64: smccc-1.1: Make return values unsigned long
  Documentation/arm64/sve: Couple of improvements and typos
2018-08-31 09:20:30 -07:00
Joerg Roedel eeb89e2bb1 x86/efi: Load fixmap GDT in efi_call_phys_epilog()
When PTI is enabled on x86-32 the kernel uses the GDT mapped in the fixmap
for the simple reason that this address is also mapped for user-space.

The efi_call_phys_prolog()/efi_call_phys_epilog() wrappers change the GDT
to call EFI runtime services and switch back to the kernel GDT when they
return. But the switch-back uses the writable GDT, not the fixmap GDT.

When that happened and and the CPU returns to user-space it switches to the
user %cr3 and tries to restore user segment registers. This fails because
the writable GDT is not mapped in the user page-table, and without a GDT
the fault handlers also can't be launched. The result is a triple fault and
reboot of the machine.

Fix that by restoring the GDT back to the fixmap GDT which is also mapped
in the user page-table.

Fixes: 7757d607c6 x86/pti: ('Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: hpa@zytor.com
Cc: linux-efi@vger.kernel.org
Link: https://lkml.kernel.org/r/1535702738-10971-1-git-send-email-joro@8bytes.org
2018-08-31 17:45:54 +02:00
Linus Torvalds 4290d5b9ca xen: fixes for 4.19-rc2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW4lM6AAKCRCAXGG7T9hj
 vs8AAQDysFccg97UdopW3B7yklIaRqkfEIAsxe65f191MXsH2AEAp5SKxZqRPqBP
 a9WHDj8ShB3BhZ/IxpdO9Y59U3Jo4wA=
 =Gt4c
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:

 - minor cleanup avoiding a warning when building with new gcc

 - a patch to add a new sysfs node for Xen frontend/backend drivers to
   make it easier to obtain the state of a pv device

 - two fixes for 32-bit pv-guests to avoid intermediate L1TF vulnerable
   PTEs

* tag 'for-linus-4.19b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen: remove redundant variable save_pud
  xen: export device state to sysfs
  x86/pae: use 64 bit atomic xchg function in native_ptep_get_and_clear
  x86/xen: don't write ptes directly in 32-bit PV guests
2018-08-31 08:45:16 -07:00
Linus Torvalds 01f6543a0d m68k fixes for 4.19
- Fix wrong date and time on PMU-based Macs.
 -----BEGIN PGP SIGNATURE-----
 
 iIsEABYIADMWIQQ9qaHoIs/1I4cXmEiKwlD9ZEnxcAUCW4lCFhUcZ2VlcnRAbGlu
 dXgtbTY4ay5vcmcACgkQisJQ/WRJ8XButAD/Z+zyKOaFZQ28cYfAmhUMgi4LvICF
 THHND3O321KT5WEA/1E1/SxqvH5juQCoaF7GdJGVIQ6E0w6WYgt8LC1WFWML
 =J7Db
 -----END PGP SIGNATURE-----

Merge tag 'm68k-for-v4.19-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

Pull m68k fix from Geert Uytterhoeven:
 "Just a single fix for a bug introduced during the merge window: fix
  wrong date and time on PMU-based Macs"

* tag 'm68k-for-v4.19-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k/mac: Use correct PMU response format
2018-08-31 08:42:46 -07:00
Linus Torvalds 754cf4b243 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:

 - regression fixes for i801 and designware

 - better API and leak fix for releasing DMA safe buffers

 - better greppable strings for the bitbang algorithm

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: sh_mobile: fix leak when using DMA bounce buffer
  i2c: sh_mobile: define start_ch() void as it only returns 0 anyhow
  i2c: refactor function to release a DMA safe buffer
  i2c: algos: bit: make the error messages grepable
  i2c: designware: Re-init controllers with pm_disabled set on resume
  i2c: i801: Allow ACPI AML access I/O ports not reserved for SMBus
2018-08-31 08:38:53 -07:00
Andy Lutomirski 4012e77a90 x86/nmi: Fix NMI uaccess race against CR3 switching
A NMI can hit in the middle of context switching or in the middle of
switch_mm_irqs_off().  In either case, CR3 might not match current->mm,
which could cause copy_from_user_nmi() and friends to read the wrong
memory.

Fix it by adding a new nmi_uaccess_okay() helper and checking it in
copy_from_user_nmi() and in __copy_from_user_nmi()'s callers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Rik van Riel <riel@surriel.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org
2018-08-31 17:08:22 +02:00
Ben Hutchings 829fe4aa9a x86: Allow generating user-space headers without a compiler
When bootstrapping an architecture, it's usual to generate the kernel's
user-space headers (make headers_install) before building a compiler.  Move
the compiler check (for asm goto support) to the archprepare target so that
it is only done when building code for the target.

Fixes: e501ce957a ("x86: Force asm-goto")
Reported-by: Helmut Grohne <helmutg@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180829194317.GA4765@decadent.org.uk
2018-08-31 17:08:22 +02:00
Jann Horn 342db04ae7 x86/dumpstack: Don't dump kernel memory based on usermode RIP
show_opcodes() is used both for dumping kernel instructions and for dumping
user instructions. If userspace causes #PF by jumping to a kernel address,
show_opcodes() can be reached with regs->ip controlled by the user,
pointing to kernel code. Make sure that userspace can't trick us into
dumping kernel memory into dmesg.

Fixes: 7cccf0725c ("x86/dumpstack: Add a show_ip() function")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: security@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com
2018-08-31 17:08:22 +02:00
Rob Herring 0413bedabc of: Add device_type access helper functions
In preparation to remove direct access to device_node.type, add
of_node_is_type() and of_node_get_device_type() helpers to check and
retrieve the device type.

Cc: Frank Rowand <frowand.list@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
2018-08-31 08:30:42 -04:00
Mukesh Ojha 6fb86d9720 cpu/hotplug: Remove skip_onerr field from cpuhp_step structure
When notifiers were there, `skip_onerr` was used to avoid calling
particular step startup/teardown callbacks in the CPU up/down rollback
path, which made the hotplug asymmetric.

As notifiers are gone now after the full state machine conversion, the
`skip_onerr` field is no longer required.

Remove it from the structure and its usage.

Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1535439294-31426-1-git-send-email-mojha@codeaurora.org
2018-08-31 14:13:03 +02:00
James Morse f52bb98f5a arm64: mm: always enable CONFIG_HOLES_IN_ZONE
Commit 6d526ee26c ("arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA")
only enabled HOLES_IN_ZONE for NUMA systems because the NUMA code was
choking on the missing zone for nomap pages. This problem doesn't just
apply to NUMA systems.

If the architecture doesn't set HAVE_ARCH_PFN_VALID, pfn_valid() will
return true if the pfn is part of a valid sparsemem section.

When working with multiple pages, the mm code uses pfn_valid_within()
to test each page it uses within the sparsemem section is valid. On
most systems memory comes in MAX_ORDER_NR_PAGES chunks which all
have valid/initialised struct pages. In this case pfn_valid_within()
is optimised out.

Systems where this isn't true (e.g. due to nomap) should set
HOLES_IN_ZONE and provide HAVE_ARCH_PFN_VALID so that mm tests each
page as it works with it.

Currently non-NUMA arm64 systems can't enable HOLES_IN_ZONE, leading to
a VM_BUG_ON():

| page:fffffdff802e1780 is uninitialized and poisoned
| raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
| raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
| page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
| ------------[ cut here ]------------
| kernel BUG at include/linux/mm.h:978!
| Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[...]
| CPU: 1 PID: 25236 Comm: dd Not tainted 4.18.0 #7
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 40000085 (nZcv daIf -PAN -UAO)
| pc : move_freepages_block+0x144/0x248
| lr : move_freepages_block+0x144/0x248
| sp : fffffe0071177680
[...]
| Process dd (pid: 25236, stack limit = 0x0000000094cc07fb)
| Call trace:
|  move_freepages_block+0x144/0x248
|  steal_suitable_fallback+0x100/0x16c
|  get_page_from_freelist+0x440/0xb20
|  __alloc_pages_nodemask+0xe8/0x838
|  new_slab+0xd4/0x418
|  ___slab_alloc.constprop.27+0x380/0x4a8
|  __slab_alloc.isra.21.constprop.26+0x24/0x34
|  kmem_cache_alloc+0xa8/0x180
|  alloc_buffer_head+0x1c/0x90
|  alloc_page_buffers+0x68/0xb0
|  create_empty_buffers+0x20/0x1ec
|  create_page_buffers+0xb0/0xf0
|  __block_write_begin_int+0xc4/0x564
|  __block_write_begin+0x10/0x18
|  block_write_begin+0x48/0xd0
|  blkdev_write_begin+0x28/0x30
|  generic_perform_write+0x98/0x16c
|  __generic_file_write_iter+0x138/0x168
|  blkdev_write_iter+0x80/0xf0
|  __vfs_write+0xe4/0x10c
|  vfs_write+0xb4/0x168
|  ksys_write+0x44/0x88
|  sys_write+0xc/0x14
|  el0_svc_naked+0x30/0x34
| Code: aa1303e0 90001a01 91296421 94008902 (d4210000)
| ---[ end trace 1601ba47f6e883fe ]---

Remove the NUMA dependency.

Link: https://www.spinics.net/lists/arm-kernel/msg671851.html
Cc: <stable@vger.kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-08-31 11:06:45 +01:00
Vincent Whitchurch d49b48f088 gpio: Fix crash due to registration race
gpiochip_add_data_with_key() adds the gpiochip to the gpio_devices list
before of_gpiochip_add() is called, but it's only the latter which sets
the ->of_xlate function pointer.  gpiochip_find() can be called by
someone else between these two actions, and it can find the chip and
call of_gpiochip_match_node_and_xlate() which leads to the following
crash due to a NULL ->of_xlate().

 Unhandled prefetch abort: page domain fault (0x01b) at 0x00000000
 Modules linked in: leds_gpio(+) gpio_generic(+)
 CPU: 0 PID: 830 Comm: insmod Not tainted 4.18.0+ #43
 Hardware name: ARM-Versatile Express
 PC is at   (null)
 LR is at of_gpiochip_match_node_and_xlate+0x2c/0x38
 Process insmod (pid: 830, stack limit = 0x(ptrval))
  (of_gpiochip_match_node_and_xlate) from  (gpiochip_find+0x48/0x84)
  (gpiochip_find) from  (of_get_named_gpiod_flags+0xa8/0x238)
  (of_get_named_gpiod_flags) from  (gpiod_get_from_of_node+0x2c/0xc8)
  (gpiod_get_from_of_node) from  (devm_fwnode_get_index_gpiod_from_child+0xb8/0x144)
  (devm_fwnode_get_index_gpiod_from_child) from  (gpio_led_probe+0x208/0x3c4 [leds_gpio])
  (gpio_led_probe [leds_gpio]) from  (platform_drv_probe+0x48/0x9c)
  (platform_drv_probe) from  (really_probe+0x1d0/0x3d4)
  (really_probe) from  (driver_probe_device+0x78/0x1c0)
  (driver_probe_device) from  (__driver_attach+0x120/0x13c)
  (__driver_attach) from  (bus_for_each_dev+0x68/0xb4)
  (bus_for_each_dev) from  (bus_add_driver+0x1a8/0x268)
  (bus_add_driver) from  (driver_register+0x78/0x10c)
  (driver_register) from  (do_one_initcall+0x54/0x1fc)
  (do_one_initcall) from  (do_init_module+0x64/0x1f4)
  (do_init_module) from  (load_module+0x2198/0x26ac)
  (load_module) from  (sys_finit_module+0xe0/0x110)
  (sys_finit_module) from  (ret_fast_syscall+0x0/0x54)

One way to fix this would be to rework the hairy registration sequence
in gpiochip_add_data_with_key(), but since I'd probably introduce a
couple of new bugs if I attempted that, simply add a check for a
non-NULL of_xlate function pointer in
of_gpiochip_match_node_and_xlate().  This works since the driver looking
for the gpio will simply fail to find the gpio and defer its probe and
be reprobed when the driver which is registering the gpiochip has fully
completed its probe.

Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2018-08-31 11:30:45 +02:00
Finn Thain 0986b16ab4 m68k/mac: Use correct PMU response format
Now that the 68k Mac port has adopted the via-pmu driver, it must decode
the PMU response accordingly otherwise the date and time will be wrong.

Fixes: ebd722275f ("macintosh/via-pmu: Replace via-pmu68k driver with via-pmu driver")
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
2018-08-31 09:33:45 +02:00
Linus Torvalds 4658aff6ee drm fixes for mediatek, amdgpu and i915.
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbiLmbAAoJEAx081l5xIa+bDwP/0g7rknYVIMDFFw4NQBcyLmT
 yetrdF+/fnIBYRJqDz8kIDMn26ftNvO4qeWSaikN2V5Xt6E8JyVKsMIFKmMP5Amx
 VqZA286vQTvcXAGcMecDQQ4P7LOIC/i2gA3NSp7XGTApbhiYHQM4Xe1O9BBrNCMa
 amA+ZoGKR/eJ3ihs8+VCWeGS+wh+O05LRj41fJXmOMYRL97IsxLX0VTNK3srlrbC
 F8nrfJxwGRqhgB634HXAAnbC8/CsxJZvSIVs5qg2qTVVi13VP+1sCyzvUt5tV9QN
 q2afF3vPtbN4Qvb1WfwZNjsPCDCvWGXBXSTgpvumHsoHaACJUHaf3JOT3HLDNjJ/
 YV9PTZOYLa8R2zQvHsGFjy0eWW5hWJ+FNhOl/bxL3T0+4aoEMCUr+Xf8Bknzicar
 jxpRVU8RU6G2wWBfxgKXSvcPx+1NGaotMNZhFUmiFks2oX9f0luUtYkZ3V9WHN4f
 ke87bHMVqjRHUFlVx8Qrnl01L9b6wlaowuAi9pPOqzOhr9l38ompO9P/MAu7RXUN
 YD5bKCrPVMl6nCjE7viKN13+tBLAbn1khpnRTpMu9ygU+2d4Uxo/+HljN/cXLrYG
 rKtNRq0dI1hBoKxBXtaBsJs3TvdqI8UqwSn2KrDKVZwvLwLasYsXmJVa9UGvBDJB
 8TPc6aupyN/ilDBCBGqJ
 =0BQW
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-2018-08-31' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Regular fixes pull:

   - Mediatek has a bunch of fixes to their RDMA and Overlay engines.

   - i915 has some Cannonlake/Geminilake watermark workarounds, LSPCON
     fix, HDCP free fix, audio fix and a ppgtt reference counting fix.

   - amdgpu has some SRIOV, Kasan, memory leaks and other misc fixes"

* tag 'drm-fixes-2018-08-31' of git://anongit.freedesktop.org/drm/drm: (35 commits)
  drm/i915/audio: Hook up component bindings even if displays are disabled
  drm/i915: Increase LSPCON timeout
  drm/i915: Stop holding a ref to the ppgtt from each vma
  drm/i915: Free write_buf that we allocated with kzalloc.
  drm/i915: Fix glk/cnl display w/a #1175
  drm/amdgpu: Need to set moved to true when evict bo
  drm/amdgpu: Remove duplicated power source update
  drm/amd/display: Fix memory leak caused by missed dc_sink_release
  drm/amdgpu: fix holding mn_lock while allocating memory
  drm/amdgpu: Power on uvd block when hw_fini
  drm/amdgpu: Update power state at the end of smu hw_init.
  drm/amdgpu: Fix vce initialize failed on Kaveri/Mullins
  drm/amdgpu: Enable/disable gfx PG feature in rlc safe mode
  drm/amdgpu: Adjust the VM size based on system memory size v2
  drm/mediatek: fix connection from RDMA2 to DSI1
  drm/mediatek: update some variable name from ovl to comp
  drm/mediatek: use layer_nr function to get layer number to init plane
  drm/mediatek: add function to return RDMA layer number
  drm/mediatek: add function to return OVL layer number
  drm/mediatek: add function to get layer number for component
  ...
2018-08-30 21:18:05 -07:00
Stephen Rothwell 217c3e0196 disable stringop truncation warnings for now
They are too noisy

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-30 18:15:34 -07:00
Linus Torvalds b6935d2aa4 Power management fixes for 4.19-rc2
- Make the menu cpuidle governor avoid stopping the scheduler tick
    if the predicted idle duration exceeds the tick period length, but
    the selected idle state is shallow and deeper idle states with
    high target residencies are available (Rafael Wysocki).
 
  - Make the PM core's generic clock management code use a proper data
    type for one variable to make error handling work (Dan Carpenter).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJbiH91AAoJEILEb/54YlRxhgwQAITWMBHl0zWwqD7NrXwR96Nw
 hKpJLjhdJb7i9ZYI9TwvlPlivNgheY2h4pmsGrT4QoKxj9HWH3L8+8tvapucXWcM
 FtgaS9klvFaoPWDqMUpdt59nKcjhUrq8N17NRacqH2aYgRRV1jgYEdwrsYuV+HfE
 YfYeCsulKhhueWRPC9OeKzn2XGv9P/gQZRn0qRpybrChxN8IzWmrlBPfTDU1lh3m
 mr1ggVWVofljjRsi0zwi2Rv0dsE4HcofAqHE99u64Vv7NEtO+IDyOUZP/VHfmJop
 IniaKWhOEjlaBysBnTLo0XKhAOq/7suNzamF74oWg6waEIjwV2qGvYaR/o1nXR/9
 SCA+OptJ+jEVuzgzpfRtpR9u4u22/m+oZ0+lK5GxbgJEz46PdDhrl5gYNeLCy3va
 ArWZ9T9CBJhOdeD/lXfg9yX8r+mwb3hwJCFSsG6Grcrh7hGRCkqcv6XBT+KeoO9Z
 XG4d0ftInOTc8AiztmfhFntpDCSDEf6YFLaLsyz3BDECN0ZChW5AmVJ/XmoiaZ8Y
 J6dRjPVzOFwh6TfHmUKilbSoRz0dHsTmGKxOJJAEOlExO+R1AS4vIFHXZTWqF3Re
 M1YCxQ5CnKGxJbH3EKXuFlx5dXOglS6//LLTT+EgOtXKF13VXzNrx2oUlR56c0By
 MGDrS+8cedr8eeh5Y3p1
 =Kx6O
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These address a corner case in the menu cpuidle governor and fix error
  handling in the PM core's generic clock management code.

  Specifics:

   - Make the menu cpuidle governor avoid stopping the scheduler tick if
     the predicted idle duration exceeds the tick period length, but the
     selected idle state is shallow and deeper idle states with high
     target residencies are available (Rafael Wysocki).

   - Make the PM core's generic clock management code use a proper data
     type for one variable to make error handling work (Dan Carpenter)"

* tag 'pm-4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpuidle: menu: Retain tick when shallow state is selected
  PM / clk: signedness bug in of_pm_clk_add_clks()
2018-08-30 18:02:02 -07:00
Masahiro Yamada 2b52e2a67c arc: remove redundant GCC version checks
Commit cafa0010cd ("Raise the minimum required gcc version to 4.6")
bumped the minimum GCC version to 4.6 for all architectures.

With GCC >= 4.6 assumed, 'upto_gcc44' is empty, 'atleast_gcc44' is y.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2018-08-30 17:51:44 -07:00
Rafael J. Wysocki a0b9c4de7b Merge branch 'pm-core'
Merge a generic clock management fix for 4.19-rc2.

* pm-core:
  PM / clk: signedness bug in of_pm_clk_add_clks()
2018-08-31 01:23:31 +02:00
Akshu Agrawal bded6c03e3 clk: x86: Set default parent to 48Mhz
System clk provided in ST soc can be set to:
48Mhz, non-spread
25Mhz, spread
To get accurate rate, we need it to set it at non-spread
option which is 48Mhz.

Signed-off-by: Akshu Agrawal <akshu.agrawal@amd.com>
Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
Fixes: 421bf6a1f0 ("clk: x86: Add ST oscout platform clock")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-08-30 14:47:41 -07:00
Wolfram Sang cebc07d84a i2c: sh_mobile: fix leak when using DMA bounce buffer
We only freed the bounce buffer after successful DMA, missing the cases
where DMA setup may have gone wrong. Use a better location which always
gets called after each message and use 'stop_after_dma' as a flag for a
successful transfer.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-08-30 23:13:59 +02:00
Wolfram Sang 531db50170 i2c: sh_mobile: define start_ch() void as it only returns 0 anyhow
After various refactoring over the years, start_ch() doesn't return
errno anymore, so make the function return void. This saves the error
handling when calling it which in turn eases cleanup of resources of a
future patch.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-08-30 23:13:30 +02:00
Wolfram Sang 82fe39a6bc i2c: refactor function to release a DMA safe buffer
a) rename to 'put' instead of 'release' to match 'get' when obtaining
   the buffer
b) change the argument order to have the buffer as first argument
c) add a new argument telling the function if the message was
   transferred. This allows the function to be used also in cases
   where setting up DMA failed, so the buffer needs to be freed without
   syncing to the message buffer.

Also convert the only user.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-08-30 23:13:15 +02:00
Jan Kundrát 1204d12a49 i2c: algos: bit: make the error messages grepable
Yep, I went looking for one of these, and I wasn't able to find it
easily.  That's worse than a line which is 82-chars long, IMHO.

Signed-off-by: Jan Kundrát <jan.kundrat@cesnet.cz>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-08-30 23:11:42 +02:00
Hans de Goede 9d9a152eba i2c: designware: Re-init controllers with pm_disabled set on resume
On Bay Trail and Cherry Trail devices we set the pm_disabled flag for I2C
busses which the OS shares with the PUNIT as these need special handling.
Until now we called dev_pm_syscore_device(dev, true) for I2C controllers
with this flag set to keep these I2C controllers always on.

After commit 12864ff854 ("ACPI / LPSS: Avoid PM quirks on suspend and
resume from hibernation"), this no longer works. This commit modifies
lpss_iosf_exit_d3_state() to only run if lpss_iosf_enter_d3_state() has ran
before it, so that it does not run on a resume from hibernate (or from S3).

On these systems the conditions for lpss_iosf_enter_d3_state() to run
never become true, so lpss_iosf_exit_d3_state() never gets called and
the 2 LPSS DMA controllers never get forced into D0 mode, instead they
are left in their default automatic power-on when needed mode.

The not forcing of D0 mode for the DMA controllers enables these systems
to properly enter S0ix modes, which is a good thing.

But after entering S0ix modes the I2C controller connected to the PMIC
no longer works, leading to e.g. broken battery monitoring.

The _PS3 method for this I2C controller looks like this:

            Method (_PS3, 0, NotSerialized)  // _PS3: Power State 3
            {
                If ((((PMID == 0x04) || (PMID == 0x05)) || (PMID == 0x06)))
                {
                    Return (Zero)
                }

                PSAT |= 0x03
                Local0 = PSAT /* \_SB_.I2C5.PSAT */
            }

Where PMID = 0x05, so we enter the Return (Zero) path on these systems.

So even if we were to not call dev_pm_syscore_device(dev, true) the
I2C controller will be left in D0 rather then be switched to D3.

Yet on other Bay and Cherry Trail devices S0ix is not entered unless *all*
I2C controllers are in D3 mode. This combined with the I2C controller no
longer working now that we reach S0ix states on these systems leads to me
believing that the PUNIT itself puts the I2C controller in D3 when all
other conditions for entering S0ix states are true.

Since now the I2C controller is put in D3 over a suspend/resume we must
re-initialize it afterwards and that does indeed fix it no longer working.

This commit implements this fix by:

1) Making the suspend_late callback a no-op if pm_disabled is set and
making the resume_early callback skip the clock re-enable (since it now was
not disabled) while still doing the necessary I2C controller re-init.

2) Removing the dev_pm_syscore_device(dev, true) call, so that the suspend
and resume callbacks are actually called. Normally this would cause the
ACPI pm code to call _PS3 putting the I2C controller in D3, wreaking havoc
since it is shared with the PUNIT, but in this special case the _PS3 method
is a no-op so we can safely allow a "fake" suspend / resume.

Fixes: 12864ff854 ("ACPI / LPSS: Avoid PM quirks on suspend and resume ...")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200861
Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2018-08-30 23:02:13 +02:00