Граф коммитов

2540 Коммитов

Автор SHA1 Сообщение Дата
David Decotigny 3f1ac7a700 net: ethtool: add new ETHTOOL_xLINKSETTINGS API
This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
handled by the new get_link_ksettings/set_link_ksettings callbacks.
This API provides support for most legacy ethtool_cmd fields, adds
support for larger link mode masks (up to 4064 bits, variable length),
and removes ethtool_cmd deprecated
fields (transceiver/maxrxpkt/maxtxpkt).

This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides
the following backward compatibility properties:
 - legacy ethtool with legacy drivers: no change, still using the
   get_settings/set_settings callbacks.
 - legacy ethtool with new get/set_link_ksettings drivers: the new
   driver callbacks are used, data internally converted to legacy
   ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link
   mode mask. ETHTOOL_SSET will fail if user tries to set the
   ethtool_cmd deprecated fields to
   non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if
   driver sets higher bits.
 - future ethtool with legacy drivers: no change, still using the
   get_settings/set_settings callbacks, internally converted to new data
   structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be
   ignored and seen as 0 from user space. Note that that "future"
   ethtool tool will not allow changes to these deprecated fields.
 - future ethtool with new drivers: direct call to the new callbacks.

By "future" ethtool, what is meant is:
 - query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if
   fails
 - set: query first and remember which of ETHTOOL_GLINKSETTINGS or
   ETHTOOL_GSET was successful
   + if ETHTOOL_GLINKSETTINGS was successful, then change config with
     ETHTOOL_SLINKSETTINGS. A failure there is final (do not try
     ETHTOOL_SSET).
   + otherwise ETHTOOL_GSET was successful, change config with
     ETHTOOL_SSET. A failure there is final (do not try
     ETHTOOL_SLINKSETTINGS).

The interaction user/kernel via the new API requires a small
ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link
mode bitmaps. If kernel doesn't agree with user, it returns the bitmap
length it is expecting from user as a negative length (and cmd field is
0). When kernel and user agree, kernel returns valid info in all
fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS).

Data structure crossing user/kernel boundary is 32/64-bit
agnostic. Converted internally to a legal kernel bitmap.

The internal __ethtool_get_settings kernel helper will gradually be
replaced by __ethtool_get_link_ksettings by the time the first
"link_settings" drivers start to appear. So this patch doesn't change
it, it will be removed before it needs to be changed.

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 22:06:45 -05:00
David Ahern f1705ec197 net: ipv6: Make address flushing on ifdown optional
Currently, all ipv6 addresses are flushed when the interface is configured
down, including global, static addresses:

    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
        inet6 2100:1::2/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    << nothing; all addresses have been flushed>>

Add a new sysctl to make this behavior optional. The new setting defaults to
flush all addresses to maintain backwards compatibility. When the set global
addresses with no expire times are not flushed on an admin down. The sysctl
is per-interface or system-wide for all interfaces

    $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
or
    $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1

Will keep addresses on eth1 on an admin down.

    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
        inet6 2100:1::2/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
        inet6 2100:1::2/120 scope global tentative
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
           valid_lft forever preferred_lft forever

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 21:45:15 -05:00
Daniel Borkmann 2f72959a9c bpf: fix csum update in bpf_l4_csum_replace helper for udp
When using this helper for updating UDP checksums, we need to extend
this in order to write CSUM_MANGLED_0 for csum computations that result
into 0 as sum. Reason we need this is because packets with a checksum
could otherwise become incorrectly marked as a packet without a checksum.
Likewise, if the user indicates BPF_F_MARK_MANGLED_0, then we should
not turn packets without a checksum into ones with a checksum.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21 22:07:10 -05:00
Daniel Borkmann 7d672345ed bpf: add generic bpf_csum_diff helper
For L4 checksums, we currently have bpf_l4_csum_replace() helper. It's
currently limited to handle 2 and 4 byte changes in a header and feeds the
from/to into inet_proto_csum_replace{2,4}() helpers of the kernel. When
working with IPv6, for example, this makes it rather cumbersome to deal
with, similarly when editing larger parts of a header.

Instead, extend the API in a more generic way: For bpf_l4_csum_replace(),
add a case for header field mask of 0 to change the checksum at a given
offset through inet_proto_csum_replace_by_diff(), and provide a helper
bpf_csum_diff() that can generically calculate a from/to diff for arbitrary
amounts of data.

This can be used in multiple ways: for the bpf_l4_csum_replace() only
part, this even provides us with the option to insert precalculated diffs
from user space f.e. from a map, or from bpf_csum_diff() during runtime.

bpf_csum_diff() has a optional from/to stack buffer input, so we can
calculate a diff by using a scratchbuffer for scenarios where we're
inserting (from is NULL), removing (to is NULL) or diffing (from/to buffers
don't need to be of equal size) data. Also, bpf_csum_diff() allows to
feed a previous csum into csum_partial(), so the function can also be
cascaded.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21 22:07:09 -05:00
Alexei Starovoitov d5a3b1f691 bpf: introduce BPF_MAP_TYPE_STACK_TRACE
add new map type to store stack traces and corresponding helper
bpf_get_stackid(ctx, map, flags) - walk user or kernel stack and return id
@ctx: struct pt_regs*
@map: pointer to stack_trace map
@flags: bits 0-7 - numer of stack frames to skip
        bit 8 - collect user stack instead of kernel
        bit 9 - compare stacks by hash only
        bit 10 - if two different stacks hash into the same stackid
                 discard old
        other bits - reserved
Return: >= 0 stackid on success or negative error

stackid is a 32-bit integer handle that can be further combined with
other data (including other stackid) and used as a key into maps.

Userspace will access stackmap using standard lookup/delete syscall commands to
retrieve full stack trace for given stackid.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-20 00:21:44 -05:00
Kan Liang ac2c7ad0e5 net/ethtool: introduce a new ioctl for per queue setting
Introduce a new ioctl ETHTOOL_PERQUEUE for per queue parameters setting.
The following patches will enable some SUB_COMMANDs for per queue
setting.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19 22:54:09 -05:00
Nikolay Aleksandrov 2125715635 bridge: mdb: add support for more attributes and export timer
Currently mdb entries are exported directly as a structure inside
MDBA_MDB_ENTRY_INFO attribute, we can't really extend it without
breaking user-space. In order to export new mdb fields, I've converted
the MDBA_MDB_ENTRY_INFO into a nested attribute which starts like before
with struct br_mdb_entry (without header, as it's casted directly in
iproute2) and continues with MDBA_MDB_EATTR_ attributes. This way we
keep compatibility with older users and can export new data.
I've tested this with iproute2, both with and without support for the
added attribute and it works fine.
So basically we again have MDBA_MDB_ENTRY_INFO with struct br_mdb_entry
inside but it may contain also some additional MDBA_MDB_EATTR_ attributes
such as MDBA_MDB_EATTR_TIMER which can be parsed by user-space.

So the new structure is:
[MDBA_MDB] = {
     [MDBA_MDB_ENTRY] = {
         [MDBA_MDB_ENTRY_INFO]
         [MDBA_MDB_ENTRY_INFO] { <- Nested attribute
             struct br_mdb_entry <- nla_put_nohdr()
             [MDBA_MDB_ENTRY attributes] <- normal netlink attributes
         }
     }
}

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19 15:27:36 -05:00
Florian Westphal d1b4c689d4 netlink: remove mmapped netlink support
mmapped netlink has a number of unresolved issues:

- TX zerocopy support had to be disabled more than a year ago via
  commit 4682a03586 ("netlink: Always copy on mmap TX.")
  because the content of the mmapped area can change after netlink
  attribute validation but before message processing.

- RX support was implemented mainly to speed up nfqueue dumping packet
  payload to userspace.  However, since commit ae08ce0021
  ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy
  with the socket-based interface too (via the skb_zerocopy helper).

The other problem is that skbs attached to mmaped netlink socket
behave different from normal skbs:

- they don't have a shinfo area, so all functions that use skb_shinfo()
(e.g. skb_clone) cannot be used.

- reserving headroom prevents userspace from seeing the content as
it expects message to start at skb->head.
See for instance
commit aa3a022094 ("netlink: not trim skb for mmaped socket when dump").

- skbs handed e.g. to netlink_ack must have non-NULL skb->sk, else we
crash because it needs the sk to check if a tx ring is attached.

Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf359
("netfilter: nfnetlink: use original skbuff when acking batches").

mmaped netlink also didn't play nicely with the skb_zerocopy helper
used by nfqueue and openvswitch.  Daniel Borkmann fixed this via
commit 6bb0fef489 ("netlink, mmap: fix edge-case leakages in nf queue
zero-copy")' but at the cost of also needing to provide remaining
length to the allocation function.

nfqueue also has problems when used with mmaped rx netlink:
- mmaped netlink doesn't allow use of nfqueue batch verdict messages.
  Problem is that in the mmap case, the allocation time also determines
  the ordering in which the frame will be seen by userspace (A
  allocating before B means that A is located in earlier ring slot,
  but this also means that B might get a lower sequence number then A
  since seqno is decided later.  To fix this we would need to extend the
  spinlocked region to also cover the allocation and message setup which
  isn't desirable.
- nfqueue can now be configured to queue large (GSO) skbs to userspace.
  Queing GSO packets is faster than having to force a software segmentation
  in the kernel, so this is a desirable option.  However, with a mmap based
  ring one has to use 64kb per ring slot element, else mmap has to fall back
  to the socket path (NL_MMAP_STATUS_COPY) for all large packets.

To use the mmap interface, userspace not only has to probe for mmap netlink
support, it also has to implement a recv/socket receive path in order to
handle messages that exceed the size of an rx ring element.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ken-ichirou MATSUZAWA <chamaken@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 11:42:18 -05:00
Eric Dumazet cd9b266095 tcp: add tcpi_min_rtt and tcpi_notsent_bytes to tcp_info
tcpi_min_rtt reports the minimal rtt observed by TCP stack for the flow,
in usec unit. Might be ~0U if not yet known.

tcpi_notsent_bytes reports the amount of bytes in the write queue that
were not yet sent.

This is done in a single patch to not add a temporary 32bit padding hole
in tcp_info.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:27:35 -05:00
Nikolay Aleksandrov e02564ee33 ethtool: make validate_speed accept all speeds between 0 and INT_MAX
Devices these days can have any speed and as was recently pointed out
any speed from 0 to INT_MAX is valid so adjust speed validation to
accept such values.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 11:55:38 -05:00
Tycho Andersen 4a92602aa1 openvswitch: allow management from inside user namespaces
Operations with the GENL_ADMIN_PERM flag fail permissions checks because
this flag means we call netlink_capable, which uses the init user ns.

Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations
which should be allowed inside a user namespace.

The motivation for this is to be able to run openvswitch in unprivileged
containers. I've tested this and it seems to work, but I really have no
idea about the security consequences of this patch, so thoughts would be
much appreciated.

v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
    massive one

Reported-by: James Page <james.page@canonical.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Eric Biederman <ebiederm@xmission.com>
CC: Pravin Shelar <pshelar@ovn.org>
CC: Justin Pettit <jpettit@nicira.com>
CC: "David S. Miller" <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 09:53:19 -05:00
Michael S. Tsirkin 4456ed04ea ethtool: future-proof interface for speed extensions
Many virtual and not quite virtual devices allow any speed to be set
through ethtool. In particular, this applies to the virtio-net devices.
Document this fact to make sure people don't assume the enum lists all
possible values.  Reserve values greater than INT_MAX for future
extension and to avoid conflict with SPEED_UNKNOWN.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 09:52:12 -05:00
Edward Cree 72bb68721f ethtool: add IPv6 to the NFC API
Signed-off-by: Edward Cree <ecree@solarflare.com>
Reviewed-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 07:16:18 -05:00
Johannes Berg 7a02bf892d ipv6: add option to drop unsolicited neighbor advertisements
In certain 802.11 wireless deployments, there will be NA proxies
that use knowledge of the network to correctly answer requests.
To prevent unsolicitd advertisements on the shared medium from
being a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_unsolicited_na".

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 04:27:36 -05:00
Johannes Berg abbc30436d ipv6: add option to drop unicast encapsulated in L2 multicast
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv6 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.

Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 04:27:36 -05:00
Johannes Berg 97daf33145 ipv4: add option to drop gratuitous ARP packets
In certain 802.11 wireless deployments, there will be ARP proxies
that use knowledge of the network to correctly answer requests.
To prevent gratuitous ARP frames on the shared medium from being
a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_gratuitous_arp".

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 04:27:35 -05:00
Johannes Berg 12b74dfadb ipv4: add option to drop unicast encapsulated in L2 multicast
In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv4 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.

Additionally, enabling this option provides compliance with a SHOULD
clause of RFC 1122.

Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 04:27:35 -05:00
Elad Raz 157ede6784 bridge: mdb: add support for offloaded mdb entries
Add new bitmask member 'flags' to br_mdb_entry structure. Adding
MDB_FLAGS_OFFLOAD bit which indicates MDB entries is offloaded to hardware.

Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09 04:42:47 -05:00
Nikolay Aleksandrov 103a8ad1fa ethtool: add speed/duplex validation functions
Add functions which check if the speed/duplex are defined.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07 14:30:45 -05:00
David Ahern 67eb03318b net: Add support for fill_slave_info to VRF device
Allows userspace to have direct access to VRF table association
versus looking up master device and its table.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07 14:02:22 -05:00
Alexei Starovoitov a10423b87a bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map
Primary use case is a histogram array of latency
where bpf program computes the latency of block requests or other
events and stores histogram of latency into array of 64 elements.
All cpus are constantly running, so normal increment is not accurate,
bpf_xadd causes cache ping-pong and this per-cpu approach allows
fastest collision-free counters.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06 03:34:36 -05:00
Alexei Starovoitov 824bd0ce6c bpf: introduce BPF_MAP_TYPE_PERCPU_HASH map
Introduce BPF_MAP_TYPE_PERCPU_HASH map type which is used to do
accurate counters without need to use BPF_XADD instruction which turned
out to be too costly for high-performance network monitoring.
In the typical use case the 'key' is the flow tuple or other long
living object that sees a lot of events per second.

bpf_map_lookup_elem() returns per-cpu area.
Example:
struct {
  u32 packets;
  u32 bytes;
} * ptr = bpf_map_lookup_elem(&map, &key);
/* ptr points to this_cpu area of the value, so the following
 * increments will not collide with other cpus
 */
ptr->packets ++;
ptr->bytes += skb->len;

bpf_update_elem() atomically creates a new element where all per-cpu
values are zero initialized and this_cpu value is populated with
given 'value'.
Note that non-per-cpu hash map always allocates new element
and then deletes old after rcu grace period to maintain atomicity
of update. Per-cpu hash map updates element values in-place.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06 03:34:35 -05:00
Jarod Wilson 6e7333d315 net: add rx_nohandler stat counter
This adds an rx_nohandler stat counter, along with a sysfs statistics
node, and copies the counter out via netlink as well.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@mellanox.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Tom Herbert <tom@herbertland.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-06 02:59:51 -05:00
Dan Williams 9f4736fe7c block: revert runtime dax control of the raw block device
Dynamically enabling DAX requires that the page cache first be flushed
and invalidated.  This must occur atomically with the change of DAX mode
otherwise we confuse the fsync/msync tracking and violate data
durability guarantees.  Eliminate the possibilty of DAX-disabled to
DAX-enabled transitions for now and revisit this for the next cycle.

Cc: Jan Kara <jack@suse.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-01-30 13:35:31 -08:00
Linus Torvalds 391f2a16b7 Some locking and page fault bug fixes from Jan Kara, some ext4
encryption fixes from me, and Li Xi's Project Quota commits.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJWnwkiAAoJEPL5WVaVDYGjyAAH/1dj1nNL9h+d12V3zXbvoPkg
 5RFw/2QfMZ+GE3Lln9gxTBDSyo/9m8hUK8eg0WpIRtGX9NbKcyrWEGJa2XF++43k
 tVpKGyN6cqkwPu4M6EPIK9yRvuALGB5PJE/u0q1lu9VoIAgtin3F/bAQK/iHnrUg
 M3+lVDtKcmbhqCdocaLLZD6Q4xlQI3wJne99pYt+Dtx95aOQY9v9SV030i7sOnEt
 R5JrhmfkgNqVTB8Zz0IxOp5LQlOkuyvtnZ44yYgJH8ckCUnDQI2hbksSqcMamJ1Y
 QJWBzRhVXU9gs1nCRy/Xh48mSk+nvZW9aglk+Syzbzg5C63SgwYcqvbCBqJJEdc=
 =HjkT
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Some locking and page fault bug fixes from Jan Kara, some ext4
  encryption fixes from me, and Li Xi's Project Quota commits"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  fs: clean up the flags definition in uapi/linux/fs.h
  ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
  ext4: add project quota support
  ext4: adds project ID support
  ext4 crypto: simplify interfaces to directory entry insert functions
  ext4 crypto: add missing locking for keyring_key access
  ext4: use pre-zeroed blocks for DAX page faults
  ext4: implement allocation of pre-zeroed blocks
  ext4: provide ext4_issue_zeroout()
  ext4: get rid of EXT4_GET_BLOCKS_NO_LOCK flag
  ext4: document lock ordering
  ext4: fix races of writeback with punch hole and zero range
  ext4: fix races between buffered IO and collapse / insert range
  ext4: move unlocked dio protection from ext4_alloc_file_blocks()
  ext4: fix races between page faults and hole punching
2016-01-22 11:23:35 -08:00
Linus Torvalds d5ffdf8b4a xfs: Update 2 for 4.5-rc1
This update contains:
 
 o promotion of XFS_IOC_FS[GS]ETXATTR ioctl to the vfs level so that
   it can be shared with other filesystems. The ext4 project quota
   functionality is the first target for this. The commits in this
   series have not been updated with review or final SOB tags because
   the branch they were originally published in was needed by ext4.
   Those tags are:
 
   Reviewed-by: Theodore Ts'o <tytso@mit.edu>
   Signed-off-by: Dave Chinner <david@fromrobit.com>
 
 o Revert a change that is causing suspend failures.
 o Fix a use-after-free that can occur on log mount failures. Been
   around forever, but now exposed by other changes to log recovery
   made in the first 4.5 merge.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJWoV0hAAoJEK3oKUf0dfodSCQP/RXlZp6TQhv2DQ2MW4AeZRzs
 kzp3zvWUN1udB0fgAARMUDbHHeqEp5gUB6Fj8GOjgh69VGac1pjR2GOvEA9UbnhL
 uLQwaRggVB/BJV6+hDUw283kENXE1H8JcDiEIFratwdiZ6KrhniMptzrbUnG22LO
 cBLzHOCFI0x4ib2fdTvrVV8bNaAaLYViYUxuVwzblzhoODN4Nmv5HZ5BlMHDFJsd
 E47Yw/0tdYFVRDuujN22ylYsKsySXBxPaWyUvDDlW/ryeKSfwn3V8Y7BSDZU4vUZ
 CFstsqlzEySGrNNCfor5bFn9EO3i882M+DU60UhZAKRgvAzANAsxjJ97B8Of5KA+
 /0OQarl0ZNJ93g6mZJ2bhuVpRCIGWJ3rBl9+GK8JdtsjF0mPOvrusKTQKoz1frK7
 B8h52P+jxfqrrqeqpNigMWfDKYkXCfUUMAJm57+QILAoTNRupAzgFyXZnSgAermE
 jaDfvnkaSZxfaLtTOlkkpGukhbFubhAWTk3TksVxICPXztZelQLmmbqjZnTYFCT/
 dKieKbwop58DBTycFuzCrWiSjXjodAq/+IfpAQcvJ5xZPLtgfjHxQaHD6zsOVKzQ
 lWosgYOnIaN/PYPOpAzo0sRDf80d5KFjwcdSjrWZVZ5lGfAsx8iYErh3v0Xv3rkE
 YuKQw2AjVVtD64SfHvIn
 =wEy8
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull more xfs updates from Dave Chinner:
 "This is the second update for XFS that I mentioned in the original
  pull request last week.

  It contains a revert for a suspend regression in 4.4 and a fix for a
  long standing log recovery issue that has been further exposed by all
  the log recovery changes made in the original 4.5 merge.

  There is one more thing in this pull request - one that I forgot to
  merge into the origin.  That is, pulling the XFS_IOC_FS[GS]ETXATTR
  ioctl up to the VFS level so that other filesystems can also use it
  for modifying project quota IDs

  Summary:

   - promotion of XFS_IOC_FS[GS]ETXATTR ioctl to the vfs level so that
     it can be shared with other filesystems.  The ext4 project quota
     functionality is the first target for this.  The commits in this
     series have not been updated with review or final SOB tags because
     the branch they were originally published in was needed by ext4.
     Those tags are:

        Reviewed-by: Theodore Ts'o <tytso@mit.edu>
        Signed-off-by: Dave Chinner <david@fromrobit.com>

   - Revert a change that is causing suspend failures.

   - Fix a use-after-free that can occur on log mount failures.  Been
     around forever, but now exposed by other changes to log recovery
     made in the first 4.5 merge"

* tag 'xfs-for-linus-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: log mount failures don't wait for buffers to be released
  Revert "xfs: clear PF_NOFREEZE for xfsaild kthread"
  xfs: introduce per-inode DAX enablement
  xfs: use FS_XFLAG definitions directly
  fs: XFS_IOC_FS[SG]SETXATTR to FS_IOC_FS[SG]ETXATTR promotion
2016-01-22 10:54:13 -08:00
Linus Torvalds 3e1e21c7bf Merge branch 'for-4.5/nvme' of git://git.kernel.dk/linux-block
Pull NVMe updates from Jens Axboe:
 "Last branch for this series is the nvme changes.  It's in a separate
  branch to avoid splitting too much between core and NVMe changes,
  since NVMe is still helping drive some blk-mq changes.  That said, not
  a huge amount of core changes in here.  The grunt of the work is the
  continued split of the code"

* 'for-4.5/nvme' of git://git.kernel.dk/linux-block: (67 commits)
  uapi: update install list after nvme.h rename
  NVMe: Export NVMe attributes to sysfs group
  NVMe: Shutdown controller only for power-off
  NVMe: IO queue deletion re-write
  NVMe: Remove queue freezing on resets
  NVMe: Use a retryable error code on reset
  NVMe: Fix admin queue ring wrap
  nvme: make SG_IO support optional
  nvme: fixes for NVME_IOCTL_IO_CMD on the char device
  nvme: synchronize access to ctrl->namespaces
  nvme: Move nvme_freeze/unfreeze_queues to nvme core
  PCI/AER: include header file
  NVMe: Export namespace attributes to sysfs
  NVMe: Add pci error handlers
  block: remove REQ_NO_TIMEOUT flag
  nvme: merge iod and cmd_info
  nvme: meta_sg doesn't have to be an array
  nvme: properly free resources for cancelled command
  nvme: simplify completion handling
  nvme: special case AEN requests
  ...
2016-01-21 19:58:02 -08:00
Linus Torvalds 0a13daedf7 Merge branch 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block
Pull lightnvm fixes and updates from Jens Axboe:
 "This should have been part of the drivers branch, but it arrived a bit
  late and wasn't based on the official core block driver branch.  So
  they got a small scolding, but got a pass since it's still new.  Hence
  it's in a separate branch.

  This is mostly pure fixes, contained to lightnvm/, and minor feature
  additions"

* 'for-4.5/lightnvm' of git://git.kernel.dk/linux-block: (26 commits)
  lightnvm: ensure that nvm_dev_ops can be used without CONFIG_NVM
  lightnvm: introduce factory reset
  lightnvm: use system block for mm initialization
  lightnvm: introduce ioctl to initialize device
  lightnvm: core on-disk initialization
  lightnvm: introduce mlc lower page table mappings
  lightnvm: add mccap support
  lightnvm: manage open and closed blocks separately
  lightnvm: fix missing grown bad block type
  lightnvm: reference rrpc lun in rrpc block
  lightnvm: introduce nvm_submit_ppa
  lightnvm: move rq->error to nvm_rq->error
  lightnvm: support multiple ppas in nvm_erase_ppa
  lightnvm: move the pages per block check out of the loop
  lightnvm: sectors first in ppa list
  lightnvm: fix locking and mempool in rrpc_lun_gc
  lightnvm: put block back to gc list on its reclaim fail
  lightnvm: check bi_error in gc
  lightnvm: return the get_bb_tbl return value
  lightnvm: refactor end_io functions for sync
  ...
2016-01-21 19:01:55 -08:00
Linus Torvalds eae21770b4 Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:
 "I'm pretty much done for -rc1 now:

   - the rest of MM, basically

   - lib/ updates

   - checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit

   - cpu_mask simplifications

   - kexec, rapidio, MAINTAINERS etc, etc.

   - more dma-mapping cleanups/simplifications from hch"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits)
  MAINTAINERS: add/fix git URLs for various subsystems
  mm: memcontrol: add "sock" to cgroup2 memory.stat
  mm: memcontrol: basic memory statistics in cgroup2 memory controller
  mm: memcontrol: do not uncharge old page in page cache replacement
  Documentation: cgroup: add memory.swap.{current,max} description
  mm: free swap cache aggressively if memcg swap is full
  mm: vmscan: do not scan anon pages if memcg swap limit is hit
  swap.h: move memcg related stuff to the end of the file
  mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online
  mm: vmscan: pass memcg to get_scan_count()
  mm: memcontrol: charge swap to cgroup2
  mm: memcontrol: clean up alloc, online, offline, free functions
  mm: memcontrol: flatten struct cg_proto
  mm: memcontrol: rein in the CONFIG space madness
  net: drop tcp_memcontrol.c
  mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM
  mm: memcontrol: allow to disable kmem accounting for cgroup2
  mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
  mm: memcontrol: move kmem accounting code to CONFIG_MEMCG
  mm: memcontrol: separate kmem code from legacy tcp accounting code
  ...
2016-01-21 12:32:08 -08:00
Linus Torvalds e9f57ebcba Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "This contains several bug fixes and a new mount option
  'default_permissions' that allows read-only exported NFS
  filesystems to be used as lower layer"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: check dentry positiveness in ovl_cleanup_whiteouts()
  ovl: setattr: check permissions before copy-up
  ovl: root: copy attr
  ovl: move super block magic number to magic.h
  ovl: use a minimal buffer in ovl_copy_xattr
  ovl: allow zero size xattr
  ovl: default permissions
2016-01-21 12:20:46 -08:00
Linus Torvalds 5c89e9ea7e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
 "This adds SEEK_HOLE and SEEK_DATA support in lseek"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: add support for SEEK_HOLE and SEEK_DATA in lseek
2016-01-21 12:14:24 -08:00
Jason Baron df0108c5da epoll: add EPOLLEXCLUSIVE flag
Currently, epoll file descriptors or epfds (the fd returned from
epoll_create[1]()) that are added to a shared wakeup source are always
added in a non-exclusive manner.  This means that when we have multiple
epfds attached to a shared fd source they are all woken up.  This creates
thundering herd type behavior.

Introduce a new 'EPOLLEXCLUSIVE' flag that can be passed as part of the
'event' argument during an epoll_ctl() EPOLL_CTL_ADD operation.  This new
flag allows for exclusive wakeups when there are multiple epfds attached
to a shared fd event source.

The implementation walks the list of exclusive waiters, and queues an
event to each epfd, until it finds the first waiter that has threads
blocked on it via epoll_wait().  The idea is to search for threads which
are idle and ready to process the wakeup events.  Thus, we queue an event
to at least 1 epfd, but may still potentially queue an event to all epfds
that are attached to the shared fd source.

Performance testing was done by Madars Vitolins using a modified version
of Enduro/X.  The use of the 'EPOLLEXCLUSIVE' flag reduce the length of
this particular workload from 860s down to 24s.

Sample epoll_clt text:

EPOLLEXCLUSIVE

  Sets an exclusive wakeup mode for the epfd file descriptor that is
  being attached to the target file descriptor, fd.  Thus, when an event
  occurs and multiple epfd file descriptors are attached to the same
  target file using EPOLLEXCLUSIVE, one or more epfds will receive an
  event with epoll_wait(2).  The default in this scenario (when
  EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
  EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Tested-by: Madars Vitolins <m@silodev.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20 17:09:18 -08:00
Linus Torvalds 5807fcaa9b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:

 - EVM gains support for loading an x509 cert from the kernel
   (EVM_LOAD_X509), into the EVM trusted kernel keyring.

 - Smack implements 'file receive' process-based permission checking for
   sockets, rather than just depending on inode checks.

 - Misc enhancments for TPM & TPM2.

 - Cleanups and bugfixes for SELinux, Keys, and IMA.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (41 commits)
  selinux: Inode label revalidation performance fix
  KEYS: refcount bug fix
  ima: ima_write_policy() limit locking
  IMA: policy can be updated zero times
  selinux: rate-limit netlink message warnings in selinux_nlmsg_perm()
  selinux: export validatetrans decisions
  gfs2: Invalid security labels of inodes when they go invalid
  selinux: Revalidate invalid inode security labels
  security: Add hook to invalidate inode security labels
  selinux: Add accessor functions for inode->i_security
  security: Make inode argument of inode_getsecid non-const
  security: Make inode argument of inode_getsecurity non-const
  selinux: Remove unused variable in selinux_inode_init_security
  keys, trusted: seal with a TPM2 authorization policy
  keys, trusted: select hash algorithm for TPM2 chips
  keys, trusted: fix: *do not* allow duplicate key options
  tpm_ibmvtpm: properly handle interrupted packet receptions
  tpm_tis: Tighten IRQ auto-probing
  tpm_tis: Refactor the interrupt setup
  tpm_tis: Get rid of the duplicate IRQ probing code
  ...
2016-01-17 19:13:15 -08:00
Linus Torvalds 984065055e Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
 "This is the main drm pull request for 4.5.  I don't think I've missed
  anything too major, I'm mostly back at work now but I'll probably get
  some sleep in 5 years time.

  Summary:

  New drivers:
   - etnaviv:

     GPU driver for the 3D core on the Vivante core used in numerous
     ARM boards.

  Highlights:

  Core:
   - Atomic suspend/resume helpers
   - Move the headers to using userspace friendlier types.
   - Documentation updates
   - Lots of struct_mutex removal.
   - Bunch of DP MST fixes from AMD.

  Panel:
   - More DSI helpers
   - Support for some new basic panels

  i915:
   - Basic Kabylake support
   - DP link training and detect code refactoring
   - fbc/psr fixes
   - FIFO underrun fixes
   - SDE interrupt handling fixes
   - dma-buf/fence support in pageflip path.
   - GPU side for MST audio support

  radeon/amdgpu:
   - Drop UMS support
   - GPUVM/Scheduler optimisations
   - Initial Powerplay support for Tonga/Fiji/CZ/ST
   - ACP audio prerequisites

  nouveau:
   - GK20a instmem improvements
   - PCIE link speed change support

  msm:
   - DSI support for msm8960/apq8064

  tegra:
   - Host1X support for Tegra210 SoC

  vc4:
   - 3D acceleration support

  armada:
   - Get rid of struct mutex

  tda998x:
   - Atomic modesetting support
   - TMDS clock limitations

  omapdrm:
   - Atomic modesetting support
   - improved TILER performance

  rockchip:
   - RK3036 VOP support
   - Atomic modesetting support
   - Synopsys DW MIPI DSI support

  exynos:
   - Runtime PM support
   - of_graph binding for DP panels
   - Cleanup of IPP code
   - Configurable plane support
   - Kernel panic fixes at release time"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (711 commits)
  drm/fb_cma_helper: Remove implicit call to disable_unused_functions
  drm/amdgpu: add missing irq.h include
  drm/vmwgfx: Fix a width / pitch mismatch on framebuffer updates
  drm/vmwgfx: Fix an incorrect lock check
  drm: nouveau: fix nouveau_debugfs_init prototype
  drm/nouveau/pci: fix check in nvkm_pcie_set_link
  drm/amdgpu: validate duplicates first
  drm/amdgpu: move VM page tables to the LRU end on CS v2
  drm/ttm: add ttm_bo_move_to_lru_tail function v2
  drm/ttm: fix adding foreign BOs to the swap LRU
  drm/ttm: fix adding foreign BOs to the LRU during init v2
  drm/radeon: use kobj_to_dev()
  drm/amdgpu: use kobj_to_dev()
  drm/amdgpu/cz: force vce clocks when sclks are forced
  drm/amdgpu/cz: force uvd clocks when sclks are forced
  drm/amdgpu/cz: add code to enable forcing VCE clocks
  drm/amdgpu/cz: add code to enable forcing UVD clocks
  drm/amdgpu: fix lost sync_to if scheduler is enabled.
  drm/amd/powerplay: fix static checker warning for return meaningless value.
  drm/sysfs: use kobj_to_dev()
  ...
2016-01-17 13:40:25 -08:00
Linus Torvalds 37cea93b99 VFIO updates for v4.5-rc1
- Fixes in AMD xgbe reset, spapr structure padding, type 1 flags
    (Dan Carpenter, Alexey Kardashevskiy, Pierre Morel)
  - Re-introduce no-iommu mode, with a user this time (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWmVq0AAoJECObm247sIsiAZQP/2oPDw+KyCApPlIRIr7dQ2Xu
 TdHmGed1QlWaXdt5YWutaglIhxwUZppLtfQrwM9gPsRZygfZ2A/lhx8F5+i+8eib
 kN1CUKX+HV9hl8mDQLRmjwjuFEQ9bwXaOFaq+TBpTki+T/8ENW/69IvMa0oFttQi
 CYiCsAveADMmxzjOpfKCszpKaEn2v4zpghDmuvdCNbu7IZOT/X7BNG9YUJ9e7F+m
 /x1L9LhC7VllnLmJb/8gyCU0omoW0Z/DzrQQ0DgH9KZSRgU/9f2Sfj1Y6WytRAwJ
 xBMCnjElkJxl8QjKQ3nrdoYyu+hTgm/qYpQ2PlGDybVKYhiUlQ4/on9GUY5I5iFc
 L3vQIkY3zmruMrdA/syAB9b3ku0lFelvbUW4aJ/1OME9rpmgEkvzEJmJ931cbrL1
 vLt2wbSDAbiW3sHKMEA70tcKbWWSpXX6e0QlBLtTYFwuhtUSDMKffkqfokTQrMuw
 Hy0+FCCXYcckxp3qRQXfzpp9c4vM5dJ6bHTVMhZ6Akvhl+BQb4m5J1WpYxYqCqBK
 oJ0u92bGHOBvcXwxZ8KwkP+tjClFxe2mTpmPWfSZ0rj3Z5rHLfK8eCEQ/KgQGiGI
 7F2pnxdeAYlhgEULAh+9VtfsH3tEZCpEY68dLZAyBGn2kEAijtgOMaCeX7+GahVz
 nV89Yczp3Wyaa70idiQS
 =Z9Kq
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v4.5-rc1' of git://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

 - Fixes in AMD xgbe reset, spapr structure padding, type 1 flags (Dan
   Carpenter, Alexey Kardashevskiy, Pierre Morel)

 - Re-introduce no-iommu mode, with a user this time (Alex Williamson)

* tag 'vfio-v4.5-rc1' of git://github.com/awilliam/linux-vfio:
  vfio/iommu_type1: make use of info.flags
  vfio: Include No-IOMMU mode
  vfio: Add explicit alignments in vfio_iommu_spapr_tce_create
  VFIO: platform: reset: fix a warning message condition
2016-01-15 13:01:01 -08:00
Linus Torvalds 3c28c9ccaf md updates for 4.5
Mostly clustered-raid1 and raid5 journal updates.
 one Y2038 fix and other minor stuff.
 
 One patch removes me from the MAINTAINERS file and adds a record of
 my md maintainership to Credits.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWmJEhAAoJEDnsnt1WYoG5raQQAI9lBrHO+Q8C8RImPsemLX0X
 ypjH38XUwEwKNYYfsCVI7PKAqCl7r8ITzY054gKsU0iHAfqLQlEN8aMz0v0fJQhg
 Msb7utrEMQ0UERwNcc+3J78ffFAdWrkHVd64Ley0h/pizFPSlL0K2RuIGTBc9sGX
 Hz2Ci11Ch7FdK7C/Zl7I6tK1pkthu3hBXYEZyg1GngRRhZEJj2U7mBmy1E37NA72
 o66B5r5FSlnIA8MAo/EAViCxtMJKBPRWU/WnkMhOJ1Yyw/FwMpbM2prLBLtYFqwF
 OLZOLuDUHY5HxdX2U+3R0hBzF78aozcH6od60SWg7wOmI/IkXYiYFujlxMd132FE
 OT+aa+UHHDEkATTSyt98OmxIkQ8uqKiNsSYqBk9lpNAPtmEbhqRX4RAOdrqP0G83
 DX7iyZpAK4YhB4BkJxMtNdSIOnss1TwfOdKyvoBZYmY6bTKh7p+dpw4cvIjV4VDi
 p6+BUQdJQ7mHRLV9QI4IuG52AJO8cRGc1OVvqLEMzO8uZlpyxX9nJrSqeP/dKKfa
 pJ5pYssilXEeKCDnODGqSRdt9aU4ENDW/oIkAW2U3cnSHUwBoMLF1WJ+M3Atbm+s
 i3/iDp26SnSiHM+DVHije5v0OGOroYdJwKDIFWToElcfc9Q5IDHU+KP8oeuPqqOS
 WA08l+zj+ahfP7Yu1DUC
 =gl+r
 -----END PGP SIGNATURE-----

Merge tag 'md/4.5' of git://neil.brown.name/md

Pull md updates from Neil Brown:
 "Mostly clustered-raid1 and raid5 journal updates.  one Y2038 fix and
  other minor stuff.

  One patch removes me from the MAINTAINERS file and adds a record of my
  md maintainership to Credits"

Many thanks to Neil, who has been around for a _looong_ time.

* tag 'md/4.5' of git://neil.brown.name/md: (26 commits)
  md/raid: only permit hot-add of compatible integrity profiles
  Remove myself as MD Maintainer, and add to Credits.
  raid5-cache: handle journal hotadd in quiesce
  MD: add journal with array suspended
  md: set MD_HAS_JOURNAL in correct places
  md: Remove 'ready' field from mddev.
  md: remove unnecesary md_new_event_inintr
  raid5: allow r5l_io_unit allocations to fail
  raid5-cache: use a mempool for the metadata block
  raid5-cache: use a bio_set
  raid5-cache: add journal hot add/remove support
  drivers: md: use ktime_get_real_seconds()
  md: avoid warning for 32-bit sector_t
  raid5-cache: free meta_page earlier
  raid5-cache: simplify r5l_move_io_unit_list
  md: update comment for md_allow_write
  md-cluster: update comments for MD_CLUSTER_SEND_LOCKED_ALREADY
  md-cluster: Protect communication with mutexes
  md-cluster: Defer MD reloading to mddev->thread
  md-cluster: update the documentation
  ...
2016-01-15 12:28:00 -08:00
Linus Torvalds d080827f85 libnvdimm for 4.5
1/ Media error handling: The 'badblocks' implementation that originated
    in md-raid is up-levelled to a generic capability of a block device.
    This initial implementation is limited to being consulted in the pmem
    block-i/o path.  Later, 'badblocks' will be consulted when creating
    dax mappings.
 
 2/ Raw block device dax: For virtualization and other cases that want
    large contiguous mappings of persistent memory, add the capability to
    dax-mmap a block device directly.
 
 3/ Increased /dev/mem restrictions: Add an option to treat all io-memory
    as IORESOURCE_EXCLUSIVE, i.e. disable /dev/mem access while a driver is
    actively using an address range.  This behavior is controlled via the
    new CONFIG_IO_STRICT_DEVMEM option and can be overridden by the
    existing "iomem=relaxed" kernel command line option.
 
 4/ Miscellaneous fixes include a 'pfn'-device huge page alignment fix,
    block device shutdown crash fix, and other small libnvdimm fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWlrhjAAoJEB7SkWpmfYgCFbAQALKsQfFwT6JFS+zlPgiNpbqw
 2VMNKEH0AfGYGj96mT02j2q+vSUmXLMIDMTsbe0sDdtwFZtQbFmhmryzPWUVppSu
 KGTlLPW8vuEhQVs91+UI3BQKkvpi0+tbR8hPOh9W6QhjpRT+lyHFKnsNR5HZy5wB
 K4/VMaT5ffd5/pXRTjkYiPQYTwWyfcvNjICj0YtqhPvOwS031m77JpFsWJ8HSpEX
 K99VlzNUPMXd1pYkHmFNXWw52fhRGNhwAEomLeKMdQfKms+KnbKp8BOSA0aCqU8E
 kpujQcilDXJwykFQZOFI3Z5Dxvrv8lxFTU8HRMBvo3ESzfTWjfqcvyjGOjDUcruw
 ihESFSJtdZzhrBiMnf9RRqSpMFJvAT8MVT6Q4D3mZUHCMPbUqFJsQjMPt9hEH3ho
 4F0D2lesOCkubUKFTZmjMoDb+szuKbVhYK8TeFVVEhizinc/Aj0NKuazJqi+CXB/
 xh0ER4ZxD8wvzqFFWvS5UvR1G9I5fr7+3jGRUrqGLHlSdeXP9dkEg28ao3QbWk3x
 1dPOen6ZqQ9WJ/E7eGmXbVEz2R4Xd79hMXQzdQwmKDk/KbxRoAp7hyU8BslAyrBf
 HCdmVt+RAgrxZYfFRXuLhqwEBThJnNrgZA3qu74FUpkpFg6xRUu1bAYBiF7N+bFi
 82b5UbMkveBTtkXjJoiR
 =7V5r
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "The bulk of this has appeared in -next and independently received a
  build success notification from the kbuild robot.  The 'for-4.5/block-
  dax' topic branch was rebased over the weekend to drop the "block
  device end-of-life" rework that Al would like to see re-implemented
  with a notifier, and to address bug reports against the badblocks
  integration.

  There is pending feedback against "libnvdimm: Add a poison list and
  export badblocks" received last week.  Linda identified some localized
  fixups that we will handle incrementally.

  Summary:

   - Media error handling: The 'badblocks' implementation that
     originated in md-raid is up-levelled to a generic capability of a
     block device.  This initial implementation is limited to being
     consulted in the pmem block-i/o path.  Later, 'badblocks' will be
     consulted when creating dax mappings.

   - Raw block device dax: For virtualization and other cases that want
     large contiguous mappings of persistent memory, add the capability
     to dax-mmap a block device directly.

   - Increased /dev/mem restrictions: Add an option to treat all
     io-memory as IORESOURCE_EXCLUSIVE, i.e. disable /dev/mem access
     while a driver is actively using an address range.  This behavior
     is controlled via the new CONFIG_IO_STRICT_DEVMEM option and can be
     overridden by the existing "iomem=relaxed" kernel command line
     option.

   - Miscellaneous fixes include a 'pfn'-device huge page alignment fix,
     block device shutdown crash fix, and other small libnvdimm fixes"

* tag 'libnvdimm-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (32 commits)
  block: kill disk_{check|set|clear|alloc}_badblocks
  libnvdimm, pmem: nvdimm_read_bytes() badblocks support
  pmem, dax: disable dax in the presence of bad blocks
  pmem: fail io-requests to known bad blocks
  libnvdimm: convert to statically allocated badblocks
  libnvdimm: don't fail init for full badblocks list
  block, badblocks: introduce devm_init_badblocks
  block: clarify badblocks lifetime
  badblocks: rename badblocks_free to badblocks_exit
  libnvdimm, pmem: move definition of nvdimm_namespace_add_poison to nd.h
  libnvdimm: Add a poison list and export badblocks
  nfit_test: Enable DSMs for all test NFITs
  md: convert to use the generic badblocks code
  block: Add badblock management for gendisks
  badblocks: Add core badblock management code
  block: fix del_gendisk() vs blkdev_ioctl crash
  block: enable dax for raw block devices
  block: introduce bdev_file_inode()
  restrict /dev/mem to idle io memory ranges
  arch: consolidate CONFIG_STRICT_DEVM in lib/Kconfig.debug
  ...
2016-01-13 19:15:14 -08:00
Linus Torvalds 77a76b04d2 media updates for v4.5-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWlNJNAAoJEAhfPr2O5OEVrVEP/3wGrkzC3ykROabLiuFx6+uN
 fUxMlSnnIwSPsiK7OIP45EHI7PJr2zRLfg4p3X6ABOTBtycziUQGP3ARCcqllrSG
 Tl7KoJokNBTiNRPY2nefC3gB7H36D+TBv3zgpR+vPggr6HGSfc8c6y6sETl/IAvl
 do7EltQ5BUJzgQ9/ZAM/FLBLNSLGexkqeJe7EC5IX0hqJ/tlc1vqIEu2xcdsG1cd
 w1k03C+/ukOr1wfcVmqlh9K2WCPZ59V0c3XcNS+4SeCKH4wnmmf4/BOrAuRFtCjh
 691RuXHXpZKdX1l9TVAmp9CuVvXNDjWdFfNovxVG8POn9TF769zI6t+9rCBSg5Gb
 lzSMOB75/EArOml8sU7gxhEsVsVw9RlukMN7luVwiqPi9vDSPzQw0fzvYkX85Ay3
 TMBW2z1cVjWBvjotTBkSg/rgqetmTgaU04kJ2vIqlfLLd8r/8o5ILZU5WzQtugLJ
 HHipyqG5FZIhXCRmqmr1cEmk+PKP8IA59zKQQG7Z2W+H3rqM6jakc3AxXtJFcz08
 8c4gwXrNBNCWYh9x6e3h0x3p5pdTvrhH0cG9BxWYwaqdCiFZVL+3ZgZB0LmlRc0m
 shJkNAGNX503SqeMQOJnfwEi7Kdeb8mIQBSvM1ZmCrheNwyxtrdJnFkJK3w+q/Tf
 PsycImJtKpskfiXbhqWR
 =6J4t
 -----END PGP SIGNATURE-----

Merge tag 'media/v4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull second batch of media updates from Mauro Carvalho Chehab:
 "This is the second part of the media patches.  It contains the media
  controller next generation patches, with is the result of one year of
  discussions and development.  It also contains patches to enable media
  controller support at the DVB subsystem.

  The goal is to improve the media controller to allow proper support
  for other types of Video4Linux devices (radio and TV ones) and to
  extend the media controller functionality to allow it to be used by
  other subsystems like DVB, ALSA and IIO.

  In order to use the new functionality, a new ioctl is needed
  (MEDIA_IOC_G_TOPOLOGY).  As we're still discussing how to pack the
  struct fields of this ioctl in order to avoid compat32 issues, I
  decided to add a patch at the end of this series commenting out the
  new ioctl, in order to postpone the addition of the new ioctl to the
  next Kernel version (4.6).

  With that, no userspace visible changes should happen at the media
  controller API, as the existing ioctls are untouched.  Yet, it helps
  DVB, ALSA and IIO developers to develop and test the patches adding
  media controller support there, as the core will contain all required
  internal changes to allow adding support for devices that belong to
  those subsystems"

* tag 'media/v4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (177 commits)
  [media] Postpone the addition of MEDIA_IOC_G_TOPOLOGY
  [media] mxl111sf: Add a tuner entity
  [media] dvbdev: create links on devices with multiple frontends
  [media] media-entitiy: add a function to create multiple links
  [media] dvb-usb-v2: postpone removal of media_device
  [media] dvbdev: Add RF connector if needed
  [media] dvbdev: remove two dead functions if !CONFIG_MEDIA_CONTROLLER_DVB
  [media] call media_device_init() before registering the V4L2 device
  [media] uapi/media.h: Use u32 for the number of graph objects
  [media] media-entity: don't sleep at media_device_register_entity()
  [media] media-entity: increase max number of PADs
  [media] media-entity.h: document the remaining functions
  [media] media-device.h: use just one u32 counter for object ID
  [media] media-entity.h fix documentation for several parameters
  [media] DocBook: document media_entity_graph_walk_cleanup()
  [media] move documentation to the header files
  [media] media: Move MEDIA_ENTITY_MAX_PADS from media-entity.h to media-entity.c
  [media] media: Remove pre-allocated entity enumeration bitmap
  [media] staging: v4l: davinci_vpbe: Use the new media graph walk interface
  [media] staging: v4l: omap4iss: Use the new media graph walk interface
  ...
2016-01-13 11:46:37 -08:00
Linus Torvalds 1c5ff2ab7b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
 - new driver for eGalaxTouch serial touchscreen
 - new driver for TS-4800 touchscreen
 - an update for Goodix touchscreen driver
 - PS/2 mouse module was reworked to limit number of protocols we try on
   pass-through ports to speed up their detection time
 - wacom_w8001 touchscreen driver now reports pen and touch via separate
   instances of input devices
 - other driver changes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (42 commits)
  Input: elantech - mark protocols v2 and v3 as semi-mt
  Input: wacom_w8001 - drop use of ABS_MT_TOOL_TYPE
  Input: gpio-keys - fix check for disabling unsupported keys
  Input: omap-keypad - remove dead check
  Input: ti_am335x_tsc - fix HWPEN interrupt handling
  Input: omap-keypad - set tasklet data earlier
  Input: rohm_bu21023 - fix handling of retrying firmware update
  Input: ALPS - report v3 pinnacle trackstick device only if is present
  Input: ALPS - detect trackstick presence for v7 protocol
  Input: pcap_ts - use to_delayed_work
  Input: bma150 - constify bma150_cfg structure
  Input: i8042 - add Fujitsu Lifebook U745 to the nomux list
  Input: egalax_ts_serial - fix potential NULL dereference on error
  Input: uinput - sanity check on ff_effects_max and EV_FF
  Input: uinput - rework ABS validation
  Input: uinput - add new UINPUT_DEV_SETUP and UI_ABS_SETUP ioctl
  Input: goodix - use "inverted_[xy]" flags instead of "rotated_screen"
  Input: goodix - add axis swapping and axis inversion support
  Input: goodix - use goodix_i2c_write_u8 instead of i2c_master_send
  Input: goodix - add power management support
  ...
2016-01-13 11:14:05 -08:00
Mike Frysinger a9cf8284b4 uapi: update install list after nvme.h rename
Commit 9d99a8dda1 ("nvme: move hardware structures out of the uapi
version of nvme.h") renamed nvme.h to nvme_ioctl.h, but the uapi list
still refers to nvme.h.  People trying to install the headers hit a
failure as the header no longer exists.

Cc: stable@vger.kernel.org
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-13 11:40:04 -07:00
Linus Torvalds 4c257ec37b char/misc patches for 4.5-rc1
Here's the big set of char/misc patches for 4.5-rc1.
 
 Nothing major, lots of different driver subsystem updates, full details
 in the shortlog.  All of these have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlaV0GsACgkQMUfUDdst+ymlPgCg07GSc4SOWlUL2V36vQ0kucoO
 YjAAoMfeUEhsf/NJ7iaAMGQVUQKuYVqr
 =BlqH
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc updates from Greg KH:
 "Here's the big set of char/misc patches for 4.5-rc1.

  Nothing major, lots of different driver subsystem updates, full
  details in the shortlog.  All of these have been in linux-next for a
  while"

* tag 'char-misc-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (71 commits)
  mei: fix fasync return value on error
  parport: avoid assignment in if
  parport: remove unneeded space
  parport: change style of NULL comparison
  parport: remove unnecessary out of memory message
  parport: remove braces
  parport: quoted strings should not be split
  parport: code indent should use tabs
  parport: fix coding style
  parport: EXPORT_SYMBOL should follow function
  parport: remove trailing white space
  parport: fix a trivial typo
  coresight: Fix a typo in Kconfig
  coresight: checking for NULL string in coresight_name_match()
  Drivers: hv: vmbus: Treat Fibre Channel devices as performance critical
  Drivers: hv: utils: fix hvt_op_poll() return value on transport destroy
  Drivers: hv: vmbus: fix the building warning with hyperv-keyboard
  extcon: add Maxim MAX3355 driver
  Drivers: hv: ring_buffer: eliminate hv_ringbuffer_peek()
  Drivers: hv: remove code duplication between vmbus_recvpacket()/vmbus_recvpacket_raw()
  ...
2016-01-13 10:23:36 -08:00
Linus Torvalds 67ad058d97 TTY/Serial patches for 4.5-rc1
Here is the big serial/tty driver updates for 4.5-rc1.  Lots of driver
 updates and some tty core changes.  All of these have been in linux-next
 and the details are in the shortlog.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlaV0iQACgkQMUfUDdst+ynukgCeNdulE6XMg5Xp3Wn3hs0ZW6fo
 YmUAoMRrtjFCixhiGHoNKTm35V4gC2sy
 =D64i
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
 "Here is the big serial/tty driver update for 4.5-rc1.

  Lots of driver updates and some tty core changes.  All of these have
  been in linux-next and the details are in the shortlog"

* tag 'tty-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (127 commits)
  drivers/tty/serial: delete unused MODULE_DEVICE_TABLE from atmel_serial.c
  serial: sh-sci: Remove cpufreq notifier to fix crash/deadlock
  serial: 8250: of: Fix the driver and actually compile the 8250_of
  tty: amba-pl011: use iotype instead of access_32b to track 32-bit I/O
  tty: amba-pl011: fix earlycon register offsets
  serial: sh-sci: Drop the sci_fck clock fallback
  sh: sh7734: Correct SCIF type for BRG
  sh: Remove sci_ick clock alias
  sh: Rename sci_ick and sci_fck clock to fck
  serial: sh-sci: Add support for optional BRG on (H)SCIF
  serial: sh-sci: Add support for optional external (H)SCK input
  serial: sh-sci: Prepare for multiple sampling clock sources
  serial: sh-sci: Correct SCIF type on R-Car for BRG
  serial: sh-sci: Correct SCIF type on RZ/A1H
  serial: sh-sci: Replace struct sci_port_info by type/regtype encoding
  serial: sh-sci: Add BRG register definitions
  serial: sh-sci: Take into account sampling rate for max baud rate
  serial: sh-sci: Merge sci_scbrr_calc() and sci_baud_calc_hscif()
  serial: sh-sci: Avoid calculating the receive margin for HSCIF
  serial: sh-sci: Improve bit rate error calculation for HSCIF
  ...
2016-01-13 10:02:05 -08:00
Linus Torvalds 34a9304a96 Merge branch 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - cgroup v2 interface is now official.  It's no longer hidden behind a
   devel flag and can be mounted using the new cgroup2 fs type.

   Unfortunately, cpu v2 interface hasn't made it yet due to the
   discussion around in-process hierarchical resource distribution and
   only memory and io controllers can be used on the v2 interface at the
   moment.

 - The existing documentation which has always been a bit of mess is
   relocated under Documentation/cgroup-v1/. Documentation/cgroup-v2.txt
   is added as the authoritative documentation for the v2 interface.

 - Some features are added through for-4.5-ancestor-test branch to
   enable netfilter xt_cgroup match to use cgroup v2 paths.  The actual
   netfilter changes will be merged through the net tree which pulled in
   the said branch.

 - Various cleanups

* 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: rename cgroup documentations
  cgroup: fix a typo.
  cgroup: Remove resource_counter.txt in Documentation/cgroup-legacy/00-INDEX.
  cgroup: demote subsystem init messages to KERN_DEBUG
  cgroup: Fix uninitialized variable warning
  cgroup: put controller Kconfig options in meaningful order
  cgroup: clean up the kernel configuration menu nomenclature
  cgroup_pids: fix a typo.
  Subject: cgroup: Fix incomplete dd command in blkio documentation
  cgroup: kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] and friends
  cpuset: Replace all instances of time_t with time64_t
  cgroup: replace unified-hierarchy.txt with a proper cgroup v2 documentation
  cgroup: rename Documentation/cgroups/ to Documentation/cgroup-legacy/
  cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type
2016-01-12 19:20:32 -08:00
Linus Torvalds aee3bfa330 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from Davic Miller:

 1) Support busy polling generically, for all NAPI drivers.  From Eric
    Dumazet.

 2) Add byte/packet counter support to nft_ct, from Floriani Westphal.

 3) Add RSS/XPS support to mvneta driver, from Gregory Clement.

 4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes
    Frederic Sowa.

 5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai.

 6) Add support for VLAN device bridging to mlxsw switch driver, from
    Ido Schimmel.

 7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski.

 8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko.

 9) Reorganize wireless drivers into per-vendor directories just like we
    do for ethernet drivers.  From Kalle Valo.

10) Provide a way for administrators "destroy" connected sockets via the
    SOCK_DESTROY socket netlink diag operation.  From Lorenzo Colitti.

11) Add support to add/remove multicast routes via netlink, from Nikolay
    Aleksandrov.

12) Make TCP keepalive settings per-namespace, from Nikolay Borisov.

13) Add forwarding and packet duplication facilities to nf_tables, from
    Pablo Neira Ayuso.

14) Dead route support in MPLS, from Roopa Prabhu.

15) TSO support for thunderx chips, from Sunil Goutham.

16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon.

17) Rationalize, consolidate, and more completely document the checksum
    offloading facilities in the networking stack.  From Tom Herbert.

18) Support aborting an ongoing scan in mac80211/cfg80211, from
    Vidyullatha Kanchanapally.

19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1375 commits)
  net: bnxt: always return values from _bnxt_get_max_rings
  net: bpf: reject invalid shifts
  phonet: properly unshare skbs in phonet_rcv()
  dwc_eth_qos: Fix dma address for multi-fragment skbs
  phy: remove an unneeded condition
  mdio: remove an unneed condition
  mdio_bus: NULL dereference on allocation error
  net: Fix typo in netdev_intersect_features
  net: freescale: mac-fec: Fix build error from phy_device API change
  net: freescale: ucc_geth: Fix build error from phy_device API change
  bonding: Prevent IPv6 link local address on enslaved devices
  IB/mlx5: Add flow steering support
  net/mlx5_core: Export flow steering API
  net/mlx5_core: Make ipv4/ipv6 location more clear
  net/mlx5_core: Enable flow steering support for the IB driver
  net/mlx5_core: Initialize namespaces only when supported by device
  net/mlx5_core: Set priority attributes
  net/mlx5_core: Connect flow tables
  net/mlx5_core: Introduce modify flow table command
  net/mlx5_core: Managing root flow table
  ...
2016-01-12 18:57:02 -08:00
Linus Torvalds 4d58967783 GFS2: merge window
Here is a list of patches we've accumulated for GFS2 for the current upstream
 merge window. Last window's set was short, but I warned that this one would
 be bigger, and so it is. We've got 19 patches:
 
 - A patch from Abhi Das to propagate the GFS2_DIF_SYSTEM bit so that newly
   added journals don't get flagged, deleted, and recreated by fsck.gfs2.
 - Two patches from Andreas Gruenbacher to improve GFS2 performance where
   extended attributes are involved.
 - A patch from Andy Price to fix a suspicious rcu dereference error.
 - Two patches from Ben Marzinski that rework how GFS2's NFS cookies are
   managed. This fixes readdir problems with nfs-over-gfs2.
 - A patch from Ben Marzinski that fixes a race in unmounting GFS2.
 - A set of four patches from me to move the resource group reservations
   inside the gfs2 inode to improve performance and fix a bug whereby
   get_write_access improperly prevented some operations like chown.
 - A patch from me to spinlock-protect the setting of system statfs file data.
   This was causing small discrepancies between df and du.
 - A patch from me to reintroduce a timeout while clearing glocks which was
   accidentally dropped some time ago.
 - A patch from me to wait for iopen glock dequeues in order to improve
   deleting of files that were unlinked from a different cluster node.
 - A patch from me to ensure metadata address spaces get truncated when an
   inode is evicted.
 - A patch from me to fix a bug in which a memory leak could occur in some
   error cases when inodes were trying to be created.
 - A patch to consistently use iopen glocks to transition from the unlinked
   state to the deleted state.
 - A patch to fix a glock reference count error when inode creation fails.
 - A patch from Junxiao Bi to fix an flock panic.
 - A patch from Markus Elfring that removes an unnecessary if.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWlUOCAAoJENeLYdPf93o78/IH/0PfoQtQnYQpvbTPgaFICEUI
 rx+uLqPxpJgjJMsbIy+VA/bhZsbuENDbor99Cs6GiTLy3Q/4TKNY9NxDN+aO8o+Q
 qrp3ZANkyTGaneCzrXzfgmOxe1G8xRQ7pboRUEt2yPlGK1oLax+PR4uPb+IuJ5Wa
 QvSwnZj+9K2LkMQkKPyxoltjZiZLywvNB5dx0F35+dvriQxHXfJ1Naek6gs140MB
 SfG4/Zi8dfGOJQjSQTUENatFhyFB0V7CbDPFuBVzTGD8IS4ASgqIBMFjO0VBHkP4
 gt4zb0BJAn0aQcbxIaDtsuEffpn+5zdK/Onq5g4Fr9ZJv3fkVpVcOgYui/rkDD0=
 =28ev
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "Here is a list of patches we've accumulated for GFS2 for the current
  upstream merge window.  Last window's set was short, but I warned that
  this one would be bigger, and so it is.  We've got 19 patches:

   - A patch from Abhi Das to propagate the GFS2_DIF_SYSTEM bit so that
     newly added journals don't get flagged, deleted, and recreated by
     fsck.gfs2.

   - Two patches from Andreas Gruenbacher to improve GFS2 performance
     where extended attributes are involved.

   - A patch from Andy Price to fix a suspicious rcu dereference error.

   - Two patches from Ben Marzinski that rework how GFS2's NFS cookies
     are managed.  This fixes readdir problems with nfs-over-gfs2.

   - A patch from Ben Marzinski that fixes a race in unmounting GFS2.

   - A set of four patches from me to move the resource group
     reservations inside the gfs2 inode to improve performance and fix a
     bug whereby get_write_access improperly prevented some operations
     like chown.

   - A patch from me to spinlock-protect the setting of system statfs
     file data.  This was causing small discrepancies between df and du.

   - A patch from me to reintroduce a timeout while clearing glocks
     which was accidentally dropped some time ago.

   - A patch from me to wait for iopen glock dequeues in order to
     improve deleting of files that were unlinked from a different
     cluster node.

   - A patch from me to ensure metadata address spaces get truncated
     when an inode is evicted.

   - A patch from me to fix a bug in which a memory leak could occur in
     some error cases when inodes were trying to be created.

   - A patch to consistently use iopen glocks to transition from the
     unlinked state to the deleted state.

   - A patch to fix a glock reference count error when inode creation
     fails.

   - A patch from Junxiao Bi to fix an flock panic.

   - A patch from Markus Elfring that removes an unnecessary if"

* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: fix flock panic issue
  GFS2: Don't do glock put on when inode creation fails
  GFS2: Always use iopen glock for gl_deletes
  GFS2: Release iopen glock in gfs2_create_inode error cases
  GFS2: Truncate address space mapping when deleting an inode
  GFS2: Wait for iopen glock dequeues
  gfs2: clear journal live bit in	gfs2_log_flush
  gfs2: change gfs2 readdir cookie
  gfs2: keep offset when splitting dir leaf blocks
  GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear
  GFS2: Update master statfs buffer with sd_statfs_spin locked
  GFS2: Reduce size of incore inode
  GFS2: Make rgrp reservations part of the gfs2_inode structure
  GFS2: Extract quota data from reservations structure (revert 5407e24)
  gfs2: Extended attribute readahead optimization
  gfs2: Extended attribute readahead
  GFS2: Use rht_for_each_entry_rcu in glock_hash_walk
  GFS2: Delete an unnecessary check before the function call "iput"
  gfs2: Automatically set GFS2_DIF_SYSTEM flag on system files
2016-01-12 18:09:35 -08:00
Linus Torvalds fce205e9da Merge branch 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs copy_file_range updates from Al Viro:
 "Several series around copy_file_range/CLONE"

* 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  btrfs: use new dedupe data function pointer
  vfs: hoist the btrfs deduplication ioctl to the vfs
  vfs: wire up compat ioctl for CLONE/CLONE_RANGE
  cifs: avoid unused variable and label
  nfsd: implement the NFSv4.2 CLONE operation
  nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
  vfs: pull btrfs clone API to vfs layer
  locks: new locks_mandatory_area calling convention
  vfs: Add vfs_copy_file_range() support for pagecache copies
  btrfs: add .copy_file_range file operation
  x86: add sys_copy_file_range to syscall tables
  vfs: add copy_file_range syscall and vfs helper
2016-01-12 16:30:34 -08:00
Linus Torvalds 1baa5efbeb * s390: Support for runtime instrumentation within guests,
support of 248 VCPUs.
 
 * ARM: rewrite of the arm64 world switch in C, support for
 16-bit VM identifiers.  Performance counter virtualization
 missed the boat.
 
 * x86: Support for more Hyper-V features (synthetic interrupt
 controller), MMU cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWlSKwAAoJEL/70l94x66DY0UIAK5vp4zfQoQOJC4KP4Xgxwdu
 kpnK2Boz3/74o1b0y5+eJZoUZCsXCVLtmP5uhmMxUYWDgByFG2X8ZDhPFwB5FYLT
 2dN+Lr4tsolgIfRdHZtrT6Svp9SDL039bWTdscnbR6l37/j9FRWvpKdhI3orloFD
 /i4CSW2dVIq1/9Xctwu/rtcOEesEx4Cad+6YV3/530eVAXFzE908nXfmqJNZTocY
 YCGcmrMVCOu0ng5QM4xSzmmYjKMLUcRs+QzZWkVBzdJtTgwZUr09yj7I2dZ1yj/i
 cxYrJy6shSwE74XkXsmvG+au3C5u3vX4tnXjBFErnPJ99oqzHatVnFWNRhj4dLQ=
 =PIj1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "PPC changes will come next week.

   - s390: Support for runtime instrumentation within guests, support of
     248 VCPUs.

   - ARM: rewrite of the arm64 world switch in C, support for 16-bit VM
     identifiers.  Performance counter virtualization missed the boat.

   - x86: Support for more Hyper-V features (synthetic interrupt
     controller), MMU cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (115 commits)
  kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL
  kvm/x86: Hyper-V SynIC timers tracepoints
  kvm/x86: Hyper-V SynIC tracepoints
  kvm/x86: Update SynIC timers on guest entry only
  kvm/x86: Skip SynIC vector check for QEMU side
  kvm/x86: Hyper-V fix SynIC timer disabling condition
  kvm/x86: Reorg stimer_expiration() to better control timer restart
  kvm/x86: Hyper-V unify stimer_start() and stimer_restart()
  kvm/x86: Drop stimer_stop() function
  kvm/x86: Hyper-V timers fix incorrect logical operation
  KVM: move architecture-dependent requests to arch/
  KVM: renumber vcpu->request bits
  KVM: document which architecture uses each request bit
  KVM: Remove unused KVM_REQ_KICK to save a bit in vcpu->requests
  kvm: x86: Check kvm_write_guest return value in kvm_write_wall_clock
  KVM: s390: implement the RI support of guest
  kvm/s390: drop unpaired smp_mb
  kvm: x86: fix comment about {mmu,nested_mmu}.gva_to_gpa
  KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table()
  arm/arm64: KVM: Detect vGIC presence at runtime
  ...
2016-01-12 13:22:12 -08:00
Matias Bjørling 8b4970c41f lightnvm: introduce factory reset
Now that a device can be managed using the system blocks, a method to
reset the device is necessary as well. This patch introduces logic to
reset the device easily to factory state and exposes it through an
ioctl.

The ioctl takes the following flags:

  NVM_FACTORY_ERASE_ONLY_USER
      By default all blocks, except host-reserved blocks are erased upon
      factory reset. Instead of this, only erase host-reserved blocks.
  NVM_FACTORY_RESET_HOST_BLKS
      Mark host-reserved blocks to be erased and set their type to free.
  NVM_FACTORY_RESET_GRWN_BBLKS
      Mark "grown bad blocks" to be erased and set their type to free.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 08:21:18 -07:00
Matias Bjørling 5569615424 lightnvm: introduce ioctl to initialize device
Based on the previous patch, we now introduce an ioctl to initialize the
device using nvm_init_sysblock and create the necessary system blocks.
The user may specify the media manager that they wish to instantiate on
top. Default from user-space will be "gennvm".

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 08:21:18 -07:00
Matias Bjørling e3eb3799f7 lightnvm: core on-disk initialization
An Open-Channel SSD shall be initialized before use. To initialize, we
define an on-disk format, that keeps a small set of metadata to bring up
the media manager on top of the device.

The initial step is introduced to allow a user to format the disks for a
given media manager. During format, a system block is stored on one to
three separate luns on the device. Each lun has the system block
duplicated. During initialization, the system block can be retrieved and
the appropriate media manager can initialized.

The on-disk format currently covers (struct nvm_system_block):

 - Magic value "NVMS".
 - Monotonic increasing sequence number.
 - The physical block erase count.
 - Version of the system block format.
 - Media manager type.
 - Media manager superblock physical address.

The interface provides three functions to manage the system block:

 int nvm_init_sysblock(struct nvm_dev *, struct nvm_sb_info *)
 int nvm_get_sysblock(struct nvm *dev, struct nvm_sb_info *)
 int nvm_update_sysblock(struct nvm *dev, struct nvm_sb_info *)

Each implement a part of the logic to manage the system block. The
initialization creates the first system blocks and mark them on the
device. Get retrieves the latest system block by scanning all pages in
the associated system blocks. The update sysblock writes new metadata
and allocates new block if necessary.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-01-12 08:21:18 -07:00