Граф коммитов

44070 Коммитов

Автор SHA1 Сообщение Дата
Yan, Zheng af5e5eb574 ceph: fix race during filling readdir cache
Readdir cache uses page cache to save dentry pointers. When adding
dentry pointers to middle of a page, we need to make sure the page
already exists. Otherwise the beginning part of the page will be
invalid pointers.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:53 +01:00
Ilya Dryomov 34b759b4a2 ceph: kill ceph_empty_snapc
ceph_empty_snapc->num_snaps == 0 at all times.  Passing such a snapc to
ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is
equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only
for sizing the request message.

Further, in all four cases the subsequent ceph_osdc_build_request() is
passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps
and making ceph_empty_snapc entirely useless.  The two cases where it
actually mattered were removed in commits 8605609049 ("ceph: avoid
sending unnessesary FLUSHSNAP message") and 23078637e0 ("ceph: fix
queuing inode to mdsdir's snaprealm").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by:  Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:52 +01:00
Anton Protopopov ce4355932a ceph: fix a wrong comparison
A negative value rc compared to the positive value ENOENT in the
finish_read() function.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:52 +01:00
Deepa Dinamani 8bbd47140c ceph: replace CURRENT_TIME by current_fs_time()
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:52 +01:00
Yan, Zheng 5b64640cf6 ceph: scattered page writeback
This patch makes ceph_writepages_start() try using single OSD request
to write all dirty pages within a strip unit. When a nonconsecutive
dirty page is found, ceph_writepages_start() tries starting a new write
operation to existing OSD request. If it succeeds, it uses the new
operation to writeback the dirty page.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:51 +01:00
Yan, Zheng a587d71b0a ceph: remove useless BUG_ON
ceph_osdc_start_request() never return -EOLDSNAP

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:41 +01:00
Yan, Zheng 133e91566c ceph: don't enable rbytes mount option by default
When rbytes mount option is enabled, directory size is recursive
size. Recursive size is not updated instantly. This can cause
directory size to change between successive stat(1)

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:41 +01:00
Yan, Zheng d1eee0c0e1 ceph: encode ctime in cap message
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:40 +01:00
Ilya Dryomov 82dcabad75 libceph: revamp subs code, switch to SUBSCRIBE2 protocol
It is currently hard-coded in the mon_client that mdsmap and monmap
subs are continuous, while osdmap sub is always "onetime".  To better
handle full clusters/pools in the osd_client, we need to be able to
issue continuous osdmap subs.  Revamp subs code to allow us to specify
for each sub whether it should be continuous or not.

Although not strictly required for the above, switch to SUBSCRIBE2
protocol while at it, eliminating the ambiguity between a request for
"every map since X" and a request for "just the latest" when we don't
have a map yet (i.e. have epoch 0).  SUBSCRIBE2 feature bit is now
required - it's been supported since pre-argonaut (2010).

Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
in before we validate the epoch and successfully install the new map
can mess up mon_client sub state.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:38 +01:00
Linus Torvalds 1d02369dba Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Final round of fixes for this merge window - some of this has come up
  after the initial pull request, and some of it was put in a post-merge
  branch before the merge window.

  This contains:

   - Fix for a bad check for an error on dma mapping in the mtip32xx
     driver, from Alexey Khoroshilov.

   - A set of fixes for lightnvm, from Javier, Matias, and Wenwei.

   - An NVMe completion record corruption fix from Marta, ensuring that
     we read things in the right order.

   - Two writeback fixes from Tejun, marked for stable@ as well.

   - A blk-mq sw queue iterator fix from Thomas, fixing an oops for
     sparse CPU maps.  They hit this in the hot plug/unplug rework"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvme: avoid cqe corruption when update at the same time as read
  writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode
  writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()
  blk-mq: Use proper cpumask iterator
  mtip32xx: fix checks for dma mapping errors
  lightnvm: do not load L2P table if not supported
  lightnvm: do not reserve lun on l2p loading
  nvme: lightnvm: return ppa completion status
  lightnvm: add a bitmap of luns
  lightnvm: specify target's logical address area
  null_blk: add lightnvm null_blk device to the nullb_list
2016-03-24 20:00:44 -07:00
Linus Torvalds 8f40842e42 MTD updates for v4.6
NAND:
  * Add sunxi_nand randomizer support
  * begin refactoring NAND ecclayout structs
  * fix pxa3xx_nand dmaengine usage
  * brcmnand: fix support for v7.1 controller
  * add Qualcomm NAND controller driver
 
 SPI NOR:
  * add new ls1021a, ls2080a support to Freescale QuadSPI
  * add new flash ID entries
  * support bottom-block protection for Winbond flash
  * support Status Register Write Protect
  * remove broken QPI support for Micron SPI flash
 
 JFFS2:
  * improve post-mount CRC scan efficiency
 
 General:
  * refactor bcm63xxpart parser, to later extend for NAND
  * add writebuf size parameter to mtdram
 
 Other minor code quality improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW9CzVAAoJEFySrpd9RFgtQFwQAJdH0wnsZTYfeqToIaD8yMM4
 rtakV/oIMSvMSWuqK+Mx0k6OjGwswgnGZ+tfQLRAYIhb33P8UD0F8Dv5D0x/+zRo
 EgiDlnss/lliXpbh2u4fsANSpFF/JUPXFqU6NanjqQ1rtvR60LUeKOFEz1NRciuV
 Ib6oDLFeXQFxwG0J+EBDo5MrT8aiPODtx4TS8VVo0o0y/WLkEujQPP5592TnCPha
 zX0n9azi26pARo7VLqWjVD8GigY5PadqJAWOZcQr0dGMQv5URtWcCCdThiNsCEzY
 SW9cYSr4CBdy1FIeoJ47yoBg8aFzhyeeuF1efb1U0MoYVL0rdIbznop3Kwilj48L
 Rnh4hvKkrTH16rO6RfKm1lIJaJQYKMErXyEceYMIjV91fEL3qhfbU9W6+Q5HT4hY
 oJmlH+4e/I1Jtf+vW4xFGMYclmYwCO6GJ4HHqnNpby/iH/nZ07hNX3lbxrlqHMwh
 MrSIidqLTsseXcyHBFc+42AsWs8unaYWVB0N3VFkEgl0BFyPObAtvwnHA6zywMvp
 EqJijXFG8VPcztE3eTIMbd0WOkxTjpMT6YHzpZqli/ENxCgu79OWELYrJ0/vC5Uj
 HK0qxgvIzUyJgmikkySDvd/Hc6HWItYonlcAht0VErNfTTfkMwWgRz1W4ZRB6bOJ
 7M83aytLyRYaPGEbwaoR
 =xOlP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20160324' of git://git.infradead.org/linux-mtd

Pull MTD updates from Brian Norris:
 "NAND:
   - Add sunxi_nand randomizer support
   - begin refactoring NAND ecclayout structs
   - fix pxa3xx_nand dmaengine usage
   - brcmnand: fix support for v7.1 controller
   - add Qualcomm NAND controller driver

  SPI NOR:
   - add new ls1021a, ls2080a support to Freescale QuadSPI
   - add new flash ID entries
   - support bottom-block protection for Winbond flash
   - support Status Register Write Protect
   - remove broken QPI support for Micron SPI flash

  JFFS2:
   - improve post-mount CRC scan efficiency

  General:
   - refactor bcm63xxpart parser, to later extend for NAND
   - add writebuf size parameter to mtdram

  Other minor code quality improvements"

* tag 'for-linus-20160324' of git://git.infradead.org/linux-mtd: (72 commits)
  mtd: nand: remove kerneldoc for removed function parameter
  mtd: nand: Qualcomm NAND controller driver
  dt/bindings: qcom_nandc: Add DT bindings
  mtd: nand: don't select chip in nand_chip's block_bad op
  mtd: spi-nor: support lock/unlock for a few Winbond chips
  mtd: spi-nor: add TB (Top/Bottom) protect support
  mtd: spi-nor: add SPI_NOR_HAS_LOCK flag
  mtd: spi-nor: use BIT() for flash_info flags
  mtd: spi-nor: disallow further writes to SR if WP# is low
  mtd: spi-nor: make lock/unlock bounds checks more obvious and robust
  mtd: spi-nor: silently drop lock/unlock for already locked/unlocked region
  mtd: spi-nor: wait for SR_WIP to clear on initial unlock
  mtd: nand: simplify nand_bch_init() usage
  mtd: mtdswap: remove useless if (!mtd->ecclayout) test
  mtd: create an mtd_oobavail() helper and make use of it
  mtd: kill the ecclayout->oobavail field
  mtd: nand: check status before reporting timeout
  mtd: bcm63xxpart: give width specifier an 'int', not 'size_t'
  mtd: mtdram: Add parameter for setting writebuf size
  mtd: nand: pxa3xx_nand: kill unused field 'drcmr_cmd'
  ...
2016-03-24 19:57:15 -07:00
Linus Torvalds 88875667eb This pull request contains cleanups and a maintainer update
for UBI and UBIFS.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJW9C7iAAoJEEtJtSqsAOnWLSIP/3ijRdZdGCD9QlKQfb0qg+wH
 kS0yHiPRp6mabMqPrgvDvF5/+ULRkVdEJef1Pcvs24odB2cH8SmwY7sPXTJIAh49
 R9tbAQsgz+nPgplBnCnKaO1Io6aHZ3v4eJZL6UF5Y89B2i8d5zAzql/9Y+a2gFZP
 wTCSxVC+NsO1Bf822701pQXON9fTZbBEYomeb1xIIC2paOp3XrBuLIT6grKv+h2Z
 wvZ9iCuMzWdPkRoIi/qLUkrWVdsuCt2oN/GqKJn5uSbjybS/bIqUYe5dcm322c2s
 vPhcQptgv4zkjHoJd6hyf/hts+y1i1K6NNX3hCGZ1sgL1qldeq5VMAlf+Ksuy9V5
 NlfNTU7/AJEJT1DgcuRqvXBP+sB7W/4T3TTphW7sn4uTFMDQ94N3LBkUC7dOkWLg
 qOIvBJho1im/gZyZS9g3mwg0h7SCnYCNOoZFTU17f5wCyYCLgVaFWIzcBH4iwPXk
 BSADYNv9crYyyV2RDdcYif4QDZ9QwUPanvJdrdH3CsieymboHseMW9N3I53kr3Bq
 lU2pRSeQz9aTLCN1l06bkXHHtoE+RWb5oiL3Gy28a+QwW7dluz58NjIRkT6vEost
 UzIgvBl5GJTjO7JInrIgwDIuAeznwFkG7xDS0VG9lCki+HlIH4R5SGJdqkFtcVIF
 sXeRIQCDh03niv37vwAh
 =y4hr
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.6-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS updates from Richard Weinberger:
 "This contains cleanups and a maintainer update for UBI and UBIFS"

* tag 'upstream-4.6-rc1' of git://git.infradead.org/linux-ubifs:
  ubifs: Remove unused header
  MAINTAINERS: Update UBIFS entry
  mtd: ubi: Add logging functions ubi_msg, ubi_warn and ubi_err
  ubifs: Add logging functions for ubifs_msg, ubifs_err and ubifs_warn
2016-03-24 19:55:41 -07:00
Linus Torvalds 8b306a2e7c Various bugfixes, a RDMA update from Chuck Lever, and support for a new
pnfs layout type from Christoph Hellwig.  The new layout type is a
 variant of the block layout which uses SCSI features to offer improved
 fencing and device identification.
 
 Note this pull request also includes the client side of SCSI layout,
 with Trond's permission.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW9D0/AAoJECebzXlCjuG+fYcP/ibluAOSRrQ523gQcJNS+QSV
 3B7YY6diJkfQNkm4oAROwPd1KHT2qhoVAO3JHXA3SZnjVVYQxAHeh2wsZJ2jL6Ft
 uyZARxix+F9alJVT3S+uYLwagjh9LXLhb0MaRTMheaWGsPKLQTU4JtsLsjAIhCah
 R0EIIdQfWcb83XoVPmiflVO4Nl/TQWmfA5wHfoVtITJcL3AaC9gzCGNbc8dHLnFC
 HRjGVgHr3nSL3suvUEFfxSEo4QoNPWIX4kBaWXgqbVgOQqmbtQtaXdnd3gIRtkzj
 9Q/lxiwaArtDjdAQdyNtRRBUpkpWo+xWp/vpnNUxTXKoRtpSyqYQX5FaPCPRVAAp
 GYGw2qHrvWn2hSajtVtKyWwsQ3lYsDmbkxAkgScO9kQdS+kuxNyIzYIEvakdtFyJ
 txFsauJczkNNFeHKzLPDoGbuX7KB/+pUsjmX5nYtMhwRriXA5S8zcO4AvTrmTPDF
 vQrLM97mqI60LWmpQUO1OE8CEFPVx5DUZ0KdLMvFNKPZph8BTPJxJMmxJK4R6stV
 /TWglRTEO8IGUh0ww8+3PfMfxVG5XHnQc99+VGVZOS9hJ4GOXbWYAqZ0m+sRJ2Pi
 JPawILie5x2gH1FrVYbcTZsQzdmdn/BF9yePNzAkMucjuEUHXFTlf3MMfEhKpYTl
 0l8LBCv6ZvtGU+PUJxZn
 =MToz
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux

Pull more nfsd updates from Bruce Fields:
 "Apologies for the previous request, which omitted the top 8 commits
  from my for-next branch (including the SCSI layout commits).  Thanks
  to Trond for spotting my error!"

This actually includes the new layout types, so here's that part of
the pull message repeated:

 "Support for a new pnfs layout type from Christoph Hellwig.  The new
  layout type is a variant of the block layout which uses SCSI features
  to offer improved fencing and device identification.

  Note this pull request also includes the client side of SCSI layout,
  with Trond's permission"

* tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux:
  nfsd: use short read as well as i_size to set eof
  nfsd: better layoutupdate bounds-checking
  nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK
  nfsd: add SCSI layout support
  nfsd: move some blocklayout code
  nfsd: add a new config option for the block layout driver
  nfs/blocklayout: add SCSI layout support
  nfs4.h: add SCSI layout definitions
2016-03-24 19:50:32 -07:00
Chris Mason 232cad8413 Merge branch 'misc-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.6 2016-03-24 17:36:13 -07:00
Linus Torvalds 5b1e167d8d Various bugfixes, a RDMA update from Chuck Lever, and support for a new
pnfs layout type from Christoph Hellwig.  The new layout type is a
 variant of the block layout which uses SCSI features to offer improved
 fencing and device identification.
 
 (Also: note this pull request also includes the client side of SCSI
 layout, with Trond's permission.)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW8+uhAAoJECebzXlCjuG+26YP/35DP4MPfszEJ5G0dYq5HMwl
 dJUni8ajSHRswZ/2FqiBsRwmg3Djfc+uoXdOneD1f6ogkDe7S16yp+FRyh8/VwUs
 Ym6LcxSjT28uqkxO0MblcnUl0G9nNSuOwqIsZ0HG7/UC7E6RmCF4o3r5fFUfOsA+
 B3koB5UcHNAFythAk+GDwOQ46Fr96VkZ7Y+OhdNAwmeXZIdKXIufweueI/o2uipB
 RoJFJ7lqrzAjFe+CqAUBr2l2k6lEKzdxbEH6HXQ5+cvVNwfVIgnrONpF78uF/p9T
 NNDnZ+fn3YdRhd+W9RxUHZq7ZL5YOEA8kHsAlloeBH74GqCy7IcS+DrKt1ReM3px
 bhgsXM3dqqJ9xiDGqmeE4VQwRF30SxgYZbO386E+cLHnCYV+vfY6RUaWPrk6On/r
 FL9g3iyVvhyC4HO06Xm+uvvERw8R+fTZY9KZQKH2RL0Tr5DkWRRNJfasMO+PwGOv
 Fdku01vyoA4Y6mbqUgQ9DmrbLO4gK3UyMiOTanQV9shrIDxI0MOuLK03zL25vZCM
 s1A4YBpXmg4gx3XsOFM+tygv6EVujDu6scICeb+hj0vi0oG82Lx7T9e3MJEiYC+T
 jbi8bu+x+0bX2obMprvDNVUzi/PgSUVpGCnRlbRTaXBa0lB6nV7uUiQ1HC9gGesm
 ZWWiOv7du+7WlFP5c6r5
 =mY8w
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Various bugfixes, a RDMA update from Chuck Lever, and support for a
  new pnfs layout type from Christoph Hellwig.  The new layout type is a
  variant of the block layout which uses SCSI features to offer improved
  fencing and device identification.

  (Also: note this pull request also includes the client side of SCSI
  layout, with Trond's permission.)"

* tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux:
  sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
  nfsd: recover: fix memory leak
  nfsd: fix deadlock secinfo+readdir compound
  nfsd4: resfh unused in nfsd4_secinfo
  svcrdma: Use new CQ API for RPC-over-RDMA server send CQs
  svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs
  svcrdma: Remove close_out exit path
  svcrdma: Hook up the logic to return ERR_CHUNK
  svcrdma: Use correct XID in error replies
  svcrdma: Make RDMA_ERROR messages work
  rpcrdma: Add RPCRDMA_HDRLEN_ERR
  svcrdma: svc_rdma_post_recv() should close connection on error
  svcrdma: Close connection when a send error occurs
  nfsd: Lower NFSv4.1 callback message size limit
  svcrdma: Do not send Write chunk XDR pad with inline content
  svcrdma: Do not write xdr_buf::tail in a Write chunk
  svcrdma: Find client-provided write and reply chunks once per reply
  nfsd: Update NFS server comments related to RDMA support
  nfsd: Fix a memory leak when meeting unsupported state_protect_how4
  nfsd4: fix bad bounds checking
2016-03-24 10:41:00 -07:00
Martin Brandenburg fecd86aac5 ornagefs: ensure that truncate has an up to date inode size
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:16 -04:00
Martin Brandenburg e8da254c41 orangefs: move code which sets i_link to orangefs_inode_getattr
Everything else setting inode->i_ values is in there.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:16 -04:00
Martin Brandenburg 05d31c5cb3 orangefs: remove needless wrapper around GFP_KERNEL
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:15 -04:00
Martin Brandenburg 93d53a4885 orangefs: remove wrapper around mutex_lock(&inode->i_mutex)
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:15 -04:00
Martin Brandenburg 266626339b orangefs: refactor inode type or link_target change detection
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:15 -04:00
Martin Brandenburg 5859d77e56 orangefs: use new getattr for revalidate and remove old getattr
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:15 -04:00
Martin Brandenburg 8f24928d19 orangefs: use new getattr in inode getattr and permission
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:15 -04:00
Martin Brandenburg e2f7f0d798 orangefs: use new orangefs_inode_getattr to get size in write and llseek
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:14 -04:00
Martin Brandenburg 075cca50b6 orangefs: use new orangefs_inode_getattr to create new inodes
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:14 -04:00
Martin Brandenburg 3c9cf98d7b orangefs: rename orangefs_inode_getattr to orangefs_inode_old_getattr
This is motivated by orangefs_inode_old_getattr's habit of writing over
live inodes.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:14 -04:00
Martin Brandenburg d57521a653 orangefs: remove inode->i_lock wrapper
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:13 -04:00
Benjamin Coddington ac503e4a30 nfsd: use short read as well as i_size to set eof
Use the result of a local read to determine when to set the eof flag.  This
allows us to return the location of the end of the file atomically at the
time of the read.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
[bfields: add some documentation]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-23 16:02:39 -04:00
Linus Torvalds c130423620 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:

  API:
   - Fix kzalloc error path crash in ecryptfs added by skcipher
     conversion.  Note the subject of the commit is screwed up and the
     correct subject is actually in the body.

  Drivers:
   - A number of fixes to the marvell cesa hashing code.
   - Remove bogus nested irqsave that clobbers the saved flags in ccp"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: marvell/cesa - forward devm_ioremap_resource() error code
  crypto: marvell/cesa - initialize hash states
  crypto: marvell/cesa - fix memory leak
  crypto: ccp - fix lock acquisition code
  eCryptfs: Use skcipher and shash
2016-03-23 06:12:39 -07:00
Linus Torvalds a24e3d414e Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:

 - more ocfs2 changes

 - a few hotfixes

 - Andy's compat cleanups

 - misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
   panic, ipmi, kgdb, profile, kfifo, ubsan, etc.

 - many rapidio updates: fixes, new drivers.

 - kcov: kernel code coverage feature.  Like gcov, but not
   "prohibitively expensive".

 - extable code consolidation for various archs

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
  ia64/extable: use generic search and sort routines
  x86/extable: use generic search and sort routines
  s390/extable: use generic search and sort routines
  alpha/extable: use generic search and sort routines
  kernel/...: convert pr_warning to pr_warn
  drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
  drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
  memremap: add MEMREMAP_WC flag
  memremap: don't modify flags
  kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
  mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
  ipc/sem: make semctl setting sempid consistent
  ubsan: fix tree-wide -Wmaybe-uninitialized false positives
  kfifo: fix sparse complaints
  scripts/gdb: account for changes in module data structure
  scripts/gdb: add cmdline reader command
  scripts/gdb: add version command
  kernel: add kcov code coverage
  profile: hide unused functions when !CONFIG_PROC_FS
  hpwdt: use nmi_panic() when kernel panics in NMI handler
  ...
2016-03-22 17:09:14 -07:00
Paolo Bonzini a484c3dd94 eventfd: document lockless access in eventfd_poll
Since commit e22553e2a2 ("eventfd: don't take the spinlock in
eventfd_poll", 2015-02-17), eventfd is reading ctx->count outside
ctx->wqh.lock.

However, things aren't as simple as the read barrier in eventfd_poll
would suggest.  In fact, the read barrier, besides lacking a comment, is
not paired in any obvious manner with another read barrier, and it is
pointless because it is sitting between a write (deep in poll_wait) and
the read of ctx->count.  The read barrier is acting just as a compiler
barrier, for which we can use READ_ONCE instead.  This is what the code
change in this patch does.

The documentation change is just as important, however.  The question,
posed by Andrea Arcangeli, is then why the thing is safe on
architectures where spin_unlock does not imply a store-load memory
barrier.  The answer is that it's safe because writes of ctx->count use
the same lock as poll_wait, and hence an acquire barrier implicit in
poll_wait provides the necessary synchronization between eventfd_poll
and callers of wake_up_locked_poll.  This is sort of mentioned in the
commit message with respect to eventfd_ctx_read ("eventfd_read is
similar, it will do a single decrement with the lock held") but it
applies to all other callers too.  It's tricky enough that it should be
documented in the code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Mason <clm@fb.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Jann Horn 378c6520e7 fs/coredump: prevent fsuid=0 dumps into user-controlled directories
This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:

 - The fs.suid_dumpable sysctl is set to 2.
 - The kernel.core_pattern sysctl's value starts with "/". (Systems
   where kernel.core_pattern starts with "|/" are not affected.)
 - Unprivileged user namespace creation is permitted. (This is
   true on Linux >=3.8, but some distributions disallow it by
   default using a distro patch.)

Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.

To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.

Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Maciej S. Szmigiero 3873938068 fat: add config option to set UTF-8 mount option by default
FAT has long supported its own default file name encoding config
setting, separate from CONFIG_NLS_DEFAULT.

However, if UTF-8 encoded file names are desired FAT character set
should not be set to utf8 since this would make file names case
sensitive even if case insensitive matching is requested.  Instead,
"utf8" mount options should be provided to enable UTF-8 file names in
FAT file system.

Unfortunately, there was no possibility to set the default value of this
option so on UTF-8 system "utf8" mount option had to be added manually
to most FAT mounts.

This patch adds config option to set such default value.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Andy Lutomirski 121cef8f17 ext4: in ext4_dir_llseek, check syscall bitness directly
ext4 treats directory offsets differently for 32-bit and 64-bit callers.
Check the caller type using in_compat_syscall, not is_compat_task.  This
changes behavior on SPARC slightly.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Gang He d56a8f32e4 ocfs2: check/fix inode block for online file check
Implement online check or fix inode block during reading a inode block
to memory.

Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Gang He a849d46816 ocfs2: create/remove sysfile for online file check
Create online file check sysfile when ocfs2 mount, remove the related
sysfile when ocfs2 umount.

Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Gang He a860f6eb4c ocfs2: sysfile interfaces for online file check
Implement online file check sysfile interfaces, e.g. how to create the
related sysfile according to device name, how to display/handle file
check request from the sysfile.

Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Gang He 9dde5e4f33 ocfs2: export ocfs2_kset for online file check
When there are errors in the ocfs2 filesystem, they are usually
accompanied by the inode number which caused the error.  This inode
number would be the input to fixing the file.  One of these options
could be considered:

A file in the sys filesytem which would accept inode numbers.  This
could be used to communication back what has to be fixed or is fixed.
You could write:

  $# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/check

or

  $# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/fix

Compare with second version, I re-design filecheck sysfs interfaces,
there are three sysfs files (check, fix and set) under filecheck
directory (see above), sysfs will accept only one argument <inode>.
Second, I adjust some code in ocfs2_filecheck_repair_inode_block()
function according to upstream feedback, we cannot just add VALID_FL
flag back as a inode block fix, then we will not fix this field
corruption currently until having a complete solution.  Compare with
first version, I use strncasecmp instead of double strncmp functions.
Second, update the source file contribution vendor.

This patch (of 4):

Export ocfs2_kset object from ocfs2_stackglue kernel module, then online
file check code will create the related sysfiles under ocfs2_kset
object.  We're exporting this because it's built in ocfs2_stackglue.ko.

Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Linus Torvalds 01cde1538e NFS client updates for Linux 4.6
Highlights include:
 
 Features:
 - Add support for multiple NFSv4.1 callbacks in flight
 - Initial patchset for RPC multipath support
 - Adapt RPC/RDMA to use the new completion queue API
 
 Bugfixes and cleanups:
 - nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
 - Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync
 - Fix RPC/RDMA credit accounting
 - Properly handle RDMA_ERROR replies
 - xprtrdma: Do not wait if ib_post_send() fails
 - xprtrdma: Segment head and tail XDR buffers on page boundaries
 - xprtrdma cleanups for dprintk, physical_op_map and unused macros
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW8Y7MAAoJEGcL54qWCgDyMsMP+we8JSgfVqI5X1lKpU9aPWkI
 D912ybtV58Kv0elKwYvQMqm+mRvdMNz1hZNJa4sAEaPVBOGfFjyZLy3xlDlr0HTf
 M+Juh0FNLTcUh1obxJamjsbpNxfg4b6f/Z29KWRzahv/MlpMJVS3hLjpAEzCcTYr
 WfOOovV6mragtsBINegGl/6jk/x2D22JDnKcTU+8ltVZGJtZe+HoqTFhUOrLO5qm
 wR3YO22fbOuiZxCPoST06kMNiksYnYXnOju8RwlKwFYq3bWke0jWstQtIC4vKs6K
 4u5o74aTBL5zMkJPnJuIfN2if4LJPptSr1n7pItbv3MLmgY1mWjE6N2BATpijfhQ
 p+Gt/GHTAvswuWrmwySZKLj/Q8EkBuw4ohPFwLQ9eFHl2USoV3G9KQw7H0odR4d1
 IyQPCag+suN2lWBreFkPIV48dZyeCVk6JmJmy3SN+d0L1t3jd6gwSO2UBgG7S2Gd
 qVbdxYRiU/zYP6E5wFouLhIc1beSfe4vnJqvnuWZrIId+haTE2+OLi7772WGIkSe
 xoZVTg7AX4Wu79ZyWoH+e9FnDvEsRkRVv7HQfpsMq2gynBWj70/KemEoeZnjqWaB
 UOWcH8/vNLrnwlXTh0VHG6I8t3s0EXgqQB4//tYRLI42oIj35W2pIMnjYt52DeVB
 Mo5mbYghtR9bgeoRQ6V4
 =kC3t
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:
   - Add support for multiple NFSv4.1 callbacks in flight
   - Initial patchset for RPC multipath support
   - Adapt RPC/RDMA to use the new completion queue API

  Bugfixes and cleanups:
   - nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
   - Cleanups to remove nfs_inode_dio_wait and nfs4_file_fsync
   - Fix RPC/RDMA credit accounting
   - Properly handle RDMA_ERROR replies
   - xprtrdma: Do not wait if ib_post_send() fails
   - xprtrdma: Segment head and tail XDR buffers on page boundaries
   - xprtrdma cleanups for dprintk, physical_op_map and unused macros"

* tag 'nfs-for-4.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (35 commits)
  nfs/blocklayout: make sure making a aligned read request
  nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed
  nfs: remove nfs_inode_dio_wait
  nfs: remove nfs4_file_fsync
  xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs
  xprtrdma: Use an anonymous union in struct rpcrdma_mw
  xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs
  xprtrdma: Serialize credit accounting again
  xprtrdma: Properly handle RDMA_ERROR replies
  rpcrdma: Add RPCRDMA_HDRLEN_ERR
  xprtrdma: Do not wait if ib_post_send() fails
  xprtrdma: Segment head and tail XDR buffers on page boundaries
  xprtrdma: Clean up dprintk format string containing a newline
  xprtrdma: Clean up physical_op_map()
  xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro
  NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback
  pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3
  SUNRPC: Allow addition of new transports to a struct rpc_clnt
  NFSv4.1: nfs4_proc_bind_conn_to_session must iterate over all connections
  SUNRPC: Make NFS swap work with multipath
  ...
2016-03-22 13:16:21 -07:00
Linus Torvalds 243d506785 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "Various fixes and tweaks"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: cleanup unused var in rename2
  ovl: rename is_merge to is_lowest
  ovl: fixed coding style warning
  ovl: Ensure upper filesystem supports d_type
  ovl: Warn on copy up if a process has a R/O fd open to the lower file
  ovl: honor flag MS_SILENT at mount
  ovl: verify upper dentry before unlink and rename
2016-03-22 13:11:15 -07:00
Linus Torvalds 9f15dec813 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse update from Miklos Szeredi:
 "This contains direct I/O fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: return patrial success from fuse_direct_io()
  fuse: Add reference counting for fuse_io_priv
  fuse: do not use iocb after it may have been freed
2016-03-22 13:05:34 -07:00
J. Bruce Fields 4b15da44e7 nfsd: better layoutupdate bounds-checking
You could add any multiple of 2^32/PNFS_SCSI_RANGE_SIZE to nr_iomaps and
still pass this check.  You'd probably still fail the following kcalloc,
but best to be paranoid since this is from-the-wire data.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-22 14:39:35 -04:00
Jiri Kosina ce63f891e1 btrfs: transaction_kthread() is not freezable
transaction_kthread() is calling try_to_freeze(), but that's just an
expeinsive no-op given the fact that the thread is not marked freezable.

After removing this, disk-io.c is now independent on freezer API.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-03-22 10:08:47 +01:00
Jiri Kosina 838fe18877 btrfs: cleaner_kthread() doesn't need explicit freeze
cleaner_kthread() is not marked freezable, and therefore calling
try_to_freeze() in its context is a pointless no-op.

In addition to that, as has been clearly demonstrated by 80ad623edd
("Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()"), it's perfectly
valid / legal for cleaner_kthread() to stay scheduled out in an arbitrary
place during suspend (in that particular example that was waiting for
reading of extent pages), so there is no need to leave any traces of
freezer in this kthread.

Fixes: 80ad623edd ("Revert "btrfs: clear PF_NOFREEZE in cleaner_kthread()")
Fixes: 6962491321 ("btrfs: clear PF_NOFREEZE in cleaner_kthread()")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-03-22 10:08:47 +01:00
Alex Lyakas 0f805531da btrfs: do not write corrupted metadata blocks to disk
csum_dirty_buffer was issuing a warning in case the extent buffer
did not look alright, but was still returning success.
Let's return error in this case, and also add an additional sanity
check on the extent buffer header.
The caller up the chain may BUG_ON on this, for example flush_epd_write_bio will,
but it is better than to have a silent metadata corruption on disk.

Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-03-22 10:08:12 +01:00
Alex Lyakas 8bd98f0e6b btrfs: csum_tree_block: return proper errno value
Signed-off-by: Alex Lyakas <alex@zadarastorage.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-03-22 10:07:43 +01:00
Linus Torvalds 968f3e374f Merge branch 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "We have a good sized cleanup of our internal read ahead code, and the
  first series of commits from Chandan to enable PAGE_SIZE > sectorsize

  Otherwise, it's a normal series of cleanups and fixes, with many
  thanks to Dave Sterba for doing most of the patch wrangling this time"

* 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (82 commits)
  btrfs: make sure we stay inside the bvec during __btrfs_lookup_bio_sums
  btrfs: Fix misspellings in comments.
  btrfs: Print Warning only if ENOSPC_DEBUG is enabled
  btrfs: scrub: silence an uninitialized variable warning
  btrfs: move btrfs_compression_type to compression.h
  btrfs: rename btrfs_print_info to btrfs_print_mod_info
  Btrfs: Show a warning message if one of objectid reaches its highest value
  Documentation: btrfs: remove usage specific information
  btrfs: use kbasename in btrfsic_mount
  Btrfs: do not collect ordered extents when logging that inode exists
  Btrfs: fix race when checking if we can skip fsync'ing an inode
  Btrfs: fix listxattrs not listing all xattrs packed in the same item
  Btrfs: fix deadlock between direct IO reads and buffered writes
  Btrfs: fix extent_same allowing destination offset beyond i_size
  Btrfs: fix file loss on log replay after renaming a file and fsync
  Btrfs: fix unreplayable log after snapshot delete + parent dir fsync
  Btrfs: fix lockdep deadlock warning due to dev_replace
  btrfs: drop unused argument in btrfs_ioctl_get_supported_features
  btrfs: add GET_SUPPORTED_FEATURES to the control device ioctls
  btrfs: change max_inline default to 2048
  ...
2016-03-21 18:12:42 -07:00
Linus Torvalds 77d913178c Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF and quota updates from Jan Kara:
 "This contains a rewrite of UDF handling of filename encoding to fix
  remaining overflow issues from Andrew Gabbasov and quota changes to
  support new Q_[X]GETNEXTQUOTA quotactl for VFS quota formats"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: Fix possible GPF due to uninitialised pointers
  ext4: Make Q_GETNEXTQUOTA work for quota in hidden inodes
  quota: Forbid Q_GETQUOTA and Q_GETNEXTQUOTA for frozen filesystem
  quota: Fix possible races during quota loading
  ocfs2: Implement get_next_id()
  quota_v2: Implement get_next_id() for V2 quota format
  quota: Add support for ->get_nextdqblk() for VFS quota
  udf: Merge linux specific translation into CS0 conversion function
  udf: Remove struct ustr as non-needed intermediate storage
  udf: Use separate buffer for copying split names
  udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
  udf: Join functions for UTF8 and NLS conversions
  udf: Parameterize output length in udf_put_filename
  quota: Allow Q_GETQUOTA for frozen filesystem
  quota: Fixup comments about return value of Q_[X]GETNEXTQUOTA
2016-03-21 12:22:37 -07:00
Linus Torvalds 53d2e6976b xfs: Changes for 4.6-rc1
Change summary:
 o error propagation for direct IO failures fixes for both XFS and ext4
 o new quota interfaces and XFS implementation for iterating all the quota IDs
   in the filesystem
 o locking fixes for real-time device extent allocation
 o reduction of duplicate information in the xfs and vfs inode, saving roughly
   100 bytes of memory per cached inode.
 o buffer flag cleanup
 o rework of the writepage code to use the generic write clustering mechanisms
 o several fixes for inode flag based DAX enablement
 o rework of remount option parsing
 o compile time verification of on-disk format structure sizes
 o delayed allocation reservation overrun fixes
 o lots of little error handling fixes
 o small memory leak fixes
 o enable xfsaild freezing again
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW71DQAAoJEK3oKUf0dfodyiwP/0Tou9f1huzLC0kd7kmEoKKC
 BWQmtJGEdo0iSpJNZhg/EJmjvRtbBiOB9CRcEyG8d71kqZ+MKW7t/4JjNvNG34aE
 vHjhwMBVVqkw/q6azi2LiEDsVcOe5bXxUrXNZi18/09OAl4pHm+X8VERLnnC5y+i
 QIHAOdB5R+36cXcceJm1HR6jTZedbNdQkT/ndhm5S60FGhvVI29cs9NwYwoi5aif
 O55r6krSWBj6U/X6MsLvr+lNb6+1Sd1hyE8dGTE7lOUX/crFIysaDPEuQmWvDjsO
 M1ulVfzKoBJHcyvpbdHwdBEyiBjzvETcrgndMRoWOjZiOLqNtWYsgIEiC+Nlidwd
 +T4XhkJJJg5UUQ4r6Hs85SQn/THanzR5KoN5nbTsFtFkCKw1DRkUSNuh2mXP2xVG
 JcNDCjDvvHG76EfQ1otlYf7ru79Ck+hjVs+szaEVPpOzAwz8yOtD+L7I8f73gQ6a
 ayP8W2oZQpYvQRv+smgvt+HwQA4fNJk9ZseY3QD5+z5snJz7JEhZogqW+ngFYkNQ
 dtA5Y7gpTkKfo3mKO0XmE5+3fcSXhGHGYQzmUgJFlgWTK7+E8fuDhn6D66wFcZSq
 QhyRk9J7Xb7ZWuP5PlOkxb9DLd4hnuyie2bYw/0hVtOatjE/Em4gRJ3Oq3ZANwZx
 OeMGj4Uyb3/MKAJwy3Gq
 =ZoiX
 -----END PGP SIGNATURE-----

Merge tag 'xfs-for-linus-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs updates from Dave Chinner:
 "There's quite a lot in this request, and there's some cross-over with
  ext4, dax and quota code due to the nature of the changes being made.

  As for the rest of the XFS changes, there are lots of little things
  all over the place, which add up to a lot of changes in the end.

  The major changes are that we've reduced the size of the struct
  xfs_inode by ~100 bytes (gives an inode cache footprint reduction of
  >10%), the writepage code now only does a single set of mapping tree
  lockups so uses less CPU, delayed allocation reservations won't
  overrun under random write loads anymore, and we added compile time
  verification for on-disk structure sizes so we find out when a commit
  or platform/compiler change breaks the on disk structure as early as
  possible.

  Change summary:

   - error propagation for direct IO failures fixes for both XFS and
     ext4
   - new quota interfaces and XFS implementation for iterating all the
     quota IDs in the filesystem
   - locking fixes for real-time device extent allocation
   - reduction of duplicate information in the xfs and vfs inode, saving
     roughly 100 bytes of memory per cached inode.
   - buffer flag cleanup
   - rework of the writepage code to use the generic write clustering
     mechanisms
   - several fixes for inode flag based DAX enablement
   - rework of remount option parsing
   - compile time verification of on-disk format structure sizes
   - delayed allocation reservation overrun fixes
   - lots of little error handling fixes
   - small memory leak fixes
   - enable xfsaild freezing again"

* tag 'xfs-for-linus-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (66 commits)
  xfs: always set rvalp in xfs_dir2_node_trim_free
  xfs: ensure committed is initialized in xfs_trans_roll
  xfs: borrow indirect blocks from freed extent when available
  xfs: refactor delalloc indlen reservation split into helper
  xfs: update freeblocks counter after extent deletion
  xfs: debug mode forced buffered write failure
  xfs: remove impossible condition
  xfs: check sizes of XFS on-disk structures at compile time
  xfs: ioends require logically contiguous file offsets
  xfs: use named array initializers for log item dumping
  xfs: fix computation of inode btree maxlevels
  xfs: reinitialise per-AG structures if geometry changes during recovery
  xfs: remove xfs_trans_get_block_res
  xfs: fix up inode32/64 (re)mount handling
  xfs: fix format specifier , should be %llx and not %llu
  xfs: sanitize remount options
  xfs: convert mount option parsing to tokens
  xfs: fix two memory leaks in xfs_attr_list.c error paths
  xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE
  xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared
  ...
2016-03-21 11:53:05 -07:00
Linus Torvalds d407574e79 Merge tag 'for-f2fs-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "New Features:
   - uplift filesystem encryption into fs/crypto/
   - give sysfs entries to control memroy consumption

  Enhancements:
   - aio performance by preallocating blocks in ->write_iter
   - use writepages lock for only WB_SYNC_ALL
   - avoid redundant inline_data conversion
   - enhance forground GC
   - use wait_for_stable_page as possible
   - speed up SEEK_DATA and fiiemap

  Bug Fixes:
   - corner case in terms of -ENOSPC for inline_data
   - hung task caused by long latency in shrinker
   - corruption between atomic write and f2fs_trace_pid
   - avoid garbage lengths in dentries
   - revoke atomicly written pages if an error occurs

  In addition, there are various minor bug fixes and clean-ups"

* tag 'for-f2fs-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (81 commits)
  f2fs: submit node page write bios when really required
  f2fs: add missing argument to f2fs_setxattr stub
  f2fs: fix to avoid unneeded unlock_new_inode
  f2fs: clean up opened code with f2fs_update_dentry
  f2fs: declare static functions
  f2fs: use cryptoapi crc32 functions
  f2fs: modify the readahead method in ra_node_page()
  f2fs crypto: sync ext4_lookup and ext4_file_open
  fs crypto: move per-file encryption from f2fs tree to fs/crypto
  f2fs: mutex can't be used by down_write_nest_lock()
  f2fs: recovery missing dot dentries in root directory
  f2fs: fix to avoid deadlock when merging inline data
  f2fs: introduce f2fs_flush_merged_bios for cleanup
  f2fs: introduce f2fs_update_data_blkaddr for cleanup
  f2fs crypto: fix incorrect positioning for GCing encrypted data page
  f2fs: fix incorrect upper bound when iterating inode mapping tree
  f2fs: avoid hungtask problem caused by losing wake_up
  f2fs: trace old block address for CoWed page
  f2fs: try to flush inode after merging inline data
  f2fs: show more info about superblock recovery
  ...
2016-03-21 11:03:02 -07:00
Linus Torvalds 5518f66b5a Merge branch 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup namespace support from Tejun Heo:
 "These are changes to implement namespace support for cgroup which has
  been pending for quite some time now.  It is very straight-forward and
  only affects what part of cgroup hierarchies are visible.

  After unsharing, mounting a cgroup fs will be scoped to the cgroups
  the task belonged to at the time of unsharing and the cgroup paths
  exposed to userland would be adjusted accordingly"

* 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix and restructure error handling in copy_cgroup_ns()
  cgroup: fix alloc_cgroup_ns() error handling in copy_cgroup_ns()
  Add FS_USERNS_FLAG to cgroup fs
  cgroup: Add documentation for cgroup namespaces
  cgroup: mount cgroupns-root when inside non-init cgroupns
  kernfs: define kernfs_node_dentry
  cgroup: cgroup namespace setns support
  cgroup: introduce cgroup namespaces
  sched: new clone flag CLONE_NEWCGROUP for cgroup namespace
  kernfs: Add API to generate relative kernfs path
2016-03-21 10:05:13 -07:00
Kinglong Mee f35592a974 nfs/blocklayout: make sure making a aligned read request
Only treat write goes up to the inode size as aligned request,
because it always write PAGE_CACHE_SIZE, but read a dynamic size.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-03-21 12:39:46 -04:00
Miklos Szeredi 6986c012fa ovl: cleanup unused var in rename2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:46 +01:00
Miklos Szeredi 56656e960b ovl: rename is_merge to is_lowest
The 'is_merge' is an historical naming from when only a single lower layer
could exist.  With the introduction of multiple lower layers the meaning of
this flag was changed to mean only the "lowest layer" (while all lower
layers were being merged).

So now 'is_merge' is inaccurate and hence renaming to 'is_lowest'

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:46 +01:00
Sohom Bhattacharjee f134f24465 ovl: fixed coding style warning
This patch fixes a newline warning found by the checkpatch.pl tool

Signed-off-by: Sohom-Bhattacharjee <soham.bhattacharjee15@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:45 +01:00
Vivek Goyal 45aebeaf4f ovl: Ensure upper filesystem supports d_type
In some instances xfs has been created with ftype=0 and there if a file
on lower fs is removed, overlay leaves a whiteout in upper fs but that
whiteout does not get filtered out and is visible to overlayfs users.

And reason it does not get filtered out because upper filesystem does
not report file type of whiteout as DT_CHR during iterate_dir().

So it seems to be a requirement that upper filesystem support d_type for
overlayfs to work properly. Do this check during mount and fail if d_type
is not supported.

Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:45 +01:00
David Howells fb5bb2c3b7 ovl: Warn on copy up if a process has a R/O fd open to the lower file
Print a warning when overlayfs copies up a file if the process that
triggered the copy up has a R/O fd open to the lower file being copied up.

This can help catch applications that do things like the following:

	fd1 = open("foo", O_RDONLY);
	fd2 = open("foo", O_RDWR);

where they expect fd1 and fd2 to refer to the same file - which will no
longer be the case post-copy up.

With this patch, the following commands:

	bash 5</mnt/a/foo128
	6<>/mnt/a/foo128

assuming /mnt/a/foo128 to be an un-copied up file on an overlay will
produce the following warning in the kernel log:

	overlayfs: Copying up foo129, but open R/O on fd 5 which will cease
	to be coherent [pid=3818 bash]

This is enabled by setting:

	/sys/module/overlay/parameters/check_copy_up

to 1.

The warnings are ratelimited and are also limited to one warning per file -
assuming the copy up completes in each case.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:45 +01:00
Konstantin Khlebnikov 07f2af7bfd ovl: honor flag MS_SILENT at mount
This patch hides error about missing lowerdir if MS_SILENT is set.

We use mount(NULL, "/", "overlay", MS_SILENT, NULL) for testing support of
overlayfs: syscall returns -ENODEV if it's not supported. Otherwise kernel
automatically loads module and returns -EINVAL because lowerdir is missing.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:45 +01:00
Miklos Szeredi 11f3710417 ovl: verify upper dentry before unlink and rename
Unlink and rename in overlayfs checked the upper dentry for staleness by
verifying upper->d_parent against upperdir.  However the dentry can go
stale also by being unhashed, for example.

Expand the verification to actually look up the name again (under parent
lock) and check if it matches the upper dentry.  This matches what the VFS
does before passing the dentry to filesytem's unlink/rename methods, which
excludes any inconsistency caused by overlayfs.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-03-21 17:31:44 +01:00
Chris Mason 389f239c53 btrfs: make sure we stay inside the bvec during __btrfs_lookup_bio_sums
Commit c40a3d38af (Btrfs: Compute and look up csums based on
sectorsized blocks) changes around how we walk the bios while looking up
crcs.  There's an inner loop that is jumping to the next bvec based on
sectors and before it derefs the next bvec, it needs to make sure we're
still in the bio.

In this case, the outer loop would have decided to stop moving forward
too, and the bvec deref is never actually used for anything.  But
CONFIG_DEBUG_PAGEALLOC catches it because we're outside our bio.

Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
2016-03-21 07:25:44 -07:00
Linus Torvalds 643ad15d47 Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 protection key support from Ingo Molnar:
 "This tree adds support for a new memory protection hardware feature
  that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

  There's a background article at LWN.net:

      https://lwn.net/Articles/643797/

  The gist is that protection keys allow the encoding of
  user-controllable permission masks in the pte.  So instead of having a
  fixed protection mask in the pte (which needs a system call to change
  and works on a per page basis), the user can map a (handful of)
  protection mask variants and can change the masks runtime relatively
  cheaply, without having to change every single page in the affected
  virtual memory range.

  This allows the dynamic switching of the protection bits of large
  amounts of virtual memory, via user-space instructions.  It also
  allows more precise control of MMU permission bits: for example the
  executable bit is separate from the read bit (see more about that
  below).

  This tree adds the MM infrastructure and low level x86 glue needed for
  that, plus it adds a high level API to make use of protection keys -
  if a user-space application calls:

        mmap(..., PROT_EXEC);

  or

        mprotect(ptr, sz, PROT_EXEC);

  (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
  this special case, and will set a special protection key on this
  memory range.  It also sets the appropriate bits in the Protection
  Keys User Rights (PKRU) register so that the memory becomes unreadable
  and unwritable.

  So using protection keys the kernel is able to implement 'true'
  PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
  PROT_READ as well.  Unreadable executable mappings have security
  advantages: they cannot be read via information leaks to figure out
  ASLR details, nor can they be scanned for ROP gadgets - and they
  cannot be used by exploits for data purposes either.

  We know about no user-space code that relies on pure PROT_EXEC
  mappings today, but binary loaders could start making use of this new
  feature to map binaries and libraries in a more secure fashion.

  There is other pending pkeys work that offers more high level system
  call APIs to manage protection keys - but those are not part of this
  pull request.

  Right now there's a Kconfig that controls this feature
  (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
  (like most x86 CPU feature enablement code that has no runtime
  overhead), but it's not user-configurable at the moment.  If there's
  any serious problem with this then we can make it configurable and/or
  flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
  mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
  x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
  mm/core, x86/mm/pkeys: Add execute-only protection keys support
  x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
  x86/mm/pkeys: Allow kernel to modify user pkey rights register
  x86/fpu: Allow setting of XSAVE state
  x86/mm: Factor out LDT init from context init
  mm/core, x86/mm/pkeys: Add arch_validate_pkey()
  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
  x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
  x86/mm/pkeys: Add Kconfig prompt to existing config option
  x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
  x86/mm/pkeys: Dump PKRU with other kernel registers
  mm/core, x86/mm/pkeys: Differentiate instruction fetches
  x86/mm/pkeys: Optimize fault handling in access_error()
  mm/core: Do not enforce PKEY permissions on remote mm access
  um, pkeys: Add UML arch_*_access_permitted() methods
  mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
  x86/mm/gup: Simplify get_user_pages() PTE bit handling
  ...
2016-03-20 19:08:56 -07:00
Andreas Gruenbacher c27cb97218 ubifs: Remove unused header
UBIFS does not support POSIX ACLs, so there is no need for including any
POSIX ACL hesders.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-03-20 21:37:46 +01:00
Joe Perches 3e7f2c5104 ubifs: Add logging functions for ubifs_msg, ubifs_err and ubifs_warn
The existing logging macros are fairly large and converting the
macros to functions make the object code smaller.

Use %pV and __builtin_return_address(0) as appropriate.

$ size fs/ubifs/built-in.o*
   text	   data	    bss	    dec	    hex	filename
 575831	 309688	 161312	1046831	  ff92f	fs/ubifs/built-in.o.allyesconfig.new
 622457	 312872	 161120	1096449	 10bb01	fs/ubifs/built-in.o.allyesconfig.old
 223785	    640	    644	 225069	  36f2d	fs/ubifs/built-in.o.defconfig.new
 251873	    640	    644	 253157	  3dce5	fs/ubifs/built-in.o.defconfig.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-03-20 21:36:05 +01:00
Tejun Heo aaf2559332 writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode
When cgroup writeback is in use, there can be multiple wb's
(bdi_writeback's) per bdi and an inode may switch among them
dynamically.  In a couple places, the wrong wb was used leading to
performing operations on the wrong list under the wrong lock
corrupting the io lists.

* writeback_single_inode() was taking @wb parameter and used it to
  remove the inode from io lists if it becomes clean after writeback.
  The callers of this function were always passing in the root wb
  regardless of the actual wb that the inode was associated with,
  which could also change while writeback is in progress.

  Fix it by dropping the @wb parameter and using
  inode_to_wb_and_lock_list() to determine and lock the associated wb.

* After writeback_sb_inodes() writes out an inode, it re-locks @wb and
  inode to remove it from or move it to the right io list.  It assumes
  that the inode is still associated with @wb; however, the inode may
  have switched to another wb while writeback was in progress.

  Fix it by using inode_to_wb_and_lock_list() to determine and lock
  the associated wb after writeback is complete.  As the function
  requires the original @wb->list_lock locked for the next iteration,
  in the unlikely case where the inode has changed association, switch
  the locks.

Kudos to Tahsin for pinpointing these subtle breakages.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: d10c809552 ("writeback: implement foreign cgroup inode bdi_writeback switching")
Link: http://lkml.kernel.org/g/CAAeU0aMYeM_39Y2+PaRvyB1nqAPYZSNngJ1eBRmrxn7gKAt2Mg@mail.gmail.com
Reported-and-diagnosed-by: Tahsin Erdogan <tahsin@google.com>
Tested-by: Tahsin Erdogan <tahsin@google.com>
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-20 09:44:20 -06:00
Tejun Heo 614a4e3773 writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()
locked_inode_to_wb_and_lock_list() wb_get()'s the wb associated with
the target inode, unlocks inode, locks the wb's list_lock and verifies
that the inode is still associated with the wb.  To prevent the wb
going away between dropping inode lock and acquiring list_lock, the wb
is pinned while inode lock is held.  The wb reference is put right
after acquiring list_lock citing that the wb won't be dereferenced
anymore.

This isn't true.  If the inode is still associated with the wb, the
inode has reference and it's safe to return the wb; however, if inode
has been switched, the wb still needs to be unlocked which is a
dereference and can lead to use-after-free if it it races with wb
destruction.

Fix it by putting the reference after releasing list_lock.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 87e1d789bf ("writeback: implement [locked_]inode_to_wb_and_lock_list()")
Cc: stable@vger.kernel.org # v4.2+
Tested-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-20 09:44:18 -06:00
Linus Torvalds 3c2de27d79 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:

 - Preparations of parallel lookups (the remaining main obstacle is the
   need to move security_d_instantiate(); once that becomes safe, the
   rest will be a matter of rather short series local to fs/*.c

 - preadv2/pwritev2 series from Christoph

 - assorted fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits)
  splice: handle zero nr_pages in splice_to_pipe()
  vfs: show_vfsstat: do not ignore errors from show_devname method
  dcache.c: new helper: __d_add()
  don't bother with __d_instantiate(dentry, NULL)
  untangle fsnotify_d_instantiate() a bit
  uninline d_add()
  replace d_add_unique() with saner primitive
  quota: use lookup_one_len_unlocked()
  cifs_get_root(): use lookup_one_len_unlocked()
  nfs_lookup: don't bother with d_instantiate(dentry, NULL)
  kill dentry_unhash()
  ceph_fill_trace(): don't bother with d_instantiate(dn, NULL)
  autofs4: don't bother with d_instantiate(dentry, NULL) in ->lookup()
  configfs: move d_rehash() into configfs_create() for regular files
  ceph: don't bother with d_rehash() in splice_dentry()
  namei: teach lookup_slow() to skip revalidate
  namei: massage lookup_slow() to be usable by lookup_one_len_unlocked()
  lookup_one_len_unlocked(): use lookup_dcache()
  namei: simplify invalidation logics in lookup_dcache()
  namei: change calling conventions for lookup_{fast,slow} and follow_managed()
  ...
2016-03-19 18:52:29 -07:00
Linus Torvalds 814a2bf957 Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - a couple of hotfixes

 - the rest of MM

 - a new timer slack control in procfs

 - a couple of procfs fixes

 - a few misc things

 - some printk tweaks

 - lib/ updates, notably to radix-tree.

 - add my and Nick Piggin's old userspace radix-tree test harness to
   tools/testing/radix-tree/.  Matthew said it was a godsend during the
   radix-tree work he did.

 - a few code-size improvements, switching to __always_inline where gcc
   screwed up.

 - partially implement character sets in sscanf

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
  sscanf: implement basic character sets
  lib/bug.c: use common WARN helper
  param: convert some "on"/"off" users to strtobool
  lib: add "on"/"off" support to kstrtobool
  lib: update single-char callers of strtobool()
  lib: move strtobool() to kstrtobool()
  include/linux/unaligned: force inlining of byteswap operations
  include/uapi/linux/byteorder, swab: force inlining of some byteswap operations
  include/asm-generic/atomic-long.h: force inlining of some atomic_long operations
  usb: common: convert to use match_string() helper
  ide: hpt366: convert to use match_string() helper
  ata: hpt366: convert to use match_string() helper
  power: ab8500: convert to use match_string() helper
  power: charger_manager: convert to use match_string() helper
  drm/edid: convert to use match_string() helper
  pinctrl: convert to use match_string() helper
  device property: convert to use match_string() helper
  lib/string: introduce match_string() helper
  radix-tree tests: add test for radix_tree_iter_next
  radix-tree tests: add regression3 test
  ...
2016-03-18 19:26:54 -07:00
Linus Torvalds 35d88d97be Merge branch 'for-4.6/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
 "Here are the core block changes for this merge window.  Not a lot of
  exciting stuff going on in this round, most of the changes have been
  on the driver side of things.  That pull request is coming next.  This
  pull request contains:

   - A set of fixes for chained bio handling from Christoph.

   - A tag bounds check for blk-mq from Hannes, ensuring that we don't
     do something stupid if a device reports an invalid tag value.

   - A set of fixes/updates for the CFQ IO scheduler from Jan Kara.

   - A set of blk-mq fixes from Keith, adding support for dynamic
     hardware queues, and fixing init of max_dev_sectors for stacking
     devices.

   - A fix for the dynamic hw context from Ming.

   - Enabling of cgroup writeback support on a block device, from
     Shaohua"

* 'for-4.6/core' of git://git.kernel.dk/linux-block:
  blk-mq: add bounds check on tag-to-rq conversion
  block: bio_remaining_done() isn't unlikely
  block: cleanup bio_endio
  block: factor out chained bio completion
  block: don't unecessarily clobber bi_error for chained bios
  block-dev: enable writeback cgroup support
  blk-mq: Fix NULL pointer updating nr_requests
  blk-mq: mark request queue as mq asap
  block: Initialize max_dev_sectors to 0
  blk-mq: dynamic h/w context count
  cfq-iosched: Allow parent cgroup to preempt its child
  cfq-iosched: Allow sync noidle workloads to preempt each other
  cfq-iosched: Reorder checks in cfq_should_preempt()
  cfq-iosched: Don't group_idle if cfqq has big thinktime
2016-03-18 16:43:11 -07:00
Al Viro 8b23a8ce10 Merge branches 'work.lookups', 'work.misc' and 'work.preadv2' into for-next 2016-03-18 16:07:38 -04:00
Rabin Vincent d6785d9152 splice: handle zero nr_pages in splice_to_pipe()
Running the following command:

 busybox cat /sys/kernel/debug/tracing/trace_pipe > /dev/null

with any tracing enabled pretty very quickly leads to various NULL
pointer dereferences and VM BUG_ON()s, such as these:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
 IP: [<ffffffff8119df6c>] generic_pipe_buf_release+0xc/0x40
 Call Trace:
  [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0
  [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10
  [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0
  [<ffffffff81196869>] do_sendfile+0x199/0x380
  [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0
  [<ffffffff8192cbee>] entry_SYSCALL_64_fastpath+0x12/0x6d

 page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0)
 kernel BUG at include/linux/mm.h:367!
 invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 RIP: [<ffffffff8119df9c>] generic_pipe_buf_release+0x3c/0x40
 Call Trace:
  [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0
  [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10
  [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0
  [<ffffffff81196869>] do_sendfile+0x199/0x380
  [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0
  [<ffffffff8192cd1e>] tracesys_phase2+0x84/0x89

(busybox's cat uses sendfile(2), unlike the coreutils version)

This is because tracing_splice_read_pipe() can call splice_to_pipe()
with spd->nr_pages == 0.  spd_pages underflows in splice_to_pipe() and
we fill the page pointers and the other fields of the pipe_buffers with
garbage.

All other callers of splice_to_pipe() avoid calling it when nr_pages ==
0, and we could make tracing_splice_read_pipe() do that too, but it
seems reasonable to have splice_to_page() handle this condition
gracefully.

Cc: stable@vger.kernel.org
Signed-off-by: Rabin Vincent <rabin@rab.in>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-18 16:06:44 -04:00
Christoph Hellwig 10c4de10b2 nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-18 11:42:54 -04:00
Christoph Hellwig f99d4fbdae nfsd: add SCSI layout support
This is a simple extension to the block layout driver to use SCSI
persistent reservations for access control and fencing, as well as
SCSI VPD pages for device identification.

For this we need to pass the nfs4_client to the proc_getdeviceinfo method
to generate the reservation key, and add a new fence_client method
to allow for fence actions in the layout driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-18 11:42:53 -04:00
Christoph Hellwig 368248eeb1 nfsd: move some blocklayout code
Trivial reorganization, no change in behavior.  Move some code around,
pull some code out of block layoutcommit that will be useful for the
scsi layout.

[bfields@redhat.com: split off from "nfsd: add SCSI layout support"]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-18 11:41:17 -04:00
Christoph Hellwig 81c3932901 nfsd: add a new config option for the block layout driver
Split the config symbols into a generic pNFS one, which is invisible
and gets selected by the layout drivers, and one for the block layout
driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-18 11:40:57 -04:00
Christoph Hellwig d9186c0397 nfs/blocklayout: add SCSI layout support
This is a trivial extension to the block layout driver to support the
new SCSI layouts draft.  There are three changes:

 - device identifcation through the SCSI VPD page.  This allows us to
   directly use the udev generated persistent device names instead of
   requiring an expensive lookup by crawling every block device node
   in /dev and reading a signature for it.
 - use of SCSI persistent reservations to protect device access and
   allow for robust fencing.  On the client sides this just means
   registering and unregistering a server supplied key.
 - an optimized LAYOUTCOMMIT payload that doesn't send unessecary
   fields to the server.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-18 11:38:17 -04:00
Jaegeuk Kim 12bb0a8fd4 f2fs: submit node page write bios when really required
If many threads calls fsync with data writes, we don't need to flush every
bios having node page writes.
The f2fs_wait_on_page_writeback will flush its bios when the page is really
needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:47 -07:00
Arnd Bergmann fff4c55d36 f2fs: add missing argument to f2fs_setxattr stub
The f2fs_setxattr() prototype for CONFIG_F2FS_FS_XATTR=n has
been wrong for a long time, since 8ae8f1627f ("f2fs: support
xattr security labels"), but there have never been any callers,
so it did not matter.

Now, the function gets called from f2fs_ioc_keyctl(), which
causes a build failure:

fs/f2fs/file.c: In function 'f2fs_ioc_keyctl':
include/linux/stddef.h:7:14: error: passing argument 6 of 'f2fs_setxattr' makes integer from pointer without a cast [-Werror=int-conversion]
 #define NULL ((void *)0)
              ^
fs/f2fs/file.c:1599:27: note: in expansion of macro 'NULL'
     value, F2FS_KEY_SIZE, NULL, type);
                           ^
In file included from ../fs/f2fs/file.c:29:0:
fs/f2fs/xattr.h:129:19: note: expected 'int' but argument is of type 'void *'
 static inline int f2fs_setxattr(struct inode *inode, int index,
                   ^
fs/f2fs/file.c:1597:9: error: too many arguments to function 'f2fs_setxattr'
  return f2fs_setxattr(inode, F2FS_XATTR_INDEX_KEY,
         ^
In file included from ../fs/f2fs/file.c:29:0:
fs/f2fs/xattr.h:129:19: note: declared here
 static inline int f2fs_setxattr(struct inode *inode, int index,

Thsi changes the prototype of the empty stub function to match
that of the actual implementation. This will not make the key
management work when F2FS_FS_XATTR is disabled, but it gets it
to build at least.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:47 -07:00
Chao Yu d726732c7c f2fs: fix to avoid unneeded unlock_new_inode
During ->lookup, I_NEW state of inode was been cleared in f2fs_iget,
so in error path, we don't need to clear it again.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:46 -07:00
Chao Yu 291bf80bec f2fs: clean up opened code with f2fs_update_dentry
Just clean up opened code with existing function, no logic change.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:45 -07:00
Jaegeuk Kim 17a0ee552c f2fs: declare static functions
Just to avoid sparse warnings.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:44 -07:00
Keith Mok 43b6573bac f2fs: use cryptoapi crc32 functions
The crc function is done bit by bit.
Optimize this by use cryptoapi
crc32 function which is backed by h/w acceleration.

Signed-off-by: Keith Mok <ek9852@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:43 -07:00
Fan Li 999270de31 f2fs: modify the readahead method in ra_node_page()
ra_node_page() is used to read ahead one node page. Comparing to regular
read, it's faster because it doesn't wait for IO completion.
But if it is called twice for reading the same block, and the IO request
from the first call hasn't been completed before the second call, the second
call will have to wait until the read is over.

Here use the code in __do_page_cache_readahead() to solve this problem.
It does nothing when someone else already puts the page in mapping. The
status of page should be assured by whoever puts it there.
This implement also prevents alteration of page reference count.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:43 -07:00
Jaegeuk Kim 8074bb5150 f2fs crypto: sync ext4_lookup and ext4_file_open
This patch tries to catch up with lookup and open policies in ext4.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:42 -07:00
Jaegeuk Kim 0b81d07790 fs crypto: move per-file encryption from f2fs tree to fs/crypto
This patch adds the renamed functions moved from the f2fs crypto files.

1. definitions for per-file encryption used by ext4 and f2fs.

2. crypto.c for encrypt/decrypt functions
 a. IO preparation:
  - fscrypt_get_ctx / fscrypt_release_ctx
 b. before IOs:
  - fscrypt_encrypt_page
  - fscrypt_decrypt_page
  - fscrypt_zeroout_range
 c. after IOs:
  - fscrypt_decrypt_bio_pages
  - fscrypt_pullback_bio_page
  - fscrypt_restore_control_page

3. policy.c supporting context management.
 a. For ioctls:
  - fscrypt_process_policy
  - fscrypt_get_policy
 b. For context permission
  - fscrypt_has_permitted_context
  - fscrypt_inherit_context

4. keyinfo.c to handle permissions
  - fscrypt_get_encryption_info
  - fscrypt_free_encryption_info

5. fname.c to support filename encryption
 a. general wrapper functions
  - fscrypt_fname_disk_to_usr
  - fscrypt_fname_usr_to_disk
  - fscrypt_setup_filename
  - fscrypt_free_filename

 b. specific filename handling functions
  - fscrypt_fname_alloc_buffer
  - fscrypt_fname_free_buffer

6. Makefile and Kconfig

Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-03-17 21:19:33 -07:00
Linus Torvalds 5cd0911a9e Allow ram backend to be configured with addresses above 4GB
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6yKvAAoJEKurIx+X31iBgkkQAJ0GM0zDD3jQx3G95k/KiRZW
 H70+sZZDnfYGXv22Vv5/u5eGlRGMkmZ6xBX6cwDOsNaUIh3YspjKmrHvcO8mNAcE
 MpEyMc6aKPmPCQPvNQxSegYEHzLZdN/OIbPYWxihnQ613iSNoYy/Gdgk3bqxWHDU
 2p5gvAq5lalTTBz5/nViC65op7qeziIKzzCvUrn1rkycZ7fkPhDaqeKqW6gAQO88
 Do4h9rItDL6NDEQR9S2ihbEWRKGZSTYuN0SqFvIYRuly8/aYVy2DCpDbh3aN63zp
 lIfLhvLPLFTWzFmTbkltD5ZG5qYYRlWZy2vsXKrc1ya+qKiQHn/BTe/9/65pCzP0
 IeV9h6JMP7cfAGD+vAKD9PeHaePqL9RYg5F9lbos68IvxkVFF8osaLbMCghomtMJ
 dtj3ORDh6KSFUOtCsudlc/S9xz3OLL9hmi4QTvF+qDEUEpHY9aY4SqqPcXm/8/Xy
 babWOBu23V3WCOlLEs9gRsKbZM3gSGT1jlgvSYBmDLJY43bpkcmN7E6WdX8jjPRK
 vXGfzYp6ZdCJNlBpytLDri34yGDcyKqc3J7c6Wj/D55v4PsQo3mWnPUNTeaeHf3q
 4SD6JFvXHSttrquA0G2fkJQ0ksvaWD3kJBRwzzlfqYZ0SbtUc0hjFH6/OOGQD9fg
 cJ2KH9bgpafvyG2yqF7w
 =vPCf
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull pstore update from Tony Luck:
 "Allow ram backend to be configured with addresses above 4GB"

* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore: Add support for 64 Bit address space
2016-03-17 16:57:55 -07:00
Linus Torvalds 1ca80a0a3e GFS2: merge window
We only have six patches ready for this merge window.
 
 - Arnd Bergmann contributed a patch that fixes an uninitialized variable
   warning.
 - The second patch avoids a kernel panic due to referencing an iopen
   glock that may not be held, in an error path.
 - The third patch fixes a rounding error that caused xfs_tests direct IO
   write "fsx" tests to fail on GFS2.
 - The fourth patch tidies up the code path when glocks are being reused
   to recreate a dinode that was recently deleted.
 - The fifth reverts an ages-old patch that should no longer be needed, and
   which interfered with the transition of dinodes from unlinked to free.
 - And lastly, a patch to eliminate a function parameter that's not needed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW6qcKAAoJENeLYdPf93o70SwH/0czrFBIWaPbRgyuGLNvye4G
 qvfIk2Yky3UKfgUA+ZW+cAWkxeKubc9scMwce0elKxm0e03rNwIX0slIOx8hymj3
 19AgMnj3kcCJvRLdkBITNqnd6vTY2quadLN3j8I2cCNbHOV0GelEkP4jWcTHB+2F
 AG0ZJOsvvrcD1ClgdIvGdV52qipZApS/kgZjLpJBEyzxq8SpRe9vNqMDsDyoKWgi
 yjp0+aJ0IJAWA24fzdT5HE4fb5yGRWehg51l6Z2mbfXAvT+oeKcYrQiq1zhUutHw
 dT6SUHbjt+y0EulPClsf3r5zjRjVCCbpj0wkUY3kPB98lgnpAD2QnUP/JDA+CD0=
 =xRmX
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "We only have six patches ready for this merge window:

   - Arnd Bergmann contributed a patch that fixes an uninitialized
     variable warning.

   - The second patch avoids a kernel panic due to referencing an iopen
     glock that may not be held, in an error path.

   - The third patch fixes a rounding error that caused xfs_tests direct
     IO write "fsx" tests to fail on GFS2.

   - The fourth patch tidies up the code path when glocks are being
     reused to recreate a dinode that was recently deleted.

   - The fifth reverts an ages-old patch that should no longer be
     needed, and which interfered with the transition of dinodes from
     unlinked to free.

   - And lastly, a patch to eliminate a function parameter that's not
     needed"

* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Eliminate parameter non_block on gfs2_inode_lookup
  GFS2: Don't filter out I_FREEING inodes anymore
  GFS2: Prevent delete work from occurring on glocks used for create
  GFS2: Fix direct IO write rounding error
  gfs2: avoid uninitialized variable warning
  GFS2: Check if iopen is held when deleting inode
2016-03-17 16:51:32 -07:00
Linus Torvalds d77bed0d4c dlm for 4.6
Previous changes introduced the use of socket error reporting
 for dlm sockets.  This set includes two fixes in how the
 socket error callbacks are used.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW4ISmAAoJEDgbc8f8gGmqnWgQANnS13rXt4NxnRtNqzrEUmjk
 NbvP4BxeWKinxhc9ObUnV0CzDllYa2A6Te2AageCQr/qhRKfnnaQJczJ39Xp+289
 oSgJJOo4HGCPjshrq9BwGo8ufijUIc0MsT64TzeI3ww58b1eK2CLoC9uiLDyYwjM
 Hw0PRXU/MAxzOWJIIWgkh78FQmx8fswOSNyK49/p3/INMVNFxn75bd+shtxUOuCp
 50gmI6DG4gGJDK3vtIjZStJhW4lcaM3tjGZ9+mcLQF2PZK5zIeHSr3nEfzJ4Qwps
 0p55JeiXdfff6RTrxqnJewc+xysmD9594wG0G0VqLsWaLWulDrHZMFsVg1J10frk
 bk0WwLjsYG/wLVZtKRe+3QwOyqgTxx1Cea8ZYB2yMcFBkDYxFmh8a00kJcVuucnS
 W+w4rhI9blk1cc4eHhnuBIi5m2jbelu4NPG5722ORtv+gNpBl2ptecqIjfuhr8xE
 IIF5tnkZb8lBuLyhCmg8in2mKnY5aKSk3kuQ98rDXZUMCLT0PKG2ZNsXJjpX6G38
 uQ+sB9rH6c5pIe0S3keS2f2Ly3V7gtBErA0otyxaq/XlxnJeFlLU5G1chHUeW8VP
 qxhtjDShPuIA97MxE2GA3ehr7r3jbOb+8qOc13E9ygXDyXNN0tb4JIeB0beNLdRx
 db8Lt+OR10IUawNyuFB9
 =qvzY
 -----END PGP SIGNATURE-----

Merge tag 'dlm-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates from David Teigland:
 "Previous changes introduced the use of socket error reporting for dlm
  sockets.  This set includes two fixes in how the socket error
  callbacks are used"

* tag 'dlm-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  DLM: Save and restore socket callbacks properly
  DLM: Replace nodeid_to_addr with kernel_getpeername
2016-03-17 16:38:36 -07:00
Linus Torvalds faeb20ecfa Performance improvements in SEEK_DATA and xattr scalability
improvements, plus a lot of clean ups and bug fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJW6c9mAAoJEPL5WVaVDYGjWsEIAJkWUvKB3GgGgP82sKDBP2P8
 IbWegO1ICMrSY78BqLI7mLCqggH5JClBgYU3O4VFv8Brj1L9mS5X+vflaDE1j9jj
 Ik1KZKtZl1opOwO1L3D4l/ipZAiENUp7NehTtpsFousmz6nMZ5vo6x4t3QSwbUIm
 YXpxUIxHEhBcW5i3EDkfYG8305V5oj8HsVf6T98OlWGpBO5VGNMAHvA7CQdQe7Rd
 chv70rij5V684bJAEoosEFXVAuOUrxcBqbFA3Nlb432YOPj0ISLx76kw0GIjUYtf
 yjoSClbRgwxGzh0jm+yaoYjjm83xbsYbHSsBmh3+/QLMbKTLXeCqR/BiqJavmcM=
 =bWpz
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Performance improvements in SEEK_DATA and xattr scalability
  improvements, plus a lot of clean ups and bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (38 commits)
  ext4: clean up error handling in the MMP support
  jbd2: do not fail journal because of frozen_buffer allocation failure
  ext4: use __GFP_NOFAIL in ext4_free_blocks()
  ext4: fix compile error while opening the macro DOUBLE_CHECK
  ext4: print ext4 mount option data_err=abort correctly
  ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
  ext4: drop unneeded BUFFER_TRACE in ext4_delete_inline_entry()
  ext4: fix misspellings in comments.
  jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path
  ext4: more efficient SEEK_DATA implementation
  ext4: cleanup handling of bh->b_state in DAX mmap
  ext4: return hole from ext4_map_blocks()
  ext4: factor out determining of hole size
  ext4: fix setting of referenced bit in ext4_es_lookup_extent()
  ext4: remove i_ioend_count
  ext4: simplify io_end handling for AIO DIO
  ext4: move trans handling and completion deferal out of _ext4_get_block
  ext4: rename and split get blocks functions
  ext4: use i_mutex to serialize unaligned AIO DIO
  ext4: pack ioend structure better
  ...
2016-03-17 16:31:18 -07:00
Linus Torvalds 364e8dd9d6 Configfs changes for the 4.6 merge window:
- A large patch from me to simplify setting up the list of default
    groups by actually implementing it as a list instead of an array.
  - a small Y2083 prep patch from Deepa Dinamani.  Probably doesn't matter
    on it's own, but it seems like he is trying to get rid of all CURRENT_TIME
    uses in file systems, which is a worthwhile goal.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6Cz6AAoJEA+eU2VSBFGDmNYP/AzJuVdkXjOkzmAl0SjwS0UC
 b/gTF0Z0jAmXX8QTf0NtdNajHweYyY4PVvyuUYojO/Y9bgJigRC6gHIUviq8TLhO
 JR1EUJ3RNoWFZSHeEGTM4q+kSg3GkZ83WixeBiMkIZo7QgPXU2YB0mzErpdcID3N
 +KVnoVU+asVQi656UIDNZ1SawTAGog+tIMIgnM4vmL0Dd+9yN4pYhAmRLLS0C83P
 DPci/oVx1a3IjWAkmz24qtb9ht/SA+IBwyFPltg/gdn5OgJL9Vr1naW5mkqMhoPF
 PUBfX9YYizMwNMYuchng6JqyWlZBjXFr6iqi401vFJcILeq27As5Kc9adfDOEvVC
 V/dWCmTyMlHX507t+lC7kTa6OaHAZKA5scCHA6dgpQIvGfiaMNNu7MW8C6p0HqwY
 rf7na7S2fAu5zCyIRVPK//YMNbRHh2AoclzpK7Sw0NCV5jBlXZOdDJcSb4jQsVF7
 Yy84EqcebvF4ocaFRzwA/ZHNxz65l5Qu7brmOu6pTliQuQED1fop5z92RXkw2e9y
 rSIgzMCL5IoAUkYtoO1jzAQXzyySAb3QDpwCaBdZLzN4MbRF/dUxZDkOePKTaVft
 ckNXj5AVzvLYlpkmkhQ+bqsh91ayFH2/gw9Kt38i1yjzNLhsccZwq9ja5ifPlHLQ
 nOFiane31yp3Zhac8drb
 =9HqT
 -----END PGP SIGNATURE-----

Merge tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs

Pull configfs updates from Christoph Hellwig:

 - A large patch from me to simplify setting up the list of default
   groups by actually implementing it as a list instead of an array.

 - a small Y2083 prep patch from Deepa Dinamani.  Probably doesn't
   matter on it's own, but it seems like he is trying to get rid of all
   CURRENT_TIME uses in file systems, which is a worthwhile goal.

* tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs:
  configfs: switch ->default groups to a linked list
  configfs: Replace CURRENT_TIME by current_fs_time()
2016-03-17 16:25:46 -07:00
Kees Cook 1404297ebf lib: update single-char callers of strtobool()
Some callers of strtobool() were passing a pointer to unterminated
strings.  In preparation of adding multi-character processing to
kstrtobool(), update the callers to not pass single-character pointers,
and switch to using the new kstrtobool_from_user() helper where
possible.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Amitkumar Karwar <akarwar@marvell.com>
Cc: Nishant Sarmukadam <nishants@marvell.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Steve French <sfrench@samba.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Matthew Wilcox c28f242063 btrfs: use radix_tree_iter_retry()
Even though this is a 'can't happen' situation, use the new
radix_tree_iter_retry() pattern to eliminate a goto.

[akpm@linux-foundation.org: fix btrfs build]
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: David Sterba <dsterba@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Dave Young 0b50a2d86d proc-vmcore: wrong data type casting fix
On i686 PAE enabled machine the contiguous physical area could be large
and it can cause trimming down variables in below calculation in
read_vmcore() and mmap_vmcore():

	tsz = min_t(size_t, m->offset + m->size - *fpos, buflen);

That is, the types being used is like below on i686:
m->offset: unsigned long long int
m->size:   unsigned long long int
*fpos:     loff_t (long long int)
buflen:    size_t (unsigned int)

So casting (m->offset + m->size - *fpos) by size_t means truncating a
given value by 4GB.

Suppose (m->offset + m->size - *fpos) being truncated to 0, buflen >0
then we will get tsz = 0.  It is of course not an expected result.
Similarly we could also get other truncated values less than buflen.
Then the real size passed down is not correct any more.

If (m->offset + m->size - *fpos) is above 4GB, read_vmcore or
mmap_vmcore use the min_t result with truncated values being compared to
buflen.  Then, fpos proceeds with the wrong value so that we reach below
bugs:

1) read_vmcore will refuse to continue so makedumpfile fails.
2) mmap_vmcore will trigger BUG_ON() in remap_pfn_range().

Use unsigned long long in min_t instead so that the variables in are not
truncated.

Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jianyu Zhan <nasa4836@gmail.com>
Cc: Minfei Huang <mhuang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Minfei Huang 7e2bc81da3 proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan"
It is not elegant that prompt shell does not start from new line after
executing "cat /proc/$pid/wchan".  Make prompt shell start from new
line.

Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Eric Engestrom b5946beaa9 procfs: add conditional compilation check
`proc_timers_operations` is only used when CONFIG_CHECKPOINT_RESTORE is
enabled.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
John Stultz 5de23d435e proc: add /proc/<pid>/timerslack_ns interface
This patch provides a proc/PID/timerslack_ns interface which exposes a
task's timerslack value in nanoseconds and allows it to be changed.

This allows power/performance management software to set timer slack for
other threads according to its policy for the thread (such as when the
thread is designated foreground vs.  background activity)

If the value written is non-zero, slack is set to that value.  Otherwise
sets it to the default for the thread.

This interface checks that the calling task has permissions to to use
PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we can ensure
arbitrary apps do not change the timer slack for other apps.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
John Stultz da8b44d5a9 timer: convert timer_slack_ns from unsigned long to u64
This patchset introduces a /proc/<pid>/timerslack_ns interface which
would allow controlling processes to be able to set the timerslack value
on other processes in order to save power by avoiding wakeups (Something
Android currently does via out-of-tree patches).

The first patch tries to fix the internal timer_slack_ns usage which was
defined as a long, which limits the slack range to ~4 seconds on 32bit
systems.  It converts it to a u64, which provides the same basically
unlimited slack (500 years) on both 32bit and 64bit machines.

The second patch introduces the /proc/<pid>/timerslack_ns interface
which allows the full 64bit slack range for a task to be read or set on
both 32bit and 64bit machines.

With these two patches, on a 32bit machine, after setting the slack on
bash to 10 seconds:

$ time sleep 1

real    0m10.747s
user    0m0.001s
sys     0m0.005s

The first patch is a little ugly, since I had to chase the slack delta
arguments through a number of functions converting them to u64s.  Let me
know if it makes sense to break that up more or not.

Other than that things are fairly straightforward.

This patch (of 2):

The timer_slack_ns value in the task struct is currently a unsigned
long.  This means that on 32bit applications, the maximum slack is just
over 4 seconds.  However, on 64bit machines, its much much larger (~500
years).

This disparity could make application development a little (as well as
the default_slack) to a u64.  This means both 32bit and 64bit systems
have the same effective internal slack range.

Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify
the interface as a unsigned long, so we preserve that limitation on
32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned
long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
actually larger then what can be stored by an unsigned long.

This patch also modifies hrtimer functions which specified the slack
delta as a unsigned long.

Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oren Laadan <orenl@cellrox.com>
Cc: Ruchi Kandoi <kandoiruchi@google.com>
Cc: Rom Lemarchand <romlem@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Android Kernel Team <kernel-team@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Joonsoo Kim fe896d1878 mm: introduce page reference manipulation functions
The success of CMA allocation largely depends on the success of
migration and key factor of it is page reference count.  Until now, page
reference is manipulated by direct calling atomic functions so we cannot
follow up who and where manipulate it.  Then, it is hard to find actual
reason of CMA allocation failure.  CMA allocation should be guaranteed
to succeed so finding offending place is really important.

In this patch, call sites where page reference is manipulated are
converted to introduced wrapper function.  This is preparation step to
add tracepoint to each page reference manipulation function.  With this
facility, we can easily find reason of CMA allocation failure.  There is
no functional change in this patch.

In addition, this patch also converts reference read sites.  It will
help a second step that renames page._count to something else and
prevents later attempt to direct access to it (Suggested by Andrew).

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Igor Redko d02bd27bd3 mm/page_alloc.c: calculate 'available' memory in a separate function
Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
statistics protocol, corresponding to 'Available' in /proc/meminfo.

It indicates to the hypervisor how big the balloon can be inflated
without pushing the guest system to swap.  This metric would be very
useful in VM orchestration software to improve memory management of
different VMs under overcommit.

This patch (of 2):

Factor out calculation of the available memory counter into a separate
exportable function, in order to be able to use it in other parts of the
kernel.

In particular, it appears a relevant metric to report to the hypervisor
via virtio-balloon statistics interface (in a followup patch).

Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Naoya Horiguchi 0a71649cb7 /proc/kpageflags: return KPF_SLAB for slab tail pages
Currently /proc/kpageflags returns just KPF_COMPOUND_TAIL for slab tail
pages, which is inconvenient when grasping how slab pages are
distributed (userspace always needs to check which kind of tail pages by
itself).  This patch sets KPF_SLAB for such pages.

With this patch:

  $ grep Slab /proc/meminfo ; tools/vm/page-types -b slab
  Slab:              64880 kB
               flags      page-count       MB  symbolic-flags                     long-symbolic-flags
  0x0000000000000080           16220       63  _______S__________________________________ slab
               total           16220       63

16220 pages equals to 64880 kB, so returned result is consistent with the
global counter.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Naoya Horiguchi 832fc1de01 /proc/kpageflags: return KPF_BUDDY for "tail" buddy pages
Currently /proc/kpageflags returns nothing for "tail" buddy pages, which
is inconvenient when grasping how free pages are distributed.  This
patch sets KPF_BUDDY for such pages.

With this patch:

  $ grep MemFree /proc/meminfo ; tools/vm/page-types -b buddy
  MemFree:         3134992 kB
               flags      page-count       MB  symbolic-flags                     long-symbolic-flags
  0x0000000000000400          779272     3044  __________B_______________________________ buddy
  0x0000000000000c00            4385       17  __________BM______________________________ buddy,mmap
               total          783657     3061

783657 pages is 3134628 kB (roughly consistent with the global counter,)
so it's OK.

[akpm@linux-foundation.org: update comment, per Naoya]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Linus Torvalds 8eee93e257 Char/Misc patches for 4.6-rc1
Here is the big char/misc driver update for 4.6-rc1.
 
 The majority of the patches here is hwtracing and some new mic drivers,
 but there's a lot of other driver updates as well.  Full details in the
 shortlog.
 
 All have been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlbp9IcACgkQMUfUDdst+ykyJgCeLTC2QNGrh51kiJglkVJ0yD36
 q4MAn0NkvSX2+iv5Jq8MaX6UQoRa4Nun
 =MNjR
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc updates from Greg KH:
 "Here is the big char/misc driver update for 4.6-rc1.

  The majority of the patches here is hwtracing and some new mic
  drivers, but there's a lot of other driver updates as well.  Full
  details in the shortlog.

  All have been in linux-next for a while with no reported issues"

* tag 'char-misc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (238 commits)
  goldfish: Fix build error of missing ioremap on UM
  nvmem: mediatek: Fix later provider initialization
  nvmem: imx-ocotp: Fix return value of imx_ocotp_read
  nvmem: Fix dependencies for !HAS_IOMEM archs
  char: genrtc: replace blacklist with whitelist
  drivers/hwtracing: make coresight-etm-perf.c explicitly non-modular
  drivers: char: mem: fix IS_ERROR_VALUE usage
  char: xillybus: Fix internal data structure initialization
  pch_phub: return -ENODATA if ROM can't be mapped
  Drivers: hv: vmbus: Support kexec on ws2012 r2 and above
  Drivers: hv: vmbus: Support handling messages on multiple CPUs
  Drivers: hv: utils: Remove util transport handler from list if registration fails
  Drivers: hv: util: Pass the channel information during the init call
  Drivers: hv: vmbus: avoid unneeded compiler optimizations in vmbus_wait_for_unload()
  Drivers: hv: vmbus: remove code duplication in message handling
  Drivers: hv: vmbus: avoid wait_for_completion() on crash
  Drivers: hv: vmbus: don't loose HVMSG_TIMER_EXPIRED messages
  misc: at24: replace memory_accessor with nvmem_device_read
  eeprom: 93xx46: extend driver to plug into the NVMEM framework
  eeprom: at25: extend driver to plug into the NVMEM framework
  ...
2016-03-17 13:47:50 -07:00