Граф коммитов

52 Коммитов

Автор SHA1 Сообщение Дата
Linus Torvalds c4d6fe7311 XArray updates for 5.9
- Fix the test suite after introduction of the local_lock
  - Fix a bug in the IDA spotted by Coverity
  - Change the API that allows the workingset code to delete a node
  - Fix xas_reload() when dealing with entries that occupy multiple indices
  - Add a few more tests to the test suite
  - Fix an unsigned int being shifted into an unsigned long
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAl+OzzAACgkQDpNsjXcp
 gj5YFgf/cV99dyPaal7AfMwhVwFcuVjIRH4S/VeOHkjS2QT1lpu3ffqfKALVR8vU
 3IObM3oDCmLk0mYz9O+V/udVJoBYWiduI0LZhR6+V5ZrDjbw/d4VdCbwOplpeF5x
 rntyI9r8f5d4LxBJ/moLjsosc1KfCzyVnV389eZRvZ8Muxuyc73WdAwZZZfD79nY
 66gScEXQokU99zqJJ1nWfh05XTcTsKF25fVBGMLZTUBAytoFyPuC/kO2z8Uq9lEi
 Ug6gDClskSB7A2W5gvprMcoUAVYcHfTb0wqJD5/MhkHyoTdcWdW8Re0kssXvD86V
 KwlBdYQ/JuskgY/hbynZ/FP3p8+t1Q==
 =12E/
 -----END PGP SIGNATURE-----

Merge tag 'xarray-5.9' of git://git.infradead.org/users/willy/xarray

Pull XArray updates from Matthew Wilcox:

 - Fix the test suite after introduction of the local_lock

 - Fix a bug in the IDA spotted by Coverity

 - Change the API that allows the workingset code to delete a node

 - Fix xas_reload() when dealing with entries that occupy multiple
   indices

 - Add a few more tests to the test suite

 - Fix an unsigned int being shifted into an unsigned long

* tag 'xarray-5.9' of git://git.infradead.org/users/willy/xarray:
  XArray: Fix xas_create_range for ranges above 4 billion
  radix-tree: fix the comment of radix_tree_next_slot()
  XArray: Fix xas_reload for multi-index entries
  XArray: Add private interface for workingset node deletion
  XArray: Fix xas_for_each_conflict documentation
  XArray: Test marked multiorder iterations
  XArray: Test two more things about xa_cmpxchg
  ida: Free allocated bitmap in error path
  radix tree test suite: Fix compilation
2020-10-20 14:39:37 -07:00
Matthew Wilcox (Oracle) 8fc75643c5 XArray: add xas_split
In order to use multi-index entries for huge pages in the page cache, we
need to be able to split a multi-index entry (eg if a file is truncated in
the middle of a huge page entry).  This version does not support splitting
more than one level of the tree at a time.  This is an acceptable
limitation for the page cache as we do not expect to support order-12
pages in the near future.

[akpm@linux-foundation.org: export xas_split_alloc() to modules]
[willy@infradead.org: fix xarray split]
  Link: https://lkml.kernel.org/r/20200910175450.GV6583@casper.infradead.org
[willy@infradead.org: fix xarray]
  Link: https://lkml.kernel.org/r/20201001233943.GW20115@casper.infradead.org

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Qian Cai <cai@lca.pw>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lkml.kernel.org/r/20200903183029.14930-3-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:15 -07:00
Matthew Wilcox (Oracle) 57417cebc9 XArray: add xa_get_order
Patch series "Fix read-only THP for non-tmpfs filesystems".

As described more verbosely in the [3/3] changelog, we can inadvertently
put an order-0 page in the page cache which occupies 512 consecutive
entries.  Users are running into this if they enable the
READ_ONLY_THP_FOR_FS config option; see
https://bugzilla.kernel.org/show_bug.cgi?id=206569 and Qian Cai has also
reported it here:
https://lore.kernel.org/lkml/20200616013309.GB815@lca.pw/

This is a rather intrusive way of fixing the problem, but has the
advantage that I've actually been testing it with the THP patches, which
means that it sees far more use than it does upstream -- indeed, Song has
been entirely unable to reproduce it.  It also has the advantage that it
removes a few patches from my gargantuan backlog of THP patches.

This patch (of 3):

This function returns the order of the entry at the index.  We need this
because there isn't space in the shadow entry to encode its order.

[akpm@linux-foundation.org: export xa_get_order to modules]

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Qian Cai <cai@lca.pw>
Cc: Song Liu <songliubraving@fb.com>
Link: https://lkml.kernel.org/r/20200903183029.14930-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20200903183029.14930-2-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:15 -07:00
Matthew Wilcox (Oracle) ca7b639e86 XArray: Fix xas_reload for multi-index entries
xas_reload() was only checking that the head entry was still at the
head index.  If the entry has been split, that's not enough as there
may be a different entry at the specified index now.

Solve this by checking the slot for the requested index instead of the
head index.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-10-13 08:41:26 -04:00
Matthew Wilcox (Oracle) f82cd2f0b5 XArray: Add private interface for workingset node deletion
Move the tricky bits of dealing with the XArray from the workingset
code to the XArray.  Make it clear in the documentation that this is a
private interface, and only export it for the benefit of the test suite.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-10-13 08:41:26 -04:00
Matthew Wilcox (Oracle) 8446466c9d XArray: Fix xas_for_each_conflict documentation
At one point, xas_for_each_conflict() was going to work this way,
and I forgot to update the documentation when I changed my mind.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-10-08 10:07:48 -04:00
Manfred Spraul 01f39c1c11 xarray.h: correct return code documentation for xa_store_{bh,irq}()
__xa_store() and xa_store() document that the functions can fail, and
that the return code can be an xa_err() encoded error code.

xa_store_bh() and xa_store_irq() do not document that the functions can
fail and that they can also return xa_err() encoded error codes.

Thus: Update the documentation.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Link: http://lkml.kernel.org/r/20200430111424.16634-1-manfred@colorfullife.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08 11:05:56 -07:00
Matthew Wilcox (Oracle) 7e934cf5ac xarray: Fix early termination of xas_for_each_marked
xas_for_each_marked() is using entry == NULL as a termination condition
of the iteration. When xas_for_each_marked() is used protected only by
RCU, this can however race with xas_store(xas, NULL) in the following
way:

TASK1                                   TASK2
page_cache_delete()         	        find_get_pages_range_tag()
                                          xas_for_each_marked()
                                            xas_find_marked()
                                              off = xas_find_chunk()

  xas_store(&xas, NULL)
    xas_init_marks(&xas);
    ...
    rcu_assign_pointer(*slot, NULL);
                                              entry = xa_entry(off);

And thus xas_for_each_marked() terminates prematurely possibly leading
to missed entries in the iteration (translating to missing writeback of
some pages or a similar problem).

If we find a NULL entry that has been marked, skip it (unless we're trying
to allocate an entry).

Reported-by: Jan Kara <jack@suse.cz>
CC: stable@vger.kernel.org
Fixes: ef8e5717db ("page cache: Convert delete_batch to XArray")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-03-12 17:42:08 -04:00
Chengguang Xu 24a448b165 XArray: Fix incorrect comment in header file
256 means Retry entry and 257 means Zero entry,
so fix it.

Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-01-31 15:09:49 -05:00
Matthew Wilcox (Oracle) 00ed452c21 XArray: Add xa_for_each_range
This function supports iterating over a range of an array.  Also add
documentation links for xa_for_each_start().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-01-17 22:33:37 -05:00
Matthew Wilcox (Oracle) 82a958497d XArray: Add wrappers for nested spinlocks
Some users need to take an xarray lock while holding another xarray lock.

Reported-by: Doug Gilbert <dgilbert@interlog.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2020-01-17 22:32:17 -05:00
Randy Dunlap 13bea898cd xarray.h: fix kernel-doc warning
Fix (Sphinx) kernel-doc warning in <linux/xarray.h>:

  include/linux/xarray.h:232: WARNING: Unexpected indentation.

Link: http://lkml.kernel.org/r/89ba2134-ce23-7c10-5ee1-ef83b35aa984@infradead.org
Fixes: a3e4d3f97e ("XArray: Redesign xa_alloc API")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-10-14 15:04:01 -07:00
Johannes Weiner 7b785645e8 mm: fix page cache convergence regression
Since a283348629 ("page cache: Finish XArray conversion"), on most
major Linux distributions, the page cache doesn't correctly transition
when the hot data set is changing, and leaves the new pages thrashing
indefinitely instead of kicking out the cold ones.

On a freshly booted, freshly ssh'd into virtual machine with 1G RAM
running stock Arch Linux:

[root@ham ~]# ./reclaimtest.sh
+ dd of=workingset-a bs=1M count=0 seek=600
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ ./mincore workingset-a
153600/153600 workingset-a
+ dd of=workingset-b bs=1M count=0 seek=600
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
104029/153600 workingset-a
120086/153600 workingset-b
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
104029/153600 workingset-a
120268/153600 workingset-b

workingset-b is a 600M file on a 1G host that is otherwise entirely
idle. No matter how often it's being accessed, it won't get cached.

While investigating, I noticed that the non-resident information gets
aggressively reclaimed - /proc/vmstat::workingset_nodereclaim. This is
a problem because a workingset transition like this relies on the
non-resident information tracked in the page cache tree of evicted
file ranges: when the cache faults are refaults of recently evicted
cache, we challenge the existing active set, and that allows a new
workingset to establish itself.

Tracing the shrinker that maintains this memory revealed that all page
cache tree nodes were allocated to the root cgroup. This is a problem,
because 1) the shrinker sizes the amount of non-resident information
it keeps to the size of the cgroup's other memory and 2) on most major
Linux distributions, only kernel threads live in the root cgroup and
everything else gets put into services or session groups:

[root@ham ~]# cat /proc/self/cgroup
0::/user.slice/user-0.slice/session-c1.scope

As a result, we basically maintain no non-resident information for the
workloads running on the system, thus breaking the caching algorithm.

Looking through the code, I found the culprit in the above-mentioned
patch: when switching from the radix tree to xarray, it dropped the
__GFP_ACCOUNT flag from the tree node allocations - the flag that
makes sure the allocated memory gets charged to and tracked by the
cgroup of the calling process - in this case, the one doing the fault.

To fix this, allow xarray users to specify per-tree flag that makes
xarray allocate nodes using __GFP_ACCOUNT. Then restore the page cache
tree annotation to request such cgroup tracking for the cache nodes.

With this patch applied, the page cache correctly converges on new
workingsets again after just a few iterations:

[root@ham ~]# ./reclaimtest.sh
+ dd of=workingset-a bs=1M count=0 seek=600
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ cat workingset-a
+ ./mincore workingset-a
153600/153600 workingset-a
+ dd of=workingset-b bs=1M count=0 seek=600
+ cat workingset-b
+ ./mincore workingset-a workingset-b
124607/153600 workingset-a
87876/153600 workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
81313/153600 workingset-a
133321/153600 workingset-b
+ cat workingset-b
+ ./mincore workingset-a workingset-b
63036/153600 workingset-a
153600/153600 workingset-b

Cc: stable@vger.kernel.org # 4.20+
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2019-05-31 13:52:41 -04:00
Matthew Wilcox 962033d55d XArray: Use xa_cmpxchg to implement xa_reserve
Jason feels this is clearer, and it saves a function and an exported
symbol.

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-02-20 17:08:54 -05:00
Matthew Wilcox b38f6c5027 XArray: Fix xa_release in allocating arrays
xa_cmpxchg() was a little too magic in turning ZERO entries into NULL,
and would leave the entry set to the ZERO entry instead of releasing
it for future use.  After careful review of existing users of
xa_cmpxchg(), change the semantics so that it does not translate either
incoming argument from NULL into ZERO entries.

Add several tests to the test-suite to make sure this problem doesn't
come back.

Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-02-20 17:08:54 -05:00
Matthew Wilcox f818b82b80 XArray: Mark xa_insert and xa_reserve as must_check
If the user doesn't care about the return value from xa_insert(), then
they should be using xa_store() instead.  The point of xa_reserve() is
to get the return value early before taking another lock, so this should
also be __must_check.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-02-09 00:00:49 -05:00
Matthew Wilcox 2fa044e51a XArray: Add cyclic allocation
This differs slightly from the IDR equivalent in five ways.

1. It can allocate up to UINT_MAX instead of being limited to INT_MAX,
   like xa_alloc().  Also like xa_alloc(), it will write to the 'id'
   pointer before placing the entry in the XArray.
2. The 'next' cursor is allocated separately from the XArray instead
   of being part of the IDR.  This saves memory for all the users which
   do not use the cyclic allocation API and suits some users better.
3. It returns -EBUSY instead of -ENOSPC.
4. It will attempt to wrap back to the minimum value on memory allocation
   failure as well as on an -EBUSY error, assuming that a user would
   rather allocate a small ID than suffer an ID allocation failure.
5. It reports whether it has wrapped, which is important to some users.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-02-06 13:32:25 -05:00
Matthew Wilcox a3e4d3f97e XArray: Redesign xa_alloc API
It was too easy to forget to initialise the start index.  Add an
xa_limit data structure which can be used to pass min & max, and
define a couple of special values for common cases.  Also add some
more tests cribbed from the IDR test suite.  Change the return value
from -ENOSPC to -EBUSY to match xa_insert().

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-02-06 13:32:23 -05:00
Matthew Wilcox 3ccaf57a6a XArray: Add support for 1s-based allocation
A lot of places want to allocate IDs starting at 1 instead of 0.
While the xa_alloc() API supports this, it's not very efficient if lots
of IDs are allocated, due to having to walk down to the bottom of the
tree to see if ID 1 is available, then all the way over to the next
non-allocated ID.  This method marks ID 0 as being occupied which wastes
one slot in the XArray, but preserves xa_empty() as working.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-02-06 13:13:24 -05:00
Matthew Wilcox fd9dc93e36 XArray: Change xa_insert to return -EBUSY
Userspace translates EEXIST to "File exists" which isn't a very good
error message for the problem.  "Device or resource busy" is a better
indication of what went wrong.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-02-06 13:12:15 -05:00
Matthew Wilcox 809ab9371c XArray: Update xa_erase family descriptions
xa_erase does not allocate memory and doesn't have a gfp parameter.
Update the descriptions of all four variants to be more useful.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-02-04 23:16:58 -05:00
Dan Carpenter edcddd4c87 XArray: Fix an arithmetic error in xa_is_err
There is a math problem here which leads to a lot of static checker
warnings for me:

net/sunrpc/clnt.c:451 rpc_new_client() error: (-4096) too low for ERR_PTR

Error values are from -1 to -4095 or from 0xffffffff to 0xfffff001 in
hexadecimal.  (I am assuming a 32 bit system for simplicity).  We are
using the lowest two bits to hold some internal XArray data so the
error is shifted two spaces to the left.  0xfffff001 << 2 is 0xffffc004.
And finally we want to check that BIT(1) is set so we add 2 which gives
us 0xffffc006.

In other words, we should be checking that "entry >= 0xffffc006", but
the check is actually testing if "entry >= 0xffffc002".

Fixes: 76b4e52995 ("XArray: Permit storing 2-byte-aligned pointers")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[Use xa_mk_internal() instead of changing the bracketing]
Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-01-17 07:19:42 -05:00
Cyrill Gorcunov 19ba9ecf24 XArray: Fix typo in comment
Seems copy and paste typo, not a big deal but still
for consistency sake better to fix.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-01-14 12:52:16 -05:00
Matthew Wilcox b0606fed6e XArray: Honour reserved entries in xa_insert
xa_insert() should treat reserved entries as occupied, not as available.
Also, it should treat requests to insert a NULL pointer as a request
to reserve the slot.  Add xa_insert_bh() and xa_insert_irq() for
completeness.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-01-06 22:12:58 -05:00
Matthew Wilcox 76b4e52995 XArray: Permit storing 2-byte-aligned pointers
On m68k, statically allocated pointers may only be two-byte aligned.
This clashes with the XArray's method for tagging internal pointers.
Permit storing these pointers in single slots (ie not in multislots).

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-01-06 22:12:57 -05:00
Matthew Wilcox 4a31896c5b XArray: Change xa_for_each iterator
There were three problems with this API:
1. It took too many arguments; almost all users wanted to iterate over
every element in the array rather than a subset.
2. It required that 'index' be initialised before use, and there's no
realistic way to make GCC catch that.
3. 'index' and 'entry' were the opposite way round from every other
member of the XArray APIs.

So split it into three different APIs:

xa_for_each(xa, index, entry)
xa_for_each_start(xa, index, entry, start)
xa_for_each_marked(xa, index, entry, filter)

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-01-06 21:24:43 -05:00
Matthew Wilcox 02669b17a4 XArray: Turn xa_init_flags into a static inline
A regular xa_init_flags() put all dynamically-initialised XArrays into
the same locking class.  That leads to lockdep believing that taking
one XArray lock while holding another is a deadlock.  It's possible to
work around some of these situations with separate locking classes for
irq/bh/regular XArrays, and SINGLE_DEPTH_NESTING, but that's ugly, and
it doesn't work for all situations (where we have completely unrelated
XArrays).

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2019-01-06 21:24:43 -05:00
Matthew Wilcox 55f3f7eab7 XArray: Add xa_cmpxchg_irq and xa_cmpxchg_bh
These convenience wrappers match the other _irq and _bh wrappers we
already have.  It turns out I'd already open-coded xa_cmpxchg_irq()
in the shmem code, so convert that.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-12-06 08:26:17 -05:00
Matthew Wilcox 804dfaf01b XArray: Fix Documentation
Minor fixes.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-11-05 16:38:10 -05:00
Matthew Wilcox 84e5acb76d XArray: Add xa_store_bh() and xa_store_irq()
These convenience wrappers disable interrupts while taking the spinlock.
A number of drivers would otherwise have to open-code these functions.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-11-05 16:38:09 -05:00
Matthew Wilcox 9c16bb8890 XArray: Turn xa_erase into an exported function
Make xa_erase() take the spinlock and then call __xa_erase(), but make
it out of line since it's such a common function.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-11-05 16:38:09 -05:00
Matthew Wilcox c5beb07e7a XArray: Unify xa_cmpxchg and __xa_cmpxchg
xa_cmpxchg() was one of the largest functions in the xarray
implementation.  By turning it into a wrapper and having the callers
take the lock (like several other functions), we save 160 bytes on a
tinyconfig build and reduce the duplication in xarray.c.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-11-05 16:38:08 -05:00
Matthew Wilcox 4c0608f4a0 XArray: Regularise xa_reserve
The xa_reserve() function was a little unusual in that it attempted to
be callable for all kinds of locking scenarios.  Make it look like the
other APIs with __xa_reserve, xa_reserve_bh and xa_reserve_irq variants.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-11-05 16:38:08 -05:00
Matthew Wilcox 0e9446c35a xarray: Add range store functionality
This version of xa_store_range() really only supports load and store.
Our only user only needs basic load and store functionality, so there's
no need to do the extra work to support marking and overlapping stores
correctly yet.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:46:46 -04:00
Matthew Wilcox 371c752dc6 xarray: Track free entries in an XArray
Add the optional ability to track which entries in an XArray are free
and provide xa_alloc() to replace most of the functionality of the IDR.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:46:32 -04:00
Matthew Wilcox 9f14d4f1f1 xarray: Add xa_reserve and xa_release
This function reserves a slot in the XArray for users which need
to acquire multiple locks before storing their entry in the tree and
so cannot use a plain xa_store().

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:46:00 -04:00
Matthew Wilcox 2264f5132f xarray: Add xas_create_range
This hopefully temporary function is useful for users who have not yet
been converted to multi-index entries.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:45:59 -04:00
Matthew Wilcox 4e99d4e957 xarray: Add xas_for_each_conflict
This iterator iterates over each entry that is stored in the index or
indices specified by the xa_state.  This is intended for use for a
conditional store of a multiindex entry, or to allow entries which are
about to be removed from the xarray to be disposed of properly.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:45:59 -04:00
Matthew Wilcox 64d3e9a9e0 xarray: Step through an XArray
The xas_next and xas_prev functions move the xas index by one position,
and adjust the rest of the iterator state to match it.  This is more
efficient than calling xas_set() as it keeps the iterator at the leaves
of the tree instead of walking the iterator from the root each time.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:45:59 -04:00
Matthew Wilcox 687149fca1 xarray: Destroy an XArray
This function frees all the internal memory allocated to the xarray
and reinitialises it to be empty.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:45:59 -04:00
Matthew Wilcox 80a0a1a9a3 xarray: Extract entries from an XArray
The xa_extract function combines the functionality of
radix_tree_gang_lookup() and radix_tree_gang_lookup_tagged().
It extracts entries matching the specified filter into a normal array.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:45:58 -04:00
Matthew Wilcox b803b42823 xarray: Add XArray iterators
The xa_for_each iterator allows the user to efficiently walk a range
of the array, executing the loop body once for each entry in that
range that matches the filter.  This commit also includes xa_find()
and xa_find_after() which are helper functions for xa_for_each() but
may also be useful in their own right.

In the xas family of functions, we have xas_for_each(), xas_find(),
xas_next_entry(), xas_for_each_tagged(), xas_find_tagged(),
xas_next_tagged() and xas_pause().

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:45:58 -04:00
Matthew Wilcox 41aec91f55 xarray: Add XArray conditional store operations
Like cmpxchg(), xa_cmpxchg will only store to the index if the current
entry matches the old entry.  It returns the current entry, which is
usually more useful than the errno returned by radix_tree_insert().
For the users who really only want the errno, the xa_insert() wrapper
provides a more convenient calling convention.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:45:58 -04:00
Matthew Wilcox 58d6ea3085 xarray: Add XArray unconditional store operations
xa_store() differs from radix_tree_insert() in that it will overwrite an
existing element in the array rather than returning an error.  This is
the behaviour which most users want, and those that want more complex
behaviour generally want to use the xas family of routines anyway.

For memory allocation, xa_store() will first attempt to request memory
from the slab allocator; if memory is not immediately available, it will
drop the xa_lock and allocate memory, keeping a pointer in the xa_state.
It does not use the per-CPU cache, although those will continue to exist
until all radix tree users are converted to the xarray.

This patch also includes xa_erase() and __xa_erase() for a streamlined
way to store NULL.  Since there is no need to allocate memory in order
to store a NULL in the XArray, we do not need to trouble the user with
deciding what memory allocation flags to use.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:45:57 -04:00
Matthew Wilcox 9b89a03551 xarray: Add XArray marks
XArray marks are like the radix tree tags, only slightly more strongly
typed.  They are renamed in order to distinguish them from tagged
pointers.  This commit adds the basic get/set/clear operations.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:45:57 -04:00
Matthew Wilcox ad3d6c7263 xarray: Add XArray load operation
The xa_load function brings with it a lot of infrastructure; xa_empty(),
xa_is_err(), and large chunks of the XArray advanced API that are used
to implement xa_load.

As the test-suite demonstrates, it is possible to use the XArray functions
on a radix tree.  The radix tree functions depend on the GFP flags being
stored in the root of the tree, so it's not possible to use the radix
tree functions on an XArray.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-21 10:45:57 -04:00
Matthew Wilcox 01959dfe77 xarray: Define struct xa_node
This is a direct replacement for struct radix_tree_node.  A couple of
struct members have changed name, so convert those.  Use a #define so
that radix tree users continue to work without change.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
2018-10-21 10:45:56 -04:00
Matthew Wilcox f8d5d0cc14 xarray: Add definition of struct xarray
This is a direct replacement for struct radix_tree_root.  Some of the
struct members have changed name; convert those, and use a #define so
that radix_tree users continue to work without change.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
2018-10-21 10:45:53 -04:00
Matthew Wilcox 02c02bf12c xarray: Change definition of sibling entries
Instead of storing a pointer to the slot containing the canonical entry,
store the offset of the slot.  Produces slightly more efficient code
(~300 bytes) and simplifies the implementation.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
2018-09-29 22:47:49 -04:00
Matthew Wilcox 3159f943aa xarray: Replace exceptional entries
Introduce xarray value entries and tagged pointers to replace radix
tree exceptional entries.  This is a slight change in encoding to allow
the use of an extra bit (we can now store BITS_PER_LONG - 1 bits in a
value entry).  It is also a change in emphasis; exceptional entries are
intimidating and different.  As the comment explains, you can choose
to store values or pointers in the xarray and they are both first-class
citizens.

Signed-off-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
2018-09-29 22:47:49 -04:00