Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add zstd compression and decompression support to SquashFS. zstd is a
great fit for SquashFS because it can compress at ratios approaching xz,
while decompressing twice as fast as zlib. For SquashFS in particular,
it can decompress as fast as lzo and lz4. It also has the flexibility
to turn down the compression ratio for faster compression times.
The compression benchmark is run on the file tree from the SquashFS archive
found in ubuntu-16.10-desktop-amd64.iso [1]. It uses `mksquashfs` with the
default block size (128 KB) and and various compression algorithms/levels.
xz and zstd are also benchmarked with 256 KB blocks. The decompression
benchmark times how long it takes to `tar` the file tree into `/dev/null`.
See the benchmark file in the upstream zstd source repository located under
`contrib/linux-kernel/squashfs-benchmark.sh` [2] for details.
I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM.
The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor,
16 GB of RAM, and a SSD.
| Method | Ratio | Compression MB/s | Decompression MB/s |
|----------------|-------|------------------|--------------------|
| gzip | 2.92 | 15 | 128 |
| lzo | 2.64 | 9.5 | 217 |
| lz4 | 2.12 | 94 | 218 |
| xz | 3.43 | 5.5 | 35 |
| xz 256 KB | 3.53 | 5.4 | 40 |
| zstd 1 | 2.71 | 96 | 210 |
| zstd 5 | 2.93 | 69 | 198 |
| zstd 10 | 3.01 | 41 | 225 |
| zstd 15 | 3.13 | 11.4 | 224 |
| zstd 16 256 KB | 3.24 | 8.1 | 210 |
This patch was written by Sean Purcell <me@seanp.xyz>, but I will be
taking over the submission process.
[1] http://releases.ubuntu.com/16.10/
[2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/squashfs-benchmark.sh
zstd source repository: https://github.com/facebook/zstd
Signed-off-by: Sean Purcell <me@seanp.xyz>
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: Phillip Lougher <phillip@squashfs.org.uk>
Update fs/pstore and fs/squashfs to use the updated functions from the
new LZ4 module.
Link: http://lkml.kernel.org/r/1486321748-19085-5-git-send-email-4sschmid@informatik.uni-hamburg.de
Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
Cc: Bongkyu Kim <bongkyu.kim@lge.com>
Cc: Rui Salvaterra <rsalvaterra@gmail.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull partial readlink cleanups from Miklos Szeredi.
This is the uncontroversial part of the readlink cleanup patch-set that
simplifies the default readlink handling.
Miklos and Al are still discussing the rest of the series.
* git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
vfs: make generic_readlink() static
vfs: remove ".readlink = generic_readlink" assignments
vfs: default to generic_readlink()
vfs: replace calling i_op->readlink with vfs_readlink()
proc/self: use generic_readlink
ecryptfs: use vfs_get_link()
bad_inode: add missing i_op initializers
If .readlink == NULL implies generic_readlink().
Generated by:
to_del="\.readlink.*=.*generic_readlink"
for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
These inode operations are no longer used; remove them.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This has ll_rw_block users pass in the operation and flags separately,
so ll_rw_block can setup the bio op and bi_rw flags on the bio that
is submitted.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mostly direct substitution with occasional adjustment or removing
outdated comments.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:
- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.
The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull misc vfs updates from Al Viro:
"All kinds of stuff. That probably should've been 5 or 6 separate
branches, but by the time I'd realized how large and mixed that bag
had become it had been too close to -final to play with rebasing.
Some fs/namei.c cleanups there, memdup_user_nul() introduction and
switching open-coded instances, burying long-dead code, whack-a-mole
of various kinds, several new helpers for ->llseek(), assorted
cleanups and fixes from various people, etc.
One piece probably deserves special mention - Neil's
lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
called without ->i_mutex and tries to avoid ever taking it. That, of
course, means that it's not useful for any directory modifications,
but things like getting inode attributes in nfds readdirplus are fine
with that. I really should've asked for moratorium on lookup-related
changes this cycle, but since I hadn't done that early enough... I
*am* asking for that for the coming cycle, though - I'm going to try
and get conversion of i_mutex to rwsem with ->lookup() done under lock
taken shared.
There will be a patch closer to the end of the window, along the lines
of the one Linus had posted last May - mechanical conversion of
->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
inode_is_locked()/inode_lock_nested(). To quote Linus back then:
-----
| This is an automated patch using
|
| sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
| sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
| sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
| sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
| sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
|
| with a very few manual fixups
-----
I'm going to send that once the ->i_mutex-affecting stuff in -next
gets mostly merged (or when Linus says he's about to stop taking
merges)"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
nfsd: don't hold i_mutex over userspace upcalls
fs:affs:Replace time_t with time64_t
fs/9p: use fscache mutex rather than spinlock
proc: add a reschedule point in proc_readfd_common()
logfs: constify logfs_block_ops structures
fcntl: allow to set O_DIRECT flag on pipe
fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
fs: xattr: Use kvfree()
[s390] page_to_phys() always returns a multiple of PAGE_SIZE
nbd: use ->compat_ioctl()
fs: use block_device name vsprintf helper
lib/vsprintf: add %*pg format specifier
fs: use gendisk->disk_name where possible
poll: plug an unused argument to do_poll
amdkfd: don't open-code memdup_user()
cdrom: don't open-code memdup_user()
rsxx: don't open-code memdup_user()
mtip32xx: don't open-code memdup_user()
[um] mconsole: don't open-code memdup_user_nul()
[um] hostaudio: don't open-code memdup_user()
...
Pull vfs xattr updates from Al Viro:
"Andreas' xattr cleanup series.
It's a followup to his xattr work that went in last cycle; -0.5KLoC"
* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xattr handlers: Simplify list operation
ocfs2: Replace list xattr handler operations
nfs: Move call to security_inode_listsecurity into nfs_listxattr
xfs: Change how listxattr generates synthetic attributes
tmpfs: listxattr should include POSIX ACL xattrs
tmpfs: Use xattr handler infrastructure
btrfs: Use xattr handler infrastructure
vfs: Distinguish between full xattr names and proper prefixes
posix acls: Remove duplicate xattr name definitions
gfs2: Remove gfs2_xattr_acl_chmod
vfs: Remove vfs_xattr_cmp
Change the list operation to only return whether or not an attribute
should be listed. Copying the attribute names into the buffer is moved
to the callers.
Since the result only depends on the dentry and not on the attribute
name, we do not pass the attribute name to list operations.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.
It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.
new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases. page_follow_link_light()
instrumented to yell about anything missed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add an additional "name" field to struct xattr_handler. When the name
is set, the handler matches attributes with exactly that name. When the
prefix is set instead, the handler matches attributes with the given
prefix and with a non-empty suffix.
This patch should avoid bugs like the one fixed in commit c361016a in
the future.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that the xattr handler is passed to the xattr handler operations, we
have access to the attribute name prefix, so simplify the squashfs xattr
handlers a bit.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The xattr_handler operations are currently all passed a file system
specific flags value which the operations can use to disambiguate between
different handlers; some file systems use that to distinguish the xattr
namespace, for example. In some oprations, it would be useful to also have
access to the handler prefix. To allow that, pass a pointer to the handler
to operations instead of the flags value alone.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
list_entry is just a wrapper for container_of, but it is arguably
wrong (and slightly confusing) to use it when the pointed-to struct
member is not a struct list_head. Use container_of directly instead.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add support for reading file systems compressed with the
LZ4 compression algorithm.
This patch adds the LZ4 decompressor wrapper code.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Update the last pr_warning callsite in fs branch
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Direct decompression into the page cache. If we fall back
to using an intermediate buffer (because we cannot grab all the
page cache pages) and we get a decompress fail, we forgot to
release the pages.
Reported-by: Roman Peniaev <r.peniaev@gmail.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Fix static checker complaint that stream is not checked in
squashfs_decompressor_destroy().
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Minchan Kim <minchan@kernel.org>
This introduces an implementation of squashfs_readpage_block()
that directly decompresses into the page cache.
This uses the previously added page handler abstraction to push
down the necessary kmap_atomic/kunmap_atomic operations on the
page cache buffers into the decompressors. This enables
direct copying into the page cache without using the slow
kmap/kunmap calls.
The code detects when multiple threads are racing in
squashfs_readpage() to decompress the same block, and avoids
this regression by falling back to using an intermediate
buffer.
This patch enhances the performance of Squashfs significantly
when multiple processes are accessing the filesystem simultaneously
because it not only reduces memcopying, but it more importantly
eliminates the lock contention on the intermediate buffer.
Using single-thread decompression.
dd if=file1 of=/dev/null bs=4096 &
dd if=file2 of=/dev/null bs=4096 &
dd if=file3 of=/dev/null bs=4096 &
dd if=file4 of=/dev/null bs=4096
Before:
629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s
After:
629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Restructure squashfs_readpage() splitting it into separate
functions for datablocks, fragments and sparse blocks.
Move the memcpying (from squashfs cache entry) implementation of
squashfs_readpage_block into file_cache.c
This allows different implementations to be supported.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Further generalise the decompressors by adding a page handler
abstraction. This adds helpers to allow the decompressors
to access and process the output buffers in an implementation
independant manner.
This allows different types of output buffer to be passed
to the decompressors, with the implementation specific
aspects handled at decompression time, but without the
knowledge being held in the decompressor wrapper code.
This will allow the decompressors to handle Squashfs
cache buffers, and page cache pages.
This patch adds the abstraction and an implementation for
the caches.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Add a multi-threaded decompression implementation which uses
percpu variables.
Using percpu variables has advantages and disadvantages over
implementations which do not use percpu variables.
Advantages:
* the nature of percpu variables ensures decompression is
load-balanced across the multiple cores.
* simplicity.
Disadvantages: it limits decompression to one thread per core.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Now squashfs have used for only one stream buffer for decompression
so it hurts parallel read performance so this patch supports
multiple decompressor to enhance performance parallel I/O.
Four 1G file dd read on KVM machine which has 2 CPU and 4G memory.
dd if=test/test1.dat of=/dev/null &
dd if=test/test2.dat of=/dev/null &
dd if=test/test3.dat of=/dev/null &
dd if=test/test4.dat of=/dev/null &
old : 1m39s -> new : 9s
* From v1
* Change comp_strm with decomp_strm - Phillip
* Change/add comments - Phillip
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
The decompressor interface and code was written from
the point of view of single-threaded operation. In doing
so it mixed a lot of single-threaded implementation specific
aspects into the decompressor code and elsewhere which makes it
difficult to seamlessly support multiple different decompressor
implementations.
This patch does the following:
1. It removes compressor_options parsing from the decompressor
init() function. This allows the decompressor init() function
to be dynamically called to instantiate multiple decompressors,
without the compressor options needing to be read and parsed each
time.
2. It moves threading and all sleeping operations out of the
decompressors. In doing so, it makes the decompressors
non-blocking wrappers which only deal with interfacing with
the decompressor implementation.
3. It splits decompressor.[ch] into decompressor generic functions
in decompressor.[ch], and moves the single threaded
decompressor implementation into decompressor_single.c.
The result of this patch is Squashfs should now be able to
support multiple decompressors by adding new decompressor_xxx.c
files with specialised implementations of the functions in
decompressor_single.c
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reviewed-by: Minchan Kim <minchan@kernel.org>
We read the type field from disk. This value should be sanity
checked for correctness to avoid an out of bounds access when
reading the squashfs_filetype_table array.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
We read the size (of the name) field from disk. This value should
be sanity checked for correctness to avoid blindly reading
huge amounts of unnecessary data from disk on corruption.
Note, here we're not actually reading the name into a buffer, but
skipping it, and so corruption doesn't cause buffer overflow, merely
lots of unnecessary amounts of data to be read.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
The dir_count and size fields when read from disk are sanity
checked for correctness. However, the sanity checks only check the
values are not greater than expected. As dir_count and size were
incorrectly defined as signed ints, this can lead to corrupted values
appearing as negative which are not trapped.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
The dir_count and size fields when read from disk are sanity
checked for correctness. However, the sanity checks only check the
values are not greater than expected. As dir_count and size were
incorrectly defined as signed ints, this can lead to corrupted values
appearing as negative which are not trapped.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Patch "Squashfs: sanity check information from disk" from
Dan Carpenter adds a missing check for corruption in the
"size" field while reading the directory index from disk.
It, however, sets err to -EINVAL, this value is not used later, and
so setting it is completely redundant. So remove it.
Errors in reading the index are deliberately non-fatal. If we
get an error in reading the index we just return the part of the
index we have managed to read - the index isn't essential,
just quicker.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Merged the two for loops. We might get a little gain by overlapping
wait_on_bh and the memcpy operations.
Signed-off-by: Manish Sharma <manishrma@gmail.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
We read the size of the name from the disk, but a larger name than
expected would cause memory corruption.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Somehow I failed to add the MODULE_ALIAS_FS for cifs, hostfs, hpfs,
squashfs, and udf despite what I thought were my careful checks :(
Add them now.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull vfs update from Al Viro:
- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c
(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).
A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.
- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).
- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.
- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.
- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."
Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...
There's no reason to call rcu_barrier() on every
deactivate_locked_super(). We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.
Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths. E.g. on my machine exit_group() of a last process in IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPc65LAAoJEJAch/D1fbHUEswP/RzwQ7NFsGe+RgC7YygUn+nP
qQma8CUBXMPNKLzGl+BGqG+TtNdBDEVGvNrHYtfn54tI7N1qliHTcMqfHv4SF1nZ
QFKwqhBpEOFSbHQg4ts/N01Pa2Ilqw2A4L/8CBjqUkEZ8qOorI8sLp1Xb254YoVk
G4oP+dY/YIEXZxYhIerevIkpNElkqTB2dZAZ/uhNcdHkKIRyAvqyay6F04YdGqI+
r2JfzhPS0T70PbrBHur1ed7iAHYOtgrxgB89CS3jJ5X1iG+iK8i+Xsn18fBFOQPd
ULaSwfdJY5xALqxrEyuO1VxP1uEGAmn2+aOPQP/KIapLmIBGaZXHjC8H3uMqGhZ6
/Y6ZnoH4NJBANn+HXN3iwqQZ8+cw+HUgzJdyZwp6d8SEBM1KsESWeR2t+U6Zvr8L
sLS5inXjbS3O6B07GV58liyCFLEXtmEHj3GCtnnWvp44Vjax57hbegQzKxe8C+3D
YqBf/fx9WKIA5Ojbx5fGUaz7BQ2fczMuzrwNQB05bZAdHqSuKs3dWpGpjtKcelwp
k1BO+kstuwE/dRiAxWZ3lpMQ9GLNmAGg1DgqWEKRMXuThwgxhXwf0sAshhGYCmL6
IdkUqC95Be8/D5i9yxbY8TGIV4rmhV8xDR9j8cIVHtiqbuGZT1jPhXUXXJVWw/y2
Us/awa6sC0qlL18jHV35
=ywVL
-----END PGP SIGNATURE-----
Merge tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next
Pull squashfs updates from Phillip Lougher:
"Add an extra mount time sanity check, plus some code cleanups and bug
fixes."
* tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next:
Squashfs: add mount time sanity check for block_size and block_log match
Squashfs: fix f_pos check in get_dir_index_using_offset
Squashfs: get rid of obsolete definitions in header file
Squashfs: remove redundant length initialisation in squashfs_lookup
Squashfs: remove redundant length initialisation in squashfs_readdir
Squashfs: update comment removing reference to zlib only
Squashfs: use define instead of constant
Pull vfs pile 1 from Al Viro:
"This is _not_ all; in particular, Miklos' and Jan's stuff is not there
yet."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
ext4: initialization of ext4_li_mtx needs to be done earlier
debugfs-related mode_t whack-a-mole
hfsplus: add an ioctl to bless files
hfsplus: change finder_info to u32
hfsplus: initialise userflags
qnx4: new helper - try_extent()
qnx4: get rid of qnx4_bread/qnx4_getblk
take removal of PF_FORKNOEXEC to flush_old_exec()
trim includes in inode.c
um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
um: embed ->stub_pages[] into mmu_context
gadgetfs: list_for_each_safe() misuse
ocfs2: fix leaks on failure exits in module_init
ecryptfs: make register_filesystem() the last potential failure exit
ntfs: forgets to unregister sysctls on register_filesystem() failure
logfs: missing cleanup on register_filesystem() failure
jfs: mising cleanup on register_filesystem() failure
make configfs_pin_fs() return root dentry on success
configfs: configfs_create_dir() has parent dentry in dentry->d_parent
configfs: sanitize configfs_create()
...
Squashfs currently has a sanity check for block_size less than or
equal to the maximum block_size (1 Mbyte). This catches some
superblock corruption, but obviously with a block_size maximum
of 1 Mbyte there's 7 correct values (4K, 8K, 16K, 32K, ... etc) and
a lot of incorrect values which are not caught by this check.
The Squashfs superblock, however, has both a block_size and
a block_log (2^block_log == block_size). Checking that the block_size
matches the block_log is a much more robust check. Corruption of the
superblock is unlikely to produce values which match, and it also
ensures the block_size is an exact power of two.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
One off error in the f_pos check. If f_pos is 3 or less don't
bother reading the index because we're at the start of the
directory, and we obviously already know where that is on disk.
This eliminates an unnecessary read.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Most of these were never used by the kernel code, but belong to
the time when the header file was used by both the kernel code
and the user space tools.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Comment was written when Squashfs only supported zlib compression.
This comment is now misleading given Squashfs supports other
compression algorithms.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Its better to use defined name instead of constant
Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else. Not to mention the removal of
boilerplate code from ->destroy_inode() instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The le64_to_cpu() forces the calculation to be unsigned, with
the effect that it can underflow leading to an incorrect large
value.
This bug only triggers in rare(ish) circumstances, an empty file
encoded as an extended regular file or a completely sparse file.
Normally empty files are encoded as a regular file rather than as
an extended regular file (and the regular file i_blocks calculation
doesn't have this bug). To save space regular file inodes are
optimised to encode the most commonly occurring files. Less
common regular files are encoded using extended regular file inodes
which contain extra information.
Empty files with nlinks greater than 1, and or empty files
with extended attributes are encoded using extended regular file
inodes and they will hit this bug.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
A Squashfs filesystem containing nothing but an empty directory,
although unusual and ultimately pointless, is still valid.
The directory_table >= next_table sanity check rejects these
filesystems as invalid because the directory_table is empty and
equal to next_table.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
squashfs_cache_get() iterates over all entries to search for
block its looking for. Often get() / put() are called for
same block.
If we cache the current entry index, then we can optimise the
subsequent *_get() calls.
Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
This commit adds an option to set the device block size used to 4K.
By default Squashfs sets the device block size (sb_min_blocksize) to 1K
or the smallest block size supported by the block device (if larger).
This, because blocks are packed together and unaligned in Squashfs,
should reduce latency.
This, however, gives poor performance on MTD NAND devices where
the optimal I/O size is 4K (even though the devices can support
smaller block sizes).
Using a 4K device block size may also improve overall I/O
performance for some file access patterns (e.g. sequential
accesses of files in filesystem order) on all media.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
There are numerous broken references to Documentation files (in other
Documentation files, in comments, etc.). These broken references are
caused by typo's in the references, and by renames or removals of the
Documentation files. Some broken references are simply odd.
Fix these broken references, sometimes by dropping the irrelevant text
they were part of.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
Squashfs: Make ZLIB compression support optional
Squashfs: Update documentation for XZ and add squashfs-tools devel tree
Squashfs now supports XZ and LZO compression in addition to ZLIB.
As such it no longer makes sense to always include ZLIB support.
In particular embedded systems may only use LZO or XZ compression, and
the ability to exclude ZLIB support will reduce kernel size.
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
d_splice_alias(NULL, dentry) is equivalent to d_add(dentry, NULL), NULL
so no need for that if (inode) ... in there (or ERR_PTR(0), for that
matter)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
Squashfs: update email address
Squashfs: add extra sanity checks at mount time
Squashfs: add sanity checks to fragment reading at mount time
Squashfs: add sanity checks to lookup table reading at mount time
Squashfs: add sanity checks to id reading at mount time
Squashfs: add sanity checks to xattr reading at mount time
Squashfs: reverse order of filesystem table reading
Squashfs: move table allocation into squashfs_read_table()
My existing email address may stop working in a month or two, so update
email to one that will continue working.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Fsfuzzer generates corrupted filesystems which throw a warn_on in
kmalloc. One of these is due to a corrupted superblock fragments field.
Fix this by checking that the number of bytes to be read (and allocated)
does not extend into the next filesystem structure.
Also add a couple of other sanity checks of the mount-time fragment table
structures.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Fsfuzzer generates corrupted filesystems which throw a warn_on in
kmalloc. One of these is due to a corrupted superblock inodes field.
Fix this by checking that the number of bytes to be read (and allocated)
does not extend into the next filesystem structure.
Also add a couple of other sanity checks of the mount-time lookup table
structures.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Fsfuzzer generates corrupted filesystems which throw a warn_on in
kmalloc. One of these is due to a corrupted superblock no_ids field.
Fix this by checking that the number of bytes to be read (and allocated)
does not extend into the next filesystem structure.
Also add a couple of other sanity checks of the mount-time id table
structures.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Reverse order of table reading from mostly first to last in placement
order, to last to first. This is to enable extra superblock sanity
checks to be added in later patches.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Bugzilla bug 31422 reports occasional "page allocation failure. order:4"
at Squashfs mount time. Fix this by making zlib workspace allocation
use vmalloc rather than kmalloc.
Reported-by: Mehmet Giritli <mehmet@giritli.eu>
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Handle the rare case where a directory metadata block is uncompressed and
corrupted, leading to a kernel oops in directory scanning (memcpy).
Normally corruption is detected at the decompression stage and dealt with
then, however, this will not happen if:
- metadata isn't compressed (users can optionally request no metadata
compression), or
- the compressed metadata block was larger than the original, in which
case the uncompressed version was used, or
- the data was corrupt after decompression
This patch fixes this by adding some sanity checks against known maximum
values.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Squashfs_get_sb() to squashfs_mount() conversion (commit 152a0836)
results in line over 80 characters.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Pass the dictionary size used to compress datablocks. Using a
dictionary size less than the block size saves memory overhead, in many
cases without adversely affecting compression ratio.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Extend decompressor framework to handle compression options stored in
the filesystem. These options can be used by the relevant decompressor
at initialisation time to over-ride defaults.
The presence of compression options in the filesystem is indicated by
the COMP_OPT filesystem flag. If present the data is read from the
filesystem and passed to the decompressor init function. The decompressor
init function signature has been extended to take this data.
Also update the init function signature in the glib, lzo and xz
decompressor wrappers.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Fix potential use of uninitialised variable caused by recent
decompressor code optimisations.
In zlib_uncompress (zlib_wrapper.c) we have
int zlib_err, zlib_init = 0;
...
do {
...
if (avail == 0) {
offset = 0;
put_bh(bh[k++]);
continue;
}
...
zlib_err = zlib_inflate(stream, Z_SYNC_FLUSH);
...
} while (zlib_err == Z_OK);
If continue is executed (avail == 0) then the while condition will be
evaluated testing zlib_err, which is uninitialised first time around the
loop.
Fix this by getting rid of the 'if (avail == 0)' condition test, this
edge condition should not be being handled in the decompressor code, and
instead handle it generically in the caller code.
Similarly for xz_wrapper.c.
Incidentally, on most architectures (bar Mips and Parisc), no
uninitialised variable warning is generated by gcc, this is because the
while condition test on continue is optimised out and not performed
(when executing continue zlib_err has not been changed since entering
the loop, and logically if the while condition was true previously, then
it's still true).
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Reported-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>