Create a new block/early-lookup.c to keep the early block device lookup
code instead of having this code sit with the early mount code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
name_to_dev_t has a very misleading name, that doesn't make clear
it should only be used by the early init code, and also has a bad
calling convention that doesn't allow returning different kinds of
errors. Rename it to early_lookup_bdev to make the use case clear,
and return an errno, where -EINVAL means the string could not be
parsed, and -ENODEV means it the string was valid, but there was
no device found for it.
Also stub out the whole call for !CONFIG_BLOCK as all the non-block
root cases are always covered in the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The nfs/cifs/ram special case only needs to be parsed once, and only in
the boot code. Move them out of name_to_dev_t and into
prepare_namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230531125535.676098-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Make it possible to disable the MOPS extension at runtime using the
kernel command line. This can be useful for testing or working around
hardware issues. For example it could be used to test new memory copy
routines that do not use MOPS instructions (e.g. from Arm Optimized
Routines).
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20230509142235.3284028-11-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Initialize the ACPI core for RISC-V during boot.
ACPI tables and interpreter are initialized based on
the information passed from the firmware and the value of
the kernel parameter 'acpi'.
With ACPI support added for RISC-V, the kernel parameter 'acpi'
is also supported on RISC-V. Hence, update the documentation.
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-9-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add a new command line option "mtrr=debug" for getting debug output
after building the new cache mode map. The output will include MTRR
register values and the resulting map.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20230502120931.20719-13-jgross@suse.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
The net.core.txrehash documentation mentions this knob is for listening
sockets only, while sk_rethink_txhash can be called on SYN and RTO
retransmits on all TCP sockets.
Remove the listening socket part.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The fs/cifs directory has moved to fs/smb/client, correct mentions
of this in Documentation and comments.
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
The descriptions of certain KVM related kernel parameters can be
confusing. They state "Disable ...," which may make people think that
setting them to 1 will disable the associated feature when in fact the
opposite is true.
This commit addresses this issue by revising the descriptions of these
parameters by using "Control..." rather than "Enable/Disable...".
1==enabled or 0==disabled can be communicated by the description of
default value such as "1 (enabled)" or "0 (disabled)".
Also update the description of KVM's default value for kvm-intel.nested
as it is enabled by default.
Signed-off-by: Yan-Jie Wang <yanjiewtw@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20230503081530.19956-1-yanjiewtw@gmail.com
Information about intel_pstate active mode is added in the doc.
This operation mode could be used to set on the hardware when it's
not activated. Status of the mode could be checked from sysfs file
i.e., /sys/devices/system/cpu/intel_pstate/status.
The information is already available in cpu-freq/intel-pstate.txt
documentation.
Signed-off-by: Natesh Sharma <nsharma@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
[jc: reformatted for width ]
Link: https://lore.kernel.org/r/20230427083706.49882-1-nsharma@redhat.com
Workqueue now automatically marks per-cpu work items that hog CPU for too
long as CPU_INTENSIVE, which excludes them from concurrency management and
prevents stalling other concurrency-managed work items. If a work function
keeps running over the thershold, it likely needs to be switched to use an
unbound workqueue.
This patch adds a debug mechanism which tracks the work functions which
trigger the automatic CPU_INTENSIVE mechanism and report them using
pr_warn() with exponential backoff.
v3: Documentation update.
v2: Drop bouncing to kthread_worker for printing messages. It was to avoid
introducing circular locking dependency through printk but not effective
as it still had pool lock -> wci_lock -> printk -> pool lock loop. Let's
just print directly using printk_deferred().
Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
If a per-cpu work item hogs the CPU, it can prevent other work items from
starting through concurrency management. A per-cpu workqueue which intends
to host such CPU-hogging work items can choose to not participate in
concurrency management by setting %WQ_CPU_INTENSIVE; however, this can be
error-prone and difficult to debug when missed.
This patch adds an automatic CPU usage based detection. If a
concurrency-managed work item consumes more CPU time than the threshold
(10ms by default) continuously without intervening sleeps, wq_worker_tick()
which is called from scheduler_tick() will detect the condition and
automatically mark it CPU_INTENSIVE.
The mechanism isn't foolproof:
* Detection depends on tick hitting the work item. Getting preempted at the
right timings may allow a violating work item to evade detection at least
temporarily.
* nohz_full CPUs may not be running ticks and thus can fail detection.
* Even when detection is working, the 10ms detection delays can add up if
many CPU-hogging work items are queued at the same time.
However, in vast majority of cases, this should be able to detect violations
reliably and provide reasonable protection with a small increase in code
complexity.
If some work items trigger this condition repeatedly, the bigger problem
likely is the CPU being saturated with such per-cpu work items and the
solution would be making them UNBOUND. The next patch will add a debug
mechanism to help spot such cases.
v4: Documentation for workqueue.cpu_intensive_thresh_us added to
kernel-parameters.txt.
v3: Switch to use wq_worker_tick() instead of hooking into preemptions as
suggested by Peter.
v2: Lai pointed out that wq_worker_stopping() also needs to be called from
preemption and rtlock paths and an earlier patch was updated
accordingly. This patch adds a comment describing the risk of infinte
recursions and how they're avoided.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
* improve the short description of localmodconfig in the step-by-step
guide while fixing its broken first sentence
* briefly mention immutable Linux distributions
* use '--shallow-exclude=v6.0' throughout the document
* instead of "git reset --hard; git checkout ..." use "git checkout
--force ..." in the step-by-step guide: this matches the TLDR and is
one command less to execute. This led to a few small adjustments to
the text and the flow in the surrounding area.
* fix two thinkos in the section explaining full git clones
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Link: https://lore.kernel.org/r/6f4684b9a5d11d3adb04e0af3cfc60db8b28eeb2.1684140700.git.linux@leemhuis.info
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Sometimes the one-line ORC unwinder warnings aren't very helpful. Add a
new 'unwind_debug' cmdline option which will dump the full stack
contents of the current task when an error condition is encountered.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lore.kernel.org/r/6afb9e48a05fd2046bfad47e69b061b43dfd0e0e.1681331449.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
If you build a kernel with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y,
then run the rcutorture tests specifying stalls as follows:
runqemu kvm slirp nographic qemuparams="-m 1024 -smp 4" \
bootparams="console=ttyS0 rcutorture.stall_cpu=30 \
rcutorture.stall_no_softlockup=1 rcutorture.stall_cpu_block=1" -d
The tests will produce the following splat:
[ 10.841071] rcu-torture: rcu_torture_stall begin CPU stall
[ 10.841073] rcu_torture_stall start on CPU 3.
[ 10.841077] BUG: scheduling while atomic: rcu_torture_sta/66/0x0000000
....
[ 10.841108] Call Trace:
[ 10.841110] <TASK>
[ 10.841112] dump_stack_lvl+0x64/0xb0
[ 10.841118] dump_stack+0x10/0x20
[ 10.841121] __schedule_bug+0x8b/0xb0
[ 10.841126] __schedule+0x2172/0x2940
[ 10.841157] schedule+0x9b/0x150
[ 10.841160] schedule_timeout+0x2e8/0x4f0
[ 10.841192] schedule_timeout_uninterruptible+0x47/0x50
[ 10.841195] rcu_torture_stall+0x2e8/0x300
[ 10.841199] kthread+0x175/0x1a0
[ 10.841206] ret_from_fork+0x2c/0x50
This is because the rcutorture.stall_cpu_block=1 module parameter causes
rcu_torture_stall() to invoke schedule_timeout_uninterruptible() within
an RCU read-side critical section. This in turn results in a quiescent
state (which prevents the stall) and a sleep in an atomic context (which
produces the above splat).
Although this code is operating as designed, the design has proven to
be counterintuitive to many. This commit therefore updates the description
in kernel-parameters.txt accordingly.
[ paulmck: Apply Joel Fernandes feedback. ]
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
There is often significant latency in the early stages of CPU bringup, and
time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86) and
then waiting for it to respond before moving on to the next.
Allow a platform to enable parallel setup which brings all to be onlined
CPUs up to the CPUHP_BP_KICK_AP state. While this state advancement on the
control CPU (BP) is single-threaded the important part is the last state
CPUHP_BP_KICK_AP which wakes the to be onlined CPUs up.
This allows the CPUs to run up to the first sychronization point
cpuhp_ap_sync_alive() where they wait for the control CPU to release them
one by one for the full onlining procedure.
This parallelism depends on the CPU hotplug core sync mechanism which
ensures that the parallel brought up CPUs wait for release before touching
any state which would make the CPU visible to anything outside the hotplug
control mechanism.
To handle the SMT constraints of X86 correctly the bringup happens in two
iterations when CONFIG_HOTPLUG_SMT is enabled. The control CPU brings up
the primary SMT threads of each core first, which can load the microcode
without the need to rendevouz with the thread siblings. Once that's
completed it brings up the secondary SMT threads.
Co-developed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205257.240231377@linutronix.de
This was introduced together with commit e1c467e690 ("x86, hotplug: Wake
up CPU0 via NMI instead of INIT, SIPI, SIPI") to eventually support
physical hotplug of CPU0:
"We'll change this code in the future to wake up hard offlined CPU0 if
real platform and request are available."
11 years later this has not happened and physical hotplug is not officially
supported. Remove the cruft.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205255.715707999@linutronix.de
This commit adds kernel-parameters.txt documentation for the
rcutree.rcu_resched_ns module parameter.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
An SoC can contain multiple power domains with individual or collection
of mesh partitions. This partition is called fabric cluster.
Certain type of meshes will need to run at the same frequency, they will
be placed in the same fabric cluster. Benefit of fabric cluster is that
it offers a scalable mechanism to deal with partitioned fabrics in a SoC.
The current sysfs interface supports control at package and die level.
This interface is not enough to support more granular control at
fabric cluster level.
SoCs with the support of TPMI (Topology Aware Register and PM Capsule
Interface), can have multiple power domains. Each power domain can
contain one or more fabric clusters.
To support such granular controls, enhance uncore common to optionally
create new directories to provide controls at fabric cluster level. It
is also important to have flexibility to change granularity for future
version of SoCs. If the directory name contains scope like:
"package_*_die_*_power_domain_*_cluster_*", then this is not expandable.
The cpufreq policies also have different scopes. There the scope of the
policy (affected_cpus) specified by attributes inside each policy.
So, follow the same model for uncore frequency scaling sysfs as:
"sys/devices/system/cpu/cpufreq/policy*"
Allow client drivers to optionally support granular control for each
fabric cluster. Here, the directory name will be "uncore" suffixed with
an unique instance number. For example: uncore00, uncore01 etc.
Attributes in the directory identify package id, power domain and
fabric cluster id. This interface is expandable even if some new level
of granularity is introduced. A new sysfs attribute can identify new
level.
For compatibility with the existing sysfs and provide easy way to set
limits for each fabric cluster in the package/die, the existing control
at package/die levels are still provided. For majority of users, this is
an easy approach.
For example: On a single package/die system, with three power domains
and one fabric cluster per power domain:
$tree -L 2 /sys/devices/system/cpu/intel_uncore_frequency/
/sys/devices/system/cpu/intel_uncore_frequency/
├── package_00_die_00
│ ├── current_freq_khz
│ ├── initial_max_freq_khz
│ ├── initial_min_freq_khz
│ ├── max_freq_khz
│ └── min_freq_khz
├── uncore00
│ ├── current_freq_khz
│ ├── domain_id
│ ├── fabric_cluster_id
│ ├── initial_max_freq_khz
│ ├── initial_min_freq_khz
│ ├── max_freq_khz
│ ├── min_freq_khz
│ └── package_id
├── uncore01
│ ├── current_freq_khz
│ ├── domain_id
│ ├── fabric_cluster_id
│ ├── initial_max_freq_khz
│ ├── initial_min_freq_khz
│ ├── max_freq_khz
│ ├── min_freq_khz
│ └── package_id
└── uncore02
├── current_freq_khz
├── domain_id
├── fabric_cluster_id
├── initial_max_freq_khz
├── initial_min_freq_khz
├── max_freq_khz
├── min_freq_khz
└── package_id
The attribute for cluster id is "fabric_cluster_id" instead of just
"cluster_id" is to avoid confusion with usage of term clusters in
other part of the Linux kernel.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wendy Wang <wendy.wang@intel.com>
Link: https://lore.kernel.org/r/20230418171340.681662-3-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmRWLQYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpnwhD/4xRYfAY4O0oGZCITiKEEFxiSfDHCPMEBji
zTEtideDRjcrpFmjmZ411C9prW32MxYQ3wQTf4O7w4t906xTYVr9FQy8g3Et4izI
zglPcsa2jPzYCadQ0Ye4n9dRuYOH9FDJzDDLC+smu8zQKKmAqEAN/1ftpnADdVrY
qX1sHyfz4RQRAgTHg2WqgKOi2O9VwGSMwOfBocmzAnruv9oLUlypcGnFPRCSZH/a
OKpUZlvQhCBZTKScvVxQeJMg2Tl5yokQ0TH+gkQsdav9XcPJktqXiuD+c4h6q5ux
oTlysEqcrwcaAafOcV0w9u80SFNlFACUYsNmEnFJaPXFTqAdNHvo1DJNsmxiHJDU
bGo5ktlo5b/VZ51niOoWvGxursavq16G4yIYlGGHc7f4wGs12oc5ZP/yM3GRUY+C
PdezwEvvQufxP7sFokfpgAS4SuH+tBlrhFXMsYaI4NukZQW4TK1zzbMrzOkdxFhW
BOx17VFUKWtUnRmxinFGIA8Vj+FXN+E+ND+FoDsbrMyJD4maKDdJapPchG0J0Vbs
pDcsB4c0pBC6H2xrobKiA1CuSq2t2qvyvwe1Zl2Xd+RVW9vBB5SI6HXYrC+UtxwY
7LfX8F13cFD1E6iJ9Nta6x8fOunGnOVBdW5O0k4hDWEuZduvHItEDn2c3Ehqp4Jw
P8dFBbk8SQ==
=gAYf
-----END PGP SIGNATURE-----
Merge tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux
Pull more block updates from Jens Axboe:
- MD pull request via Song:
- Improve raid5 sequential IO performance on spinning disks, which
fixes a regression since v6.0 (Jan Kara)
- Fix bitmap offset types, which fixes an issue introduced in this
merge window (Jonathan Derrick)
- Cleanup of hweight type used for cgroup writeback (Maxim)
- Fix a regression with the "has_submit_bio" changes across partitions
(Ming)
- Cleanup of QUEUE_FLAG_ADD_RANDOM clearing.
We used to set this flag on queues non blk-mq queues, and hence some
drivers clear it unconditionally. Since all of these have since been
converted to true blk-mq drivers, drop the useless clear as the bit
is not set (Chaitanya)
- Fix the flags being set in a bio for a flush for drbd (Christoph)
- Cleanup and deduplication of the code handling setting block device
capacity (Damien)
- Fix for ublk handling IO timeouts (Ming)
- Fix for a regression in blk-cgroup teardown (Tao)
- NBD documentation and code fixes (Eric)
- Convert blk-integrity to using device_attributes rather than a second
kobject to manage lifetimes (Thomas)
* tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux:
ublk: add timeout handler
drbd: correctly submit flush bio on barrier
mailmap: add mailmap entries for Jens Axboe
block: Skip destroyed blkg when restart in blkg_destroy_all()
writeback: fix call of incorrect macro
md: Fix bitmap offset type in sb writer
md/raid5: Improve performance for sequential IO
docs nbd: userspace NBD now favors github over sourceforge
block nbd: use req.cookie instead of req.handle
uapi nbd: add cookie alias to handle
uapi nbd: improve doc links to userspace spec
blk-integrity: register sysfs attributes on struct device
blk-integrity: convert to struct device_attribute
blk-integrity: use sysfs_emit
block/drivers: remove dead clear of random flag
block: sync part's ->bd_has_submit_bio with disk's
block: Cleanup set_capacity()/bdev_set_nr_sectors()
translation that has been ready for some time but got applied late.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmRT2nkACgkQF0NaE2wM
flglRwf/fLxKnCmh/j6zVjXVjb00UWj9NV5ZmrrnFxU+ajyzejoiKeyYvXcdiKKp
R1PpVJPF9ZO2DQ43QEj01SAl3qyMKWbvbISg5L/btOQtV563h7zktyjULOai7UfF
6+svqWi2Cl0gqdix3AVICZryRYBNBY62PeIWcWka+VXmMGCijwLf/jRRXwNa/7kf
kye9C5UG01uGauAw2t4Ol/jbIsZa4ID9FiUHJI/aQfLCgqVVO9pVQXA1igDF743Y
vt4IGX4qsjKGsAAOMWfieFGozcCwQsc01hC2usa8yx36ov6EBT79hmoWMWHsLeHJ
NcLfF7LZCJNUD3g+c1hLhfjjYMufaw==
=dljY
-----END PGP SIGNATURE-----
Merge tag 'docs-6.4-2' of git://git.lwn.net/linux
Pull more documentation updates from Jonathan Corbet:
"A handful of late-arriving documentation fixes, plus one Spanish
translation that has been ready for some time but got applied late"
* tag 'docs-6.4-2' of git://git.lwn.net/linux:
docs/sp_SP: Add translation of process/adding-syscalls
CREDITS: Update email address for Mat Martineau
Documentation: update kernel stack for x86_64
docs: Remove unnecessary unicode character
docs: fix "Reviewd" typo
Documentation: timers: hrtimers: Make hybrid union historical
docs/admin-guide/mm/ksm.rst fix intraface -> interface typo
doc:it_IT: fix some typos
o Added detailed design documentation for the upcoming online repair feature
o major update to online scrub to complete the reverse mapping cross-referencing
infrastructure enabling us to fully validate allocated metadata against owner
records. This is the last piece of scrub infrastructure needed before we can
start merging online repair functionality.
o Fixes for the ascii-ci hashing issues
o deprecation of the ascii-ci functionality
o on-disk format verification bug fixes
o various random bug fixes for syzbot and other bug reports
Signed-off-by: Dave Chinner <david@fromorbit.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmRMTZ8UHGRhdmlkQGZy
b21vcmJpdC5jb20ACgkQregpR/R1+h3XtA//bZYjsYRU3hzyGLKee++5t/zbiqZB
KWw8zuPEdEsSAPphK4DQYO7XPWetgFh8iBU39M8TM0+g5YacjzBLGADjQiEv7naN
IxSoElQQzZbvMcUPOnuRaoopn0v7pbWIDRo3hKWaRDKCrnMGOnTvDFuC/VX0RAbn
GzPimbuvaYJPXTnWTwsKeAuVYP4HLdTh2R1gUMjyY80Ed08hxhCzrXSvjEtuxOOy
tDk50wJUhgx7UTgFBsXug1wXLCYwDFvAUjpsBKnmq+vSl0MpI3TdCetmSQbuvAeu
gvkRyBMOcqcY5rlozcKPpyXwy7I0ftXOY4xpUSW8H9tAx0oVImkC69DsAjotQV0r
r6vEtcw7LgmaS9kbA6G2Z4JfKEHuf2d/6OI4onZh25b5SWq7+qFBPo67AoFg8UQf
bKSf3QQNVLTyWqpRf8Z3XOEBygYGsDUuxrm2AA5Aar4t4T3y5oAKFKkf4ZAlAYxH
KViQsq0qVcoQ4k4txZgU7XQrftKyu2csqxqtKDozH7FutxscchZEwvjdQ6jnS2+L
2Qlf6On8edfEkPKzF7/1cgxUXCXuTqakFVetChXjZ1/ZFt9LUYphvESdaolJ8Aqz
lwEy5UrbC2oMrBDT7qESLWs3U66mPhRaaFfuLUJRyhHN3Y0tVVA2mgNzyD6oBQVy
ffIbZ3+1QEPOaOQ=
=lBJy
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.4-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Dave Chinner:
"This consists mainly of online scrub functionality and the design
documentation for the upcoming online repair functionality built on
top of the scrub code:
- Added detailed design documentation for the upcoming online repair
feature
- major update to online scrub to complete the reverse mapping
cross-referencing infrastructure enabling us to fully validate
allocated metadata against owner records. This is the last piece of
scrub infrastructure needed before we can start merging online
repair functionality.
- Fixes for the ascii-ci hashing issues
- deprecation of the ascii-ci functionality
- on-disk format verification bug fixes
- various random bug fixes for syzbot and other bug reports"
* tag 'xfs-6.4-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (107 commits)
xfs: fix livelock in delayed allocation at ENOSPC
xfs: Extend table marker on deprecated mount options table
xfs: fix duplicate includes
xfs: fix BUG_ON in xfs_getbmap()
xfs: verify buffer contents when we skip log replay
xfs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done
xfs: remove WARN when dquot cache insertion fails
xfs: don't consider future format versions valid
xfs: deprecate the ascii-ci feature
xfs: test the ascii case-insensitive hash
xfs: stabilize the dirent name transformation function used for ascii-ci dir hash computation
xfs: cross-reference rmap records with refcount btrees
xfs: cross-reference rmap records with inode btrees
xfs: cross-reference rmap records with free space btrees
xfs: cross-reference rmap records with ag btrees
xfs: introduce bitmap type for AG blocks
xfs: convert xbitmap to interval tree
xfs: drop the _safe behavior from the xbitmap foreach macro
xfs: don't load local xattr values during scrub
xfs: remove the for_each_xbitmap_ helpers
...
* cpuset changes including the fix for an incorrect interaction with CPU
hotplug and an optimization.
* Other doc and cosmetic changes.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZErfng4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGVVtAQCDycK4VSgc4nsFPG1vh1Oy1A723ciEUwAbKmV/
F1n7xwEA68FiDvE29LpMJJuYP9HnX0A5zRMyNnb52kN9jmgcEQI=
=ALol
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpuset changes including the fix for an incorrect interaction with
CPU hotplug and an optimization
- Other doc and cosmetic changes
* tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1/cpusets: update libcgroup project link
cgroup/cpuset: Minor updates to test_cpuset_prs.sh
cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated
cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset
cpuset: Clean up cpuset_node_allowed
cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers
* Support for runtime detection of the Svnapot extension.
* Support for Zicboz when clearing pages.
* We've moved to GENERIC_ENTRY.
* Support for !MMU on rv32 systems.
* The linear region is now mapped via huge pages.
* Support for building relocatable kernels.
* Support for the hwprobe interface.
* Various fixes and cleanups throughout the tree.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmRL5rcTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYibpcD/0RnmO+N2OJxsJXf0KtHv4LlChAFaMZ
mfcsU8lv8r3Rz1USJGyVoE57885R+iUw1664ic6Gj9Ll9/A+BDVyqlNeo1BZ7nnv
6hZawSh8XGMyCJoatjaCSMW6VKObsSpHXLoA0mxtj06w1XhtpUnzjv4SZQqBYxC2
7+/cfy6l3uGdSKQ0R402sF8PE+l3HthhO+Cw9NYHQZisAHEQrfFpXRnrovhs+vX0
aVxoWo8bmIhhNke2jh6dnGhfFfAs+UClbaKgZfe8af6feboo+Tal3+OibiEy1K1j
hDQ3w/G5jAdwSqnNPdXzpk4srskUOhP9is8AG79vCasMxybQIBfZcc7/kLmmQX+2
xt1EoDVD/lSO1p+CWRautLXEsInWbpBYaSJie7WcR4SHe8S7/nomTDlwkJHx5cma
mkSYHJKNwCbamDTI3gXg8nrScbxsRnJQsQUolFDwAeRz7AYVwtqVh8VxAWqAdU3q
xUNKrUpCAzNC3d5GL7pmRfZrqjpQhuFXkHFSy85vaCPuckBu926OzxpKBmX4Kea1
qLYWfxv78bcwuY47FWJKcd97Ib63iBYDgarJxvrHrwDaHV2xjBOmdapNPUc2PswT
a938enbYYnJHIbuSmbeNBPF4iF6nKUXshyfZu7tCZl6MzsXloUckGdm++j97Bpvr
g6G3ZP6STSQBmw==
=oxQd
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for runtime detection of the Svnapot extension
- Support for Zicboz when clearing pages
- We've moved to GENERIC_ENTRY
- Support for !MMU on rv32 systems
- The linear region is now mapped via huge pages
- Support for building relocatable kernels
- Support for the hwprobe interface
- Various fixes and cleanups throughout the tree
* tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (57 commits)
RISC-V: hwprobe: Explicity check for -1 in vdso init
RISC-V: hwprobe: There can only be one first
riscv: Allow to downgrade paging mode from the command line
dt-bindings: riscv: add sv57 mmu-type
RISC-V: hwprobe: Remove __init on probe_vendor_features()
riscv: Use --emit-relocs in order to move .rela.dyn in init
riscv: Check relocations at compile time
powerpc: Move script to check relocations at compile time in scripts/
riscv: Introduce CONFIG_RELOCATABLE
riscv: Move .rela.dyn outside of init to avoid empty relocations
riscv: Prepare EFI header for relocatable kernels
riscv: Unconditionnally select KASAN_VMALLOC if KASAN
riscv: Fix ptdump when KASAN is enabled
riscv: Fix EFI stub usage of KASAN instrumented strcmp function
riscv: Move DTB_EARLY_BASE_VA to the kernel address space
riscv: Rework kasan population functions
riscv: Split early and final KASAN population functions
riscv: Use PUD/P4D/PGD pages for the linear mapping
riscv: Move the linear mapping creation in its own function
riscv: Get rid of riscv_pfn_base variable
...
- Remove diagnostics and adjust config for CSD lock diagnostics
- Add a generic IPI-sending tracepoint, as currently there's no easy
way to instrument IPI origins: it's arch dependent and for some
major architectures it's not even consistently available.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK438RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jJ5Q/5AZ0HGpyqwdFK8GmGznyu5qjP5HwV9pPq
gZQScqSy4tZEeza4TFMi83CoXSg9uJ7GlYJqqQMKm78LGEPomnZtXXC7oWvTA9M5
M/jAvzytmvZloSCXV6kK7jzSejMHhag97J/BjTYhZYQpJ9T+hNC87XO6J6COsKr9
lPIYqkFrIkQNr6B0U11AQfFejRYP1ics2fnbnZL86G/zZAc6x8EveM3KgSer2iHl
KbrO+xcYyGY8Ef9P2F72HhEGFfM3WslpT1yzqR3sm4Y+fuMG0oW3qOQuMJx0ZhxT
AloterY0uo6gJwI0P9k/K4klWgz81Tf/zLb0eBAtY2uJV9Fo3YhPHuZC7jGPGAy3
JusW2yNYqc8erHVEMAKDUsl/1KN4TE2uKlkZy98wno+KOoMufK5MA2e2kPPqXvUi
Jk9RvFolnWUsexaPmCftti0OCv3YFiviVAJ/t0pchfmvvJA2da0VC9hzmEXpLJVF
25nBTV/1uAOrWvOpCyo3ElrC2CkQVkFmK5rXMDdvf6ib0Nid4vFcCkCSLVfu+ePB
11mi7QYro+CcnOug1K+yKogUDmsZgV/u1kUwgQzTIpZ05Kkb49gUiXw9L2RGcBJh
yoDoiI66KPR7PWQ2qBdQoXug4zfEEtWG0O9HNLB0FFRC3hu7I+HHyiUkBWs9jasK
PA5+V7HcQRk=
=Wp7f
-----END PGP SIGNATURE-----
Merge tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP cross-CPU function-call updates from Ingo Molnar:
- Remove diagnostics and adjust config for CSD lock diagnostics
- Add a generic IPI-sending tracepoint, as currently there's no easy
way to instrument IPI origins: it's arch dependent and for some major
architectures it's not even consistently available.
* tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
trace,smp: Trace all smp_function_call*() invocations
trace: Add trace_ipi_send_cpu()
sched, smp: Trace smp callback causing an IPI
smp: reword smp call IPI comment
treewide: Trace IPIs sent via smp_send_reschedule()
irq_work: Trace self-IPIs sent via arch_irq_work_raise()
smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
sched, smp: Trace IPIs sent via send_call_function_single_ipi()
trace: Add trace_ipi_send_cpumask()
kernel/smp: Make csdlock_debug= resettable
locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging
locking/csd_lock: Remove added data from CSD lock debugging
locking/csd_lock: Add Kconfig option for csd_debug default
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page().
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
than exclusive, and has fixed a bunch of errors which were caused by its
unintuitive meaning.
- Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also coverted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
=J+Dj
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page()
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
rather than exclusive, and has fixed a bunch of errors which were
caused by its unintuitive meaning.
- Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also coverted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics
flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page recalim
accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
shmem: restrict noswap option to initial user namespace
mm/khugepaged: fix conflicting mods to collapse_file()
sparse: remove unnecessary 0 values from rc
mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
maple_tree: fix allocation in mas_sparse_area()
mm: do not increment pgfault stats when page fault handler retries
zsmalloc: allow only one active pool compaction context
selftests/mm: add new selftests for KSM
mm: add new KSM process and sysfs knobs
mm: add new api to enable ksm per process
mm: shrinkers: fix debugfs file permissions
mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
migrate_pages_batch: fix statistics for longterm pin retry
userfaultfd: use helper function range_in_vma()
lib/show_mem.c: use for_each_populated_zone() simplify code
mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
fs/buffer: convert create_page_buffers to folio_create_buffers
fs/buffer: add folio_create_empty_buffers helper
...
While the sourceforge site for userspace NBD still exists, the code
repository moved to github several years ago. Then with a recent
patch[1], the github landing page contains just as much information as
the sourceforge page, so we might as well point to a single location
that also provides the code.
[1] https://lists.debian.org/nbd/2023/03/msg00051.html
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20230410180611.1051618-5-eblake@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The summary of the changes for this pull requests is:
* Song Liu's new struct module_memory replacement
* Nick Alcock's MODULE_LICENSE() removal for non-modules
* My cleanups and enhancements to reduce the areas where we vmalloc
module memory for duplicates, and the respective debug code which
proves the remaining vmalloc pressure comes from userspace.
Most of the changes have been in linux-next for quite some time except
the minor fixes I made to check if a module was already loaded
prior to allocating the final module memory with vmalloc and the
respective debug code it introduces to help clarify the issue. Although
the functional change is small it is rather safe as it can only *help*
reduce vmalloc space for duplicates and is confirmed to fix a bootup
issue with over 400 CPUs with KASAN enabled. I don't expect stable
kernels to pick up that fix as the cleanups would have also had to have
been picked up. Folks on larger CPU systems with modules will want to
just upgrade if vmalloc space has been an issue on bootup.
Given the size of this request, here's some more elaborate details
on this pull request.
The functional change change in this pull request is the very first
patch from Song Liu which replaces the struct module_layout with a new
struct module memory. The old data structure tried to put together all
types of supported module memory types in one data structure, the new
one abstracts the differences in memory types in a module to allow each
one to provide their own set of details. This paves the way in the
future so we can deal with them in a cleaner way. If you look at changes
they also provide a nice cleanup of how we handle these different memory
areas in a module. This change has been in linux-next since before the
merge window opened for v6.3 so to provide more than a full kernel cycle
of testing. It's a good thing as quite a bit of fixes have been found
for it.
Jason Baron then made dynamic debug a first class citizen module user by
using module notifier callbacks to allocate / remove module specific
dynamic debug information.
Nick Alcock has done quite a bit of work cross-tree to remove module
license tags from things which cannot possibly be module at my request
so to:
a) help him with his longer term tooling goals which require a
deterministic evaluation if a piece a symbol code could ever be
part of a module or not. But quite recently it is has been made
clear that tooling is not the only one that would benefit.
Disambiguating symbols also helps efforts such as live patching,
kprobes and BPF, but for other reasons and R&D on this area
is active with no clear solution in sight.
b) help us inch closer to the now generally accepted long term goal
of automating all the MODULE_LICENSE() tags from SPDX license tags
In so far as a) is concerned, although module license tags are a no-op
for non-modules, tools which would want create a mapping of possible
modules can only rely on the module license tag after the commit
8b41fc4454 ("kbuild: create modules.builtin without Makefile.modbuiltin
or tristate.conf"). Nick has been working on this *for years* and
AFAICT I was the only one to suggest two alternatives to this approach
for tooling. The complexity in one of my suggested approaches lies in
that we'd need a possible-obj-m and a could-be-module which would check
if the object being built is part of any kconfig build which could ever
lead to it being part of a module, and if so define a new define
-DPOSSIBLE_MODULE [0]. A more obvious yet theoretical approach I've
suggested would be to have a tristate in kconfig imply the same new
-DPOSSIBLE_MODULE as well but that means getting kconfig symbol names
mapping to modules always, and I don't think that's the case today. I am
not aware of Nick or anyone exploring either of these options. Quite
recently Josh Poimboeuf has pointed out that live patching, kprobes and
BPF would benefit from resolving some part of the disambiguation as
well but for other reasons. The function granularity KASLR (fgkaslr)
patches were mentioned but Joe Lawrence has clarified this effort has
been dropped with no clear solution in sight [1].
In the meantime removing module license tags from code which could never
be modules is welcomed for both objectives mentioned above. Some
developers have also welcomed these changes as it has helped clarify
when a module was never possible and they forgot to clean this up,
and so you'll see quite a bit of Nick's patches in other pull
requests for this merge window. I just picked up the stragglers after
rc3. LWN has good coverage on the motivation behind this work [2] and
the typical cross-tree issues he ran into along the way. The only
concrete blocker issue he ran into was that we should not remove the
MODULE_LICENSE() tags from files which have no SPDX tags yet, even if
they can never be modules. Nick ended up giving up on his efforts due
to having to do this vetting and backlash he ran into from folks who
really did *not understand* the core of the issue nor were providing
any alternative / guidance. I've gone through his changes and dropped
the patches which dropped the module license tags where an SPDX
license tag was missing, it only consisted of 11 drivers. To see
if a pull request deals with a file which lacks SPDX tags you
can just use:
./scripts/spdxcheck.py -f \
$(git diff --name-only commid-id | xargs echo)
You'll see a core module file in this pull request for the above,
but that's not related to his changes. WE just need to add the SPDX
license tag for the kernel/module/kmod.c file in the future but
it demonstrates the effectiveness of the script.
Most of Nick's changes were spread out through different trees,
and I just picked up the slack after rc3 for the last kernel was out.
Those changes have been in linux-next for over two weeks.
The cleanups, debug code I added and final fix I added for modules
were motivated by David Hildenbrand's report of boot failing on
a systems with over 400 CPUs when KASAN was enabled due to running
out of virtual memory space. Although the functional change only
consists of 3 lines in the patch "module: avoid allocation if module is
already present and ready", proving that this was the best we can
do on the modules side took quite a bit of effort and new debug code.
The initial cleanups I did on the modules side of things has been
in linux-next since around rc3 of the last kernel, the actual final
fix for and debug code however have only been in linux-next for about a
week or so but I think it is worth getting that code in for this merge
window as it does help fix / prove / evaluate the issues reported
with larger number of CPUs. Userspace is not yet fixed as it is taking
a bit of time for folks to understand the crux of the issue and find a
proper resolution. Worst come to worst, I have a kludge-of-concept [3]
of how to make kernel_read*() calls for modules unique / converge them,
but I'm currently inclined to just see if userspace can fix this
instead.
[0] https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/
[1] https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com
[2] https://lwn.net/Articles/927569/
[3] https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmRG4m0SHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinQ2oP/0xlvKwJg6Ey8fHZF0qv8VOskE80zoLF
hMazU3xfqLA+1TQvouW1YBxt3jwS3t1Ehs+NrV+nY9Yzcm0MzRX/n3fASJVe7nRr
oqWWQU+voYl5Pw1xsfdp6C8IXpBQorpYby3Vp0MAMoZyl2W2YrNo36NV488wM9KC
jD4HF5Z6xpnPSZTRR7AgW9mo7FdAtxPeKJ76Bch7lH8U6omT7n36WqTw+5B1eAYU
YTOvrjRs294oqmWE+LeebyiOOXhH/yEYx4JNQgCwPdxwnRiGJWKsk5va0hRApqF/
WW8dIqdEnjsa84lCuxnmWgbcPK8cgmlO0rT0DyneACCldNlldCW1LJ0HOwLk9pea
p3JFAsBL7TKue4Tos6I7/4rx1ufyBGGIigqw9/VX5g0Iif+3BhWnqKRfz+p9wiMa
Fl7cU6u7yC68CHu1HBSisK16cYMCPeOnTSd89upHj8JU/t74O6k/ARvjrQ9qmNUt
c5U+OY+WpNJ1nXQydhY/yIDhFdYg8SSpNuIO90r4L8/8jRQYXNG80FDd1UtvVDuy
eq0r2yZ8C0XHSlOT9QHaua/tWV/aaKtyC/c0hDRrigfUrq8UOlGujMXbUnrmrWJI
tLJLAc7ePWAAoZXGSHrt0U27l029GzLwRdKqJ6kkDANVnTeOdV+mmBg9zGh3/Mp6
agiwdHUMVN7X
=56WK
-----END PGP SIGNATURE-----
Merge tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull module updates from Luis Chamberlain:
"The summary of the changes for this pull requests is:
- Song Liu's new struct module_memory replacement
- Nick Alcock's MODULE_LICENSE() removal for non-modules
- My cleanups and enhancements to reduce the areas where we vmalloc
module memory for duplicates, and the respective debug code which
proves the remaining vmalloc pressure comes from userspace.
Most of the changes have been in linux-next for quite some time except
the minor fixes I made to check if a module was already loaded prior
to allocating the final module memory with vmalloc and the respective
debug code it introduces to help clarify the issue. Although the
functional change is small it is rather safe as it can only *help*
reduce vmalloc space for duplicates and is confirmed to fix a bootup
issue with over 400 CPUs with KASAN enabled. I don't expect stable
kernels to pick up that fix as the cleanups would have also had to
have been picked up. Folks on larger CPU systems with modules will
want to just upgrade if vmalloc space has been an issue on bootup.
Given the size of this request, here's some more elaborate details:
The functional change change in this pull request is the very first
patch from Song Liu which replaces the 'struct module_layout' with a
new 'struct module_memory'. The old data structure tried to put
together all types of supported module memory types in one data
structure, the new one abstracts the differences in memory types in a
module to allow each one to provide their own set of details. This
paves the way in the future so we can deal with them in a cleaner way.
If you look at changes they also provide a nice cleanup of how we
handle these different memory areas in a module. This change has been
in linux-next since before the merge window opened for v6.3 so to
provide more than a full kernel cycle of testing. It's a good thing as
quite a bit of fixes have been found for it.
Jason Baron then made dynamic debug a first class citizen module user
by using module notifier callbacks to allocate / remove module
specific dynamic debug information.
Nick Alcock has done quite a bit of work cross-tree to remove module
license tags from things which cannot possibly be module at my request
so to:
a) help him with his longer term tooling goals which require a
deterministic evaluation if a piece a symbol code could ever be
part of a module or not. But quite recently it is has been made
clear that tooling is not the only one that would benefit.
Disambiguating symbols also helps efforts such as live patching,
kprobes and BPF, but for other reasons and R&D on this area is
active with no clear solution in sight.
b) help us inch closer to the now generally accepted long term goal
of automating all the MODULE_LICENSE() tags from SPDX license tags
In so far as a) is concerned, although module license tags are a no-op
for non-modules, tools which would want create a mapping of possible
modules can only rely on the module license tag after the commit
8b41fc4454 ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf").
Nick has been working on this *for years* and AFAICT I was the only
one to suggest two alternatives to this approach for tooling. The
complexity in one of my suggested approaches lies in that we'd need a
possible-obj-m and a could-be-module which would check if the object
being built is part of any kconfig build which could ever lead to it
being part of a module, and if so define a new define
-DPOSSIBLE_MODULE [0].
A more obvious yet theoretical approach I've suggested would be to
have a tristate in kconfig imply the same new -DPOSSIBLE_MODULE as
well but that means getting kconfig symbol names mapping to modules
always, and I don't think that's the case today. I am not aware of
Nick or anyone exploring either of these options. Quite recently Josh
Poimboeuf has pointed out that live patching, kprobes and BPF would
benefit from resolving some part of the disambiguation as well but for
other reasons. The function granularity KASLR (fgkaslr) patches were
mentioned but Joe Lawrence has clarified this effort has been dropped
with no clear solution in sight [1].
In the meantime removing module license tags from code which could
never be modules is welcomed for both objectives mentioned above. Some
developers have also welcomed these changes as it has helped clarify
when a module was never possible and they forgot to clean this up, and
so you'll see quite a bit of Nick's patches in other pull requests for
this merge window. I just picked up the stragglers after rc3. LWN has
good coverage on the motivation behind this work [2] and the typical
cross-tree issues he ran into along the way. The only concrete blocker
issue he ran into was that we should not remove the MODULE_LICENSE()
tags from files which have no SPDX tags yet, even if they can never be
modules. Nick ended up giving up on his efforts due to having to do
this vetting and backlash he ran into from folks who really did *not
understand* the core of the issue nor were providing any alternative /
guidance. I've gone through his changes and dropped the patches which
dropped the module license tags where an SPDX license tag was missing,
it only consisted of 11 drivers. To see if a pull request deals with a
file which lacks SPDX tags you can just use:
./scripts/spdxcheck.py -f \
$(git diff --name-only commid-id | xargs echo)
You'll see a core module file in this pull request for the above, but
that's not related to his changes. WE just need to add the SPDX
license tag for the kernel/module/kmod.c file in the future but it
demonstrates the effectiveness of the script.
Most of Nick's changes were spread out through different trees, and I
just picked up the slack after rc3 for the last kernel was out. Those
changes have been in linux-next for over two weeks.
The cleanups, debug code I added and final fix I added for modules
were motivated by David Hildenbrand's report of boot failing on a
systems with over 400 CPUs when KASAN was enabled due to running out
of virtual memory space. Although the functional change only consists
of 3 lines in the patch "module: avoid allocation if module is already
present and ready", proving that this was the best we can do on the
modules side took quite a bit of effort and new debug code.
The initial cleanups I did on the modules side of things has been in
linux-next since around rc3 of the last kernel, the actual final fix
for and debug code however have only been in linux-next for about a
week or so but I think it is worth getting that code in for this merge
window as it does help fix / prove / evaluate the issues reported with
larger number of CPUs. Userspace is not yet fixed as it is taking a
bit of time for folks to understand the crux of the issue and find a
proper resolution. Worst come to worst, I have a kludge-of-concept [3]
of how to make kernel_read*() calls for modules unique / converge
them, but I'm currently inclined to just see if userspace can fix this
instead"
Link: https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/ [0]
Link: https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com [1]
Link: https://lwn.net/Articles/927569/ [2]
Link: https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org [3]
* tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (121 commits)
module: add debugging auto-load duplicate module support
module: stats: fix invalid_mod_bytes typo
module: remove use of uninitialized variable len
module: fix building stats for 32-bit targets
module: stats: include uapi/linux/module.h
module: avoid allocation if module is already present and ready
module: add debug stats to help identify memory pressure
module: extract patient module check into helper
modules/kmod: replace implementation with a semaphore
Change DEFINE_SEMAPHORE() to take a number argument
module: fix kmemleak annotations for non init ELF sections
module: Ignore L0 and rename is_arm_mapping_symbol()
module: Move is_arm_mapping_symbol() to module_symbol.h
module: Sync code of is_arm_mapping_symbol()
scripts/gdb: use mem instead of core_layout to get the module address
interconnect: remove module-related code
interconnect: remove MODULE_LICENSE in non-modules
zswap: remove MODULE_LICENSE in non-modules
zpool: remove MODULE_LICENSE in non-modules
x86/mm/dump_pagetables: remove MODULE_LICENSE in non-modules
...
Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening in
the driver core in the quest to be able to move "struct bus" and "struct
class" into read-only memory, a task now complete with these changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules for
all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most of
them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
LEGadNS38k5fs+73UaxV
=7K4B
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...
Here is the big set of tty/serial driver updates for 6.4-rc1.
Nothing major, just lots of tiny, constant, forward development. This
includes:
- obligatory n_gsm updates and feature additions
- 8250_em driver updates
- sh-sci driver updates
- dts cleanups and updates
- general cleanups and improvements by Ilpo and Jiri
- other small serial driver core fixes and driver updates
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEqB7w8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylQuwCgwU9bGoihDtFsoFYUra/FKPPoC88Anj6t1a1f
X5HZmADnwrFNNq/jP4vH
=FeNF
-----END PGP SIGNATURE-----
Merge tag 'tty-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial updates from Greg KH:
"Here is the big set of tty/serial driver updates for 6.4-rc1.
Nothing major, just lots of tiny, constant, forward development. This
includes:
- obligatory n_gsm updates and feature additions
- 8250_em driver updates
- sh-sci driver updates
- dts cleanups and updates
- general cleanups and improvements by Ilpo and Jiri
- other small serial driver core fixes and driver updates
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (87 commits)
n_gsm: Use array_index_nospec() with index that comes from userspace
tty: vt: drop checks for undefined VT_SINGLE_DRIVER
tty: vt: distribute EXPORT_SYMBOL()
tty: vt: simplify some cases in tioclinux()
tty: vt: reformat tioclinux()
tty: serial: sh-sci: Fix end of transmission on SCI
tty: serial: sh-sci: Add support for tx end interrupt handling
tty: serial: sh-sci: Fix TE setting on SCI IP
tty: serial: sh-sci: Add RZ/G2L SCIFA DMA rx support
tty: serial: sh-sci: Add RZ/G2L SCIFA DMA tx support
serial: max310x: fix IO data corruption in batched operations
serial: core: Disable uart_start() on uart_remove_one_port()
serial: 8250: Reinit port->pm on port specific driver unbind
serial: 8250: Add missing wakeup event reporting
tty: serial: fsl_lpuart: use UARTMODIR register bits for lpuart32 platform
tty: serial: fsl_lpuart: adjust buffer length to the intended size
serial: fix TIOCSRS485 locking
serial: make SiFive serial drivers depend on ARCH_ symbols
tty: synclink_gt: don't allocate and pass dummy flags
tty: serial: simplify qcom_geni_serial_send_chunk_fifo()
...
dm-bufio's locking to allow increased concurrent IO -- particularly
for read access for buffers already in dm-bufio's cache.
- Also split dm-bio-prison-v1's spinlock and rbtree with comparable
aim at improving concurrent IO (for the DM thinp target).
- Both the dm-bufio and dm-bio-prison-v1 scaling of the number of
locks and rbtrees used are managed by dm_num_hash_locks(). And the
hash function used by both is dm_hash_locks_index().
- Allow DM targets to require DISCARD, WRITE_ZEROES and SECURE_ERASE
to be split at the target specified boundary (in terms of
max_discard_sectors, max_write_zeroes_sectors and
max_secure_erase_sectors respectively).
- DM verity error handling fix for check_at_most_once on FEC.
- Update DM verity target to emit audit events on verification failure
and more.
- DM core ->io_hints improvements needed in support of new discard
support that is added to the DM "zero" and "error" targets.
- Fix missing kmem_cache_destroy() call in initialization error path
of both the DM integrity and DM clone targets.
- A couple fixes for DM flakey, also add "error_reads" feature.
- Fix DM core's resume to not lock FS when the DM map is NULL;
otherwise initial table load can race with FS mount that takes
superblock's ->s_umount rw_semaphore.
- Various small improvements to both DM core and DM targets.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmRGtWwACgkQxSPxCi2d
A1pBqgf/W7op3/PdXBI+tlb7j05MEvMaZx0vz3l+qF36SMglaP1yZLZPiU9MCX2V
Sm2t4p7VEn5gAxvmzqa0/pLINC7u/m1IW9O6y3qdOEFAgwJ2st+/yaDqgguN5kiA
uOzecyDfR7n0WU5rkaO2EUneO7MrYweLR3IROFNFNHndl4bVJOafDcOJvmsI4YYe
5PIMHb+AGND+O2lIVOvSiPD6e85trcRWkr2X6DUYlllV3XEaBLke5MP1OAp+o/Y5
MFPfznnuiEvcFAzsBoDebC5j7RBQjHw12Bp8ltZV1ZFbdvluw9q1GD2/uyR5UolV
jmerZXKThV7lRJYqilUmt74Rxl2JSg==
=zPkM
-----END PGP SIGNATURE-----
Merge tag 'for-6.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Split dm-bufio's rw_semaphore and rbtree. Offers improvements to
dm-bufio's locking to allow increased concurrent IO -- particularly
for read access for buffers already in dm-bufio's cache.
- Also split dm-bio-prison-v1's spinlock and rbtree with comparable aim
at improving concurrent IO (for the DM thinp target).
- Both the dm-bufio and dm-bio-prison-v1 scaling of the number of locks
and rbtrees used are managed by dm_num_hash_locks(). And the hash
function used by both is dm_hash_locks_index().
- Allow DM targets to require DISCARD, WRITE_ZEROES and SECURE_ERASE to
be split at the target specified boundary (in terms of
max_discard_sectors, max_write_zeroes_sectors and
max_secure_erase_sectors respectively).
- DM verity error handling fix for check_at_most_once on FEC.
- Update DM verity target to emit audit events on verification failure
and more.
- DM core ->io_hints improvements needed in support of new discard
support that is added to the DM "zero" and "error" targets.
- Fix missing kmem_cache_destroy() call in initialization error path of
both the DM integrity and DM clone targets.
- A couple fixes for DM flakey, also add "error_reads" feature.
- Fix DM core's resume to not lock FS when the DM map is NULL;
otherwise initial table load can race with FS mount that takes
superblock's ->s_umount rw_semaphore.
- Various small improvements to both DM core and DM targets.
* tag 'for-6.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (40 commits)
dm: don't lock fs when the map is NULL in process of resume
dm flakey: add an "error_reads" option
dm flakey: remove trailing space in the table line
dm flakey: fix a crash with invalid table line
dm ioctl: fix nested locking in table_clear() to remove deadlock concern
dm: unexport dm_get_queue_limits()
dm: allow targets to require splitting WRITE_ZEROES and SECURE_ERASE
dm: add helper macro for simple DM target module init and exit
dm raid: remove unused d variable
dm: remove unnecessary (void*) conversions
dm mirror: add DMERR message if alloc_workqueue fails
dm: push error reporting down to dm_register_target()
dm integrity: call kmem_cache_destroy() in dm_integrity_init() error path
dm clone: call kmem_cache_destroy() in dm_clone_init() error path
dm error: add discard support
dm zero: add discard support
dm table: allow targets without devices to set ->io_hints
dm verity: emit audit events on verification failure and more
dm verity: fix error handling for check_at_most_once on FEC
dm: improve hash_locks sizing and hash function
...
* The data=journal writepath has been significantly cleaned up and
simplified, and reduces a large number of data=journal special cases
by Jan Kara.
* Ojaswin Muhoo has replaced linked list used to track extents that
have been used for inode preallocation with a red-black tree in the
multi-block allocator. This improves performance for workloads
which do a large number of random allocating writes.
* Thanks to Kemeng Shi for a lot of cleanup and bug fixes in the
multi-block allocator.
* Matthew wilcox has converted the code paths for reading and writing
ext4 pages to use folios.
* Jason Yan has continued to factor out ext4_fill_super() into smaller
functions for improve ease of maintenance and comprehension.
* Josh Triplett has created an uapi header for ext4 userspace API's.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmRHS3IACgkQ8vlZVpUN
gaNN7AgAnFiWfk4UqKpBsUL5iQKJgf2K4tjlNXgPd6ghNns0IdFEyeWSHhr6KLv/
SQeoMMyiWaUcTvZs9DokD8U/9M1ELPUiE9W5c9GxJjM86SXp8BlLYSZTiRoNHzGJ
noQpvikj4qTRviK0rA3q5ICTP2eh1ECHMFJy2wcsZQgwnBelUejQHsTGtOwSvFWF
8wMdfuVtAFDZJjzOxzVKfHP22R5HVRWlAU7P1d97qKjBj4Se3+QchI+zdcIrmU9A
tTmCXj57NpTDyLjS9dIDmLygtTv93lOzOmZS8glw0BFonPcd3ObI4RHVxR+V9xu1
lN13YYgBrK6yfApn9L5XL/31PuLfbg==
=VLBx
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"There are a number of major cleanups in ext4 this cycle:
- The data=journal writepath has been significantly cleaned up and
simplified, and reduces a large number of data=journal special
cases by Jan Kara.
- Ojaswin Muhoo has replaced linked list used to track extents that
have been used for inode preallocation with a red-black tree in the
multi-block allocator. This improves performance for workloads
which do a large number of random allocating writes.
- Thanks to Kemeng Shi for a lot of cleanup and bug fixes in the
multi-block allocator.
- Matthew wilcox has converted the code paths for reading and writing
ext4 pages to use folios.
- Jason Yan has continued to factor out ext4_fill_super() into
smaller functions for improve ease of maintenance and
comprehension.
- Josh Triplett has created an uapi header for ext4 userspace API's"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (105 commits)
ext4: Add a uapi header for ext4 userspace APIs
ext4: remove useless conditional branch code
ext4: remove unneeded check of nr_to_submit
ext4: move dax and encrypt checking into ext4_check_feature_compatibility()
ext4: factor out ext4_block_group_meta_init()
ext4: move s_reserved_gdt_blocks and addressable checking into ext4_check_geometry()
ext4: rename two functions with 'check'
ext4: factor out ext4_flex_groups_free()
ext4: use ext4_group_desc_free() in ext4_put_super() to save some duplicated code
ext4: factor out ext4_percpu_param_init() and ext4_percpu_param_destroy()
ext4: factor out ext4_hash_info_init()
Revert "ext4: Fix warnings when freezing filesystem with journaled data"
ext4: Update comment in mpage_prepare_extent_to_map()
ext4: Simplify handling of journalled data in ext4_bmap()
ext4: Drop special handling of journalled data from ext4_quota_on()
ext4: Drop special handling of journalled data from ext4_evict_inode()
ext4: Fix special handling of journalled data from extent zeroing
ext4: Drop special handling of journalled data from extent shifting operations
ext4: Drop special handling of journalled data from ext4_sync_file()
ext4: Commit transaction before writing back pages in data=journal mode
...
Add 2 early command line parameters that allow to downgrade satp mode
(using the same naming as x86):
- "no5lvl": use a 4-level page table (down from sv57 to sv48)
- "no4lvl": use a 3-level page table (down from sv57/sv48 to sv39)
Note that going through the device tree to get the kernel command line
works with ACPI too since the efi stub creates a device tree anyway with
the command line.
In KASAN kernels, we can't use the libfdt that early in the boot process
since we are not ready to execute instrumented functions. So instead of
using the "generic" libfdt, we compile our own versions of those functions
that are not instrumented and that are prefixed so that they do not
conflict with the generic ones. We also need the non-instrumented versions
of the string functions and the prefixed versions of memcpy/memmove.
This is largely inspired by commit aacd149b62 ("arm64: head: avoid
relocating the kernel twice for KASLR") from which I removed compilation
flags that were not relevant to RISC-V at the moment (LTO, SCS). Also
note that we have to link with -z norelro to avoid ld.lld to throw a
warning with the new .got sections, like in commit 311bea3cb9 ("arm64:
link with -z norelro for LLD or aarch64-elf").
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230424092313.178699-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
- Fix the frequency unit in cpufreq_verify_current_freq checks()
(Sanjay Chandrashekara).
- Make mode_state_machine in amd-pstate static (Tom Rix).
- Make the cpufreq core require drivers with target_index() to set
freq_table (Viresh Kumar).
- Fix typo in the ARM_BRCMSTB_AVS_CPUFREQ Kconfig entry (Jingyu Wang).
- Use of_property_read_bool() for boolean properties in the pmac32
cpufreq driver (Rob Herring).
- Make the cpufreq sysfs interface return proper error codes on
obviously invalid input (qinyu).
- Add guided autonomous mode support to the AMD P-state driver (Wyes
Karny).
- Make the Intel P-state driver enable HWP IO boost on all server
platforms (Srinivas Pandruvada).
- Add opp and bandwidth support to tegra194 cpufreq driver (Sumit
Gupta).
- Use of_property_present() for testing DT property presence (Rob
Herring).
- Remove MODULE_LICENSE in non-modules (Nick Alcock).
- Add SM7225 to cpufreq-dt-platdev blocklist (Luca Weiss).
- Optimizations and fixes for qcom-cpufreq-hw driver (Krzysztof
Kozlowski, Konrad Dybcio, and Bjorn Andersson).
- DT binding updates for qcom-cpufreq-hw driver (Konrad Dybcio and
Bartosz Golaszewski).
- Updates and fixes for mediatek driver (Jia-Wei Chang and
AngeloGioacchino Del Regno).
- Use of_property_present() for testing DT property presence in the
cpuidle code (Rob Herring).
- Drop unnecessary (void *) conversions from the PM core (Li zeming).
- Add sysfs files to represent time spent in a platform sleep state
during suspend-to-idle and make AMD and Intel PMC drivers use them
(Mario Limonciello).
- Use of_property_present() for testing DT property presence (Rob
Herring).
- Add set_required_opps() callback to the 'struct opp_table', to make
the code paths cleaner (Viresh Kumar).
- Update the pm-graph siute of utilities to v5.11 with the following
changes:
* New script which allows users to install the latest pm-graph
from the upstream github repo.
* Update all the dmesg suspend/resume PM print formats to be able to
process recent timelines using dmesg only.
* Add ethtool output to the log for the system's ethernet device if
ethtool exists.
* Make the tool more robustly handle events where mangled dmesg or
ftrace outputs do not include all the requisite data.
- Make the sleepgraph utility recognize "CPU killed" messages (Xueqin
Luo).
- Remove unneeded SRCU selection in Kconfig because it's always set
from devfreq core (Paul E. McKenney).
- Drop of_match_ptr() macro from exynos-bus.c because this driver is
always using the DT table for driver probe (Krzysztof Kozlowski).
- Use the preferred of_property_present() instead of the low-level
of_get_property() on exynos-bus.c (Rob Herring).
- Use devm_platform_get_and_ioream_resource() in exyno-ppmu.c (Yang Li).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmRGvX4SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxcwsQAK5wK1HWLZDap8nTGGAyvpX+bNJ3YM+l
TS1zSzWV97K6kq2bg4GTgDi6EXJJNgfP9sThOEIee5GrWAjrk9yaxjEyIcrUBjfl
oyFN8SEuYbMN5t9Bir3GRqkL+tWErUiVafplML6vTT8W8GlL2rbxPXM6ifmK9IJq
7r3Rt+tlMrookTzV+ykSGVmC5cpnlNGsvMlGGw91Z8rlICy7MI/ecg8O6Zsy25dR
Vchrg0M+jVxtaFU9/ikQaNHx0B3AF7fpi472CYYWgk1ABfIfNyQATeHsCkKan/ZV
i4+gfgIhIQnO1Ov/05aGYbBhxVpFGQIcLkG0vEmdbHsnC/WDuMCrr5wg1HCgCdpQ
+0eQem5bWxrzKp0g9tL07QG8LuiJTfjuA4DrRZNhudKFU9oglZfZeywRk+s6ta4v
rQFzz7qdlKpcM87pz/Bm8tSTc8UYNCDd7hLe+ZI940CMs/vQ4CfQJ2tlYaIl0AiO
q33Nz1iqhEycQ9OZDzBDyQtK+Xm6lsXUehIBtbqBsFsP3Ry+nxe/fz6UMs5tVNeM
BYaaNhhkiZMhXgJncMi2oR8/LRLYtOHjn1rdOGSMu9Rck5i5TVPsxqzUOzkhvuM9
eXAwts6SwFVYxtaPJs+i6yl8cdLOFORsntIBWFKuwsgH8BFx7pNFuZA33eMOA+Iw
UFey2fKDn3W5
=p/5G
-----END PGP SIGNATURE-----
Merge tag 'pm-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These update several cpufreq drivers and the cpufreq core, add sysfs
interface for exposing the time really spent in the platform low-power
state during suspend-to-idle, update devfreq (core and drivers) and
the pm-graph suite of tools and clean up code.
Specifics:
- Fix the frequency unit in cpufreq_verify_current_freq checks()
Sanjay Chandrashekara)
- Make mode_state_machine in amd-pstate static (Tom Rix)
- Make the cpufreq core require drivers with target_index() to set
freq_table (Viresh Kumar)
- Fix typo in the ARM_BRCMSTB_AVS_CPUFREQ Kconfig entry (Jingyu Wang)
- Use of_property_read_bool() for boolean properties in the pmac32
cpufreq driver (Rob Herring)
- Make the cpufreq sysfs interface return proper error codes on
obviously invalid input (qinyu)
- Add guided autonomous mode support to the AMD P-state driver (Wyes
Karny)
- Make the Intel P-state driver enable HWP IO boost on all server
platforms (Srinivas Pandruvada)
- Add opp and bandwidth support to tegra194 cpufreq driver (Sumit
Gupta)
- Use of_property_present() for testing DT property presence (Rob
Herring)
- Remove MODULE_LICENSE in non-modules (Nick Alcock)
- Add SM7225 to cpufreq-dt-platdev blocklist (Luca Weiss)
- Optimizations and fixes for qcom-cpufreq-hw driver (Krzysztof
Kozlowski, Konrad Dybcio, and Bjorn Andersson)
- DT binding updates for qcom-cpufreq-hw driver (Konrad Dybcio and
Bartosz Golaszewski)
- Updates and fixes for mediatek driver (Jia-Wei Chang and
AngeloGioacchino Del Regno)
- Use of_property_present() for testing DT property presence in the
cpuidle code (Rob Herring)
- Drop unnecessary (void *) conversions from the PM core (Li zeming)
- Add sysfs files to represent time spent in a platform sleep state
during suspend-to-idle and make AMD and Intel PMC drivers use them
Mario Limonciello)
- Use of_property_present() for testing DT property presence (Rob
Herring)
- Add set_required_opps() callback to the 'struct opp_table', to make
the code paths cleaner (Viresh Kumar)
- Update the pm-graph siute of utilities to v5.11 with the following
changes:
* New script which allows users to install the latest pm-graph
from the upstream github repo.
* Update all the dmesg suspend/resume PM print formats to be able
to process recent timelines using dmesg only.
* Add ethtool output to the log for the system's ethernet device
if ethtool exists.
* Make the tool more robustly handle events where mangled dmesg
or ftrace outputs do not include all the requisite data.
- Make the sleepgraph utility recognize "CPU killed" messages (Xueqin
Luo)
- Remove unneeded SRCU selection in Kconfig because it's always set
from devfreq core (Paul E. McKenney)
- Drop of_match_ptr() macro from exynos-bus.c because this driver is
always using the DT table for driver probe (Krzysztof Kozlowski)
- Use the preferred of_property_present() instead of the low-level
of_get_property() on exynos-bus.c (Rob Herring)
- Use devm_platform_get_and_ioream_resource() in exyno-ppmu.c (Yang
Li)"
* tag 'pm-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (44 commits)
platform/x86/intel/pmc: core: Report duration of time in HW sleep state
platform/x86/intel/pmc: core: Always capture counters on suspend
platform/x86/amd: pmc: Report duration of time in hw sleep state
PM: Add sysfs files to represent time spent in hardware sleep state
cpufreq: use correct unit when verify cur freq
cpufreq: tegra194: add OPP support and set bandwidth
cpufreq: amd-pstate: Make varaiable mode_state_machine static
PM: core: Remove unnecessary (void *) conversions
cpufreq: drivers with target_index() must set freq_table
PM / devfreq: exynos-ppmu: Use devm_platform_get_and_ioremap_resource()
OPP: Move required opps configuration to specialized callback
OPP: Handle all genpd cases together in _set_required_opps()
cpufreq: qcom-cpufreq-hw: Revert adding cpufreq qos
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCM2290
dt-bindings: cpufreq: cpufreq-qcom-hw: Sanitize data per compatible
dt-bindings: cpufreq: cpufreq-qcom-hw: Allow just 1 frequency domain
cpufreq: Add SM7225 to cpufreq-dt-platdev blocklist
cpufreq: qcom-cpufreq-hw: fix double IO unmap and resource release on exit
cpufreq: mediatek: Raise proc and sram max voltage for MT7622/7623
cpufreq: mediatek: raise proc/sram max voltage for MT8516
...
New drivers:
- add a driver for the Loongson GPIO controller
- add a driver for the fxl6408 I2C GPIO expander
- add a GPIO module containing code common for Intel Elkhart Lake and
Merrifield platforms
- add a driver for the Intel Elkhart Lake platform reusing the code from
the intel tangier library
GPIOLIB core:
- GPIO ACPI improvements
- simplify gpiochip_add_data_with_keys() fwnode handling
- cleanup header inclusions (remove unneeded ones, order the rest
alphabetically)
- remove duplicate code (reuse krealloc() instead of open-coding it, drop
a duplicated check in gpiod_find_and_request())
- reshuffle the code to remove unnecessary forward declarations
- coding style cleanups and improvements
- add a helper for accessing device fwnodes
- small updates in docs
Driver improvements:
- convert all remaining GPIO irqchip drivers to using immutable irqchips
- drop unnecessary of_match_ptr() macro expansions
- shrink the code in gpio-merrifield significantly by reusing the code from
gpio-tangier + minor tweaks to the driver code
- remove MODULE_LICENSE() from drivers that can only be built-in
- add device-tree support to gpio-loongson1
- use new regmap features in gpio-104-dio-48e and gpio-pcie-idio-24
- minor tweaks and fixes to gpio-xra1403, gpio-sim, gpio-tegra194, gpio-omap,
gpio-aspeed, gpio-raspberrypi-exp
- shrink code in gpio-ich and gpio-pxa
- Kconfig tweak for gpio-pmic-eic-sprd
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmRGjBIACgkQEacuoBRx
13IBMA/+PTEowr87BTJW+Z0Y3EoXPGZSKFzUpnzpbGo7CT5mEO3KBbyikZi3asZ4
5mVPbHOC7OU8t76KSGYWXwPh0bvskt+jR2wz19f6F65g1W2SnTym52wAPUJDrKvm
YQofEGcz9ykTIo5KQjAyqADYrrfIOKCOZbN59k8GscXBHkYmGFO3ZhEa5HhzcF+S
qJBxnJ13Tbg9bszyl062pLqsNYGDeqqSuELrhALQCzSCM3WlJQOaHUEG//mS1Syu
OHX2pwjw8u3HxBo6pKMK5fa4/aFM+EUAvSdDX59WmVrPnpLCHezyh4K3WQFUSnwG
OJsW+b/eUDjICQBRvsHIJLuiAr19UouWWY6IZE9dTOjoPO1KWHtbaYX8rHWRZWCM
+/QVfLavmXbOHW/pS2+NNxCARwu8o8ozcopY3PT6TjC5aN8/IkVT4eSaT3mJYXmh
8uS/5aY1Th0eyK5GHv7IcNME5Jb+sAHEnqG0Ebns7a9kOGQdEMJwZrnc5IjKWSMd
PAKNjWYZ49XALtl8vVSar2DYt6d6z+UvGDX67s686FVpCDk15cyUE6VjdtKdGdsd
mH+OnCaWDt+l89DEqZ4298ZA6kNk2CkHHjIO/TBDkU3jP7/wp/NtU0RTuCXydwjW
aNjnfHd2JMJ//wQ4l2fQgpzWfVEN34mKZ2pysDotY47bwjpPD7o=
=X+sP
-----END PGP SIGNATURE-----
Merge tag 'gpio-updates-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio updates from Bartosz Golaszewski:
"We have some new drivers, significant refactoring of existing intel
platforms, lots of improvements all around, mass conversion to using
immutable irqchips by drivers that had not been converted individually
yet and some changes in the core library code.
Summary:
New drivers:
- add a driver for the Loongson GPIO controller
- add a driver for the fxl6408 I2C GPIO expander
- add a GPIO module containing code common for Intel Elkhart Lake and
Merrifield platforms
- add a driver for the Intel Elkhart Lake platform reusing the code
from the intel tangier library
GPIOLIB core:
- GPIO ACPI improvements
- simplify gpiochip_add_data_with_keys() fwnode handling
- cleanup header inclusions (remove unneeded ones, order the rest
alphabetically)
- remove duplicate code (reuse krealloc() instead of open-coding it,
drop a duplicated check in gpiod_find_and_request())
- reshuffle the code to remove unnecessary forward declarations
- coding style cleanups and improvements
- add a helper for accessing device fwnodes
- small updates in docs
Driver improvements:
- convert all remaining GPIO irqchip drivers to using immutable
irqchips
- drop unnecessary of_match_ptr() macro expansions
- shrink the code in gpio-merrifield significantly by reusing the
code from gpio-tangier + minor tweaks to the driver code
- remove MODULE_LICENSE() from drivers that can only be built-in
- add device-tree support to gpio-loongson1
- use new regmap features in gpio-104-dio-48e and gpio-pcie-idio-24
- minor tweaks and fixes to gpio-xra1403, gpio-sim, gpio-tegra194,
gpio-omap, gpio-aspeed, gpio-raspberrypi-exp
- shrink code in gpio-ich and gpio-pxa
- Kconfig tweak for gpio-pmic-eic-sprd"
* tag 'gpio-updates-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (99 commits)
gpio: gpiolib: Simplify gpiochip_add_data_with_key() fwnode
gpiolib: Add gpiochip_set_data() helper
gpiolib: Move gpiochip_get_data() higher in the code
gpiolib: Check array_info for NULL only once in gpiod_get_array()
gpiolib: Replace open coded krealloc()
gpiolib: acpi: Add a ignore wakeup quirk for Clevo NL5xNU
gpiolib: acpi: Move ACPI device NULL check to acpi_get_driver_gpio_data()
gpiolib: acpi: use the fwnode in acpi_gpiochip_find()
gpio: mm-lantiq: Fix typo in the newly added header filename
sh: mach-x3proto: Add missing #include <linux/gpio/driver.h>
powerpc/40x: Add missing select OF_GPIO_MM_GPIOCHIP
gpio: xlp: Convert to immutable irq_chip
gpio: xilinx: Convert to immutable irq_chip
gpio: xgs-iproc: Convert to immutable irq_chip
gpio: visconti: Convert to immutable irq_chip
gpio: tqmx86: Convert to immutable irq_chip
gpio: thunderx: Convert to immutable irq_chip
gpio: stmpe: Convert to immutable irq_chip
gpio: siox: Convert to immutable irq_chip
gpio: rda: Convert to immutable irq_chip
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmRHgagACgkQCF8+vY7k
4RWmZQ//dcAgFS36a86xhEzyXZUeUmXly1hd8fzZsxEhYniIitIKGqVkEXPfqPaK
JkQQ4t/NJVXQNajZjJaUlPpb/ssxV8jIJiOPdWfUZi0sa4mlllLunv1wu9EMlP9S
ED46kFWA5aM39PUCIIgmnj+3qovWP8bhywhn/6iE+XCHtXcLWGRIQLWFAjRjwLSM
sFvejrh//uYXWoaZxtVZ6RgU9XR37ekI1h2TiU3I4HMt2v/ytqS8AByNCxhfdB5d
v+2Kuob6D6vgik2+HCayKjJqv0/2OT7mJELqUOh2zJdgua4fwJgu4hqK6CqUaarT
bP5ycCKs+8QGnf7a83rN2Ul9HbF28v23gCJDUp/x229ScPPmcglyXHXx2mjgcArf
blMMdswSF2ya4AbOGfRM2/roHghZDl3CK86l4ylU8Wgehvftip6FgktWDXtKy+iM
hogFDBn72V8p2mvkUMDXBHQmS3H4YYRWO2LRLXlgcMDfjh79Mrj7YpvC+ceyE8Z1
+Qc+uTxOeA9v9yE8Axs/VV0gsUsUEkmRdoFZ2x+lCYUMLxarz+qxtfTX3W6D3eFH
xIJVt+Mz7K/LarUTNKo+qqXRTyuw2HevIvWmpZ2g4stID0u889WPeVQMZJVz0hns
N/DnWrIeM9dtbHfZiNK7q7bpO98PI3GgQR2fj9HtuQcJoaRIp08=
=ocor
-----END PGP SIGNATURE-----
Merge tag 'media/v6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- Removal of some old unused sensor drivers: ad9389b, m5mols, mt9m032,
mt9t001, noon010pc30, s5k6aa, sr030pc30 and vs6624
- New i.MX8 image sensor interface driver
- Some new RC keymaps
- lots of cleanups at atomisp driver to make it support standard
features present on other webcam drivers
- the cx18 and saa7146 now uses VB2
- lots of cleanups and driver improvements
* tag 'media/v6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (460 commits)
media: ov5670: Fix probe on ACPI
media: nxp: imx8-isi: Remove 300ms sleep after enabling channel
media: nxp: imx8-isi: Replace udelay() with fsleep()
media: nxp: imx8-isi: Drop partial support for i.MX8QM and i.MX8QXP
media: nxp: Add i.MX8 ISI driver
media: dt-bindings: media: Add i.MX8 ISI DT bindings
media: atomisp: gmin_platform: Add Lenovo Ideapad Miix 310 gmin_vars
media: atomisp: gmin_platform: Make DMI quirks take precedence over the _DSM table
media: atomisp: Remove struct atomisp_sub_device index field
media: atomisp: Drop support for streaming from 2 sensors at once
media: atomisp: Remove atomisp_try_fmt() call from atomisp_set_fmt()
media: atomisp: Remove unused ATOM_ISP_MAX_WIDTH_TMP and ATOM_ISP_MAX_HEIGHT_TMP
media: atomisp: Remove snr_mbus_fmt local var from atomisp_try_fmt()
media: atomisp: Remove custom V4L2_CID_FMT_AUTO control
media: atomisp: Remove continuous mode related code from atomisp_set_fmt()
media: atomisp: Remove duplicate atomisp_[start|stop]_streaming() prototypes
media: atomisp: gc0310: Switch over to ACPI powermanagement
media: atomisp: gc0310: Use devm_kzalloc() for data struct
media: atomisp: gc0310: Add runtime-pm support
media: atomisp: gc0310: Delay power-on till streaming is started
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmRCSGEACgkQu+CwddJF
iJpA2wgAkwMP++Znd8JU3iQ4N53lv18euNuEMLTOY+jk7zXHvsRX8KyzLmsohUKO
SSGVi1Om785AidOsJhARJawW7AWYuJ5l7ri+FyskTwrTUcMC4UZ/IT2tB22lRsXi
0f3lgbdArZbj7aq7AVO9N7bh9rgVUHa/RHIwXzMp0sc9nekne9t+FFv7tyRnr7cc
SMp/FdMZqbt9pVf0Uwud1BpdgER7QqQaSfaxITL7D2oJTePRZVWiXerrr4hMcQl1
s6kgUgKdlaYmIx2N8eP1Nmp7undtwHo1C8dLLWKGCEuEAaXIxtXUtaUWFFmBDzH9
Fv6qswNFcfwiLNPsY+xi9iA+vlGKAg==
=T0EM
-----END PGP SIGNATURE-----
Merge tag 'slab-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
"The main change is naturally the SLOB removal. Since its deprecation
in 6.2 I've seen no complaints so hopefully SLUB_(TINY) works well for
everyone and we can proceed.
Besides the code cleanup, the main immediate benefit will be allowing
kfree() family of function to work on kmem_cache_alloc() objects,
which was incompatible with SLOB. This includes kfree_rcu() which had
no kmem_cache_free_rcu() counterpart yet and now it shouldn't be
necessary anymore.
Besides that, there are several small code and comment improvements
from Thomas, Thorsten and Vernon"
* tag 'slab-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: document kfree() as allowed for kmem_cache_alloc() objects
mm/slob: remove slob.c
mm/slab: remove CONFIG_SLOB code from slab common code
mm, pagemap: remove SLOB and SLQB from comments and documentation
mm, page_flags: remove PG_slob_free
mm/slob: remove CONFIG_SLOB
mm/slub: fix help comment of SLUB_DEBUG
mm: slub: make kobj_type structure constant
slab: Adjust comment after refactoring of gfp.h
Provide a ptrace set/get interface for syscall user dispatch. The main
purpose is to enable checkpoint/restore (CRIU) to handle processes which
utilize syscall user dispatch correctly.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGgIETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYockhEACWVd/KOBlQIdUMpM3jfSWsm+VZrITg
sKN2WCKaz8MS5RA7xTAfZIEqMzkI0V+GPoj+8eK70W39XFU/PlSQo8LUFahSxVHF
RVyz4zFKeR2XZpDa8J3ytoOvngiAnpOUflssvfA0+f3gq/B48jgLmj8XsrkmkL2T
6txRpusYNlzVTBoza0+1uEmxBTNhRxvURXa6OR/l24Kbh2udyNd6dlAoRHBV0iOW
qn7ILgoYIr/74ChCbrr8yZe2rZ+BqqlS1fsjDWkuUqq9AgzeuOjGJnZtMKG6WbGg
/NBj0Ewe7gsgZwBo7t4MbKNF7bXRkLczp8BX/l9xOTe+mpZ+LyNIHvOM3/TD6O1A
NFJNwTAGAnhU5Uoba9HzaKYZZnanqgLxuszXznJDU3zKV5pCNMNzlKxjPT73Jzsl
T1WTCyhSydluSuhOHLU4awC38pqVEQwichx98c9agIBPo7kxkb5RcTVq223wOSeI
h8otkecJ6U+gmjNDHnRtNBzykEIjVFjgiSBYGTr+/6ek2Myf0O/RMr13oe9OZG5R
jaKyjcDIADbYRow1rXfEs7Bq42K8rIkbVZvEEK/auYRUFngAoQ3l090i9wj6ViXf
7CqAjCC1K1BBxbqQwf0YLuDXCzUaXxcWfvNGEGEGs/NYDuu291QntGSFSxsJgsym
HXvO4NzHOHi13A==
=AS+6
-----END PGP SIGNATURE-----
Merge tag 'core-entry-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core entry/ptrace update from Thomas Gleixner:
"Provide a ptrace set/get interface for syscall user dispatch. The main
purpose is to enable checkpoint/restore (CRIU) to handle processes
which utilize syscall user dispatch correctly"
* tag 'core-entry-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftest, ptrace: Add selftest for syscall user dispatch config api
ptrace: Provide set/get interface for syscall user dispatch
syscall_user_dispatch: Untag selector address before access_ok()
syscall_user_dispatch: Split up set_syscall_user_dispatch()
still a fair amount going on, including:
- Reorganizing the architecture-specific documentation under
Documentation/arch. This makes the structure match the source directory
and helps to clean up the mess that is the top-level Documentation
directory a bit. This work creates the new directory and moves x86 and
most of the less-active architectures there. The current plan is to move
the rest of the architectures in 6.5, with the patches going through the
appropriate subsystem trees.
- Some more Spanish translations and maintenance of the Italian
translation.
- A new "Kernel contribution maturity model" document from Ted.
- A new tutorial on quickly building a trimmed kernel from Thorsten.
Plus the usual set of updates and fixes.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmRGze0PHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Y/VsH/RyWqinorRVFZmHqRJMRhR0j7hE2pAgK5prE
dGXYVtHHNQ+25thNaqhZTOLYFbSX6ii2NG7sLRXmyOTGIZrhUCFFXCHkuq4ZUypR
gJpMUiKQVT4dhln3gIZ0k09NSr60gz8UTcq895N9UFpUdY1SCDhbCcLc4uXTRajq
NrdgFaHWRkPb+gBRbXOExYm75DmCC6Ny5AyGo2rXfItV//ETjWIJVQpJhlxKrpMZ
3LgpdYSLhEFFnFGnXJ+EAPJ7gXDi2Tg5DuPbkvJyFOTouF3j4h8lSS9l+refMljN
xNRessv+boge/JAQidS6u8F2m2ESSqSxisv/0irgtKIMJwXaoX4=
=1//8
-----END PGP SIGNATURE-----
Merge tag 'docs-6.4' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"Commit volume in documentation is relatively low this time, but there
is still a fair amount going on, including:
- Reorganize the architecture-specific documentation under
Documentation/arch
This makes the structure match the source directory and helps to
clean up the mess that is the top-level Documentation directory a
bit. This work creates the new directory and moves x86 and most of
the less-active architectures there.
The current plan is to move the rest of the architectures in 6.5,
with the patches going through the appropriate subsystem trees.
- Some more Spanish translations and maintenance of the Italian
translation
- A new "Kernel contribution maturity model" document from Ted
- A new tutorial on quickly building a trimmed kernel from Thorsten
Plus the usual set of updates and fixes"
* tag 'docs-6.4' of git://git.lwn.net/linux: (47 commits)
media: Adjust column width for pdfdocs
media: Fix building pdfdocs
docs: clk: add documentation to log which clocks have been disabled
docs: trace: Fix typo in ftrace.rst
Documentation/process: always CC responsible lists
docs: kmemleak: adjust to config renaming
ELF: document some de-facto PT_* ABI quirks
Documentation: arm: remove stih415/stih416 related entries
docs: turn off "smart quotes" in the HTML build
Documentation: firmware: Clarify firmware path usage
docs/mm: Physical Memory: Fix grammar
Documentation: Add document for false sharing
dma-api-howto: typo fix
docs: move m68k architecture documentation under Documentation/arch/
docs: move parisc documentation under Documentation/arch/
docs: move ia64 architecture docs under Documentation/arch/
docs: Move arc architecture docs under Documentation/arch/
docs: move nios2 documentation under Documentation/arch/
docs: move openrisc documentation under Documentation/arch/
docs: move superh documentation under Documentation/arch/
...
This adds the general_profit KSM sysfs knob and the process profit metric
knobs to ksm_stat.
1) expose general_profit metric
The documentation mentions a general profit metric, however this
metric is not calculated. In addition the formula depends on the size
of internal structures, which makes it more difficult for an
administrator to make the calculation. Adding the metric for a better
user experience.
2) document general_profit sysfs knob
3) calculate ksm process profit metric
The ksm documentation mentions the process profit metric and how to
calculate it. This adds the calculation of the metric.
4) mm: expose ksm process profit metric in ksm_stat
This exposes the ksm process profit metric in /proc/<pid>/ksm_stat.
The documentation mentions the formula for the ksm process profit
metric, however it does not calculate it. In addition the formula
depends on the size of internal structures. So it makes sense to
expose it.
5) document new procfs ksm knobs
Link: https://lkml.kernel.org/r/20230418051342.1919757-3-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The finit_module() system call can in the worst case use up to more than
twice of a module's size in virtual memory. Duplicate finit_module()
system calls are non fatal, however they unnecessarily strain virtual
memory during bootup and in the worst case can cause a system to fail
to boot. This is only known to currently be an issue on systems with
larger number of CPUs.
To help debug this situation we need to consider the different sources for
finit_module(). Requests from the kernel that rely on module auto-loading,
ie, the kernel's *request_module() API, are one source of calls. Although
modprobe checks to see if a module is already loaded prior to calling
finit_module() there is a small race possible allowing userspace to
trigger multiple modprobe calls racing against modprobe and this not
seeing the module yet loaded.
This adds debugging support to the kernel module auto-loader (*request_module()
calls) to easily detect duplicate module requests. To aid with possible bootup
failure issues incurred by this, it will converge duplicates requests to a
single request. This avoids any possible strain on virtual memory during
bootup which could be incurred by duplicate module autoloading requests.
Folks debugging virtual memory abuse on bootup can and should enable
this to see what pr_warn()s come on, to see if module auto-loading is to
blame for their wores. If they see duplicates they can further debug this
by enabling the module.enable_dups_trace kernel parameter or by enabling
CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS_TRACE.
Current evidence seems to point to only a few duplicates for module
auto-loading. And so the source for other duplicates creating heavy
virtual memory pressure due to larger number of CPUs should becoming
from another place (likely udev).
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Sphinx reports htmldocs warning on deprecated mount options table:
/home/bagas/repo/linux-kernel/Documentation/admin-guide/xfs.rst:243: WARNING: Malformed table.
Text in column margin in table line 5.
=========================== ================
Name Removal Schedule
=========================== ================
Mounting with V4 filesystem September 2030
Mounting ascii-ci filesystem September 2030
ikeep/noikeep September 2025
attr2/noattr2 September 2025
=========================== ================
Extend the table markers to take account of the second name entry
("Mounting ascii-ci filesystem"), which is now the widest and
to fix the above warning.
Fixes: 7ba83850ca ("xfs: deprecate the ascii-ci feature")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
dm-flakey returns error on reads if no other argument is specified.
This commit simplifies associated logic while formalizing an
"error_reads" argument and an ERROR_READS flag.
If no argument is specified, set ERROR_READS flag so that it behaves
just like before this commit.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
LoongArch maintains cache coherency in hardware, but when paired with
LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar
to WriteCombine) is out of the scope of cache coherency machanism for
PCIe devices (this is a PCIe protocol violation, which may be fixed in
newer chipsets).
This means WUC can only used for write-only memory regions now, so this
option is disabled by default, making WUC silently fallback to SUC for
ioremap(). You can enable this option if the kernel is ensured to run on
hardware without this bug.
Kernel parameter writecombine=on/off can be used to override the Kconfig
option.
Cc: stable@vger.kernel.org
Suggested-by: WANG Xuerui <kernel@xen0n.name>
Reviewed-by: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
The syscall user dispatch configuration can only be set by the task itself,
but lacks a ptrace set/get interface which makes it impossible to implement
checkpoint/restore for it.
Add the required ptrace requests and the get/set functions in the syscall
user dispatch code to make that possible.
Signed-off-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20230407171834.3558-4-gregory.price@memverge.com
The vs6624 camera sensor driver doesn't support DT and relies on
platform data. The last board files supplying platform data for that
device have been removed from the kernel in v4.17. The driver hasn't
been used since them. Drop it.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The sr030pc30 camera sensor driver doesn't support DT and relies on
platform data. No board file has ever provided platform data for that
device. The driver has thus never been used in the mainline kernel since
its introduction in v2.6.37. Drop it.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The s5k6aa camera sensor driver doesn't support DT and relies on
platform data. The last board files supplying platform data for that
device have been removed from the kernel in v3.11. The driver hasn't
been used since them. Drop it.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The noon010pc30 camera sensor driver doesn't support DT and relies on
platform data. The last board files supplying platform data for that
device have been removed from the kernel in v3.16. A device tree file
referencing the device has been added in v3.17, but without
corresponding DT bindings, and with DT support in the driver. The driver
thus hasn't been used since v316. Drop it.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The mt9t001 camera sensor driver doesn't support DT and relies on
platform data. No board file has ever provided platform data for that
device. The driver has thus never been used in the mainline kernel since
its introduction in v3.2. Drop it.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The mt9m032 camera sensor driver doesn't support DT and relies on
platform data. No board file has ever provided platform data for that
device. The driver has thus never been used in the mainline kernel since
its introduction in v3.4. Drop it.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The m5mols camera sensor driver doesn't support DT and relies on
platform data. The last board files supplying platform data for that
device have been removed from the kernel in v3.11. The driver hasn't
been used since them. Drop it.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The ad9389b video encoder driver doesn't support DT and relies on
platform data. No board file has ever provided platform data for that
device. The driver has thus never been used in the mainline kernel since
its introduction in v3.7. Drop it.
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
The CEC pin framework is affected by NTP daemons speeding up or slowing
down the system clock. Document this and explain how to fix this for
chronyd.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
This feature is a mess -- the hash function has been broken for the
entire 15 years of its existence if you create names with extended ascii
bytes; metadump name obfuscation has silently failed for just as long;
and the feature clashes horribly with the UTF8 encodings that most
systems use today. There is exactly one fstest for this feature.
In other words, this feature is crap. Let's deprecate it now so we can
remove it from the codebase in 2030.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Architecture-specific documentation is being moved into Documentation/arch/
as a way of cleaning up the top-level documentation directory and making
the docs hierarchy more closely match the source hierarchy. Move
Documentation/m68k into arch/ and fix all in-tree references.
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Earlier, inode PAs were stored in a linked list. This caused a need to
periodically trim the list down inorder to avoid growing it to a very
large size, as this would severly affect performance during list
iteration.
Recent patches changed this list to an rbtree, and since the tree scales
up much better, we no longer need to have the trim functionality, hence
remove it.
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/c409addceaa3ade4b40328e28e3b54b2f259689e.1679731817.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
UFFDIO_COPY already has UFFDIO_COPY_MODE_WP, so when installing a new PTE
to resolve a missing fault, one can install a write-protected one. This
is useful when using UFFDIO_REGISTER_MODE_{MISSING,WP} in combination.
This was motivated by testing HugeTLB HGM [1], and in particular its
interaction with userfaultfd features. Existing userfaultfd code supports
using WP and MINOR modes together (i.e. you can register an area with
both enabled), but without this CONTINUE flag the combination is in
practice unusable.
So, add an analogous UFFDIO_CONTINUE_MODE_WP, which does the same thing as
UFFDIO_COPY_MODE_WP, but for *minor* faults.
Update the selftest to do some very basic exercising of the new flag.
Update Documentation/ to describe how these flags are used (neither the
COPY nor the new CONTINUE versions of this mode flag were described there
before).
[1]: https://patchwork.kernel.org/project/linux-mm/cover/20230218002819.1486479-1-jthoughton@google.com/
Link: https://lkml.kernel.org/r/20230314221250.682452-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/uffd: Add feature bit UFFD_FEATURE_WP_UNPOPULATED", v4.
The new feature bit makes anonymous memory acts the same as file memory on
userfaultfd-wp in that it'll also wr-protect none ptes.
It can be useful in two cases:
(1) Uffd-wp app that needs to wr-protect none ptes like QEMU snapshot,
so pre-fault can be replaced by enabling this flag and speed up
protections
(2) It helps to implement async uffd-wp mode that Muhammad is working on [1]
It's debatable whether this is the most ideal solution because with the
new feature bit set, wr-protect none pte needs to pre-populate the
pgtables to the last level (PAGE_SIZE). But it seems fine so far to
service either purpose above, so we can leave optimizations for later.
The series brings pte markers to anonymous memory too. There's some
change in the common mm code path in the 1st patch, great to have some eye
looking at it, but hopefully they're still relatively straightforward.
This patch (of 2):
This is a new feature that controls how uffd-wp handles none ptes. When
it's set, the kernel will handle anonymous memory the same way as file
memory, by allowing the user to wr-protect unpopulated ptes.
File memories handles none ptes consistently by allowing wr-protecting of
none ptes because of the unawareness of page cache being exist or not.
For anonymous it was not as persistent because we used to assume that we
don't need protections on none ptes or known zero pages.
One use case of such a feature bit was VM live snapshot, where if without
wr-protecting empty ptes the snapshot can contain random rubbish in the
holes of the anonymous memory, which can cause misbehave of the guest when
the guest OS assumes the pages should be all zeros.
QEMU worked it around by pre-populate the section with reads to fill in
zero page entries before starting the whole snapshot process [1].
Recently there's another need raised on using userfaultfd wr-protect for
detecting dirty pages (to replace soft-dirty in some cases) [2]. In that
case if without being able to wr-protect none ptes by default, the dirty
info can get lost, since we cannot treat every none pte to be dirty (the
current design is identify a page dirty based on uffd-wp bit being
cleared).
In general, we want to be able to wr-protect empty ptes too even for
anonymous.
This patch implements UFFD_FEATURE_WP_UNPOPULATED so that it'll make
uffd-wp handling on none ptes being consistent no matter what the memory
type is underneath. It doesn't have any impact on file memories so far
because we already have pte markers taking care of that. So it only
affects anonymous.
The feature bit is by default off, so the old behavior will be maintained.
Sometimes it may be wanted because the wr-protect of none ptes will
contain overheads not only during UFFDIO_WRITEPROTECT (by applying pte
markers to anonymous), but also on creating the pgtables to store the pte
markers. So there's potentially less chance of using thp on the first
fault for a none pmd or larger than a pmd.
The major implementation part is teaching the whole kernel to understand
pte markers even for anonymously mapped ranges, meanwhile allowing the
UFFDIO_WRITEPROTECT ioctl to apply pte markers for anonymous too when the
new feature bit is set.
Note that even if the patch subject starts with mm/uffd, there're a few
small refactors to major mm path of handling anonymous page faults. But
they should be straightforward.
With WP_UNPOPUATED, application like QEMU can avoid pre-read faults all
the memory before wr-protect during taking a live snapshot. Quotting from
Muhammad's test result here [3] based on a simple program [4]:
(1) With huge page disabled
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
./uffd_wp_perf
Test DEFAULT: 4
Test PRE-READ: 1111453 (pre-fault 1101011)
Test MADVISE: 278276 (pre-fault 266378)
Test WP-UNPOPULATE: 11712
(2) With Huge page enabled
echo always > /sys/kernel/mm/transparent_hugepage/enabled
./uffd_wp_perf
Test DEFAULT: 4
Test PRE-READ: 22521 (pre-fault 22348)
Test MADVISE: 4909 (pre-fault 4743)
Test WP-UNPOPULATE: 14448
There'll be a great perf boost for no-thp case, while for thp enabled with
extreme case of all-thp-zero WP_UNPOPULATED can be slower than MADVISE,
but that's low possibility in reality, also the overhead was not reduced
but postponed until a follow up write on any huge zero thp, so potentially
it is faster by making the follow up writes slower.
[1] https://lore.kernel.org/all/20210401092226.102804-4-andrey.gruzdev@virtuozzo.com/
[2] https://lore.kernel.org/all/Y+v2HJ8+3i%2FKzDBu@x1n/
[3] https://lore.kernel.org/all/d0eb0a13-16dc-1ac1-653a-78b7273781e3@collabora.com/
[4] https://github.com/xzpeter/clibs/blob/master/uffd-test/uffd-wp-perf.c
[peterx@redhat.com: comment changes, oneliner fix to khugepaged]
Link: https://lkml.kernel.org/r/ZB2/8jPhD3fpx5U8@x1n
Link: https://lkml.kernel.org/r/20230309223711.823547-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230309223711.823547-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Paul Gofman <pgofman@codeweavers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move the x86 documentation under Documentation/arch/ as a way of cleaning
up the top-level directory and making the structure of our docs more
closely match the structure of the source directories it describes.
All in-kernel references to the old paths have been updated.
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20230315211523.108836-1-corbet@lwn.net/
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
SLOB has been removed and SLQB never merged, so remove their mentions
from comments and documentation of pagemap.
In stable_page_flags() also correct an outdated comment mentioning that
PageBuddy() means a page->_refcount of -1, and remove compound_head()
from the PageSlab() call, as that's already implicitly there thanks to
PF_NO_TAIL.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Mike Rapoport (IBM) <rppt@kernel.org>
It is currently possible to set the csdlock_debug_enabled static
branch, but not to reset it. This is an issue when several different
entities supply kernel boot parameters and also for kernels built with
CONFIG_CSD_LOCK_WAIT_DEBUG_DEFAULT=y.
Therefore, make the csdlock_debug=0 kernel boot parameter turn off
debugging. Last one wins!
Reported-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-4-paulmck@kernel.org
The diagnostics added by this commit were extremely useful in one instance:
a5aabace5f ("locking/csd_lock: Add more data to CSD lock debugging")
However, they have not seen much action since, and there have been some
concerns expressed that the complexity is not worth the benefit.
Therefore, manually revert this commit, but leave a comment telling
people where to find these diagnostics.
[ paulmck: Apply Juergen Gross feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-2-paulmck@kernel.org
The csd_debug kernel parameter works well, but is inconvenient in cases
where it is more closely associated with boot loaders or automation than
with a particular kernel version or release. Thererfore, provide a new
CSD_LOCK_WAIT_DEBUG_DEFAULT Kconfig option that defaults csd_debug to
1 when selected and 0 otherwise, with this latter being the default.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230321005516.50558-1-paulmck@kernel.org
Add a text explaining how to quickly build a kernel, as that's something
users will often have to do when they want to report an issue or test
proposed fixes. This is a huge and frightening task for quite a few
users these days, as many rely on pre-compiled kernels and have never
built their own. They find help on quite a few websites explaining the
process in various ways, but those howtos often omit important details
or make things too hard for the 'quickly build just for testing' case
that 'localmodconfig' is really useful for. Hence give users something
at hand to guide them, as that makes it easier for them to help with
testing, debugging, and fixing the kernel.
To keep the complexity at bay, the document explicitly focuses on how to
compile the kernel on commodity distributions running on commodity
hardware. People that deal with less common distributions or hardware
will often know their way around already anyway.
The text describes a few oddities of Arch and Debian that were found by
the author and a few volunteers that tested the described procedure.
There are likely more such quirks that need to be covered as well as a
few things the author will have missed -- but one has to start
somewhere.
The document heavily uses anchors and links to them, which makes things
slightly harder to read in the source form. But the intended target
audience is way more likely to read rendered versions of this text on
pages like docs.kernel.org anyway -- and there those anchors and links
allow easy jumps to the reference section and back, which makes the
document a lot easier to work with for the intended target audience.
Aspects relevant for bisection were left out on purpose, as that is a
related, but in the end different use case. The rough plan is to have a
second document with a similar style to cover bisection. The idea is to
reuse a few bits from this document and link quite often to entries in
the reference section with the help of the anchors in this text.
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/1a788a8e7ba8a2063df08668f565efa832016032.1678021408.git.linux@leemhuis.info
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Sort all of the "no..." kernel parameters into the correct order
as specified in kernel-parameters.rst: "into English Dictionary order
(defined as ignoring all punctuation and sorting digits before letters
in a case insensitive manner)".
No other changes here, just movement of lines.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230317002635.16540-1-rdunlap@infradead.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The documentation on how to create your own Raspberry Pi CEC debugger was a
bit out of date. Update it to the Raspberry Pi 4B, drop the mention of the RTC
and a link to a picture that no longer works.
Also reorganize the text to make it easier to follow and change the pins to
match the pins I use.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
From ACPI spec below 3 modes for CPPC can be defined:
1. Non autonomous: OS scaling governor specifies operating frequency/
performance level through `Desired Performance` register and platform
follows that.
2. Guided autonomous: OS scaling governor specifies min and max
frequencies/ performance levels through `Minimum Performance` and
`Maximum Performance` register, and platform can autonomously select an
operating frequency in this range.
3. Fully autonomous: OS only hints (via EPP) to platform for the required
energy performance preference for the workload and platform autonomously
scales the frequency.
Currently (1) is supported by amd_pstate as passive mode, and (3) is
implemented by EPP support. This change is to support (2).
In guided autonomous mode the min_perf is based on the input from the
scaling governor. For example, in case of schedutil this value depends
on the current utilization. And max_perf is set to max capacity.
To activate guided auto mode ``amd_pstate=guided`` command line
parameter has to be passed in the kernel.
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sort the NFS kernel command line parameters. This is done in 4 groups
so as to not have them intermingled: 'nfs' module parameters, 'nfs4'
module parameters, 'nfsd' module parameters, and nfs "global" (__setup,
no module) parameters.
There were 5 parameters which were listed with a space between the
parameter name and the following '=' sign. The space has been
removed since module parameters expect 'parameter=' with no intervening
space.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: linux-nfs@vger.kernel.org
Link: https://lore.kernel.org/r/20230227025816.1083-1-rdunlap@infradead.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Replace link to a non-existing page with a note that lanana.org does not
maintain Linux Zone Unicode Assignments anymore.
Remove the reference to H. Peter Anvin and the unicode lanana.org email as
the maintainer of this file, as at this point, this is all maintained by
the general kernel community.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20230307144000.29539-3-lukas.bulwahn@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Jiri Kosina, Jonathan Corbet, and Willy Tarreau all expressed a desire
to move this document under process/.
Create a new section for security issues in the index and group it with
embargoed-hardware-issues.
I'm doing this at the start of the series to make all the subsequent
changes show up in 'git blame'.
Existing references were updated using:
git grep -l security-bugs ':!Documentation/translations/' | xargs sed -i 's|admin-guide/security-bugs|process/security-bugs|g'
git grep -l security-bugs Documentation/translations/ | xargs sed -i 's|Documentation/admin-guide/security-bugs|Documentation/process/security-bugs|g'
git grep -l security-bugs Documentation/translations/ | xargs sed -i '/Original:/s|\.\./admin-guide/security-bugs|\.\./process/security-bugs|g'
Notably, the page is not moved in the translations (due to my lack of
knowledge of these languages), but the translations have been updated
to point to the new location of the original document where these
references exist.
Link: https://lore.kernel.org/all/nycvar.YFH.7.76.2206062326230.10851@cbobk.fhfr.pm/
Suggested-by: Jiri Kosina <jikos@kernel.org>
Cc: Alex Shi <alexs@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Hu Haowen <src.res@email.cn>
Cc: Federico Vaga <federico.vaga@vaga.pv.it>
Cc: Tsugikazu Shibata <tshibata@ab.jp.nec.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jeimi Lee <jamee.lee@samsung.com>
Cc: Carlos Bilbao <carlos.bilbao@amd.com>
Cc: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Carlos Bilbao <carlos.bilbao@amd.com>
Reviewed-by: Yanteng Si <siyanteng@loongson.cn>
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
Acked-by: Federico Vaga <federico.vaga@vaga.pv.it>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Link: https://lore.kernel.org/r/20230305220010.20895-2-vegard.nossum@oracle.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When all devices that could probe have finished probing (based on
deferred_probe_timeout configuration or late_initcall() when
!CONFIG_MODULES), this parameter controls what to do with devices that
haven't yet received their sync_state() calls.
fw_devlink.sync_state=strict is the default and the driver core will
continue waiting on all consumers of a device to probe successfully
before sync_state() is called for the device. This is the default
behavior since calling sync_state() on a device when all its consumers
haven't probed could make some systems unusable/unstable. When this
option is selected, we also print the list of devices that haven't had
sync_state() called on them by the time all devices the could probe have
finished probing.
fw_devlink.sync_state=timeout will cause the driver core to give up
waiting on consumers and call sync_state() on any devices that haven't
yet received their sync_state() calls. This option is provided for
systems that won't become unusable/unstable as they might be able to
save power (depends on state of hardware before kernel starts) if all
devices get their sync_state().
Signed-off-by: Saravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20230304005355.746421-2-saravanak@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The console= kernel command-line parameter defines where the kernel
messages appear. It can be used multiple times to make the kernel log
visible on more devices.
The ordering of the console= parameters is important. In particular,
the last one defines which device can be accessed also via /dev/console.
The behavior is more complicated when the last console= parameter is
ignored by kernel. It might be surprising because it was not intentional.
The kernel just works this way historically.
There were few attempts to change the behavior. Unfortunately, it can't
be done because it would break existing users. Document the historical
behavior at least.
Link: https://lore.kernel.org/r/20170606143149.GB7604@pathway.suse.cz
Link: https://lore.kernel.org/r/20230213113912.1237943-1-rkanwal@rivosinc.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230308112433.24292-1-pmladek@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are only a handful of users of gpio_export() and
related functions.
As these are just wrappers around the modern gpiod_export()
helper, remove the wrappers and open-code the gpio_to_desc
in all callers to shrink the legacy API.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
CONFIG_SYSFS_DEPRECATED was added in commit 88a22c985e
("CONFIG_SYSFS_DEPRECATED") in 2006 to allow systems with older versions
of some tools (i.e. Fedora 3's version of udev) to boot properly. Four
years later, in 2010, the option was attempted to be removed as most of
userspace should have been fixed up properly by then, but some kernel
developers clung to those old systems and refused to update, so we added
CONFIG_SYSFS_DEPRECATED_V2 in commit e52eec13cd ("SYSFS: Allow boot
time switching between deprecated and modern sysfs layout") to allow
them to continue to boot properly, and we allowed a boot time parameter
to be used to switch back to the old format if needed.
Over time, the logic that was covered under these config options was
slowly removed from individual driver subsystems successfully, removed,
and the only thing that is now left in the kernel are some changes in
the block layer's representation in sysfs where real directories are
used instead of symlinks like normal.
Because the original changes were done to userspace tools in 2006, and
all distros that use those tools are long end-of-life, and older
non-udev-based systems do not care about the block layer's sysfs
representation, it is time to finally remove this old logic and the
config entries from the kernel.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: linux-block@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20230223073326.2073220-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
- Return -EIO instead of success when the certificate buffer for SEV
guests is not large enough.
- Allow STIPB to be enabled with legacy IBSR. Legacy IBRS is cleared on
return to userspace for performance reasons, but the leaves user space
vulnerable to cross-thread attacks which STIBP prevents. Update the
documentation accordingly.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmQEVnETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoegJEACbn+CQKFxB4kXJ1xBamYsqQfxY1mM1
yFziEVH3VCXSshfvKePH7fnoAUHTzhy+SjN6c1ERvl82WVXm/BoF2B81KpN9Yd18
R6wTpIS227Pn+Ll1yfVQJMHrb0mnSczo5vCGyOzMOxkqIbNCkHRMoeSBspfNLLGM
3D2+IQqBaqBgNzPQ3JHrwRqQAy/3ZJT4IrHSFe0LwgYQ/EeAGydY8UN0wB1y5YN0
SoFhPd7B7UWxUD7PrfriBc3B2HN44QkMpe/fQJ4y0GVF+1Uqp6Ti7ouCEVg60A3g
8kiS+98FBIzHySk+xfX/vlhiQD/J2c6/+p28gw+iGf6YmUsQbeu64tV5TAUGGBN+
kErLvJmJnC/dwWiEMXzv/e6sNKoZi0Yz/JVq6atuoT/521cjDEDapZRxBSmaW33M
Zn6YF8FIsUTHGdt9Equ+HPjZZTyk34W8f0d0N+lws0QNWtk5d0KU5XP2PDp+Mj6O
dGVaGv88qmMIr0o/s9CgvpefSM8L7fC0WQwRpRr905gu8k6YxuEWQofuh365ZcKT
sEDeRqZYi+ue4+gW1GRje6M5ODftTWoLPlX2f+iZui1gwwpuczvj0sRR10kKfKRD
qxpHcxyIzS2MW4aT1JnVgeWStt0x5wWeq1qzO1bwBJCAlS63vln/mUnBq7+uV0ca
KiEah5vP4dcenA==
=1RwH
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 updates from Thomas Gleixner:
"A small set of updates for x86:
- Return -EIO instead of success when the certificate buffer for SEV
guests is not large enough
- Allow STIPB to be enabled with legacy IBSR. Legacy IBRS is cleared
on return to userspace for performance reasons, but the leaves user
space vulnerable to cross-thread attacks which STIBP prevents.
Update the documentation accordingly"
* tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
virt/sev-guest: Return -EIO if certificate buffer is not large enough
Documentation/hw-vuln: Document the interaction between IBRS and STIBP
x86/speculation: Allow enabling STIBP with legacy IBRS
Explain why STIBP is needed with legacy IBRS as currently implemented
(KERNEL_IBRS) and why STIBP is not needed when enhanced IBRS is enabled.
Fixes: 7c693f54c8 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230227060541.1939092-2-kpsingh@kernel.org
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAmP7M9AACgkQCF8+vY7k
4RVpxRAAjarn420frUo/YiMWuYiYtDCFmXj+toHgqsa9fcUOjxml9V+S5L0uY6tF
D6d9KCgqKf1AO2MDzB3aR1qQmPfelMoSomQjsTm6cWaMPDobxpzL2IlcspMDBxz0
PyCz4R9cCK5kwuBiQlz3dE605/t/7JXOAFEopo5tvYWNfRt9YXFbPJ/Hdttc4cqw
d6js3TN7oxHoa+t5Ox9Fq+i6MSxsMEku5RvfHVI6yUs//eWcf9H2zFfZ83vZ7+vY
L8PlRzMXlvovsFwXivtiZdSkuwFloWrqIs8btHb1/psClOUxFQhpk2B4hkUixCAn
wk9EN7eHWNdbaZha5//uPRmxUjjhIn4XAIXnfslsB7iiRn7uJtYryUnt+b+kD3Lt
dtF2i1W7nNfUd5e7YRjipTjgjtazLpeyDGvH0TqfpwK8Wn10Acj+Az1v4bjf+cc0
GC1EVUtGeJhexYzLsHSQMQZB1IgFxUw5LNKdqsrboled3yvxfgK69Yp9FQLon9vZ
R7KEDHuzt+e4Kihxq8dTp6wMV47dNrq0wpJKMjfylKhq/MPqa9uiygl1s2KlMg6n
HDJQlYbQGlzrgHDDQRhYUAgxs3JeyxAmup2eLR6dWrAqBW+ULT9S2DiggOk9xxKg
UkaCkodr3ZOhZti+oLRRAY2cRsGYgan7rKhscd0t7opO+CrfHxo=
=0XIw
-----END PGP SIGNATURE-----
Merge tag 'media/v6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- Removal of several VB1-only deprecated drivers: cpia2, fsl-viu, meye,
stkwebcam, tm6000, vpfe_capture and zr364xx
- saa7146 recovered from staging/deprecated. We opted to give ti a
chance, and, instead of deprecating it, the intention is to write
patches migrating it from VB1 to VB2.
- av7110 returned from staging/deprecated/ to staging/ as we're not
planning on dropping it any time soon
- media controller API has gained experimental support for G_ROUTING
and streams API. No drivers use it right now. We're planning to add
one after -rc1, giving some time to experience the API and eventually
have changes during the next development cycle
- New sensor drivers: imx296, imx415, ov8858
- Atomisp had lots of changes, specially on its sensor's interface,
making atomisp sensor drivers closer to normal sensor drivers
- media controller kAPI has gained some helpers to traverse pipelines
- uvcvideo now better support power line control
- lots of bug fixes, cleanups and driver improvements
* tag 'media/v6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (296 commits)
media: imx-mipi-csis: Check csis_fmt validity before use
media: v4l2-subdev.c: clear stream field
media: v4l2-ctrls-api.c: move ctrl->is_new = 1 to the correct line
media: Revert "media: saa7146: deprecate hexium_gemini/orion, mxb and ttpci"
media: Revert "media: av7110: move to staging/media/deprecated/saa7146"
media: imx-pxp: convert to regmap
media: imx-pxp: Use non-threaded IRQ
media: imx-pxp: Introduce pxp_read() and pxp_write() wrappers
media: imx-pxp: Implement frame size enumeration
media: imx-pxp: Pass pixel format value to find_format()
media: imx-pxp: Add media controller support
media: imx-pxp: Don't set bus_info manually in .querycap()
media: imx-pxp: Sort headers alphabetically
media: imx-pxp: add support for i.MX7D
media: imx-pxp: make data_path_ctrl0 platform dependent
media: imx-pxp: disable LUT block
media: imx-pxp: explicitly disable unused blocks
media: imx-pxp: extract helper function to setup data path
media: imx-pxp: detect PXP version
media: dt-bindings: media: fsl-pxp: convert to yaml
...
- Provide a virtual cache topology to the guest to avoid
inconsistencies with migration on heterogenous systems. Non secure
software has no practical need to traverse the caches by set/way in
the first place.
- Add support for taking stage-2 access faults in parallel. This was an
accidental omission in the original parallel faults implementation,
but should provide a marginal improvement to machines w/o FEAT_HAFDBS
(such as hardware from the fruit company).
- A preamble to adding support for nested virtualization to KVM,
including vEL2 register state, rudimentary nested exception handling
and masking unsupported features for nested guests.
- Fixes to the PSCI relay that avoid an unexpected host SVE trap when
resuming a CPU when running pKVM.
- VGIC maintenance interrupt support for the AIC
- Improvements to the arch timer emulation, primarily aimed at reducing
the trap overhead of running nested.
- Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
interest of CI systems.
- Avoid VM-wide stop-the-world operations when a vCPU accesses its own
redistributor.
- Serialize when toggling CPACR_EL1.SMEN to avoid unexpected exceptions
in the host.
- Aesthetic and comment/kerneldoc fixes
- Drop the vestiges of the old Columbia mailing list and add [Oliver]
as co-maintainer
This also drags in arm64's 'for-next/sme2' branch, because both it and
the PSCI relay changes touch the EL2 initialization code.
RISC-V:
- Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
- Correctly place the guest in S-mode after redirecting a trap to the guest
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
s390:
- Two patches sorting out confusion between virtual and physical
addresses, which currently are the same on s390.
- A new ioctl that performs cmpxchg on guest memory
- A few fixes
x86:
- Change tdp_mmu to a read-only parameter
- Separate TDP and shadow MMU page fault paths
- Enable Hyper-V invariant TSC control
- Fix a variety of APICv and AVIC bugs, some of them real-world,
some of them affecting architecurally legal but unlikely to
happen in practice
- Mark APIC timer as expired if its in one-shot mode and the count
underflows while the vCPU task was being migrated
- Advertise support for Intel's new fast REP string features
- Fix a double-shootdown issue in the emergency reboot code
- Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give SVM
similar treatment to VMX
- Update Xen's TSC info CPUID sub-leaves as appropriate
- Add support for Hyper-V's extended hypercalls, where "support" at this
point is just forwarding the hypercalls to userspace
- Clean up the kvm->lock vs. kvm->srcu sequences when updating the PMU and
MSR filters
- One-off fixes and cleanups
- Fix and cleanup the range-based TLB flushing code, used when KVM is
running on Hyper-V
- Add support for filtering PMU events using a mask. If userspace
wants to restrict heavily what events the guest can use, it can now
do so without needing an absurd number of filter entries
- Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
support is disabled
- Add PEBS support for Intel Sapphire Rapids
- Fix a mostly benign overflow bug in SEV's send|receive_update_data()
- Move several SVM-specific flags into vcpu_svm
x86 Intel:
- Handle NMI VM-Exits before leaving the noinstr region
- A few trivial cleanups in the VM-Enter flows
- Stop enabling VMFUNC for L1 purely to document that KVM doesn't support
EPTP switching (or any other VM function) for L1
- Fix a crash when using eVMCS's enlighted MSR bitmaps
Generic:
- Clean up the hardware enable and initialization flow, which was
scattered around multiple arch-specific hooks. Instead, just
let the arch code call into generic code. Both x86 and ARM should
benefit from not having to fight common KVM code's notion of how
to do initialization.
- Account allocations in generic kvm_arch_alloc_vm()
- Fix a memory leak if coalesced MMIO unregistration fails
selftests:
- On x86, cache the CPU vendor (AMD vs. Intel) and use the info to emit
the correct hypercall instruction instead of relying on KVM to patch
in VMMCALL
- Use TAP interface for kvm_binary_stats_test and tsc_msrs_test
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmP2YA0UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPg/Qf+J6nT+TkIa+8Ei+fN1oMTDp4YuIOx
mXvJ9mRK9sQ+tAUVwvDz3qN/fK5mjsYbRHIDlVc5p2Q3bCrVGDDqXPFfCcLx1u+O
9U9xjkO4JxD2LS9pc70FYOyzVNeJ8VMGOBbC2b0lkdYZ4KnUc6e/WWFKJs96bK+H
duo+RIVyaMthnvbTwSv1K3qQb61n6lSJXplywS8KWFK6NZAmBiEFDAWGRYQE9lLs
VcVcG0iDJNL/BQJ5InKCcvXVGskcCm9erDszPo7w4Bypa4S9AMS42DHUaRZrBJwV
/WqdH7ckIz7+OSV0W1j+bKTHAFVTCjXYOM7wQykgjawjICzMSnnG9Gpskw==
=goe1
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Provide a virtual cache topology to the guest to avoid
inconsistencies with migration on heterogenous systems. Non secure
software has no practical need to traverse the caches by set/way in
the first place
- Add support for taking stage-2 access faults in parallel. This was
an accidental omission in the original parallel faults
implementation, but should provide a marginal improvement to
machines w/o FEAT_HAFDBS (such as hardware from the fruit company)
- A preamble to adding support for nested virtualization to KVM,
including vEL2 register state, rudimentary nested exception
handling and masking unsupported features for nested guests
- Fixes to the PSCI relay that avoid an unexpected host SVE trap when
resuming a CPU when running pKVM
- VGIC maintenance interrupt support for the AIC
- Improvements to the arch timer emulation, primarily aimed at
reducing the trap overhead of running nested
- Add CONFIG_USERFAULTFD to the KVM selftests config fragment in the
interest of CI systems
- Avoid VM-wide stop-the-world operations when a vCPU accesses its
own redistributor
- Serialize when toggling CPACR_EL1.SMEN to avoid unexpected
exceptions in the host
- Aesthetic and comment/kerneldoc fixes
- Drop the vestiges of the old Columbia mailing list and add [Oliver]
as co-maintainer
RISC-V:
- Fix wrong usage of PGDIR_SIZE instead of PUD_SIZE
- Correctly place the guest in S-mode after redirecting a trap to the
guest
- Redirect illegal instruction traps to guest
- SBI PMU support for guest
s390:
- Sort out confusion between virtual and physical addresses, which
currently are the same on s390
- A new ioctl that performs cmpxchg on guest memory
- A few fixes
x86:
- Change tdp_mmu to a read-only parameter
- Separate TDP and shadow MMU page fault paths
- Enable Hyper-V invariant TSC control
- Fix a variety of APICv and AVIC bugs, some of them real-world, some
of them affecting architecurally legal but unlikely to happen in
practice
- Mark APIC timer as expired if its in one-shot mode and the count
underflows while the vCPU task was being migrated
- Advertise support for Intel's new fast REP string features
- Fix a double-shootdown issue in the emergency reboot code
- Ensure GIF=1 and disable SVM during an emergency reboot, i.e. give
SVM similar treatment to VMX
- Update Xen's TSC info CPUID sub-leaves as appropriate
- Add support for Hyper-V's extended hypercalls, where "support" at
this point is just forwarding the hypercalls to userspace
- Clean up the kvm->lock vs. kvm->srcu sequences when updating the
PMU and MSR filters
- One-off fixes and cleanups
- Fix and cleanup the range-based TLB flushing code, used when KVM is
running on Hyper-V
- Add support for filtering PMU events using a mask. If userspace
wants to restrict heavily what events the guest can use, it can now
do so without needing an absurd number of filter entries
- Clean up KVM's handling of "PMU MSRs to save", especially when vPMU
support is disabled
- Add PEBS support for Intel Sapphire Rapids
- Fix a mostly benign overflow bug in SEV's
send|receive_update_data()
- Move several SVM-specific flags into vcpu_svm
x86 Intel:
- Handle NMI VM-Exits before leaving the noinstr region
- A few trivial cleanups in the VM-Enter flows
- Stop enabling VMFUNC for L1 purely to document that KVM doesn't
support EPTP switching (or any other VM function) for L1
- Fix a crash when using eVMCS's enlighted MSR bitmaps
Generic:
- Clean up the hardware enable and initialization flow, which was
scattered around multiple arch-specific hooks. Instead, just let
the arch code call into generic code. Both x86 and ARM should
benefit from not having to fight common KVM code's notion of how to
do initialization
- Account allocations in generic kvm_arch_alloc_vm()
- Fix a memory leak if coalesced MMIO unregistration fails
selftests:
- On x86, cache the CPU vendor (AMD vs. Intel) and use the info to
emit the correct hypercall instruction instead of relying on KVM to
patch in VMMCALL
- Use TAP interface for kvm_binary_stats_test and tsc_msrs_test"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (325 commits)
KVM: SVM: hyper-v: placate modpost section mismatch error
KVM: x86/mmu: Make tdp_mmu_allowed static
KVM: arm64: nv: Use reg_to_encoding() to get sysreg ID
KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 changes
KVM: arm64: nv: Filter out unsupported features from ID regs
KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2
KVM: arm64: nv: Allow a sysreg to be hidden from userspace only
KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor
KVM: arm64: nv: Add accessors for SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
KVM: arm64: nv: Handle SMCs taken from virtual EL2
KVM: arm64: nv: Handle trapped ERET from virtual EL2
KVM: arm64: nv: Inject HVC exceptions to the virtual EL2
KVM: arm64: nv: Support virtual EL2 exceptions
KVM: arm64: nv: Handle HCR_EL2.NV system register traps
KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state
KVM: arm64: nv: Add EL2 system registers to vcpu context
KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x
KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
KVM: arm64: nv: Introduce nested virtualization VCPU feature
KVM: arm64: Use the S2 MMU context to iterate over S2 table
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmP2dbsUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vzBDg//aW2IeJYku5ENXwwnCQjBlyjBGOOZ
456KGpFt/ky0N9Jp0ZS3nQSa5YN7q+L8XY48gu6I7s1hXly8iLZKLrJN++S//k55
BadXu7mDUyVoY74LYvBe0nlXuwJul2qnq9IJLufRucrn1yoyqApAh39IRdCzi4U8
mP+wad7sQA0Si4bpf80uwn6Yq8SrDoO0mtmO/dZSXJooM2t2SnDXEL/fxMwTNDA4
XsVSP9FrbPmcTLo8mkDa8Dy7JKbL6KQJF9yDlmYzuA2spQpTf+YLLfsNnmE+850h
WTtfCjVaYtlik7i9qTB+VcN1CsGVepYKK3H5the16Aeql2Fu+Ji5KSt74C220Yi9
ZSDA93d/EfGc5egKyBdUUMFgqhe46srRUAoWcMrx2T4ARGuOm5EYCa9C8C7dFmO0
j6f9MYL3j2Sw3FROEKViRVOFfbIfVW1TXIo3x0fE0ud3xkg73eKp/++X8QeTMjox
2ArY2AWPNQpUI1oMlKxlSEd5XjFf7n/hHDtFqj9bIuJzt0/8wXQf0jCYTjhpGkRB
pmO+lColK6lp+bg8aWRRkiwN73xGdQhKaeXLo0Iq4T6xr0Lb3XoskHZvt6NIGe/A
ds5/uwtErq6kCf2G9YG1xfh+G1bimbjWwsHCNfSNXzTsWGDFTCb8tvqF90m+7+yl
bllxTXA6PO312Tw=
=/y4d
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Rework portdrv shutdown so it disables interrupts but doesn't
disable bus mastering, which leads to hangs on Loongson LS7A
- Add mechanism to prevent Max_Read_Request_Size (MRRS) increases,
again to avoid hardware issues on Loongson LS7A (and likely other
devices based on DesignWare IP)
- Ignore devices with a firmware (DT or ACPI) node that says the
device is disabled
Resource management:
- Distribute spare resources to unconfigured hotplug bridges at
boot-time (not just when hot-adding such a bridge), which makes
hot-adding devices to docks work better. Tried this in v6.1 but had
to revert for regressions, so try again
- Fix root bus issue that dropped resources that happened to end
at 0, e.g., [bus 00]
PCI device hotplug:
- Remove device locking when marking device as disconnected so this
doesn't have to wait for concurrent driver bind/unbind to complete
- Quirk more Qualcomm bridges that don't fully implement the PCIe
Slot Status 'Command Completed' bit
Power management:
- Account for _S0W of the target bridge in acpi_pci_bridge_d3() so we
don't miss hot-add notifications for USB4 docks, Thunderbolt, etc
Reset:
- Observe delay after reset, e.g., resuming from system sleep,
regardless of whether a bridge can suspend to D3cold at runtime
- Wait for secondary bus to become ready after a bridge reset
Virtualization:
- Avoid FLR on some AMD FCH AHCI adapters where it doesn't work
- Allow independent IOMMU groups for some Wangxun NICs that prevent
peer-to-peer transactions but don't advertise an ACS Capability
Error handling:
- Configure End-to-End-CRC (ECRC) only if Linux owns the AER
Capability
- Remove redundant Device Control Error Reporting Enable in the AER
service driver since this is already done for all devices during
enumeration
ASPM:
- Add pci_enable_link_state() interface to allow drivers to enable
ASPM link state
Endpoint framework:
- Move dra7xx and tegra194 linkup processing from hard IRQ to
threaded IRQ handler
- Add a separate lock for endpoint controller list of endpoint
function drivers to prevent deadlock in callbacks
- Pass events from endpoint controller to endpoint function drivers
via callbacks instead of notifiers
Synopsys DesignWare eDMA controller driver (acked by Vinod):
- Fix CPU vs PCI address issues
- Fix source vs destination address issues
- Fix issues with interleaved transfer semantics
- Fix channel count initialization issue (issue still exists in
several other drivers)
- Clean up and improve debugfs usage so it will work on platforms
with several eDMA devices
Baikal T-1 PCIe controller driver:
- Set a 64-bit DMA mask
Freescale i.MX6 PCIe controller driver:
- Add i.MX8MM, i.MX8MQ, i.MX8MP endpoint mode DT binding and driver
support
Intel VMD host bridge driver:
- Add quirk to configure PCIe ASPM and LTR. This is normally done by
BIOS, and will be for future products
Marvell MVEBU PCIe controller driver:
- Mark this driver as broken in Kconfig since bugs prevent its daily
usage
MediaTek MT7621 PCIe controller driver:
- Delay PHY port initialization to improve boot reliability for ZBT
WE1326, ZBT WF3526-P, and some Netgear models
Qualcomm PCIe controller driver:
- Add MSM8998 DT compatible string
- Unify MSM8996 and MSM8998 clock orderings
- Add SM8350 DT binding and driver support
- Add IPQ8074 Gen3 DT binding and driver support
- Correct qcom,perst-regs in DT binding
- Add qcom_pcie_host_deinit() so the PHY is powered off and
regulators and clocks are disabled on late host-init errors
Socionext UniPhier Pro5 controller driver:
- Clean up uniphier-ep reg, clocks, resets, and their names in DT
binding
Synopsys DesignWare PCIe controller driver:
- Restrict coherent DMA mask to 32 bits for MSI, but allow controller
drivers to set 64-bit streaming DMA mask
- Add eDMA engine support in both Root Port and Endpoint controllers
Miscellaneous:
- Remove MODULE_LICENSE from boolean drivers so they don't look like
modules so modprobe can complain about them"
* tag 'pci-v6.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (86 commits)
PCI: dwc: Add Root Port and Endpoint controller eDMA engine support
PCI: bt1: Set 64-bit DMA mask
PCI: dwc: Restrict only coherent DMA mask for MSI address allocation
dmaengine: dw-edma: Prepare dw_edma_probe() for builtin callers
dmaengine: dw-edma: Depend on DW_EDMA instead of selecting it
dmaengine: dw-edma: Add mem-mapped LL-entries support
PCI: Remove MODULE_LICENSE so boolean drivers don't look like modules
PCI: hv: Drop duplicate PCI_MSI dependency
PCI/P2PDMA: Annotate RCU dereference
PCI/sysfs: Constify struct kobj_type pci_slot_ktype
PCI: hotplug: Allow marking devices as disconnected during bind/unbind
PCI: pciehp: Add Qualcomm quirk for Command Completed erratum
PCI: qcom: Add IPQ8074 Gen3 port support
dt-bindings: PCI: qcom: Add IPQ8074 Gen3 port
dt-bindings: PCI: qcom: Sort compatibles alphabetically
PCI: qcom: Fix host-init error handling
PCI: qcom: Add SM8350 support
dt-bindings: PCI: qcom: Add SM8350
dt-bindings: PCI: qcom-ep: Correct qcom,perst-regs
dt-bindings: PCI: qcom: Unify MSM8996 and MSM8998 clock order
...
Here is the big set of serial and tty driver updates for 6.3-rc1.
Once again, Jiri and Ilpo have done a number of core vt and tty/serial
layer cleanups that were much needed and appreciated. Other than that,
it's just a bunch of little tty/serial driver updates:
- qcom-geni-serial driver updates
- liteuart driver updates
- hvcs driver cleanups
- n_gsm updates and additions for new features
- more 8250 device support added
- fpga/dfl update and additions
- imx serial driver updates
- fsl_lpuart updates
- other tiny fixes and updates for serial drivers
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/itAw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykJbQCfWv/J4ZElO108iHBU5mJCDagUDBgAnAtLLN6A
SEAnnokGCDtA/BNIXeES
=luRi
-----END PGP SIGNATURE-----
Merge tag 'tty-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty / serial driver updates from Greg KH:
"Here is the big set of serial and tty driver updates for 6.3-rc1.
Once again, Jiri and Ilpo have done a number of core vt and tty/serial
layer cleanups that were much needed and appreciated. Other than that,
it's just a bunch of little tty/serial driver updates:
- qcom-geni-serial driver updates
- liteuart driver updates
- hvcs driver cleanups
- n_gsm updates and additions for new features
- more 8250 device support added
- fpga/dfl update and additions
- imx serial driver updates
- fsl_lpuart updates
- other tiny fixes and updates for serial drivers
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (143 commits)
tty: n_gsm: add keep alive support
serial: imx: remove a redundant check
dt-bindings: serial: snps-dw-apb-uart: add dma & dma-names properties
soc: qcom: geni-se: Move qcom-geni-se.h to linux/soc/qcom/geni-se.h
tty: n_gsm: add TIOCMIWAIT support
tty: n_gsm: add RING/CD control support
tty: n_gsm: mark unusable ioctl structure fields accordingly
serial: imx: get rid of registers shadowing
serial: imx: refine local variables in rxint()
serial: imx: stop using USR2 in FIFO reading loop
serial: imx: remove redundant USR2 read from FIFO reading loop
serial: imx: do not break from FIFO reading loop prematurely
serial: imx: do not sysrq broken chars
serial: imx: work-around for hardware RX flood
serial: imx: factor-out common code to imx_uart_soft_reset()
serial: 8250_pci1xxxx: Add power management functions to quad-uart driver
serial: 8250_pci1xxxx: Add RS485 support to quad-uart driver
serial: 8250_pci1xxxx: Add driver for quad-uart support
serial: 8250_pci: Add serial8250_pci_setup_port definition in 8250_pcilib.c
tty: pcn_uart: fix memory leak with using debugfs_lookup()
...
Most notable is a set of zlib changes from Mikhail Zaslonko which enhances
and fixes zlib's use of S390 hardware support: "lib/zlib: Set of s390
DFLTCC related patches for kernel zlib".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/QC4QAKCRDdBJ7gKXxA
jtKdAQCbDCBdY8H45d1fONzQW2UDqCPnOi77MpVUxGL33r+1SAEA807C7rvDEmlf
yP1Ft+722fFU5jogVU8ZFh+vapv2/gI=
=Q9YK
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"There is no particular theme here - mainly quick hits all over the
tree.
Most notable is a set of zlib changes from Mikhail Zaslonko which
enhances and fixes zlib's use of S390 hardware support: 'lib/zlib: Set
of s390 DFLTCC related patches for kernel zlib'"
* tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (55 commits)
Update CREDITS file entry for Jesper Juhl
sparc: allow PM configs for sparc32 COMPILE_TEST
hung_task: print message when hung_task_warnings gets down to zero.
arch/Kconfig: fix indentation
scripts/tags.sh: fix the Kconfig tags generation when using latest ctags
nilfs2: prevent WARNING in nilfs_dat_commit_end()
lib/zlib: remove redundation assignement of avail_in dfltcc_gdht()
lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default
lib/zlib: DFLTCC always switch to software inflate for Z_PACKET_FLUSH option
lib/zlib: DFLTCC support inflate with small window
lib/zlib: Split deflate and inflate states for DFLTCC
lib/zlib: DFLTCC not writing header bits when avail_out == 0
lib/zlib: fix DFLTCC ignoring flush modes when avail_in == 0
lib/zlib: fix DFLTCC not flushing EOBS when creating raw streams
lib/zlib: implement switching between DFLTCC and software
lib/zlib: adjust offset calculation for dfltcc_state
nilfs2: replace WARN_ONs for invalid DAT metadata block requests
scripts/spelling.txt: add "exsits" pattern and fix typo instances
fs: gracefully handle ->get_block not mapping bh in __mpage_writepage
cramfs: Kconfig: fix spelling & punctuation
...
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter". These filters provide users
with finer-grained control over DAMOS's actions. SeongJae has also done
some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series "mm:
support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with his
series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings. The previous BPF-based approach had
shortcomings. See "mm: In-kernel support for memory-deny-write-execute
(MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a per-node
basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage during
compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in ths
series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's series
"mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
"fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest of
the kernel in the series "Simplify the external interface for GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the series
"mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
=MlGs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X
bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()")
which does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter".
These filters provide users with finer-grained control over DAMOS's
actions. SeongJae has also done some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series
"mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
swap PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with
his series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings.
The previous BPF-based approach had shortcomings. See "mm: In-kernel
support for memory-deny-write-execute (MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a
per-node basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage
during compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in
ths series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier
functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's
series "mm, arch: add generic implementation of pfn_valid() for
FLATMEM" and "fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest
of the kernel in the series "Simplify the external interface for
GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the
series "mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
include/linux/migrate.h: remove unneeded externs
mm/memory_hotplug: cleanup return value handing in do_migrate_range()
mm/uffd: fix comment in handling pte markers
mm: change to return bool for isolate_movable_page()
mm: hugetlb: change to return bool for isolate_hugetlb()
mm: change to return bool for isolate_lru_page()
mm: change to return bool for folio_isolate_lru()
objtool: add UACCESS exceptions for __tsan_volatile_read/write
kmsan: disable ftrace in kmsan core code
kasan: mark addr_has_metadata __always_inline
mm: memcontrol: rename memcg_kmem_enabled()
sh: initialize max_mapnr
m68k/nommu: add missing definition of ARCH_PFN_OFFSET
mm: percpu: fix incorrect size in pcpu_obj_full_size()
maple_tree: reduce stack usage with gcc-9 and earlier
mm: page_alloc: call panic() when memoryless node allocation fails
mm: multi-gen LRU: avoid futile retries
migrate_pages: move THP/hugetlb migration support check to simplify code
migrate_pages: batch flushing TLB
migrate_pages: share more code between _unmap and _move
...
- Fix ftrace2bconf.sh tool for checking event enable status correctly.
- Add CONFIG_BOOT_CONFIG_FORCE to apply bootconfig without 'bootconfig'
boot parameter.
- Enable CONFIG_BOOT_CONFIG_FORCE by default if a bootconfig is embedded
in the kernel.
- Increase max number of nodes of bootconfig to 8192.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmP1VUUACgkQ2/sHvwUr
PxuExQgAslUeGrdn8nAA2qsModVHrXwLl1Xa6797Xzh/xCoIOAQ5AaUkGOlBBpCi
0UGsiqo5pLfrJ7q1HCTiD4kNpDcK6Kw9UbjClMS2nSf58hK98upUAng+4VlTH3dZ
difzua1Y0PohBDsLZpV5Ex/K9ZHiPhm44pqkaA+q0gHBfa5AmFuRUD3icEdiHmFu
B3GX0qdIMeFmUhxt0jmfvsu1Xq8fjF3Lsz/xCeOHcNJYyxzmdttxHYY8pLTWOIoL
xGL2MmwIYzLRW3/r9E71JNCLgykUWZSBbYhcJ7lIAJadFNbNBFJ0+v5uiyxbZEib
Xv5UAyTKSIeZIyH0fUZ/4Ufa8sw5Nw==
=0Nnb
-----END PGP SIGNATURE-----
Merge tag 'bootconfig-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull bootconfig updates from Masami Hiramatsu:
- Fix ftrace2bconf.sh tool for checking event enable status correctly
- Add CONFIG_BOOT_CONFIG_FORCE to apply bootconfig without 'bootconfig'
boot parameter
- Enable CONFIG_BOOT_CONFIG_FORCE by default if a bootconfig is
embedded in the kernel
- Increase max number of nodes of bootconfig to 8192
* tag 'bootconfig-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
bootconfig: Increase max nodes of bootconfig from 1024 to 8192 for DCC support
bootconfig: Default BOOT_CONFIG_FORCE to y if BOOT_CONFIG_EMBED
Allow forcing unconditional bootconfig processing
tools/bootconfig: fix single & used for logical condition
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmP0vCsACgkQUqAMR0iA
lPKjHxAAldxJr7B5XZWRbYOw9VAQ/Qv0whfZwJ0Wl1crhx/PrB6JBkTbz/vHMj+c
vZSQXOID5g858+iR7KdvejWFTnr7RdBP28rkRj6zijXYBAlrNwQtV+aXS/AHyCjR
2/cMuP6+7vqaVp8BqDyToQrZil6Us5Io/a6l4C0RQp1nQmY5A8qLg7kdtpPOcIoW
yBWY+5pMJVfryr95yOjE7WxBVjpIh5eo/D1WoexoxcJNsHpe/2mAvXfZMNLMHBKL
Io0RSZfrD0Pw+oHyhpX7bLxRbmYO8kajBMtMvPa2NIMQyk/Cm+2SSvDS39n1B4vm
eXym5hUfSZIZUGV1cegia/exZgfvj3yCEc+8HJ0WEenlFruxXCtqwPbFwtkjsznK
EjgOEehygaHGiQ7LY9cEA5ON5e7MIGsYUp+EAS2vNja81yBmAK89TlOtezzKIyU2
U4q4UUsXi4TMMNzvNu0bs4gEu2HOCTovE1O0h7C56haoVHSWWqexepvRwNhAWELY
ztQz5hd8S7UIILGDTnx8nCc7W8RddfLZfl2XBuzr9gWp79SqeaGKOk10jkjBkatC
1STqkCN+Jwx5uIdF8LLg4EaNJ8QQCdDaCLvQpSJbQpFIYz7KCwk/hhczdCLsWrrp
feGEwSD5u2ek5cEk840WfvKlle6fsqtLb6m98UV2/WJpjW9qtSo=
=cVoE
-----END PGP SIGNATURE-----
Merge tag 'printk-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Refactor printk code for formatting messages that are shown on
consoles. This is a preparatory step for introducing atomic consoles
which could not share the global buffers
- Prevent memory leak when removing printk index in debugfs
- Dump also the newest printk message by the sample gdbmacro
- Fix a compiler warning
* tag 'printk-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printf: fix errname.c list
kernel/printk/index.c: fix memory leak with using debugfs_lookup()
printk: Use scnprintf() to print the message about the dropped messages on a console
printk: adjust string limit macros
printk: use printk_buffers for devkmsg
printk: introduce console_prepend_dropped() for dropped messages
printk: introduce printk_get_next_message() and printk_message
printk: introduce struct printk_buffers
console: Document struct console
console: Use BIT() macros for @flags values
printk: move size limit macros into internal.h
docs: gdbmacros: print newest record
- Add function names as a way to filter function addresses
- Add sample module to test ftrace ops and dynamic trampolines
- Allow stack traces to be passed from beginning event to end event for
synthetic events. This will allow seeing the stack trace of when a task is
scheduled out and recorded when it gets scheduled back in.
- Add trace event helper __get_buf() to use as a temporary buffer when printing
out trace event output.
- Add kernel command line to create trace instances on boot up.
- Add enabling of events to instances created at boot up.
- Add trace_array_puts() to write into instances.
- Allow boot instances to take a snapshot at the end of boot up.
- Allow live patch modules to include trace events
- Minor fixes and clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY/PaaBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qh5iAPoD0LKZzD33rhO5Ec4hoexE0DkqycP3
dvmOMbCBL8GkxwEA+d2gLz/EquSFm166hc4D79Sn3geCqvkwmy8vQWVjIQc=
=M82D
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Add function names as a way to filter function addresses
- Add sample module to test ftrace ops and dynamic trampolines
- Allow stack traces to be passed from beginning event to end event for
synthetic events. This will allow seeing the stack trace of when a
task is scheduled out and recorded when it gets scheduled back in.
- Add trace event helper __get_buf() to use as a temporary buffer when
printing out trace event output.
- Add kernel command line to create trace instances on boot up.
- Add enabling of events to instances created at boot up.
- Add trace_array_puts() to write into instances.
- Allow boot instances to take a snapshot at the end of boot up.
- Allow live patch modules to include trace events
- Minor fixes and clean ups
* tag 'trace-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
tracing: Remove unnecessary NULL assignment
tracepoint: Allow livepatch module add trace event
tracing: Always use canonical ftrace path
tracing/histogram: Fix stacktrace histogram Documententation
tracing/histogram: Fix stacktrace key
tracing/histogram: Fix a few problems with stacktrace variable printing
tracing: Add BUILD_BUG() to make sure stacktrace fits in strings
tracing/histogram: Don't use strlen to find length of stacktrace variables
tracing: Allow boot instances to have snapshot buffers
tracing: Add trace_array_puts() to write into instance
tracing: Add enabling of events to boot instances
tracing: Add creation of instances at boot command line
tracing: Fix trace_event_raw_event_synth() if else statement
samples: ftrace: Make some global variables static
ftrace: sample: avoid open-coded 64-bit division
samples: ftrace: Include the nospec-branch.h only for x86
tracing: Acquire buffer from temparary trace sequence
tracing/histogram: Wrap remaining shell snippets in code blocks
tracing/osnoise: No need for schedule_hrtimeout range
bpf/tracing: Use stage6 of tracing to not duplicate macros
...
* Eliminate repeated boxing and unboxing of log item parameters.
* Clean up some confusing variable names in the log item code.
* Fix a deadlock when doing unwritten extent conversion that causes a
bmbt split when there are sustained memory shortages and the worker
pool runs out of worker threads.
* Fix the panic_mask debug knob not being able to trigger on verifier
errors.
* Constify kobj_type objects.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCY+Z6BwAKCRBKO3ySh0YR
pkQJAQCjkzXqZuj8WH/g22S01smT51QhmX+1ubLdzMYSvRvrKQD+MlH74EcgurQD
GhgCWJh6dBTx1nICKpCXYgVD9Glvowc=
=J2Xw
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.3-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"There's a couple of bug fixes, some cleanups for inconsistent variable
names and reduction of struct boxing and unboxing in the logging code.
More work is pending, which will begin reworking allocation group
lifetimes and finally replace confusing indirect calls to the
allocator with actual ... function calls. But I want to let that
experience another week of testing.
Summary:
- Eliminate repeated boxing and unboxing of log item parameters
- Clean up some confusing variable names in the log item code
- Fix a deadlock when doing unwritten extent conversion that causes a
bmbt split when there are sustained memory shortages and the worker
pool runs out of worker threads
- Fix the panic_mask debug knob not being able to trigger on verifier
errors
- Constify kobj_type objects"
* tag 'xfs-6.3-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: revert commit 8954c44ff4
xfs: make kobj_type structures constant
xfs: allow setting full range of panic tags
xfs: don't use BMBT btree split workers for IO completion
xfs: fix confusing variable names in xfs_refcount_item.c
xfs: pass refcount intent directly through the log intent code
xfs: fix confusing variable names in xfs_rmap_item.c
xfs: pass rmap space mapping directly through the log intent code
xfs: fix confusing xfs_extent_item variable names
xfs: pass xfs_extent_free_item directly through the log intent code
xfs: fix confusing variable names in xfs_bmap_item.c
xfs: pass the xfs_bmbt_irec directly through the log intent code
xfs: use strscpy() to instead of strncpy()