Mgerge remote-tracking branch 'torvalds/master' into perf/core

To sync headers, for instance, in this case tools/perf was ahead of
upstream till Linus merged tip/perf/core to get the
PERF_RECORD_TEXT_POKE changes:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
  diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This commit is contained in:
Arnaldo Carvalho de Melo 2020-08-06 08:15:47 -03:00
Родитель c4735d9902 47ec5303d7
Коммит 94fb1afb14
7577 изменённых файлов: 521758 добавлений и 139049 удалений

1
.gitignore поставляемый
Просмотреть файл

@ -44,6 +44,7 @@
*.tab.[ch]
*.tar
*.xz
*.zst
Module.symvers
modules.builtin
modules.order

Просмотреть файл

@ -18,6 +18,9 @@ Aleksey Gorelov <aleksey_gorelov@phoenix.com>
Aleksandar Markovic <aleksandar.markovic@mips.com> <aleksandar.markovic@imgtec.com>
Alex Shi <alex.shi@linux.alibaba.com> <alex.shi@intel.com>
Alex Shi <alex.shi@linux.alibaba.com> <alex.shi@linaro.org>
Alexander Lobakin <alobakin@pm.me> <alobakin@dlink.ru>
Alexander Lobakin <alobakin@pm.me> <alobakin@marvell.com>
Alexander Lobakin <alobakin@pm.me> <bloodyreaper@yandex.ru>
Alexandre Belloni <alexandre.belloni@bootlin.com> <alexandre.belloni@free-electrons.com>
Alexei Starovoitov <ast@kernel.org> <ast@plumgrid.com>
Alexei Starovoitov <ast@kernel.org> <alexei.starovoitov@gmail.com>
@ -134,6 +137,11 @@ Jeff Layton <jlayton@kernel.org> <jlayton@poochiereds.net>
Jeff Layton <jlayton@kernel.org> <jlayton@primarydata.com>
Jens Axboe <axboe@suse.de>
Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Jiri Slaby <jirislaby@kernel.org> <jirislaby@gmail.com>
Jiri Slaby <jirislaby@kernel.org> <jslaby@novell.com>
Jiri Slaby <jirislaby@kernel.org> <jslaby@suse.com>
Jiri Slaby <jirislaby@kernel.org> <jslaby@suse.cz>
Jiri Slaby <jirislaby@kernel.org> <xslaby@fi.muni.cz>
Johan Hovold <johan@kernel.org> <jhovold@gmail.com>
Johan Hovold <johan@kernel.org> <johan@hovoldconsulting.com>
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
@ -151,6 +159,7 @@ Kamil Konieczny <k.konieczny@samsung.com> <k.konieczny@partner.samsung.com>
Kay Sievers <kay.sievers@vrfy.org>
Kenneth W Chen <kenneth.w.chen@intel.com>
Konstantin Khlebnikov <koct9i@gmail.com> <k.khlebnikov@samsung.com>
Konstantin Khlebnikov <koct9i@gmail.com> <khlebnikov@yandex-team.ru>
Koushik <raghavendra.koushik@neterion.com>
Krzysztof Kozlowski <krzk@kernel.org> <k.kozlowski@samsung.com>
Krzysztof Kozlowski <krzk@kernel.org> <k.kozlowski.k@gmail.com>

72
CREDITS
Просмотреть файл

@ -34,7 +34,7 @@ S: Romania
N: Mark Adler
E: madler@alumni.caltech.edu
W: http://alumnus.caltech.edu/~madler/
W: https://alumnus.caltech.edu/~madler/
D: zlib decompression
N: Monalisa Agrawal
@ -62,7 +62,7 @@ S: United Kingdom
N: Werner Almesberger
E: werner@almesberger.net
W: http://www.almesberger.net/
W: https://www.almesberger.net/
D: dosfs, LILO, some fd features, ATM, various other hacks here and there
S: Buenos Aires
S: Argentina
@ -96,7 +96,7 @@ S: USA
N: Erik Andersen
E: andersen@codepoet.org
W: http://www.codepoet.org/
W: https://www.codepoet.org/
P: 1024D/30D39057 1BC4 2742 E885 E4DE 9301 0C82 5F9B 643E 30D3 9057
D: Maintainer of ide-cd and Uniform CD-ROM driver,
D: ATAPI CD-Changer support, Major 2.1.x CD-ROM update.
@ -114,7 +114,7 @@ S: Canada K2P 0X3
N: H. Peter Anvin
E: hpa@zytor.com
W: http://www.zytor.com/~hpa/
W: https://www.zytor.com/~hpa/
P: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD 1E DF FE 69 EE 35 BD 74
D: Author of the SYSLINUX boot loader, maintainer of the linux.* news
D: hierarchy and the Linux Device List; various kernel hacks
@ -124,7 +124,7 @@ S: USA
N: Andrea Arcangeli
E: andrea@suse.de
W: http://www.kernel.org/pub/linux/kernel/people/andrea/
W: https://www.kernel.org/pub/linux/kernel/people/andrea/
P: 1024D/68B9CB43 13D9 8355 295F 4823 7C49 C012 DFA1 686E 68B9 CB43
P: 1024R/CB4660B9 CC A0 71 81 F4 A0 63 AC C0 4B 81 1D 8C 15 C8 E5
D: Parport hacker
@ -339,7 +339,7 @@ S: Haifa, Israel
N: Johannes Berg
E: johannes@sipsolutions.net
W: http://johannes.sipsolutions.net/
W: https://johannes.sipsolutions.net/
P: 4096R/7BF9099A C0EB C440 F6DA 091C 884D 8532 E0F3 73F3 7BF9 099A
D: powerpc & 802.11 hacker
@ -376,7 +376,7 @@ D: Original author of the Linux networking code
N: Anton Blanchard
E: anton@samba.org
W: http://samba.org/~anton/
W: https://samba.org/~anton/
P: 1024/8462A731 4C 55 86 34 44 59 A7 99 2B 97 88 4A 88 9A 0D 97
D: sun4 port, Sparc hacker
@ -509,7 +509,7 @@ S: Sweden
N: Paul Bristow
E: paul@paulbristow.net
W: http://paulbristow.net/linux/idefloppy.html
W: https://paulbristow.net/linux/idefloppy.html
D: Maintainer of IDE/ATAPI floppy driver
N: Stefano Brivio
@ -518,7 +518,7 @@ D: Broadcom B43 driver
N: Dominik Brodowski
E: linux@brodo.de
W: http://www.brodo.de/
W: https://www.brodo.de/
P: 1024D/725B37C6 190F 3E77 9C89 3B6D BECD 46EE 67C3 0308 725B 37C6
D: parts of CPUFreq code, ACPI bugfixes, PCMCIA rewrite, cpufrequtils
S: Tuebingen, Germany
@ -865,7 +865,7 @@ D: Promise DC4030VL caching HD controller drivers
N: Todd J. Derr
E: tjd@fore.com
W: http://www.wordsmith.org/~tjd
W: https://www.wordsmith.org/~tjd
D: Random console hacks and other miscellaneous stuff
S: 3000 FORE Drive
S: Warrendale, Pennsylvania 15086
@ -894,8 +894,8 @@ S: USA
N: Matt Domsch
E: Matt_Domsch@dell.com
W: http://www.dell.com/linux
W: http://domsch.com/linux
W: https://www.dell.com/linux
W: https://domsch.com/linux
D: Linux/IA-64
D: Dell PowerEdge server, SCSI layer, misc drivers, and other patches
S: Dell Inc.
@ -992,7 +992,7 @@ S: USA
N: Randy Dunlap
E: rdunlap@infradead.org
W: http://www.infradead.org/~rdunlap/
W: https://www.infradead.org/~rdunlap/
D: Linux-USB subsystem, USB core/UHCI/printer/storage drivers
D: x86 SMP, ACPI, bootflag hacking
D: documentation, builds
@ -1157,7 +1157,7 @@ S: Germany
N: Jeremy Fitzhardinge
E: jeremy@goop.org
W: http://www.goop.org/~jeremy
W: https://www.goop.org/~jeremy
D: author of userfs filesystem
D: Improved mmap and munmap handling
D: General mm minor tidyups
@ -1460,7 +1460,7 @@ S: The Netherlands
N: Oliver Hartkopp
E: oliver.hartkopp@volkswagen.de
W: http://www.volkswagen.de
W: https://www.volkswagen.de
D: Controller Area Network (network layer core)
S: Brieffach 1776
S: 38436 Wolfsburg
@ -1599,13 +1599,13 @@ S: Germany
N: Kenji Hollis
E: kenji@bitgate.com
W: http://www.bitgate.com/
W: https://www.bitgate.com/
D: Berkshire PC Watchdog Driver
D: Small/Industrial Driver Project
N: Nick Holloway
E: Nick.Holloway@pyrites.org.uk
W: http://www.pyrites.org.uk/
W: https://www.pyrites.org.uk/
P: 1024/36115A04 F4E1 3384 FCFD C055 15D6 BA4C AB03 FBF8 3611 5A04
D: Occasional Linux hacker...
S: (ask for current address)
@ -1655,7 +1655,7 @@ S: USA
N: Harald Hoyer
E: harald@redhat.com
W: http://www.harald-hoyer.de
W: https://www.harald-hoyer.de
D: ip_masq_quake
D: md boot support
S: Am Strand 5
@ -1856,7 +1856,7 @@ E: kas@fi.muni.cz
D: Author of the COSA/SRP sync serial board driver.
D: Port of the syncppp.c from the 2.0 to the 2.1 kernel.
P: 1024/D3498839 0D 99 A7 FB 20 66 05 D7 8B 35 FC DE 05 B1 8A 5E
W: http://www.fi.muni.cz/~kas/
W: https://www.fi.muni.cz/~kas/
S: c/o Faculty of Informatics, Masaryk University
S: Botanicka' 68a
S: 602 00 Brno
@ -2017,7 +2017,7 @@ S: Prague, Czech Republic
N: Gene Kozin
E: 74604.152@compuserve.com
W: http://www.sangoma.com
W: https://www.sangoma.com
D: WAN Router & Sangoma WAN drivers
S: Sangoma Technologies Inc.
S: 7170 Warden Avenue, Unit 2
@ -2112,7 +2112,7 @@ D: Original author of software suspend
N: Jaroslav Kysela
E: perex@perex.cz
W: http://www.perex.cz
W: https://www.perex.cz
D: Original Author and Maintainer for HP 10/100 Mbit Network Adapters
D: ISA PnP
S: Sindlovy Dvory 117
@ -2316,7 +2316,7 @@ S: Finland
N: Daniel J. Maas
E: dmaas@dcine.com
W: http://www.maasdigital.com
W: https://www.maasdigital.com
D: dv1394
N: Hamish Macdonald
@ -2647,7 +2647,7 @@ D: bug fixes, documentation, minor hackery
N: Paul Moore
E: paul@paul-moore.com
W: http://www.paul-moore.com
W: https://www.paul-moore.com
D: NetLabel, SELinux, audit
N: James Morris
@ -2786,7 +2786,7 @@ N: David C. Niemi
E: niemi@tux.org
W: http://www.tux.org/~niemi/
D: Assistant maintainer of Mtools, fdutils, and floppy driver
D: Administrator of Tux.Org Linux Server, http://www.tux.org
D: Administrator of Tux.Org Linux Server, https://www.tux.org
S: 2364 Old Trail Drive
S: Reston, Virginia 20191
S: USA
@ -2850,7 +2850,7 @@ S: USA
N: Mikulas Patocka
E: mikulas@artax.karlin.mff.cuni.cz
W: http://artax.karlin.mff.cuni.cz/~mikulas/
W: https://artax.karlin.mff.cuni.cz/~mikulas/
P: 1024/BB11D2D5 A0 F1 28 4A C4 14 1E CF 92 58 7A 8F 69 BC A4 D3
D: Read/write HPFS filesystem
S: Weissova 8
@ -2872,7 +2872,7 @@ D: RFC2385 Support for TCP
N: Barak A. Pearlmutter
E: bap@cs.unm.edu
W: http://www.cs.unm.edu/~bap/
W: https://www.cs.unm.edu/~bap/
P: 512/602D785D 9B A1 83 CD EE CB AD 93 20 C6 4C B7 F5 E9 60 D4
D: Author of mark-and-sweep GC integrated by Alan Cox
S: Computer Science Department
@ -3035,7 +3035,7 @@ S: United Kingdom
N: Daniel Quinlan
E: quinlan@pathname.com
W: http://www.pathname.com/~quinlan/
W: https://www.pathname.com/~quinlan/
D: FSSTND coordinator; FHS editor
D: random Linux documentation, patches, and hacks
S: 4390 Albany Drive #41A
@ -3130,7 +3130,7 @@ S: France
N: Rik van Riel
E: riel@redhat.com
W: http://www.surriel.com/
W: https://www.surriel.com/
D: Linux-MM site, Documentation/admin-guide/sysctl/*, swap/mm readaround
D: kswapd fixes, random kernel hacker, rmap VM,
D: nl.linux.org administrator, minor scheduler additions
@ -3246,7 +3246,7 @@ S: Germany
N: Paul `Rusty' Russell
E: rusty@rustcorp.com.au
W: http://ozlabs.org/~rusty
W: https://ozlabs.org/~rusty
D: Ruggedly handsome.
D: netfilter, ipchains with Michael Neuling.
S: 52 Moore St
@ -3369,7 +3369,7 @@ S: Germany
N: Robert Schwebel
E: robert@schwebel.de
W: http://www.schwebel.de
W: https://www.schwebel.de
D: Embedded hacker and book author,
D: AMD Elan support for Linux
S: Pengutronix
@ -3545,7 +3545,7 @@ S: Australia
N: Henrik Storner
E: storner@image.dk
W: http://www.image.dk/~storner/
W: http://www.sslug.dk/
W: https://www.sslug.dk/
D: Configure script: Invented tristate for module-configuration
D: vfat/msdos integration, kerneld docs, Linux promotion
D: Miscellaneous bug-fixes
@ -3579,7 +3579,7 @@ S: USA
N: Eugene Surovegin
E: ebs@ebshome.net
W: http://kernel.ebshome.net/
W: https://kernel.ebshome.net/
P: 1024D/AE5467F1 FF22 39F1 6728 89F6 6E6C 2365 7602 F33D AE54 67F1
D: Embedded PowerPC 4xx: EMAC, I2C, PIC and random hacks/fixes
S: Sunnyvale, California 94085
@ -3609,7 +3609,7 @@ S: France
N: Urs Thuermann
E: urs.thuermann@volkswagen.de
W: http://www.volkswagen.de
W: https://www.volkswagen.de
D: Controller Area Network (network layer core)
S: Brieffach 1776
S: 38436 Wolfsburg
@ -3656,7 +3656,7 @@ S: Canada K2L 1S2
N: Andrew Tridgell
E: tridge@samba.org
W: http://samba.org/tridge/
W: https://samba.org/tridge/
D: dosemu, networking, samba
S: 3 Ballow Crescent
S: MacGregor A.C.T 2615
@ -3894,7 +3894,7 @@ D: The Linux Support Team Erlangen
N: David Weinehall
E: tao@acc.umu.se
P: 1024D/DC47CA16 7ACE 0FB0 7A74 F994 9B36 E1D1 D14E 8526 DC47 CA16
W: http://www.acc.umu.se/~tao/
W: https://www.acc.umu.se/~tao/
D: v2.0 kernel maintainer
D: Fixes for the NE/2-driver
D: Miscellaneous MCA-support
@ -3919,7 +3919,7 @@ S: USA
N: Harald Welte
E: laforge@netfilter.org
P: 1024D/30F48BFF DBDE 6912 8831 9A53 879B 9190 5DA5 C655 30F4 8BFF
W: http://gnumonks.org/users/laforge
W: https://gnumonks.org/users/laforge
D: netfilter: new nat helper infrastructure
D: netfilter: ULOG, ECN, DSCP target
D: netfilter: TTL match

Просмотреть файл

@ -206,3 +206,20 @@ Description: This file exposes the firmware version of burnable voltage
regulator devices.
The file is read only.
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld1_pn
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld2_pn
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld3_pn
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld4_pn
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld1_version_min
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld2_version_min
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld3_version_min
What: /sys/devices/platform/mlxplat/mlxreg-io/hwmon/hwmon*/cpld4_version_min
Date: July 2020
KernelVersion: 5.9
Contact: Vadim Pasternak <vadimpmellanox.com>
Description: These files show with which CPLD part numbers and minor
versions have been burned CPLD devices equipped on a
system.
The files are read only.

Просмотреть файл

@ -0,0 +1,9 @@
What: /sys/kernel/debug/turris-mox-rwtm/do_sign
Date: Jun 2020
KernelVersion: 5.8
Contact: Marek Behún <marek.behun@nic.cz>
Description: (W) Message to sign with the ECDSA private key stored in
device's OTP. The message must be exactly 64 bytes (since
this is intended for SHA-512 hashes).
(R) The resulting signature, 136 bytes. This contains the R and
S values of the ECDSA signature, both in big-endian format.

Просмотреть файл

@ -56,6 +56,17 @@ Description: The /dev/kmsg character device node provides userspace access
seek after the last record available at the time
the last SYSLOG_ACTION_CLEAR was issued.
Other seek operations or offsets are not supported because of
the special behavior this device has. The device allows to read
or write only whole variable length messages (records) that are
stored in a ring buffer.
Because of the non-standard behavior also the error values are
non-standard. -ESPIPE is returned for non-zero offset. -EINVAL
is returned for other operations, e.g. SEEK_CUR. This behavior
and values are historical and could not be modified without the
risk of breaking userspace.
The output format consists of a prefix carrying the syslog
prefix including priority and facility, the 64 bit message
sequence number and the monotonic timestamp in microseconds,

Просмотреть файл

@ -273,6 +273,24 @@ Description:
device ("host-aware" or "host-managed" zone model). For regular
block devices, the value is always 0.
What: /sys/block/<disk>/queue/max_active_zones
Date: July 2020
Contact: Niklas Cassel <niklas.cassel@wdc.com>
Description:
For zoned block devices (zoned attribute indicating
"host-managed" or "host-aware"), the sum of zones belonging to
any of the zone states: EXPLICIT OPEN, IMPLICIT OPEN or CLOSED,
is limited by this value. If this value is 0, there is no limit.
What: /sys/block/<disk>/queue/max_open_zones
Date: July 2020
Contact: Niklas Cassel <niklas.cassel@wdc.com>
Description:
For zoned block devices (zoned attribute indicating
"host-managed" or "host-aware"), the sum of zones belonging to
any of the zone states: EXPLICIT OPEN or IMPLICIT OPEN,
is limited by this value. If this value is 0, there is no limit.
What: /sys/block/<disk>/queue/chunk_sectors
Date: September 2016
Contact: Hannes Reinecke <hare@suse.com>

Просмотреть файл

@ -0,0 +1,8 @@
What: /sys/bus/tee/devices/optee-ta-<uuid>/
Date: May 2020
KernelVersion 5.8
Contact: op-tee@lists.trustedfirmware.org
Description:
OP-TEE bus provides reference to registered drivers under this directory. The <uuid>
matches Trusted Application (TA) driver and corresponding TA in secure OS. Drivers
are free to create needed API under optee-ta-<uuid> directory.

Просмотреть файл

@ -18,3 +18,13 @@ Description:
devices to opt-out of driver binding using a driver_override
name such as "none". Only a single driver may be specified in
the override, there is no support for parsing delimiters.
What: /sys/bus/platform/devices/.../numa_node
Date: June 2020
Contact: Barry Song <song.bao.hua@hisilicon.com>
Description:
This file contains the NUMA node to which the platform device
is attached. It won't be visible if the node is unknown. The
value comes from an ACPI _PXM method or a similar firmware
source. Initial users for this file would be devices like
arm smmu which are populated by arm64 acpi_iort.

Просмотреть файл

@ -178,11 +178,18 @@ KernelVersion: 4.13
Contact: thunderbolt-software@lists.01.org
Description: When new NVM image is written to the non-active NVM
area (through non_activeX NVMem device), the
authentication procedure is started by writing 1 to
this file. If everything goes well, the device is
authentication procedure is started by writing to
this file.
If everything goes well, the device is
restarted with the new NVM firmware. If the image
verification fails an error code is returned instead.
This file will accept writing values "1" or "2"
- Writing "1" will flush the image to the storage
area and authenticate the image in one action.
- Writing "2" will run some basic validation on the image
and flush it to the storage area.
When read holds status of the last authentication
operation if an error occurred during the process. This
is directly the status value from the DMA configuration
@ -236,3 +243,49 @@ KernelVersion: 4.15
Contact: thunderbolt-software@lists.01.org
Description: This contains XDomain service specific settings as
bitmask. Format: %x
What: /sys/bus/thunderbolt/devices/<device>:<port>.<index>/device
Date: Oct 2020
KernelVersion: v5.9
Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
Description: Retimer device identifier read from the hardware.
What: /sys/bus/thunderbolt/devices/<device>:<port>.<index>/nvm_authenticate
Date: Oct 2020
KernelVersion: v5.9
Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
Description: When new NVM image is written to the non-active NVM
area (through non_activeX NVMem device), the
authentication procedure is started by writing 1 to
this file. If everything goes well, the device is
restarted with the new NVM firmware. If the image
verification fails an error code is returned instead.
When read holds status of the last authentication
operation if an error occurred during the process.
Format: %x.
What: /sys/bus/thunderbolt/devices/<device>:<port>.<index>/nvm_version
Date: Oct 2020
KernelVersion: v5.9
Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
Description: Holds retimer NVM version number. Format: %x.%x, major.minor.
What: /sys/bus/thunderbolt/devices/<device>:<port>.<index>/vendor
Date: Oct 2020
KernelVersion: v5.9
Contact: Mika Westerberg <mika.westerberg@linux.intel.com>
Description: Retimer vendor identifier read from the hardware.
What: /sys/bus/thunderbolt/devices/.../nvm_authenticate_on_disconnect
Date: Oct 2020
KernelVersion: v5.9
Contact: Mario Limonciello <mario.limonciello@dell.com>
Description: For supported devices, automatically authenticate the new Thunderbolt
image when the device is disconnected from the host system.
This file will accept writing values "1" or "2"
- Writing "1" will flush the image to the storage
area and prepare the device for authentication on disconnect.
- Writing "2" will run some basic validation on the image
and flush it to the storage area.

Просмотреть файл

@ -108,3 +108,15 @@ Description:
frequency requested by governors and min_freq.
The max_freq overrides min_freq because max_freq may be
used to throttle devices to avoid overheating.
What: /sys/class/devfreq/.../timer
Date: July 2020
Contact: Chanwoo Choi <cw00.choi@samsung.com>
Description:
This ABI shows and stores the kind of work timer by users.
This work timer is used by devfreq workqueue in order to
monitor the device status such as utilization. The user
can change the work timer on runtime according to their demand
as following:
echo deferrable > /sys/class/devfreq/.../timer
echo delayed > /sys/class/devfreq/.../timer

Просмотреть файл

@ -0,0 +1,126 @@
What: /sys/class/devlink/.../
Date: May 2020
Contact: Saravana Kannan <saravanak@google.com>
Description:
Provide a place in sysfs for the device link objects in the
kernel at any given time. The name of a device link directory,
denoted as ... above, is of the form <supplier>--<consumer>
where <supplier> is the supplier device name and <consumer> is
the consumer device name.
What: /sys/class/devlink/.../auto_remove_on
Date: May 2020
Contact: Saravana Kannan <saravanak@google.com>
Description:
This file indicates if the device link will ever be
automatically removed by the driver core when the consumer and
supplier devices themselves are still present.
This will be one of the following strings:
'consumer unbind'
'supplier unbind'
'never'
'consumer unbind' means the device link will be removed when
the consumer's driver is unbound from the consumer device.
'supplier unbind' means the device link will be removed when
the supplier's driver is unbound from the supplier device.
'never' means the device link will not be automatically removed
when as long as the supplier and consumer devices themselves
are still present.
What: /sys/class/devlink/.../consumer
Date: May 2020
Contact: Saravana Kannan <saravanak@google.com>
Description:
This file is a symlink to the consumer device's sysfs directory.
What: /sys/class/devlink/.../runtime_pm
Date: May 2020
Contact: Saravana Kannan <saravanak@google.com>
Description:
This file indicates if the device link has any impact on the
runtime power management behavior of the consumer and supplier
devices. For example: Making sure the supplier doesn't enter
runtime suspend while the consumer is active.
This will be one of the following strings:
'0' - Does not affect runtime power management
'1' - Affects runtime power management
What: /sys/class/devlink/.../status
Date: May 2020
Contact: Saravana Kannan <saravanak@google.com>
Description:
This file indicates the status of the device link. The status
of a device link is affected by whether the supplier and
consumer devices have been bound to their corresponding
drivers. The status of a device link also affects the binding
and unbinding of the supplier and consumer devices with their
drivers and also affects whether the software state of the
supplier device is synced with the hardware state of the
supplier device after boot up.
See also: sysfs-devices-state_synced.
This will be one of the following strings:
'not tracked'
'dormant'
'available'
'consumer probing'
'active'
'supplier unbinding'
'unknown'
'not tracked' means this device link does not track the status
and has no impact on the binding, unbinding and syncing the
hardware and software device state.
'dormant' means the supplier and the consumer devices have not
bound to their driver.
'available' means the supplier has bound to its driver and is
available to supply resources to the consumer device.
'consumer probing' means the consumer device is currently
trying to bind to its driver.
'active' means the supplier and consumer devices have both
bound successfully to their drivers.
'supplier unbinding' means the supplier devices is currently in
the process of unbinding from its driver.
'unknown' means the state of the device link is not any of the
above. If this is ever the value, there's a bug in the kernel.
What: /sys/class/devlink/.../supplier
Date: May 2020
Contact: Saravana Kannan <saravanak@google.com>
Description:
This file is a symlink to the supplier device's sysfs directory.
What: /sys/class/devlink/.../sync_state_only
Date: May 2020
Contact: Saravana Kannan <saravanak@google.com>
Description:
This file indicates if the device link is limited to only
affecting the syncing of the hardware and software state of the
supplier device.
This will be one of the following strings:
'0'
'1' - Affects runtime power management
'0' means the device link can affect other device behaviors
like binding/unbinding, suspend/resume, runtime power
management, etc.
'1' means the device link will only affect the syncing of
hardware and software state of the supplier device after boot
up and doesn't not affect other behaviors of the devices.

Просмотреть файл

@ -0,0 +1,14 @@
What: /sys/class/leds/<led>/device/brightness
Date: July 2020
KernelVersion: 5.9
Contact: Marek Behún <marek.behun@nic.cz>
Description: (RW) On the front panel of the Turris Omnia router there is also
a button which can be used to control the intensity of all the
LEDs at once, so that if they are too bright, user can dim them.
The microcontroller cycles between 8 levels of this global
brightness (from 100% to 0%), but this setting can have any
integer value between 0 and 100. It is therefore convenient to be
able to change this setting from software.
Format: %i

Просмотреть файл

@ -0,0 +1,35 @@
What: /sys/class/leds/<led>/brightness
Date: March 2020
KernelVersion: 5.9
Contact: Dan Murphy <dmurphy@ti.com>
Description: read/write
Writing to this file will update all LEDs within the group to a
calculated percentage of what each color LED intensity is set
to. The percentage is calculated for each grouped LED via the
equation below:
led_brightness = brightness * multi_intensity/max_brightness
For additional details please refer to
Documentation/leds/leds-class-multicolor.rst.
The value of the LED is from 0 to
/sys/class/leds/<led>/max_brightness.
What: /sys/class/leds/<led>/multi_index
Date: March 2020
KernelVersion: 5.9
Contact: Dan Murphy <dmurphy@ti.com>
Description: read
The multi_index array, when read, will output the LED colors
as an array of strings as they are indexed in the
multi_intensity file.
What: /sys/class/leds/<led>/multi_intensity
Date: March 2020
KernelVersion: 5.9
Contact: Dan Murphy <dmurphy@ti.com>
Description: read/write
This file contains array of integers. Order of components is
described by the multi_index array. The maximum intensity should
not exceed /sys/class/leds/<led>/max_brightness.

Просмотреть файл

@ -90,3 +90,16 @@ Description: Display trc status register content
The ME FW writes Glitch Detection HW (TRC)
status information into trc status register
for BIOS and OS to monitor fw health.
What: /sys/class/mei/meiN/kind
Date: Jul 2020
KernelVersion: 5.8
Contact: Tomas Winkler <tomas.winkler@intel.com>
Description: Display kind of the device
Generic devices are marked as "mei"
while special purpose have their own
names.
Available options:
- mei: generic mei device.
- itouch: itouch (ipts) mei device.

Просмотреть файл

@ -0,0 +1,8 @@
What: /sys/devices/.../consumer:<consumer>
Date: May 2020
Contact: Saravana Kannan <saravanak@google.com>
Description:
The /sys/devices/.../consumer:<consumer> are symlinks to device
links where this device is the supplier. <consumer> denotes the
name of the consumer in that device link. There can be zero or
more of these symlinks for a given device.

Просмотреть файл

@ -0,0 +1,33 @@
What: /sys/devices/uncore_iio_x/dieX
Date: February 2020
Contact: Roman Sudarikov <roman.sudarikov@linux.intel.com>
Description:
Each IIO stack (PCIe root port) has its own IIO PMON block, so
each dieX file (where X is die number) holds "Segment:Root Bus"
for PCIe root port, which can be monitored by that IIO PMON
block.
For example, on 4-die Xeon platform with up to 6 IIO stacks per
die and, therefore, 6 IIO PMON blocks per die, the mapping of
IIO PMON block 0 exposes as the following:
$ ls /sys/devices/uncore_iio_0/die*
-r--r--r-- /sys/devices/uncore_iio_0/die0
-r--r--r-- /sys/devices/uncore_iio_0/die1
-r--r--r-- /sys/devices/uncore_iio_0/die2
-r--r--r-- /sys/devices/uncore_iio_0/die3
$ tail /sys/devices/uncore_iio_0/die*
==> /sys/devices/uncore_iio_0/die0 <==
0000:00
==> /sys/devices/uncore_iio_0/die1 <==
0000:40
==> /sys/devices/uncore_iio_0/die2 <==
0000:80
==> /sys/devices/uncore_iio_0/die3 <==
0000:c0
Which means:
IIO PMU 0 on die 0 belongs to PCI RP on bus 0x00, domain 0x0000
IIO PMU 0 on die 1 belongs to PCI RP on bus 0x40, domain 0x0000
IIO PMU 0 on die 2 belongs to PCI RP on bus 0x80, domain 0x0000
IIO PMU 0 on die 3 belongs to PCI RP on bus 0xc0, domain 0x0000

Просмотреть файл

@ -126,3 +126,39 @@ Description:
1 no action
0 firmware record the notify code defined
in b[15:0].
What: /sys/devices/platform/stratix10-rsu.0/dcmf0
Date: June 2020
KernelVersion: 5.8
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) Decision firmware copy 0 version information.
What: /sys/devices/platform/stratix10-rsu.0/dcmf1
Date: June 2020
KernelVersion: 5.8
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) Decision firmware copy 1 version information.
What: /sys/devices/platform/stratix10-rsu.0/dcmf2
Date: June 2020
KernelVersion: 5.8
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) Decision firmware copy 2 version information.
What: /sys/devices/platform/stratix10-rsu.0/dcmf3
Date: June 2020
KernelVersion: 5.8
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) Decision firmware copy 3 version information.
What: /sys/devices/platform/stratix10-rsu.0/max_retry
Date: June 2020
KernelVersion: 5.8
Contact: Richard Gong <richard.gong@linux.intel.com>
Description:
(RO) max retry parameter is stored in the firmware
decision IO section, as a byte located at offset 0x18c.

Просмотреть файл

@ -26,6 +26,30 @@ Description:
Read-only attribute common to all SoCs. Contains SoC family name
(e.g. DB8500).
On many of ARM based silicon with SMCCC v1.2+ compliant firmware
this will contain the JEDEC JEP106 manufacturers identification
code. The format is "jep106:XXYY" where XX is identity code and
YY is continuation code.
This manufacturers identification code is defined by one
or more eight (8) bit fields, each consisting of seven (7)
data bits plus one (1) odd parity bit. It is a single field,
limiting the possible number of vendors to 126. To expand
the maximum number of identification codes, a continuation
scheme has been defined.
The specified mechanism is that an identity code of 0x7F
represents the "continuation code" and implies the presence
of an additional identity code field, and this mechanism
may be extended to multiple continuation codes followed
by the manufacturer's identity code.
For example, ARM has identity code 0x7F 0x7F 0x7F 0x7F 0x3B,
which is code 0x3B on the fifth 'page'. This is shortened
as JEP106 identity code of 0x3B and a continuation code of
0x4 to represent the four continuation codes preceding the
identity code.
What: /sys/devices/socX/serial_number
Date: January 2019
contact: Bjorn Andersson <bjorn.andersson@linaro.org>
@ -40,6 +64,12 @@ Description:
Read-only attribute supported by most SoCs. In the case of
ST-Ericsson's chips this contains the SoC serial number.
On many of ARM based silicon with SMCCC v1.2+ compliant firmware
this will contain the SOC ID appended to the family attribute
to ensure there is no conflict in this namespace across various
vendors. The format is "jep106:XXYY:ZZZZ" where XX is identity
code, YY is continuation code and ZZZZ is the SOC ID.
What: /sys/devices/socX/revision
Date: January 2012
contact: Lee Jones <lee.jones@linaro.org>

Просмотреть файл

@ -0,0 +1,24 @@
What: /sys/devices/.../state_synced
Date: May 2020
Contact: Saravana Kannan <saravanak@google.com>
Description:
The /sys/devices/.../state_synced attribute is only present for
devices whose bus types or driver provides the .sync_state()
callback. The number read from it (0 or 1) reflects the value
of the device's 'state_synced' field. A value of 0 means the
.sync_state() callback hasn't been called yet. A value of 1
means the .sync_state() callback has been called.
Generally, if a device has sync_state() support and has some of
the resources it provides enabled at the time the kernel starts
(Eg: enabled by hardware reset or bootloader or anything that
run before the kernel starts), then it'll keep those resources
enabled and in a state that's compatible with the state they
were in at the start of the kernel. The device will stop doing
this only when the sync_state() callback has been called --
which happens only when all its consumer devices are registered
and have probed successfully. Resources that were left disabled
at the time the kernel starts are not affected or limited in
any way by sync_state() callbacks.

Просмотреть файл

@ -0,0 +1,8 @@
What: /sys/devices/.../supplier:<supplier>
Date: May 2020
Contact: Saravana Kannan <saravanak@google.com>
Description:
The /sys/devices/.../supplier:<supplier> are symlinks to device
links where this device is the consumer. <supplier> denotes the
name of the supplier in that device link. There can be zero or
more of these symlinks for a given device.

Просмотреть файл

@ -0,0 +1,17 @@
What: /sys/devices/.../waiting_for_supplier
Date: May 2020
Contact: Saravana Kannan <saravanak@google.com>
Description:
The /sys/devices/.../waiting_for_supplier attribute is only
present when fw_devlink kernel command line option is enabled
and is set to something stricter than "permissive". It is
removed once a device probes successfully (because the
information is no longer relevant). The number read from it (0
or 1) reflects whether the device is waiting for one or more
suppliers to be added and then linked to using device links
before the device can probe.
A value of 0 means the device is not waiting for any suppliers
to be added before it can probe. A value of 1 means the device
is waiting for one or more suppliers to be added before it can
probe.

Просмотреть файл

@ -8,7 +8,7 @@ Description:
to device min/max capabilities. Values are integer as they are
stored in a 8bit register in the device. Lowest value is
automatically put to TL. Once set, alarms could be search at
master level, refer to Documentation/w1/w1_generic.rst for
master level, refer to Documentation/w1/w1-generic.rst for
detailed information
Users: any user space application which wants to communicate with
w1_term device

Просмотреть файл

@ -0,0 +1,26 @@
.. SPDX-License-Identifier: GPL-2.0
==========================
PCI Test Endpoint Function
==========================
name: Should be "pci_epf_test" to bind to the pci_epf_test driver.
Configurable Fields:
================ ===========================================================
vendorid should be 0x104c
deviceid should be 0xb500 for DRA74x and 0xb501 for DRA72x
revid don't care
progif_code don't care
subclass_code don't care
baseclass_code should be 0xff
cache_line_size don't care
subsys_vendor_id don't care
subsys_id don't care
interrupt_pin Should be 1 - INTA, 2 - INTB, 3 - INTC, 4 -INTD
msi_interrupts Should be 1 to 32 depending on the number of MSI interrupts
to test
msix_interrupts Should be 1 to 2048 depending on the number of MSI-X
interrupts to test
================ ===========================================================

Просмотреть файл

@ -1,19 +0,0 @@
PCI TEST ENDPOINT FUNCTION
name: Should be "pci_epf_test" to bind to the pci_epf_test driver.
Configurable Fields:
vendorid : should be 0x104c
deviceid : should be 0xb500 for DRA74x and 0xb501 for DRA72x
revid : don't care
progif_code : don't care
subclass_code : don't care
baseclass_code : should be 0xff
cache_line_size : don't care
subsys_vendor_id : don't care
subsys_id : don't care
interrupt_pin : Should be 1 - INTA, 2 - INTB, 3 - INTC, 4 -INTD
msi_interrupts : Should be 1 to 32 depending on the number of MSI interrupts
to test
msix_interrupts : Should be 1 to 2048 depending on the number of MSI-X
interrupts to test

Просмотреть файл

@ -11,3 +11,5 @@ PCI Endpoint Framework
pci-endpoint-cfs
pci-test-function
pci-test-howto
function/binding/pci-test

Просмотреть файл

@ -24,7 +24,7 @@ Directory Structure
The pci_ep configfs has two directories at its root: controllers and
functions. Every EPC device present in the system will have an entry in
the *controllers* directory and and every EPF driver present in the system
the *controllers* directory and every EPF driver present in the system
will have an entry in the *functions* directory.
::

Просмотреть файл

@ -214,7 +214,7 @@ pci-ep-cfs.c can be used as reference for using these APIs.
* pci_epf_create()
Create a new PCI EPF device by passing the name of the PCI EPF device.
This name will be used to bind the the EPF device to a EPF driver.
This name will be used to bind the EPF device to a EPF driver.
* pci_epf_destroy()

Просмотреть файл

@ -248,7 +248,7 @@ STEP 4: Slot Reset
------------------
In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
the platform will perform a slot reset on the requesting PCI device(s).
platform will perform a slot reset on the requesting PCI device(s).
The actual steps taken by a platform to perform a slot reset
will be platform-dependent. Upon completion of slot reset, the
platform will call the device slot_reset() callback.

Просмотреть файл

@ -209,7 +209,7 @@ the PCI device by calling pci_enable_device(). This will:
OS BUG: we don't check resource allocations before enabling those
resources. The sequence would make more sense if we called
pci_request_resources() before calling pci_enable_device().
Currently, the device drivers can't detect the bug when when two
Currently, the device drivers can't detect the bug when two
devices have been allocated the same range. This is not a common
problem and unlikely to get fixed soon.
@ -265,7 +265,7 @@ Set the DMA mask size
---------------------
.. note::
If anything below doesn't make sense, please refer to
Documentation/DMA-API.txt. This section is just a reminder that
:doc:`/core-api/dma-api`. This section is just a reminder that
drivers need to indicate DMA capabilities of the device and is not
an authoritative source for DMA interfaces.
@ -291,7 +291,7 @@ Many 64-bit "PCI" devices (before PCI-X) and some PCI-X devices are
Setup shared control data
-------------------------
Once the DMA masks are set, the driver can allocate "consistent" (a.k.a. shared)
memory. See Documentation/DMA-API.txt for a full description of
memory. See :doc:`/core-api/dma-api` for a full description of
the DMA APIs. This section is just a reminder that it needs to be done
before enabling DMA on the device.
@ -421,7 +421,7 @@ owners if there is one.
Then clean up "consistent" buffers which contain the control data.
See Documentation/DMA-API.txt for details on unmapping interfaces.
See :doc:`/core-api/dma-api` for details on unmapping interfaces.
Unregister from other subsystems

Просмотреть файл

@ -463,7 +463,7 @@ again without disrupting RCU readers.
This guarantee was only partially premeditated. DYNIX/ptx used an
explicit memory barrier for publication, but had nothing resembling
``rcu_dereference()`` for subscription, nor did it have anything
resembling the ``smp_read_barrier_depends()`` that was later subsumed
resembling the dependency-ordering barrier that was later subsumed
into ``rcu_dereference()`` and later still into ``READ_ONCE()``. The
need for these operations made itself known quite suddenly at a
late-1990s meeting with the DEC Alpha architects, back in the days when
@ -2583,7 +2583,12 @@ not work to have these markers in the trampoline itself, because there
would need to be instructions following ``rcu_read_unlock()``. Although
``synchronize_rcu()`` would guarantee that execution reached the
``rcu_read_unlock()``, it would not be able to guarantee that execution
had completely left the trampoline.
had completely left the trampoline. Worse yet, in some situations
the trampoline's protection must extend a few instructions *prior* to
execution reaching the trampoline. For example, these few instructions
might calculate the address of the trampoline, so that entering the
trampoline would be pre-ordained a surprisingly long time before execution
actually reached the trampoline itself.
The solution, in the form of `Tasks
RCU <https://lwn.net/Articles/607117/>`__, is to have implicit read-side

Просмотреть файл

@ -1,4 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0
================================
Review Checklist for RCU Patches
================================
This document contains a checklist for producing and reviewing patches
@ -411,18 +415,21 @@ over a rather long period of time, but improvements are always welcome!
__rcu sparse checks to validate your RCU code. These can help
find problems as follows:
CONFIG_PROVE_LOCKING: check that accesses to RCU-protected data
CONFIG_PROVE_LOCKING:
check that accesses to RCU-protected data
structures are carried out under the proper RCU
read-side critical section, while holding the right
combination of locks, or whatever other conditions
are appropriate.
CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
CONFIG_DEBUG_OBJECTS_RCU_HEAD:
check that you don't pass the
same object to call_rcu() (or friends) before an RCU
grace period has elapsed since the last time that you
passed that same object to call_rcu() (or friends).
__rcu sparse checks: tag the pointer to the RCU-protected data
__rcu sparse checks:
tag the pointer to the RCU-protected data
structure with __rcu, and sparse will warn you if you
access that pointer without the services of one of the
variants of rcu_dereference().
@ -442,8 +449,8 @@ over a rather long period of time, but improvements are always welcome!
You instead need to use one of the barrier functions:
o call_rcu() -> rcu_barrier()
o call_srcu() -> srcu_barrier()
- call_rcu() -> rcu_barrier()
- call_srcu() -> srcu_barrier()
However, these barrier functions are absolutely -not- guaranteed
to wait for a grace period. In fact, if there are no call_rcu()

Просмотреть файл

@ -1,3 +1,5 @@
.. SPDX-License-Identifier: GPL-2.0
.. _rcu_concepts:
============
@ -8,10 +10,17 @@ RCU concepts
:maxdepth: 3
arrayRCU
checklist
lockdep
lockdep-splat
rcubarrier
rcu_dereference
whatisRCU
rcu
rculist_nulls
rcuref
torture
stallwarn
listRCU
NMI-RCU
UP

Просмотреть файл

@ -1,3 +1,9 @@
.. SPDX-License-Identifier: GPL-2.0
=================
Lockdep-RCU Splat
=================
Lockdep-RCU was added to the Linux kernel in early 2010
(http://lwn.net/Articles/371986/). This facility checks for some common
misuses of the RCU API, most notably using one of the rcu_dereference()
@ -12,55 +18,54 @@ overwriting or worse. There can of course be false positives, this
being the real world and all that.
So let's look at an example RCU lockdep splat from 3.0-rc5, one that
has long since been fixed:
has long since been fixed::
=============================
WARNING: suspicious RCU usage
-----------------------------
block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!
=============================
WARNING: suspicious RCU usage
-----------------------------
block/cfq-iosched.c:2776 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
other info that might help us debug this::
rcu_scheduler_active = 1, debug_locks = 0
3 locks held by scsi_scan_6/1552:
#0: (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>]
scsi_scan_host_selected+0x5a/0x150
#1: (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>]
elevator_exit+0x22/0x60
#2: (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>]
cfq_exit_queue+0x43/0x190
rcu_scheduler_active = 1, debug_locks = 0
3 locks held by scsi_scan_6/1552:
#0: (&shost->scan_mutex){+.+.}, at: [<ffffffff8145efca>]
scsi_scan_host_selected+0x5a/0x150
#1: (&eq->sysfs_lock){+.+.}, at: [<ffffffff812a5032>]
elevator_exit+0x22/0x60
#2: (&(&q->__queue_lock)->rlock){-.-.}, at: [<ffffffff812b6233>]
cfq_exit_queue+0x43/0x190
stack backtrace:
Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17
Call Trace:
[<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0
[<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120
[<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190
[<ffffffff812a5046>] elevator_exit+0x36/0x60
[<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60
[<ffffffff8145cc09>] scsi_free_queue+0x9/0x10
[<ffffffff81460944>] __scsi_remove_device+0x84/0xd0
[<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10
[<ffffffff817da069>] ? error_exit+0x29/0xb0
[<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
[<ffffffff8145e722>] __scsi_scan_target+0x112/0x680
[<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff817da069>] ? error_exit+0x29/0xb0
[<ffffffff812bcc60>] ? kobject_del+0x40/0x40
[<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0
[<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150
[<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90
[<ffffffff8145f170>] do_scan_async+0x20/0x160
[<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90
[<ffffffff810975b6>] kthread+0xa6/0xb0
[<ffffffff817db154>] kernel_thread_helper+0x4/0x10
[<ffffffff81066430>] ? finish_task_switch+0x80/0x110
[<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
[<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70
[<ffffffff817db150>] ? gs_change+0xb/0xb
stack backtrace:
Pid: 1552, comm: scsi_scan_6 Not tainted 3.0.0-rc5 #17
Call Trace:
[<ffffffff810abb9b>] lockdep_rcu_dereference+0xbb/0xc0
[<ffffffff812b6139>] __cfq_exit_single_io_context+0xe9/0x120
[<ffffffff812b626c>] cfq_exit_queue+0x7c/0x190
[<ffffffff812a5046>] elevator_exit+0x36/0x60
[<ffffffff812a802a>] blk_cleanup_queue+0x4a/0x60
[<ffffffff8145cc09>] scsi_free_queue+0x9/0x10
[<ffffffff81460944>] __scsi_remove_device+0x84/0xd0
[<ffffffff8145dca3>] scsi_probe_and_add_lun+0x353/0xb10
[<ffffffff817da069>] ? error_exit+0x29/0xb0
[<ffffffff817d98ed>] ? _raw_spin_unlock_irqrestore+0x3d/0x80
[<ffffffff8145e722>] __scsi_scan_target+0x112/0x680
[<ffffffff812c690d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff817da069>] ? error_exit+0x29/0xb0
[<ffffffff812bcc60>] ? kobject_del+0x40/0x40
[<ffffffff8145ed16>] scsi_scan_channel+0x86/0xb0
[<ffffffff8145f0b0>] scsi_scan_host_selected+0x140/0x150
[<ffffffff8145f149>] do_scsi_scan_host+0x89/0x90
[<ffffffff8145f170>] do_scan_async+0x20/0x160
[<ffffffff8145f150>] ? do_scsi_scan_host+0x90/0x90
[<ffffffff810975b6>] kthread+0xa6/0xb0
[<ffffffff817db154>] kernel_thread_helper+0x4/0x10
[<ffffffff81066430>] ? finish_task_switch+0x80/0x110
[<ffffffff817d9c04>] ? retint_restore_args+0xe/0xe
[<ffffffff81097510>] ? __kthread_init_worker+0x70/0x70
[<ffffffff817db150>] ? gs_change+0xb/0xb
Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows:
Line 2776 of block/cfq-iosched.c in v3.0-rc5 is as follows::
if (rcu_dereference(ioc->ioc_data) == cic) {
@ -70,7 +75,7 @@ case. Instead, we hold three locks, one of which might be RCU related.
And maybe that lock really does protect this reference. If so, the fix
is to inform RCU, perhaps by changing __cfq_exit_single_io_context() to
take the struct request_queue "q" from cfq_exit_queue() as an argument,
which would permit us to invoke rcu_dereference_protected as follows:
which would permit us to invoke rcu_dereference_protected as follows::
if (rcu_dereference_protected(ioc->ioc_data,
lockdep_is_held(&q->queue_lock)) == cic) {
@ -85,7 +90,7 @@ On the other hand, perhaps we really do need an RCU read-side critical
section. In this case, the critical section must span the use of the
return value from rcu_dereference(), or at least until there is some
reference count incremented or some such. One way to handle this is to
add rcu_read_lock() and rcu_read_unlock() as follows:
add rcu_read_lock() and rcu_read_unlock() as follows::
rcu_read_lock();
if (rcu_dereference(ioc->ioc_data) == cic) {
@ -102,7 +107,7 @@ above lockdep-RCU splat.
But in this particular case, we don't actually dereference the pointer
returned from rcu_dereference(). Instead, that pointer is just compared
to the cic pointer, which means that the rcu_dereference() can be replaced
by rcu_access_pointer() as follows:
by rcu_access_pointer() as follows::
if (rcu_access_pointer(ioc->ioc_data) == cic) {

Просмотреть файл

@ -1,4 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0
========================
RCU and lockdep checking
========================
All flavors of RCU have lockdep checking available, so that lockdep is
aware of when each task enters and leaves any flavor of RCU read-side
@ -8,7 +12,7 @@ tracking to include RCU state, which can sometimes help when debugging
deadlocks and the like.
In addition, RCU provides the following primitives that check lockdep's
state:
state::
rcu_read_lock_held() for normal RCU.
rcu_read_lock_bh_held() for RCU-bh.
@ -63,7 +67,7 @@ checking of rcu_dereference() primitives:
The rcu_dereference_check() check expression can be any boolean
expression, but would normally include a lockdep expression. However,
any boolean expression can be used. For a moderately ornate example,
consider the following:
consider the following::
file = rcu_dereference_check(fdt->fd[fd],
lockdep_is_held(&files->file_lock) ||
@ -82,7 +86,7 @@ RCU read-side critical sections, in case (2) the ->file_lock prevents
any change from taking place, and finally, in case (3) the current task
is the only task accessing the file_struct, again preventing any change
from taking place. If the above statement was invoked only from updater
code, it could instead be written as follows:
code, it could instead be written as follows::
file = rcu_dereference_protected(fdt->fd[fd],
lockdep_is_held(&files->file_lock) ||
@ -105,7 +109,7 @@ false and they are called from outside any RCU read-side critical section.
For example, the workqueue for_each_pwq() macro is intended to be used
either within an RCU read-side critical section or with wq->mutex held.
It is thus implemented as follows:
It is thus implemented as follows::
#define for_each_pwq(pwq, wq)
list_for_each_entry_rcu((pwq), &(wq)->pwqs, pwqs_node,

Просмотреть файл

@ -0,0 +1,200 @@
.. SPDX-License-Identifier: GPL-2.0
=================================================
Using RCU hlist_nulls to protect list and objects
=================================================
This section describes how to use hlist_nulls to
protect read-mostly linked lists and
objects using SLAB_TYPESAFE_BY_RCU allocations.
Please read the basics in Documentation/RCU/listRCU.rst
Using 'nulls'
=============
Using special makers (called 'nulls') is a convenient way
to solve following problem :
A typical RCU linked list managing objects which are
allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can
use following algos :
1) Lookup algo
--------------
::
rcu_read_lock()
begin:
obj = lockless_lookup(key);
if (obj) {
if (!try_get_ref(obj)) // might fail for free objects
goto begin;
/*
* Because a writer could delete object, and a writer could
* reuse these object before the RCU grace period, we
* must check key after getting the reference on object
*/
if (obj->key != key) { // not the object we expected
put_ref(obj);
goto begin;
}
}
rcu_read_unlock();
Beware that lockless_lookup(key) cannot use traditional hlist_for_each_entry_rcu()
but a version with an additional memory barrier (smp_rmb())
::
lockless_lookup(key)
{
struct hlist_node *node, *next;
for (pos = rcu_dereference((head)->first);
pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
pos = rcu_dereference(next))
if (obj->key == key)
return obj;
return NULL;
}
And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb()::
struct hlist_node *node;
for (pos = rcu_dereference((head)->first);
pos && ({ prefetch(pos->next); 1; }) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
pos = rcu_dereference(pos->next))
if (obj->key == key)
return obj;
return NULL;
Quoting Corey Minyard::
"If the object is moved from one list to another list in-between the
time the hash is calculated and the next field is accessed, and the
object has moved to the end of a new list, the traversal will not
complete properly on the list it should have, since the object will
be on the end of the new list and there's not a way to tell it's on a
new list and restart the list traversal. I think that this can be
solved by pre-fetching the "next" field (with proper barriers) before
checking the key."
2) Insert algo
--------------
We need to make sure a reader cannot read the new 'obj->obj_next' value
and previous value of 'obj->key'. Or else, an item could be deleted
from a chain, and inserted into another chain. If new chain was empty
before the move, 'next' pointer is NULL, and lockless reader can
not detect it missed following items in original chain.
::
/*
* Please note that new inserts are done at the head of list,
* not in the middle or end.
*/
obj = kmem_cache_alloc(...);
lock_chain(); // typically a spin_lock()
obj->key = key;
/*
* we need to make sure obj->key is updated before obj->next
* or obj->refcnt
*/
smp_wmb();
atomic_set(&obj->refcnt, 1);
hlist_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()
3) Remove algo
--------------
Nothing special here, we can use a standard RCU hlist deletion.
But thanks to SLAB_TYPESAFE_BY_RCU, beware a deleted object can be reused
very very fast (before the end of RCU grace period)
::
if (put_last_reference_on(obj) {
lock_chain(); // typically a spin_lock()
hlist_del_init_rcu(&obj->obj_node);
unlock_chain(); // typically a spin_unlock()
kmem_cache_free(cachep, obj);
}
--------------------------------------------------------------------------
Avoiding extra smp_rmb()
========================
With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
and extra smp_wmb() in insert function.
For example, if we choose to store the slot number as the 'nulls'
end-of-list marker for each slot of the hash table, we can detect
a race (some writer did a delete and/or a move of an object
to another chain) checking the final 'nulls' value if
the lookup met the end of chain. If final 'nulls' value
is not the slot number, then we must restart the lookup at
the beginning. If the object was moved to the same chain,
then the reader doesn't care : It might eventually
scan the list again without harm.
1) lookup algo
--------------
::
head = &table[slot];
rcu_read_lock();
begin:
hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
if (obj->key == key) {
if (!try_get_ref(obj)) // might fail for free objects
goto begin;
if (obj->key != key) { // not the object we expected
put_ref(obj);
goto begin;
}
goto out;
}
/*
* if the nulls value we got at the end of this lookup is
* not the expected one, we must restart lookup.
* We probably met an item that was moved to another chain.
*/
if (get_nulls_value(node) != slot)
goto begin;
obj = NULL;
out:
rcu_read_unlock();
2) Insert function
------------------
::
/*
* Please note that new inserts are done at the head of list,
* not in the middle or end.
*/
obj = kmem_cache_alloc(cachep);
lock_chain(); // typically a spin_lock()
obj->key = key;
/*
* changes to obj->key must be visible before refcnt one
*/
smp_wmb();
atomic_set(&obj->refcnt, 1);
/*
* insert obj in RCU way (readers might be traversing chain)
*/
hlist_nulls_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()

Просмотреть файл

@ -1,172 +0,0 @@
Using hlist_nulls to protect read-mostly linked lists and
objects using SLAB_TYPESAFE_BY_RCU allocations.
Please read the basics in Documentation/RCU/listRCU.rst
Using special makers (called 'nulls') is a convenient way
to solve following problem :
A typical RCU linked list managing objects which are
allocated with SLAB_TYPESAFE_BY_RCU kmem_cache can
use following algos :
1) Lookup algo
--------------
rcu_read_lock()
begin:
obj = lockless_lookup(key);
if (obj) {
if (!try_get_ref(obj)) // might fail for free objects
goto begin;
/*
* Because a writer could delete object, and a writer could
* reuse these object before the RCU grace period, we
* must check key after getting the reference on object
*/
if (obj->key != key) { // not the object we expected
put_ref(obj);
goto begin;
}
}
rcu_read_unlock();
Beware that lockless_lookup(key) cannot use traditional hlist_for_each_entry_rcu()
but a version with an additional memory barrier (smp_rmb())
lockless_lookup(key)
{
struct hlist_node *node, *next;
for (pos = rcu_dereference((head)->first);
pos && ({ next = pos->next; smp_rmb(); prefetch(next); 1; }) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
pos = rcu_dereference(next))
if (obj->key == key)
return obj;
return NULL;
And note the traditional hlist_for_each_entry_rcu() misses this smp_rmb() :
struct hlist_node *node;
for (pos = rcu_dereference((head)->first);
pos && ({ prefetch(pos->next); 1; }) &&
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });
pos = rcu_dereference(pos->next))
if (obj->key == key)
return obj;
return NULL;
}
Quoting Corey Minyard :
"If the object is moved from one list to another list in-between the
time the hash is calculated and the next field is accessed, and the
object has moved to the end of a new list, the traversal will not
complete properly on the list it should have, since the object will
be on the end of the new list and there's not a way to tell it's on a
new list and restart the list traversal. I think that this can be
solved by pre-fetching the "next" field (with proper barriers) before
checking the key."
2) Insert algo :
----------------
We need to make sure a reader cannot read the new 'obj->obj_next' value
and previous value of 'obj->key'. Or else, an item could be deleted
from a chain, and inserted into another chain. If new chain was empty
before the move, 'next' pointer is NULL, and lockless reader can
not detect it missed following items in original chain.
/*
* Please note that new inserts are done at the head of list,
* not in the middle or end.
*/
obj = kmem_cache_alloc(...);
lock_chain(); // typically a spin_lock()
obj->key = key;
/*
* we need to make sure obj->key is updated before obj->next
* or obj->refcnt
*/
smp_wmb();
atomic_set(&obj->refcnt, 1);
hlist_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()
3) Remove algo
--------------
Nothing special here, we can use a standard RCU hlist deletion.
But thanks to SLAB_TYPESAFE_BY_RCU, beware a deleted object can be reused
very very fast (before the end of RCU grace period)
if (put_last_reference_on(obj) {
lock_chain(); // typically a spin_lock()
hlist_del_init_rcu(&obj->obj_node);
unlock_chain(); // typically a spin_unlock()
kmem_cache_free(cachep, obj);
}
--------------------------------------------------------------------------
With hlist_nulls we can avoid extra smp_rmb() in lockless_lookup()
and extra smp_wmb() in insert function.
For example, if we choose to store the slot number as the 'nulls'
end-of-list marker for each slot of the hash table, we can detect
a race (some writer did a delete and/or a move of an object
to another chain) checking the final 'nulls' value if
the lookup met the end of chain. If final 'nulls' value
is not the slot number, then we must restart the lookup at
the beginning. If the object was moved to the same chain,
then the reader doesn't care : It might eventually
scan the list again without harm.
1) lookup algo
head = &table[slot];
rcu_read_lock();
begin:
hlist_nulls_for_each_entry_rcu(obj, node, head, member) {
if (obj->key == key) {
if (!try_get_ref(obj)) // might fail for free objects
goto begin;
if (obj->key != key) { // not the object we expected
put_ref(obj);
goto begin;
}
goto out;
}
/*
* if the nulls value we got at the end of this lookup is
* not the expected one, we must restart lookup.
* We probably met an item that was moved to another chain.
*/
if (get_nulls_value(node) != slot)
goto begin;
obj = NULL;
out:
rcu_read_unlock();
2) Insert function :
--------------------
/*
* Please note that new inserts are done at the head of list,
* not in the middle or end.
*/
obj = kmem_cache_alloc(cachep);
lock_chain(); // typically a spin_lock()
obj->key = key;
/*
* changes to obj->key must be visible before refcnt one
*/
smp_wmb();
atomic_set(&obj->refcnt, 1);
/*
* insert obj in RCU way (readers might be traversing chain)
*/
hlist_nulls_add_head_rcu(&obj->obj_node, list);
unlock_chain(); // typically a spin_unlock()

Просмотреть файл

@ -1,4 +1,8 @@
Reference-count design for elements of lists/arrays protected by RCU.
.. SPDX-License-Identifier: GPL-2.0
====================================================================
Reference-count design for elements of lists/arrays protected by RCU
====================================================================
Please note that the percpu-ref feature is likely your first
@ -12,32 +16,33 @@ please read on.
Reference counting on elements of lists which are protected by traditional
reader/writer spinlocks or semaphores are straightforward:
CODE LISTING A:
1. 2.
add() search_and_reference()
{ {
alloc_object read_lock(&list_lock);
... search_for_element
atomic_set(&el->rc, 1); atomic_inc(&el->rc);
write_lock(&list_lock); ...
add_element read_unlock(&list_lock);
... ...
write_unlock(&list_lock); }
}
CODE LISTING A::
3. 4.
release_referenced() delete()
{ {
... write_lock(&list_lock);
if(atomic_dec_and_test(&el->rc)) ...
kfree(el);
... remove_element
} write_unlock(&list_lock);
...
if (atomic_dec_and_test(&el->rc))
kfree(el);
...
}
1. 2.
add() search_and_reference()
{ {
alloc_object read_lock(&list_lock);
... search_for_element
atomic_set(&el->rc, 1); atomic_inc(&el->rc);
write_lock(&list_lock); ...
add_element read_unlock(&list_lock);
... ...
write_unlock(&list_lock); }
}
3. 4.
release_referenced() delete()
{ {
... write_lock(&list_lock);
if(atomic_dec_and_test(&el->rc)) ...
kfree(el);
... remove_element
} write_unlock(&list_lock);
...
if (atomic_dec_and_test(&el->rc))
kfree(el);
...
}
If this list/array is made lock free using RCU as in changing the
write_lock() in add() and delete() to spin_lock() and changing read_lock()
@ -46,34 +51,35 @@ search_and_reference() could potentially hold reference to an element which
has already been deleted from the list/array. Use atomic_inc_not_zero()
in this scenario as follows:
CODE LISTING B:
1. 2.
add() search_and_reference()
{ {
alloc_object rcu_read_lock();
... search_for_element
atomic_set(&el->rc, 1); if (!atomic_inc_not_zero(&el->rc)) {
spin_lock(&list_lock); rcu_read_unlock();
return FAIL;
add_element }
... ...
spin_unlock(&list_lock); rcu_read_unlock();
} }
3. 4.
release_referenced() delete()
{ {
... spin_lock(&list_lock);
if (atomic_dec_and_test(&el->rc)) ...
call_rcu(&el->head, el_free); remove_element
... spin_unlock(&list_lock);
} ...
if (atomic_dec_and_test(&el->rc))
call_rcu(&el->head, el_free);
...
}
CODE LISTING B::
1. 2.
add() search_and_reference()
{ {
alloc_object rcu_read_lock();
... search_for_element
atomic_set(&el->rc, 1); if (!atomic_inc_not_zero(&el->rc)) {
spin_lock(&list_lock); rcu_read_unlock();
return FAIL;
add_element }
... ...
spin_unlock(&list_lock); rcu_read_unlock();
} }
3. 4.
release_referenced() delete()
{ {
... spin_lock(&list_lock);
if (atomic_dec_and_test(&el->rc)) ...
call_rcu(&el->head, el_free); remove_element
... spin_unlock(&list_lock);
} ...
if (atomic_dec_and_test(&el->rc))
call_rcu(&el->head, el_free);
...
}
Sometimes, a reference to the element needs to be obtained in the
update (write) stream. In such cases, atomic_inc_not_zero() might be
update (write) stream. In such cases, atomic_inc_not_zero() might be
overkill, since we hold the update-side spinlock. One might instead
use atomic_inc() in such cases.
@ -82,39 +88,40 @@ search_and_reference() code path. In such cases, the
atomic_dec_and_test() may be moved from delete() to el_free()
as follows:
CODE LISTING C:
1. 2.
add() search_and_reference()
{ {
alloc_object rcu_read_lock();
... search_for_element
atomic_set(&el->rc, 1); atomic_inc(&el->rc);
spin_lock(&list_lock); ...
CODE LISTING C::
add_element rcu_read_unlock();
... }
spin_unlock(&list_lock); 4.
} delete()
3. {
release_referenced() spin_lock(&list_lock);
{ ...
... remove_element
if (atomic_dec_and_test(&el->rc)) spin_unlock(&list_lock);
kfree(el); ...
... call_rcu(&el->head, el_free);
} ...
5. }
void el_free(struct rcu_head *rhp)
{
release_referenced();
}
1. 2.
add() search_and_reference()
{ {
alloc_object rcu_read_lock();
... search_for_element
atomic_set(&el->rc, 1); atomic_inc(&el->rc);
spin_lock(&list_lock); ...
add_element rcu_read_unlock();
... }
spin_unlock(&list_lock); 4.
} delete()
3. {
release_referenced() spin_lock(&list_lock);
{ ...
... remove_element
if (atomic_dec_and_test(&el->rc)) spin_unlock(&list_lock);
kfree(el); ...
... call_rcu(&el->head, el_free);
} ...
5. }
void el_free(struct rcu_head *rhp)
{
release_referenced();
}
The key point is that the initial reference added by add() is not removed
until after a grace period has elapsed following removal. This means that
search_and_reference() cannot find this element, which means that the value
of el->rc cannot increase. Thus, once it reaches zero, there are no
readers that can or ever will be able to reference the element. The
element can therefore safely be freed. This in turn guarantees that if
readers that can or ever will be able to reference the element. The
element can therefore safely be freed. This in turn guarantees that if
any reader finds the element, that reader may safely acquire a reference
without checking the value of the reference counter.
@ -130,21 +137,21 @@ the eventual invocation of kfree(), which is usually not a problem on
modern computer systems, even the small ones.
In cases where delete() can sleep, synchronize_rcu() can be called from
delete(), so that el_free() can be subsumed into delete as follows:
delete(), so that el_free() can be subsumed into delete as follows::
4.
delete()
{
spin_lock(&list_lock);
...
remove_element
spin_unlock(&list_lock);
...
synchronize_rcu();
if (atomic_dec_and_test(&el->rc))
kfree(el);
...
}
4.
delete()
{
spin_lock(&list_lock);
...
remove_element
spin_unlock(&list_lock);
...
synchronize_rcu();
if (atomic_dec_and_test(&el->rc))
kfree(el);
...
}
As additional examples in the kernel, the pattern in listing C is used by
reference counting of struct pid, while the pattern in listing B is used by

Просмотреть файл

@ -1,4 +1,8 @@
.. SPDX-License-Identifier: GPL-2.0
==============================
Using RCU's CPU Stall Detector
==============================
This document first discusses what sorts of issues RCU's CPU stall
detector can locate, and then discusses kernel parameters and Kconfig
@ -7,39 +11,40 @@ this document explains the stall detector's "splat" format.
What Causes RCU CPU Stall Warnings?
===================================
So your kernel printed an RCU CPU stall warning. The next question is
"What caused it?" The following problems can result in RCU CPU stall
warnings:
o A CPU looping in an RCU read-side critical section.
- A CPU looping in an RCU read-side critical section.
o A CPU looping with interrupts disabled.
- A CPU looping with interrupts disabled.
o A CPU looping with preemption disabled.
- A CPU looping with preemption disabled.
o A CPU looping with bottom halves disabled.
- A CPU looping with bottom halves disabled.
o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
- For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
without invoking schedule(). If the looping in the kernel is
really expected and desirable behavior, you might need to add
some calls to cond_resched().
o Booting Linux using a console connection that is too slow to
- Booting Linux using a console connection that is too slow to
keep up with the boot-time console-message rate. For example,
a 115Kbaud serial console can be -way- too slow to keep up
with boot-time message rates, and will frequently result in
RCU CPU stall warning messages. Especially if you have added
debug printk()s.
o Anything that prevents RCU's grace-period kthreads from running.
- Anything that prevents RCU's grace-period kthreads from running.
This can result in the "All QSes seen" console-log message.
This message will include information on when the kthread last
ran and how often it should be expected to run. It can also
result in the "rcu_.*kthread starved for" console-log message,
result in the ``rcu_.*kthread starved for`` console-log message,
which will include additional debugging information.
o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
- A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
happen to preempt a low-priority task in the middle of an RCU
read-side critical section. This is especially damaging if
that low-priority task is not permitted to run on any other CPU,
@ -48,7 +53,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
While the system is in the process of running itself out of
memory, you might see stall-warning messages.
o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
- A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
is running at a higher priority than the RCU softirq threads.
This will prevent RCU callbacks from ever being invoked,
and in a CONFIG_PREEMPT_RCU kernel will further prevent
@ -63,7 +68,7 @@ o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
can increase your system's context-switch rate and thus degrade
performance.
o A periodic interrupt whose handler takes longer than the time
- A periodic interrupt whose handler takes longer than the time
interval between successive pairs of interrupts. This can
prevent RCU's kthreads and softirq handlers from running.
Note that certain high-overhead debugging options, for example
@ -71,20 +76,27 @@ o A periodic interrupt whose handler takes longer than the time
considerably longer than normal, which can in turn result in
RCU CPU stall warnings.
o Testing a workload on a fast system, tuning the stall-warning
- Testing a workload on a fast system, tuning the stall-warning
timeout down to just barely avoid RCU CPU stall warnings, and then
running the same workload with the same stall-warning timeout on a
slow system. Note that thermal throttling and on-demand governors
can cause a single system to be sometimes fast and sometimes slow!
o A hardware or software issue shuts off the scheduler-clock
- A hardware or software issue shuts off the scheduler-clock
interrupt on a CPU that is not in dyntick-idle mode. This
problem really has happened, and seems to be most likely to
result in RCU CPU stall warnings for CONFIG_NO_HZ_COMMON=n kernels.
o A bug in the RCU implementation.
- A hardware or software issue that prevents time-based wakeups
from occurring. These issues can range from misconfigured or
buggy timer hardware through bugs in the interrupt or exception
path (whether hardware, firmware, or software) through bugs
in Linux's timer subsystem through bugs in the scheduler, and,
yes, even including bugs in RCU itself.
o A hardware failure. This is quite unlikely, but has occurred
- A bug in the RCU implementation.
- A hardware failure. This is quite unlikely, but has occurred
at least once in real life. A CPU failed in a running system,
becoming unresponsive, but not causing an immediate crash.
This resulted in a series of RCU CPU stall warnings, eventually
@ -109,6 +121,7 @@ see include/trace/events/rcu.h.
Fine-Tuning the RCU CPU Stall Detector
======================================
The rcuupdate.rcu_cpu_stall_suppress module parameter disables RCU's
CPU stall detector, which detects conditions that unduly delay RCU grace
@ -118,6 +131,7 @@ The stall detector's idea of what constitutes "unduly delayed" is
controlled by a set of kernel configuration variables and cpp macros:
CONFIG_RCU_CPU_STALL_TIMEOUT
----------------------------
This kernel configuration parameter defines the period of time
that RCU will wait from the beginning of a grace period until it
@ -137,6 +151,7 @@ CONFIG_RCU_CPU_STALL_TIMEOUT
/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress.
RCU_STALL_DELAY_DELTA
---------------------
Although the lockdep facility is extremely useful, it does add
some overhead. Therefore, under CONFIG_PROVE_RCU, the
@ -145,6 +160,7 @@ RCU_STALL_DELAY_DELTA
macro, not a kernel configuration parameter.)
RCU_STALL_RAT_DELAY
-------------------
The CPU stall detector tries to make the offending CPU print its
own warnings, as this often gives better-quality stack traces.
@ -155,6 +171,7 @@ RCU_STALL_RAT_DELAY
parameter.)
rcupdate.rcu_task_stall_timeout
-------------------------------
This boot/sysfs parameter controls the RCU-tasks stall warning
interval. A value of zero or less suppresses RCU-tasks stall
@ -168,9 +185,10 @@ rcupdate.rcu_task_stall_timeout
Interpreting RCU's CPU Stall-Detector "Splats"
==============================================
For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
it will print a message similar to the following:
it will print a message similar to the following::
INFO: rcu_sched detected stalls on CPUs/tasks:
2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
@ -223,7 +241,7 @@ an estimate of the total number of RCU callbacks queued across all CPUs
(625 in this case).
In kernels with CONFIG_RCU_FAST_NO_HZ, more information is printed
for each CPU:
for each CPU::
0: (64628 ticks this GP) idle=dd5/3fffffffffffffff/0 softirq=82/543 last_accelerate: a345/d342 dyntick_enabled: 1
@ -235,7 +253,7 @@ processing is enabled.
If the grace period ends just as the stall warning starts printing,
there will be a spurious stall-warning message, which will include
the following:
the following::
INFO: Stall ended before state dump start
@ -248,7 +266,7 @@ which is overkill for this sort of problem.
If all CPUs and tasks have passed through quiescent states, but the
grace period has nevertheless failed to end, the stall-warning splat
will include something like the following:
will include something like the following::
All QSes seen, last rcu_preempt kthread activity 23807 (4297905177-4297881370), jiffies_till_next_fqs=3, root ->qsmask 0x0
@ -261,7 +279,7 @@ which is way less than 23807. Finally, the root rcu_node structure's
If the relevant grace-period kthread has been unable to run prior to
the stall warning, as was the case in the "All QSes seen" line above,
the following additional line is printed:
the following additional line is printed::
kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
@ -276,6 +294,7 @@ kthread last ran on CPU 5.
Multiple Warnings From One Stall
================================
If a stall lasts long enough, multiple stall-warning messages will be
printed for it. The second and subsequent messages are printed at
@ -285,9 +304,10 @@ of the stall and the first message.
Stall Warnings for Expedited Grace Periods
==========================================
If an expedited grace period detects a stall, it will place a message
like the following in dmesg:
like the following in dmesg::
INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 7-... } 21119 jiffies s: 73 root: 0x2/.

Просмотреть файл

@ -1,7 +1,12 @@
.. SPDX-License-Identifier: GPL-2.0
==========================
RCU Torture Test Operation
==========================
CONFIG_RCU_TORTURE_TEST
=======================
The CONFIG_RCU_TORTURE_TEST config option is available for all RCU
implementations. It creates an rcutorture kernel module that can
@ -13,9 +18,10 @@ when the module is loaded, and stops when the module is unloaded.
Module parameters are prefixed by "rcutorture." in
Documentation/admin-guide/kernel-parameters.txt.
OUTPUT
Output
======
The statistics output is as follows:
The statistics output is as follows::
rcu-torture:--- Start of test: nreaders=16 nfakewriters=4 stat_interval=30 verbose=0 test_no_idle_hz=1 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4
rcu-torture: rtc: (null) ver: 155441 tfle: 0 rta: 155441 rtaf: 8884 rtf: 155440 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 3055767
@ -36,53 +42,53 @@ automatic determination as to whether RCU operated correctly.
The entries are as follows:
o "rtc": The hexadecimal address of the structure currently visible
* "rtc": The hexadecimal address of the structure currently visible
to readers.
o "ver": The number of times since boot that the RCU writer task
* "ver": The number of times since boot that the RCU writer task
has changed the structure visible to readers.
o "tfle": If non-zero, indicates that the "torture freelist"
* "tfle": If non-zero, indicates that the "torture freelist"
containing structures to be placed into the "rtc" area is empty.
This condition is important, since it can fool you into thinking
that RCU is working when it is not. :-/
o "rta": Number of structures allocated from the torture freelist.
* "rta": Number of structures allocated from the torture freelist.
o "rtaf": Number of allocations from the torture freelist that have
* "rtaf": Number of allocations from the torture freelist that have
failed due to the list being empty. It is not unusual for this
to be non-zero, but it is bad for it to be a large fraction of
the value indicated by "rta".
o "rtf": Number of frees into the torture freelist.
* "rtf": Number of frees into the torture freelist.
o "rtmbe": A non-zero value indicates that rcutorture believes that
* "rtmbe": A non-zero value indicates that rcutorture believes that
rcu_assign_pointer() and rcu_dereference() are not working
correctly. This value should be zero.
o "rtbe": A non-zero value indicates that one of the rcu_barrier()
* "rtbe": A non-zero value indicates that one of the rcu_barrier()
family of functions is not working correctly.
o "rtbke": rcutorture was unable to create the real-time kthreads
* "rtbke": rcutorture was unable to create the real-time kthreads
used to force RCU priority inversion. This value should be zero.
o "rtbre": Although rcutorture successfully created the kthreads
* "rtbre": Although rcutorture successfully created the kthreads
used to force RCU priority inversion, it was unable to set them
to the real-time priority level of 1. This value should be zero.
o "rtbf": The number of times that RCU priority boosting failed
* "rtbf": The number of times that RCU priority boosting failed
to resolve RCU priority inversion.
o "rtb": The number of times that rcutorture attempted to force
* "rtb": The number of times that rcutorture attempted to force
an RCU priority inversion condition. If you are testing RCU
priority boosting via the "test_boost" module parameter, this
value should be non-zero.
o "nt": The number of times rcutorture ran RCU read-side code from
* "nt": The number of times rcutorture ran RCU read-side code from
within a timer handler. This value should be non-zero only
if you specified the "irqreader" module parameter.
o "Reader Pipe": Histogram of "ages" of structures seen by readers.
* "Reader Pipe": Histogram of "ages" of structures seen by readers.
If any entries past the first two are non-zero, RCU is broken.
And rcutorture prints the error flag string "!!!" to make sure
you notice. The age of a newly allocated structure is zero,
@ -94,14 +100,14 @@ o "Reader Pipe": Histogram of "ages" of structures seen by readers.
RCU. If you want to see what it looks like when broken, break
it yourself. ;-)
o "Reader Batch": Another histogram of "ages" of structures seen
* "Reader Batch": Another histogram of "ages" of structures seen
by readers, but in terms of counter flips (or batches) rather
than in terms of grace periods. The legal number of non-zero
entries is again two. The reason for this separate view is that
it is sometimes easier to get the third entry to show up in the
"Reader Batch" list than in the "Reader Pipe" list.
o "Free-Block Circulation": Shows the number of torture structures
* "Free-Block Circulation": Shows the number of torture structures
that have reached a given point in the pipeline. The first element
should closely correspond to the number of structures allocated,
the second to the number that have been removed from reader view,
@ -112,7 +118,7 @@ o "Free-Block Circulation": Shows the number of torture structures
Different implementations of RCU can provide implementation-specific
additional information. For example, Tree SRCU provides the following
additional line:
additional line::
srcud-torture: Tree SRCU per-CPU(idx=0): 0(35,-21) 1(-4,24) 2(1,1) 3(-26,20) 4(28,-47) 5(-9,4) 6(-10,14) 7(-14,11) T(1,6)
@ -123,15 +129,15 @@ using a dynamically allocated srcu_struct (hence "srcud-" rather than
"old" and "current" values to the underlying array, and is useful for
debugging. The final "T" entry contains the totals of the counters.
USAGE ON SPECIFIC KERNEL BUILDS
Usage on Specific Kernel Builds
===============================
It is sometimes desirable to torture RCU on a specific kernel build,
for example, when preparing to put that kernel build into production.
In that case, the kernel should be built with CONFIG_RCU_TORTURE_TEST=m
so that the test can be started using modprobe and terminated using rmmod.
For example, the following script may be used to torture RCU:
For example, the following script may be used to torture RCU::
#!/bin/sh
@ -148,7 +154,8 @@ two are self-explanatory, while the last indicates that while there
were no RCU failures, CPU-hotplug problems were detected.
USAGE ON MAINLINE KERNELS
Usage on Mainline Kernels
=========================
When using rcutorture to test changes to RCU itself, it is often
necessary to build a number of kernels in order to test that change
@ -180,16 +187,16 @@ to Tree SRCU might run only the SRCU-N and SRCU-P scenarios using the
--configs argument to kvm.sh as follows: "--configs 'SRCU-N SRCU-P'".
Large systems can run multiple copies of of the full set of scenarios,
for example, a system with 448 hardware threads can run five instances
of the full set concurrently. To make this happen:
of the full set concurrently. To make this happen::
kvm.sh --cpus 448 --configs '5*CFLIST'
Alternatively, such a system can run 56 concurrent instances of a single
eight-CPU scenario:
eight-CPU scenario::
kvm.sh --cpus 448 --configs '56*TREE04'
Or 28 concurrent instances of each of two eight-CPU scenarios:
Or 28 concurrent instances of each of two eight-CPU scenarios::
kvm.sh --cpus 448 --configs '28*TREE03 28*TREE04'
@ -199,14 +206,14 @@ values for memory may require disabling the callback-flooding tests
using the --bootargs parameter discussed below.
Sometimes additional debugging is useful, and in such cases the --kconfig
parameter to kvm.sh may be used, for example, "--kconfig 'CONFIG_KASAN=y'".
parameter to kvm.sh may be used, for example, ``--kconfig 'CONFIG_KASAN=y'``.
Kernel boot arguments can also be supplied, for example, to control
rcutorture's module parameters. For example, to test a change to RCU's
CPU stall-warning code, use "--bootargs 'rcutorture.stall_cpu=30'".
This will of course result in the scripting reporting a failure, namely
the resuling RCU CPU stall warning. As noted above, reducing memory may
require disabling rcutorture's callback-flooding tests:
require disabling rcutorture's callback-flooding tests::
kvm.sh --cpus 448 --configs '56*TREE04' --memory 128M \
--bootargs 'rcutorture.fwd_progress=0'
@ -225,7 +232,7 @@ is listed at the end of the kvm.sh output, which you really should redirect
to a file. The build products and console output of each run is kept in
tools/testing/selftests/rcutorture/res in timestamped directories. A
given directory can be supplied to kvm-find-errors.sh in order to have
it cycle you through summaries of errors and full error logs. For example:
it cycle you through summaries of errors and full error logs. For example::
tools/testing/selftests/rcutorture/bin/kvm-find-errors.sh \
tools/testing/selftests/rcutorture/res/2020.01.20-15.54.23
@ -245,38 +252,42 @@ that was tested and any uncommitted changes in diff format.
The most frequently used files in each per-scenario-run directory are:
.config: This file contains the Kconfig options.
.config:
This file contains the Kconfig options.
Make.out: This contains build output for a specific scenario.
Make.out:
This contains build output for a specific scenario.
console.log: This contains the console output for a specific scenario.
console.log:
This contains the console output for a specific scenario.
This file may be examined once the kernel has booted, but
it might not exist if the build failed.
vmlinux: This contains the kernel, which can be useful with tools like
vmlinux:
This contains the kernel, which can be useful with tools like
objdump and gdb.
A number of additional files are available, but are less frequently used.
Many are intended for debugging of rcutorture itself or of its scripting.
As of v5.4, a successful run with the default set of scenarios produces
the following summary at the end of the run on a 12-CPU system:
the following summary at the end of the run on a 12-CPU system::
SRCU-N ------- 804233 GPs (148.932/s) [srcu: g10008272 f0x0 ]
SRCU-P ------- 202320 GPs (37.4667/s) [srcud: g1809476 f0x0 ]
SRCU-t ------- 1122086 GPs (207.794/s) [srcu: g0 f0x0 ]
SRCU-u ------- 1111285 GPs (205.794/s) [srcud: g1 f0x0 ]
TASKS01 ------- 19666 GPs (3.64185/s) [tasks: g0 f0x0 ]
TASKS02 ------- 20541 GPs (3.80389/s) [tasks: g0 f0x0 ]
TASKS03 ------- 19416 GPs (3.59556/s) [tasks: g0 f0x0 ]
TINY01 ------- 836134 GPs (154.84/s) [rcu: g0 f0x0 ] n_max_cbs: 34198
TINY02 ------- 850371 GPs (157.476/s) [rcu: g0 f0x0 ] n_max_cbs: 2631
TREE01 ------- 162625 GPs (30.1157/s) [rcu: g1124169 f0x0 ]
TREE02 ------- 333003 GPs (61.6672/s) [rcu: g2647753 f0x0 ] n_max_cbs: 35844
TREE03 ------- 306623 GPs (56.782/s) [rcu: g2975325 f0x0 ] n_max_cbs: 1496497
CPU count limited from 16 to 12
TREE04 ------- 246149 GPs (45.5831/s) [rcu: g1695737 f0x0 ] n_max_cbs: 434961
TREE05 ------- 314603 GPs (58.2598/s) [rcu: g2257741 f0x2 ] n_max_cbs: 193997
TREE07 ------- 167347 GPs (30.9902/s) [rcu: g1079021 f0x0 ] n_max_cbs: 478732
CPU count limited from 16 to 12
TREE09 ------- 752238 GPs (139.303/s) [rcu: g13075057 f0x0 ] n_max_cbs: 99011
SRCU-N ------- 804233 GPs (148.932/s) [srcu: g10008272 f0x0 ]
SRCU-P ------- 202320 GPs (37.4667/s) [srcud: g1809476 f0x0 ]
SRCU-t ------- 1122086 GPs (207.794/s) [srcu: g0 f0x0 ]
SRCU-u ------- 1111285 GPs (205.794/s) [srcud: g1 f0x0 ]
TASKS01 ------- 19666 GPs (3.64185/s) [tasks: g0 f0x0 ]
TASKS02 ------- 20541 GPs (3.80389/s) [tasks: g0 f0x0 ]
TASKS03 ------- 19416 GPs (3.59556/s) [tasks: g0 f0x0 ]
TINY01 ------- 836134 GPs (154.84/s) [rcu: g0 f0x0 ] n_max_cbs: 34198
TINY02 ------- 850371 GPs (157.476/s) [rcu: g0 f0x0 ] n_max_cbs: 2631
TREE01 ------- 162625 GPs (30.1157/s) [rcu: g1124169 f0x0 ]
TREE02 ------- 333003 GPs (61.6672/s) [rcu: g2647753 f0x0 ] n_max_cbs: 35844
TREE03 ------- 306623 GPs (56.782/s) [rcu: g2975325 f0x0 ] n_max_cbs: 1496497
CPU count limited from 16 to 12
TREE04 ------- 246149 GPs (45.5831/s) [rcu: g1695737 f0x0 ] n_max_cbs: 434961
TREE05 ------- 314603 GPs (58.2598/s) [rcu: g2257741 f0x2 ] n_max_cbs: 193997
TREE07 ------- 167347 GPs (30.9902/s) [rcu: g1079021 f0x0 ] n_max_cbs: 478732
CPU count limited from 16 to 12
TREE09 ------- 752238 GPs (139.303/s) [rcu: g13075057 f0x0 ] n_max_cbs: 99011

Просмотреть файл

@ -19,9 +19,10 @@ attach to other running processes (e.g. Firefox, SSH sessions, GPG agent,
etc) to extract additional credentials and continue to expand the scope
of their attack without resorting to user-assisted phishing.
This is not a theoretical problem. SSH session hijacking
(http://www.storm.net.nz/projects/7) and arbitrary code injection
(http://c-skills.blogspot.com/2007/05/injectso.html) attacks already
This is not a theoretical problem. `SSH session hijacking
<https://www.blackhat.com/presentations/bh-usa-05/bh-us-05-boileau.pdf>`_
and `arbitrary code injection
<https://c-skills.blogspot.com/2007/05/injectso.html>`_ attacks already
exist and remain possible if ptrace is allowed to operate as before.
Since ptrace is not commonly used by non-developers and non-admins, system
builders should be allowed the option to disable this debugging system.

Просмотреть файл

@ -10,7 +10,7 @@ Description
clusters and in this context, is a "drop-in" replacement for shared
storage. Simplistically, you could see it as a network RAID 1.
Please visit http://www.drbd.org to find out more.
Please visit https://www.drbd.org to find out more.
.. toctree::
:maxdepth: 1

Просмотреть файл

@ -6,7 +6,7 @@ FAQ list:
=========
A FAQ list may be found in the fdutils package (see below), and also
at <http://fdutils.linux.lu/faq.html>.
at <https://fdutils.linux.lu/faq.html>.
LILO configuration options (Thinkpad users, read this)
@ -220,11 +220,11 @@ It also contains additional documentation about the floppy driver.
The latest version can be found at fdutils homepage:
http://fdutils.linux.lu
https://fdutils.linux.lu
The fdutils releases can be found at:
http://fdutils.linux.lu/download.html
https://fdutils.linux.lu/download.html
http://www.tux.org/pub/knaff/fdutils/

Просмотреть файл

@ -114,4 +114,4 @@ Following resources can be accounted by rdma controller.
(d) Delete resource limit::
echo echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max
echo mlx4_0 hca_handle=max hca_object=max > /sys/fs/cgroup/rdma/1/rdma.max

Просмотреть файл

@ -1483,8 +1483,7 @@ IO Interface Files
~~~~~~~~~~~~~~~~~~
io.stat
A read-only nested-keyed file which exists on non-root
cgroups.
A read-only nested-keyed file.
Lines are keyed by $MAJ:$MIN device numbers and not ordered.
The following nested keys are defined.
@ -1684,9 +1683,9 @@ per-cgroup dirty memory states are examined and the more restrictive
of the two is enforced.
cgroup writeback requires explicit support from the underlying
filesystem. Currently, cgroup writeback is implemented on ext2, ext4
and btrfs. On other filesystems, all writeback IOs are attributed to
the root cgroup.
filesystem. Currently, cgroup writeback is implemented on ext2, ext4,
btrfs, f2fs, and xfs. On other filesystems, all writeback IOs are
attributed to the root cgroup.
There are inherent differences in memory and writeback management
which affects how cgroup ownership is tracked. Memory is tracked per
@ -2043,7 +2042,7 @@ RDMA
----
The "rdma" controller regulates the distribution and accounting of
of RDMA resources.
RDMA resources.
RDMA Interface Files
~~~~~~~~~~~~~~~~~~~~

Просмотреть файл

@ -98,7 +98,7 @@ x) Finish support for SMB3.1.1 compression
Known Bugs
==========
See http://bugzilla.samba.org - search on product "CifsVFS" for
See https://bugzilla.samba.org - search on product "CifsVFS" for
current bug list. Also check http://bugzilla.kernel.org (Product = File System, Component = CIFS)
1) existing symbolic links (Windows reparse points) are recognized but

Просмотреть файл

@ -16,8 +16,7 @@ standard for interoperating between Macs and Windows and major NAS appliances.
Please see
MS-SMB2 (for detailed SMB2/SMB3/SMB3.1.1 protocol specification)
http://protocolfreedom.org/ and
http://samba.org/samba/PFIF/
or https://samba.org/samba/PFIF/
for more details.
@ -32,7 +31,7 @@ Build instructions
For Linux:
1) Download the kernel (e.g. from http://www.kernel.org)
1) Download the kernel (e.g. from https://www.kernel.org)
and change directory into the top of the kernel directory tree
(e.g. /usr/src/linux-2.5.73)
2) make menuconfig (or make xconfig)
@ -831,7 +830,7 @@ the active sessions and the shares that are mounted.
Enabling Kerberos (extended security) works but requires version 1.2 or later
of the helper program cifs.upcall to be present and to be configured in the
/etc/request-key.conf file. The cifs.upcall helper program is from the Samba
project(http://www.samba.org). NTLM and NTLMv2 and LANMAN support do not
project(https://www.samba.org). NTLM and NTLMv2 and LANMAN support do not
require this helper. Note that NTLMv2 security (which does not require the
cifs.upcall helper program), instead of using Kerberos, is sufficient for
some use cases.

Просмотреть файл

@ -16,7 +16,7 @@
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
# along with this program. If not, see <https://www.gnu.org/licenses/>.
#
while(<>) {

Просмотреть файл

@ -26,7 +26,7 @@ Please go to http://support.dell.com register and you can find info on
OpenManage and Dell Update packages (DUP).
Libsmbios can also be used to update BIOS on Dell systems go to
http://linux.dell.com/libsmbios/ for details.
https://linux.dell.com/libsmbios/ for details.
Dell_RBU driver supports BIOS update using the monolithic image and packetized
image methods. In case of monolithic the driver allocates a contiguous chunk

Просмотреть файл

@ -45,7 +45,7 @@ To use the target for the first time:
will format the device
3. unload the dm-integrity target
4. read the "provided_data_sectors" value from the superblock
5. load the dm-integrity target with the the target size
5. load the dm-integrity target with the target size
"provided_data_sectors"
6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
with the size "provided_data_sectors"
@ -99,7 +99,7 @@ interleave_sectors:number
the superblock is used.
meta_device:device
Don't interleave the data and metadata on on device. Use a
Don't interleave the data and metadata on the device. Use a
separate device for metadata.
buffer_sectors:number

Просмотреть файл

@ -71,7 +71,7 @@ The target is named "raid" and it accepts the following parameters::
============= ===============================================================
Reference: Chapter 4 of
http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
https://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf
<#raid_params>: The number of parameters that follow.

Просмотреть файл

@ -14,7 +14,7 @@ host-aware zoned block devices.
For a more detailed description of the zoned block device models and
their constraints see (for SCSI devices):
http://www.t10.org/drafts.htm#ZBC_Family
https://www.t10.org/drafts.htm#ZBC_Family
and (for ATA devices):

Просмотреть файл

@ -375,8 +375,9 @@
239 = /dev/uhid User-space I/O driver support for HID subsystem
240 = /dev/userio Serio driver testing device
241 = /dev/vhost-vsock Host kernel driver for virtio vsock
242 = /dev/rfkill Turning off radio transmissions (rfkill)
242-254 Reserved for local use
243-254 Reserved for local use
255 Reserved for MISC_DYNAMIC_MINOR
11 char Raw keyboard device (Linux/SPARC only)
@ -1442,7 +1443,7 @@
...
The driver and documentation may be obtained from
http://www.winradio.com/
https://www.winradio.com/
82 block I2O hard disk
0 = /dev/i2o/hdag 33rd I2O hard disk, whole disk
@ -1656,7 +1657,7 @@
dynamically, so there is no fixed mapping from subdevice
pathnames to minor numbers.
See http://www.comedi.org/ for information about the Comedi
See https://www.comedi.org/ for information about the Comedi
project.
98 block User-mode virtual block device
@ -1723,7 +1724,7 @@
implementations a kernel presence for caching and easy
mounting. For more information about the project,
write to <arla-drinkers@stacken.kth.se> or see
http://www.stacken.kth.se/project/arla/
https://www.stacken.kth.se/project/arla/
103 block Audit device
0 = /dev/audit Audit device

Просмотреть файл

@ -70,10 +70,10 @@ statements via::
nullarbor:~ # cat <debugfs>/dynamic_debug/control
# filename:lineno [module]function flags format
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012"
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012"
net/sunrpc/svc_rdma.c:323 [svcxprt_rdma]svc_rdma_cleanup =_ "SVCRDMA Module Removed, deregister RPC RDMA transport\012"
net/sunrpc/svc_rdma.c:341 [svcxprt_rdma]svc_rdma_init =_ "\011max_inline : %d\012"
net/sunrpc/svc_rdma.c:340 [svcxprt_rdma]svc_rdma_init =_ "\011sq_depth : %d\012"
net/sunrpc/svc_rdma.c:338 [svcxprt_rdma]svc_rdma_init =_ "\011max_requests : %d\012"
...
@ -93,7 +93,7 @@ the debug statement callsites with any non-default flags::
nullarbor:~ # awk '$3 != "=_"' <debugfs>/dynamic_debug/control
# filename:lineno [module]function flags format
/usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012"
net/sunrpc/svcsock.c:1603 [sunrpc]svc_send p "svc_process: st_sendto returned %d\012"
Command Language Reference
==========================
@ -156,6 +156,7 @@ against. Possible keywords are:::
``line-range`` cannot contain space, e.g.
"1-30" is valid range but "1 - 30" is not.
``module=foo`` combined keyword=value form is interchangably accepted
The meanings of each keyword are:
@ -164,15 +165,18 @@ func
of each callsite. Example::
func svc_tcp_accept
func *recv* # in rfcomm, bluetooth, ping, tcp
file
The given string is compared against either the full pathname, the
src-root relative pathname, or the basename of the source file of
each callsite. Examples::
The given string is compared against either the src-root relative
pathname, or the basename of the source file of each callsite.
Examples::
file svcsock.c
file kernel/freezer.c
file /usr/src/packages/BUILD/sgi-enhancednfs-1.4/default/net/sunrpc/svcsock.c
file kernel/freezer.c # ie column 1 of control file
file drivers/usb/* # all callsites under it
file inode.c:start_* # parse :tail as a func (above)
file inode.c:1-100 # parse :tail as a line-range (above)
module
The given string is compared against the module name
@ -182,6 +186,7 @@ module
module sunrpc
module nfsd
module drm* # both drm, drm_kms_helper
format
The given string is searched for in the dynamic debug format
@ -251,8 +256,8 @@ the syntax described above, but must not exceed 1023 characters. Your
bootloader may impose lower limits.
These ``dyndbg`` params are processed just after the ddebug tables are
processed, as part of the arch_initcall. Thus you can enable debug
messages in all code run after this arch_initcall via this boot
processed, as part of the early_initcall. Thus you can enable debug
messages in all code run after this early_initcall via this boot
parameter.
On an x86 system for example ACPI enablement is a subsys_initcall and::

Просмотреть файл

@ -395,6 +395,13 @@ When mounting an ext4 filesystem, the following option are accepted:
Documentation/filesystems/dax.txt. Note that this option is
incompatible with data=journal.
inlinecrypt
When possible, encrypt/decrypt the contents of encrypted files using the
blk-crypto framework rather than filesystem-layer encryption. This
allows the use of inline encryption hardware. The on-disk format is
unaffected. For more details, see
Documentation/block/inline-encryption.rst.
Data Mode
=========
There are 3 different data modes:
@ -611,7 +618,7 @@ kernel source: <file:fs/ext4/>
programs: http://e2fsprogs.sourceforge.net/
useful links: http://fedoraproject.org/wiki/ext3-devel
useful links: https://fedoraproject.org/wiki/ext3-devel
http://www.bullopensource.org/ext4/
http://ext4.wiki.kernel.org/index.php/Main_Page
http://fedoraproject.org/wiki/Features/Ext4
https://fedoraproject.org/wiki/Features/Ext4

Просмотреть файл

@ -14,7 +14,7 @@ to the core through the special register mechanism that is susceptible
to MDS attacks.
Affected processors
--------------------
-------------------
Core models (desktop, mobile, Xeon-E3) that implement RDRAND and/or RDSEED may
be affected.
@ -59,7 +59,7 @@ executed on another core or sibling thread using MDS techniques.
Mitigation mechanism
-------------------
--------------------
Intel will release microcode updates that modify the RDRAND, RDSEED, and
EGETKEY instructions to overwrite secret special register data in the shared
staging buffer before the secret data can be accessed by another logical
@ -118,7 +118,7 @@ with the option "srbds=". The option for this is:
============= =============================================================
SRBDS System Information
-----------------------
------------------------
The Linux kernel provides vulnerability status information through sysfs. For
SRBDS this can be accessed by the following sysfs file:
/sys/devices/system/cpu/vulnerabilities/srbds

Просмотреть файл

@ -41,6 +41,7 @@ problems and bugs in particular.
init
kdump/index
perf/index
pstore-blk
This is the beginning of a section with information of interest to
application developers. Documents covering various aspects of the kernel

Просмотреть файл

@ -93,6 +93,11 @@ It exists in the sparse memory mapping model, and it is also somewhat
similar to the mem_map variable, both of them are used to translate an
address.
MAX_PHYSMEM_BITS
----------------
Defines the maximum supported physical address space memory.
page
----
@ -399,6 +404,17 @@ KERNELPACMASK
The mask to extract the Pointer Authentication Code from a kernel virtual
address.
TCR_EL1.T1SZ
------------
Indicates the size offset of the memory region addressed by TTBR1_EL1.
The region size is 2^(64-T1SZ) bytes.
TTBR1_EL1 is the table base address register specified by ARMv8-A
architecture which is used to lookup the page-tables for the Virtual
addresses in the higher VA range (refer to ARMv8 ARM document for
more details).
arm
===

Просмотреть файл

@ -703,6 +703,11 @@
cpufreq.off=1 [CPU_FREQ]
disable the cpufreq sub-system
cpufreq.default_governor=
[CPU_FREQ] Name of the default cpufreq governor or
policy to use. This governor must be registered in the
kernel before the cpufreq driver probes.
cpu_init_udelay=N
[X86] Delay for N microsec between assert and de-assert
of APIC INIT to start processors. This delay occurs
@ -827,6 +832,21 @@
useful to also enable the page_owner functionality.
on: enable the feature
debugfs= [KNL] This parameter enables what is exposed to userspace
and debugfs internal clients.
Format: { on, no-mount, off }
on: All functions are enabled.
no-mount:
Filesystem is not registered but kernel clients can
access APIs and a crashkernel can be used to read
its content. There is nothing to mount.
off: Filesystem is not registered and clients
get a -EPERM as result when trying to register files
or directories within debugfs.
This is equivalent of the runtime functionality if
debugfs was not enabled in the kernel at all.
Default value is set in build-time with a kernel configuration.
debugpat [X86] Enable PAT debugging
decnet.addr= [HW,NET]
@ -1207,26 +1227,28 @@
Format: {"off" | "on" | "skip[mbr]"}
efi= [EFI]
Format: { "old_map", "nochunk", "noruntime", "debug",
"nosoftreserve", "disable_early_pci_dma",
"no_disable_early_pci_dma" }
old_map [X86-64]: switch to the old ioremap-based EFI
runtime services mapping. [Needs CONFIG_X86_UV=y]
Format: { "debug", "disable_early_pci_dma",
"nochunk", "noruntime", "nosoftreserve",
"novamap", "no_disable_early_pci_dma",
"old_map" }
debug: enable misc debug output.
disable_early_pci_dma: disable the busmaster bit on all
PCI bridges while in the EFI boot stub.
nochunk: disable reading files in "chunks" in the EFI
boot stub, as chunking can cause problems with some
firmware implementations.
noruntime : disable EFI runtime services support
debug: enable misc debug output
nosoftreserve: The EFI_MEMORY_SP (Specific Purpose)
attribute may cause the kernel to reserve the
memory range for a memory mapping driver to
claim. Specify efi=nosoftreserve to disable this
reservation and treat the memory by its base type
(i.e. EFI_CONVENTIONAL_MEMORY / "System RAM").
disable_early_pci_dma: Disable the busmaster bit on all
PCI bridges while in the EFI boot stub
novamap: do not call SetVirtualAddressMap().
no_disable_early_pci_dma: Leave the busmaster bit set
on all PCI bridges while in the EFI boot stub
old_map [X86-64]: switch to the old ioremap-based EFI
runtime services mapping. [Needs CONFIG_X86_UV=y]
efi_no_storage_paranoia [EFI; X86]
Using this parameter you can use more than 50% of
@ -2786,7 +2808,7 @@
touchscreen support is not enabled in the mainstream
kernel as of 2.6.30, a preliminary port can be found
in the "bleeding edge" mini2440 support kernel at
http://repo.or.cz/w/linux-2.6/mini2440.git
https://repo.or.cz/w/linux-2.6/mini2440.git
mitigations=
[X86,PPC,S390,ARM64] Control optional mitigations for
@ -3079,6 +3101,8 @@
no5lvl [X86-64] Disable 5-level paging mode. Forces
kernel to use 4-level paging instead.
nofsgsbase [X86] Disables FSGSBASE instructions.
no_console_suspend
[HW] Never suspend the console
Disable suspending of consoles during suspend and
@ -4038,6 +4062,14 @@
latencies, which will choose a value aligned
with the appropriate hardware boundaries.
rcutree.rcu_min_cached_objs= [KNL]
Minimum number of objects which are cached and
maintained per one CPU. Object size is equal
to PAGE_SIZE. The cache allows to reduce the
pressure to page allocator, also it makes the
whole algorithm to behave better in low memory
condition.
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
first attempt to force quiescent states.
@ -4258,6 +4290,20 @@
Set time (jiffies) between CPU-hotplug operations,
or zero to disable CPU-hotplug testing.
rcutorture.read_exit= [KNL]
Set the number of read-then-exit kthreads used
to test the interaction of RCU updaters and
task-exit processing.
rcutorture.read_exit_burst= [KNL]
The number of times in a given read-then-exit
episode that a set of read-then-exit kthreads
is spawned.
rcutorture.read_exit_delay= [KNL]
The delay, in seconds, between successive
read-then-exit testing episodes.
rcutorture.shuffle_interval= [KNL]
Set task-shuffle interval (s). Shuffling tasks
allows some CPUs to go into dyntick-idle mode
@ -4407,6 +4453,45 @@
reboot_cpu is s[mp]#### with #### being the processor
to be used for rebooting.
refscale.holdoff= [KNL]
Set test-start holdoff period. The purpose of
this parameter is to delay the start of the
test until boot completes in order to avoid
interference.
refscale.loops= [KNL]
Set the number of loops over the synchronization
primitive under test. Increasing this number
reduces noise due to loop start/end overhead,
but the default has already reduced the per-pass
noise to a handful of picoseconds on ca. 2020
x86 laptops.
refscale.nreaders= [KNL]
Set number of readers. The default value of -1
selects N, where N is roughly 75% of the number
of CPUs. A value of zero is an interesting choice.
refscale.nruns= [KNL]
Set number of runs, each of which is dumped onto
the console log.
refscale.readdelay= [KNL]
Set the read-side critical-section duration,
measured in microseconds.
refscale.scale_type= [KNL]
Specify the read-protection implementation to test.
refscale.shutdown= [KNL]
Shut down the system at the end of the performance
test. This defaults to 1 (shut it down) when
rcuperf is built into the kernel and to 0 (leave
it running) when rcuperf is built as a module.
refscale.verbose= [KNL]
Enable additional printk() statements.
relax_domain_level=
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/admin-guide/cgroup-v1/cpusets.rst.
@ -5082,6 +5167,13 @@
Prevent the CPU-hotplug component of torturing
until after init has spawned.
torture.ftrace_dump_at_shutdown= [KNL]
Dump the ftrace buffer at torture-test shutdown,
even if there were no errors. This can be a
very costly operation when many torture tests
are running concurrently, especially on systems
with rotating-rust storage.
tp720= [HW,PS2]
tpm_suspend_pcr=[HW,TPM]

Просмотреть файл

@ -135,7 +135,7 @@ single project which, although still considered experimental, is fit
for use. Please feel free to add projects that have been the victims
of my ignorance.
- http://www.thinkwiki.org/wiki/HDAPS
- https://www.thinkwiki.org/wiki/HDAPS
See this page for information about Linux support of the hard disk
active protection system as implemented in IBM/Lenovo Thinkpads.

Просмотреть файл

@ -151,7 +151,7 @@ Bugs:
different way to adjust the backlighting of the screen. There
is a userspace utility to adjust the brightness on those models,
which can be downloaded from
http://www.acc.umu.se/~erikw/program/smartdimmer-0.1.tar.bz2
https://www.acc.umu.se/~erikw/program/smartdimmer-0.1.tar.bz2
- since all development was done by reverse engineering, there is
*absolutely no guarantee* that this driver will not crash your

Просмотреть файл

@ -50,6 +50,7 @@ detailed description):
- WAN enable and disable
- UWB enable and disable
- LCD Shadow (PrivacyGuard) enable and disable
- Lap mode sensor
A compatibility table by model and feature is maintained on the web
site, http://ibm-acpi.sf.net/. I appreciate any success or failure
@ -904,7 +905,7 @@ temperatures:
The mapping of thermal sensors to physical locations varies depending on
system-board model (and thus, on ThinkPad model).
http://thinkwiki.org/wiki/Thermal_Sensors is a public wiki page that
https://thinkwiki.org/wiki/Thermal_Sensors is a public wiki page that
tries to track down these locations for various models.
Most (newer?) models seem to follow this pattern:
@ -925,7 +926,7 @@ For the R51 (source: Thomas Gruber):
- 3: Internal HDD
For the T43, T43/p (source: Shmidoax/Thinkwiki.org)
http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_T43.2C_T43p
https://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_T43.2C_T43p
- 2: System board, left side (near PCMCIA slot), reported as HDAPS temp
- 3: PCMCIA slot
@ -935,7 +936,7 @@ http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_T43.2C_T43p
- 11: Power regulator, underside of system board, below F2 key
The A31 has a very atypical layout for the thermal sensors
(source: Milos Popovic, http://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_A31)
(source: Milos Popovic, https://thinkwiki.org/wiki/Thermal_Sensors#ThinkPad_A31)
- 1: CPU
- 2: Main Battery: main sensor
@ -1432,6 +1433,20 @@ The first command ensures the best viewing angle and the latter one turns
on the feature, restricting the viewing angles.
DYTC Lapmode sensor
------------------
sysfs: dytc_lapmode
Newer thinkpads and mobile workstations have the ability to determine if
the device is in deskmode or lapmode. This feature is used by user space
to decide if WWAN transmission can be increased to maximum power and is
also useful for understanding the different thermal modes available as
they differ between desk and lap mode.
The property is read-only. If the platform doesn't have support the sysfs
class is not created.
EXPERIMENTAL: UWB
-----------------
@ -1470,6 +1485,23 @@ For more details about which buttons will appear depending on the mode, please
review the laptop's user guide:
http://www.lenovo.com/shop/americas/content/user_guides/x1carbon_2_ug_en.pdf
Battery charge control
----------------------
sysfs attributes:
/sys/class/power_supply/BAT*/charge_control_{start,end}_threshold
These two attributes are created for those batteries that are supported by the
driver. They enable the user to control the battery charge thresholds of the
given battery. Both values may be read and set. `charge_control_start_threshold`
accepts an integer between 0 and 99 (inclusive); this value represents a battery
percentage level, below which charging will begin. `charge_control_end_threshold`
accepts an integer between 1 and 100 (inclusive); this value represents a battery
percentage level, above which charging will stop.
The exact semantics of the attributes may be found in
Documentation/ABI/testing/sysfs-class-power.
Multiple Commands, Module Parameters
------------------------------------

Просмотреть файл

@ -426,6 +426,10 @@ All md devices contain:
The accepted values when writing to this file are ``ppl`` and ``resync``,
used to enable and disable PPL.
uuid
This indicates the UUID of the array in the following format:
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
As component devices are added to an md array, they appear in the ``md``
directory as new directories named::

Просмотреть файл

@ -90,7 +90,7 @@ built as modules.
Those GPU-specific drivers are selected via the ``Graphics support``
menu, under ``Device Drivers``.
When a GPU driver supports supports HDMI CEC, it will automatically
When a GPU driver supports HDMI CEC, it will automatically
enable the CEC core support at the media subsystem.
Media dependencies
@ -244,7 +244,7 @@ functionality.
If you have an hybrid card, you may need to enable both ``Analog TV``
and ``Digital TV`` at the menu.
When using this option, the defaults for the the media support core
When using this option, the defaults for the media support core
functionality are usually good enough to provide the basic functionality
for the driver. Yet, you could manually enable some desired extra (optional)
functionality using the settings under each of the following

Просмотреть файл

@ -35,7 +35,7 @@ physical memory (demand paging) and provides a mechanism for the
protection and controlled sharing of data between processes.
With virtual memory, each and every memory access uses a virtual
address. When the CPU decodes the an instruction that reads (or
address. When the CPU decodes an instruction that reads (or
writes) from (or to) the system memory, it translates the `virtual`
address encoded in that instruction to a `physical` address that the
memory controller can understand.

Просмотреть файл

@ -101,37 +101,48 @@ be specified in bytes with optional scale suffix [kKmMgG]. The default huge
page size may be selected with the "default_hugepagesz=<size>" boot parameter.
Hugetlb boot command line parameter semantics
hugepagesz - Specify a huge page size. Used in conjunction with hugepages
hugepagesz
Specify a huge page size. Used in conjunction with hugepages
parameter to preallocate a number of huge pages of the specified
size. Hence, hugepagesz and hugepages are typically specified in
pairs such as:
pairs such as::
hugepagesz=2M hugepages=512
hugepagesz can only be specified once on the command line for a
specific huge page size. Valid huge page sizes are architecture
dependent.
hugepages - Specify the number of huge pages to preallocate. This typically
hugepages
Specify the number of huge pages to preallocate. This typically
follows a valid hugepagesz or default_hugepagesz parameter. However,
if hugepages is the first or only hugetlb command line parameter it
implicitly specifies the number of huge pages of default size to
allocate. If the number of huge pages of default size is implicitly
specified, it can not be overwritten by a hugepagesz,hugepages
parameter pair for the default size.
For example, on an architecture with 2M default huge page size:
For example, on an architecture with 2M default huge page size::
hugepages=256 hugepagesz=2M hugepages=512
will result in 256 2M huge pages being allocated and a warning message
indicating that the hugepages=512 parameter is ignored. If a hugepages
parameter is preceded by an invalid hugepagesz parameter, it will
be ignored.
default_hugepagesz - Specify the default huge page size. This parameter can
default_hugepagesz
pecify the default huge page size. This parameter can
only be specified once on the command line. default_hugepagesz can
optionally be followed by the hugepages parameter to preallocate a
specific number of huge pages of default size. The number of default
sized huge pages to preallocate can also be implicitly specified as
mentioned in the hugepages section above. Therefore, on an
architecture with 2M default huge page size:
architecture with 2M default huge page size::
hugepages=256
default_hugepagesz=2M hugepages=256
hugepages=256 default_hugepagesz=2M
will all result in 256 2M huge pages being allocated. Valid default
huge page size is architecture dependent.

Просмотреть файл

@ -31,6 +31,7 @@ the Linux memory management.
idle_page_tracking
ksm
memory-hotplug
nommu-mmap
numa_memory_policy
numaperf
pagemap

Просмотреть файл

@ -9,7 +9,7 @@ Overview
KSM is a memory-saving de-duplication feature, enabled by CONFIG_KSM=y,
added to the Linux kernel in 2.6.32. See ``mm/ksm.c`` for its implementation,
and http://lwn.net/Articles/306704/ and http://lwn.net/Articles/330589/
and http://lwn.net/Articles/306704/ and https://lwn.net/Articles/330589/
KSM was originally developed for use with KVM (where it was known as
Kernel Shared Memory), to fit more virtual machines into physical memory,
@ -52,7 +52,7 @@ with EAGAIN, but more probably arousing the Out-Of-Memory killer.
If KSM is not configured into the running kernel, madvise MADV_MERGEABLE
and MADV_UNMERGEABLE simply fail with EINVAL. If the running kernel was
built with CONFIG_KSM=y, those calls will normally succeed: even if the
the KSM daemon is not currently running, MADV_MERGEABLE still registers
KSM daemon is not currently running, MADV_MERGEABLE still registers
the range for whenever the KSM daemon is started; even if the range
cannot contain any pages which KSM could actually merge; even if
MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE.

Просмотреть файл

@ -129,7 +129,7 @@ will create the following directory::
/sys/devices/system/node/nodeX/memory_side_cache/
If that directory is not present, the system either does not not provide
If that directory is not present, the system either does not provide
a memory-side cache, or that information is not accessible to the kernel.
The attributes for each level of cache is provided under its cache

Просмотреть файл

@ -65,8 +65,8 @@ migrated onto another server by means of the special "fs_locations"
attribute. See `RFC3530 Section 6: Filesystem Migration and Replication`_ and
`Implementation Guide for Referrals in NFSv4`_.
.. _RFC3530 Section 6\: Filesystem Migration and Replication: http://tools.ietf.org/html/rfc3530#section-6
.. _Implementation Guide for Referrals in NFSv4: http://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
.. _RFC3530 Section 6\: Filesystem Migration and Replication: https://tools.ietf.org/html/rfc3530#section-6
.. _Implementation Guide for Referrals in NFSv4: https://tools.ietf.org/html/draft-ietf-nfsv4-referrals-00
The fs_locations information can take the form of either an ip address and
a path, or a DNS hostname and a path. The latter requires the NFS client to

Просмотреть файл

@ -65,7 +65,7 @@ use with NFS/RDMA.
If the version is less than 1.1.2 or the command does not exist,
you should install the latest version of nfs-utils.
Download the latest package from: http://www.kernel.org/pub/linux/utils/nfs
Download the latest package from: https://www.kernel.org/pub/linux/utils/nfs
Uncompress the package and follow the installation instructions.

Просмотреть файл

@ -264,7 +264,7 @@ They depend on various facilities being available:
access to the floppy drive device, /dev/fd0
For more information on syslinux, including how to create bootdisks
for prebuilt kernels, see http://syslinux.zytor.com/
for prebuilt kernels, see https://syslinux.zytor.com/
.. note::
Previously it was possible to write a kernel directly to
@ -292,7 +292,7 @@ They depend on various facilities being available:
cdrecord dev=ATAPI:1,0,0 arch/x86/boot/image.iso
For more information on isolinux, including how to create bootdisks
for prebuilt kernels, see http://syslinux.zytor.com/
for prebuilt kernels, see https://syslinux.zytor.com/
- Using LILO
@ -346,7 +346,7 @@ They depend on various facilities being available:
see Documentation/admin-guide/serial-console.rst for more information.
For more information on isolinux, including how to create bootdisks
for prebuilt kernels, see http://syslinux.zytor.com/
for prebuilt kernels, see https://syslinux.zytor.com/

Просмотреть файл

@ -8,7 +8,7 @@ to handling all the metadata access to the NFS export also hands out layouts
to the clients to directly access the underlying block devices that are
shared with the client.
To use pNFS block layouts with with the Linux NFS server the exported file
To use pNFS block layouts with the Linux NFS server the exported file
system needs to support the pNFS block layouts (currently just XFS), and the
file system must sit on shared storage (typically iSCSI) that is accessible
to the clients in addition to the MDS. As of now the file system needs to

Просмотреть файл

@ -9,7 +9,7 @@ which in addition to handling all the metadata access to the NFS export,
also hands out layouts to the clients so that they can directly access the
underlying SCSI LUNs that are shared with the client.
To use pNFS SCSI layouts with with the Linux NFS server, the exported file
To use pNFS SCSI layouts with the Linux NFS server, the exported file
system needs to support the pNFS SCSI layouts (currently just XFS), and the
file system must sit on a SCSI LUN that is accessible to the clients in
addition to the MDS. As of now the file system needs to sit directly on the

Просмотреть файл

@ -27,7 +27,7 @@ Crosspoint PMU events require "xp" (index), "bus" (bus number)
and "vc" (virtual channel ID).
Crosspoint watchpoint-based events (special "event" value 0xfe)
require "xp" and "vc" as as above plus "port" (device port index),
require "xp" and "vc" as above plus "port" (device port index),
"dir" (transmit/receive direction), comparator values ("cmp_l"
and "cmp_h") and "mask", being index of the comparator mask.

Просмотреть файл

@ -147,9 +147,9 @@ CPUs in it.
The next major initialization step for a new policy object is to attach a
scaling governor to it (to begin with, that is the default scaling governor
determined by the kernel configuration, but it may be changed later
via ``sysfs``). First, a pointer to the new policy object is passed to the
governor's ``->init()`` callback which is expected to initialize all of the
determined by the kernel command line or configuration, but it may be changed
later via ``sysfs``). First, a pointer to the new policy object is passed to
the governor's ``->init()`` callback which is expected to initialize all of the
data structures necessary to handle the given policy and, possibly, to add
a governor ``sysfs`` interface to it. Next, the governor is started by
invoking its ``->start()`` callback.

Просмотреть файл

@ -114,7 +114,7 @@ base performance profile (which is performance level 0).
Lock/Unlock status
~~~~~~~~~~~~~~~~~~
Even if there are multiple performance profiles, it is possible that that they
Even if there are multiple performance profiles, it is possible that they
are locked. If they are locked, users cannot issue a command to change the
performance state. It is possible that there is a BIOS setup to unlock or check
with your system vendor.
@ -883,7 +883,7 @@ To enable Intel(R) SST-TF, execute::
enable:success
In this case, the option "-a" is optional. If set, it enables Intel(R) SST-TF
feature and also sets the CPUs to high and and low priority using Intel Speed
feature and also sets the CPUs to high and low priority using Intel Speed
Select Technology Core Power (Intel(R) SST-CP) features. The CPU numbers passed
with "-c" arguments are marked as high priority, including its siblings.

Просмотреть файл

@ -431,6 +431,17 @@ argument is passed to the kernel in the command line.
supported in the current configuration, writes to this attribute will
fail with an appropriate error.
``energy_efficiency``
This attribute is only present on platforms, which have CPUs matching
Kaby Lake or Coffee Lake desktop CPU model. By default
energy efficiency optimizations are disabled on these CPU models in HWP
mode by this driver. Enabling energy efficiency may limit maximum
operating frequency in both HWP and non HWP mode. In non HWP mode,
optimizations are done only in the turbo frequency range. In HWP mode,
optimizations are done in the entire frequency range. Setting this
attribute to "1" enables energy efficiency optimizations and setting
to "0" disables energy efficiency optimizations.
Interpretation of Policy Attributes
-----------------------------------
@ -554,7 +565,11 @@ somewhere between the two extremes:
Strings written to the ``energy_performance_preference`` attribute are
internally translated to integer values written to the processor's
Energy-Performance Preference (EPP) knob (if supported) or its
Energy-Performance Bias (EPB) knob.
Energy-Performance Bias (EPB) knob. It is also possible to write a positive
integer value between 0 to 255, if the EPP feature is present. If the EPP
feature is not present, writing integer value to this attribute is not
supported. In this case, user can use
"/sys/devices/system/cpu/cpu*/power/energy_perf_bias" interface.
[Note that tasks may by migrated from one CPU to another by the scheduler's
load-balancing algorithm and if different energy vs performance hints are
@ -708,7 +723,7 @@ core (for the policies with other scaling governors).
The ``ftrace`` interface can be used for low-level diagnostics of
``intel_pstate``. For example, to check how often the function to set a
P-state is called, the ``ftrace`` filter can be set to to
P-state is called, the ``ftrace`` filter can be set to
:c:func:`intel_pstate_set_pstate`::
# cd /sys/kernel/debug/tracing/

Просмотреть файл

@ -21,11 +21,18 @@ understand and fix the security vulnerability.
As it is with any bug, the more information provided the easier it
will be to diagnose and fix. Please review the procedure outlined in
admin-guide/reporting-bugs.rst if you are unclear about what
:doc:`reporting-bugs` if you are unclear about what
information is helpful. Any exploit code is very helpful and will not
be released without consent from the reporter unless it has already been
made public.
Please send plain text emails without attachments where possible.
It is much harder to have a context-quoted discussion about a complex
issue if all the details are hidden away in attachments. Think of it like a
:doc:`regular patch submission <../process/submitting-patches>`
(even if you don't have a patch yet): describe the problem and impact, list
reproduction steps, and follow it with a proposed fix, all in plain text.
Disclosure and embargoed information
------------------------------------

Просмотреть файл

@ -261,7 +261,7 @@ directories like /tmp. The common method of exploitation of this flaw
is to cross privilege boundaries when following a given symlink (i.e. a
root process follows a symlink belonging to another user). For a likely
incomplete list of hundreds of examples across the years, please see:
http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
When set to "0", symlink following behavior is unrestricted.

Просмотреть файл

@ -235,7 +235,7 @@ This toggle indicates whether unprivileged users are prevented
from using ``dmesg(8)`` to view messages from the kernel's log
buffer.
When ``dmesg_restrict`` is set to 0 there are no restrictions.
When ``dmesg_restrict`` is set set to 1, users must have
When ``dmesg_restrict`` is set to 1, users must have
``CAP_SYSLOG`` to use ``dmesg(8)``.
The kernel config option ``CONFIG_SECURITY_DMESG_RESTRICT`` sets the
@ -335,8 +335,8 @@ Path for the hotplug policy agent.
Default value is "``/sbin/hotplug``".
hung_task_all_cpu_backtrace:
================
hung_task_all_cpu_backtrace
===========================
If this option is set, the kernel will send an NMI to all CPUs to dump
their backtraces when a hung task is detected. This file shows up if
@ -646,8 +646,8 @@ rate for each task.
scanned for a given scan.
oops_all_cpu_backtrace:
================
oops_all_cpu_backtrace
======================
If this option is set, the kernel will send an NMI to all CPUs to dump
their backtraces when an oops event occurs. It should be used as a last
@ -996,6 +996,38 @@ pty
See Documentation/filesystems/devpts.rst.
random
======
This is a directory, with the following entries:
* ``boot_id``: a UUID generated the first time this is retrieved, and
unvarying after that;
* ``entropy_avail``: the pool's entropy count, in bits;
* ``poolsize``: the entropy pool size, in bits;
* ``urandom_min_reseed_secs``: obsolete (used to determine the minimum
number of seconds between urandom pool reseeding).
* ``uuid``: a UUID generated every time this is retrieved (this can
thus be used to generate UUIDs at will);
* ``write_wakeup_threshold``: when the entropy count drops below this
(as a number of bits), processes waiting to write to ``/dev/random``
are woken up.
If ``drivers/char/random.c`` is built with ``ADD_INTERRUPT_BENCH``
defined, these additional entries are present:
* ``add_interrupt_avg_cycles``: the average number of cycles between
interrupts used to feed the pool;
* ``add_interrupt_avg_deviation``: the standard deviation seen on the
number of cycles between interrupts used to feed the pool.
randomize_va_space
==================
@ -1062,6 +1094,60 @@ Enables/disables scheduler statistics. Enabling this feature
incurs a small amount of overhead in the scheduler but is
useful for debugging and performance tuning.
sched_util_clamp_min:
=====================
Max allowed *minimum* utilization.
Default value is 1024, which is the maximum possible value.
It means that any requested uclamp.min value cannot be greater than
sched_util_clamp_min, i.e., it is restricted to the range
[0:sched_util_clamp_min].
sched_util_clamp_max:
=====================
Max allowed *maximum* utilization.
Default value is 1024, which is the maximum possible value.
It means that any requested uclamp.max value cannot be greater than
sched_util_clamp_max, i.e., it is restricted to the range
[0:sched_util_clamp_max].
sched_util_clamp_min_rt_default:
================================
By default Linux is tuned for performance. Which means that RT tasks always run
at the highest frequency and most capable (highest capacity) CPU (in
heterogeneous systems).
Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
1024 by default, which effectively boosts the tasks to run at the highest
frequency and biases them to run on the biggest CPU.
This knob allows admins to change the default behavior when uclamp is being
used. In battery powered devices particularly, running at the maximum
capacity and frequency will increase energy consumption and shorten the battery
life.
This knob is only effective for RT tasks which the user hasn't modified their
requested uclamp.min value via sched_setattr() syscall.
This knob will not escape the range constraint imposed by sched_util_clamp_min
defined above.
For example if
sched_util_clamp_min_rt_default = 800
sched_util_clamp_min = 600
Then the boost will be clamped to 600 because 800 is outside of the permissible
range of [0:600]. This could happen for instance if a powersave mode will
restrict all boosts temporarily by modifying sched_util_clamp_min. As soon as
this restriction is lifted, the requested sched_util_clamp_min_rt_default
will take effect.
seccomp
=======

Просмотреть файл

@ -583,7 +583,7 @@ trimming of allocations is initiated.
The default value is 1.
See Documentation/nommu-mmap.txt for more information.
See Documentation/admin-guide/mm/nommu-mmap.rst for more information.
numa_zonelist_order

Просмотреть файл

@ -38,7 +38,7 @@ either letters or blanks. In above example it looks like this::
Tainted: P W O
The meaning of those characters is explained in the table below. In tis case
The meaning of those characters is explained in the table below. In this case
the kernel got tainted earlier because a proprietary Module (``P``) was loaded,
a warning occurred (``W``), and an externally-built module was loaded (``O``).
To decode other letters use the table below.
@ -61,7 +61,7 @@ this on the machine that had the statements in the logs that were quoted earlier
* Proprietary module was loaded (#0)
* Kernel issued warning (#9)
* Externally-built ('out-of-tree') module was loaded (#12)
See Documentation/admin-guide/tainted-kernels.rst in the the Linux kernel or
See Documentation/admin-guide/tainted-kernels.rst in the Linux kernel or
https://www.kernel.org/doc/html/latest/admin-guide/tainted-kernels.html for
a more details explanation of the various taint flags.
Raw taint value as int/string: 4609/'P W O '

Просмотреть файл

@ -173,8 +173,8 @@ following ``udev`` rule::
ACTION=="add", SUBSYSTEM=="thunderbolt", ATTRS{iommu_dma_protection}=="1", ATTR{authorized}=="0", ATTR{authorized}="1"
Upgrading NVM on Thunderbolt device or host
-------------------------------------------
Upgrading NVM on Thunderbolt device, host or retimer
----------------------------------------------------
Since most of the functionality is handled in firmware running on a
host controller or a device, it is important that the firmware can be
upgraded to the latest where possible bugs in it have been fixed.
@ -185,9 +185,10 @@ for some machines:
`Thunderbolt Updates <https://thunderbolttechnology.net/updates>`_
Before you upgrade firmware on a device or host, please make sure it is a
suitable upgrade. Failing to do that may render the device (or host) in a
state where it cannot be used properly anymore without special tools!
Before you upgrade firmware on a device, host or retimer, please make
sure it is a suitable upgrade. Failing to do that may render the device
in a state where it cannot be used properly anymore without special
tools!
Host NVM upgrade on Apple Macs is not supported.

Просмотреть файл

@ -133,7 +133,7 @@ When mounting an XFS filesystem, the following options are accepted.
logbsize must be an integer multiple of the log
stripe unit configured at **mkfs(8)** time.
The default value for for version 1 logs is 32768, while the
The default value for version 1 logs is 32768, while the
default value for version 2 logs is MAX(32768, log_sunit).
logdev=device and rtdev=device

Просмотреть файл

@ -128,7 +128,7 @@ it. The recommended placement is in the first 16KiB of RAM.
The boot loader must load a device tree image (dtb) into system ram
at a 64bit aligned address and initialize it with the boot data. The
dtb format is documented in Documentation/devicetree/booting-without-of.txt.
dtb format is documented in Documentation/devicetree/booting-without-of.rst.
The kernel will look for the dtb magic value of 0xd00dfeed at the dtb
physical address to determine if a dtb has been passed instead of a
tagged list.

Просмотреть файл

@ -220,7 +220,7 @@ LPIT Signature Reserved (signature == "LPIT")
x86 only table as of ACPI 5.1; starting with ACPI 6.0, processor
descriptions and power states on ARM platforms should use the DSDT
and define processor container devices (_HID ACPI0010, Section 8.4,
and more specifically 8.4.3 and and 8.4.4).
and more specifically 8.4.3 and 8.4.4).
MADT Section 5.2.12 (signature == "APIC")

Просмотреть файл

@ -273,7 +273,7 @@ only use the _DSD Device Properties UUID [5]:
- UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301
- http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
- https://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf
The UEFI Forum provides a mechanism for registering device properties [4]
so that they may be used across all operating systems supporting ACPI.
@ -470,7 +470,7 @@ likely be willing to assist in submitting ECRs.
Linux Code
----------
Individual items specific to Linux on ARM, contained in the the Linux
Individual items specific to Linux on ARM, contained in the Linux
source code, are in the list that follows:
ACPI_OS_NAME

Просмотреть файл

@ -14,6 +14,7 @@ ARM64 Architecture
hugetlbpage
legacy_instructions
memory
perf
pointer-authentication
silicon-errata
sve

Просмотреть файл

@ -1,8 +1,11 @@
.. SPDX-License-Identifier: GPL-2.0
=====================
Perf Event Attributes
=====================
Author: Andrew Murray <andrew.murray@arm.com>
Date: 2019-03-06
:Author: Andrew Murray <andrew.murray@arm.com>
:Date: 2019-03-06
exclude_user
------------

Просмотреть файл

@ -494,7 +494,7 @@ Appendix B. ARMv8-A FP/SIMD programmer's model
Note: This section is for information only and not intended to be complete or
to replace any architectural specification.
Refer to [4] for for more information.
Refer to [4] for more information.
ARMv8-A defines the following floating-point / SIMD register state:

Просмотреть файл

@ -85,21 +85,21 @@ smp_store_release() respectively. Therefore, if you find yourself only using
the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
and are doing it wrong.
A subtle detail of atomic_set{}() is that it should be observable to the RMW
ops. That is:
A note for the implementation of atomic_set{}() is that it must not break the
atomicity of the RMW ops. That is:
C atomic-set
C Atomic-RMW-ops-are-atomic-WRT-atomic_set
{
atomic_set(v, 1);
atomic_t v = ATOMIC_INIT(1);
}
P0(atomic_t *v)
{
(void)atomic_add_unless(v, 1, 0);
}
P1(atomic_t *v)
{
atomic_add_unless(v, 1, 0);
}
P2(atomic_t *v)
{
atomic_set(v, 0);
}
@ -233,19 +233,19 @@ as well. Similarly, something like:
is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:
C strong-acquire
C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire
{
}
P1(int *x, atomic_t *y)
P0(int *x, atomic_t *y)
{
r0 = READ_ONCE(*x);
smp_rmb();
r1 = atomic_read(y);
}
P2(int *x, atomic_t *y)
P1(int *x, atomic_t *y)
{
atomic_inc(y);
smp_mb__after_atomic();
@ -253,14 +253,14 @@ strictly stronger than ACQUIRE. As illustrated:
}
exists
(r0=1 /\ r1=0)
(0:r0=1 /\ 0:r1=0)
This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
because it would not order the W part of the RMW against the following
WRITE_ONCE. Thus:
P1 P2
P0 P1
t = LL.acq *y (0)
t++;

Просмотреть файл

@ -196,7 +196,7 @@ a virtual address mapping (unlike the earlier scheme of virtual address
do not have a corresponding kernel virtual address space mapping) and
low-memory pages.
Note: Please refer to Documentation/DMA-API-HOWTO.txt for a discussion
Note: Please refer to :doc:`/core-api/dma-api-howto` for a discussion
on PCI high mem DMA aspects and mapping of scatter gather lists, and support
for 64 bit PCI.
@ -1036,7 +1036,7 @@ Now the generic block layer performs partition-remapping early and thus
provides drivers with a sector number relative to whole device, rather than
having to take partition number into account in order to arrive at the true
sector number. The routine blk_partition_remap() is invoked by
generic_make_request even before invoking the queue specific make_request_fn,
submit_bio_noacct even before invoking the queue specific ->submit_bio,
so the i/o scheduler also gets to operate on whole disk sector numbers. This
should typically not require changes to block drivers, it just never gets
to invoke its own partition sector offset calculations since all bios

Просмотреть файл

@ -0,0 +1,153 @@
.. SPDX-License-Identifier: GPL-2.0
================================================
Multi-Queue Block IO Queueing Mechanism (blk-mq)
================================================
The Multi-Queue Block IO Queueing Mechanism is an API to enable fast storage
devices to achieve a huge number of input/output operations per second (IOPS)
through queueing and submitting IO requests to block devices simultaneously,
benefiting from the parallelism offered by modern storage devices.
Introduction
============
Background
----------
Magnetic hard disks have been the de facto standard from the beginning of the
development of the kernel. The Block IO subsystem aimed to achieve the best
performance possible for those devices with a high penalty when doing random
access, and the bottleneck was the mechanical moving parts, a lot slower than
any layer on the storage stack. One example of such optimization technique
involves ordering read/write requests according to the current position of the
hard disk head.
However, with the development of Solid State Drives and Non-Volatile Memories
without mechanical parts nor random access penalty and capable of performing
high parallel access, the bottleneck of the stack had moved from the storage
device to the operating system. In order to take advantage of the parallelism
in those devices' design, the multi-queue mechanism was introduced.
The former design had a single queue to store block IO requests with a single
lock. That did not scale well in SMP systems due to dirty data in cache and the
bottleneck of having a single lock for multiple processors. This setup also
suffered with congestion when different processes (or the same process, moving
to different CPUs) wanted to perform block IO. Instead of this, the blk-mq API
spawns multiple queues with individual entry points local to the CPU, removing
the need for a lock. A deeper explanation on how this works is covered in the
following section (`Operation`_).
Operation
---------
When the userspace performs IO to a block device (reading or writing a file,
for instance), blk-mq takes action: it will store and manage IO requests to
the block device, acting as middleware between the userspace (and a file
system, if present) and the block device driver.
blk-mq has two group of queues: software staging queues and hardware dispatch
queues. When the request arrives at the block layer, it will try the shortest
path possible: send it directly to the hardware queue. However, there are two
cases that it might not do that: if there's an IO scheduler attached at the
layer or if we want to try to merge requests. In both cases, requests will be
sent to the software queue.
Then, after the requests are processed by software queues, they will be placed
at the hardware queue, a second stage queue were the hardware has direct access
to process those requests. However, if the hardware does not have enough
resources to accept more requests, blk-mq will places requests on a temporary
queue, to be sent in the future, when the hardware is able.
Software staging queues
~~~~~~~~~~~~~~~~~~~~~~~
The block IO subsystem adds requests in the software staging queues
(represented by struct :c:type:`blk_mq_ctx`) in case that they weren't sent
directly to the driver. A request is one or more BIOs. They arrived at the
block layer through the data structure struct :c:type:`bio`. The block layer
will then build a new structure from it, the struct :c:type:`request` that will
be used to communicate with the device driver. Each queue has its own lock and
the number of queues is defined by a per-CPU or per-node basis.
The staging queue can be used to merge requests for adjacent sectors. For
instance, requests for sector 3-6, 6-7, 7-9 can become one request for 3-9.
Even if random access to SSDs and NVMs have the same time of response compared
to sequential access, grouped requests for sequential access decreases the
number of individual requests. This technique of merging requests is called
plugging.
Along with that, the requests can be reordered to ensure fairness of system
resources (e.g. to ensure that no application suffers from starvation) and/or to
improve IO performance, by an IO scheduler.
IO Schedulers
^^^^^^^^^^^^^
There are several schedulers implemented by the block layer, each one following
a heuristic to improve the IO performance. They are "pluggable" (as in plug
and play), in the sense of they can be selected at run time using sysfs. You
can read more about Linux's IO schedulers `here
<https://www.kernel.org/doc/html/latest/block/index.html>`_. The scheduling
happens only between requests in the same queue, so it is not possible to merge
requests from different queues, otherwise there would be cache trashing and a
need to have a lock for each queue. After the scheduling, the requests are
eligible to be sent to the hardware. One of the possible schedulers to be
selected is the NONE scheduler, the most straightforward one. It will just
place requests on whatever software queue the process is running on, without
any reordering. When the device starts processing requests in the hardware
queue (a.k.a. run the hardware queue), the software queues mapped to that
hardware queue will be drained in sequence according to their mapping.
Hardware dispatch queues
~~~~~~~~~~~~~~~~~~~~~~~~
The hardware queue (represented by struct :c:type:`blk_mq_hw_ctx`) is a struct
used by device drivers to map the device submission queues (or device DMA ring
buffer), and are the last step of the block layer submission code before the
low level device driver taking ownership of the request. To run this queue, the
block layer removes requests from the associated software queues and tries to
dispatch to the hardware.
If it's not possible to send the requests directly to hardware, they will be
added to a linked list (:c:type:`hctx->dispatch`) of requests. Then,
next time the block layer runs a queue, it will send the requests laying at the
:c:type:`dispatch` list first, to ensure a fairness dispatch with those
requests that were ready to be sent first. The number of hardware queues
depends on the number of hardware contexts supported by the hardware and its
device driver, but it will not be more than the number of cores of the system.
There is no reordering at this stage, and each software queue has a set of
hardware queues to send requests for.
.. note::
Neither the block layer nor the device protocols guarantee
the order of completion of requests. This must be handled by
higher layers, like the filesystem.
Tag-based completion
~~~~~~~~~~~~~~~~~~~~
In order to indicate which request has been completed, every request is
identified by an integer, ranging from 0 to the dispatch queue size. This tag
is generated by the block layer and later reused by the device driver, removing
the need to create a redundant identifier. When a request is completed in the
drive, the tag is sent back to the block layer to notify it of the finalization.
This removes the need to do a linear search to find out which IO has been
completed.
Further reading
---------------
- `Linux Block IO: Introducing Multi-queue SSD Access on Multi-core Systems <http://kernel.dk/blk-mq.pdf>`_
- `NOOP scheduler <https://en.wikipedia.org/wiki/Noop_scheduler>`_
- `Null block device driver <https://www.kernel.org/doc/html/latest/block/null_blk.html>`_
Source code documentation
=========================
.. kernel-doc:: include/linux/blk-mq.h
.. kernel-doc:: block/blk-mq.c

Просмотреть файл

@ -10,6 +10,7 @@ Block
bfq-iosched
biodoc
biovecs
blk-mq
capability
cmdline-partition
data-integrity

Просмотреть файл

@ -9,7 +9,7 @@ access to block devices to specific initiators in a shared storage
setup.
This document gives a general overview of the support ioctl commands.
For a more detailed reference please refer the the SCSI Primary
For a more detailed reference please refer to the SCSI Primary
Commands standard, specifically the section on Reservations and the
"PERSISTENT RESERVE IN" and "PERSISTENT RESERVE OUT" commands.

Просмотреть файл

@ -117,6 +117,20 @@ Maximum number of elements in a DMA scatter/gather list with integrity
data that will be submitted by the block layer core to the associated
block driver.
max_active_zones (RO)
---------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the sum of zones belonging to any of the zone states:
EXPLICIT OPEN, IMPLICIT OPEN or CLOSED, is limited by this value.
If this value is 0, there is no limit.
max_open_zones (RO)
-------------------
For zoned block devices (zoned attribute indicating "host-managed" or
"host-aware"), the sum of zones belonging to any of the zone states:
EXPLICIT OPEN or IMPLICIT OPEN, is limited by this value.
If this value is 0, there is no limit.
max_sectors_kb (RW)
-------------------
This is the maximum number of kilobytes that the block layer will allow

Просмотреть файл

@ -47,7 +47,7 @@ the Forced Unit Access is implemented. The REQ_PREFLUSH and REQ_FUA flags
may both be set on a single bio.
Implementation details for make_request_fn based block drivers
Implementation details for bio based block drivers
--------------------------------------------------------------
These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit

Просмотреть файл

@ -643,5 +643,6 @@ when:
.. _selftests: ../../tools/testing/selftests/bpf/
.. _Documentation/dev-tools/kselftest.rst:
https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html
.. _Documentation/bpf/btf.rst: btf.rst
Happy BPF hacking!

Некоторые файлы не были показаны из-за слишком большого количества измененных файлов Показать больше