During interrupt setup we allocate interrupt vectors, walk the list of msi
descriptors, and fill in the message data. Requesting more interrupts than
supported on s390 can lead to an out of bounds access.
When we restrict the number of interrupts we should also stop walking the
msi list after all supported interrupts are handled.
Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
During review of KVM patches it was complained that the
ap_instructions_available() function returns 0 if AP
instructions are available and -ENODEV if not. The function
acts like a boolean function to check for AP instructions
available and thus should return 0 on failure and != 0 on
success. Changed to the suggested behaviour and adapted
the one and only caller of this function which is the ap
bus core code.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This is mostly updates to the usual drivers: mpt3sas, lpfc, qla2xxx,
hisi_sas, smartpqi, megaraid_sas, arcmsr. In addition, with the
continuing absence of Nic we have target updates for tcmu and target
core (all with reviews and acks). The biggest observable change is
going to be that we're (again) trying to switch to mulitqueue as the
default (a user can still override the setting on the kernel command
line). Other major core stuff is the removal of the remaining
Microchannel drivers, an update of the internal timers and some
reworks of completion and result handling.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCW3R3niYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishauRAP4yfBKK
dbxF81c/Bxi/Stk16FWkOOrjs4CizwmnMcpM5wD/UmM9o6ebDzaYpZgA8wIl7X/N
o/JckEZZpIp+5NySZNc=
=ggLB
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates to the usual drivers: mpt3sas, lpfc, qla2xxx,
hisi_sas, smartpqi, megaraid_sas, arcmsr.
In addition, with the continuing absence of Nic we have target updates
for tcmu and target core (all with reviews and acks).
The biggest observable change is going to be that we're (again) trying
to switch to mulitqueue as the default (a user can still override the
setting on the kernel command line).
Other major core stuff is the removal of the remaining Microchannel
drivers, an update of the internal timers and some reworks of
completion and result handling"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits)
scsi: core: use blk_mq_run_hw_queues in scsi_kick_queue
scsi: ufs: remove unnecessary query(DM) UPIU trace
scsi: qla2xxx: Fix issue reported by static checker for qla2x00_els_dcmd2_sp_done()
scsi: aacraid: Spelling fix in comment
scsi: mpt3sas: Fix calltrace observed while running IO & reset
scsi: aic94xx: fix an error code in aic94xx_init()
scsi: st: remove redundant pointer STbuffer
scsi: qla2xxx: Update driver version to 10.00.00.08-k
scsi: qla2xxx: Migrate NVME N2N handling into state machine
scsi: qla2xxx: Save frame payload size from ICB
scsi: qla2xxx: Fix stalled relogin
scsi: qla2xxx: Fix race between switch cmd completion and timeout
scsi: qla2xxx: Fix Management Server NPort handle reservation logic
scsi: qla2xxx: Flush mailbox commands on chip reset
scsi: qla2xxx: Fix unintended Logout
scsi: qla2xxx: Fix session state stuck in Get Port DB
scsi: qla2xxx: Fix redundant fc_rport registration
scsi: qla2xxx: Silent erroneous message
scsi: qla2xxx: Prevent sysfs access when chip is down
scsi: qla2xxx: Add longer window for chip reset
...
We've added duty cycle support to the clk API so that clk signal
duty cycle ratios can be adjusted while taking into account things
like clk dividers and clk tree hierarchy. So far only one SoC has
implemented support for this, but I expect there will be more to
come in the future.
Outside of the core, we have the usual pile of clk driver updates
and additions. The Amlogic meson driver got the most lines in the
diffstat this time around because it added support for a whole bunch
of hardware and duty cycle configuration. After that the Rockchip PX30,
Qualcomm SDM845, and Renesas SoC drivers fill in a majority of the diff.
We're left with the collection of non-critical fixes after that. Overall
it looks pretty quiet this time.
Core:
- Clk duty cycle support
- Proper CLK_SET_RATE_GATE support throughout the tree
New Drivers:
- Actions Semi Owl series S700 SoC clk driver
- Qualcomm SDM845 display clock controller
- i.MX6SX ocram_s clk support
- Uniphier NAND, USB3 PHY, and SPI clk support
- Qualcomm RPMh clk driver
- i.MX7D mailbox clk support
- Maxim 9485 Programmable Clock Generator
- Expose 32 kHz PLL on PXA SoCs
- imx6sll GPIO clk gate support
- Atmel at91 I2S audio clk support
- SI544/SI514 clk on/off support
- i.MX6UL GPIO clock gates in CCM CCGR
- Renesas Crypto Engine clocks on R-Car H3
- Renesas clk support for the new RZ/N1D SoC
- Allwinner A64 display engine clock support
- Support for Rockchip's PX30 SoC
- Amlogic Meson axg PCIe and audio clocks
- Amlogic Meson GEN CLK on gxbb, gxl and axg
Updates:
- Remove an unused variable from Exynos4412 ISP driver
- Fix a thinko bug in SCMI clk division logic
- Add missing of_node_put()s in some i.MX clk drivers
- Tegra SDMMC clk jitter improvements with high speed signaling modes
- SPDX tagging for qcom and cs2000-cp drivers
- Stop leaking con ids in __clk_put()
- Fix a corner case in fixed factor clk probing where node is in DT but
parent clk is registered much later
- Marvell Armada 3700 clk_pm_cpu_get_parent() had an invalid return value
- i.MX clk init arrays removed in place of CLK_IS_CRITICAL
- Convert to CLK_IS_CRITICAL for i.MX51/53 driver
- Fix Tegra BPMP driver oops when xlating a NULL clk
- Proper default configuration for vic03 and vde clks on Tegra124
- Mark Tegra memory controller clks as critical
- Fix array bounds clamp in Tegra's emc determine_rate() op
- Ingenic i2s bit update and allow UDC clk to gate
- Fix name of aspeed SDC clk define to have only one 'CLK'
- Fix i.MX6QDL video clk parent
- Critical clk markings for qcom SDM845
- Fix Stratix10 mpu_free_clk and sdmmc_free_clk parents
- Mark Rockchip's pclk_rkpwm_pmu as critical clock, due to it supplying
the pwm used to drive the logic supply of the rk3399 core.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAlt0WD0RHHNib3lkQGtl
cm5lbC5vcmcACgkQrQKIl8bklSX5jBAAlMLb0fqnuAGNJeXZDk5rsCa496LMyGWx
ku7uLA2H68SlbSQqq8FUPoCjDZkmsu2CbOX1U2/H4HFDS0pqpPiV3mZNtSeacedp
4Wf8yUB5G3xdq9QUCSX5LxMmEQoGeJ+gaTspBvM6sNvEMBR2kEMGBqUy768tnDTR
qCQ8Q1jOU6l8IdFV0SZGssmZ+oFqOyQoJVquPWPkw1+p/2f1KyYIyG5J5FXGxgcR
1XQITY/I/dShQ2wd+ZeDdt+GjZqIXQ06Pt3ruRG7HVP79Zt1XCRJd5dZ2lf+Wj8T
1ul3TWCAMYZ8gCPebLMbBGzKvQJQJcDU6DpIZsrUDN+C6z7KCS9vqeCxP9cF+3jJ
LOmA6cWE7z9Vkk9s0I0KJJ2Sw7wRoXzE5OJcwa/yousSz3s9cX+F8SAkdZs77oUF
0XnzPsvwdHI/egQ4UrsStPHM/gOFhsQqo8vvm5xaaTR2AxLKBHuPa9oUv9YpO/P5
J6FCst3qeY3Wp69fJ5/Z058OFOAt81dKXij2fZJBOO4KJy7Kse8Sz5ApybXVAbY5
lfvx+KGMITFqLYrcRIQZmlCuoHcMwI0FtHr9Ens5GXdbrJ+W+FlvP43eLCA0ZmRx
9DidemChj3k3PC3H6tbax/jzV4IIxZdyUoBJ1imL4uyhhaXp/qr45A/aGwNp8Q8a
WvkIGm3epK4=
=Mcn+
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"The new and exciting feature this time around is in the clk core.
We've added duty cycle support to the clk API so that clk signal duty
cycle ratios can be adjusted while taking into account things like clk
dividers and clk tree hierarchy. So far only one SoC has implemented
support for this, but I expect there will be more to come in the
future.
Outside of the core, we have the usual pile of clk driver updates and
additions. The Amlogic meson driver got the most lines in the diffstat
this time around because it added support for a whole bunch of
hardware and duty cycle configuration. After that the Rockchip PX30,
Qualcomm SDM845, and Renesas SoC drivers fill in a majority of the
diff. We're left with the collection of non-critical fixes after that.
Overall it looks pretty quiet this time.
Core:
- Clk duty cycle support
- Proper CLK_SET_RATE_GATE support throughout the tree
New Drivers:
- Actions Semi Owl series S700 SoC clk driver
- Qualcomm SDM845 display clock controller
- i.MX6SX ocram_s clk support
- Uniphier NAND, USB3 PHY, and SPI clk support
- Qualcomm RPMh clk driver
- i.MX7D mailbox clk support
- Maxim 9485 Programmable Clock Generator
- expose 32 kHz PLL on PXA SoCs
- imx6sll GPIO clk gate support
- Atmel at91 I2S audio clk support
- SI544/SI514 clk on/off support
- i.MX6UL GPIO clock gates in CCM CCGR
- Renesas Crypto Engine clocks on R-Car H3
- Renesas clk support for the new RZ/N1D SoC
- Allwinner A64 display engine clock support
- support for Rockchip's PX30 SoC
- Amlogic Meson axg PCIe and audio clocks
- Amlogic Meson GEN CLK on gxbb, gxl and axg
Updates:
- remove an unused variable from Exynos4412 ISP driver
- fix a thinko bug in SCMI clk division logic
- add missing of_node_put()s in some i.MX clk drivers
- Tegra SDMMC clk jitter improvements with high speed signaling modes
- SPDX tagging for qcom and cs2000-cp drivers
- stop leaking con ids in __clk_put()
- fix a corner case in fixed factor clk probing where node is in DT
but parent clk is registered much later
- Marvell Armada 3700 clk_pm_cpu_get_parent() had an invalid return
value
- i.MX clk init arrays removed in place of CLK_IS_CRITICAL
- convert to CLK_IS_CRITICAL for i.MX51/53 driver
- fix Tegra BPMP driver oops when xlating a NULL clk
- proper default configuration for vic03 and vde clks on Tegra124
- mark Tegra memory controller clks as critical
- fix array bounds clamp in Tegra's emc determine_rate() op
- Ingenic i2s bit update and allow UDC clk to gate
- fix name of aspeed SDC clk define to have only one 'CLK'
- fix i.MX6QDL video clk parent
- critical clk markings for qcom SDM845
- fix Stratix10 mpu_free_clk and sdmmc_free_clk parents
- mark Rockchip's pclk_rkpwm_pmu as critical clock, due to it
supplying the pwm used to drive the logic supply of the rk3399
core"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (85 commits)
clk: rockchip: Add pclk_rkpwm_pmu to PMU critical clocks in rk3399
clk: cs2000-cp: convert to SPDX identifiers
clk: scmi: Fix the rounding of clock rate
clk: qcom: Add display clock controller driver for SDM845
clk: mvebu: armada-37xx-periph: Remove unused var num_parents
clk: samsung: Remove unused mout_user_aclk400_mcuisp_p4x12 variable
clk: actions: Add S700 SoC clock support
dt-bindings: clock: Add S700 support for Actions Semi Soc's
clk: actions: Add missing REGMAP_MMIO dependency
clk: uniphier: add clock frequency support for SPI
clk: uniphier: add more USB3 PHY clocks
clk: uniphier: add NAND 200MHz clock
clk: tegra: make sdmmc2 and sdmmc4 as sdmmc clocks
clk: tegra: Add sdmmc mux divider clock
clk: tegra: Refactor fractional divider calculation
clk: tegra: Fix includes required by fence_udelay()
clk: imx6sll: fix missing of_node_put()
clk: imx6ul: fix missing of_node_put()
clk: imx: add ocram_s clock for i.mx6sx
clk: mvebu: armada-37xx-periph: Fix wrong return value in get_parent
...
Utilize -mfentry and -mnop-mcount gcc options together with
-mrecord-mcount to get compiler generated calls to the profiling functions
as nops which are compatible with current -mhotpatch=0,3 approach. At the
same time -mrecord-mcount enables __mcount_loc section generation by
the compiler which allows to avoid using scripts/recordmcount.pl script.
Link: http://lkml.kernel.org/r/patch-4.thread-aa7b8d.git-aa7b8dbf236f.your-ad-here.call-01533557518-ext-9465@work.hours
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbdGisAAoJEAhfPr2O5OEVmXcQAIe+wWHfAFwFDCqRGvG7yHIV
q3Rf6bqYE/IK++sSX9qrCmPC1jXsw6oXjaYqjtdplLb6GBBkNdKo6neGtARV1phB
Moj0JVmFwvnq6yMBZl4TFIZIZHIlfXIe/Ac6b9h5aiL1bGUdnukLhSnGVnoWteVv
F6BRAhH/PDIf7xUpxchEic7HbwnWDWI51ALeDWZQCv5SkNgBUBrLDumWr7b2YCWF
BcNfC14fNN1Jk4Hi9NpS7VCC4Nxvxn+D3iFsVa10Ur/A8EPApIlj+8nu/CDA8p4c
cJa8ImvaKhK0PAfciKMi5RBs/5r5BFRlrmLADkyVz8pmz5AQqjHot/TGoC22LqaN
h0T9lTKEmFAE8CYFQhcOfb6qcR6bfnM6yyVjeCrnAQ3Fw9TgESiPbPE7BpgnxoTU
i3h7a5WHYs4xjkMByJUYda1GhahPD82eJJp8l4iIQ2IMlTZbrEmLFmg13WNO3950
a+dy3HqpXUd2EYwFn9CVFzjTzWKE+63obQWIxLFWJWRirqr5XKbTmB7nPLeik6Os
kPmn8t0eKig76SYiJ2rhKgTMUB1rXjtBYtqP/f02FYN0WSZCEIs7oKMb/SoVfTlP
2rF6k2ZPTzB2ZJp3RzNRX/+/39f04i26QzeEwwQ6/8daxSna3YlKuLMvHzJ5UuFp
q28aN8hD5eXU1IEWHtvH
=ClJZ
-----END PGP SIGNATURE-----
Merge tag 'media/v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- new Socionext MN88443x ISDB-S/T demodulator driver: mn88443x
- new sensor drivers: ak7375, ov2680 and rj54n1cb0c
- an old soc-camera sensor driver converted to the V4L2 framework:
mt9v111
- a new Voice-Coil Motor (VCM) driver: dw9807-vcm
- some cleanups at cx25821, removing legacy unused code
- some improvements at ddbridge driver
- new platform driver: vicodec
- some DVB API cleanups, removing ioctls and compat code for old
out-of-tree drivers that were never merged upstream
- improvements at DVB core to support frontents that support both
Satellite and non-satellite delivery systems
- got rid of the unused VIDIOC_RESERVED V4L2 ioctl
- some cleanups/improvements at gl861 ISDB driver
- several improvements on ov772x, ov7670 and ov5640, imx274, ov5645,
and smiapp sensor drivers
- fixes at em28xx to support dual TS devices
- some cleanups at V4L2/VB2 locking logic
- some API improvements at media controller
- some cec core and drivers improvements
- some uvcvideo improvements
- some improvements at platform drivers: stm32-dcmi, rcar-vin, coda,
reneseas-ceu, imx, vsp1, venus, camss
- lots of other cleanups and fixes
* tag 'media/v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (406 commits)
Revert "media: vivid: shut up warnings due to a non-trivial logic"
siano: get rid of an unused return code for debugfs register
media: isp: fix a warning about a wrong struct initializer
media: radio-wl1273: fix return code for the polling routine
media: s3c-camif: fix return code for the polling routine
media: saa7164: fix return codes for the polling routine
media: exynos-gsc: fix return code if mutex was interrupted
media: mt9v111: Fix build error with no VIDEO_V4L2_SUBDEV_API
media: xc4000: get rid of uneeded casts
media: drxj: get rid of uneeded casts
media: tuner-xc2028: don't use casts for printing sizes
media: cleanup fall-through comments
media: vivid: shut up warnings due to a non-trivial logic
media: rtl28xxu: be sure that it won't go past the array size
media: mt9v111: avoid going past the buffer
media: vsp1_dl: add a description for cmdpool field
media: sta2x11: add a missing parameter description
media: v4l2-mem2mem: add descriptions to MC fields
media: i2c: fix warning in Aptina MT9V111
media: imx: shut up a false positive warning
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbc41pAAoJEAx081l5xIa+ZrAP/AzKj4i4pBLVJcvNZ2BwD+UD
ZNSNj2iqCJ5+Jo/WtIwQ8tLct9UqfVssUwBke6tZksiLdTigGPTUyVIAdK+9kyWD
D00m3x/pToJrSF2D0FwxQlPUtPkohp9N+E6+TU7gd1oCasZfBzmcEpoVAmZf+NWE
kN1xXpmGxZWpu0wc7JA2lv9MuUTijCwIqJqa5E0bB3z06G5mw+PJ89kYzMx19OyA
ZYQK8y3A40ZGl8UbajZ4xg9pqFCRYFFHGqfYlpUWWTh0XMAXu8+Yqzh3dJxmak7r
4u2pdQBsxPMZO8qKBHpVvI7Zhoe0Ntnolc0XVD+2IbqqnTprVbQs0bWf3YyfUlQi
1/9bWFK67W0LEuzac6M7a7EQqFNiHF13Btao7aqENTIe/GaCZJoopaiRMAmh6EHD
4PezeYqrW8cSaPj6OKouL1BhW9Bjixsg0bvjS/uB6m4KekFCt1++BDFGzkqvm6Mo
SVW7nkJoCFpCASaR7DhUEOPexaHeJ65HCDDUvYdqz9jd2w1TgvvanEZWual1NwEm
ImA8A4wGZ/3KijpyyKm0gE96RX7+zMMZ3brW6p1vhUUKVYJCrvSr5jrXH5+2k6Aw
Y455doGL87IRkwyje/YbQF0I8pbUZD9QS5wII13tLGwOH9/uC/Xl6dHNM40gtqyh
W4gEdY+NAMJmYLvRNawa
=g9rD
-----END PGP SIGNATURE-----
Merge tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm
Pull drm updates from Dave Airlie:
"This is the main drm pull request for 4.19.
Rob has some new hardware support for new qualcomm hw that I'll send
along separately. This has the display part of it, the remaining pull
is for the acceleration engine.
This also contains a wound-wait/wait-die mutex rework, Peter has acked
it for merging via my tree.
Otherwise mostly the usual level of activity. Summary:
core:
- Wound-wait/wait-die mutex rework
- Add writeback connector type
- Add "content type" property for HDMI
- Move GEM bo to drm_framebuffer
- Initial gpu scheduler documentation
- GPU scheduler fixes for dying processes
- Console deferred fbcon takeover support
- Displayport support for CEC tunneling over AUX
panel:
- otm8009a panel driver fixes
- Innolux TV123WAM and G070Y2-L01 panel driver
- Ilitek ILI9881c panel driver
- Rocktech RK070ER9427 LCD
- EDT ETM0700G0EDH6 and EDT ETM0700G0BDH6
- DLC DLC0700YZG-1
- BOE HV070WSA-100
- newhaven, nhd-4.3-480272ef-atxl LCD
- DataImage SCF0700C48GGU18
- Sharp LQ035Q7DB03
- p079zca: Refactor to support multiple panels
tinydrm:
- ILI9341 display panel
New driver:
- vkms - virtual kms driver to testing.
i915:
- Icelake:
Display enablement
DSI support
IRQ support
Powerwell support
- GPU reset fixes and improvements
- Full ppgtt support refactoring
- PSR fixes and improvements
- Execlist improvments
- GuC related fixes
amdgpu:
- Initial amdgpu documentation
- JPEG engine support on VCN
- CIK uses powerplay by default
- Move to using core PCIE functionality for gens/lanes
- DC/Powerplay interface rework
- Stutter mode support for RV
- Vega12 Powerplay updates
- GFXOFF fixes
- GPUVM fault debugging
- Vega12 GFXOFF
- DC improvements
- DC i2c/aux changes
- UVD 7.2 fixes
- Powerplay fixes for Polaris12, CZ/ST
- command submission bo_list fixes
amdkfd:
- Raven support
- Power management fixes
udl:
- Cleanups and fixes
nouveau:
- misc fixes and cleanups.
msm:
- DPU1 support display controller in sdm845
- GPU coredump support.
vmwgfx:
- Atomic modesetting validation fixes
- Support for multisample surfaces
armada:
- Atomic modesetting support completed.
exynos:
- IPPv2 fixes
- Move g2d to component framework
- Suspend/resume support cleanups
- Driver cleanups
imx:
- CSI configuration improvements
- Driver cleanups
- Use atomic suspend/resume helpers
- ipu-v3 V4L2 XRGB32/XBGR32 support
pl111:
- Add Nomadik LCDC variant
v3d:
- GPU scheduler jobs management
sun4i:
- R40 display engine support
- TCON TOP driver
mediatek:
- MT2712 SoC support
rockchip:
- vop fixes
omapdrm:
- Workaround for DRA7 errata i932
- Fix mm_list locking
mali-dp:
- Writeback implementation
PM improvements
- Internal error reporting debugfs
tilcdc:
- Single fix for deferred probing
hdlcd:
- Teardown fixes
tda998x:
- Converted to a bridge driver.
etnaviv:
- Misc fixes"
* tag 'drm-next-2018-08-15' of git://anongit.freedesktop.org/drm/drm: (1506 commits)
drm/amdgpu/sriov: give 8s for recover vram under RUNTIME
drm/scheduler: fix param documentation
drm/i2c: tda998x: correct PLL divider calculation
drm/i2c: tda998x: get rid of private fill_modes function
drm/i2c: tda998x: move mode_valid() to bridge
drm/i2c: tda998x: register bridge outside of component helper
drm/i2c: tda998x: cleanup from previous changes
drm/i2c: tda998x: allocate tda998x_priv inside tda998x_create()
drm/i2c: tda998x: convert to bridge driver
drm/scheduler: fix timeout worker setup for out of order job completions
drm/amd/display: display connected to dp-1 does not light up
drm/amd/display: update clk for various HDMI color depths
drm/amd/display: program display clock on cache match
drm/amd/display: Add NULL check for enabling dp ss
drm/amd/display: add vbios table check for enabling dp ss
drm/amd/display: Don't share clk source between DP and HDMI
drm/amd/display: Fix DP HBR2 Eye Diagram Pattern on Carrizo
drm/amd/display: Use calculated disp_clk_khz value for dce110
drm/amd/display: Implement custom degamma lut on dcn
drm/amd/display: Destroy aux_engines only once
...
Pull networking updates from David Miller:
"Highlights:
- Gustavo A. R. Silva keeps working on the implicit switch fallthru
changes.
- Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
Luca Coelho.
- Re-enable ASPM in r8169, from Kai-Heng Feng.
- Add virtual XFRM interfaces, which avoids all of the limitations of
existing IPSEC tunnels. From Steffen Klassert.
- Convert GRO over to use a hash table, so that when we have many
flows active we don't traverse a long list during accumluation.
- Many new self tests for routing, TC, tunnels, etc. Too many
contributors to mention them all, but I'm really happy to keep
seeing this stuff.
- Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.
- Lots of cleanups and fixes in L2TP code from Guillaume Nault.
- Add IPSEC offload support to netdevsim, from Shannon Nelson.
- Add support for slotting with non-uniform distribution to netem
packet scheduler, from Yousuk Seung.
- Add UDP GSO support to mlx5e, from Boris Pismenny.
- Support offloading of Team LAG in NFP, from John Hurley.
- Allow to configure TX queue selection based upon RX queue, from
Amritha Nambiar.
- Support ethtool ring size configuration in aquantia, from Anton
Mikaev.
- Support DSCP and flowlabel per-transport in SCTP, from Xin Long.
- Support list based batching and stack traversal of SKBs, this is
very exciting work. From Edward Cree.
- Busyloop optimizations in vhost_net, from Toshiaki Makita.
- Introduce the ETF qdisc, which allows time based transmissions. IGB
can offload this in hardware. From Vinicius Costa Gomes.
- Add parameter support to devlink, from Moshe Shemesh.
- Several multiplication and division optimizations for BPF JIT in
nfp driver, from Jiong Wang.
- Lots of prepatory work to make more of the packet scheduler layer
lockless, when possible, from Vlad Buslov.
- Add ACK filter and NAT awareness to sch_cake packet scheduler, from
Toke Høiland-Jørgensen.
- Support regions and region snapshots in devlink, from Alex Vesker.
- Allow to attach XDP programs to both HW and SW at the same time on
a given device, with initial support in nfp. From Jakub Kicinski.
- Add TLS RX offload and support in mlx5, from Ilya Lesokhin.
- Use PHYLIB in r8169 driver, from Heiner Kallweit.
- All sorts of changes to support Spectrum 2 in mlxsw driver, from
Ido Schimmel.
- PTP support in mv88e6xxx DSA driver, from Andrew Lunn.
- Make TCP_USER_TIMEOUT socket option more accurate, from Jon
Maxwell.
- Support for templates in packet scheduler classifier, from Jiri
Pirko.
- IPV6 support in RDS, from Ka-Cheong Poon.
- Native tproxy support in nf_tables, from Máté Eckl.
- Maintain IP fragment queue in an rbtree, but optimize properly for
in-order frags. From Peter Oskolkov.
- Improvde handling of ACKs on hole repairs, from Yuchung Cheng"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
hv/netvsc: Fix NULL dereference at single queue mode fallback
net: filter: mark expected switch fall-through
xen-netfront: fix warn message as irq device name has '/'
cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
net: dsa: mv88e6xxx: missing unlock on error path
rds: fix building with IPV6=m
inet/connection_sock: prefer _THIS_IP_ to current_text_addr
net: dsa: mv88e6xxx: bitwise vs logical bug
net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
ieee802154: hwsim: using right kind of iteration
net: hns3: Add vlan filter setting by ethtool command -K
net: hns3: Set tx ring' tc info when netdev is up
net: hns3: Remove tx ring BD len register in hns3_enet
net: hns3: Fix desc num set to default when setting channel
net: hns3: Fix for phy link issue when using marvell phy driver
net: hns3: Fix for information of phydev lost problem when down/up
net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
net: hns3: Add support for serdes loopback selftest
bnxt_en: take coredump_record structure off stack
...
i8259.h uses inb/outb and thus needs to include asm/io.h to avoid the
following build error, as seen with x86_64:defconfig and CONFIG_SMP=n.
In file included from drivers/rtc/rtc-cmos.c:45:0:
arch/x86/include/asm/i8259.h: In function 'inb_pic':
arch/x86/include/asm/i8259.h:32:24: error:
implicit declaration of function 'inb'
arch/x86/include/asm/i8259.h: In function 'outb_pic':
arch/x86/include/asm/i8259.h:45:2: error:
implicit declaration of function 'outb'
Reported-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Suggested-by: Sebastian Gottschall <s.gottschall@dd-wrt.com>
Fixes: 447ae31667 ("x86: Don't include linux/irq.h from asm/hardirq.h")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the source statements of arch-independent Kconfig files instead of
duplicating the includes in every arch/$(SRCARCH)/Kconfig.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbdFsfAAoJED2LAQed4NsGxHsP/1tmA57OOOj8oGxO2OXhXVbr
Q0MZqCoV4bqMvK/hgCQdl9f+tp0m+j12x4xDLdVf4OqnTXMbqvPDu3uQVKvaj/k1
gHhsFA1tFgSbuJ8InltUsrPEQqbceeJsj50xHVAKijqI6LYeRPPSU7aE9obn+OzH
n2nd5sLKvMI/dqdJvW6i5KPydqTH3r3iA7D+ne/XQj0s0EMXvXUPmDT1+ijTnM4a
yfm6W5p7L/c3Ugf1Pz5PfnPl4BxBwZMfW5ie/UO8j5C6Rl0iPaOGuuHurocaaJb3
MefR/7NEAR3G8MhJyL2+70jbbwhjpqR2b5ooz1vpuulPHxjeU45BY60XIBWq1afR
ewsc12MMCYB695ieYWoHdaWgxD/jhffyRuajfpkXKIZEMgDxS03sMhdULXENVMx1
M0ZQ01g/NLWt9ti9DY3eTKB3ymOhnBa1sa77nGGUHkITq4DQKwPX1J9FP/HT6RNt
uOvzeH5kGzc7tqOlZAO0kHbwhQG1uqGcd78IYd4lgf/XfkSgDERTWjnJmnQbwr9m
3PFuST2u8eyO+8Lh1MK76TXOEkXsHMdFugPmb6SlgtMEPKGVLDPlsj52o/LFtgzl
eygfMiBFr2+ttkZ6IpNcpmQ4IztmDpz6XoMk3PqDAfUTUSYpCnq1gAEuff/eisCM
Odva1ZZaeQ7WpxhsP8rr
=gsQJ
-----END PGP SIGNATURE-----
Merge tag 'kconfig-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kconfig consolidation from Masahiro Yamada:
"Consolidation of Kconfig files by Christoph Hellwig.
Move the source statements of arch-independent Kconfig files instead
of duplicating the includes in every arch/$(SRCARCH)/Kconfig"
* tag 'kconfig-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: add a Memory Management options" menu
kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt
kconfig: use a menu in arch/Kconfig to reduce clutter
kconfig: include kernel/Kconfig.preempt from init/Kconfig
Kconfig: consolidate the "Kernel hacking" menu
kconfig: include common Kconfig files from top-level Kconfig
kconfig: remove duplicate SWAP symbol defintions
um: create a proper drivers Kconfig
um: cleanup Kconfig files
um: stop abusing KBUILD_KCONFIG
- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)
- Fixup resizable BARs after suspend/resume (Christian König)
- Make "pci=earlydump" generic (Sinan Kaya)
- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)
* pci/resource:
PCI: Make pci_get_rom_size() static
PCI: Add check code for last image indicator not set
PCI: Avoid accessing memory outside the ROM BAR
PCI: Make early dump functionality generic
PCI: Cleanup PCI_REBAR_CTRL_BAR_SHIFT handling
PCI: Restore resized BAR state on resume
PCI: Clean up resource allocation in devm_of_pci_get_host_bridge_resources()
# Conflicts:
# Documentation/admin-guide/kernel-parameters.txt
- Mark fall-through switch cases before enabling -Wimplicit-fallthrough
(Gustavo A. R. Silva)
- Move DMA-debug PCI init from arch code to PCI core (Christoph Hellwig)
- Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is supplied
(Heiner Kallweit)
- Unify PCI and DMA direction #defines (Shunyong Yang)
- Add PCI_DEVICE_DATA() macro (Andy Shevchenko)
- Check for VPD completion before checking for timeout (Bert Kenward)
- Limit Netronome NFP5000 config space size to work around erratum (Jakub
Kicinski)
* pci/misc:
PCI: Limit config space size for Netronome NFP5000
PCI/VPD: Check for VPD access completion before checking for timeout
PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
PCI: Unify PCI and normal DMA direction definitions
PCI: Use IRQF_ONESHOT if pci_request_irq() called with no handler
PCI: Call dma_debug_add_bus() for pci_bus_type from PCI core
PCI: Mark fall-through switch cases before enabling -Wimplicit-fallthrough
# Conflicts:
# drivers/pci/hotplug/pciehp_ctrl.c
- verify depmod is installed before modules_install
- support build salt in case build ids must be unique between builds
- allow users to specify additional host compiler flags via HOST*FLAGS,
and rename internal variables to KBUILD_HOST*FLAGS
- update buildtar script to drop vax support, add arm64 support
- update builddeb script for better debarch support
- document the pit-fall of if_changed usage
- fix parallel build of UML with O= option
- make 'samples' target depend on headers_install to fix build errors
- remove deprecated host-progs variable
- add a new coccinelle script for refcount_t vs atomic_t check
- improve double-test coccinelle script
- misc cleanups and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbdFZ0AAoJED2LAQed4NsGcHYP/23txxk3GRP7O4UkfPw9Rtky
MHiXTgcoy2vbG+l12BgzWX+qFii8XTUe3dQtK4HnGQFUIBtEBV/hpZPJtxfgGSev
Zou5cv1kr5rNzTkCn//TG3O6/WIkTBCe2hahDCtmGDI3kd/cPK4dHbU/q6KpaqIJ
qzZYBXIvCeu2GM8idQoCRrwdMpgu1pBz1gz2sDje1yHH2toI7T6cXHRLQDgx+HPq
LIP7W9GUsoDdXjecvPD51LiW89E6BUxETBh5Ft9r9uzwB5ylQQMcw6Qyu2DiYDUX
PPsHCMiolYV+Ttcy+vj/67KOvKmEaFotssck+RD/xDCF17zKhRkup+YM8kPLHTVZ
TcAUZadbnT6U/s2W6GFwvVbN/P7cc3aif+aNCC/Pl23yagp3pydlSCocYxQgiVR7
/rx48haYDEgu/MJ1X0dOpSO0ErY7zu2OoAlNerW+D9QizwbP+WtZO/CJH8SxQRuN
dQ1xmyNrie+ODgi9tbc4eBrsb+1rioX927TP5MbJcfXt5CTsxDmIqop5XwyYIoQN
ZWWlzC8Ii3P2trAVpBgM2IEbngSxwr6T9Wbf1ScJnPKr/o1rq+pBk49cYstTz3kQ
OwJ8gPwUrkW4R+hlD7L6mL/WcrKzZBQS0Ij1QW2kVSEhRrsKo99psE1/rGehnHu9
KGB0LYYCqGSOHR4zOjg0
=VjfG
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- verify depmod is installed before modules_install
- support build salt in case build ids must be unique between builds
- allow users to specify additional host compiler flags via HOST*FLAGS,
and rename internal variables to KBUILD_HOST*FLAGS
- update buildtar script to drop vax support, add arm64 support
- update builddeb script for better debarch support
- document the pit-fall of if_changed usage
- fix parallel build of UML with O= option
- make 'samples' target depend on headers_install to fix build errors
- remove deprecated host-progs variable
- add a new coccinelle script for refcount_t vs atomic_t check
- improve double-test coccinelle script
- misc cleanups and fixes
* tag 'kbuild-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (41 commits)
coccicheck: return proper error code on fail
Coccinelle: doubletest: reduce side effect false positives
kbuild: remove deprecated host-progs variable
kbuild: make samples really depend on headers_install
um: clean up archheaders recipe
kbuild: add %asm-generic to no-dot-config-targets
um: fix parallel building with O= option
scripts: Add Python 3 support to tracing/draw_functrace.py
builddeb: Add automatic support for sh{3,4}{,eb} architectures
builddeb: Add automatic support for riscv* architectures
builddeb: Add automatic support for m68k architecture
builddeb: Add automatic support for or1k architecture
builddeb: Add automatic support for sparc64 architecture
builddeb: Add automatic support for mips{,64}r6{,el} architectures
builddeb: Add automatic support for mips64el architecture
builddeb: Add automatic support for ppc64 and powerpcspe architectures
builddeb: Introduce functions to simplify kconfig tests in set_debarch
builddeb: Drop check for 32-bit s390
builddeb: Change architecture detection fallback to use dpkg-architecture
builddeb: Skip architecture detection when KBUILD_DEBARCH is set
...
allmodconfig+CONFIG_INTEL_KVM=n results in the following build error.
ERROR: "l1tf_vmx_mitigation" [arch/x86/kvm/kvm.ko] undefined!
Fixes: 5b76a3cff0 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit e641a31783 ("KVM: PPC: Book3S HV: Unify dirty page map
between HPT and radix", 2017-10-26), kvm_unmap_radix() computes the
number of PAGE_SIZEd pages being unmapped and passes it to
kvmppc_update_dirty_map(), which expects to be passed the page size
instead. Consequently it will only mark one system page dirty even
when a large page (for example a THP page) is being unmapped. The
consequence of this is that part of the THP page might not get copied
during live migration, resulting in memory corruption for the guest.
This fixes it by computing and passing the page size in kvm_unmap_radix().
Cc: stable@vger.kernel.org # v4.15+
Fixes: e641a31783 (KVM: PPC: Book3S HV: Unify dirty page map between HPT and radix)
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW3LkCgAKCRCAXGG7T9hj
vtyfAQDTMUqfBlpz9XqFyTBTFRkP3aVtnEeE7BijYec+RXPOxwEAsiXwZPsmW/AN
up+NEHqPvMOcZC8zJZ9THCiBgOxligY=
=F51X
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- add dma-buf functionality to Xen grant table handling
- fix for booting the kernel as Xen PVH dom0
- fix for booting the kernel as a Xen PV guest with
CONFIG_DEBUG_VIRTUAL enabled
- other minor performance and style fixes
* tag 'for-linus-4.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: fix balloon initialization for PVH Dom0
xen: don't use privcmd_call() from xen_mc_flush()
xen/pv: Call get_cpu_address_sizes to set x86_virt/phys_bits
xen/biomerge: Use true and false for boolean values
xen/gntdev: don't dereference a null gntdev_dmabuf on allocation failure
xen/spinlock: Don't use pvqspinlock if only 1 vCPU
xen/gntdev: Implement dma-buf import functionality
xen/gntdev: Implement dma-buf export functionality
xen/gntdev: Add initial support for dma-buf UAPI
xen/gntdev: Make private routines/structures accessible
xen/gntdev: Allow mappings for DMA buffers
xen/grant-table: Allow allocating buffers suitable for DMA
xen/balloon: Share common memory reservation routines
xen/grant-table: Make set/clear page private code shared
A bunch of good stuff in here:
- Wire up support for qspinlock, replacing our trusty ticket lock code
- Add an IPI to flush_icache_range() to ensure that stale instructions
fetched into the pipeline are discarded along with the I-cache lines
- Support for the GCC "stackleak" plugin
- Support for restartable sequences, plus an arm64 port for the selftest
- Kexec/kdump support on systems booting with ACPI
- Rewrite of our syscall entry code in C, which allows us to zero the
GPRs on entry from userspace
- Support for chained PMU counters, allowing 64-bit event counters to be
constructed on current CPUs
- Ensure scheduler topology information is kept up-to-date with CPU
hotplug events
- Re-enable support for huge vmalloc/IO mappings now that the core code
has the correct hooks to use break-before-make sequences
- Miscellaneous, non-critical fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJbbV41AAoJELescNyEwWM0WoEIALhrKtsIn6vqFlSs/w6aDuJL
cMWmFxjTaKLmIq2+cJIdFLOJ3CH80Pu9gB+nEv/k+cZdCTfUVKfRf28HTpmYWsht
bb4AhdHMC7yFW752BHk+mzJspeC8h/2Rm8wMuNVplZ3MkPrwo3vsiuJTofLhVL/y
BihlU3+5sfBvCYIsWnuEZIev+/I/s/qm1ASiqIcKSrFRZP6VTt5f9TC75vFI8seW
7yc3odKb0CArexB8yBjiPNziehctQF42doxQyL45hezLfWw4qdgHOSiwyiOMxEz9
Fwwpp8Tx33SKLNJgqoqYznGW9PhYJ7n2Kslv19uchJrEV+mds82vdDNaWRULld4=
=kQn6
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"A bunch of good stuff in here. Worth noting is that we've pulled in
the x86/mm branch from -tip so that we can make use of the core
ioremap changes which allow us to put down huge mappings in the
vmalloc area without screwing up the TLB. Much of the positive
diffstat is because of the rseq selftest for arm64.
Summary:
- Wire up support for qspinlock, replacing our trusty ticket lock
code
- Add an IPI to flush_icache_range() to ensure that stale
instructions fetched into the pipeline are discarded along with the
I-cache lines
- Support for the GCC "stackleak" plugin
- Support for restartable sequences, plus an arm64 port for the
selftest
- Kexec/kdump support on systems booting with ACPI
- Rewrite of our syscall entry code in C, which allows us to zero the
GPRs on entry from userspace
- Support for chained PMU counters, allowing 64-bit event counters to
be constructed on current CPUs
- Ensure scheduler topology information is kept up-to-date with CPU
hotplug events
- Re-enable support for huge vmalloc/IO mappings now that the core
code has the correct hooks to use break-before-make sequences
- Miscellaneous, non-critical fixes and cleanups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (90 commits)
arm64: alternative: Use true and false for boolean values
arm64: kexec: Add comment to explain use of __flush_icache_range()
arm64: sdei: Mark sdei stack helper functions as static
arm64, kaslr: export offset in VMCOREINFO ELF notes
arm64: perf: Add cap_user_time aarch64
efi/libstub: Only disable stackleak plugin for arm64
arm64: drop unused kernel_neon_begin_partial() macro
arm64: kexec: machine_kexec should call __flush_icache_range
arm64: svc: Ensure hardirq tracing is updated before return
arm64: mm: Export __sync_icache_dcache() for xen-privcmd
drivers/perf: arm-ccn: Use devm_ioremap_resource() to map memory
arm64: Add support for STACKLEAK gcc plugin
arm64: Add stack information to on_accessible_stack
drivers/perf: hisi: update the sccl_id/ccl_id when MT is supported
arm64: fix ACPI dependencies
rseq/selftests: Add support for arm64
arm64: acpi: fix alignment fault in accessing ACPI
efi/arm: map UEFI memory map even w/o runtime services enabled
efi/arm: preserve early mapping of UEFI memory map longer for BGRT
drivers: acpi: add dependency of EFI for arm64
...
Commit 5209654a46 (x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on
AMD systems) forgot to update the ACPI C1 idle state description and
tools like turbostat display "ACPI FFH INTEL MWAIT 0x0" which is
quite confusing on an AMD system.
Drop the "INTEL" part from the ACPI C1 FFH MWAIT C-state description
to avoid confusion.
Fixes: 5209654a46 (x86/ACPI/cstate: Allow ACPI C1 FFH MWAIT use on AMD systems)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Without program headers for PTI entry trampoline pages, the trampoline
virtual addresses do not map to anything.
Example before:
sudo gdb --quiet vmlinux /proc/kcore
Reading symbols from vmlinux...done.
[New process 1]
Core was generated by `BOOT_IMAGE=/boot/vmlinuz-4.16.0 root=UUID=a6096b83-b763-4101-807e-f33daff63233'.
#0 0x0000000000000000 in irq_stack_union ()
(gdb) x /21ib 0xfffffe0000006000
0xfffffe0000006000: Cannot access memory at address 0xfffffe0000006000
(gdb) quit
After:
sudo gdb --quiet vmlinux /proc/kcore
[sudo] password for ahunter:
Reading symbols from vmlinux...done.
[New process 1]
Core was generated by `BOOT_IMAGE=/boot/vmlinuz-4.16.0-fix-4-00005-gd6e65a8b4072 root=UUID=a6096b83-b7'.
#0 0x0000000000000000 in irq_stack_union ()
(gdb) x /21ib 0xfffffe0000006000
0xfffffe0000006000: swapgs
0xfffffe0000006003: mov %rsp,-0x3e12(%rip) # 0xfffffe00000021f8
0xfffffe000000600a: xchg %ax,%ax
0xfffffe000000600c: mov %cr3,%rsp
0xfffffe000000600f: bts $0x3f,%rsp
0xfffffe0000006014: and $0xffffffffffffe7ff,%rsp
0xfffffe000000601b: mov %rsp,%cr3
0xfffffe000000601e: mov -0x3019(%rip),%rsp # 0xfffffe000000300c
0xfffffe0000006025: pushq $0x2b
0xfffffe0000006027: pushq -0x3e35(%rip) # 0xfffffe00000021f8
0xfffffe000000602d: push %r11
0xfffffe000000602f: pushq $0x33
0xfffffe0000006031: push %rcx
0xfffffe0000006032: push %rdi
0xfffffe0000006033: mov $0xffffffff91a00010,%rdi
0xfffffe000000603a: callq 0xfffffe0000006046
0xfffffe000000603f: pause
0xfffffe0000006041: lfence
0xfffffe0000006044: jmp 0xfffffe000000603f
0xfffffe0000006046: mov %rdi,(%rsp)
0xfffffe000000604a: retq
(gdb) quit
In addition, entry trampolines all map to the same page. Represent that
by giving the corresponding program headers in kcore the same offset.
This has the benefit that, when perf tools uses /proc/kcore as a source
for kernel object code, samples from different CPU trampolines are
aggregated together. Note, such aggregation is normal for profiling
i.e. people want to profile the object code, not every different virtual
address the object code might be mapped to (across different processes
for example).
Notes by PeterZ:
This also adds the KCORE_REMAP functionality.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1528289651-4113-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, the addresses of PTI entry trampolines are not exported to
user space. Kernel profiling tools need these addresses to identify the
kernel code, so add a symbol and address for each CPU's PTI entry
trampoline.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1528289651-4113-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The function has an inline "return false;" definition with CONFIG_SMP=n
but the "real" definition is also visible leading to "redefinition of
‘apic_id_is_primary_thread’" compiler error.
Guard it with #ifdef CONFIG_SMP
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 6a4d2657e0 ("x86/smp: Provide topology_is_primary_thread()")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's been busy summer weeks and hence lots of changes, partly for a
few new drivers and partly for a wide range of fixes.
Here are highlights:
ALSA Core:
- Fix rawmidi buffer management, code cleanup / refactoring
- Fix the SG-buffer page handling with incorrect fallback size
- Fix the stall at virmidi trigger callback with a large buffer;
also offloading and code-refactoring along with it
- Various ALSA sequencer code cleanups
ASoC:
- Deploy the standard snd_pcm_stop_xrun() helper in several drivers
- Support for providing name prefixes to generic component nodes
- Quite a few fixes for DPCM as it gains a bit wider use and more
robust testing
- Generalization of the DIO2125 support to a simple amplifier driver
- Accessory detection support for the audio graph card
- DT support for PXA AC'97 devices
- Quirks for a number of new x86 systems
- Support for AM Logic Meson, Everest ES7154, Intel systems with
RT5682, Qualcomm QDSP6 and WCD9335, Realtek RT5682 and TI TAS5707
HD-audio:
- Code refactoring in HD-audio ext codec codes to drop own classes;
preliminary works for the upcoming legacy codec support
- Generalized DRM audio component for the upcoming radeon / amdgpu
support
- Unification of mic mute-LED and GPIO support for various codecs
- Further improvement of CA0132 codec support including Recon3D
- Proper vga_switcheroo handling for AMD i-GPU
- Update of model list in documentation
- Fixups for another HP Spectre x360, Conexant codecs, power-save
blacklist update
USB-audio:
- Fix the invalid sample rate setup with external clock
- Support of UAC3 selector units and processing units
- Basic UAC3 power-domain support
- Support for Encore mDSD and Thesycon-based DSD devices
- Preparation for future complete callback changes
Firewire:
- Add support for MOTU Traveler
Misc:
- The endianess notation fixes in various drivers
- Add fall-through comment in lots of drivers
- Various sparse warning fixes, e.g. about PCM format types
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAltxhXUOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE9BPxAAnJyKM2IcfVKanjFZQmM4w1gEM3nt+EBvbOF5
155Mq5ELNLxwwCav0eyoeTVD+mB4UO1y9+64SK73dUzBvj7llILs8s5VNMg7KYn6
MnYMo0UIDlvQ/ZwJzzpU04QZVOIHa5HK9XG+u+Fycr8YOCdhGD6m/zGiBStd/xGd
Bvgw2eHF10l+zqN+Vf+1P2/ENRyNxLYpJYYC02zl0nfXP29ZY+CjicDoRvD8/97X
Pe5Jcfj/4oJZlN0MMXfLLP0vaWbUyogG3a4mzVRC+wHG2FZ5hGfFb92mfT8Yce3H
dq9eaih6zMMiDuP4ipClMv/0t64cA8HD+Du/Enq9Jc/g41QAU1JFzj4qi1Ga7S2w
F80uWCedwZ5FLXYAAK2kIunIaaD5phD+8DegzchiVzL6eg7BLmi/Rsfg9VkuC1Ir
ZZERtT07XCzwXke0IAQQ1yhTE/uMxqntCJlZ0FoohvIABgH0jvrpp/cdDYFdC3it
TaNk9REAuCVjr2ket1Jrw6pKNgtry7cDkKFK5d8S5HTNFEL+sWwz2NcSdPNRIIFT
aqeUWt/H1P5G25if/636UJtf+EtlRPqC2Eng8OpY8hitb2vB3trjY25T4a5x5FW5
S4oTyVWmxa4Xxj4QMMhpo7jxxwsux8J1fghDCpEqekiwt72CyVP2UN/cU8HJL7kN
IWqnZgg=
=eTIC
-----END PGP SIGNATURE-----
Merge tag 'sound-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"It's been busy summer weeks and hence lots of changes, partly for a
few new drivers and partly for a wide range of fixes.
Here are highlights:
ALSA Core:
- Fix rawmidi buffer management, code cleanup / refactoring
- Fix the SG-buffer page handling with incorrect fallback size
- Fix the stall at virmidi trigger callback with a large buffer; also
offloading and code-refactoring along with it
- Various ALSA sequencer code cleanups
ASoC:
- Deploy the standard snd_pcm_stop_xrun() helper in several drivers
- Support for providing name prefixes to generic component nodes
- Quite a few fixes for DPCM as it gains a bit wider use and more
robust testing
- Generalization of the DIO2125 support to a simple amplifier driver
- Accessory detection support for the audio graph card
- DT support for PXA AC'97 devices
- Quirks for a number of new x86 systems
- Support for AM Logic Meson, Everest ES7154, Intel systems with
RT5682, Qualcomm QDSP6 and WCD9335, Realtek RT5682 and TI TAS5707
HD-audio:
- Code refactoring in HD-audio ext codec codes to drop own classes;
preliminary works for the upcoming legacy codec support
- Generalized DRM audio component for the upcoming radeon / amdgpu
support
- Unification of mic mute-LED and GPIO support for various codecs
- Further improvement of CA0132 codec support including Recon3D
- Proper vga_switcheroo handling for AMD i-GPU
- Update of model list in documentation
- Fixups for another HP Spectre x360, Conexant codecs, power-save
blacklist update
USB-audio:
- Fix the invalid sample rate setup with external clock
- Support of UAC3 selector units and processing units
- Basic UAC3 power-domain support
- Support for Encore mDSD and Thesycon-based DSD devices
- Preparation for future complete callback changes
Firewire:
- Add support for MOTU Traveler
Misc:
- The endianess notation fixes in various drivers
- Add fall-through comment in lots of drivers
- Various sparse warning fixes, e.g. about PCM format types"
* tag 'sound-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (529 commits)
ASoC: adav80x: mark expected switch fall-through
ASoC: da7219: Add delays to capture path to remove DC offset noise
ALSA: usb-audio: Mark expected switch fall-through
ALSA: mixart: Mark expected switch fall-through
ALSA: opl3: Mark expected switch fall-through
ALSA: hda/ca0132 - Add exit commands for Recon3D
ALSA: hda/ca0132 - Change mixer controls for Recon3D
ALSA: hda/ca0132 - Add Recon3D input and output select commands
ALSA: hda/ca0132 - Add DSP setup defaults for Recon3D
ALSA: hda/ca0132 - Add Recon3D startup functions and setup
ALSA: hda/ca0132 - Add bool variable to enable/disable pci region2 mmio
ALSA: hda/ca0132 - Add Recon3D pincfg
ALSA: hda/ca0132 - Add quirk ID and enum for Recon3D
ALSA: hda/ca0132 - Add alt_functions unsolicited response
ALSA: hda/ca0132 - Clean up ca0132_init function.
ALSA: hda/ca0132 - Create mmio gpio function to make code clearer
ASoC: wm_adsp: Make DSP name configurable by codec driver
ASoC: wm_adsp: Declare firmware controls from codec driver
ASoC: max98373: Added software reset register to readable registers
ASoC: wm_adsp: Correct DSP pointer for preloader control
...
- Revert two ACPICA commits that are not needed any more (Erik
Schmauss).
- Rework property graph support in the ACPI device properties
framework to make it behave more like the analogous DT code
and update the documentation of it (Sakari Ailus).
- Change the default ACPI device status after initialization
to ACPI_STA_DEFAULT instead of 0 (Hans de Goede).
- Add a special platform driver for enumerating multiple I2C devices
hooked up to the same object in the ACPI tables (Hans de Goede).
- Fix the ACPI battery driver to avoid reporting full capacity on
systems without support for that and clean it up (Hans de Goede,
Dmitry Rozhkov, Lucas Rangit Magasweran).
- Add two system wakeup quirks to the ACPI EC driver (Aaron Ma,
Mika Westerberg).
- Add the touchscreen on Dell Venue Pro 7139 to the list of "always
present" devices to make it work (Tristian Celestin).
- Revert a special tables handling quirk for Dell XPS 9570 and
Precision M5530 which is not needed any more (Kai Heng Feng).
- Add support for a new OEM _OSI string to allow system vendors to
work around issues with NVidia HDMI audio (Alex Hung).
- Prevent the ACPI button driver from reporting excessive system
wakeup events and clean it up (Ravi Chandra Sadineni, Randy Dunlap).
- Clean up two minor code style issues in the ACPI core and GHES
handling on ARM64 (Dongjiu Geng, John Garry, Tom Todd).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJbcqQyAAoJEILEb/54YlRxpeIP/i+eB9sq+Pwwb6rpvrcx8JkQ
cWYbjBQPZLsqqHBQDlwVC4I5o0gn6OhRsQlgSQdxheXA+G6kRtbh942oPptIH7NY
vF0g0OYFZGhahjgkLyBtXlacMi+A+aSFkZ6lc4Ie4x/nFAELumjpYtef4v09Ecme
p1G5wz5KYv+47t0DHl+Lb1NglIHsLKtiNMiy9QHyXl3oxrcbW7VIG5o8INw6TgIg
/yaYOAzU8UMjnhcn6gDgrD+OKdux8jt3yAaHW/90/zBp4qbAJ9fp8zhb5LP7M0sE
r63pC2qUq7NDKAl99sMuX5I2jSub9d2yJcjcOSyt+FnB/ErsBCbJGm5rEkK4sRHv
wkV8ybHdq+PkOblR1ob5LMvDXcTlsVTMPonS/oPbEH4J7+jbJjU4vxZxblJGyan6
6GeiOoI6tMyZB7DBiTJH4EgwcNu6GESfPURnyUqRGCUswS7ocQBmRdSbsf6hRgn9
loOZnYXb+PPIxYHH9deFsej71d6En9bMUyUZmOGeLWZ+NSW69XISgnke0tmtgvlZ
pj3zJ7Egnw4SBms2gL/VtPEulxJ2MxScju1RbKMpjP86oIDrO4u3pD8tYSTouSSG
P3AjFXVqpKMTmfVUnbqu/Avg7GsuDHcv8lgUbY8BQGLXaZVni2oJBtcwc8Ca0lJg
8q0xTvX+YEnuOwgvNoNz
=5mW8
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These revert two ACPICA commits that are not needed any more, rework
the property graphs support in ACPI to be more aligned with the
analogous DT code, add some new quirks and remove one that isn't
needed any more, add a special platform driver to enumerate multiple
I2C devices hooked up to the same device object in the ACPI tables and
update the battery and button drivers.
Specifics:
- Revert two ACPICA commits that are not needed any more (Erik
Schmauss).
- Rework property graph support in the ACPI device properties
framework to make it behave more like the analogous DT code and
update the documentation of it (Sakari Ailus).
- Change the default ACPI device status after initialization to
ACPI_STA_DEFAULT instead of 0 (Hans de Goede).
- Add a special platform driver for enumerating multiple I2C devices
hooked up to the same object in the ACPI tables (Hans de Goede).
- Fix the ACPI battery driver to avoid reporting full capacity on
systems without support for that and clean it up (Hans de Goede,
Dmitry Rozhkov, Lucas Rangit Magasweran).
- Add two system wakeup quirks to the ACPI EC driver (Aaron Ma, Mika
Westerberg).
- Add the touchscreen on Dell Venue Pro 7139 to the list of "always
present" devices to make it work (Tristian Celestin).
- Revert a special tables handling quirk for Dell XPS 9570 and
Precision M5530 which is not needed any more (Kai Heng Feng).
- Add support for a new OEM _OSI string to allow system vendors to
work around issues with NVidia HDMI audio (Alex Hung).
- Prevent the ACPI button driver from reporting excessive system
wakeup events and clean it up (Ravi Chandra Sadineni, Randy
Dunlap).
- Clean up two minor code style issues in the ACPI core and GHES
handling on ARM64 (Dongjiu Geng, John Garry, Tom Todd)"
* tag 'acpi-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (34 commits)
platform/x86: Add ACPI i2c-multi-instantiate pseudo driver
ACPI / x86: utils: Remove status workaround from acpi_device_always_present()
ACPI / scan: Create platform device for fwnodes with multiple i2c devices
ACPI / scan: Initialize status to ACPI_STA_DEFAULT
ACPI / EC: Add another entry for Thinkpad X1 Carbon 6th
ACPI: bus: Fix a pointer coding style issue
arm64 / ACPI: clean the additional checks before calling ghes_notify_sea()
ACPI / scan: Add static attribute to indirect_io_hosts[]
ACPI / battery: Do not export energy_full[_design] on devices without full_charge_capacity
ACPI / EC: Use ec_no_wakeup on ThinkPad X1 Yoga 3rd
ACPI / battery: get rid of negations in conditions
ACPI / battery: use specialized print macros
ACPI / battery: reorder headers alphabetically
ACPI / battery: drop inclusion of init.h
ACPI: battery: remove redundant old_present check on insertion
ACPI: property: graph: Update graph documentation to use generic references
ACPI: property: graph: Improve graph documentation for port/ep numbering
ACPI: property: graph: Fix graph documentation
ACPI: property: Update documentation for hierarchical data extension 1.1
ACPI: property: Document key numbering for hierarchical data extension refs
...
- Add a new framework for CPU idle time injection (Daniel Lezcano).
- Add AVS support to the armada-37xx cpufreq driver (Gregory CLEMENT).
- Add support for current CPU frequency reporting to the ACPI CPPC
cpufreq driver (George Cherian).
- Rework the cooling device registration in the imx6q/thermal
driver (Bastian Stender).
- Make the pcc-cpufreq driver refuse to work with dynamic
scaling governors on systems with many CPUs to avoid
scalability issues with it (Rafael Wysocki).
- Fix the intel_pstate driver to report different maximum CPU
frequencies on systems where they really are different and to
ignore the turbo active ratio if hardware-managend P-states (HWP)
are in use; make it use the match_string() helper (Xie Yisheng,
Srinivas Pandruvada).
- Fix a minor deferred probe issue in the qcom-kryo cpufreq
driver (Niklas Cassel).
- Add a tracepoint for the tracking of frequency limits changes
(from Andriod) to the cpufreq core (Ruchi Kandoi).
- Fix a circular lock dependency between CPU hotplug and sysfs
locking in the cpufreq core reported by lockdep (Waiman Long).
- Avoid excessive error reports on driver registration failures
in the ARM cpuidle driver (Sudeep Holla).
- Add a new device links flag to the driver core to make links go
away automatically on supplier driver removal (Vivek Gautam).
- Eliminate potential race condition between system-wide power
management transitions and system shutdown (Pingfan Liu).
- Add a quirk to save NVS memory on system suspend for the ASUS
1025C laptop (Willy Tarreau).
- Make more systems use suspend-to-idle (instead of ACPI S3) by
default (Tristian Celestin).
- Get rid of stack VLA usage in the low-level hibernation code on
64-bit x86 (Kees Cook).
- Fix error handling in the hibernation core and mark an expected
fall-through switch in it (Chengguang Xu, Gustavo Silva).
- Extend the generic power domains (genpd) framework to support
attaching a device to a power domain by name (Ulf Hansson).
- Fix device reference counting and user limits initialization in
the devfreq core (Arvind Yadav, Matthias Kaehlcke).
- Fix a few issues in the rk3399_dmc devfreq driver and improve its
documentation (Enric Balletbo i Serra, Lin Huang, Nick Milner).
- Drop a redundant error message from the exynos-ppmu devfreq driver
(Markus Elfring).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJbcqOqAAoJEILEb/54YlRxOxMP/2ZFvnXU0pey/VX/+TelLMS7
/ROVGQ+s75QP1c9P/3BjvnXc0dsMRLRFPog+7wyoG/2DbEIV25COyAYsmSE0TRni
XUaZO6YAx4/e3pm2AfamYbLCPvjw85eucHg5QJQ4b1mSVRNJOsNv+fUo6lmxwvnm
j9kHvfttFeIhoa/3wa7hbhPKLln46atnpVSxCIceY7L5EFNhkKBvQt6B5yx9geb9
QMY6ohgkyN+bnK9QySXX+trcWpzx1uGX0apI07NkX7n9QGFdU4lCW8lsAf8jMC3g
PPValTsUQsdRONUJJsrgqBioq4tvtgQWibyS2tfRrOGXYvHpJNpGmHVplfsrf/SE
cvlsciR47YbmrXZuqg/r8hql+qefNN16/rnZIZ9VnbcG806VBy2z8IzI5wcdWR7p
vzxhbCqVqOHcEdEwRwvuM2io67MWvkGtKsbCP+33DBh8SubpsECpKN4nIDboa3SE
CJ15RUqXnF6enmmfCKOoHZeu7iXWDz6Pi71XmRzaj9DqbITVV281IerqLgV3rbal
BVa53+202iD0IP+2b7KedGe/5ALlI97ffN0gB+L/eB832853DKSZQKzcvvpRhEN7
Iv2crnUwuQED9ns8P7hzp1Bk9CFCAOLW8UM43YwZRPWnmdeSsPJusJ5lzkAf7bss
wfsFoUE3RaY4msnuHyCh
=kv2M
-----END PGP SIGNATURE-----
Merge tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add a new framework for CPU idle time injection, to be used by
all of the idle injection code in the kernel in the future, fix some
issues and add a number of relatively small extensions in multiple
places.
Specifics:
- Add a new framework for CPU idle time injection (Daniel Lezcano).
- Add AVS support to the armada-37xx cpufreq driver (Gregory
CLEMENT).
- Add support for current CPU frequency reporting to the ACPI CPPC
cpufreq driver (George Cherian).
- Rework the cooling device registration in the imx6q/thermal driver
(Bastian Stender).
- Make the pcc-cpufreq driver refuse to work with dynamic scaling
governors on systems with many CPUs to avoid scalability issues
with it (Rafael Wysocki).
- Fix the intel_pstate driver to report different maximum CPU
frequencies on systems where they really are different and to
ignore the turbo active ratio if hardware-managend P-states (HWP)
are in use; make it use the match_string() helper (Xie Yisheng,
Srinivas Pandruvada).
- Fix a minor deferred probe issue in the qcom-kryo cpufreq driver
(Niklas Cassel).
- Add a tracepoint for the tracking of frequency limits changes (from
Andriod) to the cpufreq core (Ruchi Kandoi).
- Fix a circular lock dependency between CPU hotplug and sysfs
locking in the cpufreq core reported by lockdep (Waiman Long).
- Avoid excessive error reports on driver registration failures in
the ARM cpuidle driver (Sudeep Holla).
- Add a new device links flag to the driver core to make links go
away automatically on supplier driver removal (Vivek Gautam).
- Eliminate potential race condition between system-wide power
management transitions and system shutdown (Pingfan Liu).
- Add a quirk to save NVS memory on system suspend for the ASUS 1025C
laptop (Willy Tarreau).
- Make more systems use suspend-to-idle (instead of ACPI S3) by
default (Tristian Celestin).
- Get rid of stack VLA usage in the low-level hibernation code on
64-bit x86 (Kees Cook).
- Fix error handling in the hibernation core and mark an expected
fall-through switch in it (Chengguang Xu, Gustavo Silva).
- Extend the generic power domains (genpd) framework to support
attaching a device to a power domain by name (Ulf Hansson).
- Fix device reference counting and user limits initialization in the
devfreq core (Arvind Yadav, Matthias Kaehlcke).
- Fix a few issues in the rk3399_dmc devfreq driver and improve its
documentation (Enric Balletbo i Serra, Lin Huang, Nick Milner).
- Drop a redundant error message from the exynos-ppmu devfreq driver
(Markus Elfring)"
* tag 'pm-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
PM / reboot: Eliminate race between reboot and suspend
PM / hibernate: Mark expected switch fall-through
cpufreq: intel_pstate: Ignore turbo active ratio in HWP
cpufreq: Fix a circular lock dependency problem
cpu/hotplug: Add a cpus_read_trylock() function
x86/power/hibernate_64: Remove VLA usage
cpufreq: trace frequency limits change
cpufreq: intel_pstate: Show different max frequency with turbo 3 and HWP
cpufreq: pcc-cpufreq: Disable dynamic scaling on many-CPU systems
cpufreq: qcom-kryo: Silently error out on EPROBE_DEFER
cpufreq / CPPC: Add cpuinfo_cur_freq support for CPPC
cpufreq: armada-37xx: Add AVS support
dt-bindings: marvell: Add documentation for the Armada 3700 AVS binding
PM / devfreq: rk3399_dmc: Fix duplicated opp table on reload.
PM / devfreq: Init user limits from OPP limits, not viceversa
PM / devfreq: rk3399_dmc: fix spelling mistakes.
PM / devfreq: rk3399_dmc: do not print error when get supply and clk defer.
dt-bindings: devfreq: rk3399_dmc: move interrupts to be optional.
PM / devfreq: rk3399_dmc: remove wait for dcf irq event.
dt-bindings: clock: add rk3399 DDR3 standard speed bins.
...
Core changes:
- Augment pinctrl_generic_add_group() and pinmux_generic_add_function()
to return the selector for the added group/function to the caller
and augment (hopefully) all drivers to handle this.
New subdrivers:
- Qualcomm PM8998 and PM8005 are supported in the SPMI pin
control and GPIO driver.
- Intel Ice Lake PCH (platform controller hub) support.
- NXP (ex Freescale) i.MX8MQ support.
- Berlin AS370 support.
Improvements to drivers:
- Support interrupts on the Ocelot pin controller.
- Add SPI pins to the Uniphier driver.
- Define a GPIO compatible per SoC in the Tegra driver.
- Push Tegra initialization down in the initlevels.
- Support external wakeup interrupts on the Exynos.
- Add generic clocks pins to the meson driver.
- Add USB and HSCIF pins for some Renesas PFC chips.
- Suspend/resume support in the armada-37xx.
- Interrupt support for the Actions Semiconductor S900 also
known as "owl".
- Correct the pin ordering in Cedarfork.
- Debugfs output for INTF in the mcp23s08 driver
- Avoid divisions in context save/restore in pinctrl-single.
The rest is minor bug fixes or cleanups.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJbcYUCAAoJEEEQszewGV1zHyQP/2sbSF5fDiOs+CdAqHI+LzIU
KHfXeamJlufZzIY5Cit6L9BRowVLnWewK3lkmQ3NJmUtF4KTbDkbMMEyzNh15WEu
47xOVeHpa1Mrp3kTRiatVW7BibnC97wXFg48omG6KAABLt/eRNZ69NTdq6VZUdWD
7PhCLhLtZSry4nZ/dDp2esc+yGeeQkMNMeNZEAiG+MF5+OYUtNdr7NUYCxMpQuTC
7KxyCia8S0NNND3RtUANUP+M8XeyWRWYEQnqPXuWo1+Fwpk2CoYdraw7m44X7YIw
voBqap5ThOFfhmR7LiqAaMcQEgm5n5ABy+qE0+fcJs4TYcdV8MYSQjCU/lIpO81b
EfdcbU4lDkbDtTLO7aFSjXI01qB/J+bRmxcyTkYbUxENdNW7ZD1izHambhJNxDEt
LO75fOlGJBx348zsypGL13WLc5j/IL4raa8Bj5+BOLuUbQOEpCnFovcktx42QJOK
NHnNK6RknlpXjeNO3w33YO/oxNAkdhLlNU7IHXTN6T9rcBJJjtS7MFn7Sro+QGlC
6EwyGfb0Y08wcUIkMpKHL+9L6So5r08GESzb7PLpgOZIvIi291wA454r1ntK1zoW
JBDX+2vrFjmLSSKWqhKN5nCq85V6M2A2PJSfYG6A0CJZLEPDRC7Lx3roV2W2vss4
EoxKjpMskqRuSbmy/WJZ
=EUMW
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"This is the bulk of pin control changes for v4.19:
Core changes:
- Augment pinctrl_generic_add_group() and pinmux_generic_add_function()
to return the selector for the added group/function to the caller
and augment (hopefully) all drivers to handle this
New subdrivers:
- Qualcomm PM8998 and PM8005 are supported in the SPMI pin control
and GPIO driver
- Intel Ice Lake PCH (platform controller hub) support
- NXP (ex Freescale) i.MX8MQ support
- Berlin AS370 support
Improvements to drivers:
- Support interrupts on the Ocelot pin controller
- Add SPI pins to the Uniphier driver
- Define a GPIO compatible per SoC in the Tegra driver
- Push Tegra initialization down in the initlevels
- Support external wakeup interrupts on the Exynos
- Add generic clocks pins to the meson driver
- Add USB and HSCIF pins for some Renesas PFC chips
- Suspend/resume support in the armada-37xx
- Interrupt support for the Actions Semiconductor S900 also known as
"owl"
- Correct the pin ordering in Cedarfork
- Debugfs output for INTF in the mcp23s08 driver
- Avoid divisions in context save/restore in pinctrl-single
The rest is minor bug fixes or cleanups"
* tag 'pinctrl-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (69 commits)
pinctrl: nomadik: silence uninitialized variable warning
pinctrl: axp209: Fix NULL pointer dereference after allocation
pinctrl: samsung: Remove duplicated "wakeup" in printk
pinctrl: ocelot: add support for interrupt controller
pinctrl: intel: Don't shadow error code of gpiochip_lock_as_irq()
pinctrl: berlin: fix 'pctrl->functions' allocation in berlin_pinctrl_build_state
gpio: tegra: Move driver registration to subsys_init level
pinctrl: tegra: Move drivers registration to arch_init level
pinctrl: baytrail: actually print the apparently misconfigured pin
MAINTAINERS: Replace Heikki as maintainer of Intel pinctrl
pinctrl: freescale: off by one in imx1_pinconf_group_dbg_show()
pinctrl: uniphier: add spi pin-mux settings
pinctrl: cannonlake: Fix community ordering for H variant
pinctrl: tegra: define GPIO compatible node per SoC
pinctrl: intel: Do pin translation when lock IRQ
pinctrl: imx: off by one in imx_pinconf_group_dbg_show()
pinctrl: mediatek: include chained_irq.h header
pinctrl/amd: only handle irq if it is pending and unmasked
pinctrl/amd: fix gpio irq level in debugfs
pinctrl: stm32: add syscfg mask parameter
...
The introduction of generic_max_swapfile_size and arch-specific versions has
broken linking on x86 with CONFIG_SWAP=n due to undefined reference to
'generic_max_swapfile_size'. Fix it by compiling the x86-specific
max_swapfile_size() only with CONFIG_SWAP=y.
Reported-by: Tomas Pruzina <pruzinat@gmail.com>
Fixes: 377eeaa8e1 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- a series from Robin to fix bus imposed dma limits by adding a separate
mask for them to struct device instead of trying to squeeze a second
meaning out of the existing dma mask as we did before. This has ACKs
from the various other subsystems touched
- a small swiotlb cleanup from Kees (acked by Konrad)
- conversion of nios2 and sh to the new generic dma-noncoherent code.
Various other architecture conversions will come through the
architectures maintainers trees.
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAltxLuILHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYPKMA//WY7UGhTrQ9kSjrfCRYJnSgLPjkqwY1wuYdXg1FSw
rUYtQXCem59W48aX9QqHzGiCXf3uVJDBCO7/ikP8R/hHJTC/ih6tTJAtKMBTaDm5
NEVU4lZCzHfzliF3vhj/3+ifWThI0gSo5QQolf4hD5255a95HMLZgJ9zt0IRVgdj
aKfFfKG824LX+G4iuN27jwmvfOJM4KwzaAFID6Rc4Pt1BXy3CmLkED63p6YvTAk/
pCR9LZpYWPccEuA1YfHoDJrB4onTw7zkVHFBtMZ5xW+ypJwRACSpLp/8IjeG841x
/vA3BsDNu8zrRj2cHpD+fViPriMm/qdMew45keII/0jCeCs640wg9Cy7f6yl2QQh
nsCJA8DEKvY8ebaX12XyWC3GNbLNru1XORrMl9+Lqu0kUHbdNxRDg47fWILLdxGc
QvgassFOxb7wg9GwIsoGpOItagl2hHP3j0vMrHlfWaBNefi5ZtWiZs8aJBgm2rJm
kzjmXGzx9IBy5SWSpou5UDCvf4xF7CkTA11YKPyBt2H3hquFKBH6R+6SzAm0n4rn
WcAlqx4BQX1no/p0hPM4b01yPwgd3OmNmITAaYKWTvDXKpImRkPYWIfvLQBx4/PW
9TyVd1QWO1wMbCziVCaHVwpsNAHFTK/kw41OLjK4H1P+HMorGH0crhimfl5BiaJY
XJk=
=WNwP
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.19' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- a series from Robin to fix bus imposed dma limits by adding a
separate mask for them to struct device instead of trying to squeeze
a second meaning out of the existing dma mask as we did before.
This has ACKs from the various other subsystems touched
- a small swiotlb cleanup from Kees (acked by Konrad)
- conversion of nios2 and sh to the new generic dma-noncoherent code.
Various other architecture conversions will come through the
architectures maintainers trees.
* tag 'dma-mapping-4.19' of git://git.infradead.org/users/hch/dma-mapping:
sh: use generic dma_noncoherent_ops
sh: split arch/sh/mm/consistent.c
sh: use dma_direct_ops for the CONFIG_DMA_COHERENT case
sh: introduce a sh_cacheop_vaddr helper
sh: simplify get_arch_dma_ops
OF: Don't set default coherent DMA mask
ACPI/IORT: Don't set default coherent DMA mask
iommu/dma: Respect bus DMA limit for IOVAs
of/device: Set bus DMA mask as appropriate
ACPI/IORT: Set bus DMA mask as appropriate
dma-mapping: Generalise dma_32bit_limit flag
ACPI/IORT: Support address size limit for root complexes
of/platform: Initialise default DMA masks
nios2: use generic dma_noncoherent_ops
swiotlb: clean up reporting
dma-mapping: relax warning for per-device areas
- Support 64-bit timestamps
MTD changes:
Core changes:
- Support sub-partitions
- Clarify mtd_oob_ops documentation
- Make Kconfig formatting consistent
- Fix potential overflows in mtdchar_{write,read}()
- Fallback to ->_{read,write}() when ->_{read,write}_oob() is missing
and no OOB data were requested
- Remove VLA usage in the bch lib
Driver changes:
- Use mtd_device_register() instead of mtd_device_parse_register()
where applicable
- Use proper printk format to print physical addresses in the
solutionengine driver
- Add missing mtd_set_of_node() call in the powernv driver
- Remove unneeded variables in a few drivers
- Plug the TRX part parser to the DT partition parsers logic
- Check ioremap_cache() return code in the gpio-addr-flash driver
- Stop using VMLINUX_SYMBOL_STR() in gen_probe.c
SPI NOR changes:
Core changes:
- Apply reset hacks only when reset is explicitly marked as broken in
the DT
Driver changes:
- Minor cleanup/fixes in the m25p80 driver
- Release flash_np in the nxp-spifi driver
- Add suspend/resume hooks to the atmel-quadspi driver
- Include gpio/consumer.h instead of gpio.h in the atmel-quadspi
driver
- Use %pK instead of %p in the stm32-quadspi driver
- Improve timeout handling in the cadence-quadspi driver
- Use mtd_device_register() instead of mtd_device_parse_register()
in the intel-spi driver
NAND changes:
Core changes:
- Add the SPI-NAND framework.
- Create a helper to find the best ECC configuration.
- Create NAND controller operations.
- Allocate dynamically ONFI parameters structure.
- Add defines for ONFI version bits.
- Add manufacturer fixup for ONFI parameter page.
- Add an option to specify NAND chip as a boot device.
- Add Reed-Solomon error correction algorithm.
- Better name for the controller structure.
- Remove unused caller_is_module() definition.
- Make subop helpers return unsigned values.
- Expose _notsupp() helpers for raw page accessors.
- Add default values for dynamic timings.
- Kill the chip->scan_bbt() hook.
- Rename nand_default_bbt() into nand_create_bbt().
- Start to clean the nand_chip structure.
- Remove stale prototype from rawnand.h.
Raw NAND controllers drivers changes:
- Qcom: structuring cleanup.
- Denali: use core helper to find the best ECC configuration.
- Possible build of almost all drivers by adding a dependency on
COMPILE_TEST for almost all of them in Kconfig, implies various
fixes, Kconfig cleanup, GPIO headers inclusion cleanup, and even
changes in sparc64 and ia64 architectures.
- Clean the ->probe() functions error path of a lot of drivers.
- Migrate all drivers to use nand_scan() instead of
nand_scan_ident()/nand_scan_tail() pair.
- Use mtd_device_register() where applicable to simplify the code.
- Marvell:
* Handle on-die ECC.
* Better clocks handling.
* Remove bogus comment.
* Add suspend and resume support.
- Tegra: add NAND controller driver.
- Atmel:
* Add module param to avoid using dma.
* Drop Wenyou Yang from MAINTAINERS.
- Denali: optimize timings handling.
- FSMC: Stop using chip->read_buf().
- FSL:
* Switch to SPDX license tag identifiers.
* Fix qualifiers in MXC init functions.
Raw NAND chip drivers changes:
- Micron:
* Add fixup for ONFI revision.
* Update ecc_stats.corrected.
* Make ECC activation stateful.
* Avoid enabling/disabling ECC when it can't be disabled.
* Get the actual number of bitflips.
* Allow forced on-die ECC.
* Support 8/512 on-die ECC.
* Fix on-die ECC detection logic.
- Hynix:
* Fix decoding the OOB size on H27UCG8T2BTR.
* Use ->exec_op() in hynix_nand_reg_write_op().
-----BEGIN PGP SIGNATURE-----
iQI5BAABCAAjBQJbcSYdHBxib3Jpcy5icmV6aWxsb25AYm9vdGxpbi5jb20ACgkQ
Ze02AX4ItwA85Q//ViUNiWoluaj7q3Kpv44NZZClPjVfirDtIHEfyzob7HbkrqjT
ivjNUu5cjKU2aqptR24UzIV2lwmaSGicIzvyANN5bNedHy8tb4BWXo1iTEflXc1P
lYihrRz7AeHQqdrOYAiHxq0fcqgIm2NjCDxfdtkP1HJsMQH+LgvoosOkd8RyLF8E
jV5IyreTfg10AiVS44yvnqdsYo7GDVZRgnx1dBRbdJ+WwZS05m6LSeNu0ivgfLMM
NmISrn7pzlMYnuDkzckXwVka7STFhV9befoSCmAHedqNEHtzaGDaNMLEilaC4WzV
1yvsdRGI3hDkdO4s44nP/Vs2XlVmCevuTpOOQ1dEK94Bbek+us7NEPjxOIAf22jE
5MVZR6vpNN7JXJ+nvEZjjL8X9zysqZgk2r6os35Wi2zJK7iHIrFpjlMYwTJfVPzv
JKabOQZuOUqRACbL0isTTdG0+Tzr1BxaQqFiFUteQ9QpG4nSifoHUmI+9XUCJmil
iGTOy6rEkaMd6qu3xapPtZbxGbUbvcLiJvNrZ02ZfsELDAELowz66O0PX6rSwTV9
NIWeVNXK9Bsp0oxbb8P8oJIB5K4F6vqncPBls34J7E/xrLO9nLk1VvFQrGqzuwwP
HmEdgcRwKeXrvJITZz2u0u6lrKpgdDA3FTqfGCNec51PCgb1FwxcpKF/Xz0=
=yIkF
-----END PGP SIGNATURE-----
Merge tag 'mtd/for-4.19' of git://git.infradead.org/linux-mtd
Pull mtd updates from Boris Brezillon:
"JFFS2 changes:
- Support 64-bit timestamps
MTD core changes:
- Support sub-partitions
- Clarify mtd_oob_ops documentation
- Make Kconfig formatting consistent
- Fix potential overflows in mtdchar_{write,read}()
- Fallback to ->_{read,write}() when ->_{read,write}_oob() is missing
and no OOB data were requested
- Remove VLA usage in the bch lib
MTD driver changes:
- Use mtd_device_register() instead of mtd_device_parse_register()
where applicable
- Use proper printk format to print physical addresses in the
solutionengine driver
- Add missing mtd_set_of_node() call in the powernv driver
- Remove unneeded variables in a few drivers
- Plug the TRX part parser to the DT partition parsers logic
- Check ioremap_cache() return code in the gpio-addr-flash driver
- Stop using VMLINUX_SYMBOL_STR() in gen_probe.c
SPI NOR core changes:
- Apply reset hacks only when reset is explicitly marked as broken in
the DT
SPI NOR driver changes:
- Minor cleanup/fixes in the m25p80 driver
- Release flash_np in the nxp-spifi driver
- Add suspend/resume hooks to the atmel-quadspi driver
- Include gpio/consumer.h instead of gpio.h in the atmel-quadspi
driver
- Use %pK instead of %p in the stm32-quadspi driver
- Improve timeout handling in the cadence-quadspi driver
- Use mtd_device_register() instead of mtd_device_parse_register() in
the intel-spi driver
NAND core changes:
- Add the SPI-NAND framework.
- Create a helper to find the best ECC configuration.
- Create NAND controller operations.
- Allocate dynamically ONFI parameters structure.
- Add defines for ONFI version bits.
- Add manufacturer fixup for ONFI parameter page.
- Add an option to specify NAND chip as a boot device.
- Add Reed-Solomon error correction algorithm.
- Better name for the controller structure.
- Remove unused caller_is_module() definition.
- Make subop helpers return unsigned values.
- Expose _notsupp() helpers for raw page accessors.
- Add default values for dynamic timings.
- Kill the chip->scan_bbt() hook.
- Rename nand_default_bbt() into nand_create_bbt().
- Start to clean the nand_chip structure.
- Remove stale prototype from rawnand.h.
Raw NAND controllers drivers changes:
- Qcom: structuring cleanup.
- Denali: use core helper to find the best ECC configuration.
- Possible build of almost all drivers by adding a dependency on
COMPILE_TEST for almost all of them in Kconfig, implies various
fixes, Kconfig cleanup, GPIO headers inclusion cleanup, and even
changes in sparc64 and ia64 architectures.
- Clean the ->probe() functions error path of a lot of drivers.
- Migrate all drivers to use nand_scan() instead of
nand_scan_ident()/nand_scan_tail() pair.
- Use mtd_device_register() where applicable to simplify the code.
- Marvell:
* Handle on-die ECC.
* Better clocks handling.
* Remove bogus comment.
* Add suspend and resume support.
- Tegra: add NAND controller driver.
- Atmel:
* Add module param to avoid using dma.
* Drop Wenyou Yang from MAINTAINERS.
- Denali: optimize timings handling.
- FSMC: Stop using chip->read_buf().
- FSL:
* Switch to SPDX license tag identifiers.
* Fix qualifiers in MXC init functions.
Raw NAND chip drivers changes:
- Micron:
* Add fixup for ONFI revision.
* Update ecc_stats.corrected.
* Make ECC activation stateful.
* Avoid enabling/disabling ECC when it can't be disabled.
* Get the actual number of bitflips.
* Allow forced on-die ECC.
* Support 8/512 on-die ECC.
* Fix on-die ECC detection logic.
- Hynix:
* Fix decoding the OOB size on H27UCG8T2BTR.
* Use ->exec_op() in hynix_nand_reg_write_op()"
* tag 'mtd/for-4.19' of git://git.infradead.org/linux-mtd: (188 commits)
mtd: rawnand: atmel: Select GENERIC_ALLOCATOR
MAINTAINERS: drop Wenyou Yang from Atmel NAND driver support
mtd: rawnand: allocate dynamically ONFI parameters during detection
mtd: spi-nor: only apply reset hacks to broken hardware
mtd: spi-nor: cadence-quadspi: fix timeout handling
mtd: spi-nor: atmel-quadspi: Include gpio/consumer.h instead of gpio.h
mtd: spi-nor: intel-spi: use mtd_device_register()
mtd: spi-nor: stm32-quadspi: replace "%p" with "%pK"
mtd: spi-nor: atmel-quadspi: add suspend/resume hooks
mtd: rawnand: allocate model parameter dynamically
mtd: rawnand: do not export nand_scan_[ident|tail]() anymore
mtd: rawnand: txx9ndfmc: convert driver to nand_scan()
mtd: rawnand: txx9ndfmc: clarify ECC parameters assignation
mtd: rawnand: tegra: convert driver to nand_scan()
mtd: rawnand: jz4740: convert driver to nand_scan()
mtd: rawnand: jz4740: group nand_scan_{ident, tail} calls
mtd: rawnand: jz4740: fix probe function error path
mtd: rawnand: docg4: convert driver to nand_scan()
mtd: rawnand: do not execute nand_scan_ident() if maxchips is zero
mtd: rawnand: atmel: convert driver to nand_scan()
...
Always set the 5 upper-most supported physical address bits to 1 for SPTEs
that are marked as non-present or reserved, to make them unusable for
L1TF attacks from the guest. Currently, this just applies to MMIO SPTEs.
(We do not need to mark PTEs that are completely 0 as physical page 0
is already reserved.)
This allows mitigation of L1TF without disabling hyper-threading by using
shadow paging mode instead of EPT.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge L1 Terminal Fault fixes from Thomas Gleixner:
"L1TF, aka L1 Terminal Fault, is yet another speculative hardware
engineering trainwreck. It's a hardware vulnerability which allows
unprivileged speculative access to data which is available in the
Level 1 Data Cache when the page table entry controlling the virtual
address, which is used for the access, has the Present bit cleared or
other reserved bits set.
If an instruction accesses a virtual address for which the relevant
page table entry (PTE) has the Present bit cleared or other reserved
bits set, then speculative execution ignores the invalid PTE and loads
the referenced data if it is present in the Level 1 Data Cache, as if
the page referenced by the address bits in the PTE was still present
and accessible.
While this is a purely speculative mechanism and the instruction will
raise a page fault when it is retired eventually, the pure act of
loading the data and making it available to other speculative
instructions opens up the opportunity for side channel attacks to
unprivileged malicious code, similar to the Meltdown attack.
While Meltdown breaks the user space to kernel space protection, L1TF
allows to attack any physical memory address in the system and the
attack works across all protection domains. It allows an attack of SGX
and also works from inside virtual machines because the speculation
bypasses the extended page table (EPT) protection mechanism.
The assoicated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646
The mitigations provided by this pull request include:
- Host side protection by inverting the upper address bits of a non
present page table entry so the entry points to uncacheable memory.
- Hypervisor protection by flushing L1 Data Cache on VMENTER.
- SMT (HyperThreading) control knobs, which allow to 'turn off' SMT
by offlining the sibling CPU threads. The knobs are available on
the kernel command line and at runtime via sysfs
- Control knobs for the hypervisor mitigation, related to L1D flush
and SMT control. The knobs are available on the kernel command line
and at runtime via sysfs
- Extensive documentation about L1TF including various degrees of
mitigations.
Thanks to all people who have contributed to this in various ways -
patches, review, testing, backporting - and the fruitful, sometimes
heated, but at the end constructive discussions.
There is work in progress to provide other forms of mitigations, which
might be less horrible performance wise for a particular kind of
workloads, but this is not yet ready for consumption due to their
complexity and limitations"
* 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
x86/microcode: Allow late microcode loading with SMT disabled
tools headers: Synchronise x86 cpufeatures.h for L1TF additions
x86/mm/kmmio: Make the tracer robust against L1TF
x86/mm/pat: Make set_memory_np() L1TF safe
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
x86/speculation/l1tf: Invert all not present mappings
cpu/hotplug: Fix SMT supported evaluation
KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
Documentation/l1tf: Remove Yonah processors from not vulnerable list
x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
x86: Don't include linux/irq.h from asm/hardirq.h
x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
cpu/hotplug: detect SMT disabled by BIOS
...
Merge changes in the PM core, system-wide PM infrastructure, generic
power domains (genpd) framework, ACPI PM infrastructure and cpuidle
for 4.19.
* pm-core:
driver core: Add flag to autoremove device link on supplier unbind
driver core: Rename flag AUTOREMOVE to AUTOREMOVE_CONSUMER
* pm-domains:
PM / Domains: Introduce dev_pm_domain_attach_by_name()
PM / Domains: Introduce option to attach a device by name to genpd
PM / Domains: dt: Add a power-domain-names property
* pm-sleep:
PM / reboot: Eliminate race between reboot and suspend
PM / hibernate: Mark expected switch fall-through
x86/power/hibernate_64: Remove VLA usage
PM / hibernate: cast PAGE_SIZE to int when comparing with error code
* acpi-pm:
ACPI / PM: save NVS memory for ASUS 1025C laptop
ACPI / PM: Default to s2idle in all machines supporting LP S0
* pm-cpuidle:
ARM: cpuidle: silence error on driver registration failure
When idle_power4() hard disables interrupts then finds a soft pending
interrupt, it returns with interrupts hard disabled but without
PACA_IRQ_HARD_DIS set. Commit 9b81c0211c ("powerpc/64s: make
PACA_IRQ_HARD_DIS track MSR[EE] closely") added a warning for that
condition (since disabled).
Fix this by adding the PACA_IRQ_HARD_DIS for that case.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Marking default memory region as cached is not always sufficient and is
not flexible. Allow specifying cache attributes for the whole memory
address space with new config entry MEMMAP_CACHEATTR. Apply it after
cache initialization.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Cache invalidation macros use cache line size to iterate over
invalidated cache lines, assuming that all cache ways are invalidated by
single instruction, but xtensa ISA recommends to not assume that for
future compatibility:
In some implementations all ways at index Addry-1..z are invalidated
regardless of the specified way, but for future compatibility this
behavior should not be assumed.
Iterate over all cache ways in ___invalidate_icache_all and
___invalidate_dcache_all.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
When building kernel for xtensa cores with big cache lines (e.g. 128
bytes or more) __loop_cache_all and __loop_cache_page may generate
assembly instructions with immediate fields that are too big. This
results in the following build errors:
arch/xtensa/mm/misc.S: Assembler messages:
arch/xtensa/mm/misc.S:464: Error: operand 2 of 'diwbi' has invalid value '256'
arch/xtensa/mm/misc.S:464: Error: operand 2 of 'diwbi' has invalid value '384'
arch/xtensa/kernel/head.S: Assembler messages:
arch/xtensa/kernel/head.S:172: Error: operand 2 of 'diu' has invalid value '256'
arch/xtensa/kernel/head.S:172: Error: operand 2 of 'diu' has invalid value '384'
arch/xtensa/kernel/head.S:176: Error: operand 2 of 'iiu' has invalid value '256'
arch/xtensa/kernel/head.S:176: Error: operand 2 of 'iiu' has invalid value '384'
arch/xtensa/kernel/head.S:255: Error: operand 2 of 'diwb' has invalid value '256'
arch/xtensa/kernel/head.S:255: Error: operand 2 of 'diwb' has invalid value '384'
Add parameter max_immed to these macros and use it to limit values of
immediate operands. Extract common code of these macros into the new
macro __loop_cache_unroll.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
An overview of the general architecture changes:
- Massive DMA ops refactoring from Christoph Hellwig (huzzah for
deleting crufty code!).
- We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes & corresponding
regsets to expose DSP ASE & floating point mode state respectively,
both for live debugging & core dumps.
- We better optimize our code by hard-coding cpu_has_* macros at
compile time where their values are known due to the ISA revision
that the kernel build is targeting.
- The EJTAG exception handler now better handles SMP systems, where it
was previously possible for CPUs to clobber a register value saved
by another CPU.
- Our implementation of memset() gained a couple of fixes for MIPSr6
systems to return correct values in some cases where stores fault.
- We now implement ioremap_wc() using the uncached-accelerated cache
coherency attribute where supported, which is detected during boot,
and fall back to plain uncached access where necessary. The
MIPS-specific (and unused in tree) ioremap_uncached_accelerated() &
ioremap_cacheable_cow() are removed.
- The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP
systems by reworking the way we ensure remote CPUs that may be
running threads within the affected process switch mode.
- Systems using the MIPS Coherence Manager will now set the
MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache
maintenance overhead when flushing the icache.
- A few fixes were made for building with clang/LLVM, which
now sucessfully builds kernels for many of our platforms.
- Miscellaneous cleanups all over.
And some platform-specific changes:
- ar7 gained stubs for a few clock API functions to fix build failures
for some drivers.
- ath79 gained support for a few new SoCs, a few fixes & better
gpio-keys support.
- Ci20 now exposes its SPI bus using the spi-gpio driver.
- The generic platform can now auto-detect a suitable value for
PHYS_OFFSET based upon the memory map described by the device tree,
allowing us to avoid wasting memory on page book-keeping for systems
where RAM starts at a non-zero physical address.
- Ingenic systems using the jz4740 platform code now link their
vmlinuz higher to allow for kernels of a realistic size.
- Loongson32 now builds the kernel targeting MIPSr1 rather than MIPSr2
to avoid CPU errata.
- Loongson64 gains a couple of fixes, a workaround for a write
buffering issue & support for the Loongson 3A R3.1 CPU.
- Malta now uses the piix4-poweroff driver to handle powering down.
- Microsemi Ocelot gained support for its SPI bus & NOR flash, its
second MDIO bus and can now be supported by a FIT/.itb image.
- Octeon saw a bunch of header cleanups which remove a lot of
duplicate or unused code.
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW3G6JxUccGF1bC5idXJ0
b25AbWlwcy5jb20ACgkQPqefrLV1AN0n/gD/Rpdgay31G/4eTTKBmBrcaju6Shjt
/2Iu6WC5Sj4hDHUBAJSbuI+B9YjcNsjekBYxB/LLD7ImcLBl6nLMIvKmXLAL
=cUiF
-----END PGP SIGNATURE-----
Merge tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Paul Burton:
"Here are the main MIPS changes for 4.19.
An overview of the general architecture changes:
- Massive DMA ops refactoring from Christoph Hellwig (huzzah for
deleting crufty code!).
- We introduce NT_MIPS_DSP & NT_MIPS_FP_MODE ELF notes &
corresponding regsets to expose DSP ASE & floating point mode state
respectively, both for live debugging & core dumps.
- We better optimize our code by hard-coding cpu_has_* macros at
compile time where their values are known due to the ISA revision
that the kernel build is targeting.
- The EJTAG exception handler now better handles SMP systems, where
it was previously possible for CPUs to clobber a register value
saved by another CPU.
- Our implementation of memset() gained a couple of fixes for MIPSr6
systems to return correct values in some cases where stores fault.
- We now implement ioremap_wc() using the uncached-accelerated cache
coherency attribute where supported, which is detected during boot,
and fall back to plain uncached access where necessary. The
MIPS-specific (and unused in tree) ioremap_uncached_accelerated() &
ioremap_cacheable_cow() are removed.
- The prctl(PR_SET_FP_MODE, ...) syscall is better supported for SMP
systems by reworking the way we ensure remote CPUs that may be
running threads within the affected process switch mode.
- Systems using the MIPS Coherence Manager will now set the
MIPS_IC_SNOOPS_REMOTE flag to avoid some unnecessary cache
maintenance overhead when flushing the icache.
- A few fixes were made for building with clang/LLVM, which now
sucessfully builds kernels for many of our platforms.
- Miscellaneous cleanups all over.
And some platform-specific changes:
- ar7 gained stubs for a few clock API functions to fix build
failures for some drivers.
- ath79 gained support for a few new SoCs, a few fixes & better
gpio-keys support.
- Ci20 now exposes its SPI bus using the spi-gpio driver.
- The generic platform can now auto-detect a suitable value for
PHYS_OFFSET based upon the memory map described by the device tree,
allowing us to avoid wasting memory on page book-keeping for
systems where RAM starts at a non-zero physical address.
- Ingenic systems using the jz4740 platform code now link their
vmlinuz higher to allow for kernels of a realistic size.
- Loongson32 now builds the kernel targeting MIPSr1 rather than
MIPSr2 to avoid CPU errata.
- Loongson64 gains a couple of fixes, a workaround for a write
buffering issue & support for the Loongson 3A R3.1 CPU.
- Malta now uses the piix4-poweroff driver to handle powering down.
- Microsemi Ocelot gained support for its SPI bus & NOR flash, its
second MDIO bus and can now be supported by a FIT/.itb image.
- Octeon saw a bunch of header cleanups which remove a lot of
duplicate or unused code"
* tag 'mips_4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (123 commits)
MIPS: Remove remnants of UASM_ISA
MIPS: netlogic: xlr: Remove erroneous check in nlm_fmn_send()
MIPS: VDSO: Force link endianness
MIPS: Always specify -EB or -EL when using clang
MIPS: Use dins to simplify __write_64bit_c0_split()
MIPS: Use read-write output operand in __write_64bit_c0_split()
MIPS: Avoid using array as parameter to write_c0_kpgd()
MIPS: vdso: Allow clang's --target flag in VDSO cflags
MIPS: genvdso: Remove GOT checks
MIPS: Remove obsolete MIPS checks for DST node "chosen@0"
MIPS: generic: Remove input symbols from defconfig
MIPS: Delete unused code in linux32.c
MIPS: Remove unused sys_32_mmap2
MIPS: Remove nabi_no_regargs
mips: dts: mscc: enable spi and NOR flash support on ocelot PCB123
mips: dts: mscc: Add spi on Ocelot
MIPS: Loongson: Merge load addresses
MIPS: Loongson: Set Loongson32 to MIPS32R1
MIPS: mscc: ocelot: add interrupt controller properties to GPIO controller
MIPS: generic: Select MIPS_AUTO_PFN_OFFSET
...
Pull parisc updates from Helge Deller:
- parisc now uses the generic dma_noncoherent_ops implementation
(Christoph Hellwig)
- further memory barrier and spinlock improvements (John David Anglin)
- prepare removal of current_text_addr() functions (Nick Desaulniers)
- improve kernel stack unwinding on parisc (me)
- drop ENOTSUP which was defined on parisc only (me)
* 'parisc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix and improve kernel stack unwinding
parisc: Remove unnecessary barriers from spinlock.h
parisc: Remove ordered stores from syscall.S
parisc: prefer _THIS_IP_ and _RET_IP_ statement expressions
parisc: Add HAVE_REGS_AND_STACK_ACCESS_API feature
parisc: Drop architecture-specific ENOTSUP define
parisc: use generic dma_noncoherent_ops
parisc: always use flush_kernel_dcache_range for DMA cache maintainance
parisc: merge pcx_dma_ops and pcxl_dma_ops
Pull ARM updates from Russell King:
- further Spectre variant 1 fixes for user accessors.
- kbuild cleanups (Masahiro Yamada)
- hook up sync core functionality (Will Deacon)
- nommu updates for hypervisor mode booting (Vladimir Murzin)
- use compiler built-ins for fls and ffs (Nicolas Pitre)
* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: spectre-v1: mitigate user accesses
ARM: spectre-v1: use get_user() for __get_user()
ARM: use __inttype() in get_user()
ARM: oabi-compat: copy semops using __copy_from_user()
ARM: vfp: use __copy_from_user() when restoring VFP state
ARM: 8785/1: use compiler built-ins for ffs and fls
ARM: 8784/1: NOMMU: Allow enter in Hyp mode
ARM: 8783/1: NOMMU: Extend check for VBAR support
ARM: 8782/1: vfp: clean up arch/arm/vfp/Makefile
ARM: signal: copy registers using __copy_from_user()
ARM: tcm: ensure inline stub functions are marked static
ARM: 8779/1: add endianness option to LDFLAGS instead of LD
ARM: 8777/1: Hook up SYNC_CORE functionality for sys_membarrier()
Pull s390 updates from Heiko Carstens:
"Since Martin is on vacation you get the s390 pull request from me:
- Host large page support for KVM guests. As the patches have large
impact on arch/s390/mm/ this series goes out via both the KVM and
the s390 tree.
- Add an option for no compression to the "Kernel compression mode"
menu, this will come in handy with the rework of the early boot
code.
- A large rework of the early boot code that will make life easier
for KASAN and KASLR. With the rework the bootable uncompressed
image is not generated anymore, only the bzImage is available. For
debuggung purposes the new "no compression" option is used.
- Re-enable the gcc plugins as the issue with the latent entropy
plugin is solved with the early boot code rework.
- More spectre relates changes:
+ Detect the etoken facility and remove expolines automatically.
+ Add expolines to a few more indirect branches.
- A rewrite of the common I/O layer trace points to make them
consumable by 'perf stat'.
- Add support for format-3 PCI function measurement blocks.
- Changes for the zcrypt driver:
+ Add attributes to indicate the load of cards and queues.
+ Restructure some code for the upcoming AP device support in KVM.
- Build flags improvements in various Makefiles.
- A few fixes for the kdump support.
- A couple of patches for gcc 8 compile warning cleanup.
- Cleanup s390 specific proc handlers.
- Add s390 support to the restartable sequence self tests.
- Some PTR_RET vs PTR_ERR_OR_ZERO cleanup.
- Lots of bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (107 commits)
s390/dasd: fix hanging offline processing due to canceled worker
s390/dasd: fix panic for failed online processing
s390/mm: fix addressing exception after suspend/resume
rseq/selftests: add s390 support
s390: fix br_r1_trampoline for machines without exrl
s390/lib: use expoline for all bcr instructions
s390/numa: move initial setup of node_to_cpumask_map
s390/kdump: Fix elfcorehdr size calculation
s390/cpum_sf: save TOD clock base in SDBs for time conversion
KVM: s390: Add huge page enablement control
s390/mm: Add huge page gmap linking support
s390/mm: hugetlb pages within a gmap can not be freed
KVM: s390: Add skey emulation fault handling
s390/mm: Add huge pmd storage key handling
s390/mm: Clear skeys for newly mapped huge guest pmds
s390/mm: Clear huge page storage keys on enable_skey
s390/mm: Add huge page dirty sync support
s390/mm: Add gmap pmd invalidation and clearing
s390/mm: Add gmap pmd notification bit setting
s390/mm: Add gmap pmd linking
...
Pull x86 timer updates from Thomas Gleixner:
"Early TSC based time stamping to allow better boot time analysis.
This comes with a general cleanup of the TSC calibration code which
grew warts and duct taping over the years and removes 250 lines of
code. Initiated and mostly implemented by Pavel with help from various
folks"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
x86/kvmclock: Mark kvm_get_preset_lpj() as __init
x86/tsc: Consolidate init code
sched/clock: Disable interrupts when calling generic_sched_clock_init()
timekeeping: Prevent false warning when persistent clock is not available
sched/clock: Close a hole in sched_clock_init()
x86/tsc: Make use of tsc_calibrate_cpu_early()
x86/tsc: Split native_calibrate_cpu() into early and late parts
sched/clock: Use static key for sched_clock_running
sched/clock: Enable sched clock early
sched/clock: Move sched clock initialization and merge with generic clock
x86/tsc: Use TSC as sched clock early
x86/tsc: Initialize cyc2ns when tsc frequency is determined
x86/tsc: Calibrate tsc only once
ARM/time: Remove read_boot_clock64()
s390/time: Remove read_boot_clock64()
timekeeping: Default boot time offset to local_clock()
timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()
s390/time: Add read_persistent_wall_and_boot_offset()
x86/xen/time: Output xen sched_clock time from 0
x86/xen/time: Initialize pv xen time in init_hypervisor_platform()
...
Pull x86 PTI updates from Thomas Gleixner:
"The Speck brigade sadly provides yet another large set of patches
destroying the perfomance which we carefully built and preserved
- PTI support for 32bit PAE. The missing counter part to the 64bit
PTI code implemented by Joerg.
- A set of fixes for the Global Bit mechanics for non PCID CPUs which
were setting the Global Bit too widely and therefore possibly
exposing interesting memory needlessly.
- Protection against userspace-userspace SpectreRSB
- Support for the upcoming Enhanced IBRS mode, which is preferred
over IBRS. Unfortunately we dont know the performance impact of
this, but it's expected to be less horrible than the IBRS
hammering.
- Cleanups and simplifications"
* 'x86/pti' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
x86/mm/pti: Move user W+X check into pti_finalize()
x86/relocs: Add __end_rodata_aligned to S_REL
x86/mm/pti: Clone kernel-image on PTE level for 32 bit
x86/mm/pti: Don't clear permissions in pti_clone_pmd()
x86/mm/pti: Fix 32 bit PCID check
x86/mm/init: Remove freed kernel image areas from alias mapping
x86/mm/init: Add helper for freeing kernel image pages
x86/mm/init: Pass unconverted symbol addresses to free_init_pages()
mm: Allow non-direct-map arguments to free_reserved_area()
x86/mm/pti: Clear Global bit more aggressively
x86/speculation: Support Enhanced IBRS on future CPUs
x86/speculation: Protect against userspace-userspace spectreRSB
x86/kexec: Allocate 8k PGDs for PTI
Revert "perf/core: Make sure the ring-buffer is mapped in all page-tables"
x86/mm: Remove in_nmi() warning from vmalloc_fault()
x86/entry/32: Check for VM86 mode in slow-path check
perf/core: Make sure the ring-buffer is mapped in all page-tables
x86/pti: Check the return value of pti_user_pagetable_walk_pmd()
x86/pti: Check the return value of pti_user_pagetable_walk_p4d()
x86/entry/32: Add debug code to check entry/exit CR3
...
Pull x86 vdso update from Thomas Gleixner:
"Use LD to link the VDSO libs instead of indirecting trough CC which
causes build failures with Clang"
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: vdso: Use $LD instead of $CC to link
Add addition argument 'arch_uprobe' to uprobe_write_opcode().
We need this in later set of patches.
Link: http://lkml.kernel.org/r/20180809041856.1547-3-ravi.bangoria@linux.ibm.com
Reviewed-by: Song Liu <songliubraving@fb.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull misc x86 fixes from Thomas Gleixner:
"Two fixes for x86:
- Provide a declaration for native_save_fl() which unbreaks the
wreckage caused by making it 'extern inline'.
- Fix the failing paravirt patching which is supposed to replace
indirect with direct calls. The wreckage is caused by an incorrect
clobber test"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
x86/irqflags: Provide a declaration for native_save_fl
Pull x86 mm updates from Thomas Gleixner:
- Make lazy TLB mode even lazier to avoid pointless switch_mm()
operations, which reduces CPU load by 1-2% for memcache workloads
- Small cleanups and improvements all over the place
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Remove redundant check for kmem_cache_create()
arm/asm/tlb.h: Fix build error implicit func declaration
x86/mm/tlb: Make clear_asid_other() static
x86/mm/tlb: Skip atomic operations for 'init_mm' in switch_mm_irqs_off()
x86/mm/tlb: Always use lazy TLB mode
x86/mm/tlb: Only send page table free TLB flush to lazy TLB CPUs
x86/mm/tlb: Make lazy TLB mode lazier
x86/mm/tlb: Restructure switch_mm_irqs_off()
x86/mm/tlb: Leave lazy TLB mode at page table free time
mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) dynamically based on nr_cpu_ids
x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap: Update pgtable free interfaces with addr
x86/mm: Disable ioremap free page handling on x86-PAE
Pull x86 platform updates from Thomas Gleixner:
"Trivial cleanups and improvements"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/UV: Remove redundant check of p == q
x86/platform/olpc: Use PTR_ERR_OR_ZERO()
x86/platform/UV: Mark memblock related init code and data correctly
Pull x86 cache QoS (RDT/CAR) updates from Thomas Gleixner:
"Add support for pseudo-locked cache regions.
Cache Allocation Technology (CAT) allows on certain CPUs to isolate a
region of cache and 'lock' it. Cache pseudo-locking builds on the fact
that a CPU can still read and write data pre-allocated outside its
current allocated area on cache hit. With cache pseudo-locking data
can be preloaded into a reserved portion of cache that no application
can fill, and from that point on will only serve cache hits. The cache
pseudo-locked memory is made accessible to user space where an
application can map it into its virtual address space and thus have a
region of memory with reduced average read latency.
The locking is not perfect and gets totally screwed by WBINDV and
similar mechanisms, but it provides a reasonable enhancement for
certain types of latency sensitive applications.
The implementation extends the current CAT mechanism and provides a
generally useful exclusive CAT mode on which it builds the extra
pseude-locked regions"
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
x86/intel_rdt: Disable PMU access
x86/intel_rdt: Fix possible circular lock dependency
x86/intel_rdt: Make CPU information accessible for pseudo-locked regions
x86/intel_rdt: Support restoration of subset of permissions
x86/intel_rdt: Fix cleanup of plr structure on error
x86/intel_rdt: Move pseudo_lock_region_clear()
x86/intel_rdt: Limit C-states dynamically when pseudo-locking active
x86/intel_rdt: Support L3 cache performance event of Broadwell
x86/intel_rdt: More precise L2 hit/miss measurements
x86/intel_rdt: Create character device exposing pseudo-locked region
x86/intel_rdt: Create debugfs files for pseudo-locking testing
x86/intel_rdt: Create resctrl debug area
x86/intel_rdt: Ensure RDT cleanup on exit
x86/intel_rdt: Resctrl files reflect pseudo-locked information
x86/intel_rdt: Support creation/removal of pseudo-locked region
x86/intel_rdt: Pseudo-lock region creation/removal core
x86/intel_rdt: Discover supported platforms via prefetch disable bits
x86/intel_rdt: Add utilities to test pseudo-locked region possibility
x86/intel_rdt: Split resource group removal in two
x86/intel_rdt: Enable entering of pseudo-locksetup mode
...
Pull x86/hyper-v update from Thomas Gleixner:
"Add fast hypercall support for guest running on the Microsoft HyperV(isor)"
* 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hyper-v: Fix wrong merge conflict resolution
x86/hyper-v: Check for VP_INVAL in hyperv_flush_tlb_others()
x86/hyper-v: Check cpumask_to_vpset() return value in hyperv_flush_tlb_others_ex()
x86/hyper-v: Trace PV IPI send
x86/hyper-v: Use cheaper HVCALL_SEND_IPI hypercall when possible
x86/hyper-v: Use 'fast' hypercall for HVCALL_SEND_IPI
x86/hyper-v: Implement hv_do_fast_hypercall16
x86/hyper-v: Use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible
Pull x86 dump printing cleanup from Thomas Gleixner:
"Clean up the show_opcodes() printout so nested dumps can be properly
differentiated"
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Avoid pr_cont() in show_opcodes()
Pull x86 cpu updates from Thomas Gleixner:
"Two small updates for the CPU code:
- Improve NUMA emulation
- Add the EPT_AD CPU feature bit"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpufeatures: Add EPT_AD feature bit
x86/numa_emulation: Introduce uniform split capability
x86/numa_emulation: Fix emulated-to-physical node mapping
Pull x86 cleanups from Thomas Gleixner:
"Trival cleanups"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/iommu: Use NULL instead of 0
x86/platform/pcspeaker: Use PTR_ERR_OR_ZERO() to fix ptr_ret.cocci warning
Pull x86 build cleanup from Thomas Gleixner:
"Remove a stale quirk for a no longer supported GCC version"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Remove old -funit-at-a-time GCC quirk
Pull x86 asm updates from Thomas Gleixner:
"The lowlevel and ASM code updates for x86:
- Make stack trace unwinding more reliable
- ASM instruction updates for better code generation
- Various cleanups"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/64: Add two more instruction suffixes
x86/asm/64: Use 32-bit XOR to zero registers
x86/build/vdso: Simplify 'cmd_vdso2c'
x86/build/vdso: Remove unused vdso-syms.lds
x86/stacktrace: Enable HAVE_RELIABLE_STACKTRACE for the ORC unwinder
x86/unwind/orc: Detect the end of the stack
x86/stacktrace: Do not fail for ORC with regs on stack
x86/stacktrace: Clarify the reliable success paths
x86/stacktrace: Remove STACKTRACE_DUMP_ONCE
x86/stacktrace: Do not unwind after user regs
x86/asm: Use CC_SET/CC_OUT in percpu_cmpxchg8b_double() to micro-optimize code generation
Pull x86 boot updates from Thomas Gleixner:
"Boot code updates for x86:
- Allow to skip a given amount of huge pages for address layout
randomization on the kernel command line to prevent regressions in
the huge page allocation with small memory sizes
- Various cleanups"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Use CC_SET()/CC_OUT() instead of open coding it
x86/boot/KASLR: Make local variable mem_limit static
x86/boot/KASLR: Skip specified number of 1GB huge pages when doing physical randomization (KASLR)
x86/boot/KASLR: Add two new functions for 1GB huge pages handling
Pull x86 apic update from Thomas Gleixner:
"Trivial cleanups of the APIC related code"
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Trivial coding style fixes
x86/vector: Merge allocate_vector() into assign_vector_locked()
Pull perf update from Thomas Gleixner:
"The perf crowd presents:
Kernel updates:
- Removal of jprobes
- Cleanup and consolidatation the handling of kprobes
- Cleanup and consolidation of hardware breakpoints
- The usual pile of fixes and updates to PMUs and event descriptors
Tooling updates:
- Updates and improvements all over the place. Nothing outstanding,
just the (good) boring incremental grump work"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
perf trace: Do not require --no-syscalls to suppress strace like output
perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.h
perf tools: Allow overriding MAX_NR_CPUS at compile time
perf bpf: Show better message when failing to load an object
perf list: Unify metric group description format with PMU event description
perf vendor events arm64: Update ThunderX2 implementation defined pmu core events
perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet
perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packet
perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet
perf cs-etm: Fix start tracing packet handling
perf build: Fix installation directory for eBPF
perf c2c report: Fix crash for empty browser
perf tests: Fix indexing when invoking subtests
perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' args
perf trace beauty: Add beautifiers for 'socket''s 'protocol' arg
perf trace beauty: Do not print NULL strarray entries
perf beauty: Add a generator for IPPROTO_ socket's protocol constants
tools include uapi: Grab a copy of linux/in.h
perf tests: Fix complex event name parsing
perf evlist: Fix error out while applying initial delay and LBR
...
Pull locking/atomics update from Thomas Gleixner:
"The locking, atomics and memory model brains delivered:
- A larger update to the atomics code which reworks the ordering
barriers, consolidates the atomic primitives, provides the new
atomic64_fetch_add_unless() primitive and cleans up the include
hell.
- Simplify cmpxchg() instrumentation and add instrumentation for
xchg() and cmpxchg_double().
- Updates to the memory model and documentation"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
locking/atomics: Rework ordering barriers
locking/atomics: Instrument cmpxchg_double*()
locking/atomics: Instrument xchg()
locking/atomics: Simplify cmpxchg() instrumentation
locking/atomics/x86: Reduce arch_cmpxchg64*() instrumentation
tools/memory-model: Rename litmus tests to comply to norm7
tools/memory-model/Documentation: Fix typo, smb->smp
sched/Documentation: Update wake_up() & co. memory-barrier guarantees
locking/spinlock, sched/core: Clarify requirements for smp_mb__after_spinlock()
sched/core: Use smp_mb() in wake_woken_function()
tools/memory-model: Add informal LKMM documentation to MAINTAINERS
locking/atomics/Documentation: Describe atomic_set() as a write operation
tools/memory-model: Make scripts executable
tools/memory-model: Remove ACCESS_ONCE() from model
tools/memory-model: Remove ACCESS_ONCE() from recipes
locking/memory-barriers.txt/kokr: Update Korean translation to fix broken DMA vs. MMIO ordering example
MAINTAINERS: Add Daniel Lustig as an LKMM reviewer
tools/memory-model: Fix ISA2+pooncelock+pooncelock+pombonce name
tools/memory-model: Add litmus test for full multicopy atomicity
locking/refcount: Always allow checked forms
...
Pull scheduler updates from Thomas Gleixner:
- Cleanup and improvement of NUMA balancing
- Refactoring and improvements to the PELT (Per Entity Load Tracking)
code
- Watchdog simplification and related cleanups
- The usual pile of small incremental fixes and improvements
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
watchdog: Reduce message verbosity
stop_machine: Reflow cpu_stop_queue_two_works()
sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
sched/numa: Use group_weights to identify if migration degrades locality
sched/numa: Update the scan period without holding the numa_group lock
sched/numa: Remove numa_has_capacity()
sched/numa: Modify migrate_swap() to accept additional parameters
sched/numa: Remove unused task_capacity from 'struct numa_stats'
sched/numa: Skip nodes that are at 'hoplimit'
sched/debug: Reverse the order of printing faults
sched/numa: Use task faults only if numa_group is not yet set up
sched/numa: Set preferred_node based on best_cpu
sched/numa: Simplify load_too_imbalanced()
sched/numa: Evaluate move once per node
sched/numa: Remove redundant field
sched/debug: Show the sum wait time of a task group
sched/fair: Remove #ifdefs from scale_rt_capacity()
sched/core: Remove get_cpu() from sched_fork()
sched/cpufreq: Clarify sugov_get_util()
sched/sysctl: Remove unused sched_time_avg_ms sysctl
...
Pull x86 RAS updates from Thomas Gleixner:
"A small set of changes to the RAS core:
- Rework of the MCE bank scanning code
- Y2038 converion"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Cleanup __mc_scan_banks()
x86/mce: Carve out bank scanning code
x86/mce: Remove !banks check
x86/mce: Carve out the crashing_cpu check
x86/mce: Always use 64-bit timestamps
Pull genirq updates from Thomas Gleixner:
"The irq departement provides:
- A synchronization fix for free_irq() to synchronize just the
removed interrupt thread on shared interrupt lines.
- Consolidate the multi low level interrupt entry handling and mvoe
it to the generic code instead of adding yet another copy for
RISC-V
- Refactoring of the ARM LPI allocator and LPI exposure to the
hypervisor
- Yet another interrupt chip driver for the JZ4725B SoC
- Speed up for /proc/interrupts as people seem to love reading this
file with high frequency
- Miscellaneous fixes and updates"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
irqchip/gic-v3-its: Make its_lock a raw_spin_lock_t
genirq/irqchip: Remove MULTI_IRQ_HANDLER as it's now obselete
openrisc: Use the new GENERIC_IRQ_MULTI_HANDLER
arm64: Use the new GENERIC_IRQ_MULTI_HANDLER
ARM: Convert to GENERIC_IRQ_MULTI_HANDLER
irqchip: Port the ARM IRQ drivers to GENERIC_IRQ_MULTI_HANDLER
irqchip/gic-v3-its: Reduce minimum LPI allocation to 1 for PCI devices
dt-bindings: irqchip: renesas-irqc: Document r8a77980 support
dt-bindings: irqchip: renesas-irqc: Document r8a77470 support
irqchip/ingenic: Add support for the JZ4725B SoC
irqchip/stm32: Add exti0 translation for stm32mp1
genirq: Remove redundant NULL pointer check in __free_irq()
irqchip/gic-v3-its: Honor hypervisor enforced LPI range
irqchip/gic-v3: Expose GICD_TYPER in the rdist structure
irqchip/gic-v3-its: Drop chunk allocation compatibility
irqchip/gic-v3-its: Move minimum LPI requirements to individual busses
irqchip/gic-v3-its: Use full range of LPIs
irqchip/gic-v3-its: Refactor LPI allocator
genirq: Synchronize only with single thread on free_irq()
genirq: Update code comments wrt recycled thread_mask
...
Pull EFI updates from Thomas Gleixner:
"The EFI pile:
- Make mixed mode UEFI runtime service invocations mutually
exclusive, as mandated by the UEFI spec
- Perform UEFI runtime services calls from a work queue so the calls
into the firmware occur from a kernel thread
- Honor the UEFI memory map attributes for live memory regions
configured by UEFI as a framebuffer. This works around a coherency
problem with KVM guests running on ARM.
- Cleanups, improvements and fixes all over the place"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efivars: Call guid_parse() against guid_t type of variable
efi/cper: Use consistent types for UUIDs
efi/x86: Replace references to efi_early->is64 with efi_is_64bit()
efi: Deduplicate efi_open_volume()
efi/x86: Add missing NULL initialization in UGA draw protocol discovery
efi/x86: Merge 32-bit and 64-bit UGA draw protocol setup routines
efi/x86: Align efi_uga_draw_protocol typedef names to convention
efi/x86: Merge the setup_efi_pci32() and setup_efi_pci64() routines
efi/x86: Prevent reentrant firmware calls in mixed mode
efi/esrt: Only call efi_mem_reserve() for boot services memory
fbdev/efifb: Honour UEFI memory map attributes when mapping the FB
efi: Drop type and attribute checks in efi_mem_desc_lookup()
efi/libstub/arm: Add opt-in Kconfig option for the DTB loader
efi: Remove the declaration of efi_late_init() as the function is unused
efi/cper: Avoid using get_seconds()
efi: Use a work queue to invoke EFI Runtime Services
efi/x86: Use non-blocking SetVariable() for efi_delete_dummy_variable()
efi/x86: Clean up the eboot code
Last set of new features for 4.19. Most notable is simplifying SSB
debugging code with two Kconfig option removals and fixing mt76 USB
build problems.
Major changes:
ath10k
* add debugfs file warm_hw_reset
wil6210
* add debugfs files tx_latency, link_stats and link_stats_global
* add 3-MSI support
* allow scan on AP interface
* support max aggregation window size 64
ssb
* remove CONFIG_SSB_SILENT and CONFIG_SSB_DEBUG Kconfig options
mt76
* fix build problems with recently added USB support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJbcHsyAAoJEG4XJFUm622biTsH/17f4jCOjg49PsLxX7GhJdyS
3LiauBKgT+7s5HU9Ak7ogkM16uzb3yjw22VsO/6SEoUIAG77P2Ns1/8u8fgfvQxm
RyiPglctfLMFlzAyOUihkVmKICH0pjVCU5AB94+71g8ZjSkvxRF0SbeP5an4ojYt
0rTcSbrweuay5YDw4bTpJ7QkSDPYoyNrcTPMTwdjIqry498NO2SMGv573TTn4jeT
cWxQLKIDtmCPdrvRcyOsNLsAve73x760glWqmtFJuyVtgwjnWUyVlNSlTcVTW4OR
NsrTVzfA5AbZb56WQmhmBxVcKn9UwKxPIwjZ9I3zydBagsODIWkRoViBTXNQjsk=
=aha/
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-next-for-davem-2018-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
pull-request: wireless-drivers-next 2018-08-12
wireless-drivers-next patches for 4.19
Last set of new features for 4.19. Most notable is simplifying SSB
debugging code with two Kconfig option removals and fixing mt76 USB
build problems.
Major changes:
ath10k
* add debugfs file warm_hw_reset
wil6210
* add debugfs files tx_latency, link_stats and link_stats_global
* add 3-MSI support
* allow scan on AP interface
* support max aggregation window size 64
ssb
* remove CONFIG_SSB_SILENT and CONFIG_SSB_DEBUG Kconfig options
mt76
* fix build problems with recently added USB support
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- Enable mac_scsi PDMA on PowerBook 500,
- Generic dma_noncoherent_ops conversion,
- Time handling improvements,
- I/O accessor improvements,
- Conversion to MEMBLOCK and NO_BOOTMEM, to bring m68k in line with
other mainstream architectures,
- Miscellaneous fixes and cleanups,
- Defconfig updates.
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQQ9qaHoIs/1I4cXmEiKwlD9ZEnxcAUCW2rv8BUcZ2VlcnRAbGlu
dXgtbTY4ay5vcmcACgkQisJQ/WRJ8XBRAQD+O3x187pxeICV5LUC4SGtPj3kw07e
wr1NA0hnRqri77MBAPNx96fy0wnQAgMuvIghpEYqX6ha2O4NTxKUmLdxl/QL
=gJGl
-----END PGP SIGNATURE-----
Merge tag 'm68k-for-v4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- Enable mac_scsi PDMA on PowerBook 500
- Generic dma_noncoherent_ops conversion
- Time handling improvements
- I/O accessor improvements
- Conversion to MEMBLOCK and NO_BOOTMEM, to bring m68k in line with
other mainstream architectures
- Miscellaneous fixes and cleanups
- Defconfig updates
* tag 'm68k-for-v4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k/defconfig: Update defconfigs for v4.18-rc6
m68k: switch to MEMBLOCK + NO_BOOTMEM
m68k/page_no.h: force __va argument to be unsigned long
m68k/bitops: convert __ffs to match generic declaration
m68k/io: Switch mmu variant to <asm-generic/io.h>
m68k/io: Move mem*io define guards to <asm/kmap.h>
Input: hilkbd - Add casts to HP9000/300 I/O accessors
net: mac8390: Use standard memcpy_{from,to}io()
m68k/io: Add missing ioremap define guards, fix typo
m68k: Remove unused set_clock_mmss() helpers
m68k: mac: Use time64_t in RTC handling
m68k: Use generic dma_noncoherent_ops
nubus: Set default dma mask for nubus_board devices
m68k/mac: Enable PDMA for PowerBook 500 series
Enabling both CONFIG_PERF_EVENTS without !CONFIG_SMP
generates following compilation error.
arch/riscv/include/asm/perf_event.h:80:2: error: expected
specifier-qualifier-list before 'irqreturn_t'
irqreturn_t (*handle_irq)(int irq_num, void *dev);
^~~~~~~~~~~
Include interrupt.h in proper place to avoid compilation
error.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Add a driver for the SiFive implementation of the RISC-V Platform Level
Interrupt Controller (PLIC). The PLIC connects global interrupt sources
to the local interrupt controller on each hart.
This driver is based on the driver in the RISC-V tree from Palmer Dabbelt,
but has been almost entirely rewritten since, and includes many fixes
from Atish Patra.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
[Binding update by Palmer]
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
The stvec's value must be 4 byte alignment by specification definition.
These directives avoid to stvec be set the non-alignment value.
Signed-off-by: Zong Li <zong@andestech.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
The RISC-V ISA defines a per-hart real-time clock and timer, which is
present on all systems. The clock is accessed via the 'rdtime'
pseudo-instruction (which reads a CSR), and the timer is set via an SBI
call.
Contains various improvements from Atish Patra <atish.patra@wdc.com>.
Signed-off-by: Dmitriy Cherkasov <dmitriy@oss-tech.org>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
[hch: remove dead code, add SPDX tags, used riscv_of_processor_hart(),
minor cleanups, merged hotplug cpu support and other improvements
from Atish]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Add support for a routine that dispatches exceptions with the interrupt
flags set to either the IPI or irqdomain code (and the clock source in the
future).
Loosely based on the irq-riscv-int.c irqchip driver from the RISC-V tree.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This mirrors the SIE_SSIE and SETE bits that are used in a similar
fashion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
These are only of use to the local irq controller driver, so add them in
that driver implementation instead, which will be submitted soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Rename handle_ipi to riscv_software_interrupt, drop the unused return
value and move the prototype to irq.h together with riscv_timer_interupt.
This allows simplifying the upcoming interrupt handling support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This code is currently unused and will be added back later in a different
place with the real interrupt and clocksource support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This code lives entirely within the RISC-V arch code. I've left it
within an "#ifdef CONFIG_EARLY_PRINTK" despite always having
EARLY_PRINTK support on RISC-V just in case someone wants to remove
it.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Adding 4 to sepc is pointless, and is wrong if we executed a 2-byte
compressed breakpoint. This plus a corresponding gdb patch allows
compressed breakpoints to work in gdb. Gdb maintainers have already
agreed that this is the right approach.
Signed-off-by: Jim Wilson <jimw@sifive.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
If you use a 64-bit compiler to build a 32-bit kernel then you'll get an
error when building the vDSO due to a library mismatch. The happens
because the relevant "-march" argument isn't supplied to the GCC run
that generates one of the vDSO intermediate files.
I'm not actually sure what the right thing to do here is as I'm not
particularly familiar with the kernel build system. I poked the
documentation and it appears that KCFLAGS is the correct thing to do
(it's suggested that should be used when building modules), but we set
KBUILD_CFLAGS in arch/riscv/Makefile.
This does at least fix the build error.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
This patchset fixes and improves stack unwinding a lot:
1. Show backward stack traces with up to 30 callsites
2. Add callinfo to ENTRY_CFI() such that every assembler function will get an
entry in the unwind table
3. Use constants instead of numbers in call_on_stack()
4. Do not depend on CONFIG_KALLSYMS to generate backtraces.
5. Speed up backtrace generation
Make sure you have this patch to GNU as installed:
https://sourceware.org/ml/binutils/2018-07/msg00474.html
Without this patch, unwind info in the kernel is often wrong for various
functions.
Signed-off-by: Helge Deller <deller@gmx.de>
Now that mb() is an instruction barrier, it will slow performance if we issue
unnecessary barriers.
The spinlock defines have a number of unnecessary barriers. The __ldcw()
define is both a hardware and compiler barrier. The mb() barriers in the
routines using __ldcw() serve no purpose.
The only barrier needed is the one in arch_spin_unlock(). We need to ensure
all accesses are complete prior to releasing the lock.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Now that we use a sync prior to releasing the locks in syscall.S, we don't need
the PA 2.0 ordered stores used to release some locks. Using an ordered store,
potentially slows the release and subsequent code.
There are a number of other ordered stores and loads that serve no purpose. I
have converted these to normal stores.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
As part of the effort to reduce the code duplication between _THIS_IP_
and current_text_addr(), let's consolidate callers of
current_text_addr() to use _THIS_IP_.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Some parts of the HAVE_REGS_AND_STACK_ACCESS_API feature is needed for
the rseq syscall. This patch adds the most important parts, and as long
as we don't support kprobes, we should be fine.
Signed-off-by: Helge Deller <deller@gmx.de>
parisc is the only Linux architecture which has defined a value for ENOTSUP.
All other architectures #define ENOTSUP as EOPNOTSUPP in their libc headers.
Having an own value for ENOTSUP which is different than EOPNOTSUPP often gives
problems with userspace programs which expect both to be the same. One such
example is a build error in the libuv package, as can be seen in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900237.
Since we dropped HP-UX support, there is no real benefit in keeping an own
value for ENOTSUP. This patch drops the parisc value for ENOTSUP from the
kernel sources. glibc needs no patch, it reuses the exported headers.
Signed-off-by: Helge Deller <deller@gmx.de>
Switch to the generic noncoherent direct mapping implementation.
Fix sync_single_for_cpu to do skip the cache flush unless the transfer
is to the device to match the more tested unmap_single path which should
have the same cache coherency implications.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Current the S/G list based DMA ops use flush_kernel_vmap_range which
contains a few UP optimizations, while the rest of the DMA operations
uses flush_kernel_dcache_range. The single vs sg operations are supposed
to have the same effect, so they should use the same routines. Use
the more conservation version for now, but if people more familiar with
parisc think the vmap version is generally fine for DMA we should switch
all interfaces over to it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Helge Deller <deller@gmx.de>
The only difference is that pcxl supports dma coherent allocations, while
pcx only supports non-consistent allocations and otherwise fails.
But dma_alloc* is not in the fast path, and merging these two allows an
easy migration path to the generic dma-noncoherent implementation, so
do it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Helge Deller <deller@gmx.de>
Add statistics that show how memory is mapped within the kernel linear mapping.
This is similar to commit 37cd944c8d ("s390/pgtable: add mapping statistics")
We don't do this with Hash translation mode. Hash uses one size (mmu_linear_psize)
to map the kernel linear mapping and we print the linear psize during boot as
below.
"Page orders: linear mapping = 24, virtual = 16, io = 16, vmemmap = 24"
A sample output looks like:
DirectMap4k: 0 kB
DirectMap64k: 18432 kB
DirectMap2M: 1030144 kB
DirectMap1G: 11534336 kB
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Return statements in functions returning bool should use true or false
instead of an integer value.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to generate Group0 SGIs, let's add some decoding logic to
access_gic_sgi(), and pass the generating group accordingly.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to generate Group0 SGIs, let's add some decoding logic to
access_gic_sgi(), and pass the generating group accordingly.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Although vgic-v3 now supports Group0 interrupts, it still doesn't
deal with Group0 SGIs. As usually with the GIC, nothing is simple:
- ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1
with KVM (as per 8.1.10, Non-secure EL1 access)
- ICC_SGI0R can only generate Group0 SGIs
- ICC_ASGI1R sees its scope refocussed to generate only Group0
SGIs (as per the note at the bottom of Table 8-14)
We only support Group1 SGIs so far, so no material change.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
ICC_SGI1R is a 64bit system register, even on AArch32. It is thus
pointless to have such an encoding in the 32bit cp15 array. Let's
drop it.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Holding uts_sem as a writer while accessing userspace memory allows a
namespace admin to stall all processes that attempt to take uts_sem.
Instead, move data through stack buffers and don't access userspace memory
while uts_sem is held.
Cc: stable@vger.kernel.org
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
With gcc-8 fsanitize=null become very noisy. GCC started to complain
about things like &a->b, where 'a' is NULL pointer. There is no NULL
dereference, we just calculate address to struct member. It's
technically undefined behavior so UBSAN is correct to report it. But as
long as there is no real NULL-dereference, I think, we should be fine.
-fno-delete-null-pointer-checks compiler flag should protect us from any
consequences. So let's just no use -fsanitize=null as it's not useful
for us. If there is a real NULL-deref we will see crash. Even if
userspace mapped something at NULL (root can do this), with things like
SMAP should catch the issue.
Link: http://lkml.kernel.org/r/20180802153209.813-1-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since at least the beginning of the git era we've declared our TLB
exception handling functions inconsistently. They're actually functions,
but we declare them as arrays of u32 where each u32 is an encoded
instruction. This has always been the case for arch/mips/mm/tlbex.c, and
has also been true for arch/mips/kernel/traps.c since commit
86a1708a9d ("MIPS: Make tlb exception handler definitions and
declarations match.") which aimed for consistency but did so by
consistently making the our C code inconsistent with our assembly.
This is all usually harmless, but when using GCC 7 or newer to build a
kernel targeting microMIPS (ie. CONFIG_CPU_MICROMIPS=y) it becomes
problematic. With microMIPS bit 0 of the program counter indicates the
ISA mode. When bit 0 is zero instructions are decoded using the standard
MIPS32 or MIPS64 ISA. When bit 0 is one instructions are decoded using
microMIPS. This means that function pointers become odd - their least
significant bit is one for microMIPS code. We work around this in cases
where we need to access code using loads & stores with our
msk_isa16_mode() macro which simply clears bit 0 of the value it is
given:
#define msk_isa16_mode(x) ((x) & ~0x1)
For example we do this for our TLB load handler in
build_r4000_tlb_load_handler():
u32 *p = (u32 *)msk_isa16_mode((ulong)handle_tlbl);
We then write code to p, expecting it to be suitably aligned (our LEAF
macro aligns functions on 4 byte boundaries, so (ulong)handle_tlbl will
give a value one greater than a multiple of 4 - ie. the start of a
function on a 4 byte boundary, with the ISA mode bit 0 set).
This worked fine up to GCC 6, but GCC 7 & onwards is smart enough to
presume that handle_tlbl which we declared as an array of u32s must be
aligned sufficiently that bit 0 of its address will never be set, and as
a result optimize out msk_isa16_mode(). This leads to p having an
address with bit 0 set, and when we go on to attempt to store code at
that address we take an address error exception due to the unaligned
memory access.
This leads to an exception prior to the kernel having configured its own
exception handlers, so we jump to whatever handlers the bootloader
configured. In the case of QEMU this results in a silent hang, since it
has no useful general exception vector.
Fix this by consistently declaring our TLB-related functions as
functions. For handle_tlbl(), handle_tlbs() & handle_tlbm() we do this
in asm/tlbex.h & we make use of the existing declaration of
tlbmiss_handler_setup_pgd() in asm/mmu_context.h. Our TLB handler
generation code in arch/mips/mm/tlbex.c is adjusted to deal with these
definitions, in most cases simply by casting the function pointers to
u32 pointers.
This allows us to include asm/mmu_context.h in arch/mips/mm/tlbex.c to
get the definitions of tlbmiss_handler_setup_pgd & pgd_current, removing
some needless duplication. Consistently using msk_isa16_mode() on
function pointers means we no longer need the
tlbmiss_handler_setup_pgd_start symbol so that is removed entirely.
Now that we're declaring our functions as functions GCC stops optimizing
out msk_isa16_mode() & a microMIPS kernel built with either GCC 7.3.0 or
8.1.0 boots successfully.
Signed-off-by: Paul Burton <paul.burton@mips.com>
We export tlbmiss_handler_setup_pgd in arch/mips/mm/tlbex.c close to a
declaration of it, rather than close to its definition as is standard.
We've supported exporting symbols in assembly code since commit
22823ab419 ("EXPORT_SYMBOL() for asm"), so move the export to follow
the function's (stub) definition.
Signed-off-by: Paul Burton <paul.burton@mips.com>
The user page-table gets the updated kernel mappings in pti_finalize(),
which runs after the RO+X permissions got applied to the kernel page-table
in mark_readonly().
But with CONFIG_DEBUG_WX enabled, the user page-table is already checked in
mark_readonly() for insecure mappings. This causes false-positive
warnings, because the user page-table did not get the updated mappings yet.
Move the W+X check for the user page-table into pti_finalize() after it
updated all required mappings.
[ tglx: Folded !NX supported fix ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1533727000-9172-1-git-send-email-joro@8bytes.org
Currently if you build a 32-bit powerpc kernel and use get_user() to
load a u64 value it will fail to build with eg:
kernel/rseq.o: In function `rseq_get_rseq_cs':
kernel/rseq.c:123: undefined reference to `__get_user_bad'
This is hitting the check in __get_user_size() that makes sure the
size we're copying doesn't exceed the size of the destination:
#define __get_user_size(x, ptr, size, retval)
do {
retval = 0;
__chk_user_ptr(ptr);
if (size > sizeof(x))
(x) = __get_user_bad();
Which doesn't immediately make sense because the size of the
destination is u64, but it's not really, because __get_user_check()
etc. internally create an unsigned long and copy into that:
#define __get_user_check(x, ptr, size)
({
long __gu_err = -EFAULT;
unsigned long __gu_val = 0;
The problem being that on 32-bit unsigned long is not big enough to
hold a u64. We can fix this with a trick from hpa in the x86 code, we
statically check the type of x and set the type of __gu_val to either
unsigned long or unsigned long long.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The machine check code that flushes and restores bolted segments in
real mode belongs in mm/slb.c. This will also be used by pseries
machine check and idle code in future changes.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix the below build error using strlcpy instead of strncpy
In function 'pnv_parse_cpuidle_dt',
inlined from 'pnv_init_idle_states' at arch/powerpc/platforms/powernv/idle.c:840:7,
inlined from '__machine_initcall_powernv_pnv_init_idle_states' at arch/powerpc/platforms/powernv/idle.c:870:1:
arch/powerpc/platforms/powernv/idle.c:820:3: error: 'strncpy' specified bound 16 equals destination size [-Werror=stringop-truncation]
strncpy(pnv_idle_states[i].name, temp_string[i],
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PNV_IDLE_NAME_LEN);
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch makes sure we update the mmu_gather page size even if we are
requesting for a fullmm flush. This avoids triggering VM_WARN_ON in code
paths like __tlb_remove_page_size that explicitly check for removing range page
size to be same as mmu gather page size.
Fixes: 5a6099346c ("powerpc/64s/radix: tlb do not flush on page size when fullmm")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
‘type’ is only used when CONFIG_DEBUG_HIGHMEM is set. So add a possibly
unused tag to variable. Remove warning treated as error with W=1:
arch/powerpc/mm/highmem.c:59:6: error: variable ‘type’ set but not used [-Werror=unused-but-set-variable]
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Make sure to include setup.h to provide the following prototypes:
- irqstack_early_init
- setup_power_save
- initialize_cache_info
Fix the following warnings (treated as error in W=1):
arch/powerpc/kernel/setup_32.c:198:13: error: no previous prototype for ‘irqstack_early_init’
arch/powerpc/kernel/setup_32.c:238:13: error: no previous prototype for ‘setup_power_save’
arch/powerpc/kernel/setup_32.c:253:13: error: no previous prototype for ‘initialize_cache_info’
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add gcc attribute unused for two variables. Fix warnings treated as errors
with W=1:
arch/powerpc/kernel/prom_init.c:1388:8: error: variable ‘path’ set but not used [-Werror=unused-but-set-variable]
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
These functions can all be static, make it so. Fix warnings treated as
errors with W=1:
arch/powerpc/platforms/powermac/pci.c:1022:6: error: no previous prototype for ‘pmac_pci_fixup_ohci’
arch/powerpc/platforms/powermac/pci.c:1057:6: error: no previous prototype for ‘pmac_pci_fixup_cardbus’
arch/powerpc/platforms/powermac/pci.c:1094:6: error: no previous prototype for ‘pmac_pci_fixup_pciata’
Remove has_address declaration and assignment since it's not used.
Also add gcc attribute unused to fix a warning treated as error with
W=1:
arch/powerpc/platforms/powermac/pci.c:784:19: error: variable ‘has_address’ set but not used
arch/powerpc/platforms/powermac/pci.c:907:22: error: variable ‘ht’ set but not used
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Since the value of x is never intended to be read, remove it. Fix
warning treated as error with W=1:
arch/powerpc/platforms/powermac/udbg_scc.c:76:9: error: variable ‘x’ set but not used
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The header `pmac.h` was not included, leading to the following warnings,
treated as error with W=1:
arch/powerpc/platforms/powermac/time.c:69:13: error: no previous prototype for ‘pmac_time_init’ [-Werror=missing-prototypes]
arch/powerpc/platforms/powermac/time.c:207:15: error: no previous prototype for ‘pmac_get_boot_time’ [-Werror=missing-prototypes]
arch/powerpc/platforms/powermac/time.c:222:6: error: no previous prototype for ‘pmac_get_rtc_time’ [-Werror=missing-prototypes]
arch/powerpc/platforms/powermac/time.c:240:5: error: no previous prototype for ‘pmac_set_rtc_time’ [-Werror=missing-prototypes]
arch/powerpc/platforms/powermac/time.c:259:12: error: no previous prototype for ‘via_calibrate_decr’ [-Werror=missing-prototypes]
arch/powerpc/platforms/powermac/time.c:311:13: error: no previous prototype for ‘pmac_calibrate_decr’ [-Werror=missing-prototypes]
The function `via_calibrate_decr` was made static to silence a warning.
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add a jump target so that a bit of exception handling can be better
reused at the end of this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, in xmon, there is no obvious way to get an address for a
percpu symbol for a particular cpu. Having such an ability would be
good for debugging the system when percpu variables got involved.
Therefore, this patch introduces a new xmon command "lp" to lookup the
address for percpu symbols. Usage of "lp" is similar to "ls", except
that we could add a cpu number to choose the variable of which cpu we
want to lookup. If no cpu number is given, lookup for current cpu.
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
huge_pte_offset_and_shift() has never existed
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The symbol memcpy_nocache_branch defined in order to allow patching
of memset function once cache is enabled leads to confusing reports
by perf tool.
Using the new patch_site functionality solves this issue.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
During Machine Check interrupt on pseries platform, register r3 points
RTAS extended event log passed by hypervisor. Since hypervisor uses r3
to pass pointer to rtas log, it stores the original r3 value at the
start of the memory (first 8 bytes) pointed by r3. Since hypervisor
stores this info and rtas log is in BE format, linux should make
sure to restore r3 value in correct endian format.
Without this patch when MCE handler, after recovery, returns to code that
that caused the MCE may end up with Data SLB access interrupt for invalid
address followed by kernel panic or hang.
Severe Machine check interrupt [Recovered]
NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
Initiator: CPU
Error type: SLB [Multihit]
Effective address: d00000000ca70000
cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
pc: c0000000009694c0: vsnprintf+0x80/0x480
lr: c0000000009698e0: vscnprintf+0x20/0x60
sp: c0000000fc777830
msr: 8000000002009033
dar: a803a30c000000d0
current = 0xc00000000bc9ef00
paca = 0xc00000001eca5c00 softe: 3 irq_happened: 0x01
pid = 8860, comm = insmod
vscnprintf+0x20/0x60
vprintk_emit+0xb4/0x4b0
vprintk_func+0x5c/0xd0
printk+0x38/0x4c
init_module+0x1c0/0x338 [bork_kernel]
do_one_initcall+0x54/0x230
do_init_module+0x8c/0x248
load_module+0x12b8/0x15b0
sys_finit_module+0xa8/0x110
system_call+0x58/0x6c
--- Exception: c00 (System Call) at 00007fff8bda0644
SP (7fffdfbfe980) is in userspace
This patch fixes this issue.
Fixes: a08a53ea4c ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org # v3.15+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With dynamic memory allocation support for crash memory ranges array,
there is no hard limit on the no. of crash memory ranges kernel could
export, but program headers count could overflow in the /proc/vmcore
ELF file while exporting each memory range as PT_LOAD segment. Reduce
the likelihood of a such scenario, by folding adjacent crash memory
ranges which minimizes the total number of PT_LOAD segments.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Crash memory ranges is an array of memory ranges of the crashing kernel
to be exported as a dump via /proc/vmcore file. The size of the array
is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS
value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since
commit 142b45a72e ("memblock: Add array resizing support").
On large memory systems with a few DLPAR operations, the memblock memory
regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such
systems, registering fadump results in crash or other system failures
like below:
task: c00007f39a290010 ti: c00000000b738000 task.ti: c00000000b738000
NIP: c000000000047df4 LR: c0000000000f9e58 CTR: c00000000010f180
REGS: c00000000b73b570 TRAP: 0300 Tainted: G L X (4.4.140+)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22004484 XER: 20000000
CFAR: c000000000008500 DAR: 000007a450000000 DSISR: 40000000 SOFTE: 0
...
NIP [c000000000047df4] smp_send_reschedule+0x24/0x80
LR [c0000000000f9e58] resched_curr+0x138/0x160
Call Trace:
resched_curr+0x138/0x160 (unreliable)
check_preempt_curr+0xc8/0xf0
ttwu_do_wakeup+0x38/0x150
try_to_wake_up+0x224/0x4d0
__wake_up_common+0x94/0x100
ep_poll_callback+0xac/0x1c0
__wake_up_common+0x94/0x100
__wake_up_sync_key+0x70/0xa0
sock_def_readable+0x58/0xa0
unix_stream_sendmsg+0x2dc/0x4c0
sock_sendmsg+0x68/0xa0
___sys_sendmsg+0x2cc/0x2e0
__sys_sendmsg+0x5c/0xc0
SyS_socketcall+0x36c/0x3f0
system_call+0x3c/0x100
as array index overflow is not checked for while setting up crash memory
ranges causing memory corruption. To resolve this issue, dynamically
allocate memory for crash memory ranges and resize it incrementally,
in units of pagesize, on hitting array size limit.
Fixes: 2df173d9e8 ("fadump: Initialize elfcore header and add PT_LOAD program headers.")
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: Just use PAGE_SIZE directly, fixup variable placement]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
commit e8cb7a55eb ("powerpc: remove superflous inclusions of
asm/fixmap.h") removed inclusion of asm/fixmap.h from files not
including objects from that file.
However, asm/mmu-8xx.h includes call to __fix_to_virt(). The proper
way would be to include asm/fixmap.h in asm/mmu-8xx.h but it creates
an inclusion loop.
So we have to leave asm/fixmap.h in sysdep/cpm_common.c for
CONFIG_PPC_EARLY_DEBUG_CPM
CC arch/powerpc/sysdev/cpm_common.o
In file included from ./arch/powerpc/include/asm/mmu.h:340:0,
from ./arch/powerpc/include/asm/reg_8xx.h:8,
from ./arch/powerpc/include/asm/reg.h:29,
from ./arch/powerpc/include/asm/processor.h:13,
from ./arch/powerpc/include/asm/thread_info.h:28,
from ./include/linux/thread_info.h:38,
from ./arch/powerpc/include/asm/ptrace.h:159,
from ./arch/powerpc/include/asm/hw_irq.h:12,
from ./arch/powerpc/include/asm/irqflags.h:12,
from ./include/linux/irqflags.h:16,
from ./include/asm-generic/cmpxchg-local.h:6,
from ./arch/powerpc/include/asm/cmpxchg.h:537,
from ./arch/powerpc/include/asm/atomic.h:11,
from ./include/linux/atomic.h:5,
from ./include/linux/mutex.h:18,
from ./include/linux/kernfs.h:13,
from ./include/linux/sysfs.h:16,
from ./include/linux/kobject.h:20,
from ./include/linux/device.h:16,
from ./include/linux/node.h:18,
from ./include/linux/cpu.h:17,
from ./include/linux/of_device.h:5,
from arch/powerpc/sysdev/cpm_common.c:21:
arch/powerpc/sysdev/cpm_common.c: In function ‘udbg_init_cpm’:
./arch/powerpc/include/asm/mmu-8xx.h:218:25: error: implicit declaration of function ‘__fix_to_virt’ [-Werror=implicit-function-declaration]
#define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
VIRT_IMMR_BASE);
^
./arch/powerpc/include/asm/mmu-8xx.h:218:39: error: ‘FIX_IMMR_BASE’ undeclared (first use in this function)
#define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
VIRT_IMMR_BASE);
^
./arch/powerpc/include/asm/mmu-8xx.h:218:39: note: each undeclared identifier is reported only once for each function it appears in
#define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
VIRT_IMMR_BASE);
^
cc1: all warnings being treated as errors
make[1]: *** [arch/powerpc/sysdev/cpm_common.o] Error 1
Fixes: e8cb7a55eb ("powerpc: remove superflous inclusions of asm/fixmap.h")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The problem is the the calculation should be "end - start + 1" but the
plus one is missing in this calculation.
Fixes: 8626816e90 ("powerpc: add support for MPIC message register API")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch allows the memory removed by memtrace to be readded to the
kernel. So now you don't have to reboot your system to add the memory
back to the kernel or to have a different amount of memory removed.
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The kernel unnecessarily prevents late microcode loading when SMT is
disabled. It should be safe to allow it if all the primary threads are
online.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Commit 33679a5037 ("MIPS: uasm: Remove needless ISA abstraction")
removed use of the MIPS_ISA preprocessor macro, but left a couple of
unused definitions of it behind.
Remove the dead code.
Signed-off-by: Paul Burton <paul.burton@mips.com>
This new symbol needs to be in the workaround-list for buggy
binutils, otherwise the build with gcc-4.6 fails.
Fixes: 39d668e04e ('x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit')
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linux-Next Mailing List <linux-next@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180809094449.ddmnrkz7qkvo3j2x@suse.de
Pull crypto fix from Herbert Xu:
"This fixes a performance regression in arm64 NEON crypto as well as a
crash in x86 aegis/morus on unsupported CPUs"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/aegis,morus - Fix and simplify CPUID checks
crypto: arm64 - revert NEON yield for fast AEAD implementations
Use the standard WARN_ON instead.
If a small kernel is desired, WARN_ON can be disabled globally.
Also remove SSB_DEBUG. Besides WARN_ON it only adds a tiny debug check.
Include this check unconditionally.
Signed-off-by: Michael Buesch <m@bues.ch>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
The host-progs has been kept as an alias of hostprogs-y for a long time
(at least since the beginning of Git era), with the clear prompt:
Usage of host-progs is deprecated. Please replace with hostprogs-y!
Enough time for the migration has passed.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
In order to remove the additional check before calling the
ghes_notify_sea(), make stub definition when !CONFIG_ACPI_APEI_SEA.
After this cleanup, we can simply call the ghes_notify_sea() to let
APEI driver handle the SEA notification.
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit c9b5ad546e "s390/mm: tag normal pages vs pages used in page tables"
accidentally changed the logic in arch_set_page_states(), which is used by
the suspend/resume code. set_page_stable(page, order) was changed to
set_page_stable_dat(page, 0). After this, only the first page of higher order
pages will be set to stable, and a write to one of the unstable pages will
result in an addressing exception.
Fix this by using "order" again, instead of "0".
Fixes: c9b5ad546e ("s390/mm: tag normal pages vs pages used in page tables")
Cc: stable@vger.kernel.org # 4.14+
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The Cortina PHY is not compatible with IEEE 802.3 clause 45.
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
[scottwood: made commit message about compatibility, not driver choice]
Signed-off-by: Scott Wood <oss@buserror.net>
The Cortina PHY is not compatible with IEEE 802.3 clause 45.
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
[scottwood: made commit message about compatibility, not driver choice]
Signed-off-by: Scott Wood <oss@buserror.net>
Cortina PHYs are present on T4240RDB and T2080RDB. Enable the driver
by default.
Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
commit e8cb7a55eb ("powerpc: remove superflous inclusions of
asm/fixmap.h") removed inclusion of asm/fixmap.h from files not
including objects from that file.
However, asm/mmu-8xx.h includes call to __fix_to_virt(). The proper
way would be to include asm/fixmap.h in asm/mmu-8xx.h but it creates
an inclusion loop.
So we have to leave asm/fixmap.h in sysdep/cpm_common.c for
CONFIG_PPC_EARLY_DEBUG_CPM
CC arch/powerpc/sysdev/cpm_common.o
In file included from ./arch/powerpc/include/asm/mmu.h:340:0,
from ./arch/powerpc/include/asm/reg_8xx.h:8,
from ./arch/powerpc/include/asm/reg.h:29,
from ./arch/powerpc/include/asm/processor.h:13,
from ./arch/powerpc/include/asm/thread_info.h:28,
from ./include/linux/thread_info.h:38,
from ./arch/powerpc/include/asm/ptrace.h:159,
from ./arch/powerpc/include/asm/hw_irq.h:12,
from ./arch/powerpc/include/asm/irqflags.h:12,
from ./include/linux/irqflags.h:16,
from ./include/asm-generic/cmpxchg-local.h:6,
from ./arch/powerpc/include/asm/cmpxchg.h:537,
from ./arch/powerpc/include/asm/atomic.h:11,
from ./include/linux/atomic.h:5,
from ./include/linux/mutex.h:18,
from ./include/linux/kernfs.h:13,
from ./include/linux/sysfs.h:16,
from ./include/linux/kobject.h:20,
from ./include/linux/device.h:16,
from ./include/linux/node.h:18,
from ./include/linux/cpu.h:17,
from ./include/linux/of_device.h:5,
from arch/powerpc/sysdev/cpm_common.c:21:
arch/powerpc/sysdev/cpm_common.c: In function ‘udbg_init_cpm’:
./arch/powerpc/include/asm/mmu-8xx.h:218:25: error: implicit declaration of function ‘__fix_to_virt’ [-Werror=implicit-function-declaration]
#define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
VIRT_IMMR_BASE);
^
./arch/powerpc/include/asm/mmu-8xx.h:218:39: error: ‘FIX_IMMR_BASE’ undeclared (first use in this function)
#define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
VIRT_IMMR_BASE);
^
./arch/powerpc/include/asm/mmu-8xx.h:218:39: note: each undeclared identifier is reported only once for each function it appears in
#define VIRT_IMMR_BASE (__fix_to_virt(FIX_IMMR_BASE))
^
arch/powerpc/sysdev/cpm_common.c:75:7: note: in expansion of macro ‘VIRT_IMMR_BASE’
VIRT_IMMR_BASE);
^
cc1: all warnings being treated as errors
make[1]: *** [arch/powerpc/sysdev/cpm_common.o] Error 1
Fixes: e8cb7a55eb ("powerpc: remove superflous inclusions of asm/fixmap.h")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
The mmio tracer sets io mapping PTEs and PMDs to non present when enabled
without inverting the address bits, which makes the PTE entry vulnerable
for L1TF.
Make it use the right low level macros to actually invert the address bits
to protect against L1TF.
In principle this could be avoided because MMIO tracing is not likely to be
enabled on production machines, but the fix is straigt forward and for
consistency sake it's better to get rid of the open coded PTE manipulation.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For years I thought all parisc machines executed loads and stores in
order. However, Jeff Law recently indicated on gcc-patches that this is
not correct. There are various degrees of out-of-order execution all the
way back to the PA7xxx processor series (hit-under-miss). The PA8xxx
series has full out-of-order execution for both integer operations, and
loads and stores.
This is described in the following article:
http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml
For this reason, we need to define mb() and to insert a memory barrier
before the store unlocking spinlocks. This ensures that all memory
accesses are complete prior to unlocking. The ldcw instruction performs
the same function on entry.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
Enable the -mlong-calls compiler option by default, because otherwise in most
cases linking the vmlinux binary fails due to truncations of R_PARISC_PCREL22F
relocations. This fixes building the 64-bit defconfig.
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Helge Deller <deller@gmx.de>
In nlm_fmn_send() we have a loop which attempts to send a message
multiple times in order to handle the transient failure condition of a
lack of available credit. When examining the status register to detect
the failure we check for a condition that can never be true, which falls
foul of gcc 8's -Wtautological-compare:
In file included from arch/mips/netlogic/common/irq.c:65:
./arch/mips/include/asm/netlogic/xlr/fmn.h: In function 'nlm_fmn_send':
./arch/mips/include/asm/netlogic/xlr/fmn.h:304:22: error: bitwise
comparison always evaluates to false [-Werror=tautological-compare]
if ((status & 0x2) == 1)
^~
If the path taken if this condition were true all we do is print a
message to the kernel console. Since failures seem somewhat expected
here (making the console message questionable anyway) and the condition
has clearly never evaluated true we simply remove it, rather than
attempting to fix it to check status correctly.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20174/
Cc: Ganesan Ramalingam <ganesanr@broadcom.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jayachandran C <jnair@caviumnetworks.com>
Cc: John Crispin <john@phrozen.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Return statements in functions returning bool should use true or false
instead of an integer value. This code was detected with the help of
Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
set_memory_np() is used to mark kernel mappings not present, but it has
it's own open coded mechanism which does not have the L1TF protection of
inverting the address bits.
Replace the open coded PTE manipulation with the L1TF protecting low level
PTE routines.
Passes the CPA self test.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some cases in THP like:
- MADV_FREE
- mprotect
- split
mark the PMD non present for temporarily to prevent races. The window for
an L1TF attack in these contexts is very small, but it wants to be fixed
for correctness sake.
Use the proper low level functions for pmd/pud_mknotpresent() to address
this.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For kernel mappings PAGE_PROTNONE is not necessarily set for a non present
mapping, but the inversion logic explicitely checks for !PRESENT and
PROT_NONE.
Remove the PROT_NONE check and make the inversion unconditional for all not
present mappings.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When building the VDSO with clang it appears to invoke ld without
specifying endianness, even though clang itself was provided with a -EB
or -EL flag. This results in the build failing due to a mismatch between
the objects that are the input to ld, and the output it is attempting to
create:
VDSO arch/mips/vdso/vdso.so.dbg.raw
mips-linux-ld: arch/mips/vdso/elf.o: compiled for a big endian system
and target is little endian
mips-linux-ld: arch/mips/vdso/elf.o: endianness incompatible with that
of the selected emulation
mips-linux-ld: failed to merge target specific data of file
arch/mips/vdso/elf.o
...
Work around this problem by explicitly specifying the link endianness
using -Wl,-EB or -Wl,-EL when -EB or -EL are part of KBUILD_CFLAGS. This
resolves the build failure when using clang, and doesn't have any
negative effect on gcc.
Signed-off-by: Paul Burton <paul.burton@mips.com>
When building using clang, always specify -EB or -EL in order to ensure
we target the desired endianness.
Since clang cross compiles using a single compiler build with multiple
targets, our -dumpmachine tests which don't specify clang's --target
argument check output based upon the build machine rather than the
machine our build will target. This means our detection of whether to
specify -EB fails miserably & we never do. Providing the endianness flag
unconditionally for clang resolves this issue & simplifies the clang
path somewhat.
Signed-off-by: Paul Burton <paul.burton@mips.com>
On 32 bit the kernel sections are not huge-page aligned. When we clone
them on PMD-level we unevitably map some areas that are normal kernel
memory and may contain secrets to user-space. To prevent that we need to
clone the kernel-image on PTE-level for 32 bit.
Also make the page-table cloning code more general so that it can handle
PMD and PTE level cloning. This can be generalized further in the future to
also handle clones on the P4D-level.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1533637471-30953-4-git-send-email-joro@8bytes.org
The function sets the global-bit on cloned PMD entries, which only makes
sense when the permissions are identical between the user and the kernel
page-table. Further, only write-permissions are cleared for entry-text and
kernel-text sections, which are not writeable at the end of the boot
process.
The reason why this RW clearing exists is that in the early PTI
implementations the cloned kernel areas were set up during early boot
before the kernel text is set to read only and not touched afterwards.
This is not longer true. The cloned areas are still set up early to get the
entry code working for interrupts and other things, but after the kernel
text has been set RO the clone is repeated which copies the RO PMD/PTEs
over to the user visible clone. That means the initial clearing of the
writable bit can be avoided.
[ tglx: Amended changelog ]
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1533637471-30953-3-git-send-email-joro@8bytes.org
Nadav reported that on guests we're failing to rewrite the indirect
calls to CALLEE_SAVE paravirt functions. In particular the
pv_queued_spin_unlock() call is left unpatched and that is all over the
place. This obviously wrecks Spectre-v2 mitigation (for paravirt
guests) which relies on not actually having indirect calls around.
The reason is an incorrect clobber test in paravirt_patch_call(); this
function rewrites an indirect call with a direct call to the _SAME_
function, there is no possible way the clobbers can be different
because of this.
Therefore remove this clobber check. Also put WARNs on the other patch
failure case (not enough room for the instruction) which I've not seen
trigger in my (limited) testing.
Three live kernel image disassemblies for lock_sock_nested (as a small
function that illustrates the problem nicely). PRE is the current
situation for guests, POST is with this patch applied and NATIVE is with
or without the patch for !guests.
PRE:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq *0xffffffff822299e8
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
POST:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: callq 0xffffffff810a0c20 <__raw_callee_save___pv_queued_spin_unlock>
0xffffffff817be9a5 <+53>: xchg %ax,%ax
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063aa0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
NATIVE:
(gdb) disassemble lock_sock_nested
Dump of assembler code for function lock_sock_nested:
0xffffffff817be970 <+0>: push %rbp
0xffffffff817be971 <+1>: mov %rdi,%rbp
0xffffffff817be974 <+4>: push %rbx
0xffffffff817be975 <+5>: lea 0x88(%rbp),%rbx
0xffffffff817be97c <+12>: callq 0xffffffff819f7160 <_cond_resched>
0xffffffff817be981 <+17>: mov %rbx,%rdi
0xffffffff817be984 <+20>: callq 0xffffffff819fbb00 <_raw_spin_lock_bh>
0xffffffff817be989 <+25>: mov 0x8c(%rbp),%eax
0xffffffff817be98f <+31>: test %eax,%eax
0xffffffff817be991 <+33>: jne 0xffffffff817be9ba <lock_sock_nested+74>
0xffffffff817be993 <+35>: movl $0x1,0x8c(%rbp)
0xffffffff817be99d <+45>: mov %rbx,%rdi
0xffffffff817be9a0 <+48>: movb $0x0,(%rdi)
0xffffffff817be9a3 <+51>: nopl 0x0(%rax)
0xffffffff817be9a7 <+55>: pop %rbx
0xffffffff817be9a8 <+56>: pop %rbp
0xffffffff817be9a9 <+57>: mov $0x200,%esi
0xffffffff817be9ae <+62>: mov $0xffffffff817be993,%rdi
0xffffffff817be9b5 <+69>: jmpq 0xffffffff81063ae0 <__local_bh_enable_ip>
0xffffffff817be9ba <+74>: mov %rbp,%rdi
0xffffffff817be9bd <+77>: callq 0xffffffff817be8c0 <__lock_sock>
0xffffffff817be9c2 <+82>: jmp 0xffffffff817be993 <lock_sock_nested+35>
End of assembler dump.
Fixes: 63f70270cc ("[PATCH] i386: PARAVIRT: add common patching machinery")
Fixes: 3010a0663f ("x86/paravirt, objtool: Annotate indirect calls")
Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: stable@vger.kernel.org
The code in __write_64bit_c0_split() is used by MIPS32 kernels running
on MIPS64 CPUs to write a 64-bit value to a 64-bit coprocessor 0
register using a single 64-bit dmtc0 instruction. It does this by
combining the 2x 32-bit registers used to hold the 64-bit value into a
single register, which in the existing code involves three steps:
1) Zero extend register A which holds bits 31:0 of our data, since it
may have previously held a sign-extended value.
2) Shift register B which holds bits 63:32 of our data in bits 31:0
left by 32 bits, such that the bits forming our data are in the
position they'll be in the final 64-bit value & bits 31:0 of the
register are zero.
3) Or the two registers together to form the 64-bit value in one
64-bit register.
From MIPS r2 onwards we have a dins instruction which can effectively
perform all 3 of those steps using a single instruction.
Add a path for MIPS r2 & beyond which uses dins to take bits 31:0 from
register B & insert them into bits 63:32 of register A, giving us our
full 64-bit value in register A with one instruction.
Since we know that MIPS r2 & above support the sel field for the dmtc0
instruction, we don't bother special casing sel==0. Omiting the sel
field would assemble to exactly the same instruction as when we
explicitly specify that it equals zero.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Commit c22c804310 ("MIPS: Fix input modify in
__write_64bit_c0_split()") modified __write_64bit_c0_split() constraints
such that we have both an input & an output which we hope to assign to
the same registers, and modify the output rather than incorrectly
clobbering an input.
The way in which we use both an output & an input parameter with the
input constrained to share the output registers is a little convoluted &
also problematic for clang, which complains if the input & output values
have different widths. For example:
In file included from kernel/fork.c:98:
./arch/mips/include/asm/mmu_context.h:149:19: error: unsupported
inline asm: input with type 'unsigned long' matching output with
type 'unsigned long long'
write_c0_entryhi(cpu_asid(cpu, next));
~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
./arch/mips/include/asm/mmu_context.h:93:2: note: expanded from macro
'cpu_asid'
(cpu_context((cpu), (mm)) & cpu_asid_mask(&cpu_data[cpu]))
^
./arch/mips/include/asm/mipsregs.h:1617:65: note: expanded from macro
'write_c0_entryhi'
#define write_c0_entryhi(val) __write_ulong_c0_register($10, 0, val)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
./arch/mips/include/asm/mipsregs.h:1430:39: note: expanded from macro
'__write_ulong_c0_register'
__write_64bit_c0_register(reg, sel, val); \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
./arch/mips/include/asm/mipsregs.h:1400:41: note: expanded from macro
'__write_64bit_c0_register'
__write_64bit_c0_split(register, sel, value); \
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
./arch/mips/include/asm/mipsregs.h:1498:13: note: expanded from macro
'__write_64bit_c0_split'
: "r,0" (val)); \
^~~
We can both fix this build failure & simplify the code somewhat by
assigning the __tmp variable with the input value in C prior to our
inline assembly, and then using a single read-write output operand (ie.
a constraint beginning with +) to provide this value to our assembly.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Using privcmd_call() for a singleton multicall seems to be wrong, as
privcmd_call() is using stac()/clac() to enable hypervisor access to
Linux user space.
Even if currently not a problem (pv domains can't use SMAP while HVM
and PVH domains can't use multicalls) things might change when
PVH dom0 support is added to the kernel.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
For (bad) historical reasons, OPAL used to create a non-standard pair
of properties "opal-interrupts" and "opal-interrupts-names" for
representing the list of interrupts it wants Linux to request on its
behalf.
Among other issues, the opal-interrupts doesn't have a way to carry
the type of interrupts, and they were assumed to be all level
sensitive.
This is wrong on some recent systems where some of them are edge
sensitive causing warnings in the XIVE code and possible misbehaviours
if they need to be retriggered (typically the NPU2 TCE error
interrupts).
This makes Linux switch to using the standard "interrupts" and
"interrupt-names" properties instead when they are available, using
standard of_irq helpers, which can carry all the desired type
information.
Newer versions of OPAL will generate those properties in addition to
the legacy ones.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fixup prefix logic to check strlen(r->name). Reinstate setting
of start = 0 in opal_event_shutdown() to avoid double free warnings]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
GCC supports -mcpu=e300c2 and -mcpu=e300c3
This patch gives the opportunity to tune kernel to one of
those two types.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch extends to PPC32 the capability to select the exact
CPU type.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
At the time being, when adding a new CPU for selection, both
Kconfig.cputype and Makefile have to be modified.
This patch moves into Kconfig.cputype the name of the CPU to me
passed to the -mcpu= argument.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Rename the option to TARGET_CPU to echo the gcc documentation]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In Makefiles if we're testing a CONFIG_FOO symbol for equality with 'y'
we can instead just use ifdef. The latter reads easily, so convert to
it where possible.
Signed-off-by: Rodrigo R. Galvao <rosattig@linux.vnet.ibm.com>
Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In __copy_tofrom_user, if we encounter an exception on a store, we
stop copying and return the number of bytes not copied. However,
if the store is wider than one byte and is to an unaligned address,
it is possible that the store operand overlaps a page boundary
and the exception occurred on the latter part of the store operand,
meaning that it would be possible to copy a few more bytes. Since
copy_to_user is generally expected to copy as much as possible,
it would be better to copy those extra few bytes. This adds code
to do that. Since this edge case is not performance-critical,
the code has been written to be compact rather than as fast as
possible.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The hand-coded assembler 64-bit copy routines include feature sections
that select one code path or another depending on which CPU we are
executing on. The self-tests for these copy routines end up testing
just one path. This adds a mechanism for selecting any desired code
path at compile time, and makes 2 or 3 versions of each test, each
using a different code path, so as to cover all the possible paths.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Add -mcpu=power4 to CFLAGS for older compilers]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This aims to make the generation of exception table entries for the
loads and stores in __copy_tofrom_user_base clearer and easier to
verify. Instead of having a series of local labels on the loads and
stores, with a series of corresponding labels later for the exception
handlers, we now use macros to generate exception table entries at the
point of each load and store that could potentially trap. We do this
with the macros lex (load exception) and stex (store exception).
These macros are used right before the load or store to which they
apply.
Some complexity is introduced by the fact that we have some more work
to do after hitting an exception, because we need to calculate and
return the number of bytes not copied. The code uses r3 as the
current pointer into the destination buffer, that is, the address of
the first byte of the destination that has not been modified.
However, at various points in the copy loops, r3 can be 4, 8, 16 or 24
bytes behind that point.
To express this offset in an understandable way, we define a symbol
r3_offset which is updated at various points so that it equal to the
difference between the address of the first unmodified byte of the
destination and the value in r3. (In fact it only needs to be
accurate at the point of each lex or stex macro invocation.)
The rules for updating r3_offset are as follows:
* It starts out at 0
* An addi r3,r3,N instruction decreases r3_offset by N
* A store instruction (stb, sth, stw, std) to N(r3)
increases r3_offset by the width of the store (1, 2, 4, 8)
* A store with update instruction (stbu, sthu, stwu, stdu) to N(r3)
sets r3_offset to the width of the store.
There is some trickiness to the way that the lex and stex macros and
the associated exception handlers work. I would have liked to use
the current value of r3_offset in the name of the symbol used as
the exception handler, as in ".Lld_exc_$(r3_offset)" and then
have symbols .Lld_exc_0, .Lld_exc_8, .Lld_exc_16 etc. corresponding
to the offsets that needed to be added to r3. However, I couldn't
see a way to do that with gas.
Instead, the exception handler address is .Lld_exc - r3_offset or
.Lst_exc - r3_offset, that is, the distance ahead of .Lld_exc/.Lst_exc
that we start executing is equal to the amount that we need to add to
r3. This works because r3_offset is always a small multiple of 4,
and our instructions are 4 bytes long. This means that before
.Lld_exc and .Lst_exc, we have a sequence of instructions that
increments r3 by 4, 8, 16 or 24 depending on where we start. The
sequence increments r3 by 4 per instruction (on average).
We also replace the exception table for the 4k copy loop by a
macro per load or store. These loads and stores all use exactly
the same exception handler, which simply resets the argument registers
r3, r4 and r5 to there original values and re-does the whole copy
using the slower loop.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
for_each_node_by_name() iterators only exit normally when the loop
cursor is NULL, So there is no need to call of_node_put().
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
NX increments readOffset by FIFO size in receive FIFO control register
when CRB is read. But the index in RxFIFO has to match with the
corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
may be processing incorrect CRBs and can cause CRB timeout.
VAS FIFO offset is 0 when the receive window is opened during
initialization. When the module is reloaded or in kexec boot, readOffset
in FIFO control register may not match with VAS entry. This patch adds
nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
control register for both high and normal FIFOs.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
[mpe: Fixup uninitialized variable warning]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Export opal_check_token symbol for modules to check the availability
of OPAL calls before using them.
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix build errors and warnings in t1042rdb_diu.c by adding header files
and MODULE_LICENSE().
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: data definition has no type or storage class
early_initcall(t1042rdb_diu_init);
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: error: type defaults to 'int' in declaration of 'early_initcall' [-Werror=implicit-int]
../arch/powerpc/platforms/85xx/t1042rdb_diu.c:152:1: warning: parameter names (without types) in function declaration
and
WARNING: modpost: missing MODULE_LICENSE() in arch/powerpc/platforms/85xx/t1042rdb_diu.o
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The page table fragment allocator uses the main page refcount racily
with respect to speculative references. A customer observed a BUG due
to page table page refcount underflow in the fragment allocator. This
can be caused by the fragment allocator set_page_count stomping on a
speculative reference, and then the speculative failure handler
decrements the new reference, and the underflow eventually pops when
the page tables are freed.
Fix this by using a dedicated field in the struct page for the page
table fragment allocator.
Fixes: 5c1f6ee9a3 ("powerpc: Reduce PTE table memory wastage")
Cc: stable@vger.kernel.org # v3.10+
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pasemi code still uses printk(KERN_ERR/KERN_WARN ... change these to
pr_err(, pr_warn(... to match other powerpc arch code.
No functional changes.
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
[mpe: Unsplit some strings while we're at it]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Call show_user_instructions() in arch/powerpc/kernel/traps.c to dump
instructions at faulty location, useful to debugging.
Before this patch, an unhandled signal message looked like:
pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000]
After this patch, it looks like:
pandafault[10524]: segfault (11) at 100007d0 nip 1000061c lr 7fffbd295100 code 2 in pandafault[10000000+10000]
pandafault[10524]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
pandafault[10524]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
show_user_instructions() is a slightly modified version of
show_instructions() that allows userspace instruction dump.
This will be useful within show_signal_msg() to dump userspace
instructions of the faulty location.
Here is a sample of what show_user_instructions() outputs:
pandafault[10850]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
pandafault[10850]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040
The current->comm and current->pid printed can serve as a glue that
links the instructions dump to its originator, allowing messages to be
interleaved in the logs.
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds VMA address in the message printed for unhandled signals,
similarly to what other architectures, like x86, print.
Before this patch, a page fault looked like:
pandafault[61470]: unhandled signal 11 at 100007d0 nip 1000061c lr 7fff8d185100 code 2
After this patch, a page fault looks like:
pandafault[6303]: segfault 11 at 100007d0 nip 1000061c lr 7fff93c55100 code 2 in pandafault[10000000+10000]
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use %lx format to print registers. This avoids having two different
formats and avoids checking for MSR_64BIT, improving readability of the
function.
Even though we could have used %px, which is functionally equivalent to %lx
as per Documentation/core-api/printk-formats.rst, it is not semantically
correct because the data printed are not pointers. And using %px requires
casting data to (void *).
Besides that, %lx matches the format used in show_regs().
Before this patch:
pandafault[4808]: unhandled signal 11 at 0000000010000718 nip 0000000010000574 lr 00007fff935e7a6c code 2
After this patch:
pandafault[4732]: unhandled signal 11 at 10000718 nip 10000574 lr 7fff86697a6c code 2
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Replace printk_ratelimited() by printk() and a default rate limit
burst to limit displaying unhandled signals messages.
This will allow us to call print_vma_addr() in a future patch, which
does not work with printk_ratelimited().
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Isolate the logic of printing unhandled signals out of _exception_pkey().
No functional change, only code rearrangement.
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Because rfi_flush_fallback runs immediately before the return to
userspace it currently runs with the user r1 (stack pointer). This
means if we oops in there we will report a bad kernel stack pointer in
the exception entry path, eg:
Bad kernel stack pointer 7ffff7150e40 at c0000000000023b4
Oops: Bad kernel stack pointer, sig: 6 [#1]
LE SMP NR_CPUS=32 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 1246 Comm: klogd Not tainted 4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3 #7
NIP: c0000000000023b4 LR: 0000000010053e00 CTR: 0000000000000040
REGS: c0000000fffe7d40 TRAP: 4100 Not tainted (4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3)
MSR: 9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE> CR: 44000442 XER: 20000000
CFAR: c00000000000bac8 IRQMASK: c0000000f1e66a80
GPR00: 0000000002000000 00007ffff7150e40 00007fff93a99900 0000000000000020
...
NIP [c0000000000023b4] rfi_flush_fallback+0x34/0x80
LR [0000000010053e00] 0x10053e00
Although the NIP tells us where we were, and the TRAP number tells us
what happened, it would still be nicer if we could report the actual
exception rather than barfing about the stack pointer.
We an do that fairly simply by loading the kernel stack pointer on
entry and restoring the user value before returning. That way we see a
regular oops such as:
Unrecoverable exception 4100 at c00000000000239c
Oops: Unrecoverable exception, sig: 6 [#1]
LE SMP NR_CPUS=32 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 1251 Comm: klogd Not tainted 4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty #40
NIP: c00000000000239c LR: 0000000010053e00 CTR: 0000000000000040
REGS: c0000000f1e17bb0 TRAP: 4100 Not tainted (4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty)
MSR: 9000000002803031 <SF,HV,VEC,VSX,FP,ME,IR,DR,LE> CR: 44000442 XER: 20000000
CFAR: c00000000000bac8 IRQMASK: 0
...
NIP [c00000000000239c] rfi_flush_fallback+0x3c/0x80
LR [0000000010053e00] 0x10053e00
Call Trace:
[c0000000f1e17e30] [c00000000000b9e4] system_call+0x5c/0x70 (unreliable)
Note this shouldn't make the kernel stack pointer vulnerable to a
meltdown attack, because it should be flushed from the cache before we
return to userspace. The user r1 value will be in the cache, because
we load it in the return path, but that is harmless.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Look for fw-features properties to determine the appropriate settings
for the count cache flush, and then call the generic powerpc code to
set it up based on the security feature flags.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use the existing hypercall to determine the appropriate settings for
the count cache flush, and then call the generic powerpc code to set
it up based on the security feature flags.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some CPU revisions support a mode where the count cache needs to be
flushed by software on context switch. Additionally some revisions may
have a hardware accelerated flush, in which case the software flush
sequence can be shortened.
If we detect the appropriate flag from firmware we patch a branch
into _switch() which takes us to a count cache flush sequence.
That sequence in turn may be patched to return early if we detect that
the CPU supports accelerating the flush sequence in hardware.
Add debugfs support for reporting the state of the flush, as well as
runtime disabling it.
And modify the spectre_v2 sysfs file to report the state of the
software flush.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add security feature flags to indicate the need for software to flush
the count cache on context switch, and for the presence of a hardware
assisted count cache flush.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add a macro and some helper C functions for patching single asm
instructions.
The gas macro means we can do something like:
1: nop
patch_site 1b, patch__foo
Which is less visually distracting than defining a GLOBAL symbol at 1,
and also doesn't pollute the symbol table which can confuse eg. perf.
These are obviously similar to our existing feature sections, but are
not automatically patched based on CPU/MMU features, rather they are
designed to be manually patched by C code at some arbitrary point.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Used barrier_nospec to sanitize the syscall table.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Implement the barrier_nospec as a isync;sync instruction sequence.
The implementation uses the infrastructure built for BOOK3S 64.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In a subsequent patch we will enable building security.c for Book3E.
However the NXP platforms are not vulnerable to Meltdown, so make the
Meltdown vulnerability reporting PPC_BOOK3S_64 specific.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently we require platform code to call setup_barrier_nospec(). But
if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case
then we can call it in setup_arch().
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add a config symbol to encode which platforms support the
barrier_nospec speculation barrier. Currently this is just Book3S 64
but we will add Book3E in a future patch.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
NXP Book3E platforms are not vulnerable to speculative store
bypass, so make the mitigations PPC_BOOK3S_64 specific.
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that '%asm-generic' is added to no-dot-config-targets,
'make asm-generic' does not include the kernel configuration.
You can simply do 'make asm-generic' in the recursed top Makefile
without bothering syncconfig.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Richard Weinberger <richard@nod.at>
Randy Dunlap reports UML occasionally fails to build with -j<N> and
O=<builddir> options.
make[1]: Entering directory '/home/rdunlap/mmotm-2018-0802-1529/UM64'
UPD include/generated/uapi/linux/version.h
WRAP arch/x86/include/generated/asm/dma-contiguous.h
WRAP arch/x86/include/generated/asm/export.h
WRAP arch/x86/include/generated/asm/early_ioremap.h
WRAP arch/x86/include/generated/asm/mcs_spinlock.h
WRAP arch/x86/include/generated/asm/mm-arch-hooks.h
WRAP arch/x86/include/generated/uapi/asm/bpf_perf_event.h
WRAP arch/x86/include/generated/uapi/asm/poll.h
GEN ./Makefile
make[2]: *** No rule to make target 'archheaders'. Stop.
arch/um/Makefile:119: recipe for target 'archheaders' failed
make[1]: *** [archheaders] Error 2
make[1]: *** Waiting for unfinished jobs....
UPD include/config/kernel.release
make[1]: *** wait: No child processes. Stop.
Makefile:146: recipe for target 'sub-make' failed
make: *** [sub-make] Error 2
The cause of the problem is the use of '$(MAKE) KBUILD_SRC=',
which recurses to the top Makefile via the $(objtree)/Makefile
generated by scripts/mkmakefile.
When you run "make -j<N> O=<builddir> ARCH=um", Make can execute
'archheaders' and 'outputmakefile' targets simultaneously because
there is no dependency between them.
If it happens,
$(Q)$(MAKE) KBUILD_SRC= ARCH=$(HEADER_ARCH) archheaders
... tries to run $(objtree)/Makefile that is being updated.
The correct way for the recursion is
$(Q)$(MAKE) -f $(srctree)/Makefile ARCH=$(HEADER_ARCH) archheaders
..., which does not rely on the generated Makefile.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Richard Weinberger <richard@nod.at>
The speculation barrier can be disabled from the command line
with the parameter: "nospectre_v1".
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We only need to use __MASKABLE_EXCEPTION in one of the four cases for
hardware interrupt, so use the helper macros in the other cases.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We pass the "loc" (location) parameter to MASKABLE_EXCEPTION and
friends, but it's not used, so drop it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
_MASKABLE_RELON_EXCEPTION_PSERIES() does nothing useful, update all
callers to use __MASKABLE_RELON_EXCEPTION_PSERIES() directly.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
_MASKABLE_EXCEPTION_PSERIES() does nothing useful, update all callers
to use __MASKABLE_EXCEPTION_PSERIES() directly.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The EXCEPTION_RELON_PROLOG_PSERIES_1() macro does the same job as
EXCEPTION_PROLOG_2 (which we just recently created), except for
"RELON" (relocation on) exceptions.
So rename it as such.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As with the other patches in this series, we are removing the
"PSERIES" from the name as it's no longer meaningful.
In this case it's not simply a case of removing the "PSERIES" as that
would result in a clash with the existing EXCEPTION_PROLOG_1.
Instead we name this one EXCEPTION_PROLOG_2, as it's usually used in
sequence after 0 and 1.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The "PSERIES" in STD_EXCEPTION_PSERIES is to differentiate the macros
from the legacy iSeries versions, which are called
STD_EXCEPTION_ISERIES. It is not anything to do with pseries vs
powernv or powermac etc.
We removed the legacy iSeries code in 2012, in commit 8ee3e0d69623x
("powerpc: Remove the main legacy iSerie platform code").
So remove "PSERIES" from the macros.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
EXCEPTION_RELON_PROLOG_PSERIES() only has two users,
STD_RELON_EXCEPTION_PSERIES() and STD_RELON_EXCEPTION_HV() both of
which "call" SET_SCRATCH0(), so just move SET_SCRATCH0() into
EXCEPTION_RELON_PROLOG_PSERIES().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
EXCEPTION_PROLOG_PSERIES() only has two users, STD_EXCEPTION_PSERIES()
and STD_EXCEPTION_HV() both of which "call" SET_SCRATCH0(), so just
move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES().
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pasemi arch code finds the root of the PCI-e bus by searching the
device-tree for a node called 'pxp'. But the root bus has a compatible
property of 'pasemi,rootbus' so search for that instead.
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The generic implementation of strlen() reads strings byte per byte.
This patch implements strlen() in assembly based on a read of entire
words, in the same spirit as what some other arches and glibc do.
On a 8xx the time spent in strlen is reduced by 3/4 for long strings.
strlen() selftest on an 8xx provides the following values:
Before the patch (ie with the generic strlen() in lib/string.c):
len 256 : time = 1.195055
len 016 : time = 0.083745
len 008 : time = 0.046828
len 004 : time = 0.028390
After the patch:
len 256 : time = 0.272185 ==> 78% improvment
len 016 : time = 0.040632 ==> 51% improvment
len 008 : time = 0.033060 ==> 29% improvment
len 004 : time = 0.029149 ==> 2% degradation
On a 832x:
Before the patch:
len 256 : time = 0.236125
len 016 : time = 0.018136
len 008 : time = 0.011000
len 004 : time = 0.007229
After the patch:
len 256 : time = 0.094950 ==> 60% improvment
len 016 : time = 0.013357 ==> 26% improvment
len 008 : time = 0.010586 ==> 4% improvment
len 004 : time = 0.008784
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
rtas_log_buf is a buffer to hold RTAS event data that are communicated
to kernel by hypervisor. This buffer is then used to pass RTAS event
data to user through proc fs. This buffer is allocated from
vmalloc (non-linear mapping) area.
On Machine check interrupt, register r3 points to RTAS extended event
log passed by hypervisor that contains the MCE event. The pseries
machine check handler then logs this error into rtas_log_buf. The
rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
page fault (vector 0x300) while accessing it. Since machine check
interrupt handler runs in NMI context we can not afford to take any
page fault. Page faults are not honored in NMI context and causes
kernel panic. Apart from that, as Nick pointed out,
pSeries_log_error() also takes a spin_lock while logging error which
is not safe in NMI context. It may endup in deadlock if we get another
MCE before releasing the lock. Fix this by deferring the logging of
rtas error to irq work queue.
Current implementation uses two different buffers to hold rtas error
log depending on whether extended log is provided or not. This makes
bit difficult to identify which buffer has valid data that needs to
logged later in irq work. Simplify this using single buffer, one per
paca, and copy rtas log to it irrespective of whether extended log is
provided or not. Allocate this buffer below RMA region so that it can
be accessed in real mode mce handler.
Fixes: b96672dd84 ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable@vger.kernel.org # v4.14+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The global mce data buffer that used to copy rtas error log is of 2048
(RTAS_ERROR_LOG_MAX) bytes in size. Before the copy we read
extended_log_length from rtas error log header, then use max of
extended_log_length and RTAS_ERROR_LOG_MAX as a size of data to be copied.
Ideally the platform (phyp) will never send extended error log with
size > 2048. But if that happens, then we have a risk of buffer overrun
and corruption. Fix this by using min_t instead.
Fixes: d368514c30 ("powerpc: Fix corruption when grabbing FWNMI data")
Reported-by: Michal Suchanek <msuchanek@suse.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It's identical to xive_teardown_cpu() so just use the latter
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Those overly verbose statement in the setup of the pool VP
aren't particularly useful (esp. considering we don't actually
use the pool, we configure it bcs HW requires it only). So
remove them which improves the code readability.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The kernel page table caches are tied to init_mm, so there is no
more need for them after userspace is finished.
destroy_context() gets called when we drop the last reference for an
mm, which can be much later than the task exit due to other lazy mm
references to it. We can free the page table cache pages on task exit
because they only cache the userspace page tables and kernel threads
should not access user space addresses.
The mapping for kernel threads itself is maintained in init_mm and
page table cache for that is attached to init_mm.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Merge change log additions from Aneesh]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When the mm is being torn down there will be a full PID flush so
there is no need to flush the TLB on page size changes.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This makes it easy to run checkpatch with settings that I like.
Usage is eg:
$ ./arch/powerpc/tools/checkpatch.sh -g origin/master..
To check all commits since origin/master.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Russell Currey <ruscur@russell.cc>
We recently added a warning in arch_local_irq_restore() to check that
the soft masking state matches reality.
Unfortunately it trips in a few places, which are not entirely trivial
to fix. The key problem is if we're doing function_graph tracing of
restore_math(), the warning pops and then seems to recurse. It's not
entirely clear because the system continuously oopses on all CPUs,
with the output interleaved and unreadable.
It's also been observed on a G5 coming out of idle.
Until we can fix those cases disable the warning for now.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For machines without the exrl instruction the BFP jit generates
code that uses an "br %r1" instruction located in the lowcore page.
Unfortunately there is a cut & paste error that puts an additional
"larl %r1,.+14" instruction in the code that clobbers the branch
target address in %r1. Remove the larl instruction.
Cc: <stable@vger.kernel.org> # v4.17+
Fixes: de5cb6eb51 ("s390: use expoline thunks in the BPF JIT")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The memove, memset, memcpy, __memset16, __memset32 and __memset64
function have an additional indirect return branch in form of a
"bzr" instruction. These need to use expolines as well.
Cc: <stable@vger.kernel.org> # v4.17+
Fixes: 97489e0663 ("s390/lib: use expoline for indirect branches")
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
on the kernel command line as it cannot differentiate between SMT disabled
by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and
makes the sysfs interface unusable.
Rework this so that during bringup of the non boot CPUs the availability of
SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
'primary' thread then set the local cpu_smt_available marker and evaluate
this explicitely right after the initial SMP bringup has finished.
SMT evaulation on x86 is a trainwreck as the firmware has all the
information _before_ booting the kernel, but there is no interface to query
it.
Fixes: 73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Enhance the GHASH implementation that uses 64-bit polynomial
multiplication by adding support for 4-way aggregation. This
more than doubles the performance, from 2.4 cycles per byte
to 1.1 cpb on Cortex-A53.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Checking the TIF_NEED_RESCHED flag is disproportionately costly on cores
with fast crypto instructions and comparatively slow memory accesses.
On algorithms such as GHASH, which executes at ~1 cycle per byte on
cores that implement support for 64 bit polynomial multiplication,
there is really no need to check the TIF_NEED_RESCHED particularly
often, and so we can remove the NEON yield check from the assembler
routines.
However, unlike the AEAD or skcipher APIs, the shash/ahash APIs take
arbitrary input lengths, and so there needs to be some sanity check
to ensure that we don't hog the CPU for excessive amounts of time.
So let's simply cap the maximum input size that is processed in one go
to 64 KB.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
It turns out I had misunderstood how the x86_match_cpu() function works.
It evaluates a logical OR of the matching conditions, not logical AND.
This caused the CPU feature checks for AEGIS to pass even if only SSE2
(but not AES-NI) was supported (or vice versa), leading to potential
crashes if something tried to use the registered algs.
This patch switches the checks to a simpler method that is used e.g. in
the Camellia x86 code.
The patch also removes the MODULE_DEVICE_TABLE declarations which
actually seem to cause the modules to be auto-loaded at boot, which is
not desired. The crypto API on-demand module loading is sufficient.
Fixes: 1d373d4e8e ("crypto: x86 - Add optimized AEGIS implementations")
Fixes: 6ecc9d9ff9 ("crypto: x86 - Add optimized MORUS implementations")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Squeeze out another 5% of performance by minimizing the number
of invocations of kernel_neon_begin()/kernel_neon_end() on the
common path, which also allows some reloads of the key schedule
to be optimized away.
The resulting code runs at 2.3 cycles per byte on a Cortex-A53.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement a faster version of the GHASH transform which amortizes
the reduction modulo the characteristic polynomial across two
input blocks at a time.
On a Cortex-A53, the gcm(aes) performance increases 24%, from
3.0 cycles per byte to 2.4 cpb for large input sizes.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Update the core AES/GCM transform and the associated plumbing to operate
on 2 AES/GHASH blocks at a time. By itself, this is not expected to
result in a noticeable speedup, but it paves the way for reimplementing
the GHASH component using 2-way aggregation.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
As it turns out, checking the TIF_NEED_RESCHED flag after each
iteration results in a significant performance regression (~10%)
when running fast algorithms (i.e., ones that use special instructions
and operate in the < 4 cycles per byte range) on in-order cores with
comparatively slow memory accesses such as the Cortex-A53.
Given the speed of these ciphers, and the fact that the page based
nature of the AEAD scatterwalk API guarantees that the core NEON
transform is never invoked with more than a single page's worth of
input, we can estimate the worst case duration of any resulting
scheduling blackout: on a 1 GHz Cortex-A53 running with 64k pages,
processing a page's worth of input at 4 cycles per byte results in
a delay of ~250 us, which is a reasonable upper bound.
So let's remove the yield checks from the fused AES-CCM and AES-GCM
routines entirely.
This reverts commit 7b67ae4d5c and
partially reverts commit 7c50136a8a.
Fixes: 7c50136a8a ("crypto: arm64/aes-ghash - yield NEON after every ...")
Fixes: 7b67ae4d5c ("crypto: arm64/aes-ccm - yield NEON after every ...")
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Passing an array (swapper_pg_dir) as the argument to write_c0_kpgd() in
setup_pw() will become problematic if we modify __write_64bit_c0_split()
to cast its val argument to unsigned long long, because for 32-bit
kernel builds the size of a pointer will differ from the size of an
unsigned long long. This would fall foul of gcc's pointer-to-int-cast
diagnostic.
Cast the value to a long, which should be the same width as the pointer
that we ultimately want & will be sign extended if required to the
unsigned long long that __write_64bit_c0_split() ultimately needs.
Signed-off-by: Paul Burton <paul.burton@mips.com>
The MIPS VDSO code filters out a subset of known-good flags from
KBUILD_CFLAGS to use when building VDSO libraries. When we build using
clang we need to allow the --target flag through, otherwise we'll
generally attempt to build the VDSO for the architecture of the build
machine rather than for MIPS.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20154/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Our genvdso tool performs some rather paranoid checking that the VDSO
library isn't attempting to make use of a GOT by constraining the number
of entries that the GOT is allowed to contain to the minimum 2 entries
that are always generated by binutils.
Unfortunately lld prior to revision 334390 generates a third entry,
which is unused & thus harmless but falls foul of genvdso's checks &
causes the build to fail.
Since we already check that the VDSO contains no relocations it seems
reasonable to presume that it also doesn't contain use of a GOT, which
would involve relocations. Thus rather than attempting to work around
this issue by allowing 3 GOT entries when using lld, simply remove the
GOT checks which seem overly paranoid.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20152/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Commit d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits
adjustment corruption") has moved the query and calculation of the
x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct
from the get_cpu_cap function to a new function named
get_cpu_address_sizes.
One of the call sites related to Xen PV VMs was unfortunately missed
in the aforementioned commit. This prevents successful boot-up of
kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL
is enabled, due to the following code path:
enlighten_pv.c::xen_start_kernel
mmu_pv.c::xen_reserve_special_pages
page.h::__pa
physaddr.c::__phys_addr
physaddr.h::phys_addr_valid
phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical
addresses. boot_cpu_data.x86_phys_bits is no longer populated before
the call to xen_reserve_special_pages due to the aforementioned commit
though, so the validation performed by phys_addr_valid fails, which
causes __phys_addr to trigger a BUG, preventing boot-up.
Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Cc: x86@kernel.org
Cc: stable@vger.kernel.org # for v4.17 and up
Fixes: d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The kernel image is mapped into two places in the virtual address space
(addresses without KASLR, of course):
1. The kernel direct map (0xffff880000000000)
2. The "high kernel map" (0xffffffff81000000)
We actually execute out of #2. If we get the address of a kernel symbol,
it points to #2, but almost all physical-to-virtual translations point to
Parts of the "high kernel map" alias are mapped in the userspace page
tables with the Global bit for performance reasons. The parts that we map
to userspace do not (er, should not) have secrets. When PTI is enabled then
the global bit is usually not set in the high mapping and just used to
compensate for poor performance on systems which lack PCID.
This is fine, except that some areas in the kernel image that are adjacent
to the non-secret-containing areas are unused holes. We free these holes
back into the normal page allocator and reuse them as normal kernel memory.
The memory will, of course, get *used* via the normal map, but the alias
mapping is kept.
This otherwise unused alias mapping of the holes will, by default keep the
Global bit, be mapped out to userspace, and be vulnerable to Meltdown.
Remove the alias mapping of these pages entirely. This is likely to
fracture the 2M page mapping the kernel image near these areas, but this
should affect a minority of the area.
The pageattr code changes *all* aliases mapping the physical pages that it
operates on (by default). We only want to modify a single alias, so we
need to tweak its behavior.
This unmapping behavior is currently dependent on PTI being in place.
Going forward, we should at least consider doing this for all
configurations. Having an extra read-write alias for memory is not exactly
ideal for debugging things like random memory corruption and this does
undercut features like DEBUG_PAGEALLOC or future work like eXclusive Page
Frame Ownership (XPFO).
Before this patch:
current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000 16M pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_kernel-0xffffffff81e11000-0xffffffff82000000 1980K RW NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000 6M ro PSE GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000 6M RW PSE NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000 2M RW NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000 4M RW PSE NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000 462M pmd
current_user:---[ High Kernel Mapping ]---
current_user-0xffffffff80000000-0xffffffff81000000 16M pmd
current_user-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_user-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_user-0xffffffff81e11000-0xffffffff82000000 1980K RW NX pte
current_user-0xffffffff82000000-0xffffffff82600000 6M ro PSE GLB NX pmd
current_user-0xffffffff82600000-0xffffffffa0000000 474M pmd
After this patch:
current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000 16M pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_kernel-0xffffffff81e11000-0xffffffff82000000 1980K pte
current_kernel-0xffffffff82000000-0xffffffff82400000 4M ro PSE GLB NX pmd
current_kernel-0xffffffff82400000-0xffffffff82488000 544K ro NX pte
current_kernel-0xffffffff82488000-0xffffffff82600000 1504K pte
current_kernel-0xffffffff82600000-0xffffffff82c00000 6M RW PSE NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82c0d000 52K RW NX pte
current_kernel-0xffffffff82c0d000-0xffffffff82dc0000 1740K pte
current_user:---[ High Kernel Mapping ]---
current_user-0xffffffff80000000-0xffffffff81000000 16M pmd
current_user-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_user-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_user-0xffffffff81e11000-0xffffffff82000000 1980K pte
current_user-0xffffffff82000000-0xffffffff82400000 4M ro PSE GLB NX pmd
current_user-0xffffffff82400000-0xffffffff82488000 544K ro NX pte
current_user-0xffffffff82488000-0xffffffff82600000 1504K pte
current_user-0xffffffff82600000-0xffffffffa0000000 474M pmd
[ tglx: Do not unmap on 32bit as there is only one mapping ]
Fixes: 0f561fce4d ("x86/pti: Enable global pages for shared areas")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Link: https://lkml.kernel.org/r/20180802225831.5F6A2BFC@viggo.jf.intel.com
As there is precious little left in any DTS files referring to the
node "/chosen@0" as opposed to "/chosen", remove the two checks for
the former node name.
[paul.burton@mips.com:
The modified yamon-dt code only operates on
arch/mips/boot/dts/mti/sead3.dts right now, and that uses chosen
rather than chosen@0 anyway, so this should have no behavioural
effect.]
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20131/
Cc: linux-mips@linux-mips.org
Remove open-coded uses of set instructions to use CC_SET()/CC_OUT() in
arch/x86/kvm/vmx.c.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
[Mark error paths as unlikely while touching this. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Implement paravirtual apic hooks to enable PV IPIs for KVM if the "send IPI"
hypercall is available. The hypercall lets a guest send IPIs, with
at most 128 destinations per hypercall in 64-bit mode and 64 vCPUs per
hypercall in 32-bit mode.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add kvm hypervisor init time platform setup callback which
will be used to replace native apic hooks by pararvirtual
hooks.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
X86_CR4_OSXSAVE check belongs to sregs check and so move into
kvm_valid_sregs().
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Considering the fact that the pae_root shadow is not needed when
tdp is in use, skip the pae_root shadow page allocation to allow
mmu creation even not being able to obtain memory from DMA32
zone when particular cgroup cpuset.mems or mempolicy control is
applied.
Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
mmu_set_spte() flushes remote tlbs for drop_parent_pte/drop_spte()
and set_spte() separately. This may introduce redundant flush. This
patch is to combine these flushes and check flush request after
calling set_spte().
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Reviewed-by: Junaid Shahid <junaids@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The host's FS.base and GS.base rarely change, e.g. ~0.1% of host/guest
swaps on my system. Cache the last value written to the VMCS and skip
the VMWRITE to the associated VMCS fields when loading host state if
the value hasn't changed since the last VMWRITE.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On a 64-bit host, FS.sel and GS.sel are all but guaranteed to be 0,
which in turn means they'll rarely change. Skip the VMWRITE for the
associated VMCS fields when loading host state if the selector hasn't
changed since the last VMWRITE.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The HOST_{FS,GS}_BASE fields are guaranteed to be written prior to
VMENTER, by way of vmx_prepare_switch_to_guest(). Initialize the
fields to zero for 64-bit kernels instead of pulling the base values
from their respective MSRs. In addition to eliminating two RDMSRs,
vmx_prepare_switch_to_guest() can safely assume the initial value of
the fields is zero in all cases.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Make host_state a property of a loaded_vmcs so that it can be
used as a cache of the VMCS fields, e.g. to lazily VMWRITE the
corresponding VMCS field. Treating host_state as a cache does
not work if it's not VMCS specific as the cache would become
incoherent when switching between vmcs01 and vmcs02.
Move vmcs_host_cr3 and vmcs_host_cr4 into host_state.
Explicitly zero out host_state when allocating a new VMCS for a
loaded_vmcs. Unlike the pre-existing vmcs_host_cr{3,4} usage,
the segment information is not guaranteed to be (re)initialized
when running a new nested VMCS, e.g. HOST_FS_BASE is not written
in vmx_set_constant_host_state().
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove fs_reload_needed and gs_ldt_reload_needed from host_state
and instead compute whether we need to reload various state at
the time we actually do the reload. The state that is tracked
by the *_reload_needed variables is not any more volatile than
the trackers themselves.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
prepare_vmcs02() has an odd comment that says certain fields are
"not in vmcs02". AFAICT the intent of the comment is to document
that various VMCS fields are not handled by prepare_vmcs02(),
e.g. HOST_{FS,GS}_{BASE,SELECTOR}. While technically true, the
comment is misleading, e.g. it can lead the reader to think that
KVM never writes those fields to vmcs02.
Remove the comment altogether as the handling of FS and GS is
not specific to nested VMX, and GUEST_PML_INDEX has been written
by prepare_vmcs02() since commit "4e59516a12a6 (kvm: vmx: ensure
VMCS is current while enabling PML)"
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that the vmx_load_host_state() wrapper is gone, i.e. the only
time we call the core functions is when we're actually about to
switch between guest/host, rename the functions that handle lazy
state switching to vmx_prepare_switch_to_{guest,host}_state() to
better document the full extent of their functionality.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When lazy save/restore of MSR_KERNEL_GS_BASE was introduced[1], the
MSR was intercepted in all modes and was only restored for the host
when the guest is in 64-bit mode. So at the time, going through the
full host restore prior to accessing MSR_KERNEL_GS_BASE was necessary
to load host state and was not a significant waste of cycles.
Later, MSR_KERNEL_GS_BASE interception was disabled for a 64-bit
guest[2], and then unconditionally saved/restored for the host[3].
As a result, loading full host state is overkill for accesses to
MSR_KERNEL_GS_BASE, and completely unnecessary when the guest is
not in 64-bit mode.
Add a dedicated utility to read/write the guest's MSR_KERNEL_GS_BASE
(outside of the save/restore flow) to minimize the overhead incurred
when accessing the MSR. When setting EFER, only decache the MSR if
the new EFER will disable long mode.
Removing out-of-band usage of vmx_load_host_state() also eliminates,
or at least reduces, potential corner cases in its usage, which in
turn will (hopefuly) make it easier to reason about future changes
to the save/restore flow, e.g. optimization of saving host state.
[1] commit 44ea2b1758 ("KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx
autoload msr area")
[2] commit 5897297bc2 ("KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE")
[3] commit c8770e7ba6 ("KVM: VMX: Fix host userspace gsbase corruption")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using 'struct loaded_vmcs*' to track whether the CPU registers
contain host or guest state kills two birds with one stone.
1. The (effective) boolean host_state.loaded is poorly named.
It does not track whether or not host state is loaded into
the CPU registers (which most readers would expect), but
rather tracks if host state has been saved AND guest state
is loaded.
2. Using a loaded_vmcs pointer provides a more robust framework
for the optimized guest/host state switching, especially when
consideration per-VMCS enhancements. To that end, WARN_ONCE
if we try to switch to host state with a different VMCS than
was last used to save host state.
Resolve an occurrence of the new WARN by setting loaded_vmcs after
the call to vmx_vcpu_put() in vmx_switch_vmcs().
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use local variables in vmx_save_host_state() to temporarily track
the selector and base values for FS and GS, and reorganize the
code so that the 64-bit vs 32-bit portions are contained within
a single #ifdef. This refactoring paves the way for future patches
to modify the updating of VMCS state with minimal changes to the
code, and (hopefully) simplifies resolving a likely conflict with
another in-flight patch[1] by being the whipping boy for future
patches.
[1] https://www.spinics.net/lists/kvm/msg171647.html
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When checking emulated VMX instructions for faults, the #UD for "IF
(not in VMX operation)" should take precedence over the #GP for "ELSIF
CPL > 0."
Suggested-by: Eric Northup <digitaleric@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The fault that should be raised for a privilege level violation is #GP
rather than #UD.
Fixes: 727ba748e1 ("kvm: nVMX: Enforce cpl=0 for VMX instructions")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Register tlb_remote_flush callback for vmx when hyperv capability of
nested guest mapping flush is detected. The interface can help to
reduce overhead when flush ept table among vcpus for nested VM. The
tradition way is to send IPIs to all affected vcpus and executes
INVEPT on each vcpus. It will trigger several vmexits for IPI
and INVEPT emulation. Hyper-V provides such hypercall to do
flush for all vcpus and call the hypercall when all ept table
pointers of single VM are same.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch is to provide a way for platforms to register hv tlb remote
flush callback and this helps to optimize operation of tlb flush
among vcpus for nested virtualization case.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch is to add hyperv_nested_flush_guest_mapping support to trace
hvFlushGuestPhysicalAddressSpace hypercall.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V supports a pv hypercall HvFlushGuestPhysicalAddressSpace to
flush nested VM address space mapping in l1 hypervisor and it's to
reduce overhead of flushing ept tlb among vcpus. This patch is to
implement it.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On a VM with only 1 vCPU, the locking fast path will always be
successful. In this case, there is no need to use the the PV qspinlock
code which has higher overhead on the unlock side than the native
qspinlock code.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge check of "sp->role.cr4_pae != !!is_pae(vcpu))" and "vcpu->
arch.mmu.sync_page(vcpu, sp) == 0". kvm_mmu_prepare_zap_page()
is called under both these conditions.
Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is a duplicate of X86_CR3_PCID_NOFLUSH. So just use that instead.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Adds support for storing multiple previous CR3/root_hpa pairs maintained
as an LRU cache, so that the lockless CR3 switch path can be used when
switching back to any of them.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This needs a minor bug fix. The updated patch is as follows.
Thanks,
Junaid
------------------------------------------------------------------------------
kvm_mmu_invlpg() and kvm_mmu_invpcid_gva() only need to flush the TLB
entries for the specific guest virtual address, instead of flushing all
TLB entries associated with the VM.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When the guest indicates that the TLB doesn't need to be flushed in a
CR3 switch, we can also skip resyncing the shadow page tables since an
out-of-sync shadow page table is equivalent to an out-of-sync TLB.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_mmu_free_roots() now takes a mask specifying which roots to free, so
that either one of the roots (active/previous) can be individually freed
when needed.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This allows invlpg() to be called using either the active root_hpa
or the prev_root_hpa.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When PCIDs are enabled, the MSb of the source operand for a MOV-to-CR3
instruction indicates that the TLB doesn't need to be flushed.
This change enables this optimization for MOV-to-CR3s in the guest
that have been intercepted by KVM for shadow paging and are handled
within the fast CR3 switch path.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Implement support for INVPCID in shadow paging mode as well.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When using shadow paging mode, propagate the guest's PCID value to
the shadow CR3 in the host instead of always using PCID 0.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove the implicit flush from the set_cr3 handlers, so that the
callers are able to decide whether to flush the TLB or not.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the fast CR3 switch mechanism to locklessly change the MMU root
page when switching between L1 and L2. The switch from L2 to L1 should
always go through the fast path, while the switch from L1 to L2 should
go through the fast path if L1's CR3/EPTP for L2 hasn't changed
since the last time.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This adds support for re-initializing the MMU context in a different
mode while preserving the active root_hpa and the prev_root.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This generalizes the lockless CR3 switch path to be able to work
across different MMU modes (e.g. nested vs non-nested) by checking
that the expected page role of the new root page matches the page role
of the previously stored root page in addition to checking that the new
CR3 matches the previous CR3. Furthermore, instead of loading the
hardware CR3 in fast_cr3_switch(), it is now done in vcpu_enter_guest(),
as by that time the MMU context would be up-to-date with the VCPU mode.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The KVM_REQ_LOAD_CR3 request loads the hardware CR3 using the
current root_hpa.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These functions factor out the base role calculation from the
corresponding kvm_init_*_mmu() functions. The new functions return
what would be the role assigned to a root page in the current VCPU
state. This can be masked with mmu_base_role_mask to derive the base
role.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When using shadow paging, a CR3 switch in the guest results in a VM Exit.
In the common case, that VM exit doesn't require much processing by KVM.
However, it does acquire the MMU lock, which can start showing signs of
contention under some workloads even on a 2 VCPU VM when the guest is
using KPTI. Therefore, we add a fast path that avoids acquiring the MMU
lock in the most common cases e.g. when switching back and forth between
the kernel and user mode CR3s used by KPTI with no guest page table
changes in between.
For now, this fast path is implemented only for 64-bit guests and hosts
to avoid the handling of PDPTEs, but it can be extended later to 32-bit
guests and/or hosts as well.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm_mmu_sync_roots() can locklessly check whether a sync is needed and just
bail out if it isn't.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
sync_page() calls set_spte() from a loop across a page table. It would
work better if set_spte() left the TLB flushing to its callers, so that
sync_page() can aggregate into a single call.
Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No functionality change.
This is done as a preparation for VMCS shadowing virtualization.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No functionality change.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expose VMCS shadowing to L1 as a VMX capability of the virtual CPU,
whether or not VMCS shadowing is supported by the physical CPU.
(VMCS shadowing emulation)
Shadowed VMREADs and VMWRITEs from L2 are handled by L0, without a
VM-exit to L1.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is done as a preparation for VMCS shadowing emulation.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is done as a preparation to VMCS shadowing emulation.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The shadow vmcs12 cannot be flushed on KVM_GET_NESTED_STATE,
because at that point guest memory is assumed by userspace to
be immutable. Capture the cache in vmx_get_nested_state, adding
another page at the end if there is an active shadow vmcs12.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is done is done as a preparation to VMCS shadowing emulation.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Intel SDM considers these checks to be part of
"Checks on Guest Non-Register State".
Note that it is legal for vmcs->vmcs_link_pointer to be -1ull
when VMCS shadowing is enabled. In this case, any VMREAD/VMWRITE to
shadowed-field sets the ALU flags for VMfailInvalid (i.e. CF=1).
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No functionality change.
This is done as a preparation for VMCS shadowing emulation.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No functionality change.
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For nested virtualization L0 KVM is managing a bit of state for L2 guests,
this state can not be captured through the currently available IOCTLs. In
fact the state captured through all of these IOCTLs is usually a mix of L1
and L2 state. It is also dependent on whether the L2 guest was running at
the moment when the process was interrupted to save its state.
With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
that is in VMX operation.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jim Mattson <jmattson@google.com>
[karahmed@ - rename structs and functions and make them ready for AMD and
address previous comments.
- handle nested.smm state.
- rebase & a bit of refactoring.
- Merge 7/8 and 8/8 into one patch. ]
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If the vCPU enters system management mode while running a nested guest,
RSM starts processing the vmentry while still in SMM. In that case,
however, the pages pointed to by the vmcs12 might be incorrectly
loaded from SMRAM. To avoid this, delay the handling of the pages
until just before the next vmentry. This is done with a new request
and a new entry in kvm_x86_ops, which we will be able to reuse for
nested VMX state migration.
Extracted from a patch by Jim Mattson and KarimAllah Ahmed.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some of the MSRs returned by GET_MSR_INDEX_LIST currently cannot be sent back
to KVM_GET_MSR and/or KVM_SET_MSR; either they can never be sent back, or you
they are only accepted under special conditions. This makes the API a pain to
use.
To avoid this pain, this patch makes it so that the result of the get-list
ioctl can always be used for host-initiated get and set. Since we don't have
a separate way to check for read-only MSRs, this means some Hyper-V MSRs are
ignored when written. Arguably they should not even be in the result of
GET_MSR_INDEX_LIST, but I am leaving there in case userspace is using the
outcome of GET_MSR_INDEX_LIST to derive the support for the corresponding
Hyper-V feature.
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linux does not support Memory Protection Extensions (MPX) in the
kernel itself, thus the BNDCFGS (Bound Config Supervisor) MSR will
always be zero in the KVM host, i.e. RDMSR in vmx_save_host_state()
is superfluous. KVM unconditionally sets VM_EXIT_CLEAR_BNDCFGS,
i.e. BNDCFGS will always be zero after VMEXIT, thus manually loading
BNDCFGS is also superfluous.
And in the event the MPX kernel support is added (unlikely given
that MPX for userspace is in its death throes[1]), BNDCFGS will
likely be common across all CPUs[2], and at the least shouldn't
change on a regular basis, i.e. saving the MSR on every VMENTRY is
completely unnecessary.
WARN_ONCE in hardware_setup() if the host's BNDCFGS is non-zero to
document that KVM does not preserve BNDCFGS and to serve as a hint
as to how BNDCFGS likely should be handled if MPX is used in the
kernel, e.g. BNDCFGS should be saved once during KVM setup.
[1] https://lkml.org/lkml/2018/4/27/1046
[2] http://www.openwall.com/lists/kernel-hardening/2017/07/24/28
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltU8z0eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5X8H/2fJr7m3k242+t76
sitwvx1eoPqTgryW59dRKm9IuXAGA+AjauvHzaz1QxomeQa50JghGWefD0eiJfkA
1AphQ/24EOiAbbVk084dAI/C2p122dE4D5Fy7CrfLnuouyrbFaZI5STbnrRct7sR
9deeYW0GDHO1Uenp4WDCj0baaqJqaevZ+7GG09DnWpya2nQtSkGBjqn6GpYmrfOU
mqFuxAX8mEOW6cwK16y/vYtnVjuuMAiZ63/OJ8AQ6d6ArGLwAsdn7f8Fn4I4tEr2
L0d3CRLUyegms4++Dmlu05k64buQu46WlPhjCZc5/Ts4kjrNxBuHejj2/jeSnUSt
vJJlibI=
=42a5
-----END PGP SIGNATURE-----
Merge tag 'v4.18-rc6' into HEAD
Pull bug fixes into the KVM development tree to avoid nasty conflicts.
- GICv3 ITS LPI allocation revamp
- GICv3 support for hypervisor-enforced LPI range
- GICv3 ITS conversion to raw spinlock
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAltoBXMVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDyUYP/1feAq3F7ZmhCIZka4c6y/m4EBpq
BjWEEgOAGMEyyB4s98flsRtZcEUxxp6CqEXo2FgCsd1Nj+og7oA7vwOlqy3aGzsi
9f/Z5Wi6SlG06lH5tmYNkyVbGk2tE3s2FzkH5Rg8qZGk+X3OCOdNs/+G20pYAkSp
ESePWSapbQUJSExJ1MqzfdHFidtVA1V+ev8BKdIp2ykl1NRae8LJeKHIbqac49Ym
JclfCLFpQM1M1ElB9j0E8hAvZhz10oOz7TtBR737O/1QEifVyFqGBckPzldvwIJM
zZ+nR+Yzj1ruD109xwaF1iKy9AinZWhiqrtN7UXJ3jwHtNih+sy0R6FQ38GMNoOC
0K02n/qStR5xglGr4BmAcWlOuFtBYWfz6HpSVMqaTWWmOxHEiqS6pXtEA+dV/YyI
wHLbo0YzpWTQm6t1+b/PoByAJ0/hOcD1nOD57b+NGjX7tZV0sGjpGsecvFhTSywh
BN3COBi9k/FOBrOTGDX1qUAI+mEf76vc2BAC+BkkoiiMg3WlY0E9qfQJguUxHdrb
0LS3lDZoHCNoz8RZLrUyenTT0NYGcjPGUTinMDJWG79VGXOWFexTDdCuX0kF90CK
1Zie3O6lrTYolmaiyLUxwukKp1SVUyoA5IpKVwfDJQYUhEfk27yvlzg2MBMcHDRA
uy3QSkmjx9vw/sAu
=gKw8
-----END PGP SIGNATURE-----
Merge tag 'irqchip-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier:
- GICv3 ITS LPI allocation revamp
- GICv3 support for hypervisor-enforced LPI range
- GICv3 ITS conversion to raw spinlock
This patch is to add clocks property for fman ptp timer node.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to add clocks property for fman ptp timer node.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vdso{32,64}.so can fail to link with CC=clang when clang tries to find
a suitable GCC toolchain to link these libraries with.
/usr/bin/ld: arch/x86/entry/vdso/vclock_gettime.o:
access beyond end of merged section (782)
This happens because the host environment leaked into the cross compiler
environment due to the way clang searches for suitable GCC toolchains.
Clang is a retargetable compiler, and each invocation of it must provide
--target=<something> --gcc-toolchain=<something> to allow it to find the
correct binutils for cross compilation. These flags had been added to
KBUILD_CFLAGS, but the vdso code uses CC and not KBUILD_CFLAGS (for various
reasons) which breaks clang's ability to find the correct linker when cross
compiling.
Most of the time this goes unnoticed because the host linker is new enough
to work anyway, or is incompatible and skipped, but this cannot be reliably
assumed.
This change alters the vdso makefile to just use LD directly, which
bypasses clang and thus the searching problem. The makefile will just use
${CROSS_COMPILE}ld instead, which is always what we want. This matches the
method used to link vmlinux.
This drops references to DISABLE_LTO; this option doesn't seem to be set
anywhere, and not knowing what its possible values are, it's not clear how
to convert it from CC to LD flag.
Signed-off-by: Alistair Strachan <astrachan@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: kernel-team@android.com
Cc: joel@joelfernandes.org
Cc: Andi Kleen <andi.kleen@intel.com>
Link: https://lkml.kernel.org/r/20180803173931.117515-1-astrachan@google.com
When chunks of the kernel image are freed, free_init_pages() is used
directly. Consolidate the three sites that do this. Also update the
string to give an incrementally better description of that memory versus
what was there before.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: aarcange@redhat.com
Cc: jgross@suse.com
Cc: jpoimboe@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Cc: hughd@google.com
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: ak@linux.intel.com
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20180802225829.FE0E32EA@viggo.jf.intel.com
The x86 code has several places where it frees parts of kernel image:
1. Unused SMP alternative
2. __init code
3. The hole between text and rodata
4. The hole between rodata and data
We call free_init_pages() to do this. Strangely, we convert the symbol
addresses to kernel direct map addresses in some cases (#3, #4) but not
others (#1, #2).
The virt_to_page() and the other code in free_reserved_area() now works
fine for for symbol addresses on x86, so don't bother converting the
addresses to direct map addresses before freeing them.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: aarcange@redhat.com
Cc: jgross@suse.com
Cc: jpoimboe@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Cc: hughd@google.com
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: ak@linux.intel.com
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20180802225828.89B2D0E2@viggo.jf.intel.com
The kernel image starts out with the Global bit set across the entire
kernel image. The bit is cleared with set_memory_nonglobal() in the
configurations with PCIDs where the performance benefits of the Global bit
are not needed.
However, this is fragile. It means that we are stuck opting *out* of the
less-secure (Global bit set) configuration, which seems backwards. Let's
start more secure (Global bit clear) and then let things opt back in if
they want performance, or are truly mapping common data between kernel and
userspace.
This fixes a bug. Before this patch, there are areas that are unmapped
from the user page tables (like like everything above 0xffffffff82600000 in
the example below). These have the hallmark of being a wrong Global area:
they are not identical in the 'current_kernel' and 'current_user' page
table dumps. They are also read-write, which means they're much more
likely to contain secrets.
Before this patch:
current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000 16M pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_kernel-0xffffffff81e11000-0xffffffff82000000 1980K RW GLB NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000 6M ro PSE GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000 6M RW PSE GLB NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000 2M RW GLB NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000 4M RW PSE GLB NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000 462M pmd
current_user:---[ High Kernel Mapping ]---
current_user-0xffffffff80000000-0xffffffff81000000 16M pmd
current_user-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_user-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_user-0xffffffff81e11000-0xffffffff82000000 1980K RW GLB NX pte
current_user-0xffffffff82000000-0xffffffff82600000 6M ro PSE GLB NX pmd
current_user-0xffffffff82600000-0xffffffffa0000000 474M pmd
After this patch:
current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000 16M pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_kernel-0xffffffff81e11000-0xffffffff82000000 1980K RW NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000 6M ro PSE GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000 6M RW PSE NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000 2M RW NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000 4M RW PSE NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000 462M pmd
current_user:---[ High Kernel Mapping ]---
current_user-0xffffffff80000000-0xffffffff81000000 16M pmd
current_user-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_user-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_user-0xffffffff81e11000-0xffffffff82000000 1980K RW NX pte
current_user-0xffffffff82000000-0xffffffff82600000 6M ro PSE GLB NX pmd
current_user-0xffffffff82600000-0xffffffffa0000000 474M pmd
Fixes: 0f561fce4d ("x86/pti: Enable global pages for shared areas")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@google.com
Cc: aarcange@redhat.com
Cc: jgross@suse.com
Cc: jpoimboe@redhat.com
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Cc: torvalds@linux-foundation.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: ak@linux.intel.com
Cc: Kees Cook <keescook@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20180802225825.A100C071@viggo.jf.intel.com
Pull x86 fix from Thomas Gleixner:
"A single fix, which addresses boot failures on machines which do not
report EBDA correctly, which can place the trampoline into reserved
memory regions. Validating against E820 prevents that"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/compressed/64: Validate trampoline placement against E820
Pull perf fixes from Thomas Gleixner:
"A set of fixes for perf:
Kernel side:
- Fix the hardcoded index of extra PCI devices on Broadwell which
caused a resource conflict and triggered warnings on CPU hotplug.
Tooling:
- Update the tools copy of several files, including perf_event.h,
powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and x86's
memcpy_64.s (used in 'perf bench mem'), silencing the respective
warnings during the perf tools build.
- Fix the build on the alpine:edge distro"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Fix hardcoded index of Broadwell extra PCI devices
perf tools: Fix the build on the alpine:edge distro
tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
tools headers uapi: Refresh linux/bpf.h copy
tools headers powerpc: Update asm/unistd.h copy to pick new
tools headers uapi: Update tools's copy of linux/perf_event.h
When nested virtualization is in use, VMENTER operations from the nested
hypervisor into the nested guest will always be processed by the bare metal
hypervisor, and KVM's "conditional cache flushes" mode in particular does a
flush on nested vmentry. Therefore, include the "skip L1D flush on
vmentry" bit in KVM's suggested ARCH_CAPABILITIES setting.
Add the relevant Documentation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is
not needed. Add a new value to enum vmx_l1d_flush_state, which is used
either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Three changes to the content of the sysfs file:
- If EPT is disabled, L1TF cannot be exploited even across threads on the
same core, and SMT is irrelevant.
- If mitigation is completely disabled, and SMT is enabled, print "vulnerable"
instead of "vulnerable, SMT vulnerable"
- Reorder the two parts so that the main vulnerability state comes first
and the detail on SMT is second.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For VMEXITs caused by external interrupts, vmx_handle_external_intr()
indirectly calls into the interrupt handlers through the host's IDT.
It follows that these interrupts get accounted for in the
kvm_cpu_l1tf_flush_l1d per-cpu flag.
The subsequently executed vmx_l1d_flush() will thus be aware that some
interrupts have happened and conduct a L1d flush anyway.
Setting l1tf_flush_l1d from vmx_handle_external_intr() isn't needed
anymore. Drop it.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The last missing piece to having vmx_l1d_flush() take interrupts after
VMEXIT into account is to set the kvm_cpu_l1tf_flush_l1d per-cpu flag on
irq entry.
Issue calls to kvm_set_cpu_l1tf_flush_l1d() from entering_irq(),
ipi_entering_ack_irq(), smp_reschedule_interrupt() and
uv_bau_message_interrupt().
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The next patch in this series will have to make the definition of
irq_cpustat_t available to entering_irq().
Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
dependencies like
asm/smp.h
asm/apic.h
asm/hardirq.h
linux/irq.h
linux/topology.h
linux/smp.h
asm/smp.h
or
linux/gfp.h
linux/mmzone.h
asm/mmzone.h
asm/mmzone_64.h
asm/smp.h
asm/apic.h
asm/hardirq.h
linux/irq.h
linux/irqdesc.h
linux/kobject.h
linux/sysfs.h
linux/kernfs.h
linux/idr.h
linux/gfp.h
and others.
This causes compilation errors because of the header guards becoming
effective in the second inclusion: symbols/macros that had been defined
before wouldn't be available to intermediate headers in the #include chain
anymore.
A possible workaround would be to move the definition of irq_cpustat_t
into its own header and include that from both, asm/hardirq.h and
asm/apic.h.
However, this wouldn't solve the real problem, namely asm/harirq.h
unnecessarily pulling in all the linux/irq.h cruft: nothing in
asm/hardirq.h itself requires it. Also, note that there are some other
archs, like e.g. arm64, which don't have that #include in their
asm/hardirq.h.
Remove the linux/irq.h #include from x86' asm/hardirq.h.
Fix resulting compilation errors by adding appropriate #includes to *.c
files as needed.
Note that some of these *.c files could be cleaned up a bit wrt. to their
set of #includes, but that should better be done from separate patches, if
at all.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Part of the L1TF mitigation for vmx includes flushing the L1D cache upon
VMENTRY.
L1D flushes are costly and two modes of operations are provided to users:
"always" and the more selective "conditional" mode.
If operating in the latter, the cache would get flushed only if a host side
code path considered unconfined had been traversed. "Unconfined" in this
context means that it might have pulled in sensitive data like user data
or kernel crypto keys.
The need for L1D flushes is tracked by means of the per-vcpu flag
l1tf_flush_l1d. KVM exit handlers considered unconfined set it. A
vmx_l1d_flush() subsequently invoked before the next VMENTER will conduct a
L1d flush based on its value and reset that flag again.
Currently, interrupts delivered "normally" while in root operation between
VMEXIT and VMENTER are not taken into account. Part of the reason is that
these don't leave any traces and thus, the vmx code is unable to tell if
any such has happened.
As proposed by Paolo Bonzini, prepare for tracking all interrupts by
introducing a new per-cpu flag, "kvm_cpu_l1tf_flush_l1d". It will be in
strong analogy to the per-vcpu ->l1tf_flush_l1d.
A later patch will make interrupt handlers set it.
For the sake of cache locality, group kvm_cpu_l1tf_flush_l1d into x86'
per-cpu irq_cpustat_t as suggested by Peter Zijlstra.
Provide the helpers kvm_set_cpu_l1tf_flush_l1d(),
kvm_clear_cpu_l1tf_flush_l1d() and kvm_get_cpu_l1tf_flush_l1d(). Make them
trivial resp. non-existent for !CONFIG_KVM_INTEL as appropriate.
Let vmx_l1d_flush() handle kvm_cpu_l1tf_flush_l1d in the same way as
l1tf_flush_l1d.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
An upcoming patch will extend KVM's L1TF mitigation in conditional mode
to also cover interrupts after VMEXITs. For tracking those, stores to a
new per-cpu flag from interrupt handlers will become necessary.
In order to improve cache locality, this new flag will be added to x86's
irq_cpustat_t.
Make some space available there by shrinking the ->softirq_pending bitfield
from 32 to 16 bits: the number of bits actually used is only NR_SOFTIRQS,
i.e. 10.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, vmx_vcpu_run() checks if l1tf_flush_l1d is set and invokes
vmx_l1d_flush() if so.
This test is unncessary for the "always flush L1D" mode.
Move the check to vmx_l1d_flush()'s conditional mode code path.
Notes:
- vmx_l1d_flush() is likely to get inlined anyway and thus, there's no
extra function call.
- This inverts the (static) branch prediction, but there hadn't been any
explicit likely()/unlikely() annotations before and so it stays as is.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The vmx_l1d_flush_always static key is only ever evaluated if
vmx_l1d_should_flush is enabled. In that case however, there are only two
L1d flushing modes possible: "always" and "conditional".
The "conditional" mode's implementation tends to require more sophisticated
logic than the "always" mode.
Avoid inverted logic by replacing the 'vmx_l1d_flush_always' static key
with a 'vmx_l1d_flush_cond' one.
There is no change in functionality.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
vmx_l1d_flush() gets invoked only if l1tf_flush_l1d is true. There's no
point in setting l1tf_flush_l1d to true from there again.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
One fix for a regression in a recent TLB flush optimisation, which caused us to
incorrectly not send TLB invalidations to coprocessors.
Thanks to:
Frederic Barrat, Nicholas Piggin, Vaibhav Jain.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJbZDxOExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAm
7w//QcYYpBcBu9XK0E53XglSoC4iTzMrPnIIivPsSngLXWtEC/5315nbCOdhT4mh
CTYxmwCm0pZG4kxJntnT4MXwSLhM46eFs586uOS1BgEJw+m85m2tiEsrNQSogJ14
GzZiIk1UbzKUZjpSTIXigx0Um6x+q83kIBhd/ewrejB1wz6d5T7Wktjv2iWewGdf
kOJPXuvaWXeGmJWJ2Gek3T5j7PCII0humEQjsy40XVFR3NYC7jT1r4UONE3NeBTU
xavo/EcB3JHx2MPPgAfJT/KAiHLQZzuR382azf7W5sQJ7/Yy2N1jpskSxFoMEMtc
axYWniHfGwJsnwcO4HPisIA67tXR6EzugAyuta4IQufRVp1s+j2WIzyqjl4uOB7Q
v5ZXBH/Y4wShqVHhUzzQasiJgGYZoUoeOvJIoFNEo5f6UFuT/l9mkuM4vPtNKi6i
NpT2Hj/hTATNRGyrBnvjcktS8uHbVxSHE6Cm4ZHSfQ8POMulUKvhHTZ7gYsxpxUE
ktKCNgNL57QHRQojY0AftpEMLrDYiJEE9+6OLjEk3Mf7ayiZAAjIpy7LRD9+p8Q8
eomEXmwQr1YOXGgvS0mFFPd+P2PfJoQztd1fWZtZIY1xksdwGep65e24hBOEjCAV
7vVx+3pm+dekEFVf976ZYgACHCjOn8Efd5t+AK7b8S0u0uw=
=Z81h
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.18-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One fix for a regression in a recent TLB flush optimisation, which
caused us to incorrectly not send TLB invalidations to coprocessors.
Thanks to Frederic Barrat, Nicholas Piggin, Vaibhav Jain"
* tag 'powerpc-4.18-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/radix: Fix missing global invalidations when removing copro
Peter is objecting to the direct PMU access in RDT. Right now the PMU usage
is broken anyway as it is not coordinated with perf.
Until this discussion settled, disable the PMU mechanics by simply
rejecting the type '2' measurement in the resctrl file.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
CC: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: hpa@zytor.com
Future Intel processors will support "Enhanced IBRS" which is an "always
on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
disabled.
From the specification [1]:
"With enhanced IBRS, the predicted targets of indirect branches
executed cannot be controlled by software that was executed in a less
privileged predictor mode or on another logical processor. As a
result, software operating on a processor with enhanced IBRS need not
use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
privileged predictor mode. Software can isolate predictor modes
effectively simply by setting the bit once. Software need not disable
enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."
If Enhanced IBRS is supported by the processor then use it as the
preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
Retpoline white paper [2] states:
"Retpoline is known to be an effective branch target injection (Spectre
variant 2) mitigation on Intel processors belonging to family 6
(enumerated by the CPUID instruction) that do not have support for
enhanced IBRS. On processors that support enhanced IBRS, it should be
used for mitigation instead of retpoline."
The reason why Enhanced IBRS is the recommended mitigation on processors
which support it is that these processors also support CET which
provides a defense against ROP attacks. Retpoline is very similar to ROP
techniques and might trigger false positives in the CET defense.
If Enhanced IBRS is selected as the mitigation technique for spectre v2,
the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
cleared. Kernel also has to make sure that IBRS bit remains set after
VMEXIT because the guest might have cleared the bit. This is already
covered by the existing x86_spec_ctrl_set_guest() and
x86_spec_ctrl_restore_host() speculation control functions.
Enhanced IBRS still requires IBPB for full mitigation.
[1] Speculative-Execution-Side-Channel-Mitigations.pdf
[2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
Both documents are available at:
https://bugzilla.kernel.org/show_bug.cgi?id=199511
Originally-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim C Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com
Some Intel processors have an EPT feature whereby the accessed & dirty bits
in EPT entries can be updated by HW. MSR IA32_VMX_EPT_VPID_CAP exposes the
presence of this capability.
There is no point in trying to use that new feature bit in the VMX code as
VMX needs to read the MSR anyway to access other bits, but having the
feature bit for EPT_AD in place helps virtualization management as it
exposes "ept_ad" in /proc/cpuinfo/$proc/flags if the feature is present.
[ tglx: Amended changelog ]
Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Peter Shier <pshier@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180801180657.138051-1-pshier@google.com
We've encountered a performance issue when multiple processors stress
{get,put}_mmio_atsd_reg(). These functions contend for
mmio_atsd_usage, an unsigned long used as a bitmask.
The accesses to mmio_atsd_usage are done using test_and_set_bit_lock()
and clear_bit_unlock(). As implemented, both of these will require
a (successful) stwcx to that same cache line.
What we end up with is thread A, attempting to unlock, being slowed by
other threads repeatedly attempting to lock. A's stwcx instructions
fail and retry because the memory reservation is lost every time a
different thread beats it to the punch.
There may be a long-term way to fix this at a larger scale, but for
now resolve the immediate problem by gating our call to
test_and_set_bit_lock() with one to test_bit(), which is obviously
implemented without using a store.
Fixes: 1ab66d1fba ("powerpc/powernv: Introduce address translation services for Nvlink2")
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
kernel/dma/Kconfig already defines NEED_DMA_MAP_STATE, just select it
from CONFIG_PPC using the same condition as an if guard.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[mpe: Move it under PPC]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16,
but the one-way code (used on remainder blocks) implements it with
vshl + vsri, which is slower. Switch the one-way code to vrev32.16 too.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Enable all 4 SEC units available on d05 boards.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 9c7b185ab2 ("powernv/cpuidle: Parse dt idle properties into
global structure") parses dt idle states into structs, but never marks
them valid. This results in all idle states being lost.
Fixes: 9c7b185ab2 ("powernv/cpuidle: Parse dt idle properties into global structure")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Modify the base address of the mdio mux driver to point to the
start of the mdio mux block's register address space.
Signed-off-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SDM845 has two tsens blocks, one with 13 sensors and the other with 8
sensors. It uses version 2 of the TSENS IP, so use the fallback property to
allow more common code.
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
We also split up the regmap address space into two, for the TM and SROT
registers. This was required to deal with different address offsets for the
TM and SROT registers across different SoC families.
8996 has two TSENS IP blocks, initialise the second one too.
Since tsens-common.c/init_common() currently only registers one address
space, the order is important (TM before SROT). This is OK since the code
doesn't really use the SROT functionality yet.
Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Tested-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Andy Gross <andy.gross@linaro.org>
The BTF conflicts were simple overlapping changes.
The virtio_net conflict was an overlap of a fix of statistics counter,
happening alongisde a move over to a bonafide statistics structure
rather than counting value on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix potential clobbering of user vector register state by AES ghash code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJbYtldAAoJELescNyEwWM0aisH/ReBqfJb1kTr5ybUriSTsuTl
Vjn8O17aH+177P3xlY47+cYWsvewJxiy7MldlLEiER0tsyjYvqTDuzy7+ffdJJ8R
rs0MGhYh9WpD3OVjng7TmwXD71xefk/gLq4MPqmgn9e/DoAt4HO/7jC4c+mSwxRZ
gHsAH5g6WUclRvU1zXaT8QKdZnmwudPFvy5O+bQYQrIJr++zBiyJ47qu1+TjJQuC
kVmM7XV0c0L1fK9z7A18PcW+tMlIu15ITzJwEercXen/7XypdDOufgc4Y9odHCkC
5ZWnV5wZtaJIuo/JWWPnoheWfTqIVF7ggXwsaXmZ0hKfQkBvbJ8fxhogCIWgt40=
=rpVW
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 regression fix from Will Deacon:
"Ard found a nasty arm64 regression in 4.18 where the AES ghash/gcm
code doesn't notify the kernel about its use of the vector registers,
therefore potentially corrupting live user state.
The fix is straightforward and Herbert agreed for it to go via arm64"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair
generic_defconfig explicitly disables CONFIG_INPUT_MOUSEDEV,
CONFIG_INPUT_KEYBOARD & CONFIG_INPUT_MOUSE which results in warnings
when merging board config fragments if any of them require these
options. This is the case for the ranchu board, which means we've had
the following warning when configuring for generic platform targets
since commit f2d0b0d5c1 ("MIPS: ranchu: Add Ranchu as a new
generic-based board"):
$ make ARCH=mips 32r2el_defconfig
Using ./arch/mips/configs/generic_defconfig as base
Merging arch/mips/configs/generic/32r2.config
Merging arch/mips/configs/generic/el.config
Merging ./arch/mips/configs/generic/board-sead-3.config
Merging ./arch/mips/configs/generic/board-ranchu.config
Value of CONFIG_INPUT_KEYBOARD is redefined by fragment ./arch/mips/configs/generic/board-ranchu.config:
Previous value: # CONFIG_INPUT_KEYBOARD is not set
New value: CONFIG_INPUT_KEYBOARD=y
Merging ./arch/mips/configs/generic/board-ni169445.config
Merging ./arch/mips/configs/generic/board-boston.config
Merging ./arch/mips/configs/generic/board-ocelot.config
Merging ./arch/mips/configs/generic/board-xilfpga.config
scripts/kconfig/conf --olddefconfig Kconfig
#
# configuration written to .config
#
Resolve this by removing mention of the CONFIG_INPUT_* Kconfig symbols
from generic_defconfig, allowing them to take their default values &
allowing board config fragments to enable them without warnings.
This may be problematic if CONFIG_ARCH_MIGHT_HAVE_PC_SERIO is ever
enabled for CONFIG_MIPS_GENERIC=y configurations, but for now that isn't
the case so we can worry about that if & when it happens.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20109/
Cc: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Spectre variant 1 attacks are about this sequence of pseudo-code:
index = load(user-manipulated pointer);
access(base + index * stride);
In order for the cache side-channel to work, the access() must me made
to memory which userspace can detect whether cache lines have been
loaded. On 32-bit ARM, this must be either user accessible memory, or
a kernel mapping of that same user accessible memory.
The problem occurs when the load() speculatively loads privileged data,
and the subsequent access() is made to user accessible memory.
Any load() which makes use of a user-maniplated pointer is a potential
problem if the data it has loaded is used in a subsequent access. This
also applies for the access() if the data loaded by that access is used
by a subsequent access.
Harden the get_user() accessors against Spectre attacks by forcing out
of bounds addresses to a NULL pointer. This prevents get_user() being
used as the load() step above. As a side effect, put_user() will also
be affected even though it isn't implicated.
Also harden copy_from_user() by redoing the bounds check within the
arm_copy_from_user() code, and NULLing the pointer if out of bounds.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Fixing __get_user() for spectre variant 1 is not sane: we would have to
add address space bounds checking in order to validate that the location
should be accessed, and then zero the address if found to be invalid.
Since __get_user() is supposed to avoid the bounds check, and this is
exactly what get_user() does, there's no point having two different
implementations that are doing the same thing. So, when the Spectre
workarounds are required, make __get_user() an alias of get_user().
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Borrow the x86 implementation of __inttype() to use in get_user() to
select an integer type suitable to temporarily hold the result value.
This is necessary to avoid propagating the volatile nature of the
result argument, which can cause the following warning:
lib/iov_iter.c:413:5: warning: optimization may eliminate reads and/or writes to register variables [-Wvolatile-register-var]
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
__get_user_error() is used as a fast accessor to make copying structure
members as efficient as possible. However, with software PAN and the
recent Spectre variant 1, the efficiency is reduced as these are no
longer fast accessors.
In the case of software PAN, it has to switch the domain register around
each access, and with Spectre variant 1, it would have to repeat the
access_ok() check for each access.
Rather than using __get_user_error() to copy each semops element member,
copy each semops element in full using __copy_from_user().
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>