There are two issues. First, our default kdump configuration causes the
kdump service to use the host's default initrd as the crashkernel
environment. This can fail when the default initrd is larger than the
memory reserved for the crash kernel, which is set via the kernel
command line at boot time.
It is common to have the kdump service instead rebuild a minimal
host-specific initrd specifically for the crashkernel environment.
To change this behavior, comment out "force_no_rebuild 1" in kdump.conf.
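
As a rough illustration of that config change, the sketch below comments
out an active "force_no_rebuild 1" line in a kdump.conf-style file. The
helper name disable_force_no_rebuild is hypothetical and is not part of
kdumpctl, and the config file location is an assumption to be checked on
the target system.

```shell
# Hypothetical helper (not part of kdumpctl): comment out an active
# "force_no_rebuild 1" directive in the given kdump.conf-style file.
disable_force_no_rebuild() {
    conf="$1"
    tmp="$(mktemp)" || return 1
    # Only rewrite a line that is currently active; lines that are already
    # commented out, and all other directives, pass through unchanged.
    sed 's/^[[:space:]]*force_no_rebuild[[:space:]][[:space:]]*1[[:space:]]*$/#force_no_rebuild 1/' \
        "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back in place so the original file's ownership and mode are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Assuming the file lives at /etc/kdump.conf, it could be run as
"disable_force_no_rebuild /etc/kdump.conf" before restarting the kdump
service.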
After doing this, a second issue was discovered where the system would
enter the crashkernel environment when a kernel panic occurred, and
successfully collect the crash dump, but upon reboot back to the normal
host OS, the system would no longer boot.
This behavior was root-caused to an issue in our kdumpctl command where
TARGET_INITRD pointed to our default host initrd. When the kdump service
regenerated the minimal host-specific initrd, the new initrd overwrote
the host's normal initrd, causing the normal boot to fail.
Since Mariner 2.0 already has a precedent of using the host system's
initrd, we still need to preserve this behavior by default, but allow
users to opt in to the better option.
Therefore, this change introduces a new "mariner_2_initrd_use_suffix"
option. When set, the option appends a "kdump" suffix to the
TARGET_INITRD path, so kdump no longer targets the host system's initrd.
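
The path selection can be sketched roughly as follows. This is only an
illustration, not the actual kdumpctl code: the function name
target_initrd_path, its arguments, and the exact placement of the suffix
are assumptions.

```shell
# Hypothetical sketch (not the actual kdumpctl code): choose the initrd
# path that kdump should target.
#   $1: the host's default initrd path
#   $2: value of mariner_2_initrd_use_suffix ("1" when the option is set)
target_initrd_path() {
    default_initrd="$1"
    use_suffix="$2"
    if [ "$use_suffix" = "1" ]; then
        # Opt-in: append "kdump" so the host's own initrd is never the target.
        printf '%s\n' "${default_initrd}kdump"
    else
        # Default: keep targeting the host's initrd, as Mariner 2.0 does today.
        printf '%s\n' "$default_initrd"
    fi
}
```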
This change also adds a guard rail in kdumpctl to ensure that
"force_no_rebuild" and "mariner_2_initrd_use_suffix" are not both set;
otherwise kdump will fail to arm correctly, since it will not be able to
locate the host's initrd.
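
A minimal sketch of such a check is shown below. The function name
check_initrd_options and the error message are hypothetical, and the real
kdumpctl logic may differ.

```shell
# Hypothetical sketch of the guard rail: reject the combination of
# force_no_rebuild and mariner_2_initrd_use_suffix.
#   $1: value of force_no_rebuild ("1" when set)
#   $2: value of mariner_2_initrd_use_suffix ("1" when set)
check_initrd_options() {
    force_no_rebuild="$1"
    use_suffix="$2"
    if [ "$force_no_rebuild" = "1" ] && [ "$use_suffix" = "1" ]; then
        # Report the conflict on stderr and signal failure to the caller.
        echo "force_no_rebuild and mariner_2_initrd_use_suffix cannot both be set" >&2
        return 1
    fi
    return 0
}
```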
When compressed kdump collection is enabled, vmcore data is now stored
in the /var/crash/-- directory by the kdump-lib-initramfs.sh script.
Specifically:
/var/crash/-<date +%Y-%m-%d-%T>
Previously, the vmcore data was saved from kdumpctl, which stored
vmcores in /var/crash/-. Specifically:
/var/crash/<date +%Y-%m-%d-%H:%M>
Since there could be automation already in place that expects the older
format, adjust the newer compressed kdump version to align with the older
/var/crash/- directory naming.
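
For reference, the older directory name can be produced as in the sketch
below. The function name vmcore_dir is hypothetical; only the base
directory and the date format string are taken from the description
above.

```shell
# Hypothetical sketch: build a vmcore directory name in the older format,
# /var/crash/<date +%Y-%m-%d-%H:%M>.
#   $1: optional base directory (defaults to /var/crash)
vmcore_dir() {
    base="${1:-/var/crash}"
    # Minute resolution, with a colon between hours and minutes.
    printf '%s/%s\n' "$base" "$(date +%Y-%m-%d-%H:%M)"
}
```
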
Signed-off-by: Chris Co <chrco@microsoft.com>