feat: Persistent Active Boot Log Collection into a Persistent Directory #3381

pavansokkenagaraj · 2025-04-30T13:50:09Z

Is your feature request related to a problem? Please describe.

Yes — when an upgrade fails or a node reboots unexpectedly (e.g., power cycle during upgrade), the system may fall back to booting the passive image due to upgrade_failure flags or incomplete boot assessments.
Currently, there’s no persistent logging of the active boot process, making postmortem diagnostics difficult, especially in dark-site environments with no console or GRUB access.
This results in time-consuming manual debugging, as we cannot easily determine the cause of fallback or incomplete boot.

Describe the solution you'd like

Write active boot logs into a persistent directory

Describe alternatives you've considered

Additional context

This enhancement would support better RCA and faster triage for upgrade failures or boot rollbacks in real-world field deployments, particularly on edge systems.

Slack thread for context:
SpectroCloud Slack – discussion on boot log persistence

Itxaka · 2025-04-30T14:28:05Z

note that usually this happens automatically as long as active is able to switch root into the final system, as at that point /var/log/ is mounted into persistent.

Here is an example of a machine that I set up a couple of days ago and just booted again, all logs are preserved:

But there is a point between kernel -> initramfs -> immucore -> switch root in which those logs are volatile, as there is no persistence anywhere. The initramfs journal logs into memory until the switch root, I believe, at which point journald flushes the in-memory logs to disk. There is not much that we can do at that point to store logs.

As that failure (again, the first time we have seen this in more than 2 years with Kairos) could have happened at any point before the logs are stored to disk, there is not much that we can do.

One could build a custom initramfs that forwards logs in real time to a remote syslog, but that would require knowing the remote in advance in order to build for it, which I think is a very specific feature that does not fall under Kairos' broader usage.

Also note that even if a Kairos consumer were to add that, if the network or journald is not up before the crash occurs, you would hit the same issue, as no logs would be forwarded.

If you want to support boot rollback or identify the failure, it's very simple: if after an upgrade the system is booting on passive (you can stat /run/cos/passive_mode), then there has been a problem booting in active. Or you can check /proc/cmdline for the upgrade_failure key. A simple reboot might fix it, as it did in the only case reported, and the machine can easily be set to boot on active via an ssh command and then rebooted.
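The two checks described above could be scripted roughly as follows. This is only a sketch, not something Kairos ships: the function name and the returned dictionary keys are hypothetical, and it assumes the `upgrade_failure` key may appear on the kernel command line either bare or as `upgrade_failure=<value>`.

```python
import os


def detect_active_boot_failure(passive_marker="/run/cos/passive_mode",
                               cmdline_path="/proc/cmdline"):
    """Report the two fallback indicators mentioned in this thread.

    The function name and return shape are hypothetical; only the two
    paths and the upgrade_failure key come from the discussion.
    """
    # Equivalent of `stat /run/cos/passive_mode`: the file exists when
    # the node booted from the passive image.
    booted_passive = os.path.exists(passive_marker)

    try:
        with open(cmdline_path, encoding="utf-8") as f:
            cmdline = f.read()
    except OSError:
        # An unreadable cmdline is treated as "key not present".
        cmdline = ""

    # Assumption: accept both a bare key and key=value.
    upgrade_failure = any(
        token.split("=", 1)[0] == "upgrade_failure"
        for token in cmdline.split()
    )

    return {"booted_passive": booted_passive,
            "upgrade_failure": upgrade_failure}


if __name__ == "__main__":
    print(detect_active_boot_failure())
```

Run on the node after an upgrade (for example over ssh), either indicator being true suggests that active failed to boot. It does not say why, since the early boot logs are still volatile.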

Itxaka · 2025-04-30T14:40:06Z

It would be possible to forward all logs via journald itself to other places, but note that this is nothing that we can do by default: it either requires a known URL or, when forwarding to console/kmsg, is not recommended because it's very slow.

Users can add to cmdline, if the active keeps rebooting/crashing, any of

systemd.journald.forward_to_syslog
systemd.journald.forward_to_kmsg
systemd.journald.forward_to_console
systemd.journald.forward_to_wall

to enable those and see more info, or they can configure things like https://www.freedesktop.org/software/systemd/man/latest/systemd-journal-upload.service.html to trigger as a dependency of the rescue/reboot, so that the log messages are uploaded before a crash reboot is triggered. But again, all of these things have their own caveats and are best left to the user.
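As an illustration of the cmdline options listed above, here is a small sketch that appends the chosen forwarding parameters to an existing kernel command line string. The helper name is hypothetical, the example command line is made up, and the `=1` value assumes these parameters accept a boolean, as the journald documentation describes. How the resulting string is actually applied to the bootloader entry is left out.

```python
JOURNALD_FORWARD_TARGETS = ("syslog", "kmsg", "console", "wall")


def add_journald_forwarding(cmdline, targets=("console",)):
    """Return cmdline with systemd.journald.forward_to_<target>=1 appended.

    Hypothetical helper for illustration; it only builds the string and
    never touches the bootloader configuration.
    """
    tokens = cmdline.split()
    # Keys already on the command line, so nothing is added twice.
    existing = {token.split("=", 1)[0] for token in tokens}

    for target in targets:
        if target not in JOURNALD_FORWARD_TARGETS:
            raise ValueError(f"unknown forwarding target: {target}")
        key = f"systemd.journald.forward_to_{target}"
        if key not in existing:
            tokens.append(f"{key}=1")
            existing.add(key)

    return " ".join(tokens)


if __name__ == "__main__":
    # Made-up base command line, for demonstration only.
    print(add_journald_forwarding("console=tty1 ro", ["console", "kmsg"]))
```

Forwarding to the console or kmsg slows boot considerably, as noted above, so this is better suited to a one-off debugging boot than to a permanent setting.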

pavansokkenagaraj added the enhancement (New feature or request) and triage (Add this label to issues that should be triaged and prioritized in the next planning call) labels on Apr 30, 2025

github-project-automation bot added this to 🧙Issue tracking board Apr 30, 2025
