adopt: add tag to install the static GRUB config from tree #945

Open: wants to merge 3 commits into base: main

Member

@HuijingHei HuijingHei commented Jun 5, 2025

Add bootupctl adopt-and-update --with-static-config to migrate
RHCOS systems to a static GRUB config, for example RHCOS nodes
born on 4.1/4.2.

Fixes: https://issues.redhat.com/browse/OCPBUGS-52485

openshift-ci bot commented Jun 5, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@HuijingHei HuijingHei force-pushed the coreos-static-grub-migration branch 17 times, most recently from aac4857 to ad92037 Compare June 9, 2025 06:52
@HuijingHei HuijingHei changed the title from "migrate: add tag to install the canonical static GRUB config" to "migrate: add tag to install the static GRUB config from tree" Jun 9, 2025
@HuijingHei HuijingHei force-pushed the coreos-static-grub-migration branch 2 times, most recently from 555700c to a5a7636 Compare June 9, 2025 07:10
@HuijingHei HuijingHei marked this pull request as ready for review June 9, 2025 07:41
Member

@cgwalters cgwalters left a comment

Thanks for working on this! An initial review pass here

src/bootupd.rs Outdated
const BACKUP: &str = "grub.cfg.backup";
let grubconfig = grub_config_dir.join(GRUBCONFIG);

#[cfg(any(target_arch = "x86_64", target_arch = "powerpc64"))]
Member

Nonblocking and not a new issue but we should probably centralize this into a #[cfg(grub_arch)] or so (I think we'd need to do that in a build.rs).

Member Author

Like #[cfg(grub_arch="bios")] to match #[cfg(any(target_arch = "x86_64", target_arch = "powerpc64"))],
and #[cfg(grub_arch="efi")] to match #[cfg(any( target_arch = "x86_64", target_arch = "aarch64", target_arch = "riscv64" ))] ?

Member

Yeah something like that, anyways it's fine as is!
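
A minimal sketch of how the centralization suggested in this thread could look in a build.rs. The `grub_arch` cfg name and its "bios"/"efi" values are hypothetical (they come from the proposal above, not from bootupd itself), and the architecture lists simply mirror the two `#[cfg(any(...))]` expressions quoted in the thread.

```rust
// build.rs (sketch): derive hypothetical `grub_arch` cfg values from the target arch.
use std::env;

/// Map a target architecture to the GRUB flavors it supports.
/// The lists mirror the two `#[cfg(any(...))]` expressions quoted in the review thread.
fn grub_arch_values(target_arch: &str) -> Vec<&'static str> {
    let mut values = Vec::new();
    if matches!(target_arch, "x86_64" | "powerpc64") {
        values.push("bios");
    }
    if matches!(target_arch, "x86_64" | "aarch64" | "riscv64") {
        values.push("efi");
    }
    values
}

fn main() {
    // Cargo sets CARGO_CFG_TARGET_ARCH for build scripts; fall back to the host
    // architecture so the sketch also runs outside of Cargo.
    let arch = env::var("CARGO_CFG_TARGET_ARCH")
        .unwrap_or_else(|_| env::consts::ARCH.to_string());
    for value in grub_arch_values(&arch) {
        // Lets the crate write `#[cfg(grub_arch = "bios")]` or `#[cfg(grub_arch = "efi")]`.
        println!("cargo:rustc-cfg=grub_arch=\"{value}\"");
    }
}
```

With something like this in place, the per-architecture lists would live in one function instead of being repeated at each call site. Recent toolchains may additionally want a matching `rustc-check-cfg` line to avoid warnings about an unexpected cfg name.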

src/bootupd.rs Outdated
if !grubconfig.exists() {
// Not existing, maybe UEFI
println!("Not found '{}'", grubconfig.display());
if cfg!(target_arch = "powerpc64") {
Member

That said is powerpc64 actually relevant here? It looks like https://mirror.openshift.com/pub/openshift-v4/ppc64le/dependencies/rhcos/ only has a version starting from 4.3...was that not using static configs? (Ugh it's painful to inspect from the outside because that's using the old LUKS setup)

Member Author

I checked 4.3 on ppc and it is already using bootloader=none, so maybe it's better to remove the condition?

Makefile Outdated
@@ -37,6 +37,7 @@ install-grub-static:
.PHONY: install-systemd-unit
install-systemd-unit:
install -m 644 -D -t "${DESTDIR}$(PREFIX)/lib/systemd/system/" systemd/bootloader-update.service
install -m 644 -D -t "${DESTDIR}$(PREFIX)/lib/systemd/system/" systemd/bootloader-migrate.service
Member

It'd be nice to only enable this service for cases that need it which is basically RHCOS right?

I think a lot of cases now are using install-systemd-unit so this will have a large blast radius as is.

Now in theory of course the
ExecCondition=/bin/sh -c '[ "$(ostree config --repo=/sysroot/ostree/repo get sysroot.bootloader 2>/dev/null)" != "none" ]' will mean this new service just starts and then does nothing on each boot for those systems, but...it's probably worth turning this around and only actually installing the service there inside openshift/os or so?

Member

And didn't we end up writing something similar to this for the atomic desktops? cc @travier

Member

We added a drop-in for the existing unit for the Atomic Desktops: https://pagure.io/workstation-ostree-config/blob/main/f/bootupd.yaml#_17

Member

This is very much a "per-variant" thing here as to how to do this, so we should probably not install this by default.

Member

(I am jetlagged right now and may be missing something obvious but) how is that different from what's being done here? Is it about fetching the static config from the new version in the target root?

Member

Hmm, unless I'm missing something, the line we're commenting on here is just installing the unit, not enabling it.

Member Author

@HuijingHei HuijingHei Jun 11, 2025

Yes, the unit is disabled by default. We should enable it on RHCOS so that it runs on each boot. I tested it by starting the service manually after boot.

Member

Hmm. This unit is not just enabling static configs, it's also doing the adopt-and-update flow i.e. updating the bootloader as well (which makes sense!).

Wouldn't it be simpler to structure this then as a drop-in for the bootloader-update.service unit? I think we'd trigger it on the firstboot path in OCP or so?

Member Author

@HuijingHei HuijingHei Jun 11, 2025

> Wouldn't it be simpler to structure this then as a drop-in for the bootloader-update.service unit?

In that case, bootloader-update.service would run update, and we would only need to enable the static config for installed components, right? That is a little different from adopt-and-update --with-static-config, which only enables the static config for adoptable components. It would mean making with-static-config work independently of update or adopt, like the current migrate. Or we could also make update --with-static-config work, which would enable the static config for installed components. WDYT?

Member

Yeah, I'm not sure about structuring this as a drop-in for bootloader-update.service. But I think it's fine to carry the unit definition in OCP if for whatever reason we don't want it to live in bootupd. That probably makes sense anyway since the conditions under which you want to do this migration can be quite specific (and I guess is why e.g. no systemd unit was added as part of the Atomic Desktop migration work either).

@HuijingHei HuijingHei force-pushed the coreos-static-grub-migration branch 2 times, most recently from da7bbfe to 5f29489 Compare June 10, 2025 03:00
@HuijingHei HuijingHei force-pushed the coreos-static-grub-migration branch from 5f29489 to c347741 Compare June 10, 2025 15:25
@HuijingHei HuijingHei force-pushed the coreos-static-grub-migration branch 3 times, most recently from 96242b0 to b1063ac Compare June 11, 2025 06:01
@HuijingHei HuijingHei changed the title from "migrate: add tag to install the static GRUB config from tree" to "adopt: add tag to install the static GRUB config from tree" Jun 11, 2025
@HuijingHei
Member Author

Updated to bootupctl adopt-and-update --with-static-config. Reviews welcome, thank you all!

@jlebon
Member

jlebon commented Jun 11, 2025

So, combining the two threads here. How about:

  • We land bootupctl adopt-and-update --with-static-config here without having a unit.
  • In OCP 4.19, we add a generator that outputs the migration unit only if sysroot.bootloader is not none. That's nicer because (1) it minimizes as much as possible what runs on unaffected nodes; it won't even show up in the transaction, and (2) it gets rid of the gnarly ExecCondition. Accounting for EUS-to-EUS upgrades, we'd need to keep it for 4.20 as well but then could drop it in 4.21.
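
The generator idea above can be sketched roughly as follows. This is only an illustration of the systemd generator mechanics, not the script that was actually written: the unit name, the unit body, the `/usr/bin/bootupctl` path, and hooking into `multi-user.target` are all assumptions, and the `sysroot.bootloader` check is left as a placeholder because its exact condition is refined later in this thread. Only the `adopt-and-update --with-static-config` invocation comes from this PR.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

// Hypothetical unit name, not taken from the PR.
const UNIT_NAME: &str = "bootloader-static-migrate.service";

// Assumed unit body; only the bootupctl subcommand and flag come from this PR.
const UNIT_BODY: &str = "[Unit]\n\
Description=Migrate to static GRUB config (sketch)\n\
\n\
[Service]\n\
Type=oneshot\n\
ExecStart=/usr/bin/bootupctl adopt-and-update --with-static-config\n\
RemainAfterExit=yes\n";

/// Write the unit into `generator_dir` and pull it in via multi-user.target.
fn emit_unit(generator_dir: &Path) -> io::Result<()> {
    fs::write(generator_dir.join(UNIT_NAME), UNIT_BODY)?;
    let wants = generator_dir.join("multi-user.target.wants");
    fs::create_dir_all(&wants)?;
    let link = wants.join(UNIT_NAME);
    // Drop a leftover link from a previous run, then create a relative symlink
    // pointing back at the generated unit.
    let _ = fs::remove_file(&link);
    symlink(Path::new("..").join(UNIT_NAME), &link)
}

fn main() -> io::Result<()> {
    // systemd passes three output directories; the first is the normal-priority one.
    let Some(dir) = std::env::args().nth(1) else {
        eprintln!("usage: generator <normal-dir> [<early-dir> <late-dir>]");
        return Ok(());
    };
    // Placeholder for the sysroot.bootloader check discussed in this thread.
    let needs_migration = true;
    if needs_migration {
        emit_unit(Path::new(&dir))?;
    }
    Ok(())
}
```

Because the unit is only written when the check passes, unaffected nodes never see it in the boot transaction, which is the first benefit listed above.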

@HuijingHei HuijingHei force-pushed the coreos-static-grub-migration branch from b1063ac to 1c8fde0 Compare June 11, 2025 14:46
@HuijingHei
Member Author

>   • In OCP 4.19, we add a generator that outputs the migration unit only if sysroot.bootloader is not none

SGTM. Does this mean we will add the script to the openshift/os 4.19 branch?

@jlebon
Member

jlebon commented Jun 11, 2025

> >   • In OCP 4.19, we add a generator that outputs the migration unit only if sysroot.bootloader is not none
>
> SGTM. Does this mean we will add the script to the openshift/os 4.19 branch?

I guess so, but meh... I'm also fine to just keep it in rhel-coreos-config. That would mean we wouldn't be able to remove it until 4.22 instead of 4.21, but that's fine. It probably makes testing the final fix easier.

@jlebon
Member

jlebon commented Jun 11, 2025

> In OCP 4.19, we add a generator that outputs the migration unit only if sysroot.bootloader is not none.

That should be I think "only if sysroot.bootloader is unset". s390x sets it to zipl. And if somehow it's set to some value other than none or zipl, we should probably just leave the system alone.
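
The decision described here can be written down as a small pure function. This is a sketch under assumptions: the `Action` enum and `decide` function are hypothetical names, and the input is taken to be the value of `sysroot.bootloader` as read from the ostree repo config, with `None` meaning the key is unset.

```rust
/// What a migration generator should do for a given `sysroot.bootloader` value.
/// Hypothetical type, used only for illustration.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// Key is unset: the node still uses the old dynamic GRUB config, so migrate.
    Migrate,
    /// Key is "none": the node already uses static configs.
    AlreadyStatic,
    /// Key is "zipl": s390x, where GRUB is not involved.
    Zipl,
    /// Any other value: unexpected, leave the system alone.
    LeaveAlone,
}

fn decide(bootloader: Option<&str>) -> Action {
    match bootloader {
        None => Action::Migrate,
        Some("none") => Action::AlreadyStatic,
        Some("zipl") => Action::Zipl,
        Some(_) => Action::LeaveAlone,
    }
}

fn main() {
    for value in [None, Some("none"), Some("zipl"), Some("grub2")] {
        println!("{value:?} -> {:?}", decide(value));
    }
}
```

Only the unset case leads to a migration; every explicitly set value, including unexpected ones, results in no change to the node.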

@HuijingHei
Member Author

> I guess so, but meh... I'm also fine to just keep it in rhel-coreos-config. That would mean we wouldn't be able to remove it until 4.22 instead of 4.21, but that's fine. It probably makes testing the final fix easier.

OK with me. Do you mean adding the generator to rhel-coreos-config on the main branch (that is, 4.20) and then backporting it to 4.19?

> That should be I think "only if sysroot.bootloader is unset". s390x sets it to zipl. And if somehow it's set to some value other than none or zipl, we should probably just leave the system alone.

Maybe add a check in the generator to skip s390x.

@HuijingHei HuijingHei force-pushed the coreos-static-grub-migration branch 2 times, most recently from 03e32a4 to 6b37831 Compare June 12, 2025 02:32
@HuijingHei
Member Author

HuijingHei commented Jun 12, 2025

> I guess so, but meh... I'm also fine to just keep it in rhel-coreos-config. That would mean we wouldn't be able to remove it until 4.22 instead of 4.21, but that's fine. It probably makes testing the final fix easier.

Could you help review coreos/rhel-coreos-config#23? Thank you!

@jlebon
Member

jlebon commented Jun 12, 2025

> > I guess so, but meh... I'm also fine to just keep it in rhel-coreos-config. That would mean we wouldn't be able to remove it until 4.22 instead of 4.21, but that's fine. It probably makes testing the final fix easier.
>
> OK with me. Do you mean adding the generator to rhel-coreos-config on the main branch (that is, 4.20) and then backporting it to 4.19?

The rhel-coreos-config repo does not branch per OCP release. main currently affects all OCP releases based on rhel-9.6.

HuijingHei added a commit to HuijingHei/rhel-coreos-config that referenced this pull request Jun 12, 2025
HuijingHei added a commit to HuijingHei/rhel-coreos-config that referenced this pull request Jun 12, 2025
HuijingHei added a commit to HuijingHei/rhel-coreos-config that referenced this pull request Jun 13, 2025
HuijingHei added a commit to HuijingHei/rhel-coreos-config that referenced this pull request Jun 13, 2025
@HuijingHei HuijingHei force-pushed the coreos-static-grub-migration branch from 6b37831 to c054196 Compare June 13, 2025 02:40
HuijingHei added a commit to HuijingHei/rhel-coreos-config that referenced this pull request Jun 13, 2025
@HuijingHei HuijingHei force-pushed the coreos-static-grub-migration branch 2 times, most recently from 7a91078 to a1297fa Compare June 13, 2025 06:58
@HuijingHei
Member Author

@jlebon I updated the logic so that it migrates to the static GRUB config if sysroot.bootloader is not set, instead of checking that it is not none. Let me know if that is suitable, thank you!

If the EFI System Partition (ESP) is unmounted, GRUB config
installation can fail silently or with little useful information.
This change adds better logging to help diagnose such failures by
clearly reporting when the ESP is not mounted.

Add `bootupctl adopt-and-update --with-static-config` to migrate
RHCOS system to use static GRUB config, for example 4.1/4.2 born
RHCOS nodes.

Fixes: https://issues.redhat.com/browse/OCPBUGS-52485
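
For the ESP logging commit above, one way to detect an unmounted ESP is to look the mount point up in /proc/self/mounts before installing the GRUB config. The sketch below is not bootupd's implementation: the `is_mount_point` helper and the `/boot/efi` path are assumptions for illustration.

```rust
use std::fs;

/// Return true if `target` appears as a mount point in /proc/self/mounts-style text.
/// Each line has the form: <device> <mount point> <fstype> <options> <dump> <pass>.
fn is_mount_point(mounts: &str, target: &str) -> bool {
    mounts
        .lines()
        .filter_map(|line| line.split_whitespace().nth(1))
        .any(|mount_point| mount_point == target)
}

fn main() {
    // Assumed ESP location; the real path depends on the system layout.
    let esp = "/boot/efi";
    match fs::read_to_string("/proc/self/mounts") {
        Ok(mounts) if is_mount_point(&mounts, esp) => println!("ESP is mounted at {esp}"),
        Ok(_) => eprintln!("ESP is not mounted at {esp}; skipping GRUB config installation"),
        Err(e) => eprintln!("cannot read /proc/self/mounts: {e}"),
    }
}
```

Reporting the unmounted case explicitly gives the operator a concrete cause instead of a silent or opaque failure.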
@HuijingHei HuijingHei force-pushed the coreos-static-grub-migration branch from a1297fa to 6818742 Compare June 13, 2025 09:05
4 participants