x86_64 kernels fail to boot after D140931 #1800

Closed
ms178 opened this issue Feb 7, 2023 · 7 comments
Labels
duplicate This issue or pull request already exists

Comments

@ms178

ms178 commented Feb 7, 2023

I've been on the distribution-provided Clang-15.0.7 for a while and tried out LLVM/Clang-17 (7bff37783f72ed99e4cdd0fd7dad1bf1f119f793) today. My 6.1.10 kernel compiled and booted fine with Clang-15, but it now fails to boot when built with Clang-17 using the same flags and patches. It hangs right when GRUB tries to load the kernel image. CPU: Intel Haswell-EP, 64-bit.

More details on the patches used, as always, in my public repo.

config.txt

While skimming through other issues, I found #1774. The symptoms are the same, but I cannot tell whether the two are related: I compile my kernel with -O3, and that patch series addressed issues with -Os, so this looks like something different to me.

@nickdesaulniers
Member

Thanks for the report. Are you able to run a bisection against LLVM to find which commit may be the culprit?

@nickdesaulniers added the "more info needed" label (More information requested to issue author from project members.) on Feb 7, 2023
@ms178
Author

ms178 commented Feb 7, 2023

@nickdesaulniers I am afraid I cannot spare much time for that at the moment. If it is a more widespread issue, I am sure it will show up sooner or later on some bots. I'll revisit this with a future ToT LLVM-17 snapshot and let you know whether it has been fixed by then.

@nathanchance
Member

I'll see if I can reproduce this in QEMU at some point today.

@nathanchance
Member

I can reproduce this with just defconfig on mainline with current LLVM main; I will start bisecting once I find a good starting point. I ran into a separate QEMU main regression that kept me from using my full distribution QEMU wrapper to boot my own kernels, and I never thought to try boot-qemu.py first :)

@nathanchance
Member

My bisect landed on commit ee5585ed09af ("Recommit "Improve and enable folding of conditional branches with tail calls." (2nd Try)"). I will see if I can investigate more tomorrow, unless somebody else beats me to it.
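
For readers unfamiliar with bisection, the idea behind a run like this can be sketched as a binary search over an ordered commit range. This is only an illustration, not the tooling used in this thread: the function name `bisect_first_bad` and the `is_bad` callback are hypothetical, and in practice `git bisect` performs the search over LLVM history while the check is "build LLVM at this commit, build the kernel with it, and see whether it boots in QEMU".

```python
from typing import Callable, Sequence


def bisect_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an ordered range.

    Assumptions (hypothetical helper, not part of any real tool):
    - commits[0] is known good and commits[-1] is known bad;
    - once a commit is bad, every later commit is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]
```

With a range of N commits this needs roughly log2(N) build-and-boot cycles, which is why bisecting LLVM is feasible even when each step involves a full toolchain and kernel build.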

@nathanchance added the "[BUG] llvm (main)" label (A bug in an unreleased version of LLVM; this label is appropriate for regressions) and removed the "more info needed" label (More information requested to issue author from project members.) on Feb 8, 2023
@nathanchance changed the title from "Clang-17 built Kernel fails to boot" to "x86_64 kernels fail to boot after D140931" on Feb 8, 2023
@nathanchance added the "[ARCH] x86_64" label (This bug impacts ARCH=x86_64) on Feb 8, 2023
@nathanchance
Member

Okay, this ultimately appears to have the same root cause and fix as #1774. As far as I understand it, the LLVM patch causes the problematic instruction sequence behind #1774 to be emitted even at -O2.

In other words, if I build a kernel using a toolchain that contains llvm/llvm-project@ee5585e and I pull the current x86/alternatives branch from tip (currently at 923510c88d2b7) into mainline, everything boots up fine for me; without x86/alternatives, it is broken.

I will ping that thread to see if these patches can be merged into mainline ahead of 6.3 and inquire about a backport to 6.1, as the series appears to require some changes from the call depth tracking series that was merged in 6.2.

@nathanchance added the "[BUG] linux" (A bug that should be fixed in the mainline kernel.) and "[PATCH] Accepted" (A submitted patch has been accepted upstream) labels on Feb 8, 2023
chantra added a commit to chantra/libbpf-ci that referenced this issue Feb 8, 2023
This change gets us back onto llvm-16 by default, as it seems that
currently, qemu VMs are failing to boot when the kernel is built with
llvm 17. It is very likely related to
ClangBuiltLinux/linux#1800.

Because of the layout of LLVM's apt repo, unless the version is the
latest they distribute, the repository distro needs to use the version
number as a suffix.

Signed-off-by: Manu Bretelle <[email protected]>
danielocfb pushed a commit to libbpf/ci that referenced this issue Feb 8, 2023
This change gets us back onto llvm-16 by default, as it seems that
currently, qemu VMs are failing to boot when the kernel is built with
llvm 17. It is very likely related to
ClangBuiltLinux/linux#1800.

Because of the layout of LLVM's apt repo, unless the version is the
latest they distribute, the repository distro needs to use the version
number as a suffix.

Signed-off-by: Manu Bretelle <[email protected]>
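
The suffix rule described in the commit message above can be sketched as follows. This is a hedged illustration, not the code from that commit: the helper names `llvm_apt_suite` and `llvm_apt_line` are hypothetical, and the layout assumed here (an unsuffixed `llvm-toolchain-<codename>` suite for the newest version, and `llvm-toolchain-<codename>-<version>` for anything else) is the one documented on apt.llvm.org.

```python
from typing import Optional


def llvm_apt_suite(codename: str, version: Optional[int] = None) -> str:
    """Return the apt suite name for an LLVM toolchain (hypothetical helper).

    Assumption: the newest version is published without a suffix, while
    every other version carries its major number as a suffix.
    """
    base = f"llvm-toolchain-{codename}"
    return base if version is None else f"{base}-{version}"


def llvm_apt_line(codename: str, version: Optional[int] = None) -> str:
    """Return a sources.list entry for the given distro codename and version."""
    suite = llvm_apt_suite(codename, version)
    return f"deb http://apt.llvm.org/{codename}/ {suite} main"
```

Pinning to llvm-16 on a hypothetical `jammy` runner would then mean calling `llvm_apt_line("jammy", 16)`, whereas omitting the version selects whatever the repository currently treats as its latest.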
@nathanchance
Member

Closing this in favor of #1774, which I have renamed to more accurately describe the issue.

@nathanchance added the "duplicate" label (This issue or pull request already exists) and removed the "[BUG] linux" (A bug that should be fixed in the mainline kernel.), "[ARCH] x86_64" (This bug impacts ARCH=x86_64), "[PATCH] Accepted" (A submitted patch has been accepted upstream), and "[BUG] llvm (main)" (A bug in an unreleased version of LLVM; this label is appropriate for regressions) labels on Feb 8, 2023
nathanchance added a commit to nathanchance/continuous-integration2 that referenced this issue Feb 9, 2023
…atic calls on x86

A recent change in LLVM causes these to be generated even at -O2 so
apply Peter's series that fixes this in -tip.

I have requested that this series be applied to mainline so that it can
be backported more quickly; we will see if that request is honored.

Link: ClangBuiltLinux/linux#1774
Link: ClangBuiltLinux/linux#1800
Signed-off-by: Nathan Chancellor <[email protected]>

3 participants