Skip to content

Boot issue with CONFIG_ZERO_CALL_USED_REGS=y and _paravirt_ident_64() #1585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nathanchance opened this issue Feb 9, 2022 · 5 comments
Closed
Labels
[BUG] linux A bug that should be fixed in the mainline kernel. [FIXED][LINUX] 6.1 This bug was fixed in Linux 6.1

Comments

@nathanchance
Copy link
Member

As I originally reported on https://reviews.llvm.org/D110869, Arch Linux's configuration + CONFIG_ZERO_CALL_USED_REGS=y fails to boot for me on two bare metal machines (AMD Ryzen 3 4300G and Intel Core i5 4210U). There is no output on either machine, so something is going wrong before the EFI framebuffer driver is initialized; I've tried exploring earlyprintk options and I have had no success. The exact same configuration works fine with GCC 11.2.1.

Bisecting the translation units and functions reveals that _paravirt_ident_64() is the problematic function, as the following diff allows both machines to boot:

diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 4420499f7bb4..50b71f4fe456 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -96,7 +96,7 @@ static unsigned paravirt_patch_call(void *insn_buff, const void *target,

 #ifdef CONFIG_PARAVIRT_XXL
 /* identity function, which can be inlined */
-u64 notrace _paravirt_ident_64(u64 x)
+u64 notrace __attribute__((zero_call_used_regs("skip"))) _paravirt_ident_64(u64 x)
 {
        return x;
 }

GCC 11.2.1 disassembly of this function:

$ llvm-readelf -p .comment arch/x86/kernel/paravirt.o
String dump of section '.comment':
[     1] GCC: (GNU) 11.2.1 20220205

$ llvm-objdump --disassemble-symbols=_paravirt_ident_64 arch/x86/kernel/paravirt.o

arch/x86/kernel/paravirt.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000180 <_paravirt_ident_64>:
     180: 48 89 f8                      movq    %rdi, %rax
     183: 31 ff                         xorl    %edi, %edi
     185: c3                            retq
     186: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:(%rax,%rax)

Clang 15.0.0 disassembly of this function:

$ llvm-readelf -p .comment arch/x86/kernel/paravirt.o
String dump of section '.comment':
[     1] ClangBuiltLinux clang version 15.0.0 (https://github.com/llvm/llvm-project 009791e0dbc608aafe0ae4d15fcc9fafb7e4a135)

$ llvm-objdump --disassemble-symbols=_paravirt_ident_64 arch/x86/kernel/paravirt.o

arch/x86/kernel/paravirt.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <_paravirt_ident_64>:
       0: 48 89 f8                      movq    %rdi, %rax
       3: 31 ff                         xorl    %edi, %edi
       5: c3                            retq
       6: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:(%rax,%rax)

Unless my eyes are failing me, those functions are literally identical, so there is something else going on here. James Knight had a comment on Phabricator that this function is being used in a fragile way and we could potential use clang's preserve_most attribute to make the calling convention clear to the compiler.

cc @gwelymernans

@nathanchance nathanchance added the [BUG] Untriaged Something isn't working label Feb 9, 2022
@nathanchance nathanchance changed the title Issue with CONFIG_ZERO_CALL_USED_REGS=y and _paravirt_ident_64() Boot issue with CONFIG_ZERO_CALL_USED_REGS=y and _paravirt_ident_64() Feb 9, 2022
@bwendling
Copy link

I won't claim to fully understand all of the reasons for this level of indirection (it appears to be an attempt to reduce register pressure by calling a trampoline function, which is in charge of generating the callee-saved register code, thus reducing the view of those registers from the original function; but that just appears to be insane, so I must have something wrong).

Is there a way for you to hook a GDB into the boot up process to see what's going on? When I look at similar things, I add something like this to extract_kernel and then I can step through it when it tries to boot up in QEMU:

{
    volatile int x = 0;
    while (x == 0)
        continue;
}

In GDB, you can do set <address of x> = 1 to break out of the loop.

@nathanchance
Copy link
Member Author

Is there a way for you to hook a GDB into the boot up process to see what's going on?

I will see if I can get this to reproduce in QEMU, I have not been able to so far. It only reproduces on bare metal and I don't think any of the consumer hardware I have has any serial port for me to use. I wonder if this issue would reproduce on an x86_64 Chromebook? Doesn't ChromeOS have some nice boot debugging utilities?

@nathanchance
Copy link
Member Author

@nathanchance nathanchance added [BUG] linux A bug that should be fixed in the mainline kernel. [PATCH] Submitted A patch has been submitted for review and removed [BUG] Untriaged Something isn't working labels Sep 2, 2022
@dileks
Copy link
Collaborator

dileks commented Sep 26, 2022

Now, I switched to Linux v6.0-rc7 I wanted to test early_printk from peterz Git tree.

Might give you some debug output?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=debug/experimental

@nickdesaulniers
Copy link
Member

commit 8c86f29 ("x86/paravirt: add extra clobbers with ZERO_CALL_USED_REGS enabled")

@nickdesaulniers nickdesaulniers added [FIXED][LINUX] 6.1 This bug was fixed in Linux 6.1 and removed [PATCH] Submitted A patch has been submitted for review labels Oct 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[BUG] linux A bug that should be fixed in the mainline kernel. [FIXED][LINUX] 6.1 This bug was fixed in Linux 6.1
Projects
None yet
Development

No branches or pull requests

4 participants