pmac32_defconfig boot failure after LLVM commit 1bd0b82e508d049efdb07f4f8a342f35818df341 #1770
Comments
Thanks for the report. I'm able to repro. But we are not hitting a breakpoint set on |
working around this, I get the following stack trace out of GDB:
There are literally no symbols in that range in vmlinux. cc @mpe in case @mpe has any insights to 32b ppc early boot |
Those 0xfff addresses look like they're still in OpenBIOS, not the kernel. If you look at a good boot log you should be able to see that the kernel is loaded somewhere in memory, for me it's 0x1000000. The kernel initially runs at that address, before moving itself down to 0x0. You can tell gdb about that and get symbols to resolve at those addresses. The early boot is a bit hairy, it jumps to the kernel, then back into the bios and back to the kernel. You probably want to skip that, try setting a breakpoint at eg:
If you're not even getting that far you can set a breakpoint at 0x1000000 ( |
Haha, thanks @mpe! I figured you'd be the one to ask!
output from a known good build:
gdb from a known good build:
(gdb) add-symbol-file vmlinux -s .head.text 0x1000000 -s .text 0x1004000
add symbol table from file "vmlinux" at
.head.text_addr = 0x1000000
.text_addr = 0x1004000
(y or n) y
Reading symbols from vmlinux...
(gdb) b early_init
Breakpoint 1 at 0x1a09a04: early_init. (2 locations)
(gdb) info b
Num Type Disp Enb Address What
1 breakpoint keep y <MULTIPLE>
1.1 y 0x01a09a04 in early_init at arch/powerpc/kernel/early_32.c:21
1.2 y 0xc0a09a04 in early_init at arch/powerpc/kernel/early_32.c:21
(gdb) c
Continuing.
Breakpoint 1, early_init (dt_ptr=35147776) at arch/powerpc/kernel/early_32.c:21
21 unsigned long kva, offset = reloc_offset();
(gdb) bt
#0 early_init (dt_ptr=35147776) at arch/powerpc/kernel/early_32.c:21
#1 0x01000050 in __start () at arch/powerpc/kernel/head_book3s_32.S:144
#2 0x01000028 in __start () at arch/powerpc/kernel/head_book3s_32.S:116
(gdb) info register pc
pc 0x1a09a04 0x1a09a04 <early_init+24>
For a bad build, I never hit
(gdb) add-symbol-file vmlinux -s .head.text 0x1000000 -s .text 0x1004000
add symbol table from file "vmlinux" at
.head.text_addr = 0x1000000
.text_addr = 0x1004000
(y or n) y
Reading symbols from vmlinux...
(gdb) b early_init
Breakpoint 1 at 0x1a0da64: early_init. (2 locations)
(gdb) c
Continuing.
<inf loop>
(gdb) add-symbol-file vmlinux -s .head.text 0x1000000 -s .text 0x1004000
add symbol table from file "vmlinux" at
.head.text_addr = 0x1000000
.text_addr = 0x1004000
(y or n) y
Reading symbols from vmlinux...
(gdb) b *0x1000000
Breakpoint 1 at 0x1000000: file arch/powerpc/kernel/head_book3s_32.S, line 66.
(gdb) c
Continuing.
Breakpoint 1, _text () at arch/powerpc/kernel/head_book3s_32.S:66
66 nop /* used by __secondary_hold on prep (mtx) and chrp smp */
(gdb) bt
#0 _text () at arch/powerpc/kernel/head_book3s_32.S:66
#1 0xfff0817c in ?? ()
(gdb) info address _text
Symbol "_text" is at 0xc0000000 in a file compiled without debugging.
(gdb) si
Cannot access memory at address 0x1000000
(gdb) info register pc
pc 0x1000000 0x1000000 <_text>
(I can use I think that address remapping is throwing gdb off. IIUC If I set a breakpoint on The offending commit got reverted in llvm, but it's likely to be back soon, and might still affect us. |
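A side note on the two `early_init` locations gdb lists in the good build (0x01a09a04 and 0xc0a09a04): they look like the same code seen once at the load address and once at the link address. A minimal Python sketch of that arithmetic, assuming the kernel is linked at 0xc0000000 (the `_text` address gdb reports) and loaded at 0x1000000 (the address passed to `add-symbol-file`). The function name is made up for illustration.

```python
LINK_BASE = 0xC0000000  # address of _text as linked (from `info address _text`)
LOAD_BASE = 0x01000000  # where the kernel is loaded in this thread


def link_to_load(addr: int, link_base: int = LINK_BASE, load_base: int = LOAD_BASE) -> int:
    """Translate a link-time address to where that code sits before relocation."""
    return addr - link_base + load_base


# The second early_init location from `info b` maps onto the first one.
print(hex(link_to_load(0xC0A09A04)))
```

This only covers the early phase where the kernel still runs at its load address; once it has moved itself, the mapping no longer applies.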
It seems if I exit TUI I can step...
Comparing disassembly of arch/powerpc/kernel/prom_init.c, the following symbols have non-trivial changes:
Where the differences in If I swap in a bad
$ make -kj$(nproc) ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- LLVM=1 LLVM_IAS=0 CC=clang-bad arch/powerpc/kernel/prom_init.o
$ cp arch/powerpc/kernel/prom_init.o prom_init.o.bad
$ make -kj$(nproc) ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- LLVM=1 LLVM_IAS=0 CC=clang-good clean all
$ cp prom_init.o.bad arch/powerpc/kernel/prom_init.o
$ make -kj$(nproc) ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- LLVM=1 LLVM_IAS=0 CC=clang-good all
the kernel fails to boot. Yeah, just takes a long time to hit these breakpoints. Waiting to see if I ever hit a breakpoint in |
Please retry with current LLVM main. |
Sorry for the bad news, @LebedevRI, I can still repro the boot failure.
Doing the same experiment in reverse (use clang-bad for whole kernel, clang-good for prom_init.o), the kernel boots!
So I think that isolates the failure to arch/powerpc/kernel/prom_init.c. Diffing IR, I see diffs in
|
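The swap experiment in this thread (build everything with one compiler, drop in a single object from the other) generalizes to a search over object files. A rough Python sketch, assuming exactly one object is responsible and assuming a hypothetical `boots(bad_objs)` callback that rebuilds with the given objects taken from the suspect compiler and reports whether the kernel boots. Neither name comes from the thread.

```python
from typing import Callable, Sequence


def find_bad_object(objs: Sequence[str], boots: Callable[[Sequence[str]], bool]) -> str:
    """Binary-search for the single object that breaks the boot.

    `boots(bad_objs)` is a hypothetical callback: it should build the kernel
    with `bad_objs` compiled by the suspect compiler, everything else by the
    known-good one, and return True if the result boots.
    """
    candidates = list(objs)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if boots(half):
            # First half is innocent, so the culprit is in the remainder.
            candidates = candidates[len(half):]
        else:
            candidates = half
    return candidates[0]
```

Each probe costs a rebuild and a boot attempt, so this is only worthwhile when there is no obvious suspect; here the manual swap of prom_init.o was enough.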
@nickdesaulniers ok.
|
prom_init.i.txt attaching preprocessed source from 428f36401b1b695fd501ebfdc8773bed8ced8d4e. The first line has a comment that should be removed and used as the command line. Running above tests now. |
requested tests' results:
|
That's expected. |
I've tried moving
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 9b6146056e48..45a606099703 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -11,6 +11,8 @@ CFLAGS_prom_init.o += -fPIC
CFLAGS_btext.o += -fPIC
endif
+CFLAGS_call_prom.o +=-fPIC -fno-stack-protector -ffreestanding -ftrivial-auto-var-init=uninitialized
+
CFLAGS_early_32.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
CFLAGS_cputable.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
CFLAGS_prom_init.o += $(DISABLE_LATENT_ENTROPY_PLUGIN)
@@ -200,14 +202,17 @@ CFLAGS_paca.o += -fno-stack-protector
obj-$(CONFIG_PPC_FPU) += fpu.o
obj-$(CONFIG_ALTIVEC) += vector.o
obj-$(CONFIG_PPC64) += entry_64.o
-obj-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE) += prom_init.o
+obj-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE) += prom_init.o call_prom.o
extra-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE) += prom_init_check
+$(obj)/prom_hybrid.o: $(obj)/prom_init.o $(obj)/call_prom.o
+ $(LD) -r $? -o $@
+
quiet_cmd_prom_init_check = PROMCHK $@
- cmd_prom_init_check = $(CONFIG_SHELL) $< "$(NM)" $(obj)/prom_init.o; touch $@
+ cmd_prom_init_check = $(CONFIG_SHELL) $< "$(NM)" $(obj)/prom_hybrid.o; touch $@
-$(obj)/prom_init_check: $(src)/prom_init_check.sh $(obj)/prom_init.o FORCE
+$(obj)/prom_init_check: $(src)/prom_init_check.sh $(obj)/prom_hybrid.o FORCE
$(call if_changed,prom_init_check)
targets += prom_init_check
(then the definition of In qemu+gdb, we get to the (curiously, both |
I can boot OK with
0x300 is the page fault handler, so presumably you're accessing some bad address, contained in r29. |
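For readers unfamiliar with the vector numbers: on classic 32-bit PowerPC, 0x300 is the data storage interrupt (DSI), which is what a faulting load or store raises. A small Python lookup of the common vector offsets, as a reading aid only. The table is partial and the helper name is made up.

```python
# Partial table of exception vector offsets on classic 32-bit PowerPC.
PPC32_VECTORS = {
    0x100: "system reset",
    0x200: "machine check",
    0x300: "DSI (data storage interrupt)",
    0x400: "ISI (instruction storage interrupt)",
    0x500: "external interrupt",
    0x600: "alignment",
    0x700: "program",
    0x800: "floating-point unavailable",
    0x900: "decrementer",
    0xC00: "system call",
}


def describe_vector(offset: int) -> str:
    """Name the vector slot that an offset into the vector area falls in."""
    slot = offset & ~0xFF  # each slot is 0x100 bytes
    return PPC32_VECTORS.get(slot, f"unknown vector 0x{slot:x}")


print(describe_vector(0x300))
```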
Can you show a call stack for this? From which function did we call |
Sorry for the delay here...took two weeks off for the holidays. So I tried llvm @ 9e08b083a09ef4e02fb0a4de2c0d3ddc0eccadde with 3a8e009f97d2ed4b751365e514e0bf7131c7ab11 reverted (i.e. restoring Next, I will try @mpe's patch (thanks!) to see if I can isolate which function is problematic. (I will try the functions listed in #1770 (comment)). |
Hmm...that patch fails to boot for me with clang 16/binutils 2.39. That toolchain combo seems to work otherwise without that kernel patch applied. I don't see anything obviously wrong with the kernel patch...
The same thing happens in fact.
FWICT, this seems to be the In the booting case, in In the non booting case, we enter
mtctr r3
mr r3,r29
bctrl
cmpwi r3,0
...
once we execute
TIL PPC has a branch "count register" (CTR). In the booting case before
(Which has the same value later at the breakpoint on So it seems like whatever is at 0xfff025d0 we never "return" from. I'm not sure that symbol is in vmlinux:
$ llvm-addr2line 0xfff025d0 --obj=vmlinux
??:0
$ llvm-readelf -s vmlinux | grep fff025d0
$
non-booting case:
booting case:
So the same values. In hex:
In the booting case, I noticed a slight difference of assembler:
mtctr r3
- mr r3,r29
bctrl
cmpwi r3,0
...
(but we have the same value in ctr when we execute bctrl, so perhaps that's a red herring). @mpe any idea what's going on? It looks like with a known good compiler rev + your patch that one of the calls into the prom via In gdb I have no idea how you print what variable is in a register, or given a variable what register is it in. DWARF definitely has this info encoded. Let's read DWARF by hand!
So if IIUC we:
Seems right to me. So why don't we ever return from the prom? Copy+paste these gdb commands again for myself, so I don't have to keep scrolling up to get them:
add-symbol-file vmlinux -s .head.text 0x1000000 -s .text 0x1004000 |
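For anyone else new to CTR-based calls, here is a toy Python model of the `mtctr r3; mr r3,r29; bctrl` sequence quoted in this comment. It is only a sketch of the architectural effect of those three instructions. The 0xfff025d0 target is the address mentioned in the thread; the starting pc and the r29 value are placeholders, and the function name is made up.

```python
def run_call_sequence(regs: dict) -> dict:
    """Model `mtctr r3; mr r3,r29; bctrl` on a tiny register file."""
    regs = dict(regs)
    regs["ctr"] = regs["r3"]      # mtctr r3: CTR <- r3 (the call target)
    regs["pc"] += 4
    regs["r3"] = regs["r29"]      # mr r3,r29: first argument <- r29
    regs["pc"] += 4
    regs["lr"] = regs["pc"] + 4   # bctrl: LR <- address of the next instruction
    regs["pc"] = regs["ctr"]      # ... then branch to the address in CTR
    return regs


# Placeholder pc and r29; r3 initially holds the call target.
before = {"pc": 0x1000, "r3": 0xFFF025D0, "r29": 0x1234, "ctr": 0, "lr": 0}
after = run_call_sequence(before)
print(hex(after["pc"]), hex(after["lr"]), hex(after["r3"]))
```

The callee is expected to come back through LR, which is where the `cmpwi r3,0` after `bctrl` would run; in the failing case that return never seems to happen.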
@alexfh FYI |
I just bisected a boot failure with ARCH=powerpc pmac32_defconfig to commit 1bd0b82e508d ("[SimplifyCFG] FoldBranchToCommonDest(): deal with mismatched IV's in PHI's in common successor block") in LLVM:
I do not see any console output from the kernel so it appears to be hanging before that point.