Commit ebb37cf

npiggin authored and mpe committed
powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled
irq_work_raise should not cause a decrementer exception unless it is
called from NMI context. Doing so often just results in an immediate
masked decrementer interrupt:

   <...>-550 90d... 4us : update_curr_rt <-dequeue_task_rt
   <...>-550 90d... 5us : dbs_update_util_handler <-update_curr_rt
   <...>-550 90d... 6us : arch_irq_work_raise <-irq_work_queue
   <...>-550 90d... 7us : soft_nmi_interrupt <-soft_nmi_common
   <...>-550 90d... 7us : printk_nmi_enter <-soft_nmi_interrupt
   <...>-550 90d.Z. 8us : rcu_nmi_enter <-soft_nmi_interrupt
   <...>-550 90d.Z. 9us : rcu_nmi_exit <-soft_nmi_interrupt
   <...>-550 90d... 9us : printk_nmi_exit <-soft_nmi_interrupt
   <...>-550 90d... 10us : cpuacct_charge <-update_curr_rt

The soft_nmi_interrupt here is the call into the watchdog, due to the
decrementer interrupt firing with irqs soft-disabled. This is
harmless, but sub-optimal.

When it's not called from NMI context or with interrupts enabled, mark
the decrementer pending in the irq_happened mask directly, rather than
having the masked decrementer interrupt handler do it. This will be
replayed at the next local_irq_enable. See the comment for details.

Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
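The difference between the two paths described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct model_cpu`, `MODEL_IRQ_DEC` and every `model_*` function are hypothetical names invented for illustration and are not kernel APIs. The model assumes that a decrementer exception taken while soft-disabled does nothing except record itself in `irq_happened`, which is what the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PACA_IRQ_DEC; the value is arbitrary here. */
#define MODEL_IRQ_DEC 0x08u

/* Hypothetical per-CPU state for the model (not the kernel's paca). */
struct model_cpu {
	bool soft_disabled;        /* interrupts are soft-masked */
	unsigned int irq_happened; /* interrupts waiting to be replayed */
	int hw_exceptions;         /* decrementer exceptions actually taken */
	int timer_runs;            /* times the timer handler body ran */
};

static void model_timer_handler(struct model_cpu *cpu)
{
	cpu->timer_runs++;
}

/* A decrementer exception arrives at the CPU. */
static void model_dec_exception(struct model_cpu *cpu)
{
	cpu->hw_exceptions++;
	if (cpu->soft_disabled)
		cpu->irq_happened |= MODEL_IRQ_DEC; /* masked: only record it */
	else
		model_timer_handler(cpu);
}

/* Re-enable interrupts and replay anything recorded while masked. */
static void model_local_irq_enable(struct model_cpu *cpu)
{
	cpu->soft_disabled = false;
	if (cpu->irq_happened & MODEL_IRQ_DEC) {
		cpu->irq_happened &= ~MODEL_IRQ_DEC;
		model_timer_handler(cpu);
	}
}

/* Behaviour before the patch: always provoke the exception. */
static void model_raise_old(struct model_cpu *cpu)
{
	model_dec_exception(cpu);
}

/* Behaviour after the patch for soft-disabled, non-NMI callers. */
static void model_raise_new(struct model_cpu *cpu)
{
	if (cpu->soft_disabled)
		cpu->irq_happened |= MODEL_IRQ_DEC; /* skip the exception */
	else
		model_dec_exception(cpu);
}

/* Both paths run the handler once; only the old one takes an exception. */
static void model_demo(void)
{
	struct model_cpu a = { .soft_disabled = true };
	struct model_cpu b = { .soft_disabled = true };

	model_raise_old(&a);
	model_local_irq_enable(&a);
	model_raise_new(&b);
	model_local_irq_enable(&b);

	assert(a.timer_runs == 1 && a.hw_exceptions == 1);
	assert(b.timer_runs == 1 && b.hw_exceptions == 0);
}
```

Calling `model_demo()` from a test harness exercises both paths. The end result is identical in each case; the new path simply avoids the intermediate exception, matching the "harmless, but sub-optimal" wording in the message.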
1 parent 98fd72f commit ebb37cf

arch/powerpc/kernel/time.c

Lines changed: 31 additions & 2 deletions
@@ -513,6 +513,35 @@ static inline void clear_irq_work_pending(void)
 		"i" (offsetof(struct paca_struct, irq_work_pending)));
 }
 
+void arch_irq_work_raise(void)
+{
+	preempt_disable();
+	set_irq_work_pending_flag();
+	/*
+	 * Non-nmi code running with interrupts disabled will replay
+	 * irq_happened before it re-enables interrupts, so set the
+	 * decrementer there instead of causing a hardware exception
+	 * which would immediately hit the masked interrupt handler
+	 * and have the net effect of setting the decrementer in
+	 * irq_happened.
+	 *
+	 * NMI interrupts can not check this when they return, so the
+	 * decrementer hardware exception is raised, which will fire
+	 * when interrupts are next enabled.
+	 *
+	 * BookE does not support this yet, it must audit all NMI
+	 * interrupt handlers to ensure they call nmi_enter() so this
+	 * check would be correct.
+	 */
+	if (IS_ENABLED(CONFIG_BOOKE) || !irqs_disabled() || in_nmi()) {
+		set_dec(1);
+	} else {
+		hard_irq_disable();
+		local_paca->irq_happened |= PACA_IRQ_DEC;
+	}
+	preempt_enable();
+}
+
 #else /* 32-bit */
 
 DEFINE_PER_CPU(u8, irq_work_pending);
@@ -521,8 +550,6 @@ DEFINE_PER_CPU(u8, irq_work_pending);
 #define test_irq_work_pending()		__this_cpu_read(irq_work_pending)
 #define clear_irq_work_pending()	__this_cpu_write(irq_work_pending, 0)
 
-#endif /* 32 vs 64 bit */
-
 void arch_irq_work_raise(void)
 {
 	preempt_disable();
@@ -531,6 +558,8 @@ void arch_irq_work_raise(void)
 	preempt_enable();
 }
 
+#endif /* 32 vs 64 bit */
+
 #else /* CONFIG_IRQ_WORK */
 
 #define test_irq_work_pending()	0
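
The condition added in the 64-bit arch_irq_work_raise() can be read as a three-input decision. The sketch below restates it as a pure function so the cases are easy to enumerate. It is a standalone illustration, not kernel code: `struct raise_ctx`, `enum raise_path`, `choose_raise_path()` and `raise_path_selfcheck()` are hypothetical names, and the three booleans stand in for IS_ENABLED(CONFIG_BOOKE), irqs_disabled() and in_nmi().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: which branch of the patch would be taken. */
enum raise_path {
	RAISE_SET_DEC,      /* set_dec(1): raise the hardware exception */
	RAISE_MARK_PENDING  /* set PACA_IRQ_DEC in irq_happened instead */
};

/* Hypothetical snapshot of the three inputs to the condition. */
struct raise_ctx {
	bool booke;         /* stands in for IS_ENABLED(CONFIG_BOOKE) */
	bool irqs_disabled; /* stands in for irqs_disabled() */
	bool in_nmi;        /* stands in for in_nmi() */
};

/* Mirrors: if (BOOKE || !irqs_disabled() || in_nmi()) ... else ... */
static enum raise_path choose_raise_path(struct raise_ctx c)
{
	if (c.booke || !c.irqs_disabled || c.in_nmi)
		return RAISE_SET_DEC;
	return RAISE_MARK_PENDING;
}

/* Walk the cases named in the patch comment. */
static void raise_path_selfcheck(void)
{
	/* Non-NMI code with interrupts disabled: mark pending directly. */
	assert(choose_raise_path((struct raise_ctx){ false, true, false })
	       == RAISE_MARK_PENDING);
	/* Interrupts enabled: raise the decrementer exception. */
	assert(choose_raise_path((struct raise_ctx){ false, false, false })
	       == RAISE_SET_DEC);
	/* NMI context: raise the decrementer exception. */
	assert(choose_raise_path((struct raise_ctx){ false, true, true })
	       == RAISE_SET_DEC);
	/* BookE: always raise the decrementer exception. */
	assert(choose_raise_path((struct raise_ctx){ true, true, false })
	       == RAISE_SET_DEC);
}
```

Only one of the eight input combinations selects the new path: not BookE, interrupts disabled, and not in NMI context. Every other combination keeps the previous set_dec(1) behaviour.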
