kernel: Non-deterministic and very high ISR latencies #13610
@cvinayak To clarify: you're saying that reverting my changes made in #12605 reduces latencies? Odd. My own experiments on the ISR timing itself showed significant decreases in lock duration with that rework, especially on nRF51. It's not immediately obvious how the timer ISR could be at fault. I'm not working on Zephyr at the moment, so I will have to leave this to @pizi-nordic.
@cvinayak any chance you can add a test case for this, to be able to catch such regressions in the future?
@cvinayak A timer ISR was never really lockless -- it has to call into the timeout system and out into arbitrary callbacks, all of which need synchronization. Someone should definitely check the bisection though -- I agree that the referenced commit doesn't seem particularly relevant.
Oh, and to be clear:
There really are not, and I'm not sure why you'd think so. Synchronization in the spinlockification patch was almost entirely 1:1, and where it differed it was uniformly in the direction of reducing latency by breaking existing recursive irq_lock usage. If we've regressed, it's a bug and we should fix it.
I was mixing two things, one being an unsupported assumption that, since a lot of material was added/changed since 1.13.0, there would be more locks of any sort. This command:
shows the rough number of lock operations has decreased from 106 to 87. But I also meant to indicate that if you're looking for an …
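The comparison above was presumably produced by counting lock call sites in the tree; the actual command wasn't quoted, so here is a rough stand-in sketch (plain Python; the sample source text and regex are my assumptions, not the original command):

```python
import re

# Stand-in for kernel sources; in practice you would read kernel/*.c files.
source = """
    k_spinlock_key_t key = k_spin_lock(&sched_spinlock);
    k_spin_unlock(&sched_spinlock, key);
    unsigned int key2 = irq_lock();
    irq_unlock(key2);
"""

# Count acquire-side lock operations of either flavor; the word boundary
# keeps irq_unlock/k_spin_unlock from matching.
locks = re.findall(r"\b(?:irq_lock|k_spin_lock)\s*\(", source)
print(len(locks))  # 2
```

Counting only the acquire side avoids double-counting lock/unlock pairs when comparing two trees.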
I'm sorry, I misread @cvinayak's bisection report. If reverting only the changes to nrf_rtc_timer.c between 37fbf and … fixes your latency problems, then that points to a pretty tight list of patches, mostly from @pabigot (FWIW, these all look fine to me):
If you're reverting your whole tree to that, then I'd guess the more likely culprit is the spinlockification series. Note that most of it should be bisectable independently (though IIRC there's a header dependency that @carlescufi discovered that has to be manually fixed), so you can zero in on the specific subsystem at fault. That series "shouldn't" have changed latencies at all; it's just refactoring.
@andyross, I only reverted changes to nrf_rtc_timer.c and saw reduced latencies.

@pabigot, below are two images of logic analyzer measurements of irq_lock durations (channel 0). This is by visual inspection and I may have missed bigger latency measurements (files attached for perusal). After the optimizations, latencies up to ~21 us.

@andyross, I am not sure why the CPU wakes up so many times (at 10 ms intervals, but not always): ideally, no kernel/application code should be running when there are no data transmissions on Bluetooth. The measurement was taken while my iPhone was sleeping, with only the BLE link maintained by periodic empty packets at the connection intervals.

Next week, I will try to profile all interrupt disables by the kernel/arch, to get a better idea of latencies and CPU use for maintaining what is essentially an idle Bluetooth connection.
@cvinayak @andyross Some of the wakeups are explained in … Without knowing exactly what channel 0 measures, I can't explain how you see 21 us lock durations; I never saw anything greater than 2.88 us on an nRF51, nor anything greater than 11 us before my changes. Information about timing related to my changes is …
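For measurements like the channel-0 trace discussed above, lock durations can be recovered from the analyzer's edge timestamps. A minimal sketch, assuming channel 0 goes high while interrupts are locked (the timestamps are made up for illustration, not taken from the attached capture):

```python
def lock_durations_us(edges):
    """edges: time-ordered (timestamp_us, level) pairs from a logic analyzer.
    Returns the duration of each high pulse (lock held) in microseconds."""
    durations, rise = [], None
    for t, level in edges:
        if level == 1:
            rise = t              # lock taken: remember the rising edge
        elif rise is not None:
            durations.append(t - rise)  # lock released: record pulse width
            rise = None
    return durations

# Hypothetical capture with two lock regions of 3 us and 21 us.
trace = [(100.0, 1), (103.0, 0), (500.0, 1), (521.0, 0)]
print(lock_durations_us(trace))       # [3.0, 21.0]
print(max(lock_durations_us(trace)))  # 21.0
```

Taking the max over a long capture gives the worst-case lock duration figure being compared in this thread.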
Timeslicing has always been "default y" in Kconfig. But it only affects threads that are at preemptible priorities, which aren't the default. It did indeed get reworked along with the rest of the timing system during the churn last fall, but it's almost entirely orthogonal to tickless. I'll admit I remain confused, though. There really doesn't seem to be anything in that patch series that should impact latencies. The locking strategy is basically the same.
If a driver isn't reentrant or thread-safe, that is a serious bug in the driver and not a problem in the kernel...what I2C drivers are like this? |
@SavinayDharmappa Please take a look at this. |
Per zephyrproject-rtos#13610, recent changes to this driver seem to have introduced unexpected latency regressions. This patch effectively reverts the patches that changed the meat of the driver:

- ac36886 drivers: nrf: timer: add inline qualifier where inlining is intended
- 084363a drivers: timer: nrf: refactor for speed and correctness
- 71882ff drivers: timer: nrf: drop unnecessary counter mask
- 4b24e88 drivers: timer: nrf: use irq_lock instead of spinlock

while backporting these seemingly unrelated hygiene patches:

- 7cbdb6c drivers/timer: Restore non-tickless tick count behavior
- d30c9ae drivers: nrf_power_clock: Migrate to DTS.
- 75f77db include: misc: util.h: Rename min/max to MIN/MAX

Signed-off-by: Andy Ross <[email protected]>
Let's at least start with a reversion per the comments above, as that seems lowest risk. Can you try the linked patch to see if that resolves the issue as @cvinayak reported? If so, then that seems simple. If not, we'll have to try something else. Unfortunately I have neither an nrf51 board nor an iPhone (though I can probably scrounge up the former), so local reproduction is going to take some work.
FWIW, I have a very hard time believing that the … I would believe there are other …
I don't disagree, but at this point we need to take what we can get. @cvinayak can confirm; if it works, we can merge and then fix it at our leisure.
Recent discussion in PR #14176 seems to indicate that it addresses this issue. For anyone here who might have missed that, please review.
Think we can close this one. I opened #14577 to track the need for performance work in this ISR. Please add notes there as needed. |
The default system clock on all Nordic devices is based on a 32 KiHz (2^15 Hz) timer. Scheduling ticks requires that deadlines be specified with a timer counter value that aligns to a system clock tick. With the Zephyr default 100 ticks-per-second configuration this results in 100 ticks every 32700 cycles of the cycle timer. This reveals two problems:

* The uptime clock misrepresents elapsed time because it runs 0.208% (68/32768) faster than the best available clock;
* Calculation of timer counter compare values often requires an integer division and multiply operation to produce a value that's a multiple of cycles-per-tick. Integer division on the Cortex-M1 nRF51 is done in software with a (value-dependent) algorithm with a non-constant runtime that can be significant. This can produce missed Bluetooth deadlines as discussed in issue #14577 and others.

By changing the default divisor to one that evenly divides the 2^15 clock rate, the time interrupts are disabled to manage timers is significantly reduced, as is the error between uptime and real time. Do this at the top level, moving SYS_CLOCK_HW_CYCLES_PER_SEC there as well since the two parameters are related.

Note that the central_hr configuration described in issue #13610 does not distinguish latency due to timer management from other irq_lock/spinlock regions, and the maximum observed latency will still exceed the nominal 10 us allowed maximum. However, this occurs much less frequently than changing the timer deadline, which can happen multiple times per tick.

Signed-off-by: Peter A. Bigot <[email protected]>
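The arithmetic behind the 0.208% figure and the power-of-two divisor fix can be checked with a quick sketch (plain Python; the 128 Hz tick rate is just one example of a divisor that evenly divides 2^15):

```python
# System clock arithmetic for the Nordic 32 KiHz (2^15 Hz) RTC.
HW_CLOCK_HZ = 32768

# Default Zephyr configuration: 100 ticks per second.
cycles_per_tick = HW_CLOCK_HZ // 100      # 327, truncated from 327.68
cycles_per_100_ticks = 100 * cycles_per_tick
print(cycles_per_100_ticks)               # 32700 cycles, not 32768

# Uptime runs fast by the truncated remainder each second.
error = (HW_CLOCK_HZ - cycles_per_100_ticks) / HW_CLOCK_HZ
print(f"{error:.3%}")                     # 0.208% (68/32768)

# A power-of-two tick rate divides 2^15 exactly: no drift, and the
# compare-value calculation needs no software division at all.
print(HW_CLOCK_HZ % 128, HW_CLOCK_HZ // 128)  # 0 256
```

With 256 cycles per tick, converting between ticks and cycles reduces to a shift, which matters on the Cortex-M1 nRF51 where division is a variable-time software routine.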
I am running into this issue on Zephyr master (d045bd7) in my own application on the … This is the log just before the crash, with …
@lopsided98, do you have a master commit which was fine? Could you check if the v1.14-branch works fine for your application? I would try to bisect to find the commit beyond which the controller is seeing increased latencies. Is this reproducible in the vanilla samples/bluetooth/peripheral sample?

Thinking aloud: assuming the assert happened in an encrypted connection, the Radio ISR CPU time too seems to have increased; maybe changes in the controller code are also contributing to the increased overall CPU use. I should do some GPIO debug pin profiling to confirm the new numbers seen above.
Describe the bug
Non-deterministic and very high ISR latencies. The Bluetooth controller asserts when running samples/bluetooth/central_hr on the nrf51_pca10028 board.
If a connection is established with a peripheral in the vicinity, the maximum measured ISR latency is very high: over 100 us.
Legend for the log message:
l: Radio peripheral ISR latency (current value 9 us, minimum 8 us, maximum 111 us).
t: Radio peripheral ISR CPU usage (current value 46 us, minimum 46 us, maximum 95 us).
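The current/minimum/maximum figures in this legend suggest simple running statistics kept per interrupt. A minimal sketch of that bookkeeping (plain Python with hypothetical sample values; this is not the controller's actual code):

```python
class IsrStats:
    """Track current/min/max of a sampled quantity, e.g. ISR latency in us."""
    def __init__(self):
        self.current = None
        self.minimum = None
        self.maximum = None

    def sample(self, value_us):
        self.current = value_us
        self.minimum = value_us if self.minimum is None else min(self.minimum, value_us)
        self.maximum = value_us if self.maximum is None else max(self.maximum, value_us)

latency = IsrStats()
for v in (8, 111, 9):   # hypothetical per-interrupt latency samples
    latency.sample(v)
print(latency.current, latency.minimum, latency.maximum)  # 9 8 111
```

One such tracker per legend entry (l, t, ...) is enough to reproduce the log line format shown above.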
To Reproduce
Checkout 8bd360f
Steps to reproduce the behavior:
central_hr asserts when scanning without any peripheral in the vicinity.
Use an iPhone 6S with the LightBlue application and enable a virtual peripheral with a heart rate service, then …
Expected behavior
Console output as in v1.13.0, with acceptable ISR latencies of < 30 us.
Impact
Showstopper: a blocker for supporting Bluetooth on the nRF51 Series, and critical for the nRF52 Series due to bad ISR latencies.
Screenshots or console output
Refer to the ones added in the description of the bug.
Environment (please complete the following information):
Additional context
None.