Skip to content

Bluetooth: BLE stack notifications blocks host side for too long (drivers/bluetooth/hci/spi.c and hci_spi) #50658

Closed
@flo-carrere

Description

@flo-carrere

First, I have no pretention in stating it is a bug, it may be the normal way of working of the BLE stack architecture at this moment.
In this case, it could be reclassified as an enhancement request if relevant. I may be expecting too much.

Preliminary context

I am using a classic dual-chip configuration:

  • Host-side : custom board (stm32h7, similar to a nucleo-h743zi), using a host-build only
    • 3 services (HID, BAS, DIS)
    • HID reporting rate @133hz (7.5ms)
  • Controller-side : standard hci_spi sample project flashed on nrf52840 addon board connected through SPI@1MHz

Describe the bug

From the host, using a simple bt_gatt_notify_cb to update a BLE service characteristic (let's say a 17-bytes long HID report update).
I have noticed this call is blocking (OK) for a unexpectedly long period of time (NOK) of approximately 1.5ms degrading others thread "performances" in a non-negligible way.

It's basically a 1.5ms penalty every 7.5ms that is spent in a busy-wait style fashion starving other actions in the system.
More details below on my investigations.

On the other hand, a similar system reporting through USB-HID at the same rate, do not suffer from the same deficiency and a reasonable 0.1 <-> 0.7ms is spent.

  • What have you tried to diagnose or workaround this issue?
  1. Independently of my full system I have been able to reproduce this blocking behavior by setting up a sample test application for the host side, reporting a classic BAS notification (battery level %) periodically (every 50ms down to every 7.5ms (target)) and measuring by software the time spent for a bt_gatt_notify_cb operation.
// Sample test - pseudo code
period = 50ms;
while(1) {
  time;
  bt_gatt_notify(); // reporting BAS notification
  elapsed = now - time;
  printf("%d", elapsed); // connected:~750us <--> ~980us, not connected: ~5us
  wait( period - elapsed );
}

The host-side prj.conf, I use for the sample test is provided below.
prj.conf.txt

  1. I have also tried to pimp my BLE stack configuration using the throughput maximization sample project but with no effect.

  2. When debugging, I noticed that the vast majority of time was spent in "waiting" for a rescheduling operation to complete in the implementation of the BLE stack, for some reason.

In a nutshell my call stack (host):

bt_gatt_notify_cb // app waiting here
  // GATT
    // ATT
     // ACL
       // KERNEL QUEUE - k_queue_append_list
        z_reschedule(&queue->lock, key);    <<<< Waiting here for ~80% of the time
     // HCI_SPI transfer happening at some point
   
   // ... returns and continues normally up to the next notification

  // Transfer OTA happening on controller -> at next connection interval -> OK   
  // callback notification sent received -> OK
  1. I have not tried yet but I envisioned moving to a single-chip configuration (Host + Controller using HCI RAM) running on nrf52840 and connecting my current "Host" MCU with a custom protocol over SPI to receive raw notifications.

This way I could have a fully externalized BLE stack not impacting the rest of my system (hopefully)
With the big cons of maintaining two FWs instead of a single one....

Do you think it's a valid way of working around the issue?

  • Is this a regression? If yes, have you been able to "git bisect" it to a specific commit?
    I don't think it's a regression, probably the way it has always worked

Expected behavior

From a system architecture point of view, I was expecting the dual-chip configuration to work in tandem:

  • Host-side -> serializing and publishing notification very fast over HCI -> achieving a minimal time waiting
  • Controller-side -> offloading the full BLE-OTA transfer
    • Tuned with sufficiently large HCI events queues according to the report rate/size/types

It's a pity to have two powerful MCUs (STM32H7 on host + nrf52840 on controller) and having such a poor behavior.

Impact

As stated this "poor notification performances" are very annoying as it considerably degrades the full system performance, and sometimes delays time-critical operations (acquisition threads...).

Environment (please complete the following information):

  • Building on Windows 10, connecting my device (BLE) on the same laptop.
  • Toolchain
    • Zephyr SDK 0.14.1
    • Zephyr v3.0 > Similar behavior on Zephyr v3.1

Metadata

Metadata

Labels

area: Bluetootharea: Bluetooth HostBluetooth Host (excluding BR/EDR)bugThe issue is a bug, or the PR is fixing a bugpriority: lowLow impact/importance bug

Type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions