Skip to content

Zephyr/BLE stack: Problem pairing/bonding more than one device, whilst the first device is still connected. #12946

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
DeclanTraill opened this issue Jan 31, 2019 · 4 comments
Assignees
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: medium Medium impact/importance bug

Comments

@DeclanTraill
Copy link
Contributor

Describe the bug
A clear and concise description of what the bug is.
What have you tried to diagnose or workaround this issue?

To Reproduce
Steps to reproduce the behavior:

  1. mkdir build; cd build
  2. cmake -DBOARD=board_xyz
  3. make
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Impact
What impact does this issue have on your progress (e.g., annoyance, showstopper)

Screenshots or console output
If applicable, add a screenshot (drag-and-drop an image), or console logs
(cut-and-paste text and put a code fence (```) before and after, to help
explain the issue.

Environment (please complete the following information):

  • OS: (e.g. Linux, MacOS, Windows)
  • Toolchain (e.g Zephyr SDK, ...)
  • Commit SHA or Version used

Additional context
Add any other context about the problem here.

@DeclanTraill DeclanTraill added the bug The issue is a bug, or the PR is fixing a bug label Jan 31, 2019
@DeclanTraill
Copy link
Contributor Author

Hi Zephyr group,

We have found on our Zephyr system that the first device (Android or iOS) pairs/bonds with not problems, but a second (or third etc) device fails to pair/bond most of the time.

Sometimes it works ok & both devices become paired/bonded and stay connected, but most of the time it fails and the second device is disconnected as soon as the pairing key has been entered on the device.

Here is some debug output for when the pairing/bonding of the first device works, and then when it fails for the second device:

WHEN IT WORKS (FOR THE FIRST DEVICE):

[ble_p] [INF] ble_p_connected: 1/4, 00:00:46:00:00:01 (public) is NOT bonded
passkey display 520717
[] [INF] rvmn_ui_bt_auth_pairing_complete: 00:00:46:00:00:01 (public), bonded:1
[] [ERR] rvmn_ui_bt_auth_pairing_complete: not bonded

WHEN IT DOESN’T WORK (FOR THE SECOND DEVICE):

ble_p] [INF] ble_p_connected: 2/4, d4:61:9d:ee:41:53 (public) is NOT bonded
passkey display 183049
[ble_p] [INF] ble_p_disconnected: 2/4, d4:61:9d:ee:41:53 (public) is NOT bonded
[ble_p] [ERR] ble_p_disconnected: 1/4, d4:61:9d:ee:41:53 (public) NOT
[ble_p] [DBG] ble_p_disconnected: d4:61:9d:ee:41:53 (public), reason 0x08: BT_HCI_ERR_CONN_TIMEOUT

(see the end of this message for more detailed debug for the 2nd device pairing: when it works and when it fails. Also attached to the end of this message are snippets of the code invoked when it fails to pair. I have also attached 3 files with debug output for good & bad cases, and the code snippets (as the code formatting is messed up when posted in this bug report))

I investigated the cause of the disconnection in the Zephyr BLE stack, and found it to be caused by trx_done not being TRUE when the isr() function is called (in ctrl.c) whilst in STATE_RX _radio.state:

The trx_done is set based on the value of NRF_RADIO->EVENTS_END.

It seems that when the second device tries to pair, we get a BLE isr() occur whilst in STATE_RX, but NRF_RADIO->EVENTS_END is not set.
Thus it appears to not have completed a Tx/Rx in the Radio?

Why is this happening, and is there some configuration/setting we can change to fix this, or is it a Zephyr BLE stack bug?

===========================================

MORE DETAILED DEBUG FOR A GOOD AND BAD 2ND DEVICE PAIRING:

GOOD:

chan 0x200019c4 cid 0x0006
bt_smp_send_security_req
[ble_p] [INF] ble_p_connected: 2/4, d4:61:9d:ee:41:53 (public) is NOT bonded
Received SMP code 0x01 len 7
smp_pairing_req 2
smp_send
legacy_request_tk
passkey display 047521
1: 0x00005151, 0x0000466d
path 2
Received SMP code 0x03 len 17
smp_pairing_confirmlegacy_pairing_confirm
k a1b90000000000000000000000000000 r f4e30f18083a5d2b60a9fefa255e7e42
ia 70:8d:3f:6a:09:b0 (random) ra cd:1a:c0:a3:1e:d1 (random)
preq 01040005100303 pres 02000005100301
p1 01010104000510030302000005100301
p2 d11ea3c01acdb0096a3f8d7000000000
smp_send
Checksum->Received=0, Calculated=-1460267513
Received SMP code 0x04 len 17
smp_pairing_random
legacy_pairing_random
k a1b90000000000000000000000000000 r 440945259a216547c437e4f51a7bd2c1
ia 70:8d:3f:6a:09:b0 (random) ra cd:1a:c0:a3:1e:d1 (random)
preq 01040005100303 pres 02000005100301
p1 01010104000510030302000005100301
p2 d11ea3c01acdb0096a3f8d7000000000
pcnf 2e80efe8e092d0746e0acec7ae5980ba cfm 2e80efe8e092d0746e0acec7ae5980ba
generated STK 2fe10699d44b25382e2f82da54563694
smp_send_pairing_random
smp_send
chan 0x200019c4 conn 0x20000f3c handle 1 encrypt 0x01 hci status 0x00
smp_send
smp_send
Received SMP code 0x06 len 17
smp_encrypt_info
Received SMP code 0x07 len 11
smp_master_ident
Received SMP code 0x08 len 17
smp_ident_info
Received SMP code 0x09 len 8
identity d4:61:9d:ee:41:53 (public)
Checksum->Received=0, Calculated=-362357784
[] [INF] rvmn_ui_bt_auth_pairing_complete: d4:61:9d:ee:41:53 (public), bonded:1
[] [ERR] rvmn_ui_bt_auth_pairing_complete: not bonded

===========================================

BAD:

chan 0x200019c4 cid 0x0006
bt_smp_send_security_req
[ble_p] [INF] ble_p_connected: 2/4, d4:61:9d:ee:41:53 (public) is NOT bonded
Received SMP code 0x01 len 7
smp_pairing_req 2
smp_send
legacy_request_tk
passkey display 374607
1: 0x00005151, 0x0000466d
path 2
Received SMP code 0x03 len 17
smp_pairing_confirmlegacy_pairing_confirm
k 4fb70500000000000000000000000000 r 4fabe9472b860a80a8bce4ea16a46c4d
ia 70:8d:3f:6a:09:b0 (random) ra cd:1a:c0:a3:1e:d1 (random)
preq 01040005100303 pres 02000005100301
p1 01010104000510030302000005100301
p2 d11ea3c01acdb0096a3f8d7000000000
smp_send
Received SMP code 0x04 len 17
smp_pairing_random
legacy_pairing_random
k 4fb70500000000000000000000000000 r bc4ee9e578a706b95416d965e6fb87d4
ia 70:8d:3f:6a:09:b0 (random) ra cd:1a:c0:a3:1e:d1 (random)
preq 01040005100303 pres 02000005100301
p1 01010104000510030302000005100301
p2 d11ea3c01acdb0096a3f8d7000000000
pcnf bfab71779e36b7b715e8ed81d43cb00a cfm bfab71779e36b7b715e8ed81d43cb00a
generated STK 8af9f859052c89589243d02bcd96a3ad
smp_send_pairing_random
smp_send
Checksum->Received=-1, Calculated=-1441840795
chan 0x200019c4 conn 0x20000f3c handle 2 encrypt 0x01 hci status 0x00
smp_send
smp_send
chan 0x200019c4 cid 0x0006
[ble_p] [INF] ble_p_disconnected: 2/4, d4:61:9d:ee:41:53 (public) is NOT bonded
[ble_p] [ERR] ble_p_disconnected: 1/4, d4:61:9d:ee:41:53 (public) NOT
[ble_p] [DBG] ble_p_disconnected: d4:61:9d:ee:41:53 (public), reason 0x08: BT_HCI_ERR_CONN_TIMEOUT
Checksum->Received=-16777216, Calculated=-579520369

===========================================

**RELEVANT CODE SNIPPETS (From : ctrl.c and radio.c):
iPhone_PAIRING_WORKED.txt
iPhone_2nd_device_connect_failure_code_snippets.txt
iPhone_PAIRING_FAILED.txt

**


u32_t radio_is_done(void)
{
if (NRF_RADIO->EVENTS_END != 0) {
/* On packet END event increment last packet end time value.
* Note: this depends on the function being called exactly once
* in the ISR function.
*/
last_pdu_end_us += EVENT_TIMER->CC[2];
return 1;
} else {
return 0;
}
}

#else /* !CONFIG_BT_CTLR_SW_SWITCH_SINGLE_TIMER /
u32_t radio_is_done(void)
{
return (NRF_RADIO->EVENTS_END != 0); <<<<< THIS LINE
}
#endif /
!CONFIG_BT_CTLR_SW_SWITCH_SINGLE_TIMER */


static void isr(void)
{
u8_t trx_done;
u8_t crc_ok;
u8_t devmatch_ok;
u8_t devmatch_id;
u8_t irkmatch_ok;
u8_t irkmatch_id;
u8_t rssi_ready;

   DEBUG_RADIO_ISR(1);

   /* Read radio status and events */
   trx_done = radio_is_done();                                       **<<<<< THIS LINE**
   if (trx_done) {

#if defined(CONFIG_BT_CTLR_PROFILE_ISR)
/* sample the packet timer here, use it to calculate ISR latency
* and generate the profiling event at the end of the ISR.
/
radio_tmr_sample();
#endif /
CONFIG_BT_CTLR_PROFILE_ISR */

          crc_ok = radio_crc_is_valid();
          devmatch_ok = radio_filter_has_match();
          devmatch_id = radio_filter_match_get();
          irkmatch_ok = radio_ar_has_match();
          irkmatch_id = radio_ar_match_get();
          rssi_ready = radio_rssi_is_ready();
   } else {
          crc_ok = devmatch_ok = irkmatch_ok = rssi_ready = 0;
          devmatch_id = irkmatch_id = 0xFF;
   }

   /* Clear radio status and events */
   radio_status_reset();
   radio_tmr_status_reset();
   radio_filter_status_reset();
   radio_ar_status_reset();
   radio_rssi_status_reset();

#if defined(CONFIG_BT_CTLR_GPIO_PA_PIN) ||
defined(CONFIG_BT_CTLR_GPIO_LNA_PIN)
radio_gpio_pa_lna_disable();
#endif /* CONFIG_BT_CTLR_GPIO_PA_PIN || CONFIG_BT_CTLR_GPIO_LNA_PIN */

   switch (_radio.state) {
   case STATE_TX:
          isr_radio_state_tx();
          break;

   case STATE_RX:                                                          **<<<<< THIS LINE**
          isr_radio_state_rx(trx_done, crc_ok, devmatch_ok, devmatch_id,                
                          irkmatch_ok, irkmatch_id, rssi_ready);
          break;

   case STATE_ABORT:
   case STATE_STOP:
   case STATE_CLOSE:
          isr_radio_state_close();                                          **<<<<< THIS LINE**
          break;

   case STATE_NONE:
          /* Ignore Duplicate Radio Disabled IRQ due to forced stop
          * using Radio Disable task.
          */
          break;

   default:
          LL_ASSERT(0);
          break;
   }

   DEBUG_RADIO_ISR(0);

}


At the start of the function isr_radio_state_rx, trx_done is tested, and if not true, _radio.state is set to STATE_CLOSE, then the connection is closed

static inline void isr_radio_state_rx(u8_t trx_done, u8_t crc_ok,
u8_t devmatch_ok, u8_t devmatch_id,
u8_t irkmatch_ok, u8_t irkmatch_id,
u8_t rssi_ready)
{
u32_t err;
u8_t rl_idx;

   if (!((trx_done) || ((SILENT_CONNECTION) &&           **<<<<< THIS LINE**
                      (_radio.role == ROLE_SLAVE)))) {
          _radio.state = STATE_CLOSE;
          radio_disable();

          return;
   }

static inline void isr_radio_state_close(void)
{
u32_t dont_close = 0;

   switch (_radio.role) {
   case ROLE_ADV:
          dont_close = isr_close_adv();
          break;

   case ROLE_SCAN:
          dont_close = isr_close_scan();
          break;

   case ROLE_SLAVE:
   case ROLE_MASTER:
          isr_close_conn();                                     **<<<<< THIS LINE**
          break;

This is the function called from code above, resulting in the connection being closed with a BT_HCI_ERR_CONN_TIMEOUT

static inline void isr_close_conn(void)
{
u16_t ticks_drift_plus;
u16_t ticks_drift_minus;
u16_t latency_event;
u16_t elapsed_event;
u8_t reason_peer;
u16_t lazy;
u8_t force;

   /* Local initiated terminate happened */
   if (_radio.conn_curr == 0) {
          return;
   }

   /* Master transmitted ack for the received terminate ind or
   * Slave received terminate ind.
   */
   reason_peer = _radio.conn_curr->llcp_terminate.reason_peer;
   if (reason_peer && ((_radio.role == ROLE_SLAVE) ||
                     _radio.conn_curr->master.terminate_ack)) {
          terminate_ind_rx_enqueue(_radio.conn_curr, reason_peer);

          connection_release(_radio.conn_curr);
          _radio.conn_curr = NULL;

          return;
   }

   ticks_drift_plus = 0;
   ticks_drift_minus = 0;
   latency_event = _radio.conn_curr->latency_event;
   elapsed_event = latency_event + 1;

   /* calculate drift if anchor point sync-ed */
   if (_radio.packet_counter &&
       (!SILENT_CONNECTION || (_radio.packet_counter != 0xFF))) {
          if (_radio.role == ROLE_SLAVE) {
                 u32_t start_to_address_expected_us;
                 u32_t start_to_address_actual_us;
                 u32_t window_widening_event_us;
                 u32_t preamble_to_addr_us;

                 /* calculate the drift in ticks */
                 start_to_address_actual_us = radio_tmr_aa_restore() -
                                          radio_tmr_ready_get();
                 window_widening_event_us =
                       _radio.conn_curr->slave.window_widening_event_us;

#if defined(CONFIG_BT_CTLR_PHY)
preamble_to_addr_us =
addr_us_get(_radio.conn_curr->phy_rx);
#else /* !CONFIG_BT_CTLR_PHY /
preamble_to_addr_us = addr_us_get(0);
#endif /
!CONFIG_BT_CTLR_PHY */
start_to_address_expected_us = RADIO_TICKER_JITTER_US +
(RADIO_TICKER_JITTER_US << 1) +
preamble_to_addr_us +
window_widening_event_us;
if (start_to_address_actual_us <=
start_to_address_expected_us) {
ticks_drift_plus = HAL_TICKER_US_TO_TICKS(
window_widening_event_us);
ticks_drift_minus = HAL_TICKER_US_TO_TICKS(
(start_to_address_expected_us -
start_to_address_actual_us));
} else {
ticks_drift_plus = HAL_TICKER_US_TO_TICKS(
start_to_address_actual_us);
ticks_drift_minus = HAL_TICKER_US_TO_TICKS(
RADIO_TICKER_JITTER_US +
(RADIO_TICKER_JITTER_US << 1) +
preamble_to_addr_us);
}

                 /* Reset window widening, as anchor point sync-ed */
                 _radio.conn_curr->slave.window_widening_event_us = 0;
                 _radio.conn_curr->slave.window_size_event_us = 0;

                 /* apply latency if no more data */
                 if (_radio.conn_curr->pkt_tx_head) {
                       struct pdu_data *pdu_data_tx;

                       pdu_data_tx = (void *)
                              _radio.conn_curr->pkt_tx_head->pdu_data;
                       if (pdu_data_tx->len ||
                           _radio.conn_curr->packet_tx_head_offset) {
                              _radio.conn_curr->latency_event = 0;
                       }
                 } else if (_radio.conn_curr->slave.latency_enabled) {
                       _radio.conn_curr->latency_event =
                              _radio.conn_curr->latency;
                 }
          } else if (reason_peer) {
                 _radio.conn_curr->master.terminate_ack = 1;
          }

          /* Reset connection failed to establish procedure */
          _radio.conn_curr->connect_expire = 0;
   }

   /* check connection failed to establish */
   else if (_radio.conn_curr->connect_expire) {
          if (_radio.conn_curr->connect_expire > elapsed_event) {
                 _radio.conn_curr->connect_expire -= elapsed_event;
          } else {
                 terminate_ind_rx_enqueue(_radio.conn_curr,
                                     BT_HCI_ERR_CONN_FAIL_TO_ESTAB);

                 connection_release(_radio.conn_curr);
                 _radio.conn_curr = NULL;

                 return;
          }
   }

   /* if anchor point not sync-ed, start supervision timeout, and break
   * latency if any.
   */
   else {
          /* Start supervision timeout, if not started already */
          if (!_radio.conn_curr->supervision_expire) {
                 _radio.conn_curr->supervision_expire =
                       _radio.conn_curr->supervision_reload;
          }
   }

   /* check supervision timeout */
   force = 0;
   if (_radio.conn_curr->supervision_expire) {
          if (_radio.conn_curr->supervision_expire > elapsed_event) {
                 _radio.conn_curr->supervision_expire -= elapsed_event;

                 /* break latency */
                 _radio.conn_curr->latency_event = 0;

                 /* Force both master and slave when close to
                 * supervision timeout.
                 */
                 if (_radio.conn_curr->supervision_expire <= 6) {
                       force = 1;
                 }
                 /* use randomness to force slave role when anchor
                 * points are being missed.
                 */
                 else if (_radio.role == ROLE_SLAVE) {
                       if (latency_event != 0) {
                              force = 1;
                       } else {
                              force = _radio.conn_curr->slave.force & 0x01;

                              /* rotate force bits */
                              _radio.conn_curr->slave.force >>= 1;
                              if (force) {
                                     _radio.conn_curr->slave.force |=
                                            ((u32_t)1 << 31);
                              }
                       }
                 }
          } else {
                 terminate_ind_rx_enqueue(_radio.conn_curr,    **<<<<< THIS LINE**
                                     BT_HCI_ERR_CONN_TIMEOUT);

                 connection_release(_radio.conn_curr);               **<<<<< THIS LINE**
                 _radio.conn_curr = NULL;

                 return;
          }
   }

@cvinayak
Copy link
Collaborator

cvinayak commented Feb 4, 2019

@DeclanTraill Thank you for the detailed analysis, but could you fill the details to reproduce the issue as per the template in your first comment in this issue?

Please state the Zephyr commit hash you are using, the sample you are using, the .config file for your application, any code deviations from the upstream sample implementation, the devices, boards, phone and peers used.

Unless, I am made to reproduce the issue, its will be difficult to analyze the issue. Awaiting your response.

@faisalSaleemSetec
Copy link
Contributor

duplicate of #13083 ..
All the information you asked for is given there and the issue is reproduced on the sample code bluetooth\peripheral..

@carlescufi
Copy link
Member

Closing as duplicate of #13083

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug platform: nRF Nordic nRFx priority: medium Medium impact/importance bug
Projects
None yet
Development

No branches or pull requests

4 participants