BLE HID sample often asserts on Windows 10 reconnection #15183

Closed
Olivier-ProGlove opened this issue Apr 4, 2019 · 2 comments · Fixed by #15190
Labels
area: Bluetooth bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@Olivier-ProGlove
Contributor

Describe the bug

I was checking if #14938 could fix the issue "BLE HID sample fails to reconnect on Windows 10 tablets - Wrong Sequence Number (follow-up)" #14044.

I hit the failing assert <err> bt_ctlr_hci: assert: '0' failed after the device connects.
I am using the Nordic nRF52840 dev kit (PCA10056), and the assert fires in a good third of the reconnections (this is the first time I have tried, so it may depend on the environment).

Because there are several LL_ASSERT(0) calls in subsys/bluetooth/controller/hci/hci.c, I added an error message before each of them. The one that fails is always the one in encode_data_ctrl(): <err> bt_ctlr_hci: encode_data_ctrl: opcode:0x0

Here is my code change:

@@ -3246,6 +3251,7 @@ static void encode_data_ctrl(struct node_rx_pdu *node_rx,
                break;
 
        default:
+               BT_ERR("encode_data_ctrl: opcode:0x%x", pdu_data->llctrl.opcode);
                LL_ASSERT(0);
                return;
        }

To Reproduce

  1. Build and flash peripheral_hids on Nordic nRF52840 dev kit (PCA10056).
  2. Pair the Windows 10 tablet to the device (all good!)
  3. Reset the Nordic dev kit by pressing 'BOOT/RESET'

Expected behavior
Windows 10 should automatically reconnect to the dev kit as it advertises on restart.

Screenshots or console output

***** Booting Zephyr OS v1.14.0-rc3-80-g417d349727e3 *****
Bluetooth initialized
Advertising successfully started
[00:00:00.007,720] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.007,720] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.007,720] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 1.14 Build 0
[00:00:00.008,087] <wrn> bt_hci_core: No ID address. App must call settings_load()
[00:00:00.010,803] <inf> bt_hci_core: Identity: ea:ac:67:83:44:59 (random)
[00:00:00.010,803] <inf> bt_hci_core: HCI: version 5.0 (0x09) revision 0x0000, manufacturer 0x05f1
[00:00:00.010,833] <inf> bt_hci_core: LMP: version 5.0 (0x09) subver 0xffff
Connected bc:83:85:0c:f3:ba (public)
[00:00:06.202,453] <err> bt_ctlr_hci: encode_data_ctrl: opcode:0x0
[00:00:06.202,453] <err> bt_ctlr_hci: assert: '0' failed
***** Kernel OOPS! *****
Current thread ID = 0x200008e8
Faulting instruction address = 0xeea2
Fatal fault in thread 0x200008e8! Aborting.

Environment:

  • Microsoft Windows Surface Pro 4 Build Version: 1809 OS build: 17763.316
  • Upstream Zephyr from yesterday: SHA1: 417d349727e3e

Additional context

Adding PDU_DATA_LLCTRL_TYPE_CONN_UPDATE_IND to the switch seems to work:

@@ -3245,7 +3250,12 @@ static void encode_data_ctrl(struct node_rx_pdu *node_rx,
                le_unknown_rsp(pdu_data, handle, buf);
                break;
 
+       case PDU_DATA_LLCTRL_TYPE_CONN_UPDATE_IND:
+               BT_WARN("encode_data_ctrl: Skip CONN_UPDATE_IND");
+               break;
+

But I do not know whether any action needs to be taken for PDU_DATA_LLCTRL_TYPE_CONN_UPDATE_IND.
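
For context, the logged opcode:0x0 is the Link Layer control opcode that the Bluetooth Core Specification assigns to LL_CONNECTION_UPDATE_IND (LL_UNKNOWN_RSP is 0x07), which is consistent with the workaround above. The dispatch pattern can be sketched standalone as follows. This is only an illustration, not the Zephyr implementation: every sketch_ name and the three result values are assumptions made for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Opcode values follow the Bluetooth Core Specification.
 * The enumerator names are hypothetical, not Zephyr's.
 */
enum sketch_llctrl_opcode {
	SKETCH_LL_CONN_UPDATE_IND = 0x00,
	SKETCH_LL_UNKNOWN_RSP     = 0x07,
};

/* Hypothetical outcomes of the dispatch, for illustration only. */
enum sketch_result {
	SKETCH_ENCODED,   /* an HCI event would be generated */
	SKETCH_SKIPPED,   /* PDU recognised, deliberately no event */
	SKETCH_UNHANDLED, /* the real controller would hit LL_ASSERT(0) */
};

static enum sketch_result sketch_encode_data_ctrl(uint8_t opcode)
{
	switch (opcode) {
	case SKETCH_LL_UNKNOWN_RSP:
		return SKETCH_ENCODED;

	case SKETCH_LL_CONN_UPDATE_IND:
		/* Mirrors the workaround: warn and skip instead of asserting. */
		fprintf(stderr, "encode_data_ctrl: Skip CONN_UPDATE_IND\n");
		return SKETCH_SKIPPED;

	default:
		/* Log the opcode first so the failing case can be identified. */
		fprintf(stderr, "encode_data_ctrl: opcode:0x%x\n", opcode);
		return SKETCH_UNHANDLED;
	}
}
```

Without the CONN_UPDATE_IND case, opcode 0x00 falls through to the default branch, which is where the assert was observed.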

cc: @joerchan @carlescufi @cvinayak

@Olivier-ProGlove Olivier-ProGlove added the bug The issue is a bug, or the PR is fixing a bug label Apr 4, 2019
@carlescufi carlescufi assigned joerchan and unassigned carlescufi Apr 4, 2019
@carlescufi
Member

I believe that an upcoming patch from @joerchan will fix this as well.

@Olivier-ProGlove
Contributor Author

Here are the sniffer traces captured when the assert occurs (I removed my workaround to force the LL_ASSERT(0)): windows10-reconnection-assert.zip

I reproduced the issue twice to make sure the traces are consistent (and they appear to be):

  • reconnection-assert1.pcapng: First try
  • reconnection-assert2.pcapng: Second try

Screenshot from 2019-04-04 13-29-37

@nashif nashif added the priority: medium Medium impact/importance bug label Apr 4, 2019
joerchan added a commit to joerchan/zephyr that referenced this issue Apr 5, 2019
The fix done in zephyrproject-rtos#14938 introduced a later assert when raising an HCI
event for the procedure that was terminated during the procedure
collision handling. This assert happens because the unknown rsp
has information that is needed when raising the event.
Solve this by copying the scratch packet into the node buffer so that
we keep the data.

Fixes zephyrproject-rtos#15183

Signed-off-by: Joakim Andersson <[email protected]>
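
The idea in the commit message above (copy a temporary packet into the buffer that outlives it, so the data is still there when the event is raised) can be sketched generically as below. This is a rough illustration under stated assumptions, not the code from the fix: the struct layouts and all sketch_ names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified PDU layout used only for this sketch. */
struct sketch_pdu {
	uint8_t opcode;
	uint8_t payload[4];
};

/* Hypothetical node buffer that is later handed to the event encoder. */
struct sketch_node_rx {
	struct sketch_pdu pdu;
};

/* Copy the scratch PDU into the node buffer so that its fields
 * remain available after the scratch storage goes out of use.
 */
static void sketch_keep_scratch(struct sketch_node_rx *node,
				const struct sketch_pdu *scratch)
{
	memcpy(&node->pdu, scratch, sizeof(node->pdu));
}
```

After the copy, code that later reads the node buffer sees the scratch packet's opcode and payload rather than whatever the node held before.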
nashif pushed a commit that referenced this issue Apr 5, 2019