CONFIG_BT_HCI_TX_STACK_SIZE is too small #13585
Comments
I wonder whether the callbacks that report connection status, filter status, etc. are called from this thread. |
The definition is actually as follows:
So I suppose it's the |
@pdunaj can you please send a PR so that you set a reasonable value based on your measurements? |
Hi @carlescufi, sure. I will look for a sensible value tomorrow and check whether any callback can affect it.
Hi @jhedberg, I would actually prefer to increase it only when BT_CENTRAL is enabled. We have not seen any problem on the peripheral side. |
Btw, if the |
@pdunaj It'd be good to understand the root cause first. The reason this Kconfig option is not settable by the application is that the assumption has been that the app can't influence this stack's usage. Now it sounds like this might not be true, in which case we might want to revisit making this settable by the application. |
@pdunaj or do you mean that simply enabling BT_CENTRAL increases it (i.e. doesn't depend on the app doing central-related actions at runtime)? Meaning, something in the controller-side TX code-path changes when BT_CENTRAL is set? In that case it'd be justified to put the necessary logic in Kconfig (not that I'm particularly happy about the level of complexity this is growing to) |
@jhedberg I mean adding a line
First I plan to check whether any callback can be called from this function. I guess not, as I expect the RX side to handle it. Still, I don't know this stack, so I will have to take a look. |
The
When I enabled the shell, I also got a fault on the PRI RX stack. I expanded that one to 640 B, and even that does not seem too big. So the problem may be caused by increased RAM consumption due to log messages. |
I also tried to find out why these logs are enabled even though we select BT_DEBUG_NONE, and found that BT_DEBUG is selected by the BT_GATT_DM_DATA_PRINT option of the NCS discovery manager. If anything, this option should depend on !BT_DEBUG_NONE (any level except none), which effectively means depending on BT_DEBUG. @kapi-no I will add a change to NCS and see whether the stack problems go away when debug logs are disabled. There is still an open question of how to expand these stacks: if debug logs are enabled, the stacks will be too small and a crash will occur. We have the stack guard enabled, but anyone who does not will not notice the stack overflow. I think we should expand the stacks anyway, but make the increase depend on the BT_DEBUG option being set. |
I think it would be good to have someone from the log subsystem maintainers to look at this. One of the promises (or at least hopes) of moving log processing to a deferred thread was that the overhead we were suffering from due to printk would be gone, or at least drastically reduced. IIRC the problem with the new log subsystem is that it creates extra stack variables for every log call, so the more logs you have in a function the more stack it consumes. When I looked at this it seemed to me like the variables themselves (at least in the Bluetooth case) were identical in content for each c-file, so it should hopefully be possible to optimize this somehow. |
I removed the config option that caused selection of BT_DEBUG - logs are gone. Still I need to keep the stack size increased.
|
I think I found the culprit (at least for RX pri thread). When I get rid of stack analysis code (look for
It seems that this is caused by printing the value: when I comment out only the printk in stack_analysis, I get 216/608 stack usage, which is still low. |
I now obtain the unused stack size in the thread and report it with the logger instead of printk. Below is the stack usage:
So it seems we must get rid of printk or accept the fact that this thread will have to have bigger stack if CONFIG_INIT_STACKS is enabled. We can also remove code printing stack usage altogether (this info can be obtained from shell command). |
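For readers unfamiliar with how this kind of stack analysis arrives at a figure such as 216/608, here is a minimal, self-contained sketch of the watermark technique. It is not Zephyr's actual implementation: the function names are hypothetical, the 0xaa fill byte is an assumption made for this sketch, and the stack is assumed to grow downward, so the low end of the buffer is touched last.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fill pattern for this sketch; the real value is a kernel detail. */
#define STACK_FILL_BYTE 0xaa

/* Hypothetical helper: pre-fill the whole stack buffer with the pattern
 * before the thread starts running.
 */
static void stack_fill(uint8_t *stack, size_t size)
{
	memset(stack, STACK_FILL_BYTE, size);
}

/* Hypothetical helper: count bytes at the low end of the buffer that still
 * hold the fill pattern. With a downward-growing stack these bytes were
 * never touched, so the count is the unused stack size (high-water mark).
 */
static size_t stack_unused(const uint8_t *stack, size_t size)
{
	size_t unused = 0;

	while (unused < size && stack[unused] == STACK_FILL_BYTE) {
		unused++;
	}

	return unused;
}

/* Hypothetical helper: peak usage is everything above the untouched region. */
static size_t stack_used(const uint8_t *stack, size_t size)
{
	return size - stack_unused(stack, size);
}
```

The scan itself is cheap. As observed above, the cost comes from how the result is reported: formatting the number with printk runs on the stack being measured, whereas handing it to a deferred logger does not.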
I changed
I would still prefer to update the macro, as I see it used in various places in other BT modules. |
@pdunaj updating STACK_ANALYZE to use the logger sounds like a good approach to me. All such places in the code that do stack analysis exist mainly for apps without the ability to issue the "kernel stacks" shell command, so I'd consider making these places dependent on the kernel shell module (and whatever other options "kernel stacks" depends on). Anyway, that can be done as follow-up work. |
Change for stack analyze macro: |
There is still an open question of why the TX thread needs more RAM. I will try to find out and get back here. |
I think I am done with the investigation. Stack usage goes up by nearly 400 B when pairing completes. This happens as
Hi @nashif, this is not low, as it causes a stack overflow. People will not even notice it unless they have stack guards enabled. |
#13789 created for stack expansion |
Describe the bug
I have pca10059 configured as central (running ncs nrf_desktop). The device crashes during connection attempt with a peripheral.
CONFIG_BT_HCI_TX_STACK_SIZE is set to 640 B in Zephyr. If I extend it to 1024 B, the program works (I have not tried lower values yet). We have not seen this in the past. The difference is that, due to recent bugs, we now have the MPU stack guard enabled by default.
To Reproduce
Steps to reproduce the behavior:
Build nrf_desktop for pca10059 and try to connect to peripheral (DK working as mouse from the same project should work).
Try to connect and observe crash.
Expected behavior
Should not crash.
Impact
Showstopper.
Screenshots or console output
Environment (please complete the following information):
rev: a76f833 (ncs) from ec424b7 (zephyr)
Additional context
N/A