CONFIG_LOG_IMMEDIATE leads to unobvious faults in unrelated routines due to stack overflow #13897
Comments
|
Isn't this a duplicate of #13423? Enabling stack sentinel gives more specific output:
|
Or not, |
@rlubos, any hint on how to figure out which thread it is from its id "statically", e.g. by looking at zephyr.lst? I can't see a consistent way to do that, but by chance I see that 0x004020d8 in my stack trace is sysworkq, so I will try to increase its stack size. |
No, even using CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096 (up from the default 1K) doesn't help. And well, it's pretty clear from the quoted disassembly that something (logging? ;-) ) calls k_poll with some NULL param. |
Statically, I look into
But typically, when running on hardware, I prefer |
I see, so this boils down to the same infamous gnu ld misfeature, ignored for years: https://sourceware.org/bugzilla/show_bug.cgi?id=16566 |
It seems that the net_mgmt event thread stack is too small. With this setting Edit: @rlubos had already noticed the same fix, sorry for the noise, I missed some of the early discussion in this thread :) |
When |
@pfalcon can we close it? When I wonder if we shouldn't enable stack guard by default to avoid misinterpreted issues. |
@nordic-krch: Thanks for all the info and changes. I need to retest it, and then I will be able to close it. I remember that I have a few logging-related tickets to retest in my backlog. Thanks for your patience. |
Dropped prio, reassigned to myself in the meantime. |
Every bug needs a priority; if this is not a bug, either close it or track it as something else, please. Or better, close it as is without removing the priority (@nordic-krch should have done that with an explanation; @pfalcon retests and reopens if the bug is still present). |
Well, actually I wanted to send an RFC to the mailing list saying that we need a "waiting for verification" status for bugs. I skipped that because there are too many fronts of work open already, and I'm not even sure whether it should be a generic "waiting for feedback" status instead. And here it's not even that something was "fixed"; it's that the issue was explained. As @nordic-krch writes:
Deciding on things like that is what would amount to resolution of this issue.
I don't believe you propose to close unresolved bugs. And nope, I'm not a bug technician here to remember that there're some closed, but unverified bugs. But I definitely try to help with reporting bugs and managing them, that's why I update priorities, etc. |
Ok, so the current status is that with master 61bcd76 the issue still occurs (samples/net/sockets/echo_server with CONFIG_LOG_IMMEDIATE=y). So, something needs to be done about that.
@nordic-krch, when you say "stack guard", which Kconfig option exactly do you mean? |
Ok, I assume you meant CONFIG_STACK_SENTINEL=y . (I'm somewhat mixed up with all those canaries and sentinels.) |
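For readers who also mix up canaries and sentinels: the general idea behind a stack sentinel is a known marker word placed at the far end of a thread's stack and checked later. If the marker has been overwritten, the stack overflowed at some earlier point, which is why the resulting fault can surface in an unrelated routine. The following is a minimal, self-contained simulation in plain C, not Zephyr's implementation; the names `sim_stack`, `sim_stack_init`, `sim_stack_use`, `sim_stack_sentinel_intact` and the marker value are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker value and stack size, chosen only for this sketch. */
#define SENTINEL_VALUE 0xF0F0F0F0u
#define STACK_WORDS    64

/*
 * Simulated thread stack. The stack is assumed to grow downward, so
 * words[0] is the lowest address and the last word reached before the
 * stack runs out of room.
 */
struct sim_stack {
    uint32_t words[STACK_WORDS];
};

/* Clear the stack and plant the sentinel in the lowest word. */
static void sim_stack_init(struct sim_stack *s)
{
    memset(s->words, 0, sizeof(s->words));
    s->words[0] = SENTINEL_VALUE;
}

/*
 * Simulate a call chain that consumes `depth` words from the top of the
 * stack downward. A depth of STACK_WORDS or more reaches words[0] and
 * clobbers the sentinel. The write is clamped to the array so that the
 * simulation itself stays memory-safe.
 */
static void sim_stack_use(struct sim_stack *s, size_t depth)
{
    if (depth > STACK_WORDS) {
        depth = STACK_WORDS;
    }
    for (size_t i = 0; i < depth; i++) {
        s->words[STACK_WORDS - 1 - i] = 0xDEADBEEFu;
    }
}

/* Return 1 if the sentinel is still in place, 0 if it was overwritten. */
static int sim_stack_sentinel_intact(const struct sim_stack *s)
{
    return s->words[0] == SENTINEL_VALUE;
}
```

Calling `sim_stack_init()` followed by `sim_stack_use(&s, 10)` leaves the sentinel intact, while `sim_stack_use(&s, STACK_WORDS)` overwrites it. Note that the check only reports the damage after the fact; it does not stop the overflowing write itself.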
So, here's an example of changes which stem from looking into how to address this ticket: #14155 . As you can imagine, these are pretty "far" and partial changes. 3-4 (or maybe 5-6) more changes like that are required to call this ticket resolved. Sorry, but I don't have time to do all those 3-6 changes now; I'm already occupied with previous changes to make. So, this ticket is likely going to be open for a while. It's ok to move it into the 1.15 timeframe of course, once it's clear it doesn't fit into 1.14. Thanks. |
Are you sure stack overflow is the culprit? |
Looks like this is no longer a valid issue -> closing. |
Describe the bug
Running samples/net/sockets/echo_server with CONFIG_LOG_IMMEDIATE=y leads to a fault on startup and/or on first connect. E.g. for qemu_x86 it leads to a fault soon (~0.3s) after startup:
To Reproduce
Steps to reproduce the behavior:
0. master ddf744d
On frdm_k64f, the fault happens on the first connection instead.
Expected behavior
No faults.
Impact
Can't use logging comfortably (#11655), d'oh.
Screenshots or console output
See above.
Environment (please complete the following information):