Alleged multiple design and implementation issues with logging #11655
Ping to all active TSC members: @nashif, @galak, @carlescufi, @MaureenHelm, etc.
I'm developing a new ethernet driver, and for the last 2 weeks more or less every work session ended up hitting the wall with logging (see ticket refs above). I tried to explain why I submitted #10769 and #10770: they were early warning bells that something's not right with how the logging and shell subsystems were reimplemented in Zephyr.
When rewriting such baseline subsystems as logging and shell, the following should be a baseline requirement and acceptance criterion:
A new implementation of such subsystems should support all the features of, and by default work the same as, the original subsystem (unless the original was totally broken, and of course logging/shell in Zephyr wasn't).
Orthogonal to that, here's a baseline requirement for logging:
Logging should just work by default.
And well, old logging just worked. And the new one just DOES NOT, hence #11007, #11008, #11633, #11417, #10715, etc.
This is not to undermine the great work done by Nordic developers and @nordic-krch in particular. This is to raise before the TSC the question of change control and acceptance criteria handling in the project. Because these great changes should not have been accepted in the shape they went in. If new logging was supposed to replace old logging, it should have been developed and configured to mirror the functionality and behavior of the old logging. Any changes to that behavior should have gone in as separate changes, under a well rationed change control schedule (e.g. in a different Zephyr release from the initial rewrite). We consequently have 2 choices:
Actually, we won't have 6-12 months to enjoy the great new features of the new logging, because as soon as we enable a 16K logging buffer by default, half of the boards will fall down in the CI, starting with nRF51. So hopefully, we can go to choice A real soon now. I would like to raise these 2 topics:
Under the assumption that there is a clean layer of abstraction between the logging API and the buffering/backend, why not have a Kconfig which can give the user a choice whether logging should be buffered/deferred, or done synchronously. We won't have to impose a buffer size on small devices in that case. I think this is what CONFIG_LOG_INPLACE_PROCESS does? Also as a side note, I'd like to learn more why CONFIG_LOG_PROCESS_THREAD_PRIO is default -2 |
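The idea in the comment above, a build-time choice between deferred and in-place log processing, can be sketched in plain C. This is only an illustration under stated assumptions, not Zephyr's implementation: the macro DEMO_LOG_INPLACE and the demo_* names are hypothetical stand-ins for a Kconfig choice and the logging API. The point it shows is that the in-place build needs no message buffer at all.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical build-time switch standing in for a Kconfig choice:
 * 1 = process each message in place, 0 = defer it to a queue. */
#ifndef DEMO_LOG_INPLACE
#define DEMO_LOG_INPLACE 0
#endif

#if !DEMO_LOG_INPLACE
/* The queue exists only in the deferred build. */
#define DEMO_SLOTS   4
#define DEMO_MSG_MAX 48
static char demo_queue[DEMO_SLOTS][DEMO_MSG_MAX];
static unsigned demo_count;
#endif

/* Write out all queued messages; returns how many were written.
 * In the in-place build there is never anything queued. */
unsigned demo_log_process(FILE *out)
{
#if DEMO_LOG_INPLACE
    (void)out;
    return 0;
#else
    unsigned n = demo_count;
    for (unsigned i = 0; i < n; i++) {
        fprintf(out, "%s\n", demo_queue[i]);
    }
    demo_count = 0;
    return n;
#endif
}

/* Log one message. Returns 0 on success, -1 if the deferred queue is full. */
int demo_log(FILE *out, const char *msg)
{
    assert(msg != NULL);
#if DEMO_LOG_INPLACE
    fprintf(out, "%s\n", msg);   /* synchronous: no buffer needed */
    return 0;
#else
    (void)out;
    if (demo_count == DEMO_SLOTS) {
        return -1;               /* caller learns that the message did not fit */
    }
    snprintf(demo_queue[demo_count], DEMO_MSG_MAX, "%s", msg);
    demo_count++;
    return 0;
#endif
}
```

Building with -DDEMO_LOG_INPLACE=1 removes the queue entirely, which is the property small devices would care about; the deferred build trades that memory for a shorter logging call.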
Yes, the issue is probably the buffer size, but the size actually needed is hard to determine since it is dynamic: you enable debug level in one module and suddenly you need a buffer twice as big. We can improve reporting of such scenarios.
Well, that's you. When you have an embedded application that has time-critical parts, that is a must, and we have seen applications being developed that would suffer from the lack of deferred logging. You should also look further, since deferred logging allows logging not only to UART but also to flash or over the network. |
Exactly, so the only size we can use is a conservative one, and that's a lot (maybe not 16K of course, but say 5K isn't small either).
Thanks. That would indeed be the baseline measure as argued above. But I wonder if there's enough experience collected already to take measures beyond that, so I'm really keen to raise this question at this week's TSC. Hope I'll do it right, and hope you can be there.
Yep, it's me. But what I do is just what any programmer can/would/will do - enable logging for networking. I didn't even enable global networking logging, just a handful of modules with:
But nobody says "Let's remove deferred logging"! It's an absolutely great feature which should be at everyone's fingertips for cases when it's really needed. But I see the situation exactly as this: you guys needed a solution to debug realtime stuff like Bluetooth, but overoptimistically went ahead and made it the default Zephyr logging. Unfortunately, practice shows that it scales poorly to real-world Zephyr needs and doesn't integrate well (printk issues, etc.). I'm not sure about other subsystems, but networking is complex, multilayer code. It does produce a lot of logging, and that logging is required to debug issues due to the subsystem's complexity. Then it's imperative that it works for the programmer, not against one. And even if logging for other individual subsystems is "simple", Zephyr itself is a vast and complex collection and interconnection of subsystems, and there's nothing wrong with enabling logging for multiple subsystems, and then we get to the same problems which are already exposed with networking. |
Related: #11625 |
So, these issues were discussed at today's TSC meeting. The general response was that deferred logging is valuable and we want to "get it right", not give up on it. Fine, let's look into that. Current points then:
Ping to @pfl in particular based on the comments: I fully agree that we need to share such information and work on common ways to deal with it, rather than everyone painfully finding workarounds on their own later. That's the whole point of me raising this issue. Likewise, I agree that solutions need to be worked out and elaborated first; that's why, again, this is a ticket, primarily waiting for feedback from the logging maintainer on how to proceed further. |
Please close this and file specific bugs for each of these issues if bugs don't exist already. Having one bug open where you just have a litany of complaints (with a grandiose title like "LOGGING CAN'T BE TRUSTED") isn't terribly helpful in getting these issues prioritized and resolved. Thanks |
@andrewboie, not reading the description and previous comments and posting pundit-style comments isn't helpful either. Because there's a whole bunch of tickets already, and maybe it is indeed time to take "grandiose" measures and do something with logging with a bird's eye view of its issues in mind. Otherwise, if you've got time to comment around, can you please spend it on replying to exactly the specific tickets regarding stuff you did before, like #10499, #10911. Thanks too. |
I read the entire discussion.
Great!
No. These are bugs. They get solved. Take it to the mailing list or the TSC if you want to get on a soapbox. |
@andrewboie: Please don't close tickets opened by other people unless they're resolved. I don't know where manners like that come from, but you already showed your negligence of project technical issues in #5355 and a discouraging attitude towards people who report them. Please contain yourself and don't let that spread out across the entire project. If you ignore and don't care about various issues, let the other people who care work towards the solutions.
This is especially hilarious. Do you just want your own cozy entry in the latest Github hit, https://github.com/nikolas/github-drama , and that's why you behave like that? |
This was taken to the TSC, exactly because it's easy to get into "let's ignore and work around issues" behavior, and I'm quite along that way, as #11633 (comment) . And the biggest bug so far is "how did we get to such problems in the first place", and this is to what this ticket is dedicated. And replies like #11633 (comment) "if shell is used then it is more complex", shows that it's exactly the most important issue to pay attention to. We can resolve minor technical issues in a loop infinitely, never getting it right, because the problem is on the design and process level. So, this ticket is to track the progress of disillusionment regarding usage of deferred logging as the default setting. Or alternatively, to track the progress of finding the sweet spot of compromises with it, and me eating my hat, which I'm fine with - there's always something to learn, and witnessing a magic unicorn solution is fully worth it, if only resolving each of them which Zephyr has would be as easy as me eating a hat. |
@pfalcon your style of opening multiple issues and cross linking them over and over again and using the issue system for open discussions is confusing and irritating. This is an issue tracking system, not a forum or a mailing list. We need real bug reports with details on how they can be reproduced and other information related to a specific bug to help resolve the issues, not open-ended rants that can't be acted upon directly. I am already so lost in the number of issues and cross-linking that I (and probably others) am not able to focus on the real problems. This entry is not even marked as a bug, so I am not sure what you are expecting here. |
Really? ;-) I'm taking a bad example from guys like laperie who open multiple issues and cross link them, e.g. #7591 . As for "using the issue system for open discussions is confusing and irritating" - that's a hit. What else do you think issue systems are? For discussion of issues and finding ways to solve them.
In the description the code snippet is provided. If you need the exact setup to reproduce, I'm happy to provide one (it's based on #11680). Otherwise, I'm working with @nordic-krch and happy to provide any info and cooperate with him.
Well, for me the "action" is clear: we need to switch default logging to non-deferred, and this point is argued here. But you guys disagree, and that's apparently what turns it into a "rant". However, this is a technical/project management ticket, and I'd appreciate it being treated as such.
Unfortunately, that's because there's a number of interconnected problems. And to keep track of them and untangle them, we need exactly a number of properly cross-referenced tickets.
Marked as RFC, thanks. |
You lecturing other people about attitude is irony on a cosmic level. The amount of time you spent writing these walls of text would be better spent looking at the log code helping to fix it. Meanwhile, since you are clearly in a lot of agony about this whole situation, I leave you with the world's smallest violin, playing you a sad, sad, song: 🎻 |
That may end up an even worse waste of time if it turns out that I'd pull it in the opposite direction from where the maintainer and other people want it to go. Need to sync up on the action plan first, waiting for responses from @nordic-krch (he apparently multitasks a lot too; everyone works hard to get that proverbial LTS out and not be ashamed of it).
Thanks, appreciate it. |
The history unfolds: 2018-12-12 #12040 "Log messages are dropped even with huge log buffer" occurred. Was fixed, but: Now I'm back to hacking on my #11680, and here's what I'm greeted with:
Quick grep over subsys/logging/ doesn't show anything directly related to k_msgq, but it's definitely logging-related, because commenting a particular LOG_DBG makes that assert go away. And yeah, that LOG_DBG is in the func called from ISR, and yeah, I need it there ;-). So, this again should show skeptics among us why this issue exists, precisely titled as it is. It's not about a particular technical problem, it's about paradigmatic approach to the logging design. Particular issues can be fixed. Just to spawn new issues. The process may asymptotically subside. But just imagine that we're trying to reach value of "pi" using an asymptotic formula for "e". We'll never get there using that formula, no matter how hard we try. |
Please consider changing the title of this issue; the title does not look like an RFC, nor does it look like an enhancement. If this is a bug, it should be marked as such and prioritised to be fixed for the next release. |
I think that this could be closed now as issues are solved:
Right. So, I don't think this ticket will be resolved (note: not using the term "closed") until:
But the whole subject of this ticket is that such cases will reoccur again and again. Unfortunately, they reoccur rarely enough that it will take a long time to build up a consistent developer opinion regarding that way of working. |
I hope that will never happen. It seems then that it will stay open for a while. |
It's the same story again (#11008, #11417): logging doesn't log what's expected. Consider the following snippet from the ethernet driver ISR handler (i.e. it executes on packet reception), run in the context of samples/net/echo_server:
The output after startup, followed by the initial ping packet, is (sorry for a lot of extra debug output):
What we should pay attention to is:
eth_smsc9220_isr: 8 8) no debug logging occurs.
[00:00:17.896,378] <dbg> eth_smsc9220.eth_smsc9220_isr: out RX FIFO: ... . But per the code above, "out RX FIFO:" logging should occur after "in RX FIFO:", and the latter didn't occur.
It should be noted that on the next ping packet, the problem subsides, and there's "in RX FIFO:" followed by "out RX FIFO:".
Based on such a black-box analysis of logging behavior, we can make the following conjectures (as conjectures, they may be false):
But p.3 is an absolutely IMPOSSIBLE way to deal with the situation. At worst, it should store a message notifying the user that some output was lost (this is the worst solution, as such a message can itself be lost). Normally, it should switch to non-deferred mode. Under no condition should logging be dropped silently as a default option.
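To make the "switch to non-deferred mode" option concrete, here is a minimal sketch in plain C of a deferred queue that never drops silently: when the queue is full it drains the older messages first (so ordering such as "in RX FIFO:" before "out RX FIFO:" is preserved), emits the new message synchronously, and counts the event. All sketch_* names and sizes are hypothetical, this is not how Zephyr's logger is implemented, and the sketch ignores locking and whether synchronous output is acceptable in ISR context.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_SLOTS   2   /* deliberately tiny so overflow is easy to trigger */
#define SKETCH_MSG_MAX 64

struct sketch_log {
    char slots[SKETCH_SLOTS][SKETCH_MSG_MAX];
    unsigned head;        /* index of the oldest queued message */
    unsigned count;       /* number of queued messages */
    unsigned overflowed;  /* messages that did not fit in the queue */
    FILE *out;
};

void sketch_log_init(struct sketch_log *log, FILE *out)
{
    memset(log, 0, sizeof(*log));
    log->out = out;
}

/* Write queued messages oldest first; returns how many were written. */
unsigned sketch_log_flush(struct sketch_log *log)
{
    unsigned flushed = log->count;

    while (log->count > 0) {
        fprintf(log->out, "%s\n", log->slots[log->head]);
        log->head = (log->head + 1) % SKETCH_SLOTS;
        log->count--;
    }
    return flushed;
}

/* Returns 1 if the message was deferred, 0 if it was emitted synchronously. */
int sketch_log_put(struct sketch_log *log, const char *msg)
{
    assert(log != NULL && msg != NULL);

    if (log->count == SKETCH_SLOTS) {
        /* Queue full: fall back to synchronous output instead of dropping. */
        log->overflowed++;
        sketch_log_flush(log);
        fprintf(log->out, "%s\n", msg);
        return 0;
    }

    unsigned tail = (log->head + log->count) % SKETCH_SLOTS;
    snprintf(log->slots[tail], SKETCH_MSG_MAX, "%s", msg);
    log->count++;
    return 1;
}
```

The overflowed counter is what a "some output was delayed or lost" report could be built on; the design choice here is that overflow costs latency at the call site rather than information.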