Skip to content

[RFC] logging: Default to redirecting printk() through it #10715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

carlescufi
Copy link
Member

Default to redirecting the printk() calls through the logging subsystem.
This has 2 advantages:

  • Removes potential race conditions between printk() and the logger
    backend usage of the UART or any other backend
  • Printsk the printk() statements in-order with the rest of the
    logs, since now by default printk() messages are deferred

Disadvantages:

  • printk() statements are limited in length
  • Additional memory consumption by default

Signed-off-by: Carles Cufi [email protected]

@carlescufi carlescufi added the RFC Request For Comments: want input from the community label Oct 19, 2018
@@ -185,6 +185,7 @@ config LOG_MAX_LEVEL

config LOG_PRINTK
bool "Enable processing of printk messages."
default y
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will ugly clash with shell when boot banner is enabled :/.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it shouldn't. That has been fixed with #10510

@pfalcon
Copy link
Collaborator

pfalcon commented Oct 19, 2018

I didn't yet even looked into details of new logging (i.e. not enabled it "for real" to debug some issue), but already see artifacts due to its buffering nature: lack of output in case of crash and:

Printsk the printk() statements in-order with the rest of the
logs

this, yes.

But for me this "solution" looks like a workaround of deficiencies of new logging (disclaimer: not trying to diminish "efficiencies" of it, just again, I haven't had a chance to witness them yet in real life). Instead, there should be an easy to access switch to make logging synchronous and it should be spammily advertized, so no developer had a chance to miss it.

So, informally I'm -1, and happy to move this into formal review. (Just trying to avoid knee-jerk reactions re: new things introduced). At the very least, it's too early for such changes yet, IMHO, there should be more time left to let people "feel" thru it.

@codecov-io
Copy link

codecov-io commented Oct 19, 2018

Codecov Report

Merging #10715 into master will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10715      +/-   ##
==========================================
+ Coverage   53.23%   53.25%   +0.01%     
==========================================
  Files         214      214              
  Lines       25831    25831              
  Branches     5697     5697              
==========================================
+ Hits        13752    13757       +5     
+ Misses       9762     9759       -3     
+ Partials     2317     2315       -2
Impacted Files Coverage Δ
misc/printk.c 91.56% <0%> (+3.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6973c6a...d32dc5a. Read the comment docs.

Copy link
Collaborator

@pfalcon pfalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, as there're too many +1's now, let me call for more consideration (== time) on this one.

@nordic-krch
Copy link
Collaborator

@pfalcon please note that with #10098 logger will flush all buffered logs in case of captured error so there are few cases when you won't see the log:

  • stop on breakpoint. In that case what i do if i need log is to temporarly put LOG_PANIC() before breakpoint line
  • error/reset which is not caputred by sw (e.g. power loss)

@pfalcon
Copy link
Collaborator

pfalcon commented Oct 19, 2018

Please show us screen captures of samples/hello_world before and after the change.

(And may I say that such information should go into the commit message (of course, if there's interest in the quicker turnaround on the matter, otherwise can indeed wait until 10-20 people try themselves and form an opinion)).

@carlescufi
Copy link
Member Author

carlescufi commented Oct 19, 2018

Please show us screen captures of samples/hello_world before and after the change.

(And may I say that such information should go into the commit message (of course, if there's interest in the quicker turnaround on the matter, otherwise can indeed wait until 10-20 people try themselves and form an opinion)).

hello_world does not change at all its output with this PR. The difference is when there's intermingled messages, in which case this PR fixes the order in which the output messages are printed:

Without this PR

***** Booting Zephyr OS zephyr-v1.13.0-1191-gfbe43203b2 *****
Bluetooth initialized
Advertising successfully started
[00:00:00.007,629] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.007,659] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.007,659] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 1.13 Build 99
[00:00:00.007,904] <wrn> bt_hci_core: bt_init: No ID address. Expecting one to come from storage.
[00:00:00.013,336] <inf> bt_hci_core: Identity: fb:d0:38:54:d1:7a (random)
[00:00:00.013,336] <inf> bt_hci_core: HCI: version 5.0 (0x09) revision 0x0000, manufacturer 0x05f1
[00:00:00.013,336] <inf> bt_hci_core: LMP: version 5.0 (0x09) subver 0xffff

With this PR

***** Booting Zephyr OS zephyr-v1.13.0-1191-gfbe43203b2 *****
[00:00:00.007,720] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.007,720] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.007,751] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 1.13 Build 99
[00:00:00.007,995] <wrn> bt_hci_core: bt_init: No ID address. Expecting one to come from storage.
Bluetooth initialized
[00:00:00.011,383] <inf> bt_hci_core: Identity: fb:d0:38:54:d1:7a (random)
[00:00:00.011,413] <inf> bt_hci_core: HCI: version 5.0 (0x09) revision 0x0000, manufacturer 0x05f1
[00:00:00.011,413] <inf> bt_hci_core: LMP: version 5.0 (0x09) subver 0xffff
Advertising successfully started

@pfalcon
Copy link
Collaborator

pfalcon commented Oct 19, 2018

hello_world does not change at all its output with this PR.

Wait, do you want to say that "redirecting thru logging" doesn't mean that usual logging message envelope will be added to it? Because for me, that's what phrase "redirect thru logging" means, and then again, the commit message could be improved to avoid confusion.

Default to redirecting the printk() calls through the logging subsystem
using the special logginng log_printk() function that encapsulates raw
strings into buffers and prints them as-is, without any of the logger
additional metadata.

This has 2 advantages:

* Removes potential race conditions between printk() and the logger
backend usage of the UART or any other backend
* Printsk the printk() statements in-order with the rest of the
logs, since now by default printk() messages are deferred

Disadvantages:

* printk() statements are limited in length
* Additional memory consumption by default

Signed-off-by: Carles Cufi <[email protected]>
@carlescufi
Copy link
Member Author

hello_world does not change at all its output with this PR.

Wait, do you want to say that "redirecting thru logging" doesn't mean that usual logging message envelope will be added to it? Because for me, that's what phrase "redirect thru logging" means, and then again, the commit message could be improved to avoid confusion.

That's right, I edited the commit message to clarify this.

@carlescufi
Copy link
Member Author

carlescufi commented Oct 19, 2018

@aescolar the native_posix build of tests/kernel/common and tests/posix/common is failing. Are you somehow synchronizing on the state of printk() output and therefore failing because this is now deferred?

@carlescufi carlescufi requested a review from aescolar October 19, 2018 11:45
@aescolar
Copy link
Member

@carlescufi : regarding tests/posix/common see #10508

@aescolar
Copy link
Member

@carlescufi : regarding tests/kernel/common it seems it is failing like in all the other platforms.

Since the printk test is verifying the standard printk functionality,
ensure that printk is not being redirected to the logging subsystem.

Signed-off-by: Carles Cufi <[email protected]>
@carlescufi
Copy link
Member Author

@carlescufi : regarding tests/posix/common see #10508

ACK, thanks!

@carlescufi : regarding tests/kernel/common it seems it is failing like in all the other platforms.

Oops, my mistake. Pushed a fix. Thanks!

Copy link
Collaborator

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds awful to me. Given that the logging subsystem is up and available now, the ONLY remaining use should be for synchronous debug logging, right?

And making printk asynchronous and forcing it to depend on another rather large subsystem is absolutely a hardship for debugging.

I mean, I need to get printk output from interrupt handlers. I need it to work in early boot. I need to see it synchronously so I can debug SMP ordering issues. If that doesn't work by default then I'll be constantly turning this feature off and complaining when support for "traditional" printk bitrots, as it surely will.

I guess the meat of my question is: who are the printk() users who will actually benefit from this change?

@carlescufi
Copy link
Member Author

carlescufi commented Oct 19, 2018

This sounds awful to me. Given that the logging subsystem is up and available now, the ONLY remaining use should be for synchronous debug logging, right?

The only remaining use of printk() is synchronous debug logging and also providing a simple way of outputting strings to a serial backend without additional code (the logging subsystem requires defining a module AFAIK).

And making printk asynchronous and forcing it to depend on another rather large subsystem is absolutely a hardship for debugging.

That is not what we are doing though. We are only making it asynchronous when the users enable CONFIG_LOG. Without CONFIG_LOG enabled, printk remains exactly as it was. So you only need to disable CONFIG_LOG (which is not enabled by default btw).

I mean, I need to get printk output from interrupt handlers. I need it to work in early boot. I need to see it synchronously so I can debug SMP ordering issues. If that doesn't work by default then I'll be constantly turning this feature off and complaining when support for "traditional" printk bitrots, as it surely will.

I guess the meat of my question is: who are the printk() users who will actually benefit from this change?

Most samples use printk(). When using synchronous printk() with asynchronous LOG(), the output is not shown in the right order. Synchronous printk() in ISRs typically yields to non-functioning hardware because of delays introduced. So I'd say most users that use samples as a starting point would benefit.

@carlescufi
Copy link
Member Author

@nordic-krch @jarz-nordic @andyross @nashif @jhedberg

That said, @andyross has a point, so there could be an alternative solution. Instead of doing this, we could do the following:

  • Create a new default logger that the app/sample can LOG() from directly without any extra additions, enabled by default
  • Replace the calls to printk with LOG() in all samples
  • Show the usage of printk() as a synchronous logging mechanism in a new sample
  • Keep CONFIG_LOG_PRINTK=n as of today

Thoughts?

@andrewboie
Copy link
Contributor

does this work for fatal CPU exceptions?

@nordic-krch
Copy link
Collaborator

@andyross logger is intended to be used in interrupt handlers and actually it's much more convenient there since it does not block interrupt for significant amount of time (like synchronous printk will do). It takes <5us to log a message while it will take >200us when synchronous approach is used. Logger is ready from the start (initiliazed as early as possible in _Cstart). Ordering is also ensured since messages are queued sequentially and timestamp is added.

Logger supports multiple backends and each backend can have different filter settings so in future we can add something like dmesg where messages from kernel would be buffered in ram and flushed on shell command.

Another thing is that when printk goes through logger, output is multiplexed on message basis. If printk goes directly to the output it will compete with the logger, or shell or any other user.

@andrewboie
Logger supports panic mode. Once panic mode is triggered logger switches all backends to synchronous mode and starts working in synchronous mode. #10098 added triggering panic modes to fault handlers.

@andyross
Copy link
Collaborator

Having fast logging doesn't do any good if I never see the message. All I'm saying is that to do my job I absolutely need a trivially easy synchronous output routine I can use anywhere. If that's not going to be printk() then what is it? If it's a non-default mode of the logging subsystem, are we testing it to make sure it always works?

@carlescufi
Copy link
Member Author

Since there is no consensus here I will instead send PRs to switch the samples to using LOG() instead of printk(), which probably makes sense anyway.

@carlescufi carlescufi closed this Oct 30, 2018
@pfalcon
Copy link
Collaborator

pfalcon commented Oct 30, 2018

@carlescufi : Well, it's hard to refrain from comment that road to far and roundabout is paved with intentions. We've got so advanced, so interstellar logging ... that now we have to try hard to find a way to work it around.

using LOG() instead of printk(), which probably makes sense anyway.

... And while doing so, we're trying to hog the notion of "console", first by shell, now by logging, round and round. Dare I say what I've said before #8168: the only thing we needed in is interrupt-driven, not thread-driven logging, with clear way to make it fully blocking, as it was before This is in turn addressed by #10765.

But surely, let's keep working it around. I don't see that converge to anything though, until logging is scaled down to work in down-to-earth mode.

@carlescufi
Copy link
Member Author

@pfalcon not sure I follow the irony-charged statements. The logger might not be ideal, but it's functional and it certainly works better than what we had before.
You say we only needed interrupt-driven logging, but the problem was not only that it was blocking, but also the cycles it took to format the string. That's why we use deferred logging, to be able to log anywhere without impacting the timing too much.

@pfalcon
Copy link
Collaborator

pfalcon commented Oct 30, 2018

The logger might not be ideal, but it's functional and it certainly works better than what we had before.
You say we only needed interrupt-driven logging, but the problem was not only that it was blocking, but also the cycles it took to format the string. That's why we use deferred logging, to be able to log anywhere without impacting the timing too much.

Logger is way too good for task which most people have at hand. You know rsyslog? Like, there was normal syslogd, but then they wrote rsyslog. You probably can't run 1024 CPU cluster without it. Not sure anybody need it at home. Or some advanced NoSQL database - not sure about Redis, but MongoDB just must have it. But embedded system? Well, nice niche feature which may be needed sometimes.

That's why we use deferred logging, to be able to log anywhere without impacting the timing too much.

Yes, that's why we use deferred logging. But turns out, deferred logging doesn't play well with non-deferred output. Our solution? Defer output! Just imagine, what is printed as normal output by one thread, will be output by another thread. Nobody else has that, but Zephyr will. There's no irony here - it's plain-word description of the situation we find ourselves in.

@pfalcon
Copy link
Collaborator

pfalcon commented Oct 30, 2018

Back to the contents of this PR - I'm debugging a case when on driver initialization it gets bombarded with spurious IRQs. Bug in QEMU, or it's dawning on me that maybe IRQ polarity is wrong. ISR has printk() to have insight into that stuff. As expected, I get absolutely no output with this patch - an unacknowledged interrupt on ARM is effectively an infinite loop. The solution would seem simple with CONFIG_LOG_PRINTK=n, but who knows about it? The commit message in this PR doesn't even mention it, but to keep everyone in the know, that's far from enough. I picture monthly automated FAQ posting to the mailing list along the lines of:

Q: I'm pulling my hair for 2nd day - my stuff just locks up with no output.
A: Jut try CONFIG_LOG_PRINTK=n.

But seriously, let's indeed keep things how they were with printk. I appreciate this issue being closed.

@nordic-krch
Copy link
Collaborator

@andyross

Having fast logging doesn't do any good if I never see the message. All I'm saying is that to do my job I absolutely need a trivially easy synchronous output routine I can use anywhere.

Have you been experiencing any issues with new deferred logging? We've been using same architecture for some time and it worked well. There is CONFIG_LOG_INPLACE_PROCESS option which handles logs synchronously. Currently i see following cases when you will not see latest logs:

  • breakpoint
  • sudden death (e.g power loss, pin reset)
  • fault before logger is initialized
  • log thread not being executed.

For each case there is countermeasure you can apply (except sudden death).

@nordic-krch
Copy link
Collaborator

@pfalcon

Logger is way too good for task which most people have at hand.

I must disagree. When we want to log in interrupts, log to different backends with independent filtering, log from multicores and we have kb of rom and ram and mHz not gHz we need something different than simple printk. You are mainly focusing on the cost of deferred logging forgetting benefits. I heard that using sys_log couldn't be used in usb, i've seen that net ping was decreased hundred times by switching to new logger (#8816 (comment)) and i have seen how we benefited in nordic sdk by switching to deferred logging.

I think we should focus on closing those gaps where possible (like logging in initial phase).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request For Comments: want input from the community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants