[RFC] logging: Default to redirecting printk() through it #10715

carlescufi · 2018-10-19T10:25:25Z

Default to redirecting the printk() calls through the logging subsystem.
This has 2 advantages:

Removes potential race conditions between printk() and the logger
backend usage of the UART or any other backend
Printsk the printk() statements in-order with the rest of the
logs, since now by default printk() messages are deferred

Disadvantages:

printk() statements are limited in length
Additional memory consumption by default

Signed-off-by: Carles Cufi [email protected]

jakub-uC · 2018-10-19T10:45:28Z

subsys/logging/Kconfig

@@ -185,6 +185,7 @@ config LOG_MAX_LEVEL

 config LOG_PRINTK
 	bool "Enable processing of printk messages."
+	default y


This will ugly clash with shell when boot banner is enabled :/.

it shouldn't. That has been fixed with #10510

pfalcon · 2018-10-19T10:55:00Z

I didn't yet even looked into details of new logging (i.e. not enabled it "for real" to debug some issue), but already see artifacts due to its buffering nature: lack of output in case of crash and:

Printsk the printk() statements in-order with the rest of the
logs

this, yes.

But for me this "solution" looks like a workaround of deficiencies of new logging (disclaimer: not trying to diminish "efficiencies" of it, just again, I haven't had a chance to witness them yet in real life). Instead, there should be an easy to access switch to make logging synchronous and it should be spammily advertized, so no developer had a chance to miss it.

So, informally I'm -1, and happy to move this into formal review. (Just trying to avoid knee-jerk reactions re: new things introduced). At the very least, it's too early for such changes yet, IMHO, there should be more time left to let people "feel" thru it.

codecov-io · 2018-10-19T10:56:39Z

Codecov Report

Merging #10715 into master will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #10715      +/-   ##
==========================================
+ Coverage   53.23%   53.25%   +0.01%     
==========================================
  Files         214      214              
  Lines       25831    25831              
  Branches     5697     5697              
==========================================
+ Hits        13752    13757       +5     
+ Misses       9762     9759       -3     
+ Partials     2317     2315       -2

Impacted Files	Coverage Δ
misc/printk.c	`91.56% <0%> (+3.01%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6973c6a...d32dc5a. Read the comment docs.

pfalcon

Ok, as there're too many +1's now, let me call for more consideration (== time) on this one.

nordic-krch · 2018-10-19T11:00:35Z

@pfalcon please note that with #10098 logger will flush all buffered logs in case of captured error so there are few cases when you won't see the log:

stop on breakpoint. In that case what i do if i need log is to temporarly put LOG_PANIC() before breakpoint line
error/reset which is not caputred by sw (e.g. power loss)

pfalcon · 2018-10-19T11:02:20Z

Please show us screen captures of samples/hello_world before and after the change.

(And may I say that such information should go into the commit message (of course, if there's interest in the quicker turnaround on the matter, otherwise can indeed wait until 10-20 people try themselves and form an opinion)).

carlescufi · 2018-10-19T11:09:12Z

Please show us screen captures of samples/hello_world before and after the change.

(And may I say that such information should go into the commit message (of course, if there's interest in the quicker turnaround on the matter, otherwise can indeed wait until 10-20 people try themselves and form an opinion)).

hello_world does not change at all its output with this PR. The difference is when there's intermingled messages, in which case this PR fixes the order in which the output messages are printed:

Without this PR

***** Booting Zephyr OS zephyr-v1.13.0-1191-gfbe43203b2 *****
Bluetooth initialized
Advertising successfully started
[00:00:00.007,629] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.007,659] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.007,659] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 1.13 Build 99
[00:00:00.007,904] <wrn> bt_hci_core: bt_init: No ID address. Expecting one to come from storage.
[00:00:00.013,336] <inf> bt_hci_core: Identity: fb:d0:38:54:d1:7a (random)
[00:00:00.013,336] <inf> bt_hci_core: HCI: version 5.0 (0x09) revision 0x0000, manufacturer 0x05f1
[00:00:00.013,336] <inf> bt_hci_core: LMP: version 5.0 (0x09) subver 0xffff

With this PR

***** Booting Zephyr OS zephyr-v1.13.0-1191-gfbe43203b2 *****
[00:00:00.007,720] <inf> bt_hci_core: HW Platform: Nordic Semiconductor (0x0002)
[00:00:00.007,720] <inf> bt_hci_core: HW Variant: nRF52x (0x0002)
[00:00:00.007,751] <inf> bt_hci_core: Firmware: Standard Bluetooth controller (0x00) Version 1.13 Build 99
[00:00:00.007,995] <wrn> bt_hci_core: bt_init: No ID address. Expecting one to come from storage.
Bluetooth initialized
[00:00:00.011,383] <inf> bt_hci_core: Identity: fb:d0:38:54:d1:7a (random)
[00:00:00.011,413] <inf> bt_hci_core: HCI: version 5.0 (0x09) revision 0x0000, manufacturer 0x05f1
[00:00:00.011,413] <inf> bt_hci_core: LMP: version 5.0 (0x09) subver 0xffff
Advertising successfully started

pfalcon · 2018-10-19T11:15:45Z

hello_world does not change at all its output with this PR.

Wait, do you want to say that "redirecting thru logging" doesn't mean that usual logging message envelope will be added to it? Because for me, that's what phrase "redirect thru logging" means, and then again, the commit message could be improved to avoid confusion.

Default to redirecting the printk() calls through the logging subsystem using the special logginng log_printk() function that encapsulates raw strings into buffers and prints them as-is, without any of the logger additional metadata. This has 2 advantages: * Removes potential race conditions between printk() and the logger backend usage of the UART or any other backend * Printsk the printk() statements in-order with the rest of the logs, since now by default printk() messages are deferred Disadvantages: * printk() statements are limited in length * Additional memory consumption by default Signed-off-by: Carles Cufi <[email protected]>

carlescufi · 2018-10-19T11:19:25Z

hello_world does not change at all its output with this PR.

Wait, do you want to say that "redirecting thru logging" doesn't mean that usual logging message envelope will be added to it? Because for me, that's what phrase "redirect thru logging" means, and then again, the commit message could be improved to avoid confusion.

That's right, I edited the commit message to clarify this.

carlescufi · 2018-10-19T11:28:23Z

@aescolar the native_posix build of tests/kernel/common and tests/posix/common is failing. Are you somehow synchronizing on the state of printk() output and therefore failing because this is now deferred?

aescolar · 2018-10-19T12:03:14Z

@carlescufi : regarding tests/posix/common see #10508

aescolar · 2018-10-19T12:15:34Z

@carlescufi : regarding tests/kernel/common it seems it is failing like in all the other platforms.

Since the printk test is verifying the standard printk functionality, ensure that printk is not being redirected to the logging subsystem. Signed-off-by: Carles Cufi <[email protected]>

carlescufi · 2018-10-19T13:16:27Z

@carlescufi : regarding tests/posix/common see #10508

ACK, thanks!

@carlescufi : regarding tests/kernel/common it seems it is failing like in all the other platforms.

Oops, my mistake. Pushed a fix. Thanks!

andyross

This sounds awful to me. Given that the logging subsystem is up and available now, the ONLY remaining use should be for synchronous debug logging, right?

And making printk asynchronous and forcing it to depend on another rather large subsystem is absolutely a hardship for debugging.

I mean, I need to get printk output from interrupt handlers. I need it to work in early boot. I need to see it synchronously so I can debug SMP ordering issues. If that doesn't work by default then I'll be constantly turning this feature off and complaining when support for "traditional" printk bitrots, as it surely will.

I guess the meat of my question is: who are the printk() users who will actually benefit from this change?

carlescufi · 2018-10-19T14:20:47Z

This sounds awful to me. Given that the logging subsystem is up and available now, the ONLY remaining use should be for synchronous debug logging, right?

The only remaining use of printk() is synchronous debug logging and also providing a simple way of outputting strings to a serial backend without additional code (the logging subsystem requires defining a module AFAIK).

And making printk asynchronous and forcing it to depend on another rather large subsystem is absolutely a hardship for debugging.

That is not what we are doing though. We are only making it asynchronous when the users enable CONFIG_LOG. Without CONFIG_LOG enabled, printk remains exactly as it was. So you only need to disable CONFIG_LOG (which is not enabled by default btw).

I mean, I need to get printk output from interrupt handlers. I need it to work in early boot. I need to see it synchronously so I can debug SMP ordering issues. If that doesn't work by default then I'll be constantly turning this feature off and complaining when support for "traditional" printk bitrots, as it surely will.

I guess the meat of my question is: who are the printk() users who will actually benefit from this change?

Most samples use printk(). When using synchronous printk() with asynchronous LOG(), the output is not shown in the right order. Synchronous printk() in ISRs typically yields to non-functioning hardware because of delays introduced. So I'd say most users that use samples as a starting point would benefit.

carlescufi · 2018-10-19T14:33:27Z

@nordic-krch @jarz-nordic @andyross @nashif @jhedberg

That said, @andyross has a point, so there could be an alternative solution. Instead of doing this, we could do the following:

Create a new default logger that the app/sample can LOG() from directly without any extra additions, enabled by default
Replace the calls to printk with LOG() in all samples
Show the usage of printk() as a synchronous logging mechanism in a new sample
Keep CONFIG_LOG_PRINTK=n as of today

Thoughts?

andrewboie · 2018-10-20T05:31:37Z

does this work for fatal CPU exceptions?

nordic-krch · 2018-10-24T06:51:11Z

@andyross logger is intended to be used in interrupt handlers and actually it's much more convenient there since it does not block interrupt for significant amount of time (like synchronous printk will do). It takes <5us to log a message while it will take >200us when synchronous approach is used. Logger is ready from the start (initiliazed as early as possible in _Cstart). Ordering is also ensured since messages are queued sequentially and timestamp is added.

Logger supports multiple backends and each backend can have different filter settings so in future we can add something like dmesg where messages from kernel would be buffered in ram and flushed on shell command.

Another thing is that when printk goes through logger, output is multiplexed on message basis. If printk goes directly to the output it will compete with the logger, or shell or any other user.

@andrewboie
Logger supports panic mode. Once panic mode is triggered logger switches all backends to synchronous mode and starts working in synchronous mode. #10098 added triggering panic modes to fault handlers.

andyross · 2018-10-24T16:01:04Z

Having fast logging doesn't do any good if I never see the message. All I'm saying is that to do my job I absolutely need a trivially easy synchronous output routine I can use anywhere. If that's not going to be printk() then what is it? If it's a non-default mode of the logging subsystem, are we testing it to make sure it always works?

carlescufi · 2018-10-30T17:29:30Z

Since there is no consensus here I will instead send PRs to switch the samples to using LOG() instead of printk(), which probably makes sense anyway.

pfalcon · 2018-10-30T18:11:10Z

@carlescufi : Well, it's hard to refrain from comment that road to far and roundabout is paved with intentions. We've got so advanced, so interstellar logging ... that now we have to try hard to find a way to work it around.

using LOG() instead of printk(), which probably makes sense anyway.

... And while doing so, we're trying to hog the notion of "console", first by shell, now by logging, round and round. Dare I say what I've said before #8168: the only thing we needed in is interrupt-driven, not thread-driven logging, with clear way to make it fully blocking, as it was before This is in turn addressed by #10765.

But surely, let's keep working it around. I don't see that converge to anything though, until logging is scaled down to work in down-to-earth mode.

carlescufi · 2018-10-30T18:36:40Z

@pfalcon not sure I follow the irony-charged statements. The logger might not be ideal, but it's functional and it certainly works better than what we had before.
You say we only needed interrupt-driven logging, but the problem was not only that it was blocking, but also the cycles it took to format the string. That's why we use deferred logging, to be able to log anywhere without impacting the timing too much.

pfalcon · 2018-10-30T18:51:29Z

The logger might not be ideal, but it's functional and it certainly works better than what we had before.
You say we only needed interrupt-driven logging, but the problem was not only that it was blocking, but also the cycles it took to format the string. That's why we use deferred logging, to be able to log anywhere without impacting the timing too much.

Logger is way too good for task which most people have at hand. You know rsyslog? Like, there was normal syslogd, but then they wrote rsyslog. You probably can't run 1024 CPU cluster without it. Not sure anybody need it at home. Or some advanced NoSQL database - not sure about Redis, but MongoDB just must have it. But embedded system? Well, nice niche feature which may be needed sometimes.

That's why we use deferred logging, to be able to log anywhere without impacting the timing too much.

Yes, that's why we use deferred logging. But turns out, deferred logging doesn't play well with non-deferred output. Our solution? Defer output! Just imagine, what is printed as normal output by one thread, will be output by another thread. Nobody else has that, but Zephyr will. There's no irony here - it's plain-word description of the situation we find ourselves in.

pfalcon · 2018-10-30T20:21:10Z

Back to the contents of this PR - I'm debugging a case when on driver initialization it gets bombarded with spurious IRQs. Bug in QEMU, or it's dawning on me that maybe IRQ polarity is wrong. ISR has printk() to have insight into that stuff. As expected, I get absolutely no output with this patch - an unacknowledged interrupt on ARM is effectively an infinite loop. The solution would seem simple with CONFIG_LOG_PRINTK=n, but who knows about it? The commit message in this PR doesn't even mention it, but to keep everyone in the know, that's far from enough. I picture monthly automated FAQ posting to the mailing list along the lines of:

Q: I'm pulling my hair for 2nd day - my stuff just locks up with no output.
A: Jut try CONFIG_LOG_PRINTK=n.

But seriously, let's indeed keep things how they were with printk. I appreciate this issue being closed.

nordic-krch · 2018-10-31T07:41:36Z

@andyross

Having fast logging doesn't do any good if I never see the message. All I'm saying is that to do my job I absolutely need a trivially easy synchronous output routine I can use anywhere.

Have you been experiencing any issues with new deferred logging? We've been using same architecture for some time and it worked well. There is CONFIG_LOG_INPLACE_PROCESS option which handles logs synchronously. Currently i see following cases when you will not see latest logs:

breakpoint
sudden death (e.g power loss, pin reset)
fault before logger is initialized
log thread not being executed.

For each case there is countermeasure you can apply (except sudden death).

nordic-krch · 2018-10-31T07:58:20Z

@pfalcon

Logger is way too good for task which most people have at hand.

I must disagree. When we want to log in interrupts, log to different backends with independent filtering, log from multicores and we have kb of rom and ram and mHz not gHz we need something different than simple printk. You are mainly focusing on the cost of deferred logging forgetting benefits. I heard that using sys_log couldn't be used in usb, i've seen that net ping was decreased hundred times by switching to new logger (#8816 (comment)) and i have seen how we benefited in nordic sdk by switching to deferred logging.

I think we should focus on closing those gaps where possible (like logging in initial phase).

carlescufi requested a review from nordic-krch as a code owner October 19, 2018 10:25

carlescufi requested review from jakub-uC, jhedberg, nashif and pfalcon October 19, 2018 10:25

carlescufi added the RFC Request For Comments: want input from the community label Oct 19, 2018

jakub-uC reviewed Oct 19, 2018

View reviewed changes

jakub-uC approved these changes Oct 19, 2018

View reviewed changes

nordic-krch approved these changes Oct 19, 2018

View reviewed changes

pfalcon requested changes Oct 19, 2018

View reviewed changes

carlescufi force-pushed the log-printk-default branch from f777831 to d32dc5a Compare October 19, 2018 11:19

carlescufi requested a review from aescolar October 19, 2018 11:45

tests: kernel: printk: Disable deferred printk handling

876dcdd

Since the printk test is verifying the standard printk functionality, ensure that printk is not being redirected to the logging subsystem. Signed-off-by: Carles Cufi <[email protected]>

carlescufi requested review from andrewboie, andyross, nniranjhana, pswarnak and spoorthik as code owners October 19, 2018 13:15

andyross reviewed Oct 19, 2018

View reviewed changes

carlescufi closed this Oct 30, 2018

This was referenced Nov 26, 2018

Alleged multiple design and implementation issues with logging #11655

Closed

logging: CONFIG_LOG_INPLACE_PROCESS is not synchronous #11633

Closed

pfalcon mentioned this pull request Jun 24, 2020

libc: newlib: assert: Override handler for C std assert #26024

Closed

[RFC] logging: Default to redirecting printk() through it #10715

[RFC] logging: Default to redirecting printk() through it #10715

Uh oh!

Conversation

carlescufi commented Oct 19, 2018

Uh oh!

jakub-uC Oct 19, 2018

Choose a reason for hiding this comment

Uh oh!

nordic-krch Oct 19, 2018

Choose a reason for hiding this comment

Uh oh!

pfalcon commented Oct 19, 2018

Uh oh!

codecov-io commented Oct 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

pfalcon left a comment

Choose a reason for hiding this comment

Uh oh!

nordic-krch commented Oct 19, 2018

Uh oh!

pfalcon commented Oct 19, 2018

Uh oh!

carlescufi commented Oct 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pfalcon commented Oct 19, 2018

Uh oh!

carlescufi commented Oct 19, 2018

Uh oh!

carlescufi commented Oct 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aescolar commented Oct 19, 2018

Uh oh!

aescolar commented Oct 19, 2018

Uh oh!

carlescufi commented Oct 19, 2018

Uh oh!

andyross left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carlescufi commented Oct 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

carlescufi commented Oct 19, 2018

Uh oh!

andrewboie commented Oct 20, 2018

Uh oh!

nordic-krch commented Oct 24, 2018

Uh oh!

andyross commented Oct 24, 2018

Uh oh!

carlescufi commented Oct 30, 2018

Uh oh!

pfalcon commented Oct 30, 2018

Uh oh!

carlescufi commented Oct 30, 2018

Uh oh!

pfalcon commented Oct 30, 2018

Uh oh!

pfalcon commented Oct 30, 2018

Uh oh!

nordic-krch commented Oct 31, 2018

Uh oh!

nordic-krch commented Oct 31, 2018

Uh oh!

Uh oh!

codecov-io commented Oct 19, 2018 •

edited

Loading

carlescufi commented Oct 19, 2018 •

edited

Loading

carlescufi commented Oct 19, 2018 •

edited

Loading

andyross left a comment •

edited

Loading

carlescufi commented Oct 19, 2018 •

edited

Loading