-
Notifications
You must be signed in to change notification settings - Fork 41k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Console logging incorrectly uses Charset.defaultCharset() or UTF-8 #44353
Conversation
protected Charset getDefaultCharset() { | ||
return StandardCharsets.UTF_8; | ||
return Charset.defaultCharset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will impact log4j2 CONSOLE_CHARSET
and FILE_CHARSET
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine that the log4j console output is changed, but I'm not sure about the file encoding. I'd prefer if the file stays utf-8. #43118 is about console logging only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current situation
Log4j2 | Logback | |
---|---|---|
Console | UTF-8 | Charset.default() |
File | UTF-8 | Charset.default() |
The proposed PR aims to align the situation as follows:
Log4j2 | Logback | |
---|---|---|
Console | Console.charset() or Charset.default() | Console.charset() or Charset.default() |
File | Charset.default() | Charset.default() |
Alternatively, we could leave protected Charset getDefaultCharset()
untouched, and this would result in
Log4j2 | Logback | |
---|---|---|
Console | Console.charset() or UTF-8 | Console.charset() or Charset.default() |
File | UTF-8 | Charset.default() |
I think it's fine that the log4j console output is changed, but I'm not sure about the file encoding. I'd prefer if the file stays utf-8.
this would result in
Log4j2 | Logback | |
---|---|---|
Console | Console.charset() or Charset.default() | Console.charset() or Charset.default() |
File | UTF-8 | Charset.default() |
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last table. I have some changes here, WDYT? https://github.com/mhalbritter/spring-boot/tree/pr/44353
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not retain UTF-8
for Log4j2 in this case?
Log4j2 | Logback | |
---|---|---|
Console | Console.charset() or UTF-8 | Console.charset() or Charset.default() |
File | UTF-8 | Charset.default() |
In that case, the logic will remain the same as the current one, except for scenarios
where the Console
is available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because Phil said here:
I think I'd be in favor of using
System.console().charset()
if we can andCharset.defaultCharset()
if theConsole
isnull
. That would mostly align with Logback defaults. The JDK also appears to useCharset.defaultCharset()
as a fallback (at least in JDK 17).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen this but at the same time UTF-8
is the default one in Log4j2.
If it's alright with you, could we wait for @philwebb to provide clarification?
UPDATE:
I didn't notice this comment when writing mine.
I'm actually surprised that the logback file is not written in UTF-8. I'll talk to the team about that.
Prior to this commit, LogbackLoggingSystemProperties doesn't respect Console.charset(). It used Charset.getDefaultCharset() for logback and UTF-8 for log42j as defaults. This commit changes the behaviour of LogbackLoggingSystemProperties to use Console.charset() when available. If no console is present, the default charset is used instead. These changes bring consistency across logging implementations. See spring-projectsgh-43118 Signed-off-by: Dmytro Nosan <[email protected]>
Prior to this commit, LogbackLoggingSystemProperties doesn't respect Console.charset(). It used Charset.getDefaultCharset() for logback and UTF-8 for log42j as defaults. This commit changes the behaviour of LogbackLoggingSystemProperties to use Console.charset() when available. If no console is present, the default charset is used instead. These changes bring consistency across logging implementations. See spring-projectsgh-44353 Signed-off-by: Dmytro Nosan <[email protected]>
I'm actually surprised that the logback file is not written in UTF-8. I'll talk to the team about that. |
I investigated that charset stuff a bit more, and I'll try to summarize my findings: On Linux, with the |
This small program prints some environment variables and uses Logback and Log4j without Spring Boot to print some UTF-8 chars. That's the output: Java 17
Java 23
LogbackIf Log4jIf the |
In the above output, every time |
Then I looked at the behavior when using Spring Boot. Spring Boot uses UTF-8 as default charset for files and console when using Log4j2. Spring Boot uses LogbackJava 17
Java 23
Log4j2Java 17
Java 23
|
Java 23 and Logback is broken, because Java 17 and Log4j is broken, because Java 23 and Log4j is broken, because |
So, to clean this up, I think (my brain is a bit exhausted from all the charsets) if we implement it like this it works on all systems:
Which is what this PR does. But it also touches the default file encoding, which I a) either leave as it is or |
WDYT? Do you agree? I wouldn't be surprised if I mixed something up in between. |
Thank you very much, @mhalbritter for such a detailed analysis!
I think both options are good as they bring consistency between Log4j2 and Logback. I personally prefer to have:
Having |
Thanks @mhalbritter I believe this is a good compromise. On one hand, the bug is fixed; on the other, the encoding can be revisited in a major version. |
See spring-projectsgh-44353 Signed-off-by: Johnny Lim <[email protected]>
Prior to this commit,
LogbackLoggingSystemProperties
doesn't respectConsole.charset()
. It usedCharset.getDefaultCharset()
for logback andUTF-8
for log42j as defaults.This commit changes the behaviour of
LogbackLoggingSystemProperties
to useConsole.charset()
when available. If no console is present, the defaultcharset is used instead. These changes bring consistency across
logging implementations.
See gh-43118