Console logging incorrectly uses Charset.defaultCharset() or UTF-8 #44353

Closed
wants to merge 1 commit into from

@nosan nosan commented Feb 19, 2025

Prior to this commit, LogbackLoggingSystemProperties didn't respect Console.charset(). It used Charset.defaultCharset() for Logback and UTF-8 for Log4j2 as defaults.

This commit changes the behaviour of LogbackLoggingSystemProperties to use
Console.charset() when available. If no console is present, the default
charset is used instead. These changes bring consistency across
logging implementations.

See gh-43118
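
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the class and method names (`ConsoleCharsetSketch`, `consoleCharsetOrDefault`) are hypothetical, and `Console.charset()` requires Java 17 or later.

```java
import java.io.Console;
import java.nio.charset.Charset;

public class ConsoleCharsetSketch {

    // Hypothetical helper: prefer the console's charset, otherwise use the JVM default.
    public static Charset consoleCharsetOrDefault(Console console) {
        return (console != null) ? console.charset() : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // System.console() may return null when no console is attached (e.g. piped output).
        Console console = System.console();
        System.out.println("Console present: " + (console != null));
        System.out.println("Resolved charset: " + consoleCharsetOrDefault(console));
    }
}
```

Passing the `Console` in as a parameter keeps the null case easy to exercise without a real terminal.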

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Feb 19, 2025
 protected Charset getDefaultCharset() {
-    return StandardCharsets.UTF_8;
+    return Charset.defaultCharset();
Contributor Author

@nosan nosan Feb 19, 2025

This change will impact log4j2 CONSOLE_CHARSET and FILE_CHARSET.

Contributor

@mhalbritter mhalbritter Feb 25, 2025

I think it's fine that the log4j console output is changed, but I'm not sure about the file encoding. I'd prefer if the file stays utf-8. #43118 is about console logging only.

Contributor Author

@nosan nosan Feb 25, 2025

The current situation

|         | Log4j2 | Logback                  |
|---------|--------|--------------------------|
| Console | UTF-8  | Charset.defaultCharset() |
| File    | UTF-8  | Charset.defaultCharset() |

The proposed PR aims to align the situation as follows:

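|         | Log4j2                                        | Logback                                       |
|---------|-----------------------------------------------|-----------------------------------------------|
| Console | Console.charset() or Charset.defaultCharset() | Console.charset() or Charset.defaultCharset() |
| File    | Charset.defaultCharset()                      | Charset.defaultCharset()                      |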

Alternatively, we could leave protected Charset getDefaultCharset() untouched, and this would result in

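|         | Log4j2                     | Logback                                       |
|---------|----------------------------|-----------------------------------------------|
| Console | Console.charset() or UTF-8 | Console.charset() or Charset.defaultCharset() |
| File    | UTF-8                      | Charset.defaultCharset()                      |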

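> I think it's fine that the log4j console output is changed, but I'm not sure about the file encoding. I'd prefer if the file stays utf-8.

Keeping UTF-8 for the Log4j2 file encoding would result in:

|         | Log4j2                                        | Logback                                       |
|---------|-----------------------------------------------|-----------------------------------------------|
| Console | Console.charset() or Charset.defaultCharset() | Console.charset() or Charset.defaultCharset() |
| File    | UTF-8                                         | Charset.defaultCharset()                      |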

What do you think?

Contributor

The last table. I have some changes here, WDYT? https://github.com/mhalbritter/spring-boot/tree/pr/44353

Contributor Author

Why not retain UTF-8 for Log4j2 in this case?

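|         | Log4j2                     | Logback                                       |
|---------|----------------------------|-----------------------------------------------|
| Console | Console.charset() or UTF-8 | Console.charset() or Charset.defaultCharset() |
| File    | UTF-8                      | Charset.defaultCharset()                      |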

In that case, the logic will remain the same as the current one, except for scenarios
where the Console is available.

Contributor

Because Phil said here:

> I think I'd be in favor of using System.console().charset() if we can and Charset.defaultCharset() if the Console is null. That would mostly align with Logback defaults. The JDK also appears to use Charset.defaultCharset() as a fallback (at least in JDK 17).

Contributor Author

@nosan nosan Feb 25, 2025

I've seen this but at the same time UTF-8 is the default one in Log4j2.

https://github.com/apache/logging-log4j2/blob/cab5454e201de95817600e79a26235c72e65d195/log4j-core/src/main/java/org/apache/logging/log4j/core/layout/AbstractStringLayout.java#L194

If it's alright with you, could we wait for @philwebb to provide clarification?

UPDATE:

I didn't notice this comment when writing mine.

> I'm actually surprised that the logback file is not written in UTF-8. I'll talk to the team about that.

Prior to this commit, LogbackLoggingSystemProperties
didn't respect Console.charset(). It used Charset.defaultCharset()
for Logback and UTF-8 for Log4j2 as defaults.

This commit changes the behaviour of LogbackLoggingSystemProperties to use
Console.charset() when available. If no console is present, the default
charset is used instead.

These changes bring consistency across logging implementations.

See spring-projectsgh-43118

Signed-off-by: Dmytro Nosan <[email protected]>
@mhalbritter mhalbritter added type: bug A general bug and removed status: waiting-for-triage An issue we've not yet triaged labels Feb 25, 2025
@mhalbritter mhalbritter added this to the 3.5.0-M3 milestone Feb 25, 2025
@mhalbritter mhalbritter added the for: merge-with-amendments Needs some changes when we merge label Feb 25, 2025
mhalbritter pushed a commit to mhalbritter/spring-boot that referenced this pull request Feb 25, 2025
Prior to this commit, LogbackLoggingSystemProperties
didn't respect Console.charset(). It used Charset.defaultCharset()
for Logback and UTF-8 for Log4j2 as defaults.

This commit changes the behaviour of LogbackLoggingSystemProperties to use
Console.charset() when available. If no console is present, the default
charset is used instead.

These changes bring consistency across logging implementations.

See spring-projectsgh-44353

Signed-off-by: Dmytro Nosan <[email protected]>
@mhalbritter
Contributor

I'm actually surprised that the logback file is not written in UTF-8. I'll talk to the team about that.

@mhalbritter mhalbritter added the for: team-meeting An issue we'd like to discuss as a team to make progress label Feb 25, 2025
@mhalbritter
Contributor

I investigated that charset stuff a bit more, and I'll try to summarize my findings:

On Linux, the LANG variable controls the value of System.console().charset(), and -Dfile.encoding changes Charset.defaultCharset() (however, this is undefined behavior on Java 18+). If you don't specify -Dfile.encoding, Java < 18 uses some environment-dependent logic to find the default charset. This is UTF-8 on Linux and macOS most of the time, and on Windows it's not. On Java 18+ this returns UTF-8 unless backward compatibility mode (-Dfile.encoding=COMPAT) is enabled.
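
The properties and charsets mentioned here can be inspected with a small diagnostic along these lines. It is only a sketch: the class name `CharsetReport` and its `collect()` method are hypothetical, and the set of properties is simply the one discussed in this thread.

```java
import java.io.Console;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class CharsetReport {

    private static final String[] PROPERTIES = {
        "stdout.encoding", "file.encoding", "native.encoding",
        "sun.stdout.encoding", "sun.stderr.encoding", "sun.jnu.encoding"
    };

    // Collects encoding-related system properties plus the console and default charsets.
    public static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        for (String key : PROPERTIES) {
            // Unset properties are reported as the string "null".
            report.put(key, String.valueOf(System.getProperty(key)));
        }
        Console console = System.console();
        report.put("Console charset", (console != null) ? console.charset().name() : "(no console)");
        report.put("Default charset", Charset.defaultCharset().name());
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```

Running it under different `LANG` values and Java versions should make the difference between the console charset and the default charset visible.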

@mhalbritter
Contributor

Archive.zip

This small program prints some encoding-related system properties and uses Logback and Log4j without Spring Boot to print some non-ASCII characters (你好).

That's the output:

Java 17

LANG=en_US.utf8 app/build/distributions/app/bin/app     
Java version: 17.0.14
stdout.encoding: null
file.encoding: UTF-8
native.encoding: UTF-8
sun.stdout.encoding: UTF-8
sun.stderr.encoding: UTF-8
sun.jnu.encoding: UTF-8
Console charset: UTF-8
Default charset: UTF-8
System.out.println: 你好
UTF-8: 你好
Console charset: 你好
Default charset: 你好
11:13:29.981 [main] INFO  org.example.App -- Logback: 你好
Log4j: 你好

LANG=en_US.iso885915 app/build/distributions/app/bin/app
Java version: 17.0.14
stdout.encoding: null
file.encoding: ISO-8859-15
native.encoding: ISO-8859-15
sun.stdout.encoding: ISO-8859-15
sun.stderr.encoding: ISO-8859-15
sun.jnu.encoding: ISO-8859-15
Console charset: ISO-8859-15
Default charset: ISO-8859-15
System.out.println: ??
UTF-8: 你好
Console charset: ??
Default charset: ??
11:13:45.118 [main] INFO  org.example.App -- Logback: ??
Log4j: ??

Java 23

LANG=en_US.utf8 app/build/distributions/app/bin/app     
Java version: 23.0.2
stdout.encoding: UTF-8
file.encoding: UTF-8
native.encoding: UTF-8
sun.stdout.encoding: null
sun.stderr.encoding: null
sun.jnu.encoding: UTF-8
Console charset: UTF-8
Default charset: UTF-8
System.out.println: 你好
UTF-8: 你好
Console charset: 你好
Default charset: 你好
11:13:13.565 [main] INFO  org.example.App -- Logback: 你好
Log4j: 你好

LANG=en_US.iso885915 app/build/distributions/app/bin/app
Java version: 23.0.2
stdout.encoding: ISO-8859-15
file.encoding: UTF-8
native.encoding: ISO-8859-15
sun.stdout.encoding: null
sun.stderr.encoding: null
sun.jnu.encoding: ISO-8859-15
Console charset: ISO-8859-15
Default charset: UTF-8
System.out.println: ??
UTF-8: 你好
Console charset: ??
Default charset: 你好
11:12:43.208 [main] INFO  org.example.App -- Logback: 你好
Log4j: 你好

Logback

If charset is not set, ch.qos.logback.core.encoder.LayoutWrappingEncoder.convertToBytes uses String.getBytes() - which uses the default charset (Java < 18: depends on OS, Java >= 18: UTF-8)
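
The effect of encoding with a charset that cannot represent the text can be reproduced with plain JDK calls. The sketch below is independent of Logback; the class name `DefaultCharsetEncoding` and the `roundTrip` helper are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetEncoding {

    // Encodes and decodes with the same charset to show what survives the round trip.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is 你好, written with escapes so the source file encoding doesn't matter.
        String text = "\u4f60\u597d";
        System.out.println("UTF-8:      " + roundTrip(text, StandardCharsets.UTF_8));
        // ISO-8859-1 cannot represent these characters, so each one becomes '?'.
        System.out.println("ISO-8859-1: " + roundTrip(text, StandardCharsets.ISO_8859_1));
        // String.getBytes() without arguments uses Charset.defaultCharset().
        System.out.println("Default:    " + new String(text.getBytes(), Charset.defaultCharset()));
    }
}
```

The `??` produced for ISO-8859-1 matches what the outputs above show whenever the effective charset is ISO-8859-15.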

Log4j

If the ConsoleAppender doesn't have an explicit layout, it creates a PatternLayout using the default target charset (see logic in org.apache.logging.log4j.core.appender.AbstractAppender.Builder.getOrCreateLayout(java.nio.charset.Charset)).
The default target charset is found by reading the system properties sun.stdout.encoding and sun.stderr.encoding. If null, Charset.defaultCharset() is used.
When explicitly creating a PatternLayout and charset is not set, UTF-8 is used (see constructor of AbstractStringLayout)
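
The lookup order described for the default target charset can be approximated like this. It is a sketch of the described behavior, not Log4j's actual code; `TargetCharsetSketch` and `fromProperty` are hypothetical names.

```java
import java.nio.charset.Charset;

public class TargetCharsetSketch {

    // Resolves a charset from an encoding name, falling back to the JVM default
    // when the name is missing, blank, illegal, or unsupported.
    public static Charset fromProperty(String encodingName) {
        if (encodingName != null && !encodingName.isBlank()) {
            try {
                return Charset.forName(encodingName);
            }
            catch (IllegalArgumentException ex) {
                // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            }
        }
        return Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // On newer JDKs sun.stdout.encoding may be null, so the default charset is used.
        System.out.println(fromProperty(System.getProperty("sun.stdout.encoding")));
    }
}
```

This also shows why the Java 23 runs above end up on `Charset.defaultCharset()`: `sun.stdout.encoding` is `null` there.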

@mhalbritter
Contributor

mhalbritter commented Feb 27, 2025

In the above output, every time System.out.println: differs from Logback: or from Log4j:, there's a bug where the output doesn't respect the user's locale. That's the case for Java 23, because there Charset.defaultCharset() returns UTF-8 (it isn't affected by the LANG variable) and both Logback and Log4j use it, whereas System.out.println() uses Console.charset(), which is affected by the LANG variable. Both Logback and Log4j are affected because they don't use Console.charset().

@mhalbritter
Contributor

Then I looked at the behavior when using Spring Boot.

Spring Boot uses UTF-8 as default charset for files and console when using Log4j2.

Spring Boot uses Charset.defaultCharset() as default charset for files and console when using Logback.

Logback

Java 17

LANG=en_US.utf8 java -jar build/libs/console-encoding-boot-0.0.1-SNAPSHOT.jar 

2025-02-27T13:49:27.583+01:00  INFO 63033 --- [console-encoding-boot] [           main] com.example.console_encoding_boot.CLR    : Java version: 17.0.12
2025-02-27T13:49:27.583+01:00  INFO 63033 --- [console-encoding-boot] [           main] com.example.console_encoding_boot.CLR    : Logback: 你好
System.out.println: 你好

LANG=en_US.iso885915 java -jar build/libs/console-encoding-boot-0.0.1-SNAPSHOT.jar

2025-02-27T13:48:18.627+01:00  INFO 62919 --- [console-encoding-boot] [           main] com.example.console_encoding_boot.CLR    : Java version: 17.0.12
2025-02-27T13:48:18.627+01:00  INFO 62919 --- [console-encoding-boot] [           main] com.example.console_encoding_boot.CLR    : Logback: ??
System.out.println: ??

Java 23

LANG=en_US.utf8 java -jar build/libs/console-encoding-boot-0.0.1-SNAPSHOT.jar

2025-02-27T13:49:59.741+01:00  INFO 63182 --- [console-encoding-boot] [           main] com.example.console_encoding_boot.CLR    : Java version: 23.0.2
2025-02-27T13:49:59.742+01:00  INFO 63182 --- [console-encoding-boot] [           main] com.example.console_encoding_boot.CLR    : Logback: 你好
System.out.println: 你好

LANG=en_US.iso885915 java -jar build/libs/console-encoding-boot-0.0.1-SNAPSHOT.jar

2025-02-27T13:50:29.151+01:00  INFO 63283 --- [console-encoding-boot] [           main] com.example.console_encoding_boot.CLR    : Java version: 23.0.2
2025-02-27T13:50:29.152+01:00  INFO 63283 --- [console-encoding-boot] [           main] com.example.console_encoding_boot.CLR    : Logback: 你好
System.out.println: ??

Log4j2

Java 17

LANG=en_US.utf8 java -jar build/libs/console-encoding-boot-0.0.1-SNAPSHOT.jar

2025-02-27T13:53:26.281+01:00  INFO 65029 --- [console-encoding-boot] [           main] c.e.c.CLR                                : Java version: 17.0.14
2025-02-27T13:53:26.281+01:00  INFO 65029 --- [console-encoding-boot] [           main] c.e.c.CLR                                : Log4j: 你好
System.out.println: 你好

LANG=en_US.iso885915 java -jar build/libs/console-encoding-boot-0.0.1-SNAPSHOT.jar

2025-02-27T13:53:53.553+01:00  INFO 65126 --- [console-encoding-boot] [           main] c.e.c.CLR                                : Java version: 17.0.14
2025-02-27T13:53:53.553+01:00  INFO 65126 --- [console-encoding-boot] [           main] c.e.c.CLR                                : Log4j: 你好
System.out.println: ??

Java 23

LANG=en_US.utf8 java -jar build/libs/console-encoding-boot-0.0.1-SNAPSHOT.jar

2025-02-27T13:54:55.303+01:00  INFO 65290 --- [console-encoding-boot] [           main] c.e.c.CLR                                : Java version: 23.0.2
2025-02-27T13:54:55.303+01:00  INFO 65290 --- [console-encoding-boot] [           main] c.e.c.CLR                                : Log4j: 你好
System.out.println: 你好

LANG=en_US.iso885915 java -jar build/libs/console-encoding-boot-0.0.1-SNAPSHOT.jar

2025-02-27T13:55:17.995+01:00  INFO 65391 --- [console-encoding-boot] [           main] c.e.c.CLR                                : Java version: 23.0.2
2025-02-27T13:55:17.995+01:00  INFO 65391 --- [console-encoding-boot] [           main] c.e.c.CLR                                : Log4j: 你好
System.out.println: ??

@mhalbritter
Contributor

Java 23 and Logback is broken, because System.out.println() uses Console.charset() and Logback is configured to use Charset.defaultCharset() (which returns UTF-8).

Java 17 and Log4j is broken, because System.out.println uses the value from LANG but Log4j is hardcoded to use UTF-8.

Java 23 and Log4j is broken, because System.out.println() uses Console.charset() but Log4j is hardcoded to use UTF-8.

@mhalbritter
Contributor

mhalbritter commented Feb 27, 2025

So, to clean this up, I think (my brain is a bit exhausted from all the charsets) if we implement it like this it works on all systems:

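|         | Log4j2                                        | Logback                                       |
|---------|-----------------------------------------------|-----------------------------------------------|
| Console | Console.charset() or Charset.defaultCharset() | Console.charset() or Charset.defaultCharset() |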

Which is what this PR does. But it also touches the default file encoding, which I would either

a) leave as it is or
b) change to UTF-8

@mhalbritter
Contributor

WDYT? Do you agree? I wouldn't be surprised if I mixed something up in between.

@nosan
Contributor Author

nosan commented Feb 27, 2025

Thank you very much, @mhalbritter for such a detailed analysis!

> Which is what this PR does. But it also touches the default file encoding, which I
> a) either leave as it is or
> b) change to UTF-8

I think both options are good as they bring consistency between Log4j2 and Logback. I personally prefer to have:

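|         | Log4j2                                        | Logback                                       |
|---------|-----------------------------------------------|-----------------------------------------------|
| Console | Console.charset() or Charset.defaultCharset() | Console.charset() or Charset.defaultCharset() |
| File    | UTF-8                                         | UTF-8                                         |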

Having Charset.defaultCharset() for files is also a decent option: since JEP 400 (https://openjdk.org/jeps/400), UTF-8 is the default charset on Java 18+.

@mhalbritter
Contributor

mhalbritter commented Feb 28, 2025

Thanks @nosan! I've decided to not touch the file encoding in this PR. I've opened #44472 for that.

That's the new state:

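|         | Log4j2                                        | Logback                                       |
|---------|-----------------------------------------------|-----------------------------------------------|
| Console | Console.charset() or Charset.defaultCharset() | Console.charset() or Charset.defaultCharset() |
| File    | UTF-8                                         | Charset.defaultCharset()                      |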
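
Expressed as code, the new state looks roughly like the sketch below. It only restates the table and is not Spring Boot's implementation; `LoggingCharsetDefaults`, `Backend`, `console` and `file` are hypothetical names.

```java
import java.io.Console;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LoggingCharsetDefaults {

    public enum Backend { LOG4J2, LOGBACK }

    // Console default: identical for both backends.
    public static Charset console(Console console) {
        return (console != null) ? console.charset() : Charset.defaultCharset();
    }

    // File default: left unchanged by this PR, so the backends still differ.
    public static Charset file(Backend backend) {
        return (backend == Backend.LOG4J2) ? StandardCharsets.UTF_8 : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("Console: " + console(System.console()));
        for (Backend backend : Backend.values()) {
            System.out.println(backend + " file: " + file(backend));
        }
    }
}
```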

@mhalbritter mhalbritter removed for: merge-with-amendments Needs some changes when we merge for: team-meeting An issue we'd like to discuss as a team to make progress labels Feb 28, 2025
@mhalbritter mhalbritter self-assigned this Feb 28, 2025
@mhalbritter mhalbritter changed the title Use Console charset for console logging when available Console logging incorrectly uses Charset.defaultCharset() Feb 28, 2025
@mhalbritter mhalbritter changed the title Console logging incorrectly uses Charset.defaultCharset() Console logging incorrectly uses Charset.defaultCharset() or UTF-8 Feb 28, 2025
@nosan
Contributor Author

nosan commented Feb 28, 2025

Thanks @mhalbritter

I believe this is a good compromise. On one hand, the bug is fixed; on the other, the encoding can be revisited in a major version.
I really like it 👍
