
Regression: ZipFile/ZipArchive entry name decoding not working correctly when entryNameEncoding is specified #92283

Closed
@deng0

Description

The documentation for ZipFile.Open contains the following remarks:

When you open a zip archive file for reading and entryNameEncoding is set to a value other than null, entry names are decoded according to the following rules:
When the language encoding flag is not set, the specified entryNameEncoding is used to decode the entry name.
When the language encoding flag is set, UTF-8 is used to decode the entry name.
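The two rules quoted above come down to one check on bit 11 (0x0800) of the entry's general purpose bit flag, which the ZIP specification (APPNOTE.TXT) calls the language encoding flag. The following is a minimal sketch of the documented behavior, not the actual System.IO.Compression code. The helper name `DecodeEntryName` is hypothetical, and code page 850 is assumed as the non-UTF-8 encoding, as in the reproduction steps.

```csharp
using System;
using System.Text;

// Register legacy code pages so that code page 850 is available on .NET Core/.NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp850 = Encoding.GetEncoding(850);

// The same file name stored two ways (assuming code page 850):
// - as File Explorer stores it: CP850 bytes, language encoding flag clear
// - as .NET stores it: UTF-8 bytes, language encoding flag set
byte[] explorerName = cp850.GetBytes("Nürburgring.txt");
byte[] dotnetName = Encoding.UTF8.GetBytes("Nürburgring.txt");

// With the documented rule applied, both decode to the same name.
Console.WriteLine(DecodeEntryName(explorerName, 0x0000, cp850));
Console.WriteLine(DecodeEntryName(dotnetName, 0x0800, cp850));

// Hypothetical helper mirroring the documented rule for a non-null entryNameEncoding.
static string DecodeEntryName(byte[] rawName, ushort generalPurposeBitFlag, Encoding entryNameEncoding)
{
    // Bit 11 of the general purpose bit flag: the language encoding flag (EFS).
    const ushort LanguageEncodingFlag = 0x0800;

    bool flagSet = (generalPurposeBitFlag & LanguageEncodingFlag) != 0;

    // Flag set: the name is UTF-8. Flag clear: use the caller-supplied encoding.
    Encoding encoding = flagSet ? Encoding.UTF8 : entryNameEncoding;
    return encoding.GetString(rawName);
}
```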

This is how it has always worked, but .NET 7 and 8 appear to have a bug that causes the second rule to no longer be applied.
It seems entryNameEncoding is always used, even when the zip file entry has the language encoding flag set.
In my opinion, this is a serious regression.

Reproduction Steps

Here are some test zip files to reproduce the problem:
test_win.zip
test_dotnet.zip

The first zip was created by the Windows 11 File Explorer and the second was created by .NET without specifying entryNameEncoding. Windows File Explorer does not set the language encoding flag, but .NET does.
The problems begin when you try to read those zip files with .NET.
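One way to confirm which archive has the flag set is to read the general purpose bit flag directly from the first local file header. Per the ZIP specification (APPNOTE.TXT), it is the 2-byte little-endian field at offset 6, after the 4-byte signature 0x04034b50 and the 2-byte "version needed to extract" field. This is a hypothetical sketch: the helper `FirstEntryHasLanguageEncodingFlag` is not part of .NET, and it assumes the archive starts with a local file header (no prepended data such as a self-extractor stub) and that each test archive contains a single file, since only the first entry is inspected. The paths are the ones from the reproduction steps.

```csharp
using System;
using System.IO;

// Paths from the reproduction steps; files that do not exist are skipped.
foreach (string path in new[] { @"C:\temp\test_win.zip", @"C:\temp\test_dotnet.zip" })
{
    if (File.Exists(path))
    {
        Console.WriteLine($"{Path.GetFileName(path)}: language encoding flag set = {FirstEntryHasLanguageEncodingFlag(path)}");
    }
}

// Hypothetical helper: reads bit 11 of the general purpose bit flag from the first local file header.
static bool FirstEntryHasLanguageEncodingFlag(string zipPath)
{
    using var reader = new BinaryReader(File.OpenRead(zipPath));

    // Local file header signature "PK\x03\x04", read as a little-endian uint32.
    if (reader.ReadUInt32() != 0x04034B50)
    {
        throw new InvalidDataException("File does not start with a local file header.");
    }

    reader.ReadUInt16();                // version needed to extract (offset 4)
    ushort flags = reader.ReadUInt16(); // general purpose bit flag (offset 6)
    return (flags & 0x0800) != 0;       // bit 11: language encoding flag
}
```

Based on the description above, test_win.zip should report the flag as not set and test_dotnet.zip as set.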

When reading a zip file with .NET, you always had to specify entryNameEncoding; otherwise, special characters in file names would not be decoded correctly. For example:

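using System.IO.Compression;
using System.Text;
// Code page 850 is not built in on .NET Core/.NET 5+, so register the code pages provider first
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding entryNameEncoding = Encoding.GetEncoding(850);
// The last argument (overwriteFiles: true) replaces previously extracted files
ZipFile.ExtractToDirectory(@"C:\temp\test_win.zip", @"c:\temp\test_win_extracted", entryNameEncoding, true);
ZipFile.ExtractToDirectory(@"C:\temp\test_dotnet.zip", @"c:\temp\test_dotnet_extracted", entryNameEncoding, true);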

Expected behavior

For both zip files, the extracted file should be named "Nürburgring.txt".

Actual behavior

With .NET 7/8, the zip file created by File Explorer is extracted correctly, but the file in the other zip is extracted with the wrong name: "N├╝rburgring.txt".
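The garbled name matches what happens when the UTF-8 bytes of "ü" (0xC3 0xBC) are decoded with code page 850 instead of UTF-8, which is consistent with entryNameEncoding being applied even though the language encoding flag is set. A minimal sketch that reproduces only this mis-decoding, with no zip file involved and code page 850 assumed as in the reproduction steps:

```csharp
using System;
using System.Text;

// Register legacy code pages so that code page 850 is available on .NET Core/.NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp850 = Encoding.GetEncoding(850);

// Entry name as .NET writes it: UTF-8 bytes (the language encoding flag would be set).
byte[] utf8Name = Encoding.UTF8.GetBytes("Nürburgring.txt");

// Correct: honor the flag and decode as UTF-8.
string correct = Encoding.UTF8.GetString(utf8Name);

// Regression: ignore the flag and decode with entryNameEncoding (CP850),
// which turns the two bytes of "ü" into two box-drawing characters.
string misdecoded = cp850.GetString(utf8Name);

Console.WriteLine(correct);
Console.WriteLine(misdecoded);
```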

Regression?

In .NET Framework and .NET 6, this worked correctly.
You could always specify an entryNameEncoding, and .NET still respected the language encoding flag.

Known Workarounds

No response

Configuration

No response

Other information

No response
