Skip to content

Increase buffer size for readir_r and add support for enumerating filenames > 255 bytes #116619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

jozkee
Copy link
Member

@jozkee jozkee commented Jun 13, 2025

readdir_r is the re-entrant version of readdir; it works by calling readdir with a lock. On success it copies the next entry into a user buffer with memcpy(). It is prone to truncation errors if the following is true:

  1. The user buffer (a dirent*) is allocated with a d_name field too small to hold the filename. POSIX is misleading as it suggests that d_name "shall contain a filename of at most {NAME_MAX} + 1," hence why some libraries, including dotnet, assumed filenames do not exceed 256 bytes.
  2. The filesystem allows filenames > 255 bytes. For example, NTFS filename limit is 255 UTF-16 code-units, which can expand to more than 255 bytes, ZFS v2.3.0 introduced support for 1023 bytes filenames.

I noticed trucation doesn't occur on macOS because they use a 1024 size for d_name, I'm borrowing their approach here.

Alternative

An alternative approach is to get rid of readdir_r entirely, but I was hesitant to implement it for the following reasons:

  • it may de-stabilize platforms where readdir is not thread-safe when called using different directory handles.
  • pal_io.c has preprocessors for platforms that I cannot verify would continue working if we change to readdir. Specifically, for AIX, see its licensing.

I will still list the reasons I believe support it:

  • POSIX version 2024 has updated its Rationale to mention "readdir_r() has been marked obsolescent and readdir() is now required to be thread safe as long as there are no concurrent calls to it on a single directory stream". On top of that Future Directions states that the function may be removed in a future version.
  • We are already prefering readdir in platforms that deprecate readdir_r; glibc and bionic are some examples. musl and macOS are examples of the opposite but that I've tested that work with readdir.
  • Increasing the buffer size may become just a temporary fix.
  • Other libraries have been moving away from readdir_r, for example glibc as I already mentioned, and Rust rust-lang/rust@bc04a4e.

cc @AustinWise @NattyNarwhal

@jozkee jozkee self-assigned this Jun 13, 2025
@Copilot Copilot AI review requested due to automatic review settings June 13, 2025 04:29
@jozkee jozkee added this to the 10.0.0 milestone Jun 13, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR increases the internal buffer size used by readdir_r to support filenames longer than 255 bytes and introduces NTFS-specific tests for file enumeration on Linux. It also adjusts the file system decoding buffers and removes an unused helper to streamline filename processing.

  • Adjust readdir_r buffer size in System.Native to avoid truncation errors
  • Add tests and setup files for NTFS on Linux to cover long filenames
  • Update file system entry decoding buffer and remove an obsolete helper function

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/native/libs/System.Native/pal_io.c Updates buffer size calculation for readdir_r based on d_name offset and adds a 1024-byte extension on non-SunOS platforms
src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/ManualTests/System.IO.FileSystem.Manual.Tests.csproj Adds new NTFS test files
src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/ManualTests/NtfsOnLinuxTests.cs Introduces tests covering file enumeration for filenames longer than 255 bytes
src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/ManualTests/NtfsOnLinuxSetup.cs Provides NTFS loopback device setup and teardown for tests
src/libraries/System.Runtime/tests/System.IO.FileSystem.Tests/Directory/EnumerableTests.cs Adds a parallel thread safety test for file enumeration
src/libraries/System.Private.CoreLib/src/System/TimeZoneInfo.Unix.NonAndroid.cs Removes an unused directory entry helper method
src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEntry.Unix.cs Updates the fixed buffer size for file name decoding using a new constant
src/libraries/Common/src/Interop/Unix/System.Native/Interop.ReadDir.cs Improves UTF8 decoding logic in the directory entry retrieval

process.Start();
process.BeginOutputReadLine();
process.BeginErrorReadLine();
process.WaitForExit();
Copy link
Preview

Copilot AI Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider checking the process exit code immediately after WaitForExit() to verify that the shell command executed successfully.

Suggested change
process.WaitForExit();
process.WaitForExit();
if (process.ExitCode != 0)
{
throw new InvalidOperationException($"Shell command failed with exit code {process.ExitCode}. Command: {command}");
}

Copilot uses AI. Check for mistakes.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@jkotas
Copy link
Member

jkotas commented Jun 13, 2025

POSIX is misleading as it suggests that d_name "shall contain a filename of at most {NAME_MAX} + 1," hence why some libraries, including dotnet, assumed filenames do not exceed 256 bytes.POSIX is misleading as it suggests that d_name "shall contain a filename of at most {NAME_MAX} + 1," hence why some libraries, including dotnet, assumed filenames do not exceed 256 bytes.

Nit: I am not sure what's misleading about this. POSIX defines ENAMETOOLONG error that's returned for too long file names in number of different situations. I understand that the error handling is not quite right in some implementations, but that does not make the POSIX spec misleading.

An alternative approach is to get rid of readdir_r entirely, but I was hesitant to implement it for the following reason

This was a concern for a low-risk servicing change done on a short notice. I do not think it is concern for main. We should switch to using readdir exclusively in main unless somebody points to a real problem with doing that.

pal_io.c has preprocessors for platforms that I cannot verify would continue working if we change to readdir.pal_io.c has preprocessors for platforms that I cannot verify would continue working if we change to readdir.

You can ask folks who maintain these platforms here.

cc @am11 Are you aware of any issues with switching all platforms to readdir?

@jkotas
Copy link
Member

jkotas commented Jun 13, 2025

lock

Locks are not a problem. We do have our own lock around the calls to readdir/readdir_r.

The question is whether readdir uses process global static variable for storing state between invocations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants