Skip to content

[Windows] Illustration of fix in FileSystemWatcher to prevent SOH fragmentation from memory leak of pinned 8K byte[] buffers #116613

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

subbucse
Copy link

Illustration of fix in FileSystemWatcher to prevent SOH fragmentation from memory leak of pinned 8K byte[] buffers

  • Illustration with comments of race condition between garbage collection of FileSystemWatcher instance and directory changes callback with errorCode = ERROR_OPERATION_ABORTED on IOCP thread, when FileSystemWatcher is disposed before releasing its last reference.
  • When GC wins the race, we are leaving the door open to leaking the pinned 8k byte[] on SOH which over time will cause heavy heap fragmentation preventing heap compaction and affecting allocation and GC performance.
  • Placeholder for fix with cleanup code when we detect that the WeakRef in the AsyncState in the IO callback has been released.
  • Suggestion to move pinned buffer allocation to POH at time of alloc instead of alloc on SOH and pinning soon after, so that fragmentation is avoided even when FileSystemWatcher is kept alive.
  • Will attach a sample program to produce the memory leak and also if needed (to microsoft internal maintainers) I can produce a process dump with full heaps from our production service which shows the leak.

… from memory leak of pinned 8K byte[] buffers
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jun 12, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@subbucse subbucse marked this pull request as draft June 12, 2025 23:58
…atcher in the PR as FileSystemWatcherHeapLeak.csproj

- Please run with strictest input parameters as disposeProbability = 100, maxGeneration = 2, compacting = true, and heap will start growing from leaked 8K byte[] buffers eventually reaching ~2 GB on my devbox over 3-4 hours.
- If you relax the parameters, say disposeProbability < 100, maxGeneration = 0/1, compacting = false, you will only *increase* the rate at which heap leak shows up
* 2. Removing the need to pin the buffer using GCHandle later on when provisioning / Packing the ThreadPoolBoundHandleOverlapped,
* which is done when allocating a new PreAllocatedOverlapped below.
*/
byte[] buffer = AllocateBuffer(pinned: true);
Copy link
Author

@subbucse subbucse Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub You mentioned in our teams chat that there was explicit intent to avoid pinning on POH because the pinning is supposed to be temporary. Did you mean that you expect the pinned buffer to be unpinned and given up for GC after the IO callback comes in from every async overlapped call to ReadDirectoryChangesW ?

If so, I understand not using POH, because the pinning is highly transient, but that's not what's happening now. As you see below, we only call state.ThreadPoolBinding.FreeNativeOverlapped(overlappedPointer) which will internally call AsyncState.PreAllocatedOverlapped.Release(), and NOT AsyncState.PreAllocatedOverlapped.Dispose() which is what ultimately unpins and frees up the pinned 8K byte[] buffer.

So if the intent is to keep the buffer pinned until end of lifetime of FSW, then shouldn't we consider using POH since that timeframe could be long?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean that you expect the pinned buffer to be unpinned and given up for GC after the IO callback comes in from every async overlapped call to ReadDirectoryChangesW ?

Not necessarily. My statement was about why we avoid the Pinned Object Heap, because things on that heap are never moved and thus creating allocations there that won't effectively be there for the remainder of the process can lead to significant fragmentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the intent is to keep the buffer pinned until end of lifetime of FSW, then shouldn't we consider using POH since that timeframe could be long?

There can be a significant difference between the lifetime of an FSW instance and the lifetime of the process. It's totally feasible to create transient FSWs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.IO community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants