Avoid sort allocations #116109

Open
wants to merge 1 commit into
base: main

@2A5F 2A5F commented May 29, 2025

Modifies ArraySortHelper and adds ArraySortHelper<T, TComparer> so that a generic struct TComparer can be used without allocating memory.

Span.Sort<T, TComparer> no longer boxes the comparer.

#39466
#39543
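
A minimal sketch of the call shape this PR targets, assuming the existing `MemoryExtensions.Sort<T, TComparer>(Span<T>, TComparer)` overload. `DescendingIntComparer` and `SortDemo` are hypothetical names used only for illustration; they are not part of the PR.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct comparer. Because it is a value type, passing it as a
// generic TComparer argument does not by itself require a heap allocation.
public readonly struct DescendingIntComparer : IComparer<int>
{
    public int Compare(int x, int y) => y.CompareTo(x);
}

public static class SortDemo
{
    // Sorts a copy of the input in descending order through the
    // Span<T>.Sort<T, TComparer> extension method.
    public static int[] SortDescending(int[] values)
    {
        int[] copy = (int[])values.Clone();
        copy.AsSpan().Sort(new DescendingIntComparer());
        return copy;
    }
}
```

For example, `SortDemo.SortDescending(new[] { 3, 1, 2 })` returns a new array ordered 3, 2, 1. In the benchmarks posted in this thread, the equivalent `Span_ComparerStruct` case reports 88 B allocated per operation on main and none with the PR.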

Member

jkotas commented May 31, 2025

@EgorBot -intel

using System;
using BenchmarkDotNet.Attributes;

public class Bench
{
    int[] a = new int[10000];
    Comparison<int> c = (int x, int y) => x - y;

    // Refill the array before every iteration so each Sort call sees the same (already sorted) input.
    [IterationSetup]
    public void IterationSetup() { for (int i = 0; i < a.Length; i++) a[i] = i; }

    [Benchmark]
    public void Sort() => Array.Sort(a, c);
}

jkotas added the area-System.Memory and tenet-performance labels and removed the needs-area-label label May 31, 2025
Contributor

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Member

jkotas commented May 31, 2025

Could you please share perf numbers that demonstrate the performance improvements?

The one micro-benchmark that I kicked off above shows a ~10% regression: EgorBot/runtime-utils#370 (comment).

Author

2A5F commented May 31, 2025

No, I haven't done any performance testing yet.
Passing TComparer by ref may be what causes the overhead.
Can this test bot output JIT diffs?

@2A5F 2A5F marked this pull request as draft May 31, 2025 09:29
@adamsitnik
Member

@2A5F You can use the benchmarks from the performance repo (we use them for ensuring there are no regressions):

https://github.com/dotnet/performance/blob/3499787dbbde3807402516462453e67097066d4a/src/benchmarks/micro/libraries/System.Collections/Sort.cs#L53-L54

You can get both trace files and disassembler output:

https://github.com/dotnet/performance/tree/main/src/benchmarks/micro#quick-start

@2A5F 2A5F force-pushed the avoid-sort-allocations branch from c46122b to 1b22a69 Compare June 2, 2025 15:07
Author

2A5F commented Jun 2, 2025

@adamsitnik Thanks for the pointer.
@jkotas After testing, I don't think the generic paths can be unified without a regression, so for now I will use the generic path only in Span.Sort<TComparer>.


Test Code

Based on https://github.com/dotnet/performance/blob/3499787dbbde3807402516462453e67097066d4a/src/benchmarks/micro/libraries/System.Collections/Sort.cs

[Benchmark]
public void Span_ComparerStruct() => _arrays[_iterationIndex++].AsSpan(0, Size).Sort(new ComparableComparerStruct());

private readonly struct ComparableComparerStruct : IComparer<T>
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int Compare(T x, T y) => x.CompareTo(y);
}

All tests were run on:

BenchmarkDotNet v0.14.1-nightly.20250107.205, Windows 10 (10.0.20348.1906) (VMware)
AMD Ryzen 9 7950X 4.49GHz, 16 CPU, 32 logical and 32 physical cores
.NET SDK 10.0.100-preview.6.25281.103
  [Host]     : .NET 10.0.0 (10.0.25.28203), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-VBHUQD : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-BXQXFN : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

PowerPlanMode=00000000-0000-0000-0000-000000000000  InvocationCount=5000  IterationTime=250ms
MaxIterationCount=20  MinIterationCount=15  MinWarmupIterationCount=6
UnrollFactor=1  WarmupCount=-1

Int32

| Method | Job | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Span_ComparerStruct | Job-VBHUQD | pr | 512 | 3.562 μs | 0.1579 μs | 0.1819 μs | 3.547 μs | 3.312 μs | 3.931 μs | 0.30 | 0.02 | - | 1,126 B | - | 0.00 |
| Span_ComparerStruct | Job-BXQXFN | main | 512 | 11.943 μs | 0.2768 μs | 0.3188 μs | 11.825 μs | 11.523 μs | 12.474 μs | 1.00 | 0.04 | - | 1,910 B | 88 B | 1.00 |

BigStruct

| Method | Job | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Span_ComparerStruct | Job-VBHUQD | pr | 512 | 4.677 μs | 0.1230 μs | 0.1417 μs | 4.645 μs | 4.362 μs | 4.949 μs | 0.34 | 0.01 | - | 1,352 B | - | 0.00 |
| Span_ComparerStruct | Job-BXQXFN | main | 512 | 13.659 μs | 0.2965 μs | 0.3173 μs | 13.594 μs | 13.215 μs | 14.288 μs | 1.00 | 0.03 | - | 2,426 B | 88 B | 1.00 |

IntClass

| Method | Job | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Span_ComparerStruct | Job-VBHUQD | pr | 512 | 23.89 μs | 0.541 μs | 0.601 μs | 23.92 μs | 22.59 μs | 24.86 μs | 0.92 | 0.03 | - | 2,958 B | - | 0.00 |
| Span_ComparerStruct | Job-BXQXFN | main | 512 | 26.00 μs | 0.519 μs | 0.556 μs | 26.01 μs | 24.97 μs | 26.85 μs | 1.00 | 0.03 | - | 968 B | 88 B | 1.00 |

IntStruct

| Method | Job | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Span_ComparerStruct | Job-VBHUQD | pr | 512 | 3.054 μs | 0.1060 μs | 0.1221 μs | 3.006 μs | 2.880 μs | 3.293 μs | 0.26 | 0.01 | - | 1,102 B | - | 0.00 |
| Span_ComparerStruct | Job-BXQXFN | main | 512 | 11.784 μs | 0.2308 μs | 0.2159 μs | 11.787 μs | 11.312 μs | 12.174 μs | 1.00 | 0.03 | - | 1,916 B | 88 B | 1.00 |

String

| Method | Job | Toolchain | Size | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Span_ComparerStruct | Job-VBHUQD | pr | 512 | 153.6 μs | 1.99 μs | 1.76 μs | 153.6 μs | 150.0 μs | 156.8 μs | 1.00 | 0.02 | - | 2,951 B | - | 0.00 |
| Span_ComparerStruct | Job-BXQXFN | main | 512 | 154.3 μs | 2.19 μs | 2.05 μs | 153.6 μs | 151.8 μs | 158.0 μs | 1.00 | 0.02 | - | 968 B | 88 B | 1.00 |

Author

2A5F commented Jun 2, 2025

Ideas and todo:

  1. Consider getting a pointer to the interface method to simulate a value-type delegate.
  2. A delegate struct wrapper shows a 1% regression (this test result may be invalid). That may be acceptable; otherwise, try extracting the pointer and rewrapping it.
  3. The null IComparer<T> path could be replaced with Comparer<T>.Default to avoid an interface virtual call.

  1. Not feasible: C# has no syntax for obtaining instance method pointers. Even when one is obtained with InlineIL.Fody, the benchmark is slower than with the delegate.
  2. Since 1 is not feasible, 2 is not feasible either.
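
The "delegate struct wrapper" from item 2 could look roughly like the sketch below. This is an illustration under assumptions, not the code in the PR: `ComparisonComparer<T>` and `ComparisonSort` are hypothetical names, and the wrapper simply forwards to the `Comparison<T>` delegate so that a delegate-based sort can go through the generic `Span<T>.Sort<T, TComparer>` overload.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical wrapper: adapts a Comparison<T> delegate to IComparer<T>
// without allocating a separate comparer object. The delegate invocation
// itself is still an indirect call.
public readonly struct ComparisonComparer<T> : IComparer<T>
{
    private readonly Comparison<T> _comparison;

    public ComparisonComparer(Comparison<T> comparison) =>
        _comparison = comparison ?? throw new ArgumentNullException(nameof(comparison));

    public int Compare(T? x, T? y) => _comparison(x!, y!);
}

public static class ComparisonSort
{
    // Routes a delegate-based sort through the generic TComparer overload.
    public static void Sort<T>(Span<T> span, Comparison<T> comparison) =>
        span.Sort(new ComparisonComparer<T>(comparison));
}
```

A call such as `ComparisonSort.Sort<int>(data, (a, b) => a.CompareTo(b))` sorts `data` in ascending order. Whether this shape performs as well as a dedicated `Comparison<T>` path is exactly what the benchmarks in this thread are trying to establish.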

@andrewjsaid
Contributor

To avoid duplicate code, could you do something like this:

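readonly struct DelegateWrappedComparer<T> : IComparer<T>
{
    private readonly Comparer<T> _comparer;

    // Wraps an existing comparer; without a constructor the field would never be assigned.
    public DelegateWrappedComparer(Comparer<T> comparer) => _comparer = comparer;

    public int Compare(T? x, T? y) => _comparer.Compare(x, y);
}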

And point the "old" version of the method to the new one, with this wrapper?

Hope this is clear.

Author

2A5F commented Jun 3, 2025

@andrewjsaid For Comparer<T>, I'm not sure this makes sense. A struct wrapper could perhaps turn an interface virtual call into an abstract-class virtual call; however, the public APIs accept only IComparer<T>, not Comparer<T>. Also, Comparer<T>.Default seems to benefit from some JIT magic (I don't know what it is, but the test results suggest it), so it does not need a struct wrapper.


The delegate is Comparison<T>, and the test results for it show a regression: EgorBot/runtime-utils#370 (comment)


That test is outdated and no longer valid. I will retest the new code later when I have time.

(benchmarks are really slow)

Contributor

andrewjsaid commented Jun 3, 2025

My reasoning was that, with the struct wrapper, methods like the one below would be specialized for that struct and the JIT would be able to inline the call to DelegateWrappedComparer<T>.Compare, thus making SwapIfGreater<TComparer>(...) and SwapIfGreater(...) equivalent.

private static void SwapIfGreater<TComparer>(Span<T> keys, TComparer comparer, int i, int j)
  where TComparer : IComparer<T>, allows ref struct

On further reflection, however: SwapIfGreater is currently not generic, so when T is a class the JIT can share a single reusable copy of it (via __CANON). Specializing it per DelegateWrappedComparer<T> would mean the JIT has to create a new copy for each T, which can create significant bloat. I'm not sure that is a good trade for avoiding the maintenance cost of duplicate methods.
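
To make the trade-off concrete, here is a sketch of a comparer-generic helper. It is an assumption-laden illustration, not the runtime's implementation: `SortSteps` is a hypothetical class, `T` is a method type parameter here (in the signature quoted above it comes from the enclosing generic type), and the `allows ref struct` constraint is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;

public static class SortSteps
{
    // When TComparer is a struct, the JIT compiles a separate specialization
    // for each TComparer, so Compare can be called directly (and possibly
    // inlined) instead of going through an interface or delegate call.
    // The cost is one extra copy of the method body per instantiation.
    public static void SwapIfGreater<T, TComparer>(Span<T> keys, TComparer comparer, int i, int j)
        where TComparer : IComparer<T>
    {
        if (comparer.Compare(keys[i], keys[j]) > 0)
        {
            T tmp = keys[i];
            keys[i] = keys[j];
            keys[j] = tmp;
        }
    }
}
```

For instance, `SortSteps.SwapIfGreater<int, Comparer<int>>(data, Comparer<int>.Default, 0, 1)` swaps the first two elements of `data` only when the first is greater than the second.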

Author

2A5F commented Jun 3, 2025

> However on further reflection I note that currently since SwapIfGreater is not generic, by specializing it per DelegateWrappedComparer this would mean that instead of having a single re-usable method if T is a class (via __CANON) the JIT would have to create a new copy for each T which can create significant bloat. Not sure if worth the maintenance cost of duplicate methods.

#39466 (comment)

@andrewjsaid
Contributor

Oops, already suggested 🤦
There are too many issues/PRs with similar names, so I missed that one.

Thanks for the link

Author

2A5F commented Jun 3, 2025

https://github.com/2A5F/dotnet-runtime/tree/try-uni-path

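The test results from this attempt show large regressions when the paths are unified (for example, Array_ComparerClass and Span_Comparison on Int32 are more than 2x slower), so we should stick to using the generic path only in Span.Sort<TComparer>.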

results.zip

Int32

| Method | Job | Toolchain | Size | Mean | Error | Ratio | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| LinqQuery | Job-GTWJCC | pr | 512 | 11.922 μs | 0.2930 μs | 1.80 | NA | 6424 B | 1.00 |
| LinqQuery | Job-IYQVVX | main | 512 | 6.650 μs | 0.2414 μs | 1.00 | NA | 6424 B | 1.00 |
| LinqOrderByExtension | Job-GTWJCC | pr | 512 | 11.844 μs | 0.2215 μs | 1.74 | NA | 6424 B | 1.00 |
| LinqOrderByExtension | Job-IYQVVX | main | 512 | 6.819 μs | 0.1276 μs | 1.00 | NA | 6424 B | 1.00 |
| Array | Job-GTWJCC | pr | 512 | 3.129 μs | 0.1116 μs | 1.08 | 1,398 B | - | NA |
| Array | Job-IYQVVX | main | 512 | 2.911 μs | 0.1527 μs | 1.00 | 1,471 B | - | NA |
| Array_ComparerClass | Job-GTWJCC | pr | 512 | 9.033 μs | 0.1730 μs | 2.37 | 2,039 B | 64 B | 1.00 |
| Array_ComparerClass | Job-IYQVVX | main | 512 | 3.817 μs | 0.0552 μs | 1.00 | 2,854 B | 64 B | 1.00 |
| Array_ComparerStruct | Job-GTWJCC | pr | 512 | 11.110 μs | 0.1638 μs | 1.03 | 2,054 B | 88 B | 1.00 |
| Array_ComparerStruct | Job-IYQVVX | main | 512 | 10.756 μs | 0.1067 μs | 1.00 | 1,935 B | 88 B | 1.00 |
| Array_Comparison | Job-GTWJCC | pr | 512 | 8.869 μs | 0.1738 μs | 2.25 | 1,463 B | - | NA |
| Array_Comparison | Job-IYQVVX | main | 512 | 3.951 μs | 0.0790 μs | 1.00 | 2,237 B | - | NA |
| Span | Job-GTWJCC | pr | 512 | 3.077 μs | 0.0635 μs | 1.10 | 1,382 B | - | NA |
| Span | Job-IYQVVX | main | 512 | 2.794 μs | 0.0783 μs | 1.00 | 1,454 B | - | NA |
| Span_ComparerClass | Job-GTWJCC | pr | 512 | 10.003 μs | 0.0744 μs | 2.52 | 2,017 B | 64 B | 1.00 |
| Span_ComparerClass | Job-IYQVVX | main | 512 | 3.965 μs | 0.0792 μs | 1.00 | 2,832 B | 64 B | 1.00 |
| Span_ComparerStruct | Job-GTWJCC | pr | 512 | 2.948 μs | 0.0738 μs | 0.26 | 1,121 B | - | 0.00 |
| Span_ComparerStruct | Job-IYQVVX | main | 512 | 11.357 μs | 0.2478 μs | 1.00 | 1,910 B | 88 B | 1.00 |
| Span_Comparison | Job-GTWJCC | pr | 512 | 8.984 μs | 0.1889 μs | 2.53 | 1,479 B | - | NA |
| Span_Comparison | Job-IYQVVX | main | 512 | 3.553 μs | 0.1403 μs | 1.00 | 2,269 B | - | NA |
| List | Job-GTWJCC | pr | 512 | 3.035 μs | 0.0765 μs | 1.01 | 1,425 B | - | NA |
| List | Job-IYQVVX | main | 512 | 3.012 μs | 0.0939 μs | 1.00 | 1,490 B | - | NA |

IntStruct

| Method | Job | Toolchain | Size | Mean | Error | Ratio | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| LinqQuery | Job-GTWJCC | pr | 512 | 11.354 μs | 0.2133 μs | 1.67 | NA | 6424 B | 1.00 |
| LinqQuery | Job-IYQVVX | main | 512 | 6.827 μs | 0.3876 μs | 1.00 | NA | 6424 B | 1.00 |
| LinqOrderByExtension | Job-GTWJCC | pr | 512 | 11.717 μs | 0.3758 μs | 1.81 | NA | 6424 B | 1.00 |
| LinqOrderByExtension | Job-IYQVVX | main | 512 | 6.499 μs | 0.2709 μs | 1.00 | NA | 6424 B | 1.00 |
| Array | Job-GTWJCC | pr | 512 | 2.930 μs | 0.0657 μs | 0.99 | 1,563 B | - | NA |
| Array | Job-IYQVVX | main | 512 | 2.964 μs | 0.1239 μs | 1.00 | 1,563 B | - | NA |
| Array_ComparerClass | Job-GTWJCC | pr | 512 | 4.503 μs | 0.1320 μs | 1.15 | 2,538 B | 64 B | 1.00 |
| Array_ComparerClass | Job-IYQVVX | main | 512 | 3.932 μs | 0.2248 μs | 1.00 | 2,794 B | 64 B | 1.00 |
| Array_ComparerStruct | Job-GTWJCC | pr | 512 | 12.385 μs | 0.3028 μs | 1.13 | 2,100 B | 88 B | 1.00 |
| Array_ComparerStruct | Job-IYQVVX | main | 512 | 10.946 μs | 0.2124 μs | 1.00 | 1,939 B | 88 B | 1.00 |
| Array_Comparison | Job-GTWJCC | pr | 512 | 4.531 μs | 0.1236 μs | 1.04 | 1,949 B | - | NA |
| Array_Comparison | Job-IYQVVX | main | 512 | 4.361 μs | 0.1659 μs | 1.00 | 2,214 B | - | NA |
| Span | Job-GTWJCC | pr | 512 | 2.963 μs | 0.1268 μs | 0.92 | 1,523 B | - | NA |
| Span | Job-IYQVVX | main | 512 | 3.236 μs | 0.1359 μs | 1.00 | 1,523 B | - | NA |
| Span_ComparerClass | Job-GTWJCC | pr | 512 | 4.659 μs | 0.1869 μs | 1.28 | 2,519 B | 64 B | 1.00 |
| Span_ComparerClass | Job-IYQVVX | main | 512 | 3.656 μs | 0.1346 μs | 1.00 | 2,790 B | 64 B | 1.00 |
| Span_ComparerStruct | Job-GTWJCC | pr | 512 | 2.954 μs | 0.1297 μs | 0.26 | 1,102 B | - | 0.00 |
| Span_ComparerStruct | Job-IYQVVX | main | 512 | 11.240 μs | 0.2037 μs | 1.00 | 1,913 B | 88 B | 1.00 |
| Span_Comparison | Job-GTWJCC | pr | 512 | 4.651 μs | 0.1655 μs | 1.19 | 1,960 B | - | NA |
| Span_Comparison | Job-IYQVVX | main | 512 | 3.916 μs | 0.0757 μs | 1.00 | 2,216 B | - | NA |
| List | Job-GTWJCC | pr | 512 | 3.013 μs | 0.1068 μs | 1.00 | 1,625 B | - | NA |
| List | Job-IYQVVX | main | 512 | 3.021 μs | 0.0867 μs | 1.00 | 1,588 B | - | NA |

Author

2A5F commented Jun 3, 2025

Here are the benchmark results for the current PR branch:

results.zip

Int32

| Method | Job | Toolchain | Size | Mean | Error | Ratio | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| LinqQuery | Job-DLLHDC | pr | 512 | 7.006 μs | 0.2085 μs | 1.04 | NA | 6424 B | 1.00 |
| LinqQuery | Job-KCPLYN | main | 512 | 6.770 μs | 0.2490 μs | 1.00 | NA | 6424 B | 1.00 |
| LinqOrderByExtension | Job-DLLHDC | pr | 512 | 6.841 μs | 0.1929 μs | 1.05 | NA | 6424 B | 1.00 |
| LinqOrderByExtension | Job-KCPLYN | main | 512 | 6.534 μs | 0.1585 μs | 1.00 | NA | 6424 B | 1.00 |
| Array | Job-DLLHDC | pr | 512 | 3.028 μs | 0.0947 μs | 1.09 | 1,471 B | - | NA |
| Array | Job-KCPLYN | main | 512 | 2.791 μs | 0.1001 μs | 1.00 | 1,473 B | - | NA |
| Array_ComparerClass | Job-DLLHDC | pr | 512 | 3.900 μs | 0.1326 μs | 0.98 | 2,861 B | 64 B | 1.00 |
| Array_ComparerClass | Job-KCPLYN | main | 512 | 3.967 μs | 0.1316 μs | 1.00 | 2,836 B | 64 B | 1.00 |
| Array_ComparerStruct | Job-DLLHDC | pr | 512 | 11.220 μs | 0.2205 μs | 1.03 | 1,933 B | 88 B | 1.00 |
| Array_ComparerStruct | Job-KCPLYN | main | 512 | 10.890 μs | 0.2212 μs | 1.00 | 1,936 B | 88 B | 1.00 |
| Array_Comparison | Job-DLLHDC | pr | 512 | 3.816 μs | 0.0728 μs | 0.93 | 2,260 B | - | NA |
| Array_Comparison | Job-KCPLYN | main | 512 | 4.110 μs | 0.0821 μs | 1.00 | 2,265 B | - | NA |
| Span | Job-DLLHDC | pr | 512 | 3.010 μs | 0.0930 μs | 0.97 | 1,452 B | - | NA |
| Span | Job-KCPLYN | main | 512 | 3.124 μs | 0.1247 μs | 1.00 | 1,452 B | - | NA |
| Span_ComparerClass | Job-DLLHDC | pr | 512 | 3.480 μs | 0.1144 μs | 0.93 | 2,837 B | 64 B | 1.00 |
| Span_ComparerClass | Job-KCPLYN | main | 512 | 3.762 μs | 0.1218 μs | 1.00 | 2,813 B | 64 B | 1.00 |
| Span_ComparerStruct | Job-DLLHDC | pr | 512 | 3.018 μs | 0.0947 μs | 0.25 | 1,126 B | - | 0.00 |
| Span_ComparerStruct | Job-KCPLYN | main | 512 | 11.924 μs | 0.3302 μs | 1.00 | 1,910 B | 88 B | 1.00 |
| Span_Comparison | Job-DLLHDC | pr | 512 | 3.936 μs | 0.0932 μs | 0.98 | 2,271 B | - | NA |
| Span_Comparison | Job-KCPLYN | main | 512 | 4.004 μs | 0.1429 μs | 1.00 | 2,274 B | - | NA |
| List | Job-DLLHDC | pr | 512 | 3.044 μs | 0.1474 μs | 1.01 | 1,532 B | - | NA |
| List | Job-KCPLYN | main | 512 | 3.010 μs | 0.1136 μs | 1.00 | 1,490 B | - | NA |

IntStruct

| Method | Job | Toolchain | Size | Mean | Error | Ratio | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|
| LinqQuery | Job-DLLHDC | pr | 512 | 6.521 μs | 0.1350 μs | 1.06 | NA | 6424 B | 1.00 |
| LinqQuery | Job-KCPLYN | main | 512 | 6.174 μs | 0.1346 μs | 1.00 | NA | 6424 B | 1.00 |
| LinqOrderByExtension | Job-DLLHDC | pr | 512 | 6.290 μs | 0.1650 μs | 0.94 | NA | 6424 B | 1.00 |
| LinqOrderByExtension | Job-KCPLYN | main | 512 | 6.710 μs | 0.2066 μs | 1.00 | NA | 6424 B | 1.00 |
| Array | Job-DLLHDC | pr | 512 | 2.929 μs | 0.0714 μs | 0.96 | 1,556 B | - | NA |
| Array | Job-KCPLYN | main | 512 | 3.062 μs | 0.0857 μs | 1.00 | 1,535 B | - | NA |
| Array_ComparerClass | Job-DLLHDC | pr | 512 | 3.875 μs | 0.0897 μs | 0.88 | 2,794 B | 64 B | 1.00 |
| Array_ComparerClass | Job-KCPLYN | main | 512 | 4.407 μs | 0.1691 μs | 1.00 | 2,817 B | 64 B | 1.00 |
| Array_ComparerStruct | Job-DLLHDC | pr | 512 | 10.709 μs | 0.1716 μs | 0.99 | 1,939 B | 88 B | 1.00 |
| Array_ComparerStruct | Job-KCPLYN | main | 512 | 10.771 μs | 0.1996 μs | 1.00 | 1,939 B | 88 B | 1.00 |
| Array_Comparison | Job-DLLHDC | pr | 512 | 3.920 μs | 0.1806 μs | 0.92 | 2,225 B | - | NA |
| Array_Comparison | Job-KCPLYN | main | 512 | 4.272 μs | 0.1662 μs | 1.00 | 2,201 B | - | NA |
| Span | Job-DLLHDC | pr | 512 | 3.041 μs | 0.0855 μs | 1.00 | 1,544 B | - | NA |
| Span | Job-KCPLYN | main | 512 | 3.047 μs | 0.0720 μs | 1.00 | 1,516 B | - | NA |
| Span_ComparerClass | Job-DLLHDC | pr | 512 | 4.036 μs | 0.1537 μs | 1.01 | 2,810 B | 64 B | 1.00 |
| Span_ComparerClass | Job-KCPLYN | main | 512 | 3.987 μs | 0.1042 μs | 1.00 | 2,818 B | 64 B | 1.00 |
| Span_ComparerStruct | Job-DLLHDC | pr | 512 | 2.805 μs | 0.0819 μs | 0.23 | 1,102 B | - | 0.00 |
| Span_ComparerStruct | Job-KCPLYN | main | 512 | 12.106 μs | 0.2035 μs | 1.00 | 1,916 B | 88 B | 1.00 |
| Span_Comparison | Job-DLLHDC | pr | 512 | 4.077 μs | 0.1523 μs | 0.99 | 2,253 B | - | NA |
| Span_Comparison | Job-KCPLYN | main | 512 | 4.133 μs | 0.1298 μs | 1.00 | 2,238 B | - | NA |
| List | Job-DLLHDC | pr | 512 | 2.929 μs | 0.0865 μs | 0.95 | 1,585 B | - | NA |
| List | Job-KCPLYN | main | 512 | 3.090 μs | 0.1255 μs | 1.00 | 1,604 B | - | NA |

@2A5F 2A5F marked this pull request as ready for review June 3, 2025 16:52
Labels: area-System.Memory, community-contribution, tenet-performance