Fix regression in Array.Sort for floats/doubles #37941
src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs
Nice!
046f85c to 07dcfe1
07dcfe1 to 04e1399
@stephentoub perhaps consider testing performance of the following type too:

    public class ComparableClassInt32 : IComparable<ComparableClassInt32>
    {
        public readonly int Value;
        public ComparableClassInt32(int value) => Value = value;
        public int CompareTo(ComparableClassInt32 other) => Value.CompareTo(other.Value);
    }

basic reference type overhead. Since
A knowing smile? 😉 Yes, this case regresses. It appears to be due to dictionary lookups when calling LessThan/GreaterThan, which also prevent inlining. Evaluating options...
Ha yeah 😉 Back in 2018 I went through all "stages of inlining": huh?, wuhuu! (AggressiveInlining), doh! (reference type regression), oh come on! (JIT-me-not issues) 😅 Looking forward to what you come up with. 😀
#38229 will hopefully be the solution here.
I'm certainly open to reconsidering #10048. The changes are simple enough, though I might end up restricting it to But I still don't have a clear picture of how allowing this actually provides benefit -- it sounds like you think with this we can unify some code and either make it perform better or at least not lose any perf. So perhaps a benchmark along these lines would be instructive?
Not to put words in @nietras's mouth, but I expect what he's hoping to do is improve the
Sounds like a good thing to add to dotnet/performance.
@stephentoub exactly. :) Although, I am not sure we could unify on Note this is not just about sorting. Sorting, however, is for me a good example of where .NET falls short. I use the value type as an inlineable "functor" pattern for data processing algorithms. Think loops over millions of elements. Unfortunately, we can't unify on this pattern due to these kinds of issues, which
It's not just the allocation. It's so the compare can be inlined, so the "functor" can be applied inline to yield a customized loop for performance, as you probably know.
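The value-type "functor" pattern mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from the PR: `Int32Ascending`, `FunctorDemo` and `IndexOfMax` are hypothetical names invented for this example. The idea is that a generic parameter constrained to `struct, IComparer<T>` gets its own specialized instantiation per comparer type, so the `Compare` call is a direct call that the JIT is free to inline, unlike a call through a delegate or an interface reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value-type comparer; the name is illustrative only.
public readonly struct Int32Ascending : IComparer<int>
{
    public int Compare(int x, int y) => x < y ? -1 : (x > y ? 1 : 0);
}

public static class FunctorDemo
{
    // TComparer is constrained to a struct, so each comparer type gets its own
    // specialized method body and Compare is called directly on the struct
    // (no delegate or interface dispatch), which makes it an inlining candidate.
    public static int IndexOfMax<T, TComparer>(ReadOnlySpan<T> values, TComparer comparer)
        where TComparer : struct, IComparer<T>
    {
        if (values.IsEmpty) return -1;

        int best = 0;
        for (int i = 1; i < values.Length; i++)
        {
            // Keep the index of the largest element seen so far.
            if (comparer.Compare(values[i], values[best]) > 0) best = i;
        }
        return best;
    }
}
```

A call site might look like `FunctorDemo.IndexOfMax<int, Int32Ascending>(data, default)` for an `int[] data`. Whether the comparison is actually inlined depends on the runtime version and JIT heuristics, so any gain should be measured rather than assumed.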
@AndyAyersMS I don't know what kind of benchmark you are thinking about but something simple like below shows the issue.

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using System;
    using System.Collections.Generic;
    using System.Runtime.CompilerServices;

    namespace CompareBenchmarking
    {
        public class Program
        {
            static void Main(string[] args) =>
                BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);
        }

        public class CompareFloat : Compare<float> { protected override float GetNext() => _random.Next(); }
        public class CompareInt32 : Compare<int> { protected override int GetNext() => _random.Next(); }

        public struct ComparisonComparer<T> : IComparer<T>
        {
            readonly Comparison<T> _comparison;

            public ComparisonComparer(Comparison<T> comparison) =>
                _comparison = comparison;

            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            public int Compare(T x, T y) => _comparison(x, y);
        }

        [MemoryDiagnoser]
        [DisassemblyDiagnoser]
        public abstract class Compare<T>
            where T : IComparable<T>
        {
            static readonly Comparer<T> _comparer = Comparer<T>.Default;
            static readonly Comparison<T> _comparison = Comparer<T>.Default.Compare;
            readonly ComparisonComparer<T> _comparisonComparer = new ComparisonComparer<T>(_comparison);
            protected Random _random;
            T _x;
            T _y;

            protected abstract T GetNext();

            [GlobalSetup]
            public void Setup()
            {
                _random = new Random(42);
                _x = GetNext();
                _y = GetNext();
            }

            [Benchmark]
            public int CompareTo() => _x.CompareTo(_y);

            [Benchmark]
            public int Comparer() => _comparer.Compare(_x, _y);

            [Benchmark(Baseline = true)]
            public int Comparison() => _comparison(_x, _y);

            [Benchmark]
            public int ComparisonComparer() => _comparisonComparer.Compare(_x, _y);
        }
    }

With the following results on .NET 5.0 Preview 2. The factor of 2.31x pretty much says it all although that would be pretty self-evident given the extra indirection and code generation issues. Just imagine this in a tight loop. :)

    BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.329 (2004/?/20H1)
    Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
    .NET Core SDK=5.0.100-preview.2.20176.6
      [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.16006, CoreFX 5.0.20.16006), X64 RyuJIT
      DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.16006, CoreFX 5.0.20.16006), X64 RyuJIT

CompareInt32
CompareFloat
If this is interesting I can make a PR to the benchmark repo.
04e1399 to 356828c
356828c to a3e3d79
RyuJit would not inline methods that contained delegate invokes. Remove this limitation. Closes dotnet#10048. See also dotnet#37941.
Several months back we moved the sorting logic for primitive types out of native code into managed. Doing so helped to make the logic reusable for spans and helped to reduce GC latency, and also actually helped with throughput in a variety of cases. But it ended up regressing throughput for sorting larger arrays of floating-point values, with float/double.CompareTo not getting inlined, and even if it were inlined, containing much more logic than was present in the native implementation. The native implementation did a pre-pass to move all NaNs to the front and then just used simple < and > comparison operations, so the managed implementation now does as well.
a3e3d79 to 2bec1bf
@stephentoub should I try to make a PR for the Also thank you for the mention in your awesome Performance Improvements in .NET 5 :)
Hi @nietras, @stephentoub is OOF. I think it would be a good idea to start the PR for the I have opened #39466 to have this tracked. As for the timeline, I do not expect we would be able to get this change into .NET 5.
Why would that be needed? I am wondering whether it would make sense to delete the
Several months back we moved the sorting logic for primitive types out of native code into managed. Doing so helped to make the logic reusable for spans and helped to reduce GC latency, and also actually helped with throughput in a variety of cases. But it ended up regressing throughput for sorting larger arrays of floating-point values, with float/double.CompareTo not getting inlined, and even if it were inlined, containing much more logic than was present in the native implementation. The native implementation did a pre-pass to move all NaNs to the front and then just used simple < and > comparison operations, so the managed implementation now does as well.
With the exception of a large array of already-sorted Int32 values where there is still a small regression after this PR, all of the cases I've tested are either as good as or better than .NET Core 3.1.
@jkotas, @GrabYourPitchforks, @tannergooding, thanks for your offline suggestions on approaches here; I tried out a variety of them, including vectorized float/double.CompareTo as well as unsafe casts to wrapper types with customized IComparable implementations, and this ended up being the best overall. Thanks as well to @nietras for pointing out the regression.
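The NaN pre-pass described above can be illustrated with a small sketch. This is not the code from ArraySortHelper.cs: `NaNPrepassDemo` and its methods are hypothetical names, and a plain insertion sort stands in for the real sorting algorithm purely to keep the example short. The point it shows is that once all NaNs sit at the front (which matches the ordering of `double.CompareTo`, where NaN sorts before every other value), the remaining elements can be ordered with plain `<` and `>` operators.

```csharp
using System;

public static class NaNPrepassDemo
{
    // Moves every NaN to the front of the span and returns how many were found.
    public static int MoveNaNsToFront(Span<double> keys)
    {
        int left = 0;
        for (int i = 0; i < keys.Length; i++)
        {
            if (double.IsNaN(keys[i]))
            {
                // Swap the NaN into the next free slot at the front.
                double tmp = keys[left];
                keys[left] = keys[i];
                keys[i] = tmp;
                left++;
            }
        }
        return left;
    }

    // Assumption: insertion sort is only a stand-in for the real algorithm.
    public static void Sort(Span<double> keys)
    {
        int nanCount = MoveNaNsToFront(keys);

        // The remainder contains no NaNs, so the simple > operator is a valid ordering.
        Span<double> rest = keys.Slice(nanCount);
        for (int i = 1; i < rest.Length; i++)
        {
            double current = rest[i];
            int j = i - 1;
            while (j >= 0 && rest[j] > current)
            {
                rest[j + 1] = rest[j];
                j--;
            }
            rest[j + 1] = current;
        }
    }
}
```

Calling `NaNPrepassDemo.Sort(data)` on a `double[]` leaves the NaNs first and the remaining values in ascending order.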
Benchmark: