Fix regression in Array.Sort for floats/doubles #37941

stephentoub · 2020-06-16T01:38:27Z

Several months back we moved the sorting logic for primtive types out of native code into managed. Doing so helped to make the logic reusable for spans and helped to reduce GC latency, and also actually helped with throughput in a variety of cases. But it ended up regressing throughput for sorting larger arrays of floating-point values, with float/double.CompareTo not getting inlined, and even if it were inlined, containing much more logic than was present in the native implementation. The native implementation did a pre-pass to move all NaNs to the front and then just used simple < and > comparison operations, so the managed implementation now does as well.

With the exception of a large array of already-sorted Int32 values where there is still a small regression after this PR, all of the cases I've tested are either as good or better than .NET Core 3.1.

@jkotas, @GrabYourPitchforks, @tannergooding, thanks for your offline suggestions on approaches here; I tried out a variety of them, including vectorized float/double.CompareTo as well as unsafe casts to wrapper types with customized IComparable implementations, and this ended up being the best overall. Thanks as well to @nietras for pointing out the regression.

Benchmark:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Diagnostics.CodeAnalysis;
using System.Linq;

[MemoryDiagnoser]
public class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);
}

public class DoubleSorting : Sorting<double> { protected override double GetNext() => _random.Next(); }
public class Int32Sorting : Sorting<int> { protected override int GetNext() => _random.Next(); }
public class StringSorting : Sorting<string>
{
    protected override string GetNext() => string.Create(_random.Next(1, 5), _random, (dest, r) =>
    {
        for (int i = 0; i < dest.Length; i++) dest[i] = (char)('a' + r.Next(26));
    });
}

public abstract class Sorting<T>
{
    protected Random _random;
    private T[] _orig, _array;

    [Params(10, 100_000)]
    public int Size { get; set; }

    protected abstract T GetNext();

    [GlobalSetup]
    public void Setup()
    {
        _random = new Random(42);
        _orig = Enumerable.Range(0, Size).Select(_ => GetNext()).ToArray();
        _array = (T[])_orig.Clone();
        Array.Sort(_array);
    }

    [Benchmark]
    public void Sorted() => Array.Sort(_array);

    [Benchmark]
    public void Random()
    {
        _orig.AsSpan().CopyTo(_array);
        Array.Sort(_array);
    }
}

Type	Method	Toolchain	Size	Mean	Ratio
DoubleSorting	Sorted	netcore31	10	59.58 ns	2.13
DoubleSorting	Sorted	master	10	30.52 ns	1.09
DoubleSorting	Sorted	pr	10	28.01 ns	1.00

DoubleSorting	Random	netcore31	10	77.50 ns	1.71
DoubleSorting	Random	master	10	67.61 ns	1.49
DoubleSorting	Random	pr	10	45.44 ns	1.00

DoubleSorting	Sorted	netcore31	100000	990,325.98 ns	1.27
DoubleSorting	Sorted	master	100000	2,898,290.96 ns	3.73
DoubleSorting	Sorted	pr	100000	777,209.08 ns	1.00

DoubleSorting	Random	netcore31	100000	5,940,056.30 ns	1.08
DoubleSorting	Random	master	100000	7,880,560.62 ns	1.44
DoubleSorting	Random	pr	100000	5,473,335.58 ns	1.00

Int32Sorting	Sorted	netcore31	10	38.02 ns	2.31
Int32Sorting	Sorted	master	10	17.09 ns	1.04
Int32Sorting	Sorted	pr	10	16.49 ns	1.00

Int32Sorting	Random	netcore31	10	49.97 ns	1.63
Int32Sorting	Random	master	10	31.30 ns	1.02
Int32Sorting	Random	pr	10	30.63 ns	1.00

Int32Sorting	Sorted	netcore31	100000	572,908.37 ns	0.90
Int32Sorting	Sorted	master	100000	640,966.00 ns	1.00
Int32Sorting	Sorted	pr	100000	639,118.53 ns	1.00

Int32Sorting	Random	netcore31	100000	5,072,236.56 ns	1.06
Int32Sorting	Random	master	100000	5,024,276.17 ns	1.05
Int32Sorting	Random	pr	100000	4,801,833.82 ns	1.00

StringSorting	Sorted	netcore31	10	575.87 ns	1.24
StringSorting	Sorted	master	10	432.40 ns	0.93
StringSorting	Sorted	pr	10	465.86 ns	1.00

StringSorting	Random	netcore31	10	1,758.64 ns	1.15
StringSorting	Random	master	10	1,425.75 ns	0.93
StringSorting	Random	pr	10	1,532.01 ns	1.00

StringSorting	Sorted	netcore31	100000	83,774,554.44 ns	1.14
StringSorting	Sorted	master	100000	74,485,247.25 ns	1.01
StringSorting	Sorted	pr	100000	73,450,518.68 ns	1.00

StringSorting	Random	netcore31	100000	103,108,672.31 ns	1.14
StringSorting	Random	master	100000	89,373,058.33 ns	0.99
StringSorting	Random	pr	100000	90,577,543.59 ns	1.00

src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs

jkotas

Nice!

src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs

nietras · 2020-06-17T10:17:16Z

@stephentoub perhaps consider testing performance of following type too:

    public class ComparableClassInt32 
        : IComparable<ComparableClassInt32>
    {
        public readonly int Value;

        public ComparableClassInt32(int value) =>
            Value = value;

        public int CompareTo(ComparableClassInt32 other) => 
            Value.CompareTo(other.Value);
    }

basic reference type overhead. Since string compares are "slow" you won't see possible reference type regressions for this. Just a suggestion :)

stephentoub · 2020-06-17T14:32:05Z

Just a suggestion :)

A knowing smile? 😉 Yes, this case regresses. It appears to be due to dictionary lookups when calling LessThan/GreaterThan, which also prevent inlining. Evaluating options...

nietras · 2020-06-17T15:10:58Z

knowing smile? 😉

Ha yeah 😉 Back in 2018 I went through all "stages of inlining": huh?, wuhuu! (AggressiveInlining), doh! (reference type regression), oh come on! (JIT-me-not issues) 😅

Looking forward to what you come up with. 😀

stephentoub · 2020-06-22T19:26:35Z

#38229 will hopefully be the solution here.

nietras · 2020-06-22T19:46:37Z

Nice 👍 Now if #35791/#10048 were resolved too, we would have something to talk about 😉 Happy to help with that if I could get some pointers 😀

AndyAyersMS · 2020-06-22T20:10:58Z

Happy to help with that if I could get some pointers

I'm certainly open to reconsidering #10048. The changes are simple enough, though I might end up restricting it to AggressiveInlining callees since the jit underestimates the size impact of a delegate invoke.

But I still don't have a clear picture of how allowing this actually provides benefit -- it sounds like you think with this we can unify some code and either make it perform better or at least not lose any perf. So perhaps a benchmark along these lines would be instructive?

stephentoub · 2020-06-22T20:59:31Z

it sounds like you think with this we can unify some code and either make it perform better or at least not lose any perf.

Not to put words in @nietras mouth, but I expect what he's hoping to do is improve the Array.Sort(..., IComparer<T>) code paths. Today we have one sort implementation that covers both providing an IComparer<T> and a Comparison<T> (a delegate); the former is implemented by creating a delegate to its Compare method, which means Array.Sort(..., IComparer<T>) is allocating. On top of that, we just recently added a span-based sort, and it's currently defined with a generic TComparer : IComparer<T> comparer, with the idea that you could provide a struct-based comparer and it wouldn't allocate... unfortunately, it's the worst of all worlds right now, in that we box the TComparer into an IComparer<T>, and then create a delegate from its Compare method. If all of these code quality issues got sorted out, you could imagine we just had a single code path for TComparer that was written generically to use its Compare sans boxing, and then if you used the Comparison<T> overload, we'd create a struct-based TComparer that just wrapped that Comparison<T> to invoke it. If we everything worked out really well, it could potentially even be unified with the separate, duplicate code path we currently have that targets T : IComparable<T>, using a TComparer comparer struct that delegated to the T : IComparable<T> implementation.

So perhaps a benchmark along these lines would be instructive?

Sounds like a good thing to add to dotnet/performance.

nietras · 2020-06-23T18:41:56Z

If all of these code quality issues got sorted out, you could imagine we just had a single code path for TComparer that was written generically to use its Compare sans boxing, and then if you used the Comparison overload, we'd create a struct-based TComparer that just wrapped that Comparison to invoke it. If we everything worked out really well, it could potentially even be unified with the separate, duplicate code path we currently have that targets T : IComparable, using a TComparer comparer struct that delegated to the T : IComparable implementation.

@stephentoub exactly. :) Although, I am not sure we could unify on TComparer as such, but potentially yes. This would be the fulfillment of the API I have proposed and the on/off work I have been doing on this for the last 3 years... failing due to the many issues around inlining. At the very minimum we could replace the Comparison<T> path with the TComparer path without duplicating code.

Note this is not just about sorting. Sorting, however, for me is a good example of where .NET comes short. I use the value type as a inlineable "functor" pattern for data processing algorithms. Think loops over millions of elements. Unfortunately, we can't unify on this pattern due to these kinds of issues, which Sort exemplifies very well. Lots of other devs use this pattern, but we often end up having to duplicate code, and since this is code that is combinatorial on rank, different kinds of transformations etc. it adds up. It's a lot of replicated code. I have talked about this before and don't want to sound like a broken record player 😅

with the idea that you could provide a struct-based comparer and it wouldn't allocate...

It's not just the allocation. It's so the compare can be inlined. So the "functor" can be applied inlined. To yield a customized loop for performance. As you probably know.

perhaps a benchmark along these lines would be instructive?

@AndyAyersMS I don't know what kind of benchmark you are thinking about but something simple like below shows the issue.

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

namespace CompareBenchmarking
{
    public class Program
    {
        static void Main(string[] args) => 
            BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);
    }

    public class CompareFloat : Compare<float> { protected override float GetNext() => _random.Next(); }
    public class CompareInt32 : Compare<int> { protected override int GetNext() => _random.Next(); }

    public struct ComparisonComparer<T> : IComparer<T>
    {
        readonly Comparison<T> _comparison;

        public ComparisonComparer(Comparison<T> comparison) =>
            _comparison = comparison;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public int Compare(T x, T y) => _comparison(x, y);
    }

    [MemoryDiagnoser]
    [DisassemblyDiagnoser]
    public abstract class Compare<T>
        where T : IComparable<T>
    {
        static readonly Comparer<T> _comparer = Comparer<T>.Default;
        static readonly Comparison<T> _comparison = Comparer<T>.Default.Compare;
        readonly ComparisonComparer<T> _comparisonComparer = new ComparisonComparer<T>(_comparison);

        protected Random _random;
        T _x;
        T _y;

        protected abstract T GetNext();

        [GlobalSetup]
        public void Setup()
        {
            _random = new Random(42);
            _x = GetNext();
            _y = GetNext();
        }

        [Benchmark]
        public int CompareTo() => _x.CompareTo(_y);

        [Benchmark]
        public int Comparer() => _comparer.Compare(_x, _y);

        [Benchmark(Baseline = true)]
        public int Comparison() => _comparison(_x, _y);

        [Benchmark]
        public int ComparisonComparer() => _comparisonComparer.Compare(_x, _y);
    }
}

With the following results on .NET 5.0 Preview 2. The factor of 2.31x pretty much says it all although that would be pretty self-evident given the extra indirection and code generation issues. Just imagine this in a tight loop. :)

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.329 (2004/?/20H1)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.2.20176.6
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.16006, CoreFX 5.0.20.16006), X64 RyuJIT
  DefaultJob : .NET Core 5.0.0 (CoreCLR 5.0.20.16006, CoreFX 5.0.20.16006), X64 RyuJIT

CompareInt32

Method	Mean	Error	StdDev	Ratio
CompareTo	0.4995 ns	0.0051 ns	0.0048 ns	0.38
Comparer	1.1854 ns	0.0032 ns	0.0028 ns	0.89
Comparison	1.3258 ns	0.0065 ns	0.0054 ns	1.00
ComparisonComparer	3.0613 ns	0.0096 ns	0.0089 ns	2.31

CompareFloat

Method	Mean	Error	StdDev	Ratio	RatioSD
CompareTo	1.237 ns	0.0018 ns	0.0015 ns	0.60	0.00
Comparer	1.905 ns	0.0628 ns	0.0557 ns	0.93	0.03
Comparison	2.060 ns	0.0047 ns	0.0039 ns	1.00	0.00
ComparisonComparer	3.479 ns	0.0096 ns	0.0090 ns	1.69	0.01

If this is interesting I can make a PR to the benchmark repo.

RyuJit would not inline methods that contained delegate invokes. Remove this limitation. Closes dotnet#10048. See also dotnet#37941.

RyuJit would not inline methods that contained delegate invokes. Remove this limitation. Closes #10048. See also #37941.

Several months back we moved the sorting logic for primtive types out of native code into managed. Doing so helped to make the logic reusable for spans and helped to reduce GC latency, and also actually helped with throughput in a variety of cases. But it ended up regressing throughput for sorting larger arrays of floating-point values, with float/double.CompareTo not getting inlined, and even if it were inlined, containing much more logic than was present in the native implementation. The native implementation did a pre-pass to move all NaNs to the front and then just used simple < and > comparison operations, so the managed implementation now does as well.

stephentoub · 2020-06-28T00:57:07Z

I moved the comparer functions into the two generic helper classes, and with #38229 (thanks, @jkotas), the reference-type-Int32-wrapper case is good again, with everything else being appx what it was before as well.

I'll merge when this is green.

nietras · 2020-07-16T10:13:51Z

@stephentoub should I try make a PR for the TComparer change? Is there interest in getting this in? The change itself to TComparer is easy enough, it's the whole sort helpers creation etc. that needs to change a lot, and to support reference type delegate comparer will likely require a "unsafe" cast of Comparison<T> to Comparison<object>. Is that acceptable?

Also thank you for the mention in your awesome Performance Improvements in .NET 5 :)

jkotas · 2020-07-16T19:51:47Z

Hi @nietras,

@stephentoub is OOF.

I think it would be a good idea to start the PR for the TComparer changes so that we can start sorting out the code quality issues that it is likely to expose. Preferably, we would fix them instead of working around them by duplicating large amounts of code.

I have opened #39466 to have this tracked. For time line, I do not expect we would be able to get this change into .NET 5.

"unsafe" cast of Comparison<T> to Comparison<object>

Why would that be needed?

I am wondering whether it would make sense to delete the TComparer overloads from the public surface for now until we can get the right implementation for them in place. I think they can stay, but just wanted to bring it up.

cc @eiriktsarpalis

stephentoub added area-System.Runtime tenet-performance Performance related issue labels Jun 16, 2020

stephentoub added this to the 5.0.0 milestone Jun 16, 2020

stephentoub commented Jun 16, 2020

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs Outdated Show resolved Hide resolved

tannergooding approved these changes Jun 16, 2020

View reviewed changes

jkotas approved these changes Jun 16, 2020

View reviewed changes

kunalspathak mentioned this pull request Jun 16, 2020

Mono Pri0 Runtime Tests Run Linux arm64 release leg failing #37946

Closed

nietras reviewed Jun 16, 2020

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs Outdated Show resolved Hide resolved

EgorBo reviewed Jun 16, 2020

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs Show resolved Hide resolved

stephentoub closed this Jun 16, 2020

stephentoub reopened this Jun 16, 2020

danmoseley reviewed Jun 16, 2020

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs Show resolved Hide resolved

stephentoub closed this Jun 16, 2020

stephentoub reopened this Jun 16, 2020

stephentoub closed this Jun 17, 2020

stephentoub reopened this Jun 17, 2020

stephentoub force-pushed the fixsortperf_utilslessthanref branch from 046f85c to 07dcfe1 Compare June 17, 2020 02:27

stephentoub closed this Jun 17, 2020

stephentoub reopened this Jun 17, 2020

GrabYourPitchforks reviewed Jun 17, 2020

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs Show resolved Hide resolved

stephentoub closed this Jun 17, 2020

stephentoub reopened this Jun 17, 2020

stephentoub force-pushed the fixsortperf_utilslessthanref branch from 07dcfe1 to 04e1399 Compare June 17, 2020 09:43

stephentoub added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Jun 18, 2020

stephentoub force-pushed the fixsortperf_utilslessthanref branch from 04e1399 to 356828c Compare June 25, 2020 22:27

jkotas mentioned this pull request Jun 26, 2020

Enable inlining of shared generics code within same type #38229

Merged

AndyAyersMS mentioned this pull request Jun 26, 2020

Allow to inline methods with delegate invocation (at least methods marked with AggressiveInlining) #10048

Closed

stephentoub force-pushed the fixsortperf_utilslessthanref branch from 356828c to a3e3d79 Compare June 26, 2020 21:34

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this pull request Jun 26, 2020

JIT: allow inlinees with delegate invoke

fa018bc

RyuJit would not inline methods that contained delegate invokes. Remove this limitation. Closes dotnet#10048. See also dotnet#37941.

AndyAyersMS mentioned this pull request Jun 26, 2020

JIT: allow inlinees with delegate invoke #38479

Merged

jkotas pushed a commit that referenced this pull request Jun 27, 2020

JIT: allow inlinees with delegate invoke (#38479)

dd10afe

RyuJit would not inline methods that contained delegate invokes. Remove this limitation. Closes #10048. See also #37941.

stephentoub added 3 commits June 27, 2020 18:25

Special-case Half

5e88e06

Move Less/GreaterThan to each type that needs them

2bec1bf

stephentoub force-pushed the fixsortperf_utilslessthanref branch from a3e3d79 to 2bec1bf Compare June 27, 2020 23:57

stephentoub merged commit e1c9ab4 into dotnet:master Jun 28, 2020

stephentoub deleted the fixsortperf_utilslessthanref branch June 28, 2020 01:57

jkotas mentioned this pull request Jul 16, 2020

Sort<T, TComparer> extensions performance #39466

Open

stephentoub mentioned this pull request Jul 19, 2020

Address Sort<T, TComparer> extensions performance #39543

Closed

ghost locked as resolved and limited conversation to collaborators Dec 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix regression in Array.Sort for floats/doubles #37941

Fix regression in Array.Sort for floats/doubles #37941

Uh oh!

stephentoub commented Jun 16, 2020 •

edited

Loading

Uh oh!

Uh oh!

jkotas left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nietras commented Jun 17, 2020

Uh oh!

stephentoub commented Jun 17, 2020

Uh oh!

nietras commented Jun 17, 2020

Uh oh!

stephentoub commented Jun 22, 2020

Uh oh!

nietras commented Jun 22, 2020

Uh oh!

AndyAyersMS commented Jun 22, 2020

Uh oh!

stephentoub commented Jun 22, 2020

Uh oh!

nietras commented Jun 23, 2020 •

edited

Loading

Uh oh!

stephentoub commented Jun 28, 2020

Uh oh!

nietras commented Jul 16, 2020

Uh oh!

jkotas commented Jul 16, 2020 •

edited

Loading

Uh oh!

Uh oh!

Fix regression in Array.Sort for floats/doubles #37941

Fix regression in Array.Sort for floats/doubles #37941

Uh oh!

Conversation

stephentoub commented Jun 16, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

jkotas left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

nietras commented Jun 17, 2020

Uh oh!

stephentoub commented Jun 17, 2020

Uh oh!

nietras commented Jun 17, 2020

Uh oh!

stephentoub commented Jun 22, 2020

Uh oh!

nietras commented Jun 22, 2020

Uh oh!

AndyAyersMS commented Jun 22, 2020

Uh oh!

stephentoub commented Jun 22, 2020

Uh oh!

nietras commented Jun 23, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stephentoub commented Jun 28, 2020

Uh oh!

nietras commented Jul 16, 2020

Uh oh!

jkotas commented Jul 16, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

stephentoub commented Jun 16, 2020 •

edited

Loading

nietras commented Jun 23, 2020 •

edited

Loading

jkotas commented Jul 16, 2020 •

edited

Loading