[exploration] Use crossbeam MPMC channel instead of std::sync::mpsc #486

kimsnj · 2019-09-19T20:41:36Z

For issue #440, I've played a bit with crossbeam and fd.

As expected, the implementation of the exec method is simpler, since it no longer needs a Mutex or an Arc.
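For context, here is a minimal sketch of the pattern that goes away. It is not fd's actual code: `run_jobs` and the "ran: ..." strings are hypothetical stand-ins for spawning a command per result. With std::sync::mpsc the Receiver cannot be cloned, so several worker threads have to share it behind Arc<Mutex<..>>. A crossbeam Receiver is cloneable, so each worker could simply own a clone.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for "run a command on every search result".
/// Returns the (sorted) list of completed jobs.
fn run_jobs(paths: Vec<String>, workers: usize) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<String>();
    // std's Receiver is not Clone, so the workers share it behind Arc<Mutex<..>>.
    let rx = Arc::new(Mutex::new(rx));
    // Second channel to collect what each worker did.
    let (done_tx, done_rx) = mpsc::channel::<String>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The guard is dropped at the end of this statement, so the lock
                // is held while waiting for one item, not while processing it.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(path) => done_tx.send(format!("ran: {}", path)).unwrap(),
                    Err(_) => break, // all senders dropped and queue drained
                }
            })
        })
        .collect();
    drop(done_tx); // only the workers hold senders for the result channel now

    for p in paths {
        tx.send(p).unwrap();
    }
    drop(tx); // signals the workers that no more work is coming

    for h in handles {
        h.join().unwrap();
    }
    let mut out: Vec<String> = done_rx.iter().collect();
    out.sort();
    out
}
```

Calling `run_jobs(paths, 4)` distributes the paths over four workers. The lock around `recv()` is the part an MPMC channel makes unnecessary.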

In terms of binary size (release mode, musl target), the difference is quite small:

fd_crossbeam_bounded:   3605520
fd_crossbeam_unbounded: 3605512
fd_std_mpsc:            3611064

However, on my machine mpsc seems to perform better without --exec:

Benchmark #1: ./fd_std_mpsc -HI '.*[0-9]\.rs$' ~
  Time (mean ± σ):      1.018 s ±  0.004 s    [User: 10.319 s, System: 1.464 s]
  Range (min … max):    1.012 s …  1.029 s    10 runs

Benchmark #2: ./fd_crossbeam_bounded -HI '.*[0-9]\.rs$' ~
  Time (mean ± σ):      1.086 s ±  0.026 s    [User: 10.510 s, System: 2.048 s]
  Range (min … max):    1.046 s …  1.135 s    10 runs

Benchmark #3: ./fd_crossbeam_unbounded -HI '.*[0-9]\.rs$' ~
  Time (mean ± σ):      1.088 s ±  0.019 s    [User: 10.593 s, System: 1.994 s]
  Range (min … max):    1.066 s …  1.126 s    10 runs

Summary
  './fd_std_mpsc -HI '.*[0-9]\.rs$' ~' ran
    1.07 ± 0.03 times faster than './fd_crossbeam_bounded -HI '.*[0-9]\.rs$' ~'
    1.07 ± 0.02 times faster than './fd_crossbeam_unbounded -HI '.*[0-9]\.rs$' ~'

With --exec, it depends on the number of results. With a search that yields more than 5,000 results, the bounded crossbeam variant is a bit faster:

Benchmark #1: ./fd_std_mpsc -HI '.*[0-9]\.rs$' ~ -x echo {}
  Time (mean ± σ):      7.374 s ±  1.901 s    [User: 18.191 s, System: 23.729 s]
  Range (min … max):    5.439 s … 10.620 s    10 runs

Benchmark #2: ./fd_crossbeam_bounded -HI '.*[0-9]\.rs$' ~ -x echo {}
  Time (mean ± σ):      7.005 s ±  1.426 s    [User: 18.130 s, System: 22.579 s]
  Range (min … max):    5.841 s … 10.711 s    10 runs

Benchmark #3: ./fd_crossbeam_unbounded -HI '.*[0-9]\.rs$' ~ -x echo {}
  Time (mean ± σ):      7.838 s ±  1.366 s    [User: 18.465 s, System: 26.202 s]
  Range (min … max):    5.794 s … 10.574 s    10 runs

Summary
  './fd_crossbeam_bounded -HI '.*[0-9]\.rs$' ~ -x echo {}' ran
    1.05 ± 0.35 times faster than './fd_std_mpsc -HI '.*[0-9]\.rs$' ~ -x echo {}'
    1.12 ± 0.30 times faster than './fd_crossbeam_unbounded -HI '.*[0-9]\.rs$' ~ -x echo {}'

With a search that matches only ~150 results, that advantage disappears:

Benchmark #1: ./fd_std_mpsc -HI '.*[0-9]\.jpg$' ~ -x echo {}
  Time (mean ± σ):      1.210 s ±  0.020 s    [User: 11.063 s, System: 2.122 s]
  Range (min … max):    1.187 s …  1.252 s    10 runs

Benchmark #2: ./fd_crossbeam_bounded -HI '.*[0-9]\.jpg$' ~ -x echo {}
  Time (mean ± σ):      1.229 s ±  0.011 s    [User: 11.269 s, System: 2.123 s]
  Range (min … max):    1.207 s …  1.244 s    10 runs

Benchmark #3: ./fd_crossbeam_unbounded -HI '.*[0-9]\.jpg$' ~ -x echo {}
  Time (mean ± σ):      1.226 s ±  0.020 s    [User: 11.134 s, System: 2.256 s]
  Range (min … max):    1.196 s …  1.250 s    10 runs

Summary
  './fd_std_mpsc -HI '.*[0-9]\.jpg$' ~ -x echo {}' ran
    1.01 ± 0.02 times faster than './fd_crossbeam_unbounded -HI '.*[0-9]\.jpg$' ~ -x echo {}'
    1.02 ± 0.02 times faster than './fd_crossbeam_bounded -HI '.*[0-9]\.jpg$' ~ -x echo {}'

I can do more experiments if you'd like.

Cheers,

sharkdp · 2019-09-21T09:02:54Z

Very cool, thank you for looking into this!

I ran a few benchmarks of my own (for non-exec commands) and also found that the crossbeam version is mostly slower, by a significant amount. The exception is one benchmark where the unbounded version is slightly faster.

bounded-100 refers to a version with a bounded(100) crossbeam channel.

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master '.*[0-9]\.jpg$' '/home/shark'`	200.8 ± 1.5	198.4	203.2	1.00
`./fd-crossbeam-bounded-100 '.*[0-9]\.jpg$' '/home/shark'`	260.8 ± 11.8	247.0	283.0	1.30
`./fd-crossbeam-unbounded '.*[0-9]\.jpg$' '/home/shark'`	261.7 ± 9.0	243.4	276.0	1.30

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master -HI '.*[0-9]\.jpg$' '/home/shark'`	524.1 ± 3.6	520.7	532.2	1.00
`./fd-crossbeam-bounded-100 -HI '.*[0-9]\.jpg$' '/home/shark'`	626.3 ± 12.2	607.2	642.9	1.19
`./fd-crossbeam-unbounded -HI '.*[0-9]\.jpg$' '/home/shark'`	619.5 ± 12.0	607.8	643.9	1.18

Command	Mean [s]	Min [s]	Max [s]	Relative
`./fd-master --hidden --no-ignore '' '/home/shark'`	1.111 ± 0.027	1.073	1.162	1.08
`./fd-crossbeam-bounded-100 --hidden --no-ignore '' '/home/shark'`	1.713 ± 0.122	1.516	1.950	1.67
`./fd-crossbeam-unbounded --hidden --no-ignore '' '/home/shark'`	1.027 ± 0.023	1.000	1.075	1.00

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master -HI --extension jpg '' '/home/shark'`	539.3 ± 3.5	535.0	544.0	1.00
`./fd-crossbeam-bounded-100 -HI --extension jpg '' '/home/shark'`	674.9 ± 21.0	642.3	709.1	1.25
`./fd-crossbeam-unbounded -HI --extension jpg '' '/home/shark'`	683.3 ± 16.0	665.9	715.0	1.27

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`./fd-master -HI --type l '' '/home/shark'`	520.8 ± 3.7	514.8	526.7	1.00
`./fd-crossbeam-bounded-100 -HI --type l '' '/home/shark'`	624.4 ± 7.5	616.6	638.6	1.20
`./fd-crossbeam-unbounded -HI --type l '' '/home/shark'`	621.2 ± 14.2	602.0	643.3	1.19

I'm a bit surprised, as crossbeam claims to be faster than std::mpsc, even in the MPSC case (https://github.com/crossbeam-rs/crossbeam-channel/tree/master/benchmarks#results).

A few thoughts:

- The bounded(…) channel experiment is definitely interesting. However, I don't think that the channel size is necessarily related to MAX_BUFFER_SIZE, like in your example. Where could we profit from a bounded channel? Could this potentially help us with memory issues like #471 (Excessive memory usage)?
- I ran a short experiment where I printed the current size of the channel in the unbounded case. For cases with many search results (fd .), the channel grows very rapidly, to sizes of 100,000 and more! I guess this means that we are limited by the speed of the receiver thread, which cannot print results fast enough (even in the non-colored case, which is much faster). This could be an interesting route for optimization, which was also discussed in the past (see the comment in #304, "fd without pattern is much slower than find").
- Regarding the last point, another thing that could be interesting would be to "render" the output line in the senders (remove the ./ prefix, colorize the path, etc.). This way, we could move even more work into the senders, profiting from parallelization.
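A rough sketch of how the first and last thoughts could fit together, under some assumptions: `render` and `print_all` are hypothetical names, not functions in fd, and std's `sync_channel` is used as a stand-in for a bounded crossbeam channel so that the example needs no external crate. The senders produce ready-to-print lines, and the bounded channel makes them block once `capacity` lines are queued, which caps how far the queue can grow ahead of the printer.

```rust
use std::io::{self, Write};
use std::sync::mpsc;
use std::thread;

/// Hypothetical renderer: strip a leading "./" and append a newline.
/// Colorizing the path would also happen here, on the sender side.
fn render(path: &str) -> String {
    let trimmed = path.strip_prefix("./").unwrap_or(path);
    format!("{}\n", trimmed)
}

/// Each chunk is handled by its own sender thread (standing in for a walker
/// thread). The receiver only writes bytes. Returns the number of lines written.
fn print_all<W: Write>(
    chunks: Vec<Vec<String>>,
    capacity: usize,
    out: &mut W,
) -> io::Result<usize> {
    // Bounded channel: send() blocks while `capacity` lines are already queued.
    let (tx, rx) = mpsc::sync_channel::<String>(capacity);

    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            let tx = tx.clone();
            thread::spawn(move || {
                for path in chunk {
                    // Rendering happens in parallel, in the sender.
                    if tx.send(render(&path)).is_err() {
                        break; // receiver is gone, stop early
                    }
                }
            })
        })
        .collect();
    drop(tx); // the loop below ends once every sender thread has finished

    let mut count = 0;
    for line in rx {
        out.write_all(line.as_bytes())?;
        count += 1;
    }
    for h in handles {
        h.join().unwrap();
    }
    Ok(count)
}
```

Whether backpressure like this helps or hurts overall runtime would have to be measured. The bounded-100 numbers above suggest that the capacity matters quite a bit.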

sharkdp · 2020-04-03T20:38:48Z

Going to close this for now. Thank you very much for this experiment!

use crossbeam MPMC channel instead of std::sync::mpsc

e3799f8

sharkdp closed this Apr 3, 2020
