
use Eigen as a BLAS alternative #858


Merged 4 commits on May 27, 2019
Conversation

@borg323 (Member) commented May 23, 2019

This is a port of leela-zero/leela-zero#1692.

@oscardssmith (Contributor):
How does it compare performance-wise?

@borg323 (Member, Author) commented May 23, 2019

I did a quick test with a T35 net and it is about 8-10% quicker than OpenBLAS on my i5. More tests are of course welcome.
Edit: The above test was against OpenBLAS; MKL is about 2% faster than Eigen in the same test.

@lealgo (Contributor) commented May 23, 2019

Builds and runs on Android. Great news! Now we've got two back-ends that can be built via cross-compilation.

In my first tests it appears to be slightly slower than OpenBLAS:

HWHWI:/data/local/tmp $ ./lc0-blas benchmark -w /sdcard/lc0/36089 --threads=8 --max-prefetch=0 --minibatch-size=128                                                                                         
       _
|   _ | |
|_ |_ |_| v0.22.0-dev built May 20 2019
Loading weights file from: /sdcard/lc0/36089
Creating backend [blas]...
BLAS, maximum batch size set to 256
BLAS vendor: OpenBlas.
OpenBlas [OpenBLAS 0.3.6 NO_LAPACK NO_LAPACKE NO_AFFINITY ARMV8 MAX_THREADS=8].
OpenBlas found 8 ARMV8 core(s).
OpenBLAS using 1 core(s) for this backend.
BLAS max batch size is 256.
Benchmark time 134ms, 2 nodes, 14 nps, move e2e4
Benchmark time 199ms, 3 nodes, 15 nps, move e2e4
Benchmark time 327ms, 5 nodes, 15 nps, move e2e4
Benchmark time 503ms, 8 nodes, 15 nps, move e2e4
Benchmark time 692ms, 12 nodes, 17 nps, move e2e4
Benchmark time 940ms, 20 nodes, 21 nps, move e2e4
Benchmark time 1241ms, 28 nodes, 22 nps, move e2e4
Benchmark time 1445ms, 30 nodes, 20 nps, move e2e4
Benchmark time 1455ms, 33 nodes, 22 nps, move e2e4
Benchmark time 1660ms, 47 nodes, 28 nps, move e2e4
Benchmark time 1850ms, 61 nodes, 32 nps, move e2e4
Benchmark time 1862ms, 64 nodes, 34 nps, move e2e4
Benchmark time 2007ms, 72 nodes, 35 nps, move e2e4
Benchmark time 2113ms, 79 nodes, 37 nps, move e2e4
Benchmark time 2506ms, 107 nodes, 42 nps, move e2e4
Benchmark time 3062ms, 160 nodes, 52 nps, move e2e4
Benchmark time 3702ms, 211 nodes, 56 nps, move e2e4
Benchmark time 4374ms, 268 nodes, 61 nps, move e2e4
Benchmark time 5216ms, 334 nodes, 64 nps, move e2e4
Benchmark time 7007ms, 477 nodes, 68 nps, move e2e4
Benchmark time 7008ms, 487 nodes, 69 nps, move e2e4
Benchmark time 7607ms, 553 nodes, 72 nps, move e2e4
bestmove e2e4
Benchmark final time 8.38771s calculating 77.1367 nodes per second.

HWHWI:/data/local/tmp $ ./lc0-eigen benchmark -w /sdcard/lc0/36089 --threads=8 --max-prefetch=0 --minibatch-size=128                                                                                        
       _
|   _ | |
|_ |_ |_| v0.22.0-dev built May 23 2019
Loading weights file from: /sdcard/lc0/36089
Creating backend [blas]...
Using Eigen version 3.3.7
BLAS max batch size is 256.
Benchmark time 163ms, 2 nodes, 12 nps, move e2e4
Benchmark time 245ms, 3 nodes, 12 nps, move e2e4
Benchmark time 412ms, 5 nodes, 12 nps, move e2e4
Benchmark time 642ms, 8 nodes, 12 nps, move e2e4
Benchmark time 885ms, 13 nodes, 14 nps, move e2e4
Benchmark time 1271ms, 22 nodes, 17 nps, move e2e4
Benchmark time 1730ms, 34 nodes, 19 nps, move e2e4
Benchmark time 1858ms, 38 nodes, 20 nps, move e2e4
Benchmark time 1959ms, 40 nodes, 20 nps, move e2e4
Benchmark time 1980ms, 44 nodes, 22 nps, move d2d4
Benchmark time 2311ms, 67 nodes, 28 nps, move d2d4
Benchmark time 2791ms, 100 nodes, 35 nps, move d2d4
Benchmark time 3417ms, 129 nodes, 37 nps, move d2d4
Benchmark time 3705ms, 153 nodes, 41 nps, move d2d4
Benchmark time 3791ms, 166 nodes, 43 nps, move e2e4
Benchmark time 3936ms, 179 nodes, 45 nps, move e2e4
Benchmark time 4842ms, 251 nodes, 51 nps, move e2e4
Benchmark time 5679ms, 299 nodes, 52 nps, move e2e4
Benchmark time 5684ms, 307 nodes, 54 nps, move e2e4
Benchmark time 6885ms, 393 nodes, 57 nps, move e2e4
Benchmark time 8298ms, 518 nodes, 62 nps, move e2e4
bestmove e2e4
Benchmark final time 9.32481s calculating 63.0576 nodes per second.

@lealgo (Contributor) commented May 24, 2019

@@ -44,7 +56,7 @@ void Convolution1::Forward(const size_t batch_size, const size_t input_channels,

const float* batch_input = input + i * kSquares * input_channels;
float* batch_output = output + i * kSquares * output_channels;

#ifndef USE_EIGEN
Member review comment:

Could BLAS and Eigen in theory co-exist?
Then it would be better to have templated functions (e.g. <bool is_eigen>) which would show up as different backends.

See for example:
cudnn vs cudnn-fp16
https://github.com/LeelaChessZero/lc0/blob/master/src/neural/cuda/network_cudnn.cc#L805
#5 (comment)

tensorflow vs tensorflow-cpu
https://github.com/LeelaChessZero/lc0/blob/master/src/neural/network_tf.cc#L342

It may not be that straightforward, though, as different header files are needed (so #ifdefs will be needed in any case), and calling functions from a non-included header in if (false) {} won't compile. Calling them from non-instantiated function template specializations will work, though:

template <>
void DoStuff<false /* is_eigen */>() {
  known_functions();
}

template <>
void DoStuff<true /* is_eigen */>() {
  unknown_functions();  // Is fine
}

#if defined(HAVE_EIGEN)
REGISTER_NETWORK("eigen", MakeBlasNetwork<true>, 90)
#endif
#if defined(HAVE_BLAS)
REGISTER_NETWORK("blas", MakeBlasNetwork<false>, 90)
#endif

Not sure if it's easy enough to be worth bothering with, but it would be nicer that way.

Also there is this brute-force way to allow co-existence:

template <bool is_eigen> void MyFunction() {
#ifdef HAVE_EIGEN
  if (is_eigen) {
    // Do eigen stuff.
  }
#endif
#ifdef HAVE_BLAS
  if (!is_eigen) {
    // Do blas stuff.
  }
#endif
}

#if defined(HAVE_EIGEN)
REGISTER_NETWORK("eigen", MakeBlasNetwork<true>, 90)
#endif
#if defined(HAVE_BLAS)
REGISTER_NETWORK("blas", MakeBlasNetwork<false>, 90)
#endif

Reply from the Member Author:

I don't think there is a reason for this: Eigen is only suitable for the build machine, so BLAS is the way to go for redistributable CPU-only binaries. The only use case I can see is benchmarking Eigen against MKL on a specific machine - is there any other use case?

@borg323 borg323 merged commit 6028c05 into LeelaChessZero:master May 27, 2019
@borg323 borg323 deleted the eigen branch May 27, 2019 22:35