Feature/ssb benchmark #2280
Open: ghafek wants to merge 2,765 commits into apache:main from ghafek:feature/ssb-benchmark
This patch resolves a FIXME that remained after the rewrite code coverage improvements, by fixing the expressions and other rewrite configs so the test actually triggers the existing rewrite.
This patch makes some simple performance improvements that reduce the runtime of the sparse component tests (300+s -> 30s). In detail, the runtime of specific tests improved as follows:
* SparseBlockMerge: 149s -> 14.7s
* SparseBlockIndexRange: 110s -> 13.4s
* SparseBlockGetFirstIndex: 29s -> 1.3s
This patch adds real-data tests for the new adasyn builtin function, and changes the implementation to a vectorized one that extracts over-sampled rows via a randomized permutation matrix multiply. On the Diabetes dataset (with a moderate class imbalance of 500 vs 268), ADASYN slightly improves the test accuracy from 78.3% to 78.7%. It is also noteworthy that the original ADASYN paper from 2008 only achieved 0.6831 and 0.6833 (with ADASYN) on this dataset.
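The idea of extracting rows through a matrix multiply can be illustrated with a minimal Python sketch. It is not the adasyn builtin: the helper names are hypothetical, the sketch samples row indices with replacement (a 0/1 selection matrix rather than a strict permutation), and it skips the ADASYN interpolation step entirely.

```python
import random

def selection_matrix(row_ids, n_rows):
    """Build a (len(row_ids) x n_rows) 0/1 matrix with a single 1 per row."""
    return [[1.0 if j == i else 0.0 for j in range(n_rows)] for i in row_ids]

def matmul(A, B):
    """Plain dense matrix multiply on lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def oversample_rows(X, n_samples, seed=0):
    """Extract n_samples randomly chosen rows of X as the product P @ X."""
    rng = random.Random(seed)
    ids = [rng.randrange(len(X)) for _ in range(n_samples)]  # assumption: with replacement
    P = selection_matrix(ids, len(X))
    return ids, matmul(P, X)

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ids, S = oversample_rows(X, 5, seed=42)
```

Each row of `S` equals `X[ids[i]]`, so the whole extraction is a single matrix multiply instead of a row-by-row loop.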
This patch generalizes the adasyn test to an additional real dataset. On the Titanic dataset, adasyn improves the test accuracy by 1.6 percentage points (for a basic logreg model, 0.781 -> 0.797).
This patch fixes endless loops in transformencode that occurred if the tfspec referenced columns outside the column range.
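A range check of this kind can be sketched in a few lines of Python. This is not the actual fix: the helper name is hypothetical, and the sketch assumes a JSON spec whose encoder entries are flat lists of 1-based integer column IDs.

```python
import json

def out_of_range_columns(spec_json, num_cols):
    """Return {encoder: [invalid column IDs]} for a JSON transform spec.

    Assumption: encoder entries are flat lists of 1-based integer IDs;
    non-list entries (e.g. flags) and non-integer items are ignored.
    """
    spec = json.loads(spec_json)
    bad = {}
    for method, cols in spec.items():
        if not isinstance(cols, list):
            continue
        invalid = [c for c in cols if type(c) is int and not 1 <= c <= num_cols]
        if invalid:
            bad[method] = invalid
    return bad

spec = '{"ids": true, "recode": [1, 2, 7], "dummycode": [3]}'
problems = out_of_range_columns(spec, num_cols=5)
```

Rejecting such a spec up front gives a clear error instead of a loop that never terminates.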
The multi-threaded transpose implementation for ultra-sparse matrices has a couple of shortcomings (e.g., count column nnz, block allocation, too late fallback to single-threaded). On a large 85M x 85M graph with 90M non-zeros, the transpose did not finish in hours. In this patch we introduce a more sophisticated sparse row iterator (row and column lower/upper bounds) in order to facilitate a simple and fast ultra-sparse transpose operation. However, this implementation was still much slower than falling back to single-threaded operations, and thus we now use the single-threaded transpose for all ultra-sparse matrices instead of only if nnz < max(rows,cols). This operation now completes in <9s.
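To see why a single-threaded transpose can be cheap here, the following Python sketch shows a generic textbook CSR transpose (count, prefix sum, scatter). It is not the SystemDS implementation, and all names are illustrative.

```python
def csr_transpose(n_rows, n_cols, indptr, indices, values):
    """Transpose a CSR matrix in one single-threaded pass over the non-zeros."""
    nnz = len(values)
    # 1) Count the non-zeros per column (the rows of the transpose).
    t_indptr = [0] * (n_cols + 1)
    for j in indices:
        t_indptr[j + 1] += 1
    # 2) Prefix sum turns the counts into row pointers of the transpose.
    for j in range(n_cols):
        t_indptr[j + 1] += t_indptr[j]
    # 3) Scatter every non-zero to its next free slot in the target row.
    next_pos = t_indptr[:-1]  # slicing copies, so t_indptr stays intact
    t_indices = [0] * nnz
    t_values = [0.0] * nnz
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            p = next_pos[indices[k]]
            t_indices[p] = i
            t_values[p] = values[k]
            next_pos[indices[k]] += 1
    return t_indptr, t_indices, t_values

# 3 x 4 example: row 0 = {1: 1.0, 3: 2.0}, row 1 empty, row 2 = {0: 3.0, 1: 4.0}
t_ptr, t_idx, t_val = csr_transpose(
    3, 4, [0, 2, 2, 4], [1, 3, 0, 1], [1.0, 2.0, 3.0, 4.0])
```

The work is linear in the number of non-zeros plus the number of rows and columns, with no synchronization between threads.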
There was a regression where all sparse matrix-vector elementwise operations were executed only single-threaded. This patch fixes the most important branch for sparse-safe matrix-vector operations, but in subsequent tasks we also need to fix all the other cases. When running connected components on the Europe road network, the individual binary multiply operations improved by 10-20x on a box with 48 vcores. End-to-end, the entire components() invocation with 20 iterations improved from 282s (246s for b(*)) to 112s (75s for b(*)). The 10x improvements do not carry through fully because the output MCSR is converted to CSR when appending to the buffer pool (57s of 75s).
This patch adds the missing multi-threading for all cases of binary elementwise operations, except one special case that directly constructs a CSR output. Furthermore, in safeBinaryMVSparseDenseRow we now avoid unnecessary allocation of temporary vectors by filling in place on the first output row of every task.
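The row-partitioned structure of a multi-threaded, sparse-safe matrix-vector multiply can be sketched as follows. This is an illustration in Python, not SystemDS code: the function names are hypothetical, rows are modeled as `{column: value}` dictionaries, and the vector is assumed to be a row vector applied per column. Python threads will not show the reported speedups; the sketch only shows how tasks map to disjoint row blocks.

```python
from concurrent.futures import ThreadPoolExecutor

def multiply_rows(rows, v, lo, hi):
    """Multiply only the stored non-zeros of rows[lo:hi] by the vector entries."""
    out = []
    for r in rows[lo:hi]:
        out.append({j: x * v[j] for j, x in r.items() if x * v[j] != 0})
    return lo, out

def sparse_mv_multiply(rows, v, num_threads=4):
    """Elementwise rows * v, with each task owning a disjoint block of rows."""
    n = len(rows)
    block = max(1, -(-n // num_threads))  # ceiling division
    tasks = [(lo, min(lo + block, n)) for lo in range(0, n, block)]
    result = [None] * n
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for lo, out in pool.map(lambda t: multiply_rows(rows, v, *t), tasks):
            result[lo:lo + len(out)] = out
    return result

rows = [{0: 2.0, 2: 3.0}, {}, {1: 4.0}, {0: 1.0, 1: 1.0}, {2: 5.0}]
v = [10.0, 0.0, 0.5]
parallel = sparse_mv_multiply(rows, v, num_threads=3)
serial = sparse_mv_multiply(rows, v, num_threads=1)
```

Because every task writes only to its own rows, no locking is needed on the output.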
This patch adds a test that systematically applies the single- and multi-threaded writers/readers for matrices and frames, all formats, as well as dense and sparse data. These tests also revealed bugs in the hdf5 readers/writers where incorrect data is read for single-threaded sparse as well as multi-threaded dense and sparse.
… script level. Closes apache#2259
This commit adds vectorized kernels for matrix multiplication. With the Vector API, single-threaded dense matrix multiplication improves by ~80% on our AMD box and by ~60% on Intel. These numbers include the allocation overhead of the output and were measured in ideal cases where the input is cached and the JIT compilation is done. The biggest change for users is that SystemDS now requires `--add-modules=jdk.incubator.vector` in all execution calls. The commit modifies all scripts accordingly. However, any calling code that bypasses bin/systemds and calls Java directly must be modified as well. To measure the performance difference on your machine, use the added script: src/test/scripts/performance/matrixMultiplication.sh Closes apache#2216
Co-authored-by: Kevin Innerebner <[email protected]>
This patch adds an initial version of the representation optimizer for the Scuro library. It is a two-stage optimization: in the first step, the best unimodal representation for the given raw modalities is found, and in the next step, the k-best unimodal representations are combined into multimodal representations and evaluated against the target downstream task. Additionally, this patch adds tests for each stage of the optimizer. Closes apache#2267
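The two-stage search can be sketched in plain Python. This does not use the Scuro API: the function, the scoring callbacks, and the representation names are hypothetical, and the sketch assumes that the k best representations are kept per modality and that one representation per modality is combined in stage two.

```python
from itertools import product

def two_stage_search(candidates, score_unimodal, score_multimodal, k=2):
    """candidates maps each modality to a list of representation names."""
    # Stage 1: rank representations per modality and keep the k best.
    k_best = {}
    for modality, reps in candidates.items():
        ranked = sorted(reps, key=lambda r: score_unimodal(modality, r), reverse=True)
        k_best[modality] = ranked[:k]
    # Stage 2: combine one kept representation per modality and evaluate.
    modalities = list(k_best)
    best_combo, best_score = None, float("-inf")
    for combo in product(*(k_best[m] for m in modalities)):
        assignment = dict(zip(modalities, combo))
        score = score_multimodal(assignment)
        if score > best_score:
            best_combo, best_score = assignment, score
    return k_best, best_combo, best_score

# Toy scores standing in for a downstream-task evaluation (made-up values).
UNI = {("text", "bow"): 0.6, ("text", "tfidf"): 0.7, ("text", "hash"): 0.5,
       ("audio", "mfcc"): 0.65, ("audio", "spec"): 0.55}
candidates = {"text": ["bow", "tfidf", "hash"], "audio": ["mfcc", "spec"]}
k_best, combo, score = two_stage_search(
    candidates,
    score_unimodal=lambda m, r: UNI[(m, r)],
    score_multimodal=lambda a: sum(UNI[(m, r)] for m, r in a.items()) / len(a),
    k=2,
)
```

Pruning to the k best per modality keeps stage two at most `k` to the power of the number of modalities, instead of the full cross product.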
This patch downgrades the library versions of Scuro dependencies. Closes apache#2269
This patch fixes the incorrect size propagation of unique, which led to incorrect results if the dimensions were used in subsequent ops. Thanks to Chi-Hsin Huang for catching this bug. Furthermore, this patch also includes minor updates for code quality (removed unused imports, annotated unused functions).
e.g., a-A-b -> (a-b)-A; a+A-b -> (a-b)+A Closes apache#2272.
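The two example patterns can be checked with a small Python sketch, assuming `a` and `b` are scalars and `A` is a matrix (the helper names are illustrative, and integer values keep the comparison exact). The original form makes two passes over the matrix, while the rewritten form folds the scalars first and makes one pass.

```python
def scalar_minus_matrix(s, A):
    return [[s - x for x in row] for row in A]

def matrix_minus_scalar(A, s):
    return [[x - s for x in row] for row in A]

def matrix_plus_scalar(A, s):
    return [[x + s for x in row] for row in A]

def original_minus(a, A, b):
    """a - A - b: two elementwise passes over the matrix."""
    return matrix_minus_scalar(scalar_minus_matrix(a, A), b)

def rewritten_minus(a, A, b):
    """(a - b) - A: one scalar op, then a single matrix pass."""
    return scalar_minus_matrix(a - b, A)

def original_plus(a, A, b):
    """a + A - b: two elementwise passes over the matrix."""
    return matrix_minus_scalar(matrix_plus_scalar(A, a), b)

def rewritten_plus(a, A, b):
    """(a - b) + A: one scalar op, then a single matrix pass."""
    return matrix_plus_scalar(A, a - b)

A = [[1, 2], [3, 4]]
a, b = 10, 3
```

With floating-point values the two forms can differ in the last bits, since the operations are reassociated.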
This patch fixes issues in the test dml scripts, namely missing casts from 1-x-1 matrices to scalars. Interestingly, the tests ran fine in local environments because the parser validation runs differently there, and subsequently these 1-x-1 matrices were automatically rewritten to scalars.
No description provided.