
[SystemDS-#3524] Multi-threading of transformdecode/[SystemDS-#3521] Improved Feature Transformations #2275


Open
wants to merge 22 commits into
base: main


@Isso-W Isso-W commented Jun 19, 2025

This pull request introduces a new framework for column decoding in Apache SystemDS: a base class ColumnDecoder and several specialized implementations (ColumnDecoderBin, ColumnDecoderComposite, ColumnDecoderRecode, ColumnDecoderPassThrough, and ColumnDecoderDummycode). These changes provide a flexible and extensible structure for decoding encoded data in matrix-to-frame transformations. The most important changes are listed below, grouped by theme:

Core Framework for Column Decoding

  • Added ColumnDecoder as an abstract base class to define the structure for decoding operations, including methods for decoding (columnDecode), handling sub-range decoding (subRangeDecoder), and metadata initialization (initMetaData). It also implements Externalizable for efficient serialization.
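To illustrate the shape of such a base class, here is a minimal, self-contained sketch. It is not the code from this patch: the class names (SketchColumnDecoder, SketchRecodeDecoder, SketchPassThroughDecoder, ColumnDecoderSketch), all method signatures, and the use of plain arrays in place of MatrixBlock/FrameBlock are assumptions made for illustration. subRangeDecoder is left out because its exact semantics are not described here. The per-column thread pool is likewise only one possible way to parallelize decoding.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical, simplified stand-in for a per-column decoder base class.
abstract class SketchColumnDecoder implements Externalizable {
    protected int colId; // index of the column this decoder is responsible for

    public SketchColumnDecoder() {} // public no-arg constructor, required by Externalizable

    protected SketchColumnDecoder(int colId) { this.colId = colId; }

    // Decode rows [rowStart, rowEnd) of this decoder's column from in to out.
    public abstract void columnDecode(double[][] in, Object[][] out, int rowStart, int rowEnd);

    // Load per-column metadata (here: a code-to-value map) before decoding.
    public abstract void initMetaData(Map<Integer, String> meta);

    @Override
    public void writeExternal(ObjectOutput os) throws IOException { os.writeInt(colId); }

    @Override
    public void readExternal(ObjectInput is) throws IOException { colId = is.readInt(); }
}

// Maps integer recode IDs back to their original string values.
class SketchRecodeDecoder extends SketchColumnDecoder {
    private Map<Integer, String> codeToValue = new HashMap<>();

    public SketchRecodeDecoder() {}

    SketchRecodeDecoder(int colId) { super(colId); }

    @Override
    public void initMetaData(Map<Integer, String> meta) { codeToValue = new HashMap<>(meta); }

    @Override
    public void columnDecode(double[][] in, Object[][] out, int rowStart, int rowEnd) {
        for (int r = rowStart; r < rowEnd; r++)
            out[r][colId] = codeToValue.get((int) in[r][colId]);
    }
}

// Copies numeric values through unchanged.
class SketchPassThroughDecoder extends SketchColumnDecoder {
    public SketchPassThroughDecoder() {}

    SketchPassThroughDecoder(int colId) { super(colId); }

    @Override
    public void initMetaData(Map<Integer, String> meta) { /* no metadata needed */ }

    @Override
    public void columnDecode(double[][] in, Object[][] out, int rowStart, int rowEnd) {
        for (int r = rowStart; r < rowEnd; r++)
            out[r][colId] = in[r][colId];
    }
}

final class ColumnDecoderSketch {
    // Runs one task per column decoder; tasks write disjoint output cells.
    static Object[][] decodeParallel(List<SketchColumnDecoder> decoders,
            double[][] in, int numCols, int threads) {
        Object[][] out = new Object[in.length][numCols];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (SketchColumnDecoder d : decoders)
                tasks.add(pool.submit(() -> d.columnDecode(in, out, 0, in.length)));
            for (Future<?> t : tasks)
                t.get(); // wait for completion and surface any failure
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("column decoding failed", e);
        } finally {
            pool.shutdown();
        }
        return out;
    }
}
```

Because each decoder in this sketch writes only to its own output column, the tasks need no locking, and waiting on every Future makes the results visible to the caller before the output is returned.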

Current Issues

  • ColumnDecoderDummycode is not supported yet, and neither is the test case ColumnDecoderMixedMethodsTest.
  • There are still some issues with DecoderDummyCode: it does not work together with DecoderRecode.
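For context, the following sketch shows what combining dummycode and recode decoding means conceptually: a one-hot block of columns is collapsed to a recode ID, which is then looked up in the recode map. It is only an illustration and not the logic of this patch. The class DummycodeRecodeSketch, its method names, and the assumption that recode IDs are 1-based and match the position inside the one-hot block are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class DummycodeRecodeSketch {
    // Returns the 1-based position of the active one-hot cell within
    // row[start .. start+width), or -1 if no cell is set.
    static int dummycodeToCode(double[] row, int start, int width) {
        for (int j = 0; j < width; j++)
            if (row[start + j] == 1.0)
                return j + 1;
        return -1;
    }

    // Step 1: undo dummycoding; step 2: undo recoding via the map.
    static String decode(double[] row, int start, int width, Map<Integer, String> recodeMap) {
        int code = dummycodeToCode(row, start, width);
        return code < 0 ? null : recodeMap.get(code);
    }

    public static void main(String[] args) {
        Map<Integer, String> recodeMap = new HashMap<>();
        recodeMap.put(1, "red");
        recodeMap.put(2, "green");
        recodeMap.put(3, "blue");
        // Column 0 is a pass-through value; columns 1..3 are the one-hot block.
        double[] row = {7.0, 0, 1, 0};
        System.out.println(decode(row, 1, 3, recodeMap));
    }
}
```

The order matters in this sketch: the recode lookup only makes sense after the one-hot block has been reduced to a single ID.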


codecov bot commented Jun 20, 2025

Codecov Report

Attention: Patch coverage is 41.71779% with 285 lines in your changes missing coverage. Please review.

Project coverage is 72.85%. Comparing base (bc2993e) to head (61f6d39).
Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ime/transform/decode/ColumnDecoderPassThrough.java | 17.14% | 57 Missing and 1 partial ⚠️ |
| ...ntime/transform/decode/ColumnDecoderDummycode.java | 33.33% | 47 Missing and 3 partials ⚠️ |
| ...sds/runtime/transform/decode/ColumnDecoderBin.java | 46.25% | 38 Missing and 5 partials ⚠️ |
| ...ntime/transform/decode/ColumnDecoderComposite.java | 47.14% | 36 Missing and 1 partial ⚠️ |
| .../runtime/transform/decode/ColumnDecoderRecode.java | 61.05% | 31 Missing and 6 partials ⚠️ |
| .../sysds/runtime/transform/decode/ColumnDecoder.java | 21.95% | 32 Missing ⚠️ |
| ...runtime/transform/decode/ColumnDecoderFactory.java | 51.72% | 27 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2275      +/-   ##
============================================
- Coverage     72.95%   72.85%   -0.10%     
- Complexity    46062    46159      +97     
============================================
  Files          1479     1486       +7     
  Lines        172575   173144     +569     
  Branches      33776    33898     +122     
============================================
+ Hits         125898   126149     +251     
- Misses        37191    37481     +290     
- Partials       9486     9514      +28     


@phaniarnab
Contributor

Thank you for the patch. I will take a look into this next week @Isso-W.
Meanwhile, please add the missing license headers to the new files.
