[SYSTEMDS-3840] Builtin scripts parameter consolidation #2228

e-strauss · 2025-02-13T21:54:09Z

This PR consolidates the parameters of the builtin DML scripts and introduces a formatting sheme for the DML builtin parameters. Additionally, the changes were also applied to the Python API using the auto generator script, where I added a small fix, because some builtins were not parsed correctly.

The changes also resulted in a couple changes in the test script for the builtin scripts. For the discovery of the usages of the changed builtins, I had to manually "grep" the ./src/test/scripts folder, so that I dont break any tests, but also the ./script folder to catch the usages in tutorials, perftest, etc. I hope, I've catched everything there, but I can not guaranty it, since the github test just run on test script folder.

I applied the following formatting convention:

camelCase, no "_"
matrix / vector arguments must be capitalized
for scalars, the first character must be lower case

allowed:
X, Xtest, ScaleFactor,
tol, maxIter

not allowed:
X_test, max_iter

Furtheremore, I used the following python script / notebook for the discovery of all unique parameter names:
https://gist.github.com/e-strauss/ccec59f504ed3882a2141d859ff681ec

In the following, I want to give a quick overview over the consolidations, that I applied.
In total, I found 62 matches for consolidation and reduced the total number of unique parameters to 340, which was previously at 402.

I changed the spelling of the following 76 parameters names:
https://gist.github.com/e-strauss/3a5f27160178024c079ed37072c54cfa

Through either consolidation or applying the naming convention, I removed the following 138 parameter names:
https://gist.github.com/e-strauss/3a5f27160178024c079ed37072c54cfa

A list of all current parameters can be found here:
https://gist.github.com/e-strauss/8aad1143b016d7bda28f3884038d62d3

codecov · 2025-02-13T22:20:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 21.64%. Comparing base (1609470) to head (2b29972).
Report is 28 commits behind head on main.

Additional details and impacted files

@@              Coverage Diff              @@
##               main    #2228       +/-   ##
=============================================
- Coverage     72.47%   21.64%   -50.83%     
+ Complexity    45440    14140    -31300     
=============================================
  Files          1469     1469               
  Lines        170873   170873               
  Branches      33315    33315               
=============================================
- Hits         123836    36993    -86843     
- Misses        37622   130547    +92925     
+ Partials       9415     3333     -6082

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

e-strauss · 2025-02-14T11:44:13Z

I just squashed the commits and removed the changes in the github action configs

Baunsgaard · 2025-05-13T11:05:43Z

LGTM, we just need to resolve the merge conflicts.

This commit consolidates the parameters of the builtin DML scripts and introduces a formatting sheme for the DML builtin parameters. Additionally, the changes were also applied to the Python API using the auto generator script, where I added a small fix, because some builtins were not parsed correctly. The changes in the builtin scripts arguments: order_rand -> Order search_iterations -> iter batch_size -> batchSize n_bins -> n ytest -> Ytest max_iter -> maxIter max_func_invoc, max_iterations -> maxIter include_mean -> includeMean icpt -> intercept maxi -> maxIteration maxii -> maxInnerIteration cMask -> ctypes is_verbose -> verbose k_max -> maxK kmax -> iter (start/end)_stepsize -> (start/end)Stepsize (start/end)_vicinity -> (start/end)Vicinity sim_seed -> seed k_value -> k CL_T -> ctypes trans_continuous -> tranCont select_k -> selectK k_min -> minK k_max -> maxK select_feature -> selectFeature feature_max -> maxFeatures feature_importance -> featureImportance predict_con_tg -> predictCont START_SELECTED -> initSelectFeature frequency_threshold -> frequencyThreshold distance_threshold -> distanceThreshold is_verbose -> verbose thresh -> threshold reduced_dims -> reducedDims max_iter -> maxIter is_verbose -> verbose print_iter -> printIter min_leaf -> minLeaf min_split -> minSplit num_trees -> numTrees max_depth -> maxDepth max_features -> maxFeatures max_values -> maxValues max_dataratio -> maxDataRatio sample_frac -> sampleFrac feature_frac -> featureFrac window_size -> windowSize sample_percent -> sampleFrac is_verbose -> verbose alpha -> lr batch_size -> batchSize learning_rate -> lr out_activation -> outActivation loss_fcn -> lossFn validation_split -> validationSplit types -> ctypes sml_type -> smlType num_trees -> numTrees learning_rate -> lr max_depth -> maxDepth lambda -> reg icpt -> intercept maxi, moi, iter -> maxIter mii -> maxInnerIter epsilon -> tol maxIterations -> maxIter maxii -> maxInnerIter eps -> tol max_iter, iterations -> maxIter is_verbose -> verbose avg_sample_size_per_centroid -> avgSampleSizePerCentroid space_decomp -> spaceDecomp n_components -> nComponents init_param -> initParams reg_covar -> reg rnk -> rank eps -> tolerance maxi -> maxIter dth, thr -> threshold maxi -> maxIter dth, thr -> threshold maxi -> maxIter

e-strauss · 2025-05-14T10:26:17Z

@mboehm7 Hi, can we merge in the changes ? or is there something I should change first

Baunsgaard · 2025-06-06T08:39:18Z

@mboehm7 ,
We need to move forward on this PR, since it is starting to be containing conflicts with the dependent documentation PR.

I did look though the changes and they seem reasonable.

However, you (@mboehm7) voiced some concerns on the changes. Could you give a LGTM, merge, or voice your concerns here?

e-strauss force-pushed the DML_parameter_consolidation branch from d60724e to 2b29972 Compare February 14, 2025 11:42

e-strauss force-pushed the DML_parameter_consolidation branch from 2b29972 to 7e0ae37 Compare May 14, 2025 09:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SYSTEMDS-3840] Builtin scripts parameter consolidation #2228

[SYSTEMDS-3840] Builtin scripts parameter consolidation #2228

Uh oh!

e-strauss commented Feb 13, 2025 •

edited

Loading

Uh oh!

codecov bot commented Feb 13, 2025 •

edited

Loading

Uh oh!

e-strauss commented Feb 14, 2025

Uh oh!

Baunsgaard commented May 13, 2025

Uh oh!

e-strauss commented May 14, 2025

Uh oh!

Baunsgaard commented Jun 6, 2025

Uh oh!

Uh oh!

[SYSTEMDS-3840] Builtin scripts parameter consolidation #2228

Are you sure you want to change the base?

[SYSTEMDS-3840] Builtin scripts parameter consolidation #2228

Uh oh!

Conversation

e-strauss commented Feb 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Feb 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

e-strauss commented Feb 14, 2025

Uh oh!

Baunsgaard commented May 13, 2025

Uh oh!

e-strauss commented May 14, 2025

Uh oh!

Baunsgaard commented Jun 6, 2025

Uh oh!

Uh oh!

e-strauss commented Feb 13, 2025 •

edited

Loading

codecov bot commented Feb 13, 2025 •

edited

Loading