-
Notifications
You must be signed in to change notification settings - Fork 500
[SYSTEMDS-3840] Builtin scripts parameter consolidation #2228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2228 +/- ##
=============================================
- Coverage 72.47% 21.64% -50.83%
+ Complexity 45440 14140 -31300
=============================================
Files 1469 1469
Lines 170873 170873
Branches 33315 33315
=============================================
- Hits 123836 36993 -86843
- Misses 37622 130547 +92925
+ Partials 9415 3333 -6082 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
d60724e
to
2b29972
Compare
I just squashed the commits and removed the changes in the github action configs |
LGTM, we just need to resolve the merge conflicts. |
This commit consolidates the parameters of the builtin DML scripts and introduces a formatting sheme for the DML builtin parameters. Additionally, the changes were also applied to the Python API using the auto generator script, where I added a small fix, because some builtins were not parsed correctly. The changes in the builtin scripts arguments: order_rand -> Order search_iterations -> iter batch_size -> batchSize n_bins -> n ytest -> Ytest max_iter -> maxIter max_func_invoc, max_iterations -> maxIter include_mean -> includeMean icpt -> intercept maxi -> maxIteration maxii -> maxInnerIteration cMask -> ctypes is_verbose -> verbose k_max -> maxK kmax -> iter (start/end)_stepsize -> (start/end)Stepsize (start/end)_vicinity -> (start/end)Vicinity sim_seed -> seed k_value -> k CL_T -> ctypes trans_continuous -> tranCont select_k -> selectK k_min -> minK k_max -> maxK select_feature -> selectFeature feature_max -> maxFeatures feature_importance -> featureImportance predict_con_tg -> predictCont START_SELECTED -> initSelectFeature frequency_threshold -> frequencyThreshold distance_threshold -> distanceThreshold is_verbose -> verbose thresh -> threshold reduced_dims -> reducedDims max_iter -> maxIter is_verbose -> verbose print_iter -> printIter min_leaf -> minLeaf min_split -> minSplit num_trees -> numTrees max_depth -> maxDepth max_features -> maxFeatures max_values -> maxValues max_dataratio -> maxDataRatio sample_frac -> sampleFrac feature_frac -> featureFrac window_size -> windowSize sample_percent -> sampleFrac is_verbose -> verbose alpha -> lr batch_size -> batchSize learning_rate -> lr out_activation -> outActivation loss_fcn -> lossFn validation_split -> validationSplit types -> ctypes sml_type -> smlType num_trees -> numTrees learning_rate -> lr max_depth -> maxDepth lambda -> reg icpt -> intercept maxi, moi, iter -> maxIter mii -> maxInnerIter epsilon -> tol maxIterations -> maxIter maxii -> maxInnerIter eps -> tol max_iter, iterations -> maxIter is_verbose -> verbose avg_sample_size_per_centroid -> avgSampleSizePerCentroid space_decomp -> spaceDecomp n_components -> nComponents init_param -> initParams reg_covar -> reg rnk -> rank eps -> tolerance maxi -> maxIter dth, thr -> threshold maxi -> maxIter dth, thr -> threshold maxi -> maxIter
2b29972
to
7e0ae37
Compare
@mboehm7 Hi, can we merge in the changes ? or is there something I should change first |
@mboehm7 , I did look though the changes and they seem reasonable. However, you (@mboehm7) voiced some concerns on the changes. Could you give a LGTM, merge, or voice your concerns here? |
This PR consolidates the parameters of the builtin DML scripts and introduces a formatting sheme for the DML builtin parameters. Additionally, the changes were also applied to the Python API using the auto generator script, where I added a small fix, because some builtins were not parsed correctly.
The changes also resulted in a couple changes in the test script for the builtin scripts. For the discovery of the usages of the changed builtins, I had to manually "grep" the ./src/test/scripts folder, so that I dont break any tests, but also the ./script folder to catch the usages in tutorials, perftest, etc. I hope, I've catched everything there, but I can not guaranty it, since the github test just run on test script folder.
I applied the following formatting convention:
allowed:
X, Xtest, ScaleFactor,
tol, maxIter
not allowed:
X_test, max_iter
Furtheremore, I used the following python script / notebook for the discovery of all unique parameter names:
https://gist.github.com/e-strauss/ccec59f504ed3882a2141d859ff681ec
In the following, I want to give a quick overview over the consolidations, that I applied.
In total, I found 62 matches for consolidation and reduced the total number of unique parameters to 340, which was previously at 402.
I changed the spelling of the following 76 parameters names:
https://gist.github.com/e-strauss/3a5f27160178024c079ed37072c54cfa
Through either consolidation or applying the naming convention, I removed the following 138 parameter names:
https://gist.github.com/e-strauss/3a5f27160178024c079ed37072c54cfa
A list of all current parameters can be found here:
https://gist.github.com/e-strauss/8aad1143b016d7bda28f3884038d62d3