Description
The benchmark suite trains on the same train-test partition of the real data in every fold. The train-test split is derived from the random state of the dataloader object passed to the benchmarking suite, and that random state is fixed. The suite does loop over different random states, but these only affect the generators and metrics, not the dataloader, so the same training partition of real data is used every time.
I expect this is unintentional: a proper cross-validation-style procedure should train and test across different random splits, to avoid results being biased by a single split.
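To illustrate the pattern (a minimal sketch, not synthcity's actual code; DataLoaderSketch is a made-up stand-in): the split is seeded only by the loader's own random_state, so varying the benchmark seed never changes which rows end up in the training partition.

```python
# Minimal sketch of the described behaviour; DataLoaderSketch is hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split


class DataLoaderSketch:
    def __init__(self, data: pd.DataFrame, random_state: int = 0) -> None:
        self.data = data
        self.random_state = random_state  # fixed once, at construction time

    def train(self) -> pd.DataFrame:
        # The split is seeded by the loader's own random_state,
        # not by whatever seed the benchmark loop is currently on.
        train_df, _ = train_test_split(
            self.data, train_size=0.8, random_state=self.random_state
        )
        return train_df


loader = DataLoaderSketch(pd.DataFrame({"x": range(100)}), random_state=0)
for benchmark_seed in [1, 2, 3]:  # the suite varies this seed...
    # ...but the loader never sees it, so the same rows come back each time
    print(benchmark_seed, loader.train().index[:5].tolist())
```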
How to Reproduce
Go to 'benchmark/__init__.py', line 197. Here the train partition is created by calling .train() on the dataloader object.
Go to 'plugins/core/dataloader.py', lines 88 and 452. Here the train-test split is made from the (fixed) random state of the dataloader.
You can also confirm the issue by running the benchmarking suite and printing the training data at each benchmark iteration; it is always the same.
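A quick standalone check of the same behaviour, assuming the GenericDataLoader API (target_column argument, .train() returning a loader that exposes .dataframe()); if the API differs slightly, the idea is simply to call .train() repeatedly and compare the results:

```python
# Sketch: repeated calls to .train() on the same loader yield an identical partition.
from sklearn.datasets import load_diabetes
from synthcity.plugins.core.dataloader import GenericDataLoader

X, y = load_diabetes(return_X_y=True, as_frame=True)
df = X.assign(target=y)

loader = GenericDataLoader(df, target_column="target")
first = loader.train().dataframe()
for _ in range(3):
    # Every call yields the identical partition of the real data
    assert loader.train().dataframe().equals(first)
print("loader.train() returned the same partition on every call")
```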
Expected Behavior
Make a different train-test split in each benchmarking iteration.
An easy fix could be to pass the varying random state as an argument to the .train() method of the dataloader in each benchmarking iteration.
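One possible shape of that fix, sketched on the hypothetical DataLoaderSketch from above rather than the real dataloader class: .train() accepts an optional random_state that the benchmark loop can pass per repeat, defaulting to the loader's own state so existing behaviour is preserved.

```python
# Hypothetical sketch of the proposed fix; not the actual synthcity signature.
from typing import Optional

import pandas as pd
from sklearn.model_selection import train_test_split


class DataLoaderSketch:
    def __init__(self, data: pd.DataFrame, train_size: float = 0.8, random_state: int = 0) -> None:
        self.data = data
        self.train_size = train_size
        self.random_state = random_state

    def train(self, random_state: Optional[int] = None) -> pd.DataFrame:
        # Default keeps today's behaviour; the benchmark suite can override it per repeat.
        seed = self.random_state if random_state is None else random_state
        train_df, _ = train_test_split(
            self.data, train_size=self.train_size, random_state=seed
        )
        return train_df


loader = DataLoaderSketch(pd.DataFrame({"x": range(100)}))
for repeat_seed in [1, 2, 3]:  # the seed the suite already loops over
    print(repeat_seed, loader.train(random_state=repeat_seed).index[:5].tolist())
```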
Background
I first noticed this issue because the "performance" metrics (i.e., ML efficacy via XGBoost etc.) have essentially no variance for the real data. This makes sense when they are computed on the same train-test split (and with the same predictive model) every time.
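To illustrate the symptom (this is not the suite's metric code, just the general pattern): with a fixed split and a fixed-seed model, the score is identical on every repeat, so the reported variance collapses to zero.

```python
# Fixed split + fixed-seed model => identical score on every "benchmark iteration".
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)  # never varies

scores = []
for _ in range(3):
    model = XGBClassifier(random_state=0).fit(X_tr, y_tr)
    scores.append(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
print(scores)  # identical values -> zero variance across repeats
```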