
Benchmark suite trains on same train partition across folds #327


Open
JimAchterbergLUMC opened this issue Mar 30, 2025 · 0 comments

Description

The benchmark suite trains on the same train-test partition (of real data) in every fold. This happens because the train-test split is derived from the random state of the dataloader object passed to the benchmarking suite, and that random state is fixed. The benchmarking suite loops over different random states, but these only affect the generators and metrics, not the dataloader, so the same training partition of real data is used every time.

I assume this is unintentional, as a proper (cross-validation-style) procedure requires training and testing across different random splits, to avoid results that are biased by a single unlucky split.

How to Reproduce

  1. Go to 'benchmark/__init__.py', line 197. Here the train partition is created by calling .train() on the dataloader object.
  2. Go to 'plugins/core/dataloader.py', lines 88 and 452. Here the train-test split is derived from the (fixed) random state of the dataloader.
  3. You can also investigate this directly by executing the benchmarking suite and printing the training data at each benchmark iteration; it is always the same (see the sketch after this list).
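
A minimal sketch of the underlying behavior, assuming the sklearn iris dataset and synthcity's GenericDataLoader (exact constructor arguments may differ by version): repeated calls to .train() on the same loader return the identical partition, because the split is seeded by the loader's fixed random state.

```python
from sklearn.datasets import load_iris
from synthcity.plugins.core.dataloader import GenericDataLoader

# Build a loader on real data with a fixed random_state, as the benchmark suite does.
X, y = load_iris(return_X_y=True, as_frame=True)
X["target"] = y
loader = GenericDataLoader(X, target_column="target", random_state=0)

# The benchmark suite varies its own random state per repeat, but never the
# loader's, so every "fold" sees the same training rows.
first = loader.train().dataframe()
second = loader.train().dataframe()
print(first.equals(second))  # True: identical train partition every time
```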

Expected Behavior

A different train-test split should be made in each benchmarking iteration.
An easy fix could be, e.g., passing the varying random state as an argument to the .train() method of the dataloader in each benchmarking iteration.
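
As a stopgap until the suite varies the split itself, the sketch below (continuing from the reproduction snippet above) rebuilds the dataloader with a per-repeat seed. The repeat_seed/repeats names and the loop structure are illustrative only, not synthcity's actual benchmark code, and assume that re-creating a GenericDataLoader with a new random_state changes the split.

```python
from synthcity.plugins.core.dataloader import GenericDataLoader

repeats = 3  # mirrors the benchmark suite's repeat count (illustrative)
for repeat_seed in range(repeats):
    # Re-create the loader with the varying seed; since the train-test split is
    # derived from the loader's random_state, each repeat now sees a new split.
    reseeded = GenericDataLoader(
        loader.dataframe(), target_column="target", random_state=repeat_seed
    )
    X_train, X_test = reseeded.train(), reseeded.test()
    # ... fit the generator on X_train and evaluate metrics against X_test ...
```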

Background

I first noticed this issue because the "performance" metrics (i.e., ML efficacy from XGBoost etc.) show essentially no variance on the real data, which makes sense when they are computed on the same train-test split (and with the same predictive model) every time.
