Closed
Description
There have been interesting improvements in histogram-based trees, specifically from LightGBM and FastBDT.
XGBoost supports dynamic histograms and, as an option, exact greedy split finding for deeper trees. We would like to think about what changes are needed to make DMatrix suitable for histogram aggregation, so that a faster approximation algorithm can be achieved. This requires:
- Being able to get a subset of rows efficiently
- Storing the quantized bin ID of each feature value
Both seem achievable in the in-memory format. For the external-memory format, getting a subset of rows seems to be more complicated.
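For the in-memory case, the second requirement (storing quantized bin IDs) could look roughly like the minimal sketch below. It is a hypothetical illustration, not XGBoost's DMatrix API: `quantile_cuts` and `QuantizedMatrix` are made-up names, the cut points are taken at evenly spaced ranks as a simple stand-in for a real quantile sketch, and the input is assumed to be a dense, non-empty list of rows.

```python
from bisect import bisect_right


def quantile_cuts(column, max_bins):
    """Pick up to max_bins - 1 cut points at evenly spaced ranks.

    Hypothetical stand-in for a real quantile sketch; assumes a non-empty column.
    """
    vals = sorted(column)
    n = len(vals)
    # Deduplicate so repeated values do not produce duplicate cut points.
    return sorted({vals[k * n // max_bins] for k in range(1, max_bins)})


class QuantizedMatrix:
    """Hypothetical in-memory matrix storing one bin ID per (row, feature)."""

    def __init__(self, rows, max_bins=256):
        n_features = len(rows[0])
        # One sorted list of cut points per feature.
        self.cuts = [quantile_cuts([r[f] for r in rows], max_bins)
                     for f in range(n_features)]
        # Bin ID = number of cut points <= value, so IDs lie in [0, len(cuts)].
        self.bins = [[bisect_right(self.cuts[f], r[f]) for f in range(n_features)]
                     for r in rows]

    def n_bins(self, feature):
        return len(self.cuts[feature]) + 1
```

Once values are replaced by small integer bin IDs, a histogram can be indexed directly by `bins[row][feature]` instead of sorting or comparing raw floating-point values at every split.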
Let us see what data-structure refactoring could be done in DMatrix to support the in-memory format easily, and thus enable implementation of these recent improvements in XGBoost.
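With such an in-memory bin matrix, the first requirement (getting a subset of rows efficiently) could be approached by keeping a list of row indices per tree node, so that each node's histogram only touches that node's rows. The sketch below is hypothetical: `build_histogram` and `partition_rows` are not XGBoost functions, and it assumes a row-major list of bin IDs plus per-row gradient and hessian lists.

```python
def build_histogram(bins, grad, hess, row_ids, feature, n_bins):
    """Sum gradient and hessian per bin of one feature over a subset of rows."""
    g = [0.0] * n_bins
    h = [0.0] * n_bins
    for r in row_ids:
        b = bins[r][feature]  # quantized bin ID of this row's feature value
        g[b] += grad[r]
        h[b] += hess[r]
    return g, h


def partition_rows(bins, row_ids, feature, split_bin):
    """Split a node's row subset: bin <= split_bin goes left, the rest right."""
    left = [r for r in row_ids if bins[r][feature] <= split_bin]
    right = [r for r in row_ids if bins[r][feature] > split_bin]
    return left, right
```

After a split, each child node works on its own shorter index list, so histogram cost scales with the node's row count rather than with the full dataset. This is the part that is straightforward in memory but harder when rows live in external-memory pages.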