DMatrix Refactor for Approximate Static Histogram Based Trees #1673

Closed
@tqchen

There have been interesting improvements in histogram-based trees, specifically from LightGBM and FastBDT.

While XGBoost supports dynamic histograms and exact greedy as options for deeper trees, we would like to think about what changes are needed to make DMatrix suitable for histogram aggregation, so that a faster approximation algorithm can be achieved.

  • Being able to get a subset of rows efficiently
  • Storing the quantized bin id of each feature value

These seem to be achievable in the in-memory format. For the external-memory format, getting a subset of rows seems to be a bit more complicated.
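To make the two requirements concrete, here is a minimal in-memory sketch in plain Python. It is only an illustration under stated assumptions, not a proposal for the actual DMatrix layout: `QuantizedMatrix`, `make_cuts`, and `histogram` are hypothetical names, the rank-based cut selection is a crude stand-in for a real quantile sketch, and bin ids are assumed to fit in one byte (at most 256 bins per feature).

```python
from bisect import bisect_right


def make_cuts(column, max_bins):
    """Pick up to max_bins - 1 increasing cut points at evenly spaced ranks.

    Crude stand-in for a quantile sketch; assumes a non-empty column.
    """
    ordered = sorted(column)
    n = len(ordered)
    cuts = []
    for k in range(1, max_bins):
        candidate = ordered[min(n - 1, (k * n) // max_bins)]
        # Skip duplicates and cuts equal to the minimum (they would leave bin 0 empty).
        if candidate > (cuts[-1] if cuts else ordered[0]):
            cuts.append(candidate)
    return cuts


class QuantizedMatrix:
    """Hypothetical layout: one uint8 bin id per (row, feature), row-major."""

    def __init__(self, rows, max_bins=4):
        assert 1 < max_bins <= 256, "bin ids must fit in one byte"
        self.num_rows = len(rows)
        self.num_features = len(rows[0])
        self.cuts = [
            make_cuts([r[f] for r in rows], max_bins)
            for f in range(self.num_features)
        ]
        # Bin id = number of cut points <= value.
        self.bins = bytearray(
            bisect_right(self.cuts[f], r[f])
            for r in rows
            for f in range(self.num_features)
        )

    def bin_id(self, row, feature):
        return self.bins[row * self.num_features + feature]

    def histogram(self, row_ids, feature, grad, hess):
        """Sum (gradient, hessian) per bin over only the given subset of rows."""
        hist = [[0.0, 0.0] for _ in range(len(self.cuts[feature]) + 1)]
        for i in row_ids:
            b = self.bin_id(i, feature)
            hist[b][0] += grad[i]
            hist[b][1] += hess[i]
        return hist


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [3.0, 10.0], [4.0, 20.0]]
    qm = QuantizedMatrix(data, max_bins=2)
    grad = [0.5, -1.0, 0.25, 2.0]
    hess = [1.0, 1.0, 1.0, 1.0]
    # Rows 0, 2, 3 play the role of the rows that fall into one tree node.
    print(qm.histogram([0, 2, 3], feature=0, grad=grad, hess=hess))
```

Because the bin ids are stored row-major, visiting an arbitrary list of row indices is just indexed access into one contiguous buffer, which is what makes the row-subset requirement cheap in memory. The same access pattern is harder when rows live in pages on disk.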

Let us see what data structure refactoring could be done in DMatrix to support the in-memory format easily, and thus enable implementation of these recent improvements in XGBoost.
