Closed
Description
In the following screenshot I compare the performance of recode(cat_data_vec, cat_vec=>"None")
for 2 cases:
a) cat_vec is a categorical array
b) cat_vec is a string array
It turns out that case a) is significantly slower.
It is not easy to reproduce the real scale of the performance difference on artificial data, but on real data (300k records and 30k unique categories) case a) is about 10 times slower (40 seconds) than case b) (4 seconds).
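Since the screenshot is not reproduced here, the two cases can be sketched as a small synthetic script. This assumes CategoricalArrays.jl is installed. The data sizes, the "catN" labels and the subset of categories mapped to "None" are hypothetical choices, far smaller than the real data described above. As noted, artificial data may not show the real-scale gap, so no particular timings are implied.

```julia
using CategoricalArrays  # assumes CategoricalArrays.jl is installed

# Hypothetical synthetic data (much smaller than 300k records / 30k categories).
labels = ["cat$(i)" for i in 1:2_000]
cat_data_vec = categorical(rand(labels, 50_000))

# Hypothetical subset of categories to collapse into "None".
str_vec = labels[1:1_000]        # case b): plain Vector{String}
cat_vec = categorical(str_vec)   # case a): CategoricalArray holding the same values

# First calls include compilation time, so keep their results and time a second call.
res_cat = recode(cat_data_vec, cat_vec => "None")
res_str = recode(cat_data_vec, str_vec => "None")
t_cat = @elapsed recode(cat_data_vec, cat_vec => "None")
t_str = @elapsed recode(cat_data_vec, str_vec => "None")

println("a) categorical keys: ", t_cat, " s")
println("b) string keys:      ", t_str, " s")
```

Both calls should produce the same recoded values; only the type of the key collection differs, which isolates the effect reported above.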