Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
Arrow seems to have issues with labelled data, such as data imported from Stata datasets.
Filtering labelled data in an Arrow table does not work and throws an error. So far, so good, I can deal with that.
library(haven)
library(arrow)
library(tibble)
library(dplyr)
d <- tibble(
  a = labelled(x = 1:5, label = "example variable a"),
  b = labelled(x = 11:15, label = "example variable b")
)
d
#> # A tibble: 5 × 2
#>           a         b
#>   <int+lbl> <int+lbl>
#> 1         1        11
#> 2         2        12
#> 3         3        13
#> 4         4        14
#> 5         5        15
d %>%
  as_arrow_table() %>%
  filter(a > 3) %>%
  collect()
#> Error in `compute.arrow_dplyr_query()`:
#> ! NotImplemented: Function 'greater' has no kernel matching input types (<labelled<integer>[0]>: example variable a, <labelled<integer>[0]>: example variable a)
But when I leave out the final collect() that executes the query, the R session crashes completely:
d %>%
  as_arrow_table() %>%
  filter(a > 5)
# R crashes....
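For the filter error (not the crash), one possible workaround is to strip the label metadata before converting to an Arrow table. The sketch below assumes the labels are not needed inside the Arrow query. It uses haven's zap_labels() and zap_label(), and the names plain and result are only illustrative. I have not checked this against every arrow version.

```r
library(haven)
library(arrow)
library(tibble)
library(dplyr)

d <- tibble(
  a = labelled(x = 1:5, label = "example variable a"),
  b = labelled(x = 11:15, label = "example variable b")
)

# Assumption: the labels are not needed inside the Arrow query.
# zap_labels() drops the haven_labelled class (and any value labels);
# zap_label() drops the variable label attribute, leaving plain integers.
plain <- d %>%
  zap_labels() %>%
  zap_label()

# With plain integer columns, the comparison has a matching kernel.
result <- plain %>%
  as_arrow_table() %>%
  filter(a > 3) %>%
  collect()

result
```

The labels are lost in result, so they would have to be re-attached after collect() if they are still needed.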
Component(s)
R