[R] R arrow cannot handle labelled data in arrow tables #45601

Closed
@EinMaulwurf

Description

Describe the bug, including details regarding any error messages, version, and platform.

Arrow seems to have issues with labelled data, such as data imported from Stata datasets.

Filtering labelled data in an Arrow table does not work and throws an error. So far, so good; I can deal with that.

library(haven)
library(arrow)
library(tibble)
library(dplyr)

d <- tibble(
  a = labelled(x = 1:5, label = "example variable a"),
  b = labelled(x = 11:15, label = "example variable b")
)

d
#> # A tibble: 5 × 2
#>   a         b        
#>   <int+lbl> <int+lbl>
#> 1 1         11       
#> 2 2         12       
#> 3 3         13       
#> 4 4         14       
#> 5 5         15

d %>%
  as_arrow_table() %>%
  filter(a > 3) %>%
  collect()
#> Error in `compute.arrow_dplyr_query()`:
#> ! NotImplemented: Function 'greater' has no kernel matching input types (<labelled<integer>[0]>: example variable a, <labelled<integer>[0]>: example variable a)

But when I leave out the final collect() call that executes the query, the R session crashes completely:

d %>%
  as_arrow_table() %>%
  filter(a > 5)
# R crashes....
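
A possible workaround, sketched here and not part of the original report: the error message suggests that Arrow has no 'greater' kernel for the labelled<integer> column type, so dropping the labels before converting to an Arrow table should sidestep the problem. This assumes the labels are not needed while the query runs; haven's zap_labels() and zap_label() discard them, so they would have to be reattached afterwards if they matter. The names d_plain and result are only illustrative.

```r
library(haven)
library(arrow)
library(tibble)
library(dplyr)

d <- tibble(
  a = labelled(x = 1:5, label = "example variable a"),
  b = labelled(x = 11:15, label = "example variable b")
)

# Assumption: the labels can be discarded for the Arrow query.
# zap_labels() drops value labels and the haven_labelled class;
# zap_label() drops the variable label attribute.
d_plain <- d %>%
  zap_labels() %>%
  zap_label()

# With plain integer columns, the comparison uses an ordinary integer kernel.
result <- d_plain %>%
  as_arrow_table() %>%
  filter(a > 3) %>%
  collect()

result
```

With plain integer columns the filter is expected to return the two rows where a is 4 and 5. Whether the same change also avoids the crash when printing the uncollected query has not been verified here.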

Component(s)

R
