Skip to content

replace_na(NA, 0) returning FALSE in tidyr 1.2.0 (instead of 0 as in earlier versions) #1317

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
fabiangehring opened this issue Feb 8, 2022 · 7 comments

Comments

@fabiangehring
Copy link

We just descovered a breaking change of replace_na in tidyr 1.2.0.
In earlier versions tidyr::replace_na(NA, 0) returned 0 (expected). In 1.2.0 it returns FALSE (unexpected).

tidyr::replace_na(NA, 0) #returning FALSE instead of 0
@batpigandme
Copy link
Contributor

Thanks for the report.

Might be related to the change in strictness for replace_na() re. change of type. See:
https://www.tidyverse.org/blog/2022/02/tidyr-1-2-0/#missing-values

@batpigandme
Copy link
Contributor

batpigandme commented Feb 8, 2022

0 and 1 are, by default, parsed as logical, so the error is more obvious if you use 2:

tidyr::replace_na(NA, 0)
#> [1] FALSE

tidyr::replace_na(NA, 1)
#> [1] TRUE

tidyr::replace_na(NA, 2)
#> Error in `stop_vctrs()`:
#> ! Can't convert from `replace` <double> to `data` <logical> due to loss of precision.
#> • Locations: 1

Created on 2022-02-08 by the reprex package (v2.0.1)

@DavisVaughan
Copy link
Member

As @batpigandme mentioned, this is part of the switch to using vctrs in replace_na(). The replacement value (RHS) is now cast to the type of the original column (LHS). This generally adds a lot of type safety.

library(tidyr)

# The RHS (0, double) is cast to the type of the LHS (logical)
# `0` is cast to `FALSE`
replace_na(c(TRUE, NA), 0)
#> [1]  TRUE FALSE

# Same behavior here
replace_na(NA, 0)
#> [1] FALSE

# If you use a value that can't be cast to logical, thats an error
replace_na(NA, 1.5)
#> Error:
#> ! Can't convert from `replace` <double> to `data` <logical> due to loss of precision.
#> • Locations: 1

I am not entirely sure how your NA vector/column came to be, but you currently have two options to work around this

library(tidyr)

# Use the double version of NA
replace_na(NA_real_, 0)
#> [1] 0

# Or just cast your logical vector to double ahead of time
replace_na(as.double(NA), 0)
#> [1] 0

I've also opened r-lib/vctrs#1514 so we can think about this specific case of a vector of all logical NAs a little more, as it is a bit of a special case.

@DavisVaughan
Copy link
Member

Could you help us understand how your column of NA values was generated to begin with?

@fabiangehring
Copy link
Author

fabiangehring commented Feb 8, 2022

Thanks for pointing me to the stricter "type consistency".

The error showed up in a situation where I try to extract a number out of a string and convert it to an integer.

# working in tidyr < 1.2.0
stringr::str_extract("abc#1", "(?<=#)[0-9]+") %>%
  replace_na(0L) %>%
  as.integer()

tidyr < 1.2.0 allowed this sloppiness. tidyr 1.2.0 now forces me to be cleaner (which is ok, when thinking about it).

# this works in tidyr 1.2.0
stringr::str_extract("abc#1", "(?<=#)[0-9]+") %>%
  tidyr::replace_na("0") %>%
  as.integer()

# this works in tidyr 1.2.0
stringr::str_extract("abc#1", "(?<=#)[0-9]+") %>%
  as.integer() %>%
  tidyr::replace_na(0L)

Maybe the new behaviour is worth mentioning in the docs.

@DavisVaughan
Copy link
Member

DavisVaughan commented Feb 8, 2022

Ah, yes this case here is slightly different from your original example. This is definitely an example of where I think the new vctrs behavior is "safer" and helps you produce better code in the end.

library(tidyr)

extraction <- stringr::str_extract(c(NA, "abc#1"), "(?<=#)[0-9]+")
extraction
#> [1] NA  "1"

# We definitely want this to error, that is part of the benefit of switching to vctrs!
replace_na(extraction, 0L)
#> Error:
#> ! Can't convert `replace` <integer> to match type of `data` <character>.

# Yes, something like this is the right / safer way
replace_na(extraction, "0") %>%
  readr::parse_integer()
#> [1] 0 1

# I'd probably parse and then replace, so you can use `0L`.
# That will also catch any parse failures.
readr::parse_integer(extraction) %>%
  replace_na(0L)
#> [1] 0 1

Created on 2022-02-08 by the reprex package (v2.0.1)

@DavisVaughan
Copy link
Member

It does seem worth mentioning in the docs that replace will be cast to the type of data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants