pivot_longer only returns a subset of data from cols #792
Labels
bug
an unexpected problem or unintended behavior
pivoting ♻️
pivot rectangular data to different "shapes"
Slightly simpler reprex:
library(tidyr)
df <- tibble(g = 1:2, x1 = 1:2, y1 = 1:2, x2 = 1:2)
df %>% pivot_longer(-g)
#> # A tibble: 6 x 3
#> g name value
#> <int> <chr> <int>
#> 1 1 x1 1
#> 2 1 y1 1
#> 3 1 x2 1
#> 4 2 x1 2
#> 5 2 y1 2
#> 6 2 x2 2
df %>% pivot_longer(-g, names_pattern = "(..)")
#> # A tibble: 6 x 3
#> g name value
#> <int> <chr> <int>
#> 1 1 x1 1
#> 2 1 y1 1
#> 3 1 x2 1
#> 4 2 x1 2
#> 5 2 y1 2
#> 6 2 x2 2
df %>% pivot_longer(-g, names_pattern = "(.).")
#> # A tibble: 4 x 3
#> g name value
#> <int> <chr> <int>
#> 1 1 x 1
#> 2 1 y 1
#> 3 2 x 2
#> 4 2 y 2
df %>% pivot_longer(-g, names_to = c("x", "num"), names_pattern = "(.)(.)")
#> # A tibble: 6 x 4
#> g x num value
#> <int> <chr> <chr> <int>
#> 1 1 x 1 1
#> 2 1 y 1 1
#> 3 1 x 2 1
#> 4 2 x 1 2
#> 5 2 y 1 2
#> 6 2 x 2 2
Created on 2019-11-27 by the reprex package (v0.3.0)
I think it's a little easier to understand the problem by looking at the specs:
library(tidyr)
df <- tibble(g = 1:2, x1 = 1:2, y1 = 1:2, x2 = 1:2)
# OK: the additional columns uniquely identify each row
df %>% build_longer_spec(-g)
#> # A tibble: 3 x 3
#> .name .value name
#> <chr> <chr> <chr>
#> 1 x1 value x1
#> 2 y1 value y1
#> 3 x2 value x2
df %>% build_longer_spec(-g, names_pattern = "(..)")
#> # A tibble: 3 x 3
#> .name .value name
#> <chr> <chr> <chr>
#> 1 x1 value x1
#> 2 y1 value y1
#> 3 x2 value x2
df %>% build_longer_spec(-g, names_to = c("x", "num"), names_pattern = "(.)(.)")
#> # A tibble: 3 x 4
#> .name .value x num
#> <chr> <chr> <chr> <chr>
#> 1 x1 value x 1
#> 2 y1 value y 1
#> 3 x2 value x 2
# NOT OK: the additional columns don't uniquely identify each row
df %>% build_longer_spec(-g, names_pattern = "(.).")
#> # A tibble: 3 x 3
#> .name .value name
#> <chr> <chr> <chr>
#> 1 x1 value x
#> 2 y1 value y
#> 3 x2 value x
Created on 2019-12-06 by the reprex package (v0.3.0)
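The OK / NOT OK distinction above can be checked mechanically. The following is a minimal sketch, assuming a spec is problematic exactly when its columns other than `.name` contain repeated rows. `spec_has_duplicates()` is a hypothetical helper written for this illustration; it is not part of tidyr.

```r
library(tidyr)

df <- tibble(g = 1:2, x1 = 1:2, y1 = 1:2, x2 = 1:2)

# Hypothetical helper (not a tidyr function): TRUE when two or more input
# columns map to the same combination of .value and name columns.
spec_has_duplicates <- function(spec) {
  keys <- as.data.frame(spec[setdiff(names(spec), ".name")])
  anyDuplicated(keys) > 0
}

spec_ok  <- build_longer_spec(df, -g, names_pattern = "(..)")
spec_bad <- build_longer_spec(df, -g, names_pattern = "(.).")

spec_has_duplicates(spec_ok)
#> [1] FALSE
spec_has_duplicates(spec_bad)
#> [1] TRUE
```

In `spec_bad`, both `x1` and `x2` map to `name = "x"`, which is the situation that led to the dropped rows in the earlier reprex.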
Just to circle back to the motivating example, this is the result with the dev version:
tidyr::pivot_longer(iris[c(1,51),], 1:4, names_to="part", names_pattern="(.*)[.].*")
#> # A tibble: 8 x 3
#> Species part value
#> <fct> <chr> <dbl>
#> 1 setosa Sepal 5.1
#> 2 setosa Sepal 3.5
#> 3 setosa Petal 1.4
#> 4 setosa Petal 0.2
#> 5 versicolor Sepal 7
#> 6 versicolor Sepal 3.2
#> 7 versicolor Petal 4.7
#> 8 versicolor Petal 1.4
Created on 2019-12-07 by the reprex package (v0.3.0)
I'm using pivot_longer with names_pattern, and the number of rows returned depends on the names_to/names_pattern arguments. I expected the number of rows returned to depend only on cols.
For example, consider two rows of the iris data (rows 1 and 51). I expected that both of the following would result in 8 rows.
However, the first one returns only 4 rows (only the Width data, not the Length data).
I would have expected a warning, something like "you asked to pivot columns 1:4, but we are only returning data from columns 2 and 4".
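Until such a warning exists, the expectation can be checked by hand. The sketch below is one possible approach, not tidyr behaviour: it assumes a plain pivot (no ".value" in names_to, and values_drop_na left at its default of FALSE), so the long result should contain one row per input row per pivoted column. The variable names and the warning text are illustrative.

```r
library(tidyr)

small <- iris[c(1, 51), ]
cols <- 1:4

long <- pivot_longer(small, 1:4, names_to = "part", names_pattern = "(.*)[.].*")

# One output row is expected for each input row and each pivoted column.
expected_rows <- nrow(small) * length(cols)

if (nrow(long) != expected_rows) {
  warning(
    "pivot_longer() returned ", nrow(long), " rows, but ",
    expected_rows, " were expected from the selected columns."
  )
}
```

With a tidyr version that includes the fix discussed in this thread, the check should pass silently; with the affected version it would flag the 4-row result.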