pivot_longer only returns a subset of data from cols #792

Closed
tdhock opened this issue Oct 23, 2019 · 5 comments
Labels: bug (an unexpected problem or unintended behavior), pivoting (♻️ pivot rectangular data to different "shapes")
@tdhock
Contributor

tdhock commented Oct 23, 2019

I'm using pivot_longer with names_pattern, and the number of rows returned depends on the names_to/names_pattern arguments. I expected that the number of rows returned would depend only on cols.

For example, consider two rows of the iris data (rows 1 and 51) and its four measurement columns. I expected that both of the following would result in 8 rows.

tidyr::pivot_longer(iris[c(1,51),], 1:4, names_to="part", names_pattern="(.*)[.].*")
tidyr::pivot_longer(iris[c(1,51),], 1:4, names_to=c("part", "dim"), names_pattern="(.*)[.](.*)")

However the first one only returns 4 rows (only the Width data, not the Length data).

> tidyr::pivot_longer(iris[c(1,51),], 1:4, names_to="part", names_pattern="(.*)[.].*")
# A tibble: 4 x 3
  Species    part  value
  <fct>      <chr> <dbl>
1 setosa     Sepal   3.5
2 setosa     Petal   0.2
3 versicolor Sepal   3.2
4 versicolor Petal   1.4
> tidyr::pivot_longer(iris[c(1,51),], 1:4, names_to=c("part", "dim"), names_pattern="(.*)[.](.*)")
# A tibble: 8 x 4
  Species    part  dim    value
  <fct>      <chr> <chr>  <dbl>
1 setosa     Sepal Length   5.1
2 setosa     Sepal Width    3.5
3 setosa     Petal Length   1.4
4 setosa     Petal Width    0.2
5 versicolor Sepal Length   7  
6 versicolor Sepal Width    3.2
7 versicolor Petal Length   4.7
8 versicolor Petal Width    1.4

I would have expected a warning, something like "you asked to pivot columns 1:4 but we are only returning data from columns 2 and 4".
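
One way to see why the first call loses rows is to apply the pattern to the column names directly. The sketch below uses base R's sub() as a stand-in for the way names_pattern extracts its capture group; that is an assumption made for illustration, not tidyr's actual code path, and the variable names cols, part and has_collision are hypothetical.

```r
# Column names selected by cols = 1:4 in the example above
cols <- names(iris)[1:4]

# Approximate names_pattern = "(.*)[.].*" with base R's sub();
# this is a stand-in for illustration, not tidyr's internals
part <- sub("^(.*)[.].*$", "\\1", cols)

# TRUE when distinct columns are mapped to the same extracted name
has_collision <- anyDuplicated(part) > 0

print(part)
print(has_collision)
```

Here the four column names map to only two distinct values of part, whereas capturing both groups (as in the second call) keeps all four names distinct.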

@hadley hadley added bug an unexpected problem or unintended behavior pivoting ♻️ pivot rectangular data to different "shapes" labels Nov 13, 2019
@hadley
Member

hadley commented Nov 28, 2019

Slightly simpler reprex:

library(tidyr)
df <- tibble(g = 1:2, x1 = 1:2, y1 = 1:2, x2 = 1:2)

df %>% pivot_longer(-g)
#> # A tibble: 6 x 3
#>       g name  value
#>   <int> <chr> <int>
#> 1     1 x1        1
#> 2     1 y1        1
#> 3     1 x2        1
#> 4     2 x1        2
#> 5     2 y1        2
#> 6     2 x2        2
df %>% pivot_longer(-g, names_pattern = "(..)")
#> # A tibble: 6 x 3
#>       g name  value
#>   <int> <chr> <int>
#> 1     1 x1        1
#> 2     1 y1        1
#> 3     1 x2        1
#> 4     2 x1        2
#> 5     2 y1        2
#> 6     2 x2        2
df %>% pivot_longer(-g, names_pattern = "(.).")
#> # A tibble: 4 x 3
#>       g name  value
#>   <int> <chr> <int>
#> 1     1 x         1
#> 2     1 y         1
#> 3     2 x         2
#> 4     2 y         2

df %>% pivot_longer(-g, names_to = c("x", "num"), names_pattern = "(.)(.)")
#> # A tibble: 6 x 4
#>       g x     num   value
#>   <int> <chr> <chr> <int>
#> 1     1 x     1         1
#> 2     1 y     1         1
#> 3     1 x     2         1
#> 4     2 x     1         2
#> 5     2 y     1         2
#> 6     2 x     2         2

Created on 2019-11-27 by the reprex package (v0.3.0)

@hadley
Member

hadley commented Dec 6, 2019

I think it's a little easier to understand the problem by looking at the specs:

library(tidyr)
df <- tibble(g = 1:2, x1 = 1:2, y1 = 1:2, x2 = 1:2)

# OK: the additional columns uniquely identify each row
df %>% build_longer_spec(-g)
#> # A tibble: 3 x 3
#>   .name .value name 
#>   <chr> <chr>  <chr>
#> 1 x1    value  x1   
#> 2 y1    value  y1   
#> 3 x2    value  x2
df %>% build_longer_spec(-g, names_pattern = "(..)")
#> # A tibble: 3 x 3
#>   .name .value name 
#>   <chr> <chr>  <chr>
#> 1 x1    value  x1   
#> 2 y1    value  y1   
#> 3 x2    value  x2
df %>% build_longer_spec(-g, names_to = c("x", "num"), names_pattern = "(.)(.)")
#> # A tibble: 3 x 4
#>   .name .value x     num  
#>   <chr> <chr>  <chr> <chr>
#> 1 x1    value  x     1    
#> 2 y1    value  y     1    
#> 3 x2    value  x     2

# NOT OK: the additional columns don't uniquely identify each row
df %>% build_longer_spec(-g, names_pattern = "(.).")
#> # A tibble: 3 x 3
#>   .name .value name 
#>   <chr> <chr>  <chr>
#> 1 x1    value  x    
#> 2 y1    value  y    
#> 3 x2    value  x

Created on 2019-12-06 by the reprex package (v0.3.0)
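
The "NOT OK" condition above can be checked mechanically. The following is a minimal sketch in base R, assuming the spec is an ordinary data frame whose key columns are everything other than .name and .value; the spec is rebuilt by hand from the printed output rather than produced by build_longer_spec(), and key_cols, is_dup and collisions are hypothetical names used only for illustration.

```r
# Hand-built copy of the spec printed above for names_pattern = "(.)."
spec <- data.frame(
  .name  = c("x1", "y1", "x2"),
  .value = c("value", "value", "value"),
  name   = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

# Treat every column other than .name and .value as a key column
key_cols <- setdiff(names(spec), c(".name", ".value"))
keys <- spec[key_cols]

# Flag every spec row whose key is shared with at least one other row
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
collisions <- spec$.name[is_dup]

print(collisions)
```

In this sketch x1 and x2 are flagged because both map to name "x", which is the situation in which the released version at the time returned fewer rows than expected.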

@hadley hadley closed this as completed in e1b548b Dec 7, 2019
@hadley
Member

hadley commented Dec 7, 2019

Just to circle back to the motivating example, this is the result with the dev version:

tidyr::pivot_longer(iris[c(1,51),], 1:4, names_to="part", names_pattern="(.*)[.].*")
#> # A tibble: 8 x 3
#>   Species    part  value
#>   <fct>      <chr> <dbl>
#> 1 setosa     Sepal   5.1
#> 2 setosa     Sepal   3.5
#> 3 setosa     Petal   1.4
#> 4 setosa     Petal   0.2
#> 5 versicolor Sepal   7  
#> 6 versicolor Sepal   3.2
#> 7 versicolor Petal   4.7
#> 8 versicolor Petal   1.4

Created on 2019-12-07 by the reprex package (v0.3.0)

3 participants