pivot_longer only returns a subset of data from cols #792

tdhock · 2019-10-23T21:55:32Z

I'm using pivot_longer with names_pattern, and the number of rows returned depends on the names_to/names_pattern arguments. I expected the number of rows returned to depend only on cols.

For example, consider two rows of the iris data. I expected both of the following calls to return 8 rows (2 rows × 4 pivoted columns).

tidyr::pivot_longer(iris[c(1,51),], 1:4, names_to="part", names_pattern="(.*)[.].*")
tidyr::pivot_longer(iris[c(1,51),], 1:4, names_to=c("part", "dim"), names_pattern="(.*)[.](.*)")

However, the first one returns only 4 rows (only the Width data, not the Length data).

> tidyr::pivot_longer(iris[c(1,51),], 1:4, names_to="part", names_pattern="(.*)[.].*")
# A tibble: 4 x 3
  Species    part  value
  <fct>      <chr> <dbl>
1 setosa     Sepal   3.5
2 setosa     Petal   0.2
3 versicolor Sepal   3.2
4 versicolor Petal   1.4
> tidyr::pivot_longer(iris[c(1,51),], 1:4, names_to=c("part", "dim"), names_pattern="(.*)[.](.*)")
# A tibble: 8 x 4
  Species    part  dim    value
  <fct>      <chr> <chr>  <dbl>
1 setosa     Sepal Length   5.1
2 setosa     Sepal Width    3.5
3 setosa     Petal Length   1.4
4 setosa     Petal Width    0.2
5 versicolor Sepal Length   7  
6 versicolor Sepal Width    3.2
7 versicolor Petal Length   4.7
8 versicolor Petal Width    1.4
>

I would have expected a warning, something like "you asked to pivot columns 1:4 but we are only returning data from columns 2 and 4".
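On versions affected by this bug, one workaround follows from the second call above, which does return all 8 rows: capture both name components, then drop the one you don't need. This is a minimal sketch that assumes dplyr is installed (for `select()`). The name `iris_long` is only illustrative.

```r
library(tidyr)
library(dplyr)

# Capture both parts of the name so that (part, dim) uniquely identifies
# each pivoted column and no values are dropped, then discard `dim`.
iris_long <- iris[c(1, 51), ] %>%
  pivot_longer(1:4, names_to = c("part", "dim"), names_pattern = "(.*)[.](.*)") %>%
  select(-dim)

iris_long
```

The result has the same columns as the first call (Species, part, value) but keeps the Length values too.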


hadley · 2019-11-28T02:44:45Z

Slightly simpler reprex:

library(tidyr)
df <- tibble(g = 1:2, x1 = 1:2, y1 = 1:2, x2 = 1:2)

df %>% pivot_longer(-g)
#> # A tibble: 6 x 3
#>       g name  value
#>   <int> <chr> <int>
#> 1     1 x1        1
#> 2     1 y1        1
#> 3     1 x2        1
#> 4     2 x1        2
#> 5     2 y1        2
#> 6     2 x2        2
df %>% pivot_longer(-g, names_pattern = "(..)")
#> # A tibble: 6 x 3
#>       g name  value
#>   <int> <chr> <int>
#> 1     1 x1        1
#> 2     1 y1        1
#> 3     1 x2        1
#> 4     2 x1        2
#> 5     2 y1        2
#> 6     2 x2        2
df %>% pivot_longer(-g, names_pattern = "(.).")
#> # A tibble: 4 x 3
#>       g name  value
#>   <int> <chr> <int>
#> 1     1 x         1
#> 2     1 y         1
#> 3     2 x         2
#> 4     2 y         2

df %>% pivot_longer(-g, names_to = c("x", "num"), names_pattern = "(.)(.)")
#> # A tibble: 6 x 4
#>       g x     num   value
#>   <int> <chr> <chr> <int>
#> 1     1 x     1         1
#> 2     1 y     1         1
#> 3     1 x     2         1
#> 4     2 x     1         2
#> 5     2 y     1         2
#> 6     2 x     2         2

Created on 2019-11-27 by the reprex package (v0.3.0)

hadley · 2019-12-06T22:59:24Z

I think it's a little easier to understand the problem by looking at the specs:

library(tidyr)
df <- tibble(g = 1:2, x1 = 1:2, y1 = 1:2, x2 = 1:2)

# OK: the additional columns uniquely identify each row
df %>% build_longer_spec(-g)
#> # A tibble: 3 x 3
#>   .name .value name 
#>   <chr> <chr>  <chr>
#> 1 x1    value  x1   
#> 2 y1    value  y1   
#> 3 x2    value  x2
df %>% build_longer_spec(-g, names_pattern = "(..)")
#> # A tibble: 3 x 3
#>   .name .value name 
#>   <chr> <chr>  <chr>
#> 1 x1    value  x1   
#> 2 y1    value  y1   
#> 3 x2    value  x2
df %>% build_longer_spec(-g, names_to = c("x", "num"), names_pattern = "(.)(.)")
#> # A tibble: 3 x 4
#>   .name .value x     num  
#>   <chr> <chr>  <chr> <chr>
#> 1 x1    value  x     1    
#> 2 y1    value  y     1    
#> 3 x2    value  x     2

# NOT OK: the additional columns don't uniquely identify each row
df %>% build_longer_spec(-g, names_pattern = "(.).")
#> # A tibble: 3 x 3
#>   .name .value name 
#>   <chr> <chr>  <chr>
#> 1 x1    value  x    
#> 2 y1    value  y    
#> 3 x2    value  x

Created on 2019-12-06 by the reprex package (v0.3.0)
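The OK / NOT OK distinction in the specs above can be checked directly on a spec. This is a minimal sketch. `spec_is_unique()` is a hypothetical helper, not a tidyr function, and it treats every spec column other than `.name` as part of the key that must identify each input column.

```r
library(tidyr)

# Hypothetical helper (not part of tidyr): TRUE when the non-.name columns
# of a longer spec uniquely identify every column being pivoted.
spec_is_unique <- function(spec) {
  keys <- spec[setdiff(names(spec), ".name")]
  !any(duplicated(keys))
}

df <- tibble(g = 1:2, x1 = 1:2, y1 = 1:2, x2 = 1:2)

spec_is_unique(build_longer_spec(df, -g, names_pattern = "(..)"))  # TRUE
spec_is_unique(build_longer_spec(df, -g, names_pattern = "(.)."))  # FALSE: "x" appears twice
```

A check like this could back the warning suggested in the original report, although the issue was ultimately resolved by keeping all rows rather than warning.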

hadley · 2019-12-07T17:00:09Z

Just to circle back to the motivating example, this is the result with the dev version:

tidyr::pivot_longer(iris[c(1,51),], 1:4, names_to="part", names_pattern="(.*)[.].*")
#> # A tibble: 8 x 3
#>   Species    part  value
#>   <fct>      <chr> <dbl>
#> 1 setosa     Sepal   5.1
#> 2 setosa     Sepal   3.5
#> 3 setosa     Petal   1.4
#> 4 setosa     Petal   0.2
#> 5 versicolor Sepal   7  
#> 6 versicolor Sepal   3.2
#> 7 versicolor Petal   4.7
#> 8 versicolor Petal   1.4

Created on 2019-12-07 by the reprex package (v0.3.0)
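As a quick regression check for the fixed behaviour, the row count should now equal the number of input rows times the number of pivoted columns. This sketch assumes a tidyr release that includes this fix (the issue is on the v1.1.0 milestone). On older versions the assertion would fail.

```r
library(tidyr)

# Two rows of iris, four measurement columns pivoted.
out <- pivot_longer(iris[c(1, 51), ], 1:4,
                    names_to = "part", names_pattern = "(.*)[.].*")

# `part` alone does not identify each source column, but with the fix
# no values are dropped: 2 rows x 4 columns = 8 rows.
stopifnot(nrow(out) == 2 * 4)
```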


hadley added the bug (an unexpected problem or unintended behavior) and pivoting ♻️ (pivot rectangular data to different "shapes") labels Nov 13, 2019


hadley added this to the v1.1.0 milestone Nov 28, 2019

hadley mentioned this issue Dec 6, 2019

Allow components of names_to in pivot_longer to be NA #793

Closed

hadley closed this as completed in e1b548b Dec 7, 2019
