Skip to content

FR: group_indices() should also return vector of group representatives #2121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
krlmlr opened this issue Sep 12, 2016 · 5 comments
Closed

FR: group_indices() should also return vector of group representatives #2121

krlmlr opened this issue Sep 12, 2016 · 5 comments
Assignees
Labels
feature a feature request or enhancement

Comments

@krlmlr
Copy link
Member

krlmlr commented Sep 12, 2016

perhaps as an attribute. This will allow simplifying and accelerating tidyr::nest().

Example:

iris[c(1,2,50,51,100,101), ] %>% group_by(Species) %>% group_indices
## [1] 1 1 1 2 2 3
## attr(,"repr")
## [1] 1 4 6
@hadley hadley added the feature a feature request or enhancement label Feb 22, 2017
@romainfrancois
Copy link
Member

romainfrancois commented Apr 23, 2018

So the "representative" are 1-based indices of the first row in the data that belong to this group ? @krlmlr.

In that case, we can easily update grouped_indices_grouped_df_impl :

// [[Rcpp::export]]
IntegerVector grouped_indices_grouped_df_impl(GroupedDataFrame gdf) {
  int n = gdf.nrows();
  IntegerVector res = no_init(n);
  int ngroups = gdf.ngroups();
  GroupedDataFrameIndexIterator it = gdf.group_begin();
  for (int i = 0; i < ngroups; i++, ++it) {
    const SlicingIndex& index = *it;
    int n_index = index.size();
    for (int j = 0; j < n_index; j++) {
      res[ index[j] ] = i + 1;
    }
  }
  return res;
}

@romainfrancois romainfrancois self-assigned this Apr 23, 2018
@romainfrancois
Copy link
Member

romainfrancois commented Apr 23, 2018

Need to be careful when doing #341 because empty groups won't have representatives.

They can be NA or 0 or perhaps some unique negative number

@krlmlr
Copy link
Member Author

krlmlr commented Apr 23, 2018

Good catch. For 1-based representatives, NA_integer_ seems better because df[NA_integer_, ] still is a one-row data frame as opposed to df[0, ]:

iris[NA_integer_, ]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> NA           NA          NA           NA          NA    <NA>
iris[0, ]
#> [1] Sepal.Length Sepal.Width  Petal.Length Petal.Width  Species     
#> <0 rows> (or 0-length row.names)

Created on 2018-04-23 by the reprex package (v0.2.0).

@romainfrancois
Copy link
Member

We don't need this anymore as we have group_data():

> iris[c(1,2,50,51,100,101), ] %>% group_by(Species) %>% group_data()
# A tibble: 3 x 2
  Species    .rows    
  <fct>      <list>   
1 setosa     <int [3]>
2 versicolor <int [2]>
3 virginica  <int [1]>

tidyr::nest will just use group_data() as in #3622

@lock
Copy link

lock bot commented Nov 26, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Nov 26, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants