-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Make the current group size a variable updated by reference #6727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the current group size a variable updated by reference #6727
Conversation
R/data-mask.R
Outdated
private$env_current <- new_environment(data = list( | ||
`dplyr:::current_group_id` = 0L, | ||
`dplyr:::current_group_size` = 0L | ||
)) | ||
|
||
private$chops <- .Call(dplyr_lazy_vec_chop_impl, data, rows, private$env_current, private$grouped, private$rowwise) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We previously created an environment sort of like env_current
from inside dplyr_lazy_vec_chop_impl
and set it as the parent env of chops
, but that meant we would always access that end on the R side using parent.env(private$chops)
and that was a little clunky.
Creating the env on the R side and passing it in seems a little cleaner, and means we have easy access to env_current
and its "current" group information variables for use in n()
and cur_group_id()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvement. I agree this makes the code clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we find a more informative name? current_group_info
?
R/data-mask.R
Outdated
get_current_group_id = function() { | ||
# `dplyr:::current_group_id` is modified by reference at the C level. | ||
# If the result of `get_current_group_id()` is used in a persistent way | ||
# (like in `cur_group_id()`), then it must be duplicated on the way out. | ||
private[["env_current"]][["dplyr:::current_group_id"]] | ||
}, | ||
|
||
get_current_group_size = function() { | ||
# `dplyr:::current_group_size` is modified by reference at the C level. | ||
# If the result of `get_current_group_size()` is used in a persistent way | ||
# (like in `n()`), then it must be duplicated on the way out. | ||
private[["env_current"]][["dplyr:::current_group_size"]] | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important to note that these return the exact reference to the object we modify by reference at the C level.
This is why we duplicate()
the result before returning from n()
and cur_group_id()
. If we just returned the variable without duplicating then it could get modified by reference accidentally
R/data-mask.R
Outdated
private$env_current <- new_environment(data = list( | ||
`dplyr:::current_group_id` = 0L, | ||
`dplyr:::current_group_size` = 0L | ||
)) | ||
|
||
private$chops <- .Call(dplyr_lazy_vec_chop_impl, data, rows, private$env_current, private$grouped, private$rowwise) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice improvement. I agree this makes the code clearer.
R/data-mask.R
Outdated
private$env_current <- new_environment(data = list( | ||
`dplyr:::current_group_id` = 0L, | ||
`dplyr:::current_group_size` = 0L | ||
)) | ||
|
||
private$chops <- .Call(dplyr_lazy_vec_chop_impl, data, rows, private$env_current, private$grouped, private$rowwise) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we find a more informative name? current_group_info
?
- Provide access to it through `$get_current_group_size()` - Use this in `n()`, which makes it a little faster
40bdc3b
to
a7f05dc
Compare
This PR:
$get_current_group_size()
n()
, which makes it a little fasterThis also gives us an easy private way to access the current group size for #6685 (comment)
It also gets rid of the need to do
parent.env(chops)
internally, which always felt a little clunky to me.Created on 2023-02-15 with reprex v2.0.2.9000