Function tapply() is the obvious way to produce arrays from data frames.
But users of dplyr have other aggregation functionality that keeps them in the realm of the tidy dataset format.
Perhaps it would be useful to ease their occasional jumps to array computing by offering a kind of tapply() tailored to their conventions.
If you dare to infringe Hadley Wickham's function names copyright, a simple example of this could be:
The main difference between dplyr's group_by + summarise and tapply (and thus a refined rray_summarise) is which groups are considered. In the first case, groups are formed from the data, so only the combinations actually present in the data are returned. In the second, "a priori" classifications are prescribed as factor variables, and the result is an exhaustive crossing of their levels, no matter what the data set actually contains. Not only individual cells but even entire rows with no data will appear in the result, as long as their factor level was prescribed. The order of the levels is kept as well.
This predictable result seems preferable in aggregate production automation scenarios.
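To make the contrast concrete, here is a minimal base R sketch. The `sales` data frame and its columns are made up for illustration, and `aggregate()` is used only as a stand-in for data-driven grouping (like group_by + summarise, it returns just the combinations it finds); it is not the dplyr call itself.

```r
# Hypothetical toy data: `region` prescribes three levels,
# but "south" never occurs in the rows.
sales <- data.frame(
  region  = factor(c("north", "north", "west"),
                   levels = c("north", "south", "west")),
  product = factor(c("a", "b", "a"), levels = c("a", "b")),
  amount  = c(10, 20, 5)
)

# Level-driven aggregation: one cell per combination of prescribed levels.
by_levels <- tapply(sales$amount, sales[c("region", "product")], sum)

# Data-driven aggregation: one row per combination observed in the data.
by_data <- aggregate(amount ~ region + product, data = sales, FUN = sum)

dim(by_levels)       # regions x products, including the empty "south" row
rownames(by_levels)  # levels in their prescribed order
nrow(by_data)        # only the combinations present in the data
```

Empty cells in `by_levels` are filled with `NA`, so the shape of the result depends only on the factor definitions and not on which rows happen to be present.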
This is an obvious clarification, but I think it is important here as another justification (besides the ability to operate on aggregates of different granularities thanks to rray broadcasting, of course) for why functionality like this complements what dplyr offers now.
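As a rough illustration of combining aggregates of different granularities, the sketch below divides per-cell totals by per-region totals. It uses base R's `sweep()` as a stand-in for broadcasting and calls no rray functions. The `sales` data is again hypothetical.

```r
# Hypothetical toy data with every region/product combination present.
sales <- data.frame(
  region  = factor(c("north", "north", "west", "west")),
  product = factor(c("a", "b", "a", "b")),
  amount  = c(10, 30, 5, 15)
)

# Fine-grained aggregate: totals per region and product (2 x 2 array).
cell_totals <- tapply(sales$amount, sales[c("region", "product")], sum)

# Coarse aggregate: totals per region only (length-2 array).
region_totals <- tapply(sales$amount, sales$region, sum)

# Divide each row of the fine aggregate by the matching coarse total.
# With broadcasting arrays this alignment would not need an explicit sweep().
shares <- sweep(cell_totals, 1, region_totals, "/")

shares
```

Because both aggregates are built from the same prescribed factor levels, their dimensions line up regardless of which rows the data contains.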
My rray_summarise() function based on current dplyr::group_by() doesn't address this completely.
(In terms of dplyr's issue #4392, I am solving the "expand" part.)
But in view of tidyverse/dplyr#4392 (comment) it could change to something completely different.
Created on 2019-06-20 by the reprex package (v0.2.1)