Skip to content

pull() can also return named vectors #4102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Dec 12, 2019

Conversation

ilarischeinin
Copy link
Contributor

This PR implements an additional parameter for pull() that lets one return named vectors.

Essentially, it is a shorthand so that instead of this:

library(tidyverse)

band_instruments %>% {set_names(pull(., plays), nm = pull(., name))}
#>     John     Paul    Keith 
#> "guitar"   "bass" "guitar"

One can write:

band_instruments %>% pull(plays, name)
#>     John     Paul    Keith 
#> "guitar"   "bass" "guitar"

If one specifies only one variable, the behaviour is exactly the same as before. But if there is a second variable, it is used for the names of the returned vector.

pull(band_instruments, plays)
#> "guitar"   "bass" "guitar"
pull(band_instruments, plays, name)
#>     John     Paul    Keith 
#> "guitar"   "bass" "guitar"

Created on 2019-01-11 by the reprex package (v0.2.1)

Basic tests for the new functionality are included.

I didn't set a default value for the new parameter namevar, and check if it is missing(). But I also pondered if a more fitting pair for the default value of var = -1 could be namevar = 0 for the unnamed case, and I'm happy to change it to that if you like that one better.

As the PR is quite small, I haven't filed an issue, but happy to do that if it's preferred. The contributing instructions tell to do it for "substantial" PRs, but this felt so simple I went straight for the implementation.

I didn't yet make any changes to NEWS (or DESCRIPTION) as in branch master they are still at 0.8.0 for the upcoming release. I'm happy to add to this PR later to add an item to NEWS, but thought I'd leave it out for now to not cause any extra work for maintainers, in case a number of PRs go about bumping the version number and adding a new header to NEWS.

@hadley hadley added the breaking change ☠️ API change likely to affect existing code label Jan 11, 2019
@hadley
Copy link
Member

hadley commented Jan 11, 2019

Adding a new parameter to a generic is technically a breaking API change (since it may cause CRAN packages to fail) so we'll need to carefully consider it. I think the basic idea is fine, but I'd called the new argument name (or names?) and default it to NULL. A missing default implies that the argument is required.

@ilarischeinin
Copy link
Contributor Author

Thanks for the feedback.

I didn't realize it counted as a breaking change. As it's just a small shortcut, I'm not sure if it's worth the pain then.

For what it's worth, I changed the default to NULL and renamed the new argument to name.

@romainfrancois romainfrancois requested a review from hadley January 16, 2019 17:16
@romainfrancois
Copy link
Member

Do we have a way to check if dependent packages define their own method for pull() ?

@krlmlr
Copy link
Member

krlmlr commented Jan 25, 2019

We already have . %>% select(name, var) %>% deframe() .?

@romainfrancois
Copy link
Member

That's kind of a stretch @krlmlr ?

@ilarischeinin
Copy link
Contributor Author

I'm currently in the middle of a revdepcheck run. It still has 10h to go, but I just tested the following approach: I extracted all the source packages it has downloaded so far (663 out of 1500), and grep'd for "pull" in their NAMESPACEs:

AMR/NAMESPACE:S3method(pull,atc)
AMR/NAMESPACE:S3method(pull,bactid)
AMR/NAMESPACE:S3method(pull,mo)
AMR/NAMESPACE:exportMethods(pull.atc)
AMR/NAMESPACE:exportMethods(pull.bactid)
AMR/NAMESPACE:exportMethods(pull.mo)
AMR/NAMESPACE:importFrom(dplyr,pull)
DiagrammeR/NAMESPACE:importFrom(dplyr,pull)
amt/NAMESPACE:export(pull)
amt/NAMESPACE:importFrom(dplyr,pull)
bayesplot/NAMESPACE:importFrom(dplyr,pull)
beezdemand/NAMESPACE:export(pull)
blastula/NAMESPACE:importFrom(dplyr,pull)
blorr/NAMESPACE:importFrom(dplyr,pull)
dbplyr/NAMESPACE:S3method(pull,tbl_sql)
evaluator/NAMESPACE:importFrom(dplyr,pull)
genogeographer/NAMESPACE:importFrom(dplyr,pull)
getTBinR/NAMESPACE:importFrom(dplyr,pull)
ggeffects/NAMESPACE:importFrom(dplyr,pull)
ggpubr/NAMESPACE:importFrom(dplyr,pull)
highcharter/NAMESPACE:importFrom(dplyr,pull)
infer/NAMESPACE:importFrom(dplyr,pull)
inferr/NAMESPACE:importFrom(dplyr,pull)

So, at least AMR seems to extend pull(). I can report back when the whole run has finished.

(I didn't know about deframe(), thank you!)

@krlmlr
Copy link
Member

krlmlr commented Jan 25, 2019

Is it? I thought, functions should do one thing and do it well.

@ilarischeinin
Copy link
Contributor Author

Here are the cases I identified among the 1,500 packages revdepcheck downloaded that extend dplyr's generic. Below is also the full grep output. (I also noticed that package Nmisc has implemented this exact functionality with pull_with_names().)

AMR/NAMESPACE:S3method(pull,atc)
AMR/NAMESPACE:S3method(pull,bactid)
AMR/NAMESPACE:S3method(pull,mo)
AMR/NAMESPACE:exportMethods(pull.atc)
AMR/NAMESPACE:exportMethods(pull.bactid)
AMR/NAMESPACE:exportMethods(pull.mo)

https://gitlab.com/msberends/AMR/blob/master/R/atc.R
https://gitlab.com/msberends/AMR/blob/master/R/mo.R
(pull.bactid has been removed from the current dev version of the package.)

dbplyr/NAMESPACE:S3method(pull,tbl_sql)

https://github.com/tidyverse/dbplyr/blob/master/R/verb-pull.R

mrgsolve/NAMESPACE:S3method(pull,mrgsims)

https://github.com/metrumresearchgroup/mrgsolve/blob/master/R/mrgsims.R

pems.utils/NAMESPACE:S3method(pull, pems)

https://r-forge.r-project.org/scm/viewvc.php/pkg/pems.utils/R/tidyverse.tools.R?view=markup&revision=88&root=pems

tidygraph/NAMESPACE:S3method(pull,morphed_tbl_graph)

https://github.com/thomasp85/tidygraph/blob/master/R/pull.R

Full output

$ grep pull */NAMESPACE
AMR/NAMESPACE:S3method(pull,atc)
AMR/NAMESPACE:S3method(pull,bactid)
AMR/NAMESPACE:S3method(pull,mo)
AMR/NAMESPACE:exportMethods(pull.atc)
AMR/NAMESPACE:exportMethods(pull.bactid)
AMR/NAMESPACE:exportMethods(pull.mo)
AMR/NAMESPACE:importFrom(dplyr,pull)
DiagrammeR/NAMESPACE:importFrom(dplyr,pull)
MetamapsDB/NAMESPACE:importFrom(dplyr,pull)
Nmisc/NAMESPACE:export(pull_with_names)
Organism.dplyr/NAMESPACE:importFrom(dplyr,pull)
amt/NAMESPACE:export(pull)
amt/NAMESPACE:importFrom(dplyr,pull)
bayesplot/NAMESPACE:importFrom(dplyr,pull)
beezdemand/NAMESPACE:export(pull)
blastula/NAMESPACE:importFrom(dplyr,pull)
blorr/NAMESPACE:importFrom(dplyr,pull)
dbplyr/NAMESPACE:S3method(pull,tbl_sql)
evaluator/NAMESPACE:importFrom(dplyr,pull)
genogeographer/NAMESPACE:importFrom(dplyr,pull)
getTBinR/NAMESPACE:importFrom(dplyr,pull)
ggeffects/NAMESPACE:importFrom(dplyr,pull)
ggpubr/NAMESPACE:importFrom(dplyr,pull)
highcharter/NAMESPACE:importFrom(dplyr,pull)
infer/NAMESPACE:importFrom(dplyr,pull)
inferr/NAMESPACE:importFrom(dplyr,pull)
keyholder/NAMESPACE:export(pull_key)
lplyr/NAMESPACE:S3method(pull_,data.frame)
lplyr/NAMESPACE:S3method(pull_,list)
lplyr/NAMESPACE:S3method(pull_,matrix)
lplyr/NAMESPACE:export(pull)
lplyr/NAMESPACE:export(pull_)
mem/NAMESPACE:importFrom(dplyr,pull)
mice/NAMESPACE:importFrom(dplyr,pull)
mrgsolve/NAMESPACE:S3method(pull,mrgsims)
mrgsolve/NAMESPACE:importFrom(dplyr,pull)
olsrr/NAMESPACE:importFrom(dplyr,pull)
pammtools/NAMESPACE:importFrom(dplyr,pull)
parsnip/NAMESPACE:importFrom(dplyr,pull)
pems.utils/NAMESPACE:importFrom(dplyr, pull)
pems.utils/NAMESPACE:export(pull)
pems.utils/NAMESPACE:S3method(pull, pems)
pems.utils/NAMESPACE:#no pull_
perturbatr/NAMESPACE:importFrom(dplyr,pull)
pointblank/NAMESPACE:importFrom(dplyr,pull)
qualtRics/NAMESPACE:importFrom(dplyr,pull)
rfishbase/NAMESPACE:importFrom(dplyr,pull)
rise/NAMESPACE:importFrom("dplyr", "pull")
rmweather/NAMESPACE:importFrom(dplyr,pull)
rnoaa/NAMESPACE:export(meteo_pull_monitors)
rwalkr/NAMESPACE:export(pull_sensor)
sabre/NAMESPACE:importFrom(dplyr,pull)
sjlabelled/NAMESPACE:importFrom(dplyr,pull)
sjmisc/NAMESPACE:importFrom(dplyr,pull)
sjstats/NAMESPACE:importFrom(dplyr,pull)
taxadb/NAMESPACE:importFrom(dplyr,pull)
tidybayes/NAMESPACE:importFrom(dplyr,pull)
tidybayes/NAMESPACE:importFrom(tidyselect,vars_pull)
tidygraph/NAMESPACE:S3method(pull,morphed_tbl_graph)
tidygraph/NAMESPACE:S3method(pull,tbl_graph)
tidygraph/NAMESPACE:export(pull)
tidygraph/NAMESPACE:importFrom(dplyr,pull)
tidyhydat/NAMESPACE:export(pull_station_number)
tidyselect/NAMESPACE:export(vars_pull)
togglr/NAMESPACE:importFrom(dplyr,pull)
tsibble/NAMESPACE:S3method(pull_interval,Date)
tsibble/NAMESPACE:S3method(pull_interval,POSIXt)
tsibble/NAMESPACE:S3method(pull_interval,default)
tsibble/NAMESPACE:S3method(pull_interval,difftime)
tsibble/NAMESPACE:S3method(pull_interval,hms)
tsibble/NAMESPACE:S3method(pull_interval,nanotime)
tsibble/NAMESPACE:S3method(pull_interval,numeric)
tsibble/NAMESPACE:S3method(pull_interval,ordered)
tsibble/NAMESPACE:S3method(pull_interval,yearmon)
tsibble/NAMESPACE:S3method(pull_interval,yearmonth)
tsibble/NAMESPACE:S3method(pull_interval,yearqtr)
tsibble/NAMESPACE:S3method(pull_interval,yearquarter)
tsibble/NAMESPACE:S3method(pull_interval,yearweek)
tsibble/NAMESPACE:export(pull_interval)
wosr/NAMESPACE:export(pull_cited_refs)
wosr/NAMESPACE:export(pull_incites)
wosr/NAMESPACE:export(pull_related_recs)
wosr/NAMESPACE:export(pull_wos)
wosr/NAMESPACE:export(pull_wos_apply)
yardstick/NAMESPACE:importFrom(tidyselect,vars_pull)

@romainfrancois
Copy link
Member

Thanks for the detailed analysis. It would not be a lot of packages to amend. It's essentially a case of "do we want that now" @hadley

@hadley
Copy link
Member

hadley commented Jan 29, 2019

I think we can merge in 0.9.0

@romainfrancois romainfrancois added this to the 0.9.0 milestone Jan 29, 2019
@krlmlr
Copy link
Member

krlmlr commented May 11, 2019

deframe() feels slightly odd, because you need to select-rename beforehand. pull() has better semantics because it defaults to the last column.

I'd like to propose the following course of action:

  • Move pull() to {tibble}. Then we can truthfully say that all {dplyr} verbs return a data frame. (Currenty, pull() is the only black sheep in the herd.)

  • Implement a new pull_named() generic in {tibble}, the default implementation forwards to pull() and by default picks the first column. (It's a fair assumption that the first column often contains a primary key of sorts.)

  • Reexport pull(), pull_named() and enframe() (but not deframe()) in {dplyr} for now, but alert implementers that they should switch to {tibble}.

@hadley hadley merged commit 3cf568b into tidyverse:master Dec 12, 2019
@hadley
Copy link
Member

hadley commented Dec 12, 2019

I'd rather keep pull() in dplyr for now; there's already a lot of changes planned for the next release, so I'd prefer to let the dust settle before deciding if we need to move it.

ilarischeinin added a commit to ilarischeinin/dplyr that referenced this pull request Dec 18, 2019
@ilarischeinin ilarischeinin deleted the feature-pull-named-vector branch December 18, 2019 15:46
romainfrancois pushed a commit that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change ☠️ API change likely to affect existing code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants