-
Notifications
You must be signed in to change notification settings - Fork 181
Add support for CROSS JOIN
through full_join
over non-overlapping columns
#24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
R/tbl-lazy.R
Outdated
@@ -201,7 +201,14 @@ distinct_.tbl_lazy <- function(.data, ..., .dots = list(), .keep_all = FALSE) { | |||
add_op_join <- function(x, y, type, by = NULL, copy = FALSE, | |||
suffix = c(".x", ".y"), | |||
auto_index = FALSE, ...) { | |||
by <- common_by(by, x, y) | |||
by_intersect <- intersect(tbl_vars(x), tbl_vars(y)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't handle the case of a full join where by
is a named vector that maps the names of the join columns in the left table to the join columns in the right table, as in:
full_join(tbl_left, tbl_right, by = c("xl" = "xr"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; 6e5daa9 fixes it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reference, no change here.
Let's please discuss at tidyverse/dplyr#2924 whether this is the best approach for implementing cross joins. |
I think it's ok to merge this as a short term work around for the absence of |
R/sql-generic.R
Outdated
stop("Unknown join type:", type, call. = FALSE) | ||
) | ||
|
||
select <- sql_join_vars(con, vars) | ||
|
||
on <- sql_vector( | ||
on <- if (length(by$x) + length(by$y) <= 0) NULL else sql_vector( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please pull this out into:
if (...) {
on <- ...
} else {
on <- ...
}
Or maybe even consider pulling out into own function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
R/tbl-lazy.R
Outdated
@@ -201,7 +201,14 @@ distinct_.tbl_lazy <- function(.data, ..., .dots = list(), .keep_all = FALSE) { | |||
add_op_join <- function(x, y, type, by = NULL, copy = FALSE, | |||
suffix = c(".x", ".y"), | |||
auto_index = FALSE, ...) { | |||
by <- common_by(by, x, y) | |||
by_intersect <- intersect(tbl_vars(x), tbl_vars(y)) | |||
by <- if (length(by_intersect) > 0 || !identical(type, "full") || !is.null(by)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here - I really don't like this style of if
-else
because it's easy to miss that the entire block gets assigned to some variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
tests/testthat/test-joins.R
Outdated
@@ -92,3 +92,8 @@ test_that("sql generated correctly for all sources", { | |||
|
|||
expect_equal_tbls(xy) | |||
}) | |||
|
|||
test_that("full join is promoted to cross join for no overlapping variables", { | |||
result <- df1 %>% full_join(df2) %>% collect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like making this the default because if you accidentally supply the wrong data frames you might create a query that takes hours to run. Could we make it explicit, i.e. by = character()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed.
Thanks! |
Fix tidyverse/dplyr#2924 by adding support for cross joins.
Currently,
Triggers:
Error:
byrequired, because the data sources have no common variables
.This PR implements
CROSS JOINS
support forfull_join
over non-overlapping columns.