-
Notifications
You must be signed in to change notification settings - Fork 1.5k
POC: Reduce Arc
cloning on hashmap build side
#16380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
POC: Reduce Arc
cloning on hashmap build side
#16380
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a POC for #16206 to see if this is intended. This might cause a rework of all the other joins to support Vec<(usize, usize)> (I was thinking to just isolate this for just the hash join, so maybe having separate functions for those would be fine).
@@ -850,6 +850,43 @@ pub(crate) fn apply_join_filter_to_indices( | |||
)) | |||
} | |||
|
|||
|
|||
pub(crate) fn apply_join_filter_to_hash_indices( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is temporary new functions, just so I can test it out first by isolating it for hash joins.
let arr_right = take(right.as_ref(), indices_right, None)?; | ||
eq_dyn_null(arr_left.as_ref(), arr_right.as_ref(), null_equals_null) | ||
}) | ||
.try_fold(equal, |acc, equal2| and(&acc, &equal2?))?; | ||
|
||
let filter_builder = FilterBuilder::new(&equal).optimize().build(); | ||
|
||
let left_filtered = filter_builder.filter(indices_left)?; | ||
let left_filtered = filter_builder.filter(indices_left)?; // annoying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this is a bit annoying, do you know how we can filter on this left indice (Vec<(usize, usize)>)@Dandandan
I've noticed that it is possible for |
&random_state, | ||
&mut hashes_buffer, | ||
0, | ||
true, | ||
)?; | ||
offset += batch.num_rows(); | ||
|
||
for row_id in 0..batch_num_rows { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can use extend
Yes that will be the tricky part. We gain some speed during probing by concatenating the left side (which itself can be slow) into a singe batch. |
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?