Support (left outer) (anti) semi join in hash join v2 #10133

Open
gengliqi wants to merge 76 commits into master from join-v2-semi-join

Conversation

gengliqi
Contributor

What problem does this PR solve?

Issue Number: ref #9060

Problem Summary:

What is changed and how it works?

Should merge after #9956.

Support (left outer) (anti) semi join in hash join v2

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

gengliqi added 30 commits March 6, 2025 21:44
u
Signed-off-by: gengliqi <[email protected]>
Contributor

ti-chi-bot bot commented Apr 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. labels Apr 24, 2025
Contributor

ti-chi-bot bot commented Apr 24, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from gengliqi, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 24, 2025
@gengliqi gengliqi force-pushed the join-v2-semi-join branch from 1809fae to 4ba216d Compare April 28, 2025 20:18
@gengliqi gengliqi marked this pull request as ready for review April 29, 2025 18:36
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 29, 2025
Contributor Author

gengliqi commented May 1, 2025

/retest

@gengliqi gengliqi requested review from Copilot and removed request for Copilot May 1, 2025 17:21
@Copilot Copilot AI left a comment

Pull Request Overview

This PR updates various components to support hash join v2 and (anti) semi join operations by introducing new serialization facilities and failpoint flags. It also adds tests for the new serializeByteSize methods across multiple column types.

  • Updated Block::swap to handle new members (start_offset, segment_row_id_col).
  • Added new failpoint flags for join v2 probe enable/disable.
  • Introduced serializeByteSize implementations and corresponding tests in several Column classes.
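To illustrate why the Block::swap item matters, here is a minimal sketch using a hypothetical MiniBlock struct. Only the member names start_offset and segment_row_id_col come from the summary above; their types, the columns placeholder, and the struct itself are assumptions and do not reflect TiFlash's actual Block class.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block-like container; NOT TiFlash's Block.
struct MiniBlock
{
    std::vector<std::string> columns;            // placeholder for column data (assumed)
    std::size_t start_offset = 0;                // name from the review summary; type assumed
    std::vector<std::size_t> segment_row_id_col; // name from the review summary; type assumed

    // Swap every member. If a newly added member were skipped here,
    // the two objects would end up holding a mix of each other's state.
    void swap(MiniBlock & other) noexcept
    {
        using std::swap;
        swap(columns, other.columns);
        swap(start_offset, other.start_offset);
        swap(segment_row_id_col, other.segment_row_id_col);
    }
};
```

In a sketch like this, the usual failure mode is adding a member to the struct without adding the matching line to swap, which is the kind of inconsistency the first bullet describes fixing.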

Reviewed Changes

Copilot reviewed 45 out of 46 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| dbms/src/Core/Block.cpp | Swapped new members to ensure Block swap consistency. |
| dbms/src/Common/FailPoint.cpp | Added new failpoints for join v2 probe enable and disable. |
| dbms/src/Common/ColumnNTAlignBuffer.h | Updated comment wording for clarity. |
| dbms/src/Columns/tests/gtest_column_misc.cpp | Added tests covering new serializeByteSize API for various column types. |
| dbms/src/Columns/filterColumn.cpp | Added explicit instantiation for filterImpl with char pointer types. |
| Other Column files | Provided serializeByteSize implementations (or exceptions) consistent with design. |
Files not reviewed (1)
  • dbms/CMakeLists.txt: Language not supported

@@ -326,6 +326,12 @@ INSTANTIATE(Decimal32, DecimalPaddedPODArray<Decimal32>)
INSTANTIATE(Decimal64, DecimalPaddedPODArray<Decimal64>)
INSTANTIATE(Decimal128, DecimalPaddedPODArray<Decimal128>)
INSTANTIATE(Decimal256, DecimalPaddedPODArray<Decimal256>)
// Cannot use INSTANTIATE micro because `const T * data_pos` + `T: char *` will be intepreted as `const char **`

Copilot AI May 1, 2025

The comment contains a spelling error: 'intepreted' should be corrected to 'interpreted'.

Suggested change
- // Cannot use INSTANTIATE micro because `const T * data_pos` + `T: char *` will be intepreted as `const char **`
+ // Cannot use INSTANTIATE micro because `const T * data_pos` + `T: char *` will be interpreted as `const char **`
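
For context on what the code comment itself is describing: a macro pastes its type argument textually, so `const T *` with `char *` substituted reads as `const char **`, whereas a template with `T = char *` yields `char * const *`. The sketch below checks this at compile time (C++17). DECLARE_PTR_TYPE, MacroPtr, and TemplatePtr are hypothetical names for illustration; this is not the INSTANTIATE macro from filterColumn.cpp.

```cpp
#include <type_traits>

// Hypothetical macro that pastes the type name textually (NOT the real INSTANTIATE).
#define DECLARE_PTR_TYPE(NAME, T) using NAME = const T *;

// Expands to: using MacroPtr = const char * *;
// i.e. pointer to pointer to const char.
DECLARE_PTR_TYPE(MacroPtr, char *)

// In a template, const applies to T as a whole.
template <typename T>
using TemplatePtr = const T *;

// Textual substitution: the pointee chars are const, the inner pointer is not.
static_assert(std::is_same_v<MacroPtr, const char **>);

// Type substitution: the inner pointer is const, the chars are not.
static_assert(std::is_same_v<TemplatePtr<char *>, char * const *>);

// The two types differ, so a macro-generated signature would not match
// a template parameter declared as `const T *` with T = char *.
static_assert(!std::is_same_v<MacroPtr, TemplatePtr<char *>>);
```

This would explain why an explicit instantiation is written out by hand for the char pointer case instead of going through the macro.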

Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.