Skip to content

Support join field list and join options #3803

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

LantaoJin
Copy link
Member

@LantaoJin LantaoJin commented Jun 20, 2025

Description

Support join field list and join options:

  1. Add a setting to enable SPL-compatible syntax: plugins.ppl.spl_compatible.enabled
  2. Add new join syntax: {SPL_compatible_grammar_enabled}? JOIN (joinOption)* (fieldList)? right = tableOrSubqueryClause

SPL Compatible Syntax

(Experimental)
(Since 3.2.0)
(prerequisite: plugins.ppl.spl_compatible.enabled=true)
join [type=<joinType>] [overwrite=<bool>] <join-field-list> <right-dataset>

  • type=<joinType>: optional. The type of join to perform. The default is INNER if not specified. Other option is LEFT, RIGHT, FULL, CROSS, SEMI, ANTI.
  • overwrite=<bool>: optional. Specifies whether duplicate-named fields from <right-dataset> (subsearch results) should replace corresponding fields in the main search results. The default value is true.
  • join-field-list: optional. The fields to use to build join criteria. The join-field-list must be present in both sides. If no <join-field-list> is present, all fields that are common to both sides are used.
  • right-dataset: mandatory. Right dataset could be either an index or a subsearch with/without alias.

Related Issues

Resolves #3775

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -29,6 +29,7 @@ public enum Key {
PATTERN_MODE("plugins.ppl.pattern.mode"),
PATTERN_MAX_SAMPLE_COUNT("plugins.ppl.pattern.max.sample.count"),
PATTERN_BUFFER_LIMIT("plugins.ppl.pattern.buffer.limit"),
SPL_COMPATIBLE_GRAMMAR_ENABLED("plugins.ppl.spl_compatible.enabled"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any other options? such as plugins.ppl.splunk_compatible.enabled to highlight splunk instead of spl

@LantaoJin LantaoJin marked this pull request as ready for review June 20, 2025 10:29
@LantaoJin LantaoJin added PPL Piped processing language calcite calcite migration releated labels Jun 20, 2025
SPL Compatible Syntax
=====================
| (Experimental)
| (Since 3.2.0)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may catch up 3.1.0?

Description
-----------

This setting is present from 3.2.0. Enabling Calcite is a prerequisite. You can use this setting to decide whether to allow parsing a query of Splunk SPL compatible grammar.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is expected behavior of PPL commands/function if is not compatible with Splunk grammer?

Copy link
Member Author

@LantaoJin LantaoJin Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current implementation is SPL-compatible, not SPL-only.
For example:
When compatible config is true, user can run both rex and parse command. When compatible config is false, only parse command works.

If we want SPL-only, when SPL-only config is true, parse command won't work any more. Which one do you prefer?

| (prerequisite: plugins.ppl.spl_compatible.enabled=true)
| join [type=<joinType>] [overwrite=<bool>] <join-field-list> <right-dataset>

* type=<joinType>: optional. The type of join to perform. The default is ``INNER`` if not specified. Other option is ``LEFT``, ``RIGHT``, ``FULL``, ``CROSS``, ``SEMI``, ``ANTI``.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could limit to inner | outer | left, then add other join types with perf-test gradually.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Em, I think we can add a config to enable other join type, and disable them by default, instead of removing from code.

| (Experimental)
| (Since 3.2.0)
| (prerequisite: plugins.ppl.spl_compatible.enabled=true)
| join [type=<joinType>] [overwrite=<bool>] <join-field-list> <right-dataset>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change <right-dataset> to <dataset> | <subsearch>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is optional, change to []

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add [left=<left-alias>] [right=<right-alias>] where ... in syntax?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add [left=<left-alias>] [right=<right-alias>] where ... in syntax?

this is spl2 syntax, I don't think we need to support it in compatible mode.

[left=<left-alias>] [right=<right-alias>] on ... is listed in default PPL syntax


* type=<joinType>: optional. The type of join to perform. The default is ``INNER`` if not specified. Other option is ``LEFT``, ``RIGHT``, ``FULL``, ``CROSS``, ``SEMI``, ``ANTI``.
* overwrite=<bool>: optional. Specifies whether duplicate-named fields from <right-dataset> (subsearch results) should replace corresponding fields in the main search results. The default value is ``true``.
* join-field-list: optional. The fields to use to build join criteria. The ``join-field-list`` must be present in both sides. If no <join-field-list> is present, all fields that are common to both sides are used.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Polish join-filed-list description,

The fields used to build the join criteria. The join field list must exist on both sides. If no join field list is specified, all fields common to both sides will be used as join keys.

@@ -138,3 +178,5 @@ Assume table1 and table2 only contain field ``id``, following PPL queries and th
- table1.id, t2.id, a
* - source=table1 | join right=tt on table1.id=t2.id [ source=table2 as t2 | eval b = id ] | eval a = 1
- table1.id, tt.id, tt.b, a

For the Splunk SPL compatible syntax (since 3.2.0), duplicate-named fields in output results are deduplicated, with field retention determined by the value of 'overwrite' option.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is expected behavior, not limitation, right?

@@ -315,7 +323,8 @@ tableSourceClause

// join
joinCommand
: (joinType) JOIN sideAlias joinHintList? joinCriteria? right = tableOrSubqueryClause
: joinType JOIN sideAlias joinHintList? joinCriteria right = tableOrSubqueryClause
| {SPL_compatible_grammar_enabled}? JOIN (joinOption)* (fieldList)? right = tableOrSubqueryClause
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not support alias? left=l right=r

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow options to appear before or after fieldList, e.g. ... | join id overwrite=true [search index=product]

Copy link
Member Author

@LantaoJin LantaoJin Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not support alias? left=l right=r

in sideAlias

allow options to appear before or after fieldList

Before only.

Comment on lines +200 to +201
"Join type is ambiguous, please remove the join type before JOIN keyword or 'type='"
+ " option.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in spl mode, ... | inner join xxx with throw syntax exception, in which case user will see this error message,

Argument.ArgumentMap argumentMap = Argument.ArgumentMap.of(arguments);
if (argumentMap.get("type") != null) {
Join.JoinType joinTypeFromArgument = getJoinType(argumentMap.get("type").toString());
if (ctx.joinType() == null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JoinType is not a valid token in SPL mode. The AST builder should be aware of SPL mode and handle it separately.

@@ -315,7 +323,8 @@ tableSourceClause

// join
joinCommand
: (joinType) JOIN sideAlias joinHintList? joinCriteria? right = tableOrSubqueryClause
: joinType JOIN sideAlias joinHintList? joinCriteria right = tableOrSubqueryClause
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should expose only one syntax at a time? Concern of enable both PPL and SPL-compatible syntax is

  • Hard to document and maintain, tooling (like syntax highlighting, query editors) must also be mode-aware.
  • Increases complexity of semantic validation and testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current design is when the config is set to true, both ppl and spl syntax are supported. For your concerns, let me check the challenge in implementation level next week. I don’t think the document has any blocker. In future we can add a legacy config to disable the current ppl syntax as long as the implementation level has no blocker. For join, the current ppl join is a mutation of join of spl2. We may still need to support since splunk can both support spl1 and spl2. For your second concern, as long as implementation part has no blocker, the validation should work well since validation is done by antlr4 itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
calcite calcite migration releated PPL Piped processing language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Support join field list and duplicated field strategies
2 participants