-
Notifications
You must be signed in to change notification settings - Fork 158
Support join field list and join options #3803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Lantao Jin <[email protected]>
Signed-off-by: Lantao Jin <[email protected]>
@@ -29,6 +29,7 @@ public enum Key { | |||
PATTERN_MODE("plugins.ppl.pattern.mode"), | |||
PATTERN_MAX_SAMPLE_COUNT("plugins.ppl.pattern.max.sample.count"), | |||
PATTERN_BUFFER_LIMIT("plugins.ppl.pattern.buffer.limit"), | |||
SPL_COMPATIBLE_GRAMMAR_ENABLED("plugins.ppl.spl_compatible.enabled"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any other options? such as plugins.ppl.splunk_compatible.enabled
to highlight splunk
instead of spl
SPL Compatible Syntax | ||
===================== | ||
| (Experimental) | ||
| (Since 3.2.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may catch up 3.1.0?
Description | ||
----------- | ||
|
||
This setting is present from 3.2.0. Enabling Calcite is a prerequisite. You can use this setting to decide whether to allow parsing a query of Splunk SPL compatible grammar. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is expected behavior of PPL commands/function if is not compatible with Splunk grammer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current implementation is SPL-compatible
, not SPL-only
.
For example:
When compatible config is true, user can run both rex
and parse
command. When compatible config is false, only parse
command works.
If we want SPL-only
, when SPL-only
config is true, parse
command won't work any more. Which one do you prefer?
| (prerequisite: plugins.ppl.spl_compatible.enabled=true) | ||
| join [type=<joinType>] [overwrite=<bool>] <join-field-list> <right-dataset> | ||
|
||
* type=<joinType>: optional. The type of join to perform. The default is ``INNER`` if not specified. Other option is ``LEFT``, ``RIGHT``, ``FULL``, ``CROSS``, ``SEMI``, ``ANTI``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could limit to inner | outer | left
, then add other join types with perf-test gradually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Em, I think we can add a config to enable other join type, and disable them by default, instead of removing from code.
| (Experimental) | ||
| (Since 3.2.0) | ||
| (prerequisite: plugins.ppl.spl_compatible.enabled=true) | ||
| join [type=<joinType>] [overwrite=<bool>] <join-field-list> <right-dataset> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change <right-dataset>
to <dataset> | <subsearch>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is optional, change to []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add [left=<left-alias>] [right=<right-alias>] where ...
in syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add
[left=<left-alias>] [right=<right-alias>] where ...
in syntax?
this is spl2 syntax, I don't think we need to support it in compatible mode.
[left=<left-alias>] [right=<right-alias>] on ...
is listed in default PPL syntax
|
||
* type=<joinType>: optional. The type of join to perform. The default is ``INNER`` if not specified. Other option is ``LEFT``, ``RIGHT``, ``FULL``, ``CROSS``, ``SEMI``, ``ANTI``. | ||
* overwrite=<bool>: optional. Specifies whether duplicate-named fields from <right-dataset> (subsearch results) should replace corresponding fields in the main search results. The default value is ``true``. | ||
* join-field-list: optional. The fields to use to build join criteria. The ``join-field-list`` must be present in both sides. If no <join-field-list> is present, all fields that are common to both sides are used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Polish join-filed-list description,
The fields used to build the join criteria. The join field list must exist on both sides. If no join field list is specified, all fields common to both sides will be used as join keys.
@@ -138,3 +178,5 @@ Assume table1 and table2 only contain field ``id``, following PPL queries and th | |||
- table1.id, t2.id, a | |||
* - source=table1 | join right=tt on table1.id=t2.id [ source=table2 as t2 | eval b = id ] | eval a = 1 | |||
- table1.id, tt.id, tt.b, a | |||
|
|||
For the Splunk SPL compatible syntax (since 3.2.0), duplicate-named fields in output results are deduplicated, with field retention determined by the value of 'overwrite' option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is expected behavior, not limitation, right?
@@ -315,7 +323,8 @@ tableSourceClause | |||
|
|||
// join | |||
joinCommand | |||
: (joinType) JOIN sideAlias joinHintList? joinCriteria? right = tableOrSubqueryClause | |||
: joinType JOIN sideAlias joinHintList? joinCriteria right = tableOrSubqueryClause | |||
| {SPL_compatible_grammar_enabled}? JOIN (joinOption)* (fieldList)? right = tableOrSubqueryClause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not support alias? left=l right=r
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allow options to appear before or after fieldList, e.g. ... | join id overwrite=true [search index=product]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not support alias?
left=l right=r
in sideAlias
allow options to appear before or after fieldList
Before only.
"Join type is ambiguous, please remove the join type before JOIN keyword or 'type='" | ||
+ " option."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in spl mode, ... | inner join xxx
with throw syntax exception, in which case user will see this error message,
Argument.ArgumentMap argumentMap = Argument.ArgumentMap.of(arguments); | ||
if (argumentMap.get("type") != null) { | ||
Join.JoinType joinTypeFromArgument = getJoinType(argumentMap.get("type").toString()); | ||
if (ctx.joinType() == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JoinType is not a valid token in SPL mode. The AST builder should be aware of SPL mode and handle it separately.
@@ -315,7 +323,8 @@ tableSourceClause | |||
|
|||
// join | |||
joinCommand | |||
: (joinType) JOIN sideAlias joinHintList? joinCriteria? right = tableOrSubqueryClause | |||
: joinType JOIN sideAlias joinHintList? joinCriteria right = tableOrSubqueryClause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should expose only one syntax at a time? Concern of enable both PPL and SPL-compatible syntax is
- Hard to document and maintain, tooling (like syntax highlighting, query editors) must also be mode-aware.
- Increases complexity of semantic validation and testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current design is when the config is set to true, both ppl and spl syntax are supported. For your concerns, let me check the challenge in implementation level next week. I don’t think the document has any blocker. In future we can add a legacy config to disable the current ppl syntax as long as the implementation level has no blocker. For join, the current ppl join is a mutation of join of spl2. We may still need to support since splunk can both support spl1 and spl2. For your second concern, as long as implementation part has no blocker, the validation should work well since validation is done by antlr4 itself.
Description
Support join field list and join options:
plugins.ppl.spl_compatible.enabled
{SPL_compatible_grammar_enabled}? JOIN (joinOption)* (fieldList)? right = tableOrSubqueryClause
SPL Compatible Syntax
(Experimental)
(Since 3.2.0)
(prerequisite: plugins.ppl.spl_compatible.enabled=true)
join [type=<joinType>] [overwrite=<bool>] <join-field-list> <right-dataset>
type=<joinType>
: optional. The type of join to perform. The default isINNER
if not specified. Other option isLEFT
,RIGHT
,FULL
,CROSS
,SEMI
,ANTI
.overwrite=<bool>
: optional. Specifies whether duplicate-named fields from <right-dataset> (subsearch results) should replace corresponding fields in the main search results. The default value istrue
.join-field-list
: optional. The fields to use to build join criteria. Thejoin-field-list
must be present in both sides. If no <join-field-list> is present, all fields that are common to both sides are used.right-dataset
: mandatory. Right dataset could be either an index or a subsearch with/without alias.Related Issues
Resolves #3775
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.