[opt](nereids) support extract join multiple tables #51569

Merged

yujun777 (Contributor) commented Jun 8, 2025

What problem does this PR solve?

The rule ExtractSingleTableExpressionFromDisjunction extracts, for LogicalFilter and LogicalJoin, an expression on each single table from a disjunctive predicate. This is not enough for joins: a join should also support extracting expressions over multiple tables, namely one expression over the left child's tables and one over the right child's tables.

For example:

(t1 join t2) join (t3 join t4)
on (t1.a + t2.a = 1 and t3.x + t4.x = 1) or (t1.a + t2.a = 2 and t3.x + t4.x = 2)

Here no single-table expression can be extracted for t1, t2, t3, or t4, because every predicate references two tables.
However, the root join's left child is (t1 join t2), so we can extract the left-child expression t1.a + t2.a = 1 or t1.a + t2.a = 2.
Likewise, for the right child (t3 join t4), we can extract the right-child expression t3.x + t4.x = 1 or t3.x + t4.x = 2.

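Each extracted expression is implied by the original condition (every OR branch contains one of its disjuncts), so we can then rewrite the root join's condition as

origin_condition AND (t1.a + t2.a = 1 or t1.a + t2.a = 2) AND (t3.x + t4.x = 1 or t3.x + t4.x = 2)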
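The extraction and rewrite steps can be sketched as follows. This is a minimal Python illustration, not the Doris (Java) implementation: it assumes a hypothetical representation in which the join condition is already in OR-of-ANDs form and each conjunct is tagged with the set of tables it references. The names `extract_for_tables` and `rewrite_join_condition` are made up for this sketch.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical representation (not Doris's): a conjunct is its SQL text plus the
# set of tables it references; a condition is a list of OR branches, each branch
# being a list of AND-ed conjuncts.
Conjunct = Tuple[str, FrozenSet[str]]
Branch = List[Conjunct]


def extract_for_tables(branches: List[Branch], tables: FrozenSet[str]) -> Optional[str]:
    """Return a predicate over `tables` implied by the OR of `branches`, or None."""
    if not branches:
        return None
    disjuncts = []
    for branch in branches:
        # Keep only the conjuncts whose referenced tables all belong to `tables`.
        local = [text for text, refs in branch if refs <= tables]
        if not local:
            # This branch does not constrain `tables`, so nothing can be extracted.
            return None
        joined = " and ".join(local)
        disjuncts.append(joined if len(local) == 1 else f"({joined})")
    return " or ".join(disjuncts)


def rewrite_join_condition(origin: str, branches: List[Branch],
                           left: FrozenSet[str], right: FrozenSet[str]) -> str:
    """AND the extracted left-side and right-side predicates onto the original condition."""
    parts = [origin]
    for side in (left, right):
        extracted = extract_for_tables(branches, side)
        if extracted is not None:
            parts.append(f"({extracted})")
    return " AND ".join(parts)


# The example from the PR description.
T12 = frozenset({"t1", "t2"})
T34 = frozenset({"t3", "t4"})
condition = [
    [("t1.a + t2.a = 1", T12), ("t3.x + t4.x = 1", T34)],
    [("t1.a + t2.a = 2", T12), ("t3.x + t4.x = 2", T34)],
]

print(extract_for_tables(condition, frozenset({"t1"})))  # None: no single-table conjunct
print(rewrite_join_condition("origin_condition", condition, T12, T34))
```

Each extracted predicate is only a weaker copy of the original condition, so the original stays on the join; the benefit comes from later pushing the weaker copy down to one child.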

Later, the left-child expression can be pushed down into the left child plan, and the right-child expression into the right child plan.
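The push-down step can be sketched by classifying each top-level conjunct of the rewritten condition by the tables it references. This Python sketch uses the same kind of hypothetical (text, referenced-tables) representation and a made-up function name, `split_for_pushdown`. It assumes an inner join: for outer joins the join type limits which side may receive a pushed predicate, and this sketch does not model that.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: (SQL text, set of referenced tables).
Conjunct = Tuple[str, FrozenSet[str]]


def split_for_pushdown(conjuncts: List[Conjunct], left: FrozenSet[str],
                       right: FrozenSet[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split an inner join's AND-ed conjuncts into (push to left, push to right, keep on join)."""
    to_left, to_right, keep = [], [], []
    for text, refs in conjuncts:
        if refs <= left:
            to_left.append(text)
        elif refs <= right:
            to_right.append(text)
        else:
            # Not confined to one side, so it stays on the join.
            keep.append(text)
    return to_left, to_right, keep


T12 = frozenset({"t1", "t2"})
T34 = frozenset({"t3", "t4"})
rewritten = [
    ("origin_condition", T12 | T34),
    ("t1.a + t2.a = 1 or t1.a + t2.a = 2", T12),
    ("t3.x + t4.x = 1 or t3.x + t4.x = 2", T34),
]
left_preds, right_preds, join_preds = split_for_pushdown(rewritten, T12, T34)
print(left_preds)   # ['t1.a + t2.a = 1 or t1.a + t2.a = 2']
print(right_preds)  # ['t3.x + t4.x = 1 or t3.x + t4.x = 2']
print(join_preds)   # ['origin_condition']
```

Because origin_condition references both sides it stays on the join, while each extracted predicate lands on exactly one child.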

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Thearas (Contributor) commented Jun 8, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

yujun777 (Contributor, Author) commented Jun 8, 2025

run buildall

doris-robot

TPC-H: Total hot run time: 33629 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b9bbe0bff19da8fda89f3cb93719f7fe8740e0ff, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	25907	5151	5052	5052
q2	1963	275	181	181
q3	10287	1243	690	690
q4	10227	1012	523	523
q5	7520	2381	2268	2268
q6	180	165	136	136
q7	922	727	603	603
q8	9298	1376	1046	1046
q9	6890	5196	5026	5026
q10	6870	2329	1916	1916
q11	494	283	268	268
q12	340	349	221	221
q13	17992	3834	3119	3119
q14	232	235	211	211
q15	574	493	489	489
q16	417	432	371	371
q17	580	843	356	356
q18	7583	7167	7088	7088
q19	1797	950	539	539
q20	327	330	219	219
q21	3677	3204	2342	2342
q22	1068	1009	965	965
Total cold run time: 115145 ms
Total hot run time: 33629 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	5140	5046	5077	5046
q2	240	319	226	226
q3	2160	2651	2353	2353
q4	1413	1824	1398	1398
q5	4417	4346	4354	4346
q6	221	172	133	133
q7	2027	1922	1750	1750
q8	2573	2641	2535	2535
q9	7174	7214	7078	7078
q10	2976	3181	2748	2748
q11	578	515	500	500
q12	669	760	621	621
q13	3564	3883	3361	3361
q14	300	327	289	289
q15	521	496	458	458
q16	455	487	456	456
q17	1139	1529	1394	1394
q18	7752	7511	7274	7274
q19	796	813	911	813
q20	2009	1989	1835	1835
q21	4754	4530	4418	4418
q22	1065	1053	1007	1007
Total cold run time: 51943 ms
Total hot run time: 50039 ms

doris-robot

TPC-DS: Total hot run time: 193441 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b9bbe0bff19da8fda89f3cb93719f7fe8740e0ff, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	1453	1085	1072	1072
query2	6322	1844	1883	1844
query3	11034	4476	4243	4243
query4	52852	24364	23460	23460
query5	5218	521	469	469
query6	367	219	198	198
query7	5030	518	289	289
query8	307	226	228	226
query9	6128	2637	2640	2637
query10	464	333	291	291
query11	15122	15096	14888	14888
query12	162	109	106	106
query13	1142	530	426	426
query14	10191	6526	6560	6526
query15	206	200	190	190
query16	7004	663	496	496
query17	1092	736	603	603
query18	1539	423	318	318
query19	211	200	188	188
query20	142	133	121	121
query21	211	126	107	107
query22	4259	4353	4453	4353
query23	34527	33648	33471	33471
query24	6569	2389	2415	2389
query25	463	472	415	415
query26	726	275	156	156
query27	2431	505	342	342
query28	3096	2167	2144	2144
query29	594	571	442	442
query30	270	219	188	188
query31	843	833	784	784
query32	73	64	63	63
query33	461	378	313	313
query34	801	884	549	549
query35	800	863	753	753
query36	937	989	920	920
query37	112	100	78	78
query38	4231	4279	4214	4214
query39	1518	1478	1439	1439
query40	215	125	113	113
query41	90	58	61	58
query42	135	128	119	119
query43	522	536	486	486
query44	1401	889	884	884
query45	187	180	172	172
query46	898	1034	684	684
query47	1844	1882	1781	1781
query48	394	428	346	346
query49	709	510	408	408
query50	664	699	432	432
query51	4278	4272	4230	4230
query52	117	113	117	113
query53	237	270	185	185
query54	591	576	528	528
query55	86	85	81	81
query56	317	344	292	292
query57	1131	1199	1113	1113
query58	271	272	258	258
query59	2728	2781	2702	2702
query60	336	326	327	326
query61	127	126	126	126
query62	750	741	689	689
query63	236	204	201	201
query64	1981	1006	672	672
query65	4235	4199	4178	4178
query66	757	405	311	311
query67	15808	15640	15493	15493
query68	7208	892	527	527
query69	554	312	275	275
query70	1236	1161	1096	1096
query71	512	320	309	309
query72	6033	4872	5195	4872
query73	1341	708	359	359
query74	9236	9350	8986	8986
query75	3865	3189	2718	2718
query76	4354	1200	759	759
query77	618	381	339	339
query78	10037	10136	9366	9366
query79	4460	813	572	572
query80	619	519	427	427
query81	485	252	211	211
query82	514	128	101	101
query83	341	247	239	239
query84	292	108	97	97
query85	823	350	308	308
query86	373	291	268	268
query87	4462	4403	4358	4358
query88	3469	2269	2269	2269
query89	436	313	285	285
query90	1956	206	205	205
query91	138	137	114	114
query92	78	60	58	58
query93	2473	938	593	593
query94	671	399	304	304
query95	366	293	289	289
query96	489	568	282	282
query97	2731	2770	2667	2667
query98	250	214	213	213
query99	1429	1393	1283	1283
Total cold run time: 301802 ms
Total hot run time: 193441 ms

doris-robot

ClickBench: Total hot run time: 28.55 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b9bbe0bff19da8fda89f3cb93719f7fe8740e0ff, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.03
query2	0.12	0.11	0.11
query3	0.25	0.19	0.19
query4	1.59	0.19	0.18
query5	0.50	0.44	0.44
query6	1.18	0.67	0.64
query7	0.03	0.01	0.02
query8	0.05	0.03	0.03
query9	0.62	0.51	0.52
query10	0.58	0.59	0.57
query11	0.15	0.11	0.11
query12	0.15	0.12	0.11
query13	0.61	0.60	0.60
query14	0.80	0.81	0.82
query15	0.88	0.85	0.86
query16	0.37	0.38	0.38
query17	1.06	1.05	1.06
query18	0.22	0.20	0.21
query19	1.88	1.83	1.82
query20	0.01	0.01	0.01
query21	15.40	0.89	0.55
query22	0.73	1.15	0.62
query23	14.99	1.34	0.61
query24	6.83	1.53	0.31
query25	0.29	0.18	0.05
query26	0.62	0.16	0.13
query27	0.06	0.05	0.04
query28	9.11	0.96	0.45
query29	12.55	4.15	3.41
query30	0.25	0.09	0.07
query31	2.83	0.60	0.39
query32	3.23	0.56	0.47
query33	3.03	3.15	3.06
query34	15.81	5.10	4.50
query35	4.55	4.53	4.47
query36	0.66	0.49	0.47
query37	0.08	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.03
query40	0.16	0.14	0.13
query41	0.08	0.02	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.02
Total cold run time: 102.5 s
Total hot run time: 28.55 s

xiedeyantu (Member)

I'd like to ask a question. With single-table expression extraction, the extracted predicates can be pushed down through joins layer by layer. Multi-table extraction, however, targets only the left and right children of a join. If those children in turn qualify for expression extraction (single-table or multi-table), is that currently supported?

github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Jun 10, 2025

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

englefly merged commit 6202001 into apache:master Jun 11, 2025
28 of 29 checks passed
Labels: approved, reviewed

6 participants