Migrate core test to insta, part1 #16324

Chen-Yuan-Lai · 2025-06-07T17:03:58Z

Which issue does this PR close?

Part of Migrate core tests to insta #15791 .

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Yes, I manually tested the before/after changes.

Are there any user-facing changes?

No

alamb · 2025-06-09T13:52:05Z

Thank you @Chen-Yuan-Lai

blaginin · 2025-06-09T20:11:36Z

Hey @Chen-Yuan-Lai!

In the PR title you have

part1

but in the PR body you write

Closes #15791

Just FYI, if you expect more work on this issue, you may want to change "closes" to "part of", because otherwise GitHub will actually close the issue once this PR is merged.

blaginin

Thank you! I wrote some comments, please let me know what you think :)

blaginin · 2025-06-09T20:24:43Z

datafusion/core/tests/optimizer/mod.rs

    let plan = test_sql(sql).unwrap();
-    assert_eq!(expected_plan, format!("{plan}"));
+    assert_snapshot!(
+        format!("{plan}"),


assert_snapshot triggers format internally so you can just do

Suggested change

format!("{plan}"),

plan,

Thanks for the comment. The format! is indeed unnecessary in these cases.
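A std-only sketch of why the extra format! call is redundant, assuming (as noted above) that the macro formats its argument via Display. `Plan` and `render` are hypothetical stand-ins for illustration, not insta or DataFusion APIs:

```rust
use std::fmt;

// Hypothetical stand-in for a logical plan; only its Display impl matters here.
struct Plan(&'static str);

impl fmt::Display for Plan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

// Hypothetical helper that mimics a macro formatting its argument with `{}`.
fn render<T: fmt::Display>(value: T) -> String {
    format!("{value}")
}
```

Both `render(Plan("Projection: a"))` and `render(format!("{}", Plan("Projection: a")))` produce the same string, so passing the plan directly loses nothing.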

blaginin · 2025-06-09T20:27:12Z

datafusion/core/tests/sql/explain_analyze.rs

+        .collect::<Vec<_>>()
+        .join("\n");
+    insta::with_settings!({filters => vec![
+    (r"\d+\.?\d*[µmn]?s", "[TIME]"),


[µmn] - I think this can be somewhat flaky if the test takes more time - maybe match on []?

Agreed, the current pattern can't match longer time units (e.g. min, h, d). However, a broader pattern might match irrelevant strings. For example, matching on [ ] would turn metrics=[output_rows=1, elapsed_compute=110.947µs] into metrics=[TIME].

I'm currently considering adding more time units to the regex pattern:

r"\d+\.?\d*(?:µs|ms|ns|s|min|h)\b"

Or do you have any suggestions for a more precise way to match time-related strings? Many thanks.

Maybe just for this test case we could avoid using insta and use the previous approach of substring matches.

While that approach is also non-ideal, it has seemed to work thus far.

Happy with keeping as is, but on this

might match irrelevant strings

you can probably use lookahead https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion

you can probably use lookahead https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion

Unfortunately, Rust's regex crate does not support lookahead or lookbehind assertions 😢 ( ref )
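Without lookahead, one way to illustrate the stricter matching is a hand-written scanner. The sketch below is a std-only approximation of the proposed pattern \d+\.?\d*(?:µs|ms|ns|s|min|h)\b. It is not the regex crate and not the code in this PR; `redact_durations` and its helpers are hypothetical names, and digits are limited to ASCII.

```rust
// Units accepted by the sketch, mirroring the alternation proposed in the thread.
const UNITS: [&str; 6] = ["µs", "ms", "ns", "s", "min", "h"];

// Approximates the regex notion of a word character for the trailing `\b` check.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

// Returns the index just past a duration that starts at `start`, if there is one.
fn match_duration(chars: &[char], start: usize) -> Option<usize> {
    let mut i = start;
    // \d+
    while i < chars.len() && chars[i].is_ascii_digit() {
        i += 1;
    }
    if i == start {
        return None;
    }
    // \.?\d*
    if i < chars.len() && chars[i] == '.' {
        i += 1;
        while i < chars.len() && chars[i].is_ascii_digit() {
            i += 1;
        }
    }
    // (?:µs|ms|ns|s|min|h)\b : the unit must not be followed by a word character.
    for unit in UNITS.iter() {
        let unit_chars: Vec<char> = unit.chars().collect();
        let end = i + unit_chars.len();
        if end <= chars.len()
            && chars[i..end] == unit_chars[..]
            && (end == chars.len() || !is_word_char(chars[end]))
        {
            return Some(end);
        }
    }
    None
}

// Replaces every duration-looking token with the placeholder "[TIME]".
fn redact_durations(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    let mut i = 0;
    while i < chars.len() {
        match match_duration(&chars, i) {
            Some(end) => {
                out.push_str("[TIME]");
                i = end;
            }
            None => {
                out.push(chars[i]);
                i += 1;
            }
        }
    }
    out
}
```

With this, metrics=[output_rows=1, elapsed_compute=110.947µs] becomes metrics=[output_rows=1, elapsed_compute=[TIME]], while a bare count such as output_rows=1 is left alone because no unit follows the digits.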

blaginin · 2025-06-09T20:35:45Z

datafusion/core/tests/sql/explain_analyze.rs

+        .map(|line| re.replace_all(line, "$1").trim_end().to_string())
+        .filter(|line| !line.is_empty() && !line.starts_with('+'))
+        .collect::<Vec<_>>()
+        .join("\n");


What do you think about just asserting the table itself without modifications? I fear that this code will:
a - make the test a bit more complicated
b - need to be manually maintained when we change the formatting

I tried asserting the raw table, but found that it produced inconsistent trailing whitespace across snapshot test runs. That is why I trimmed the table to keep only the plan content.
Do you have any ideas for keeping the table format consistent across executions? Thanks.

Maybe we can just keep using assert_metrics for this style of test (or perhaps revert the changes from this PR and work on migrating the metrics tests to a nicer style in a separate PR, so we don't delay merging the other parts of this PR).

Sure, I have reverted the change in this PR.
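For context, the kind of trimming discussed in this thread can be sketched with std only. `strip_table_frame` is a hypothetical helper, not the code from the PR (which used a regex replacement); it drops `+---+` border lines, removes the `|` frame, and trims trailing whitespace so that padding differences do not reach the snapshot.

```rust
// Keeps only the cell text of a single-column ASCII table.
fn strip_table_frame(table: &str) -> String {
    table
        .lines()
        .map(str::trim_end)
        // Skip empty lines and horizontal borders such as "+------+".
        .filter(|line| !line.is_empty() && !line.starts_with('+'))
        .map(|line| {
            // Remove the left and right frame characters when present.
            let line = line.strip_prefix('|').unwrap_or(line);
            let line = line.strip_suffix('|').unwrap_or(line);
            // Drop the single padding space after the left border,
            // but keep any further indentation of the plan itself.
            line.strip_prefix(' ').unwrap_or(line).trim_end().to_string()
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

This also shows the maintenance concern raised above: the helper encodes assumptions about the table layout and would need updating whenever that formatting changes.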

blaginin · 2025-06-09T20:39:24Z

datafusion/core/tests/sql/explain_analyze.rs

@@ -797,14 +796,16 @@ async fn explain_physical_plan_only() {
    let sql = "EXPLAIN select count(*) from (values ('a', 1, 100), ('a', 2, 150)) as t (c1,c2,c3)";
    let actual = execute(&ctx, sql).await;
    let actual = normalize_vec_for_explain(actual);


feels like these can also be insta redactions?

After searching for ExplainNormalizer in the codebase, I found that only four test cases use it:

1. test_physical_plan_display_indent (initializes ExplainNormalizer)

2. test_physical_plan_display_indent_multi_children (initializes ExplainNormalizer)

3. explain_logical_plan_only (calls normalize_vec_for_explain)

4. explain_physical_plan_only (calls normalize_vec_for_explain)

Cases 1 and 2 create the physical plan with a fixed number of cores (90000), and cases 3 and 4 don't have the partitioning=RoundRobinBatch() string in their snapshots, so I think this change may not be necessary?

blaginin · 2025-06-09T20:42:10Z

Also could you please resolve the conflicts

Chen-Yuan-Lai · 2025-06-10T02:09:08Z

Just FYI, if you expect more work on this issue, you may want to change "closes" to "part of", because otherwise GitHub will actually close the issue once this PR is merged.

Oh! Sorry, I have corrected the PR body

github-actions bot added the core Core DataFusion crate label Jun 7, 2025

alamb requested a review from blaginin June 9, 2025 13:51

blaginin reviewed Jun 9, 2025

Chen-Yuan-Lai force-pushed the migrate_core_test_to_insta branch from c3864e3 to 9944c9f Compare June 10, 2025 02:05

Chen-Yuan-Lai force-pushed the migrate_core_test_to_insta branch from f787d3a to a43a6a3 Compare June 14, 2025 09:33

Cheng-Yuan-Lai and others added 6 commits June 14, 2025 23:52

feat: migrate core test into insta

a88b86d

feat: enhance tests with snapshot assertions and remove unused macros

0f719c2

feat: rewrite snapshot test in explain_analyze_baseline_metrics

c8dd007

fix: update snapshot test to normalize file paths in explain_analyze_baseline_metrics

4a6ac4a

refactor: simplify snapshot assertions by removing format calls in optimizer tests

e95ed80

feat: revert the original usage of assert_metrics macro in explain_analyze_baseline_metrics

c981e70

Chen-Yuan-Lai force-pushed the migrate_core_test_to_insta branch from a43a6a3 to c981e70 Compare June 14, 2025 15:53