Skip to content

Migrate core test to insta, part1 #16324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Chen-Yuan-Lai
Copy link
Contributor

@Chen-Yuan-Lai Chen-Yuan-Lai commented Jun 7, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Yes, I manually tested the before/after changes.

Are there any user-facing changes?

No

@github-actions github-actions bot added the core Core DataFusion crate label Jun 7, 2025
@alamb alamb requested a review from blaginin June 9, 2025 13:51
@alamb
Copy link
Contributor

alamb commented Jun 9, 2025

Thank you @Chen-Yuan-Lai

@blaginin
Copy link
Contributor

blaginin commented Jun 9, 2025

Hey @Chen-Yuan-Lai!

In the PR title you have

part1

but in the PR body you write

Closes #15791

Just FYI, if you expect more work on this issue, you may want to change "closes" to "part of" - because otherwise github will actually close the issue once this PR is merged

Copy link
Contributor

@blaginin blaginin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I wrote some comments, please let me know what you think :)

let plan = test_sql(sql).unwrap();
assert_eq!(expected_plan, format!("{plan}"));
assert_snapshot!(
format!("{plan}"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_snapshot triggers format internally so you can just do

Suggested change
format!("{plan}"),
plan,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same above

Copy link
Contributor Author

@Chen-Yuan-Lai Chen-Yuan-Lai Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment, the format! is exactly unnecessary in these cases

.collect::<Vec<_>>()
.join("\n");
insta::with_settings!({filters => vec![
(r"\d+\.?\d*[µmn]?s", "[TIME]"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[µmn] - I think this can be somewhat flaky it the test takes more time - maybe match on []?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, current pattern can't match the longer time (ex. min, h, d). However, a broader pattern might match irrelevant strings. For example, match on [ ]: metrics=[output_rows=1, elapsed_compute=110.947µs] -> metrics=[TIME]

I'm currently considering to add more time units in the regex pattern:

r"\d+\.?\d*(?:µs|ms|ns|s|min|h)\b"

Or, do you have any suggestions for a more precise way to match time-related strings? many thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just for this teset case we could avoid using insta and use the previous approach of substring matches

While that approach is also non ideal, it has seemed to work this far

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with keeping as is, but on this

might match irrelevant strings

you can probably use lookahead https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion

Copy link
Contributor Author

@Chen-Yuan-Lai Chen-Yuan-Lai Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can probably use lookahead https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion

Unfortunately, Rust's regex crate does not support lookahead or lookbehind assertions 😢 ( ref )

Comment on lines 63 to 66
.map(|line| re.replace_all(line, "$1").trim_end().to_string())
.filter(|line| !line.is_empty() && !line.starts_with('+'))
.collect::<Vec<_>>()
.join("\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about just asserting the table itself without modifications? I fear that this code will:
a - make test a bit more complicated
b - needs to be manually maintained when we change the formatting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to assert the raw table, but I found that it caused inconsistent trailing whitespace in each snapshot testing. That is why I trimmed the table to only keep the plan context.
Do you have any idea for keeping the consistent table format in each execution? thx

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can just keep using assert_metrics for this style test (or perhaps revert the changes from this PR and work on migrating the metrics tests to a nicer style in a separate PR so we don't delay merging the other parts of this PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I have revert the change in this PR

@@ -797,14 +796,16 @@ async fn explain_physical_plan_only() {
let sql = "EXPLAIN select count(*) from (values ('a', 1, 100), ('a', 2, 150)) as t (c1,c2,c3)";
let actual = execute(&ctx, sql).await;
let actual = normalize_vec_for_explain(actual);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

feels like these can also be insta redactions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After searching ExplainNormalizer in the codebase, I found that only four test cases used it:

  1. test_physical_plan_display_indent (initialize ExplainNormalizer)
  2. test_physical_plan_display_indent_multi_children (initialize ExplainNormalizer)
  3. explain_logical_plan_only (call normalize_vec_for_explain)
  4. explain_physical_plan_only (call normalize_vec_for_explain)

1.2. created the physical plan with fixed core numbers (90000), and 3. 4. doesn't have partitioning=RoundRobinBatch() string in the snapshot, so I think this change may not be necessary?

@blaginin
Copy link
Contributor

blaginin commented Jun 9, 2025

Also could you please resolve the conflicts

@Chen-Yuan-Lai Chen-Yuan-Lai force-pushed the migrate_core_test_to_insta branch from c3864e3 to 9944c9f Compare June 10, 2025 02:05
@Chen-Yuan-Lai
Copy link
Contributor Author

Just FYI, if you expect more work on this issue, you may want to change "closes" to "part of" - because otherwise github will actually close the issue once this PR is merged

Oh! Sorry, I have corrected the PR body

@Chen-Yuan-Lai Chen-Yuan-Lai force-pushed the migrate_core_test_to_insta branch from f787d3a to a43a6a3 Compare June 14, 2025 09:33
@Chen-Yuan-Lai Chen-Yuan-Lai force-pushed the migrate_core_test_to_insta branch from a43a6a3 to c981e70 Compare June 14, 2025 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants