Initial experiments with serialized testing #593

ratmice · 2025-06-13T20:05:10Z

This is mostly just for collecting feedback, as this patch is fairly incomplete.
It implements the serialization stuff, and it adds a -s switch to nimbleparse to output the serialized format.

It also adds some design docs, which give a basic overview of what I intend to do. These are necessarily incomplete,
and at times speculative, based upon how I hope or imagine it might be made to work.

I don't think it is too far from actually deserializing tests and doing the pass/fail checking, but before going through the trouble of implementing that, it seemed worth doing a fairly complete proof of concept of the serialization format stuff.

The basic goals here have been to keep the syntax fairly noise free while being well typed
(i.e. instead of dynamically distinguishing between single tests and vectors of tests, a separate file extension is used for the vector of tests),
and to work well with both raw strings and escaped strings, for grammars containing quotation marks, newlines, raw strings, etc.

While doing this, I noted that test_files currently only allows a single glob, but when we start dealing with multiple file extensions, we probably are going to need a way to specify multiple globs.

ltratt · 2025-06-13T22:20:54Z

Let me start with a dumb question: I think what this PR is aiming to do is a way of saying "for a grammar G and an input I, let me write a test for it". And, I think, furthermore it's trying to allow users to say "I'm only going to write a test for a subset of I w.r.t. G"?

ratmice · 2025-06-14T04:05:10Z

I'm not certain I understand the second part, specifically w.r.t. subset, but I believe the answer is yes.
Edit: (unless you mean subset with regard to the recent clone_and_change_start_rule function, in which case the answer is no: this doesn't attempt to do anything with passing input to subsets of grammars with regard to start rules.
I don't think that is what you meant, because you say subset of I rather than subset of G.)

The overall testing process is this:

1. Build an RTParserBuilder from the parser.
2. Read the .grmtest or .grmtests file.
3. For each GrammarTest, pass the input field to the RTParserBuilder from step 1.
4. Compare the output of parse_map (building an ASTRepr) to the fields of the GrammarTest deserialized in step 2, using the following:
   - If GrammarTest.pass is true and GrammarTest.ast.is_none(), check that the returned errors.is_empty().
   - If GrammarTest.pass is false and GrammarTest.errors.is_none(), check that the returned errors are non-empty.
   - If GrammarTest.pass is true and GrammarTest.ast.is_some(), check that the ASTRepr built by parse_map matches GrammarTest.ast.
   - If GrammarTest.pass is false and GrammarTest.errors.is_some(), check that the errors returned by parse_map match GrammarTest.errors.

So essentially there are 4 kinds of tests supported: it should parse successfully; it should fail to parse; more specifically, it should parse successfully and produce this exact AST; and it should fail to parse and produce this exact error. One of these tests gets run for each input.

pass defaults to true when only (input: "text") is specified.
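The four kinds of test can be sketched as a single check function. This is only an illustration under assumptions: `AstRepr`, `ParseOutcome`, and `check` are hypothetical stand-ins rather than the types in the patch, and a failing test with no listed errors is taken to mean "any error will do".

```rust
/// Hypothetical stand-in for the AST representation used by the patch.
#[derive(Debug, Clone, PartialEq)]
pub struct AstRepr(pub String);

/// Mirrors the optional-field layout discussed above (not the real definition).
#[derive(Debug, Clone, Default)]
pub struct GrammarTest {
    pub input: String,
    pub pass: Option<bool>,
    pub ast: Option<AstRepr>,
    pub errors: Option<Vec<String>>,
}

/// Hypothetical summary of what one parse run produced.
pub struct ParseOutcome {
    pub ast: Option<AstRepr>,
    pub errors: Vec<String>,
}

/// Returns true when the parse outcome satisfies the test.
pub fn check(test: &GrammarTest, got: &ParseOutcome) -> bool {
    // `pass` defaults to true when only `input` is given.
    let pass = test.pass.unwrap_or(true);
    match (pass, &test.ast, &test.errors) {
        // Should parse successfully, AST unspecified.
        (true, None, _) => got.errors.is_empty(),
        // Should parse successfully and produce exactly this AST.
        (true, Some(ast), _) => got.errors.is_empty() && got.ast.as_ref() == Some(ast),
        // Should fail to parse, errors unspecified (assumption: any error).
        (false, _, None) => !got.errors.is_empty(),
        // Should fail to parse with exactly these errors.
        (false, _, Some(errs)) => &got.errors == errs,
    }
}
```

Note that the `(true, _, _)` arms silently ignore any `errors` field, which is one symptom of this layout admitting more combinations than there are kinds of test.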

I guess I would say this more formally as:

"for a grammar G, an input I, and an expected result E, the test is the comparison (G(I) == E) "

ratmice · 2025-06-14T06:11:43Z

grammar_testing/src/lib.rs

+    #[serde(default)]
+    pass: Option<bool>,
+    #[serde(default)]
+    errors: Option<Vec<String>>,


FWIW, I'm not really a big fan of this structure layout.
The fact that you can specify all of ast, errors, and pass leads to nonsensical values like:

{ pass: true, errors: ["some error"], }

The layout was largely driven by the serialized input, I think ideally this would be some kind of enum

    struct GrammarTest {
        input: String,
        expected_output: Option<GrammarTestOutput>,
    }

    enum GrammarTestOutput {
        PassFail { pass: bool },
        AST { ast: ASTRepr },
        Errors { errors: Vec<String> },
    }

I will see if I can get the same, or an effectively similar, serialized input to map to that changed structure,
perhaps using the unwrap_variant_newtypes extension. It doesn't sound like it will work, but it seems worth playing around with.

I messed around with an alternative structure, using the unwrap_variant_newtypes extension:

    #[derive(Deserialize, Serialize, PartialEq, Debug, Clone)]
    #[serde(deny_unknown_fields)]
    pub struct GrammarTest {
        input: String,
        /// A `None` value indicates expecting input to successfully parse.
        expect: Option<TestOutput>,
    }

    #[derive(Deserialize, Serialize, PartialEq, Debug, Clone)]
    pub enum TestOutput {
        Fail,
        AST(ASTRepr),
        Errors(Vec<String>),
    }

    /// Not real parsing output, just testing round tripping of the struct serialization.
    [
        (input: "b"),
        (input: "a", expect: AST("start", [("character", "a")])),
        (input: "abc", expect: Fail),
        (input: "abc", expect: Errors(["some error"])),
    ]

Even though it is a bit wordier, it feels like it might be an improvement, and it isn't that much wordier?

Edit: One other thing worth considering is only enabling the extensions by default for serialization,
which would then require the input to start with attributes enabling them before they take effect during deserialization. I don't really have a strong preference about that.

    #![enable(implicit_some)]
    #![enable(unwrap_variant_newtypes)]

AFAICS you only need an enum here? So I think one would have:

    enum Test {
        TestError { input: String, errors: Vec<String> },
        TestFail { input: String },
        TestSuccess { input: String, ast: Option<...> },
    }

One could split TestSuccess into two. I also wonder if Fail and Error even need splitting. They could probably be TestError { input: String, errors: Option<Vec<String>> }?

Perhaps, I'll give that a try.

9e2f213 is based off that idea. It does combine TestError and TestFail into just one, though now that I think about it, I kind of wonder if TestFail{input, errors} isn't preferable now that errors is optional.
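A minimal sketch of how checking might look with the two-variant enum follows. It assumes a hypothetical `AstRepr` stand-in and models the parse result as a plain `Result`, which is a simplification (a real parse with error recovery could yield both a tree and errors); `accepts` is an illustrative name, not something in the commit.

```rust
/// Hypothetical stand-in for the AST representation.
#[derive(Debug, Clone, PartialEq)]
pub struct AstRepr(pub String);

/// Two-variant layout in the spirit of the suggestion above.
#[derive(Debug, Clone, PartialEq)]
pub enum Test {
    TestSuccess { input: String, ast: Option<AstRepr> },
    TestError { input: String, errors: Option<Vec<String>> },
}

impl Test {
    /// The input text, whichever variant this is.
    pub fn input(&self) -> &str {
        match self {
            Test::TestSuccess { input, .. } | Test::TestError { input, .. } => input.as_str(),
        }
    }

    /// Does a (simplified) parse result satisfy this test?
    pub fn accepts(&self, result: &Result<AstRepr, Vec<String>>) -> bool {
        match (self, result) {
            // Success expected, AST left unspecified.
            (Test::TestSuccess { ast: None, .. }, Ok(_)) => true,
            // Success expected with an exact AST.
            (Test::TestSuccess { ast: Some(want), .. }, Ok(got)) => want == got,
            // Failure expected, errors left unspecified.
            (Test::TestError { errors: None, .. }, Err(_)) => true,
            // Failure expected with exact errors.
            (Test::TestError { errors: Some(want), .. }, Err(got)) => want == got,
            // Expected success but got failure, or the reverse.
            _ => false,
        }
    }
}
```

With this shape the nonsensical "pass with errors" combination cannot be written down at all, at the cost of always naming the variant.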

I guess the thing to note is that this always requires the outer enum variant. Where previously we could just say
(input: "..."), now we must say either TestSuccess(input: "...") or TestError(input: "...").

Right, so matching against serde probably isn't a good idea. Can we define our own, stable, output perhaps? Could it be as simple as the indented "sideways tree!" output that nimbleparse gives?

Yeah, I agree that serde doesn't really work here, for either string matching or tree matching :(

My fear with string matching is that (it seems to me) we are likely to end up with multiple levels of escaping required. For instance, when encoding into a string we'll have nested quotation marks (avoidable with raw strings), but the wildcards for the fuzzy matching will also need escaping when they can be present in the output.

I was really trying to/hoping to avoid that kind of escaping, so that the stuff embedded within the tests was representative of the stuff being matched.
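To make the escaping concern concrete, here is a small sketch of string-based fuzzy matching. Everything in it is an assumption for illustration: `wild_match` is a hypothetical helper, `*` is picked arbitrarily as the wildcard, and backslash as the escape character.

```rust
/// One element of a parsed pattern.
enum Tok {
    /// An unescaped `*`: matches any run of characters, including none.
    Any,
    /// A literal character (possibly produced by a backslash escape).
    Lit(char),
}

/// Recursive matcher over the tokenized pattern.
fn go(toks: &[Tok], text: &[char]) -> bool {
    match toks.split_first() {
        None => text.is_empty(),
        // Try every possible length for the wildcard.
        Some((Tok::Any, rest)) => (0..=text.len()).any(|i| go(rest, &text[i..])),
        Some((Tok::Lit(c), rest)) => text.first() == Some(c) && go(rest, &text[1..]),
    }
}

/// Hypothetical string matcher: `*` is a wildcard, `\*` is a literal star.
pub fn wild_match(pattern: &str, text: &str) -> bool {
    let mut toks = Vec::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            // A backslash makes the next character literal.
            '\\' => toks.push(Tok::Lit(chars.next().unwrap_or('\\'))),
            '*' => toks.push(Tok::Any),
            other => toks.push(Tok::Lit(other)),
        }
    }
    let text: Vec<char> = text.chars().collect();
    go(&toks, &text)
}
```

If the grammar's own output contains a star, for example an expected string of `a * b`, then writing it unescaped also accepts `a + b`. The test author has to write `a \* b` instead, so the expected text no longer looks like the text being matched.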

We have managed to get rid of escaping quotes with raw strings, but I'm not aware of anything analogous for wildcard matching. So one of the benefits I perceive of tree matching is that the wildcards are embedded as e.g. _ nodes within the tree, rather than embedded as characters within the string, and that kind of matching can be done without escaping data within actual tree nodes.

I guess that's where most of my hesitance towards string matching lies: we can't really just pick non-conflicting characters to use for wildcards, since the grammars are arbitrary. So I would like to play with tree matching.

Another minor but interesting thought is diffing. The plan had always been that, eventually, instead of saying AST(x) != AST(y) for a random, potentially long AST, we could try to produce a nice diff of the ASTs. We can diff trees and text when there are no wildcards or fuzzy matching, but I've never really thought about diffing fuzzy matches.

Maybe that is just a case where you only produce diffs in the case where the "expected output" has no wildcards.
I guess I don't have any good intuition regarding diff production for the fuzzy matching case, other than it definitely feels like a special case.

String matching definitely has challenges, but in terms of quoting, doesn't embedding tests in another language (in this case, Rust) always involve this in some way?

That said, maybe we can break this into two. One is the general "testing grammar inputs" problem. Second is what we're serialising / testing against. Is that split correct?

I guess my only hesitation about splitting them up is that "testing grammar inputs" does lock us into an input format. As an example, let me use a slightly extended RON format which allows _ to be used.
The extension is somewhat inspired by the Rust/ML _ wildcard, but also by the syntax used e.g. here,
which is highly relevant: https://compiler.club/pattern-matching-in-trees/

[TestSuccess(input: "a", ast: (_, [("character", "a")]))]

In theory, though, maybe we could add a Wildcard or Wild variant to ASTRepr,
or to whatever match/pattern structure we end up using for AST matching.

I believe this or something similar might be compatible with RON, though in some ways using _ feels more natural.
So perhaps I was wrong that serde won't work for tree matching; it may work, but it lacks syntactic sugar.

[TestSuccess(input: "a", ast: (Wildcard, [("character", "a")]))]

Hopefully this kind of also explains why I was hoping we could perhaps avoid escaping wildcards, because we're still encoding a tree-like data structure, but wildcards are embedded within the structure as unitary values, rather than within any character representation.
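A rough sketch of wildcards as unitary values inside a pattern tree is shown below. The `Tree`, `NamePat`, and `Pattern` types are hypothetical and only loosely modelled on the ("start", [("character", "a")]) shape used in the examples; they are not the ASTRepr from the patch.

```rust
/// Hypothetical concrete tree, shaped like ("start", [("character", "a")]).
#[derive(Debug, Clone, PartialEq)]
pub enum Tree {
    Term(String, String),
    Nonterm(String, Vec<Tree>),
}

/// A node name in a pattern: either `_` or an exact name.
pub enum NamePat {
    Wild,
    Exact(String),
}

/// Pattern tree: wildcards are variants, never characters inside strings.
pub enum Pattern {
    /// Matches any whole subtree.
    Wild,
    Term(NamePat, String),
    Nonterm(NamePat, Vec<Pattern>),
}

fn name_ok(pat: &NamePat, name: &str) -> bool {
    match pat {
        NamePat::Wild => true,
        NamePat::Exact(s) => s.as_str() == name,
    }
}

/// Structural match: same shape, with wildcards accepting anything.
pub fn matches(pat: &Pattern, tree: &Tree) -> bool {
    match (pat, tree) {
        (Pattern::Wild, _) => true,
        (Pattern::Term(n, text), Tree::Term(tn, ttext)) => name_ok(n, tn) && text == ttext,
        (Pattern::Nonterm(n, kids), Tree::Nonterm(tn, tkids)) => {
            name_ok(n, tn)
                && kids.len() == tkids.len()
                && kids.iter().zip(tkids).all(|(p, t)| matches(p, t))
        }
        _ => false,
    }
}
```

Because the wildcard is a variant, a terminal whose text is literally "_" stays an ordinary string in both the tree and the pattern and needs no escaping.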

Hope I'm not being difficult here, it just feels like it might be worth attempting, (or maybe dejagnu just spoiled string based testing for me)

I guess one thing worth pointing out is that the paper I linked on pattern matching in trees poses a slightly different problem than the one we want to solve: its algorithm produces all the subtrees that match a pattern tree.

While what I believe we're after is more akin to the tree inclusion problem, which asks, given a pattern tree P and a tree T, whether we can produce P from T by deleting nodes in T.

That is to say, the tree inclusion problem omits wildcards and holes, while pattern matching in trees admits them. We could slightly modify the rules of the tree inclusion problem to restrict the number of deletions allowed, e.g. a hole might allow only one deletion, while a wildcard allows any number of deletions.

I suppose it's interesting for two reasons, because of the lack of wildcards in the original formulation, but also just the slightly different formulation of the problem.
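As a very rough sketch of the "hole versus wildcard" distinction, the following matcher works over ordered sibling lists. It is a deliberate simplification and not the tree inclusion problem itself: deletions only remove whole sibling subtrees, a `Hole` stands for exactly one subtree, and a `Wildcard` stands for zero or more. All names here are hypothetical.

```rust
/// Hypothetical labelled, ordered tree.
#[derive(Debug, Clone, PartialEq)]
pub struct Tree {
    pub label: String,
    pub children: Vec<Tree>,
}

/// Convenience constructor for a childless node.
pub fn leaf(label: &str) -> Tree {
    Tree { label: label.to_string(), children: Vec::new() }
}

pub enum Pattern {
    /// Stands for exactly one subtree (one "deletion").
    Hole,
    /// Stands for zero or more sibling subtrees (any number of "deletions").
    Wildcard,
    Node(String, Vec<Pattern>),
}

/// Match a single pattern against a single tree.
pub fn matches(pat: &Pattern, tree: &Tree) -> bool {
    match pat {
        // At a single position both accept whatever tree is there.
        Pattern::Hole | Pattern::Wildcard => true,
        Pattern::Node(label, kids) => label == &tree.label && match_seq(kids, &tree.children),
    }
}

/// Match a list of sibling patterns against a list of sibling trees.
fn match_seq(pats: &[Pattern], trees: &[Tree]) -> bool {
    match pats.split_first() {
        None => trees.is_empty(),
        // A wildcard may skip any number of siblings; try each count.
        Some((Pattern::Wildcard, rest)) => {
            (0..=trees.len()).any(|skip| match_seq(rest, &trees[skip..]))
        }
        // Anything else, including a hole, consumes exactly one sibling.
        Some((first, rest)) => match trees.split_first() {
            Some((t, ts)) => matches(first, t) && match_seq(rest, ts),
            None => false,
        },
    }
}
```

The backtracking over wildcard lengths is exponential in the worst case, which is fine for small test trees but is another reason to treat this as an exploration rather than a design.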

Initial experiments with serialized testing

705d1e4

ratmice assigned ltratt Jun 13, 2025

use enum based test variants

9e2f213
