Skip to content

Fix #269 and rewrite DFAMin #270

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Apr 5, 2025
Merged

Fix #269 and rewrite DFAMin #270

merged 20 commits into from
Apr 5, 2025

Conversation

nicuveo
Copy link
Contributor

@nicuveo nicuveo commented Apr 3, 2025

Overview

This PR fixes a long-standing bug in the minimizer. As not all states are in the queue Q (or W as it is known in the reference implementation of Hopcroft's algorithm), it is possible in some rare cases for some sets to not be properly subdivided. This can result in the wrong rule being selected at runtime.

This PR adds two new tests: one that demonstrates the issue with older versions of the minimizer (pre 4f0b51b) and one that demonstrates the issue with current versions of the minimizer.

Implementation details

The gist of this change is simple: we now initialize R to the empty set, and W (renamed from its previous nonstandard name of Q) to a set of subsets of stages. I've additionally used this PR as an opportunity to do several small cleanups.

First, this PR rewrites the Note that describes the algorithm. It properly introduces some notation that was left unexplained, fixes the error regarding the initial sets, and fixes some typos.

As far as groupEquivalentStates is concerned, beyond the aforementioned core change, this PR extracts several helper functions to the top level, renames them, and adds some basic documentation. For instance, what was bigmap is now created by a top-level function called generateReverseTransitionCache.

Beyond this, it also makes use of list functions to reduce the amount of manual list construction / deconstruction, and makes some minor changes to minimizeDFA for readability, and to avoid needing to silence shadowing warnings.

Open questions

  • Does this PR need a Changelog entry, or are those only for releases?
  • Why were some boot files listed in the Cabal file? (I had to remove them to compile and run tests.)

@Bodigrim
Copy link

Bodigrim commented Apr 3, 2025

Why were some boot files listed in the Cabal file? (I had to remove them to compile and run tests.)

They are populated by

alex/Makefile

Lines 23 to 24 in eb6c78d

mv src/Parser.y src/Parser.y.boot
mv src/Scan.x src/Scan.x.boot

@andreasabel andreasabel added the pr: squash PR should be squashed upon merge label Apr 4, 2025
@andreasabel
Copy link
Member

andreasabel commented Apr 4, 2025

Excellent work, @nicuveo !
So you fixed a bug that existed likely in all versions of alex 3 so far! (We'll have to deprecate them all.)

I took some hours working through the new code for DFAMin, and I added some commits that resulted from my review:

  1. lots of comments
  2. use aliases OldSNum and NewSNum for state numbers rather than Int where possible
  3. optimize the numbers computation by observing that a start state can be only in one equivalence class
  4. optimize the computation of the transition tables of the new states by observing that a single old state out of an equivalence class already contains all the information we need for the new state (I mean, after all, all old states in a class are equivalent)
  5. replace restrictKeys by a plain filter

The last point is something we should/could discuss. My intuition is that restrictKeys has some overhead since it needs to return a balanced IntMap---only to have it crushed by the fold. So aren't we better off by turning it into a list, filtering it, and then fold together the remaining elements?

If you have any comments on my other changes, let me know as well.

My plan is to squash all commits into one.

P.S.: In general, I wonder whether the testsuite of alex is not too skimpy. After all, it did not discover the bug you found.

@andreasabel andreasabel self-assigned this Apr 4, 2025
@andreasabel andreasabel added this to the 3.5.3.0 milestone Apr 4, 2025
@Bodigrim
Copy link

Bodigrim commented Apr 4, 2025

To fix CI failures try bumping v2 to v3 in

- uses: uraimo/run-on-arch-action@v2

@nicuveo
Copy link
Contributor Author

nicuveo commented Apr 4, 2025

Thank you @andreasabel!

Regarding your changes to minimizeDFA: looks good to me! I had purposefully not made any deep change to it to avoid scope creep (beyond fixing obvious surface-level stuff), with the idea of doing further cleanups in a subsequent PR: separating the actual bug fix from the more cosmetic changes. I don't know what the policy / habit is for this project, though? But beyond that, yeah, again, your changes look sensible to me!

Regarding restrictKeys: yeah, i think your version is better! And it allows us to get rid of CPP, which is great. I'd prefer a do block however, like so; what do you think?

let x = IntSet.unions $ do
      (target, sources) <- IntMap.toList allPreviousStates
      guard $ target `InsSet.member` a
      pure sources

(An extremely pedantic nitpick, please feel free to ignore: 4522b8d introduces a nested where, and IME those are better avoided, do you think we could redo error handling there to avoid that?)

Regarding the test suite: i agree. A big challenge i faced with this work was managing to find a broken example expressed by rules, rather than as a quickcheck property / unit test that tests the invariants of groupEquivalentStates. Having a haskell test suite to validate some internals might help.

In general, i think this project could use a bit of TLC: it really lacks standardization, the naming conventions and comment styles are a bit all over the place... it looks, i'm sorry to say, like the project has been a bit neglected. I'd be happy to spend some time making cleanup / harmonization / modernization PRs; is that something that'd be welcome?

Thanks!

@nicuveo
Copy link
Contributor Author

nicuveo commented Apr 4, 2025

To fix CI failures try bumping v2 to v3 [...]

Ah, thanks! I've noticed the failure seems random, the emulated check has passed at least once... I'll open a separate PR to avoid crowding this one with an unrelated CI change.

@nicuveo
Copy link
Contributor Author

nicuveo commented Apr 4, 2025

I've pushed the changes to restrictKeys, and a small tentative change: i rewrote the processing of one X set to use foldl' instead of concat, concatMap, unzip, and ++. The hope is that, by only constructing the resulting list with :, we avoid paying a performance cost. I haven't benchmarked this however, and i am not sure how much list fusion would impact the original code.

@nicuveo nicuveo mentioned this pull request Apr 4, 2025
@andreasabel
Copy link
Member

Regarding your changes to minimizeDFA: looks good to me! I had purposefully not made any deep change to it to avoid scope creep (beyond fixing obvious surface-level stuff), with the idea of doing further cleanups in a subsequent PR: separating the actual bug fix from the more cosmetic changes. I don't know what the policy / habit is for this project, though? But beyond that, yeah, again, your changes look sensible to me!

I follow in general to separate bug fixes from refactorings, but in this case most of the lines of DFAMin had already been modified so I allowed myself to go all-in and freely rewrite some code that asked for improvement.

Regarding restrictKeys: yeah, i think your version is better! And it allows us to get rid of CPP, which is great. I'd prefer a do block however, like so; what do you think?

let x = IntSet.unions $ do
      (target, sources) <- IntMap.toList allPreviousStates
      guard $ target `InsSet.member` a
      pure sources

Yes, sure!

(An extremely pedantic nitpick, please feel free to ignore: 4522b8d introduces a nested where, and IME those are better avoided, do you think we could redo error handling there to avoid that?)

I agree with the general style to not deeply nest where. Rather, have top-level functions that can independently be given a clear semantics!
In case of a panic string however I would rather see this as distraction. Unless the code is incorrect (fingers cross), these panics are anyway dead code, so hiding them in some local where seems appropriate.

Regarding the test suite: i agree. A big challenge i faced with this work was managing to find a broken example expressed by rules, rather than as a quickcheck property / unit test that tests the invariants of groupEquivalentStates. Having a haskell test suite to validate some internals might help.

In general, i think this project could use a bit of TLC: it really lacks standardization, the naming conventions and comment styles are a bit all over the place... it looks, i'm sorry to say, like the project has been a bit neglected. I'd be happy to spend some time making cleanup / harmonization / modernization PRs; is that something that'd be welcome?

Refactorings and clean-ups are a bit of a two-edged sword. One the one hand, they improve the quality of the code, on the other hand, there is the risk of introducing regressions. After all, the current code, has passed the "test of time" (which is some correctness guarantee, yet not enough as the present bug shows.).
Therisk of introducing regressions can however be substantially reduced by a large test suite. So, I think work should be done in this order:

  1. Enhance the testsuite to a level that gives us confidence to risk refactorings/clean ups.
  2. Do refactorings/clean ups.

(The risk of regressions is real; e.g. see the "Hall of fame" for the make-over of the sister project happy: https://github.com/haskell/happy/issues?q=is%3Aissue%20state%3Aclosed%20label%3A%22regression%20in%20happy-2.*%22)

@andreasabel
Copy link
Member

I'd be happy to merge the PR in the current state, if you agree @nicuveo .

andreasabel and others added 6 commits April 5, 2025 11:31
A start state cannot belong to several equivalence classes (they are
disjoint), so we can remove already assigned start states from the
list of start states we want to assign.
Since all old states in an equivalence class are equivalent, we only
need the data from one of them to construct the new state.
@nicuveo
Copy link
Contributor Author

nicuveo commented Apr 5, 2025

Refactorings and clean-ups are a bit of a two-edged sword. One the one hand, they improve the quality of the code, on the other hand, there is the risk of introducing regressions. After all, the current code, has passed the "test of time" (which is some correctness guarantee, yet not enough as the present bug shows.). Therisk of introducing regressions can however be substantially reduced by a large test suite. So, I think work should be done in this order:

  1. Enhance the testsuite to a level that gives us confidence to risk refactorings/clean ups.
  2. Do refactorings/clean ups.

Sounds good! What's the proper way to get involved in such future plans? I have a bit of bandwidth and would be happy to help.

@nicuveo
Copy link
Contributor Author

nicuveo commented Apr 5, 2025

I'd be happy to merge the PR in the current state, if you agree @nicuveo .

I rebased this PR to include #271, but sadly it didn't seem to reliably the transient broken test. But if that's not a blocker, yep, i'm okay with merging this!

@nicuveo
Copy link
Contributor Author

nicuveo commented Apr 5, 2025

Oh wait no the new test failure is something we introduced. I'll fix that ASAP.

@andreasabel andreasabel merged commit aa6c57b into haskell:master Apr 5, 2025
19 checks passed
@nicuveo nicuveo deleted the issue269 branch April 5, 2025 11:52
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Apr 30, 2025
## Changes in 3.5.3.0

* Fix critical bug in automaton minimizer
  ([PR #270](haskell/alex#270)),
  thanks Antoine Leblanc!
* Tested with GHC 8.0 - 9.12.2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr: squash PR should be squashed upon merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants