Skip to content

Use Simdjson ondemand parser instead of DOM parser #3878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Apr 18, 2025

Conversation

Klaim
Copy link
Member

@Klaim Klaim commented Apr 4, 2025

Running main's test_libmamba executable currently allocates (but dont use) more than 5Gb when parsing 320Mb sized JSON files. This PR reduces this to around 2.5Gb. (observed using Visual Studio, in both Release and RelWithDebugInfo, also got some confirmation that this is observable on Linux)

  • use the "on-demand" parser from simdjson instead of the "DOM" parser
  • reorganize the parsing code to avoid reading json fields values more than once as this is required for proper behavior of the "on-demand" parser;
  • improved flat_set<T> to allow .contains(value) where value has a different type than T but is still comparable.

@Klaim Klaim added the release::enhancements For enhancements PRs or implementing features label Apr 4, 2025
@Klaim Klaim force-pushed the simdjson-ondemand branch from 064451d to ca7d188 Compare April 8, 2025 16:21
@Klaim Klaim marked this pull request as ready for review April 9, 2025 14:35
@Klaim Klaim requested a review from SandrineP April 9, 2025 14:36
@codecov-commenter
Copy link

codecov-commenter commented Apr 9, 2025

Codecov Report

Attention: Patch coverage is 93.79845% with 8 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@51890d3). Learn more about missing BASE report.

Files with missing lines Patch % Lines
libmamba/src/solver/libsolv/helpers.cpp 92.30% 8 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3878   +/-   ##
=======================================
  Coverage        ?   62.72%           
=======================================
  Files           ?      299           
  Lines           ?    36721           
  Branches        ?     2756           
=======================================
  Hits            ?    23033           
  Misses          ?    13636           
  Partials        ?       52           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Klaim Klaim force-pushed the simdjson-ondemand branch from 8d32c21 to 3479ad2 Compare April 17, 2025 13:53
Comment on lines +551 to +560
// BEWARE:
// We use below `simdjson`'s "on-demand" parser, which does not tolerate reading the same
// value more than once. This means we need to make sure that the objects and their fields
// are read and/or concretized only once and if we need to use them more than once we need
// to persist them in local memory. This is why the code below tries hard to pre-read the
// data needed in several parts of the computing in a way that prevents jumping up and down
// the hierarchy of json objects. When this rule is not followed, the parsing might end
// earlier than expected or might skip data that are read when they shouldn't be, leading to
// *runtime issues* that might not be visible at first. Because of these reasons, be careful
// when modifying the following parsing code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for mentioning this.

@jjerphan jjerphan merged commit 4c50d7b into mamba-org:main Apr 18, 2025
37 checks passed
@jjerphan jjerphan deleted the simdjson-ondemand branch April 18, 2025 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release::enhancements For enhancements PRs or implementing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants