Add option to pickler dumps() for best-effort determinism #34698

Open: AdrS wants to merge 2 commits into master

Conversation

AdrS
Contributor

@AdrS AdrS commented Apr 21, 2025

The motivation for this change is that Flume caches pickled code, and non-determinism breaks that caching. While making pickling fully deterministic is infeasible, increasing determinism is still useful because it increases cache hits. Sets are a common source of non-determinism. This change sorts set elements to provide deterministic serialization. Because not all types provide a comparison operator, the sorting routine implements generic element comparison logic.

See: #34410
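
For readers following along, here is a minimal sketch of what "generic element comparison" could look like. It is not the code in this PR: the names `_type_name`, `_compare`, and `sorted_elements` are hypothetical, and the ordering rules (type name first, then value, then `repr()` as a last resort) are assumptions chosen for illustration.

```python
import functools
import pickle


def _type_name(value):
  return f"{type(value).__module__}.{type(value).__qualname__}"


def _compare(a, b):
  """Three-way comparison that does not raise for mixed element types."""
  name_a, name_b = _type_name(a), _type_name(b)
  if name_a != name_b:
    # Order values of different types by type name instead of comparing them.
    return (name_a > name_b) - (name_a < name_b)
  if isinstance(a, (set, frozenset)):
    # "<" on sets means "is a proper subset of", which is not a total order,
    # so compare the sorted element sequences instead.
    a, b = tuple(sorted_elements(a)), tuple(sorted_elements(b))
  if isinstance(a, tuple):
    # Compare tuples element by element so mixed-type members are handled.
    for x, y in zip(a, b):
      result = _compare(x, y)
      if result:
        return result
    return (len(a) > len(b)) - (len(a) < len(b))
  try:
    return (a > b) - (a < b)
  except TypeError:
    # Last resort for types without ordering; only stable if repr() is stable.
    return (repr(a) > repr(b)) - (repr(a) < repr(b))


def sorted_elements(elements):
  """Returns the elements as a list in a run-independent order."""
  return sorted(elements, key=functools.cmp_to_key(_compare))


# Two sets with the same members, built in different orders, pickle to the
# same bytes once their elements are sorted.
first = pickle.dumps(sorted_elements({3, "b", 1, "a"}))
second = pickle.dumps(sorted_elements({"a", 1, "b", 3}))
assert first == second
```

The `repr()` fallback is the weak spot in this sketch: the default `repr()` of an object includes its memory address, so elements of such types would still be ordered differently from run to run, which fits the "best-effort" framing above.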

"""For internal use only; no backwards-compatibility guarantees."""
with _pickle_lock:
  if enable_best_effort_determinism:
    dill.dill.pickle(set, save_set)
Contributor

@tvalentyn tvalentyn Apr 21, 2025

Does this create a side effect for future invocations where enable_best_effort_determinism=False again?
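
To make the concern concrete, here is a sketch under a stated assumption: that the registration call stores the handler in a dispatch table shared at class level, so it outlives the call that installed it. `FakePickler`, `default_save_set`, `sorted_save_set`, and `temporary_handler` are hypothetical stand-ins, not dill or Beam APIs. One possible way to avoid the leak is to restore the previous handler on exit:

```python
import contextlib


class FakePickler:
  """Hypothetical stand-in for a pickler with a class-level dispatch table."""
  dispatch = {}


def default_save_set(pickler, obj):
  return ("default", obj)


def sorted_save_set(pickler, obj):
  return ("sorted", sorted(obj))


@contextlib.contextmanager
def temporary_handler(type_, handler):
  """Installs handler for type_ and restores the previous entry on exit."""
  missing = object()
  previous = FakePickler.dispatch.get(type_, missing)
  FakePickler.dispatch[type_] = handler
  try:
    yield
  finally:
    if previous is missing:
      del FakePickler.dispatch[type_]
    else:
      FakePickler.dispatch[type_] = previous


FakePickler.dispatch[set] = default_save_set
with temporary_handler(set, sorted_save_set):
  # Inside the block the deterministic handler is active.
  assert FakePickler.dispatch[set] is sorted_save_set
# After the block the original handler is back, so later calls are unaffected.
assert FakePickler.dispatch[set] is default_save_set
```

If the table is in fact shared, the restore would need to happen while the same lock is held, so that concurrent callers never observe the temporary handler.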

Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @tvalentyn for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).
