Improve seekable_format/examples/parallel_compression #4382

Merged 8 commits into facebook:dev on May 8, 2025

Conversation

@vasi (Contributor) commented May 7, 2025

Change the design of parallel_compression to allow it to actually work in parallel without blowing up memory usage. Fixes #3980.

In the new approach, we don't wait for all jobs to be accepted by the queue before writing any results. Instead, as each job finishes, it adds its result to an ordered linked list, and then writes (and frees) as many results as are available. We use a mutex for synchronization, so this should no longer be racy. This approach also conveniently allows streaming input, since we no longer need to know the number of frames ahead of time.

An added test enforces that parallel_compression doesn't use excessive memory.

This doesn't change anything about the main zstd program, so we should leave #3980 open, or open a new issue about zstd CLI seekability.

A variety of other fixes were necessary to get the tests passing, or were obvious cleanups once this work was complete. Each is a separate commit.

vasi added 8 commits May 6, 2025 21:55
This passes make flags, such as `-jN` for building in parallel, to
the underlying make.

Some of these examples are intended to be parallel, so it doesn't make
sense to link them against the single-threaded libzstd.

The filenames of the mt and nomt libzstd builds are identical, so it's
still possible to link against the single-threaded one, just harder.

Previously, parallel_compression would only handle each job's results
after ALL jobs were successfully queued. This caused all src/dst
buffers to remain in memory until then!

It also polled to check whether a job completed, which is racy without
any memory barrier.

Now, we flush results as a side effect of completing a job. Completed
frames are placed in an ordered linked-list, and any eligible frames
are flushed. This may be zero or multiple frames, depending on the
order in which jobs finish.

This design also makes it simple to support streaming input, so that
is now available. Just pass `-` as the filename, and stdin/stdout will
be used for I/O.

There was no memory barrier between writing and reading `done`, which
would allow reordering to cause races. With so little data to handle
after each job completes, we might as well just join.
Use ulimit to fail the test if we use O(filesize) memory, rather than
O(threads).
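The memory-cap technique works roughly like this (a sketch of the pattern only: the `run_capped` helper and the cap value are illustrative, not the actual test's). `ulimit -v` caps the subshell's virtual address space in KiB, so a program whose memory scaled with file size would fail to allocate and exit non-zero.

```shell
# Run a command under a virtual-memory cap, in a subshell so the
# limit doesn't leak into the rest of the script.
run_capped() {
  local kib=$1; shift
  ( ulimit -v "$kib"; "$@" )
}

# A small command fits comfortably under a generous cap:
run_capped $((256 * 1024)) echo "fits under the cap"
```

In the real test, the cap is chosen so that O(threads) buffers fit but O(filesize) buffers do not.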
Building lz4 as root was causing `make clean` to fail with permission
errors.

We used to have to install lz4 from source back in Ubuntu 14.04, but
nowadays the installed lz4 is fine. Get rid of ancient helpers and
cruft!
@Cyan4973 (Contributor) left a comment
Thanks @vasi!
Looks good to me!
It's also pretty clean, very readable, style is consistent,
and the modification comes with a test run in CI.
I like it.

@Cyan4973 Cyan4973 self-assigned this May 8, 2025
@Cyan4973 Cyan4973 merged commit f9938c2 into facebook:dev May 8, 2025
101 checks passed

Successfully merging this pull request may close these issues.

seekable_format/examples/parallel_compression.c is not parallel
3 participants