
fix(compress): allow passing in compressor options #509


Closed
wants to merge 1 commit

Conversation

rchincha

Goals of this PR:

  1. Allow passing options to an individual compressor
  2. Do not change the default behavior

@rchincha
Author

An alternative approach to: #508

@rchincha
Author

https://github.com/project-stacker/stacker/blob/main/pkg/overlay/pack.go#L411C4-L411C4
becomes:

GzipCompressor.WithOpt(GzipBlockLen(256<<12))

Goals of this PR:
1. Allow passing in options to individual compressor
2. Do not change default behavior

Signed-off-by: Ramkumar Chinchani <[email protected]>
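
For illustration, here is a minimal sketch of what the proposed option-passing API could look like, based on the GzipCompressor.WithOpt(GzipBlockLen(256<<12)) call above. The interface shape and the pgzip wiring below are assumptions made for this sketch, not the exact contents of the PR diff.

```go
// Hypothetical stand-in for umoci's mutate package; a sketch only.
package mutate

import (
	"io"
	"runtime"

	pgzip "github.com/klauspost/pgzip"
)

// Compressor compresses a layer stream and reports its media-type suffix.
type Compressor interface {
	Compress(reader io.Reader) (io.ReadCloser, error)
	MediaTypeSuffix() string
	WithOpt(opt CompressorOpt) Compressor
}

// CompressorOpt is a marker interface for compressor-specific options.
type CompressorOpt interface{}

// GzipBlockLen sets the pgzip block size in bytes.
type GzipBlockLen int

// GzipCompressor is the gzip compressor with default settings.
var GzipCompressor Compressor = gzipCompressor{}

type gzipCompressor struct {
	blockLen int // 0 keeps pgzip's default, so default behaviour is unchanged
}

func (gz gzipCompressor) MediaTypeSuffix() string {
	return "gzip"
}

// WithOpt returns a copy of the compressor with the given option applied.
// Unrecognised options are silently ignored in this sketch.
func (gz gzipCompressor) WithOpt(opt CompressorOpt) Compressor {
	switch val := opt.(type) {
	case GzipBlockLen:
		gz.blockLen = int(val)
	}
	return gz
}

func (gz gzipCompressor) Compress(reader io.Reader) (io.ReadCloser, error) {
	pipeReader, pipeWriter := io.Pipe()
	gzw := pgzip.NewWriter(pipeWriter)
	if gz.blockLen > 0 {
		// SetConcurrency(blockSize, blocks) controls pgzip's block size.
		if err := gzw.SetConcurrency(gz.blockLen, 2*runtime.NumCPU()); err != nil {
			return nil, err
		}
	}
	go func() {
		_, err := io.Copy(gzw, reader)
		if cerr := gzw.Close(); err == nil {
			err = cerr
		}
		pipeWriter.CloseWithError(err)
	}()
	return pipeReader, nil
}
```

With something like this, a caller such as the stacker pack code linked above could opt into a docker-compatible block size, while callers that pass no options keep the current default output.
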
@codecov-commenter

codecov-commenter commented Oct 13, 2023

Codecov Report

Merging #509 (029f15e) into main (312b2db) will decrease coverage by 0.04%.
The diff coverage is 50.00%.


Additional details and impacted files


@@            Coverage Diff             @@
##             main     #509      +/-   ##
==========================================
- Coverage   73.48%   73.45%   -0.04%     
==========================================
  Files          60       60              
  Lines        4884     4893       +9     
==========================================
+ Hits         3589     3594       +5     
- Misses        935      939       +4     
  Partials      360      360              
Files               Coverage Δ
mutate/compress.go  44.92% <50.00%> (+1.59%) ⬆️

@hallyn
Contributor

hallyn commented Oct 13, 2023

Looks nice

@rchincha
Author

@cyphar thoughts on this PR? Hoping to avoid a umoci fork.

@cyphar
Member

cyphar commented Oct 21, 2023

A fork definitely won't be necessary; I'll review this on Monday. Sorry, I was a bit busy last week.

@@ -76,6 +91,15 @@ func (gz gzipCompressor) MediaTypeSuffix() string {
	return "gzip"
}

func (gz gzipCompressor) WithOpt(opt CompressorOpt) Compressor {
	switch val := opt.(type) {
Member

I'm not sure I like that we ignore invalid arguments here. If we make this act on *gzipCompressor we could make it return a bool (or error) and then the caller can decide if they care about whether WithOpt actually did anything.

Alternatively, it feels like it should be possible to reimplement all of this compressor stuff with a closure that is passed by the caller (that creates the writer and configures it). But that would require more work...
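
As an illustration of the first suggestion, here is a sketch of WithOpt acting on a pointer receiver and returning an error for unrecognised options. It reuses the hypothetical gzipCompressor, CompressorOpt, and GzipBlockLen types from the sketch earlier in this thread (plus an fmt import); it is not the PR's actual code.

```go
// Sketch of the pointer-receiver variant suggested above: the caller
// gets to decide whether an unsupported option matters.
func (gz *gzipCompressor) WithOpt(opt CompressorOpt) error {
	switch val := opt.(type) {
	case GzipBlockLen:
		gz.blockLen = int(val)
		return nil
	default:
		// Report rather than silently ignore unknown options.
		return fmt.Errorf("unsupported gzip compressor option %T", val)
	}
}
```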

Member

Maybe we could add a CustomCompressor just for cases where you need to configure the compression? Idk...

@rchincha
Author

Another place in the umoci code that's relevant to compressor options :(

https://github.com/opencontainers/umoci/blob/main/pkg/system/copy.go#L32

@rchincha
Author

fyi, filed this against image-spec opencontainers/image-spec#1145

@cyphar
Member

cyphar commented Apr 22, 2025

Given #562 this would need to be reworked, but I don't know if you still have interest in this PR, @rchincha? I can carry it if you like. I suspect we would want to make it more like the (..., opts ...func(...))-style configuration pattern now that algorithms are properly supported.
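
For reference, a minimal sketch of the variadic functional-options shape being suggested here; all names below are illustrative assumptions, not umoci's actual API.

```go
package main

import "fmt"

// gzipConfig holds the configurable knobs; only block size in this sketch.
type gzipConfig struct {
	blockSize int
}

// GzipOpt is a functional option that mutates the configuration.
type GzipOpt func(*gzipConfig)

// WithBlockSize overrides the default gzip block size (in bytes).
func WithBlockSize(n int) GzipOpt {
	return func(c *gzipConfig) { c.blockSize = n }
}

// NewGzipCompressor applies any options over the defaults, so callers
// that pass nothing keep the default behaviour.
func NewGzipCompressor(opts ...GzipOpt) gzipConfig {
	cfg := gzipConfig{blockSize: 1 << 20} // assumed default block size
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fmt.Println(NewGzipCompressor())                         // default
	fmt.Println(NewGzipCompressor(WithBlockSize(256 << 12))) // overridden
}
```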

@hallyn
Contributor

hallyn commented May 5, 2025

@cyphar stacker is using this, yes, and AFAIK it is because a user specifically needs it.

@mikemccracken can you confirm whether the ability to specify the compression options is still needed?

@mikemccracken
Contributor

> @cyphar stacker is using this, yes, and AFAIK it is because a user specifically needs it.

stacker has this commit in its stale fork of umoci, yes.

> @mikemccracken can you confirm whether the ability to specify the compression options is still needed?

For background, here is my summary of how this became necessary, lightly edited from an internal note in 2023:

- docker and umoci have different gzip buffer sizes, and **docker build re-compresses base image layers** (this part is weird, but it may be because of supporting docker commit).
- we are seeing this problem (layers that should be identical not matching) because some app teams do docker builds on top of stacker-created base image layers, so the base layers no longer match after docker recompresses them.
- the Right Fix is to have everyone use stacker, but that won't happen this cycle and we need a fix for this cycle. (Note: this is from many cycles ago and nothing has changed.)
- changing stacker to use the same buffer size as docker should be fine and shouldn't require much coordination between stacker users, because stacker does not recompress base layers; old stacker versions can still build on top of base layers created by a new stacker without problems.

Other tools, like the containers/image library, also match docker's compression settings, so they don't have this problem.


I believe that the people who were using docker build on top of stacker-built images are still doing so, but I don't know for sure. I do think it'd come up again if we stopped doing this, but I'm also a little surprised it hasn't come up in other uses of umoci.

Q: Would it make sense to just hardcode in umoci the same compression block size that docker and containers/image use?
It's a workaround for a weird docker behavior, but it also seems basically harmless; IIRC the difference vs. the default was not big.
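
To make the block-size point above concrete, here is a small standalone sketch (my own illustration, not code from stacker, docker, or umoci; the 256 KiB and 1 MiB block sizes are arbitrary examples) showing that compressing the same data with different pgzip block sizes generally produces different bytes, and therefore different layer digests.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"

	pgzip "github.com/klauspost/pgzip"
)

// compressWithBlockSize gzips data with a given pgzip block size.
func compressWithBlockSize(data []byte, blockSize int) []byte {
	var buf bytes.Buffer
	gzw := pgzip.NewWriter(&buf)
	// SetConcurrency(blockSize, blocks) controls how the input is split
	// into independently compressed blocks.
	if err := gzw.SetConcurrency(blockSize, 4); err != nil {
		panic(err)
	}
	if _, err := gzw.Write(data); err != nil {
		panic(err)
	}
	if err := gzw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	data := bytes.Repeat([]byte("some layer content "), 1<<20)
	a := compressWithBlockSize(data, 256<<10) // 256 KiB blocks
	b := compressWithBlockSize(data, 1<<20)   // 1 MiB blocks
	// Different block boundaries generally mean different compressed
	// bytes, and therefore different digests for identical content.
	fmt.Printf("256 KiB blocks: sha256 %x\n", sha256.Sum256(a))
	fmt.Printf("1 MiB blocks:   sha256 %x\n", sha256.Sum256(b))
	fmt.Println("byte-identical:", bytes.Equal(a, b))
}
```

Swapping in the block sizes that docker and umoci actually use would show the same digest mismatch described above.
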

@cyphar
Member

cyphar commented May 6, 2025

So, ideally we wouldn't change the current default because doing so would cause the same unfortunate cache busting. However, if Docker really does break our layers then maybe we should just hard-code it...

(It is possible to make the new algorithm setup configurable but it's probably not worth the effort unless there's a strong need for it.)

@mikemccracken
Contributor

> So, ideally we wouldn't change the current default because doing so would cause the same unfortunate cache busting. However, if Docker really does break our layers then maybe we should just hard-code it...

I don't follow what you mean by cache busting here - AFAIK the only reason this parameter ever comes up is that docker recompresses base tarballs using its own hardcoded blocksize and expects to get the same bits. Are there other tools also doing this?

> (It is possible to make the new algorithm setup configurable but it's probably not worth the effort unless there's a strong need for it.)

As for configurability, stacker only needs to match docker; if umoci's default is the same as what stacker needs, then we don't need configurability here.

@cyphar
Member

cyphar commented May 6, 2025

I meant that if we change the umoci settings, then umoci will generate different blobs (with different hashes) than previous versions did for the same inputs. In practice, OCI images are already not built in a fully reproducible way, but intentionally breaking this is something I try to avoid as much as possible, so that we don't break the caches of layers built with old umoci versions (we have tests to verify we don't do it by accident).

But yeah, if Docker is already forcefully doing this to umoci blobs you import today, then changing it to match is not as big of a deal.

@cyphar
Member

cyphar commented May 7, 2025

#581 changes our default to the one used by stacker. PTAL @rchincha.

@cyphar cyphar closed this in #581 May 7, 2025
cyphar added a commit to cyphar/stacker that referenced this pull request May 31, 2025
This allows us to switch away from our umoci fork now that upstream
supports OverlayfsRootfs and the various features we need. The key
changes that allow us to switch away from our fork are:

 * opencontainers/umoci#572 which implemented a large number of fixes
   to overlayfs handling, such as opaque whiteouts and several features
   not implemented in our fork (xattr escaping, handling of missing
   parent directories, improved rootless support, handling of nested
   whiteouts inside an opaque whiteout).

 * opencontainers/umoci#581 which switched to a Docker-friendly gzip
   block size by default, removing the need to configure it (as
   suggested in opencontainers/umoci#509).

 * opencontainers/umoci#587 which implemented full configurable
   userxattr (user.overlay.*) support.

Signed-off-by: Aleksa Sarai <[email protected]>
rchincha pushed a commit to project-stacker/stacker that referenced this pull request May 31, 2025
* feat: update to skopeo v1.13.0

We need to update skopeo to match the pgzip version between skopeo and
umoci.

Signed-off-by: Aleksa Sarai <[email protected]>

* feat: update to github.com/opencontainers/[email protected]

This allows us to switch away from our umoci fork now that upstream
supports OverlayfsRootfs and the various features we need. The key
changes that allow us to switch away from our fork are:

 * opencontainers/umoci#572 which implemented a large number of fixes
   to overlayfs handling, such as opaque whiteouts and several features
   not implemented in our fork (xattr escaping, handling of missing
   parent directories, improved rootless support, handling of nested
   whiteouts inside an opaque whiteout).

 * opencontainers/umoci#581 which switched to a Docker-friendly gzip
   block size by default, removing the need to configure it (as
   suggested in opencontainers/umoci#509).

 * opencontainers/umoci#587 which implemented full configurable
   userxattr (user.overlay.*) support.

Signed-off-by: Aleksa Sarai <[email protected]>

---------

Signed-off-by: Aleksa Sarai <[email protected]>