-
Notifications
You must be signed in to change notification settings - Fork 1.3k
git url: support CSV form (also support verifying a tag with a commit hash) #5903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
} | ||
if isHTTPSourceV { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Use git show --ignore-all-space
to review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part was moved to:
🤔 Wondering why we use the fragment here actually. Wouldn't query parameters make more sense, since there's actually a way of having multiple params there? So instead:
It just feels a bit odd to overload the |
Query params doesn't seem correct here, as it would be sent to the server as an HTTP GET request |
@@ -6,6 +6,7 @@ const AttrAuthHeaderSecret = "git.authheadersecret" | |||
const AttrAuthTokenSecret = "git.authtokensecret" | |||
const AttrKnownSSHHosts = "git.knownsshhosts" | |||
const AttrMountSSHSock = "git.mountsshsock" | |||
const AttrCommitHash = "git.commithash" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Bikeshedding: this could be also named AttrGitChecksum
to follow AttrHTTPChecksum
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tonistiigi Which one do you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AttrGitChecksum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should include existing features like keep-git-dir, #4905 (comment) . Also #4974 seems like a simple addition.
What if a branch/tag starts with #
?
util/gitutil/git_url.go
Outdated
return nil, errors.New("tag, branch, and commit are exclusive") | ||
} | ||
for _, v := range refs { | ||
res.Ref = v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the benefit of different fields if they still get all squashed into same ref.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use a single ref
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we have it so that ref
is used as is, branch
is converted to ref=refs/heads/xxx
and tag
is converted to ref=refs/tags/xxx
, or does that break something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be another PR?
Can be another PR? Maybe not as easy as
|
CSV parameters can be now specified after "##" in a git URL. e.g., https://github.com/user/repo.git##ref=v1.0.0,subdir=/dir Fix issue 4905, but the syntax differs from the original proposal. The documents will be added in https://github.com/docker/docs/blob/main/content/manuals/build/concepts/context.md#url-fragments Signed-off-by: Akihiro Suda <[email protected]>
e.g., https://github.com/user/repo.git##ref=mytag,commit=cafebab Fix issue 5871 Signed-off-by: Akihiro Suda <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Wondering why we use the fragment here actually. Wouldn't query parameters make more sense, since there's actually a way of having multiple params there?
Query params doesn't seem correct here, as it would be sent to the server as an HTTP GET request
In LLB we have different sources llb.Git and llb.HTTP and in Dockerfile we already allow this as https:// is a valid git protocol and we use .git
and github.com
to make a difference. This could be problematic if query options are sent to Git server on clone but that is already not supported. It does look cleaner to me than ##
, and we can still support #
by combining them, and we wouldn't break the one person using #abc
as a branch name atm.
I guess for SSH style URLs it would look weird and nonstandard?
@@ -151,6 +151,73 @@ func TestParseURL(t *testing.T) { | |||
Path: "/moby/buildkit", | |||
}, | |||
}, | |||
{ | |||
url: "https://github.com/moby/buildkit##ref=v1.0.0,subdir=/subdir", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add some SCP/SSH-style URL testcases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -635,6 +652,16 @@ func (gs *gitSourceHandler) Snapshot(ctx context.Context, g session.Group) (out | |||
} | |||
|
|||
git = git.New(gitutil.WithWorkTree(checkoutDir), gitutil.WithGitDir(gitDir)) | |||
if gs.src.CommitHash != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this happening after we have already completed the checkout instead of before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for verification, as we may potentially have an unexpected code path (in future)
refCommitFullHash = gs.src.Ref | ||
} | ||
if refCommitFullHash != "" { | ||
cacheKey := gs.shaToCacheKey(refCommitFullHash, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ref should be set here if one is present even if we know the full hash.
If first build is from ref=master,commithash=
and second build(replay pinned from provenance) is from ref=master,commithash=abcdef
then they should generate same cache keys. Ideally we would cover this with a testcase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -207,6 +209,9 @@ func (gs *gitSourceHandler) shaToCacheKey(sha, ref string) string { | |||
if gs.src.Subdir != "" { | |||
key += ":" + gs.src.Subdir | |||
} | |||
if gs.src.CommitHash != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? Isn't sha
already same as CommitHash
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, CommitHash
can be a partial hash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does that matter, if we already have a complete one.
Same as https://github.com/moby/buildkit/pull/5903/files/076850aeb7720b56683710d8e30b05fbb9352b97#diff-bf1789b0b19c8d6533a10da5e27f0b31da2cdcd37510a787498c3d19a985d470 . Eg. if you build from alpine:latest
or alpine:latest@sha256
then if the digest matches then they will generate the same cache key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a new cache key seems safer to enforce hash verification, as an existing cache might have been populated without verifying the sha
with the expected value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can continue the discussion in
@@ -6,6 +6,7 @@ const AttrAuthHeaderSecret = "git.authheadersecret" | |||
const AttrAuthTokenSecret = "git.authtokensecret" | |||
const AttrKnownSSHHosts = "git.knownsshhosts" | |||
const AttrMountSSHSock = "git.mountsshsock" | |||
const AttrCommitHash = "git.commithash" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AttrGitChecksum
If we are going to make our URL spec further diverged from RFC XXXX, the |
In Dockerfile, where this is only ambiguous, we already have https://domain/smth.git#ref meaning something completely different than doing a HTTP GET of that URL. From a purely URL parsing standpoint, query parameters are a standard part of URLs, while this ## as an alternative fragment separator really isn't. |
Maybe we should follow "URL Fragment Text Directives" https://wicg.github.io/scroll-to-text-fragment/#example-9acce0ff ? For BK, it would be
|
Curious; is passing these options in the URL itself a hard requirement? Looking at the related ticket, the motivation behind that request was for the current micro-format to be either ambiguous, or hard to grasp. I think the main reason we used a micro-format in the first place was because Dockerfile instructions (like With this PR, we're inventing yet-another micro format, but now that the Dockerfile format supports
For backward-compatibility, we can probably allow passing |
This syntax probably doesn't allow having a single |
Yeah, that's true; do we (can we?) expand fields in the flag? Because I guess that would still allow passing these references, which could be by combining (if you want the hash to be "optional"), e.g.; ARG GIT_TAG=v0.1.0
ARG GIT_HASH=@sha256:deadbeef
ARG GIT_REF=${GIT_TAG}${GIT_HASH}
FROM scratch
ADD --git-opts=ref=${GIT_REF},path=/dir,keep-git=true,depth=1 https://github.com/user/repo.git /src The only risk there would be if git would accept a change where |
Looks really ugly to me, and not really similar with the intent of "text directives" (not that our current anchor usage is).
This isn't just |
I guess you mean |
https://wicg.github.io/scroll-to-text-fragment/#example-9acce0ff says "The only directive introduced in this spec is the text directive but others could be added in the future"
|
Hm, right, but for the positional arg ( |
One of the patches I maintain downstream in my builds of BuildKit is just to remove the recursive submodule clone that's assumed by passing a URL, which I imagine would be a use case that could be unlocked by a more complex format here. In other words, I finally have an answer to moby/moby#3463 (comment) and ... "the call's coming from inside the house" -- to be fair though, I only use build-via-URL because it embeds slightly better provenance, and I'm not convinced I want to continue doing so in the future because it has a lot of rough edges. This PR frankly adds more, not less, because now we've got an even more complex micro-format for scripting |
It's not like there is any other alternative for these cases and because of the defaults in actions Git context is probably 100x more used than |
Right, but I think with both the It's a bit hard to describe my thinking; mostly we're trying to overload an already overloaded format. Let me drill down "how did we get here?"
We got on a sliding slope when we added support for branches, tags, directories; because now we started to conflate "URL" with our own features. The format was also complicated, so not as intuitive, because you had to make yourself familiar with the micro-format we defined for this, but the set of features was limited so we could cope (on our side), and it was power-user feature, so we expected investment on the user side to learn how to do this. But effectively, we painted ourselves in a corner because we wanted to continue to have both "it's a git(hub) URL", AND "we want to add our options on top, but still make it an URL". I just think that we are stretching the limits (too far) now, because we tied ourselves to that; the microformat is out-growing its original intent. It's no longer intuitive, and bolting more options on top only makes this more so; effectively throwing away the reason we added this in the first place (just copy/paste your repository URL!). Which is why I wondered if we could not use other (command-line) options to specify these options, instead of trying to bolt-on even more. If that's not an option, perhaps we should make this explicit; it's not a (git) URL anymore, it's a docker / buildkit argument with options. And if the argument must be an URL (to disambiguate from it being a local path), perhaps we should define a pseudo-scheme for this, e.g.;
Or
With our own scheme, we wouldn't have to be concerned that any (query)parameter included would be considered a parameter for the remote host. It would be ~backward compatible (older buildx versions should tell "unsupported scheme"), and we "own" the URL structure, so could add whatever we want.
☝️ unfortunately we weren't that bright though, so it looks like we don't invalidate schemes we don't support; docker build -t foo whatever://github.com/foo/bar
[+] Building 0.0s (0/0) docker:desktop-linux
ERROR: unable to prepare context: path "whatever://github.com/foo/bar" not found |
I don't want to overload. The current PR is introducing new concepts like
I don't think this history is quite correct. Building from remote URLs was supported in 2013, I believe even before the name Dockerfile was settled.
I do not understand why we would need a new schema. We already have logic for interpreting certain HTTP URLs as Git URLs. This would not be a new concept. Building from URLs or using URL dependencies is the only way BuildKit can track dependencies securely and validate them. The more builds we can make directly from URLs rather than local source or |
We are likely cutting v0.22 this week. If we can't come to an agreement about the URL syntax, we should at least try to get the LLB bits in. If people are unsure about the querystring then maybe we can leave add it to labs first. |
Rather than inventing a new microformat, can we just use a JSON string?
If we don't like this either, a new microformat can be added later without breaking the compatibility. |
How is adding a querystring to a URL adding a new microformat? |
Query strings are expected to be set as a part of HTTP GET request. |
This has never worked anyway. If it is HTTP URL then it is not handled by Git (and already works fine). If it is a Git URL then
same as
What happens internally is I believe
and obviously
would always be invalid. |
👍 On using query string to enhance Git context handling. This approach offers a more flexible, self-describing and extensible mechanism while keeping backward compatibility with fragment that can be merged but query param takes precedence.
|
I guess you meant myrepo.git?tag=mytag there |
Split to:
Also updated to use the query form |
Commit 1: git url: support CSV form
CSV parameters can be now specified after "##" in a git URL. e.g.,
Fix #4905, but the syntax differs from the original proposal.
The documents will be added in
https://github.com/docker/docs/blob/main/content/manuals/build/concepts/context.md#url-fragments
Commit 2: git url: support specifying ref and commit together
Fix #5871