
Cosi v1alpha2 changes #4599


Open: wants to merge 1 commit into master

Conversation


@BlaineEXE BlaineEXE commented Apr 25, 2024

  • One-line PR description: begin v1alpha2 doc - initial work is fixing old errors without adding large features
  • Other comments:

Update the COSI KEP's base design to v1alpha2. The primary purpose of
this work is to reshape the doc and spec to fix inconsistencies and
vestigial items, and fix the most glaring issues in the v1alpha1 spec.

COSI maintainers are planning follow-up items to add larger individual
features after this.

Notably, bucketInfo.json has been changed to individual secret fields
with COSI_<KEY>: <VALUE> format, as the JSON blob was flagged as a
problem by several v1alpha1 users.
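
As an illustration of the reshaped credential format, the flat `COSI_<KEY>: <VALUE>` secret fields might be produced like this. This is a minimal sketch: the helper function and key names such as `COSI_ENDPOINT` are hypothetical, not the authoritative v1alpha2 key set.

```go
package main

import (
	"fmt"
	"strings"
)

// ToCOSISecretData converts flat bucket-info key/value pairs into the
// COSI_<KEY>: <VALUE> secret-field format described above, replacing the
// single bucketInfo.json blob. Key names are illustrative only.
func ToCOSISecretData(info map[string]string) map[string]string {
	out := make(map[string]string, len(info))
	for k, v := range info {
		out["COSI_"+strings.ToUpper(k)] = v
	}
	return out
}

func main() {
	data := ToCOSISecretData(map[string]string{
		"endpoint":   "https://s3.example.com",
		"bucketName": "example-bucket",
	})
	fmt.Println(data["COSI_ENDPOINT"]) // prints "https://s3.example.com"
}
```

Individual keys like these can be consumed directly via `env.valueFrom.secretKeyRef` in a Pod spec, which is one reason flat fields are friendlier than a JSON blob.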

Additionally, rework the existing APIs/spec to give driver sidecars fewer responsibilities and move more coordination responsibility into the main COSI controller. This mirrors the volume snapshotter's implementation and should make version mismatch issues between the sidecar and controller less frequent. It also means the sidecar -- and thus vendor drivers -- require fewer RBAC permissions.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 25, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 25, 2024
@BlaineEXE BlaineEXE force-pushed the cosi-v1alpha2-changes branch 3 times, most recently from d22d1bd to 511d6cc Compare April 25, 2024 22:01
@BlaineEXE BlaineEXE force-pushed the cosi-v1alpha2-changes branch from 511d6cc to 5aa494b Compare April 25, 2024 23:26
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 25, 2024
@BlaineEXE BlaineEXE marked this pull request as draft April 25, 2024 23:27
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 25, 2024
@BlaineEXE BlaineEXE force-pushed the cosi-v1alpha2-changes branch from 5aa494b to 52275b7 Compare April 25, 2024 23:40
@pacoxu pacoxu mentioned this pull request Jul 25, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 23, 2024
@xing-yang
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 23, 2024
Comment on lines 274 to 277
The mechanism by which the Controller waits for the intermediate Bucket to be provisioned in the
middle of BucketClaim reconciliation is not specified here. The important behavior (waiting) is
defined, and the logic coordinating the init-wait-finish reconcile process is left as an
implementation detail.
Author


It's probably better to define high-level behavior expectations here, rather than pigeonhole ourselves into a specific implementation solution.

@BlaineEXE BlaineEXE force-pushed the cosi-v1alpha2-changes branch from f55b3d3 to 11d78f4 Compare February 24, 2025 23:10
@BlaineEXE BlaineEXE force-pushed the cosi-v1alpha2-changes branch from 9098c29 to 14b42fa Compare March 11, 2025 22:11

COSI is out-of-tree, so version skew strategy is N/A

## Alternatives Considered
Author

@BlaineEXE BlaineEXE Mar 11, 2025


This section describes design details that we have discarded, including many changes between v1alpha1 and v1alpha2. I have taken care to make this an effective overview and discussion of any complex points.

@BlaineEXE BlaineEXE force-pushed the cosi-v1alpha2-changes branch from 14b42fa to 8c2312f Compare March 21, 2025 22:20
@BlaineEXE BlaineEXE marked this pull request as ready for review March 21, 2025 22:20
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 21, 2025
Comment on lines +507 to 508
// Protocols is a list of protocols that the provisioned Bucket must support. COSI will verify
// that each item in this list is advertised as supported by the OSP driver.
Protocols []Protocol
Author


I listened to @saad-ali's response in the last sig-storage meeting about whether this is needed. One of the important things I heard was that the intent is that drivers can "advertise" what they support.

Reading more into Saad's reply, I can see how this field might be important for portability in some ways. Importantly, if I were to copy a BucketClaim spec to a new/different K8s cluster with different COSI driver, the presence of Protocols here should help me feel certain that COSI will return an error if the ported copy is incompatible with the new k8s cluster.

Something that wasn't present in the v1alpha1 spec (or implementation) was a way for drivers to "advertise" anything. Drivers were expected to error out if they couldn't support all protocols, but I don't consider this "advertising" things. And certainly, any driver could decide to ignore the list when provisioning if they wanted to.

Thus, for the v1alpha2 spec, I added a gRPC return value for DriverGetInfo for drivers to advertise their supported protocols. I also added the note here that we expect COSI itself (I think the Sidecar) to validate that what the user is requesting in Parameters is advertised as supported by the driver.

If my analysis and understanding of Saad's feedback is correct, I think this is good to keep (along with the ability for drivers to advertise support).
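
The input check discussed in this thread could be sketched roughly as follows. The `Protocol` type mirrors the KEP's API snippet; the subset-validation logic is an assumption about how the sidecar might implement the check, not a quote from the spec.

```go
package main

import "fmt"

// Protocol mirrors the named type used in the quoted API snippet.
type Protocol string

// ValidateProtocols returns an error if any requested protocol is not
// advertised by the driver (e.g. via the DriverGetInfo return value this
// thread describes). Sketch only; actual COSI logic may differ.
func ValidateProtocols(requested, advertised []Protocol) error {
	adv := make(map[Protocol]bool, len(advertised))
	for _, p := range advertised {
		adv[p] = true
	}
	for _, p := range requested {
		if !adv[p] {
			return fmt.Errorf("protocol %q not advertised by driver", p)
		}
	}
	return nil
}

func main() {
	// A ported BucketClaim requesting an unsupported protocol fails early.
	err := ValidateProtocols([]Protocol{"S3", "GCS"}, []Protocol{"S3", "Azure"})
	fmt.Println(err)
}
```

Failing at admission/reconcile time like this is what gives the portability guarantee described above: a BucketClaim copied to a cluster whose driver lacks a protocol errors out instead of silently provisioning something incompatible.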


As an alternative to the input check, we originally planned v1alpha2 to only have the BucketClaim.status.supportedProtocols list, which would inform users what protocols the Bucket supports, but only after the BucketClaim/Bucket are done being provisioned. It wouldn't guarantee an error in ported BucketClaim provisioning if the 'new' cluster didn't support the same protocols.

This alternative would be easier to implement because it's less complex. However, it wouldn't allow users to state their intentions before provisioning happens.

CC: @xing-yang @shanduur

Member


After reading the comment and the doc, I think that the spec field is the correct approach. It allows better validation and failing early if the desired protocols do not match the protocols advertised by the driver.

@BlaineEXE BlaineEXE force-pushed the cosi-v1alpha2-changes branch from 8c2312f to e4f1a94 Compare March 21, 2025 22:38

COSI Sidecar should not have Bucket delete permissions.

#### Generating Bucket Access Credentials
Author


Because we are trying to keep the Sidecar responsibilities (and RBAC) as small as possible (following the design of Volume Snapshotter), we needed significant changes to how BucketAccesses are provisioned.

TL;DR: the controller gets info from BucketAccessClass and BucketClaim and copies it to the BucketAccess.status, after which the sidecar can do its thing. Sidecar no longer does Class/Claim inspection.
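
That controller step might look roughly like this, using hypothetical trimmed-down types. Field names here are illustrative, not the actual v1alpha2 API shapes.

```go
package main

import "fmt"

// Hypothetical, trimmed-down shapes of the resources involved.
type BucketAccessClass struct{ DriverName string }
type BucketClaim struct{ BucketName string }
type BucketAccessStatus struct {
	DriverName string
	BucketName string
}

// resolveAccessStatus sketches the coordination step described above: the
// controller copies what the sidecar needs from the class and claim into
// BucketAccess.status, so the sidecar never reads those objects itself
// and needs no RBAC for them.
func resolveAccessStatus(class BucketAccessClass, claim BucketClaim) BucketAccessStatus {
	return BucketAccessStatus{
		DriverName: class.DriverName,
		BucketName: claim.BucketName,
	}
}

func main() {
	st := resolveAccessStatus(
		BucketAccessClass{DriverName: "sample.objectstorage.example.com"},
		BucketClaim{BucketName: "bkt-123"},
	)
	fmt.Println(st.DriverName, st.BucketName)
}
```

The design choice is that `status` becomes the sole handoff surface between controller and sidecar, which is what keeps sidecar RBAC small.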

// Protocol is the name of the Protocol that this access credential is expected to support.
// +required
Protocol Protocol
Author


If Protocols is important for BucketClaims/Buckets it stands to reason (to me) that Protocol is important for BucketAccess also.

@BlaineEXE BlaineEXE force-pushed the cosi-v1alpha2-changes branch from e4f1a94 to da319fe Compare March 21, 2025 22:46
Comment on lines 147 to 149
### Important changes between versions

NOTE:
- The secret will not be created until the credentials are generated/service account mappings are complete.
- Within a namespace, one BucketAccess and secret pair is generally sufficient, but cases which may want to control this more granularly can use multiple.
- The secret will be created with a finalizer that prevents it from being deleted until the associated bucketAccess is deleted.
v1alpha1 to v1alpha2:
Author

@BlaineEXE BlaineEXE Mar 21, 2025


This section has a quick overview of important changes from v1alpha1 to v1alpha2.

@BlaineEXE BlaineEXE force-pushed the cosi-v1alpha2-changes branch from e17381a to 586add2 Compare March 27, 2025 17:38
Member

@shanduur shanduur left a comment


A few nitpicks, but honestly this is great! Good job.

1. Admin allows User to use BucketClass
2. User creates BucketClaim that uses BucketClass
3. COSI controller observes BucketClaim
1. Controller applies `objectstorage.k8s.io/bucketclaim-protection` finalizer to the BucketClaim
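
Step 3.1 above could look something like this in the controller. This is a minimal sketch; only the finalizer name comes from the KEP text, and the helper is hypothetical.

```go
package main

import "fmt"

// AddFinalizer appends the given finalizer to the list if it is not
// already present, returning the (possibly updated) list. Idempotency
// matters because the controller may reconcile the same BucketClaim
// many times.
func AddFinalizer(finalizers []string, name string) []string {
	for _, f := range finalizers {
		if f == name {
			return finalizers
		}
	}
	return append(finalizers, name)
}

func main() {
	const protection = "objectstorage.k8s.io/bucketclaim-protection"
	fs := AddFinalizer(nil, protection)
	fs = AddFinalizer(fs, protection) // second call is a no-op
	fmt.Println(len(fs), fs[0])       // prints "1 objectstorage.k8s.io/bucketclaim-protection"
}
```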
Member


Do we need to specify what actually is being protected? Why not just use objectstorage.k8s.io/protection?

Author


I suggested reusing the same finalizer name a couple of weeks ago. It seemed like Xing was against it, or perhaps confused about what I was trying to express. I would prefer the approach you are suggesting, so I will make the change. This detail isn't the most important, so it can be reverted if there is still opposition.

@BlaineEXE BlaineEXE force-pushed the cosi-v1alpha2-changes branch 3 times, most recently from 91e708c to d0f96f0 Compare March 31, 2025 17:48

Each BucketAccess is meant to map to a unique service account in the OSP. Once the requested privileges have been granted, a secret with the name specified in `credentialsSecretName` in the BucketClaim is created. The secret will reside in the namespace of the BucketClaim. The secret will contain either access keys or service account tokens based on the chosen authentication type. The format of this secret can be found [here](#bucketinfo).


In this previous version, I liked the idea of COSI driver being able to grant or deny access requests. This concept seemed to have disappeared in v1alpha2. If some rules were defined in the vendor-specific section of BucketAccessClass, it could allow COSI drivers to make access decisions—granting or denying requests based on that policy. This would give vendors flexibility to enforce rules like user-based permissions, time-based access, or namespace-level constraints. Or to allow users to make specific requests in the BucketAccess resource, such as requesting read-only access or specifying expiration times—inputs that could be evaluated by the COSI driver when deciding whether to grant or deny access.

Author

@BlaineEXE BlaineEXE Mar 31, 2025


The details here haven't been removed from the v1alpha2 design. They have been clarified and expounded upon a bit, and the organization of the doc has changed.

We did remove a vague set of details around COSI supporting IAM, which wasn't fully baked, but I don't think that's what you're talking about.

To add clarity, both DriverGrantBucketAccess() and DriverRevokeBucketAccess() are the gRPC calls used by COSI to request that the driver grant/revoke access. Those were present in v1alpha1 and are still present here.


So you're saying COSI drivers will still have the ability to grant or deny access requests? I didn't really see that language anywhere like it was in v1alpha1.

Author


"Grant" and "Revoke" are core operations the drivers must support in order to allow BucketAccesses to be provisioned and deleted.

"Deny" implies access control to me. Access control wasn't part of the v1alpha1 spec, but we do have access control plans for the v1alpha2 spec in the near future: kubernetes-retired/container-object-storage-interface-api#107

Update the COSI KEP's base design to v1alpha2. The primary purpose of
this work is to reshape the doc and spec to fix inconsistencies and
vestigial items, and fix the most glaring issues in the v1alpha1 spec.

COSI maintainers are planning follow-up items to add larger individual
features after this.

Notably, bucketInfo.json has been changed to individual secret fields
with COSI_<KEY>: <VALUE> format, as the JSON blob was flagged as a
problem by several v1alpha1 users.

Signed-off-by: Blaine Gardner <[email protected]>
@BlaineEXE BlaineEXE force-pushed the cosi-v1alpha2-changes branch from d0f96f0 to 44d1f4b Compare March 31, 2025 21:00

@moonlight16 moonlight16 left a comment


I'm finished reviewing.

Member

@shanduur shanduur left a comment


/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 1, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: BlaineEXE, moonlight16, shanduur
Once this PR has been reviewed and has the lgtm label, please assign saad-ali for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
