KEP: clusteradm operator #144

Open
wants to merge 2 commits into main

Conversation

TylerGillson

Proposal for wrapping clusteradm with a Kubernetes controller that reconciles a MultiCluster custom resource, providing declarative lifecycle management for OCM multi-clusters.

Note: the controller already exists and I am proposing that Spectro Cloud donates it to the OCM ecosystem. I spoke to @mikeshng about this briefly in Slack. My team and I wrote and are currently maintaining this controller, as we're using it within a product that we're developing at Spectro Cloud. I would be happy to continue maintaining it if/when it is adopted by OCM.

@openshift-ci bot requested review from deads2k and qiujian16 on June 2, 2025 at 20:21

openshift-ci bot commented Jun 2, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: TylerGillson
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment


## Motivation

OCM's current installation procedure is imperative-only, requiring multiple manual operations, either via `clusteradm` or direct application of OCM custom resources to various Kubernetes clusters. The lack of a declarative interface for managing the lifecycle of an OCM multi-cluster has several limitations:
Member

clusteradm is the client tool, so it is meant to be imperative. But you can use the operator and the ClusterManager/Klusterlet APIs without clusteradm at all.

Member

Right. I think the proposed API and implementation outlined in this enhancement PR can still land in the clusteradm repo.


### MultiCluster API

The current state of the v1alpha1 `MultiCluster` API is shown below. It is fully functional and supports all of the capabilities outlined elsewhere in this enhancement proposal.

```go
// MultiClusterSpec defines the desired state of MultiCluster.
type MultiClusterSpec struct {
	Hub    Hub     `json:"hub"`
	Spokes []Spoke `json:"spokes"`
}
```
Member

What happens if a user wants to add or delete a managed cluster? Can they directly create/delete a ManagedCluster resource?

Author

Managed clusters can be added/removed by modifying the MultiCluster. The controller will execute the appropriate clusteradm commands to achieve the desired result, e.g. `clusteradm join` and `clusteradm unjoin` (against the spoke), `clusteradm accept` (against the hub), etc.
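
To make that concrete, here is a minimal sketch of what that add/remove reconciliation could look like if the controller shells out to the clusteradm binary. This is not the actual implementation: the `Spoke` fields, the helper signature, and the exact flag set are illustrative assumptions.

```go
package controller

import (
	"context"
	"fmt"
	"os/exec"
)

// Spoke is a stand-in for the real API type; only the fields used in this
// sketch are shown, and their names are assumptions.
type Spoke struct {
	Name           string
	KubeconfigPath string // how the controller reaches the spoke (assumption)
}

// reconcileSpokes diffs the desired spokes against those already joined and
// runs the matching clusteradm commands: join + accept for additions,
// unjoin for removals.
func reconcileSpokes(ctx context.Context, desired []Spoke, joined map[string]bool, hubToken, hubAPIServer string) error {
	desiredNames := make(map[string]bool, len(desired))
	for _, s := range desired {
		desiredNames[s.Name] = true
		if joined[s.Name] {
			continue // already a member
		}
		// `clusteradm join` runs against the spoke cluster...
		join := exec.CommandContext(ctx, "clusteradm", "join",
			"--hub-token", hubToken,
			"--hub-apiserver", hubAPIServer,
			"--cluster-name", s.Name,
			"--kubeconfig", s.KubeconfigPath,
			"--wait")
		if out, err := join.CombinedOutput(); err != nil {
			return fmt.Errorf("join %s: %w: %s", s.Name, err, out)
		}
		// ...then `clusteradm accept` runs against the hub.
		accept := exec.CommandContext(ctx, "clusteradm", "accept", "--clusters", s.Name)
		if out, err := accept.CombinedOutput(); err != nil {
			return fmt.Errorf("accept %s: %w: %s", s.Name, err, out)
		}
	}
	for name := range joined {
		if desiredNames[name] {
			continue
		}
		// Spoke was removed from the MultiCluster spec: `clusteradm unjoin`
		// runs against that spoke (kubeconfig resolution elided here).
		unjoin := exec.CommandContext(ctx, "clusteradm", "unjoin", "--cluster-name", name)
		if out, err := unjoin.CombinedOutput(); err != nil {
			return fmt.Errorf("unjoin %s: %w: %s", name, err, out)
		}
	}
	return nil
}
```

Delegating to the CLI, rather than reimplementing its logic, is the same code-reuse argument made later in this thread.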


### Graduation Criteria

TBD
Member

I think we should also iterate on some alternatives. For example:

The user can set global or per-cluster configs, so that when they create a ManagedCluster, the agent is deployed to the target cluster automatically.

Author
@TylerGillson Jun 3, 2025

The primary objective of the "clusteradm operator" is to bootstrap all required OCM components on both the hub and spokes. What you've mentioned sounds to me like it relies on OCM operators already being installed and running.

Member
@mikeshng Jun 5, 2025

@qiujian16 is referring to the ACM pattern for importing an existing cluster. There's an import controller that watches for the creation of a ManagedCluster, along with its namespace and a way to connect to the spoke cluster (like a kubeconfig Secret). Once those are in place, it can connect to the managed cluster, deploy the klusterlet operator, and create the Klusterlet CR to initiate the registration process.
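
For readers unfamiliar with that pattern, a rough controller-runtime sketch of the flow just described might look like the following. It is illustrative only: the kubeconfig Secret naming convention and the `deployKlusterlet` helper are assumptions, not the actual import controller code.

```go
package importer

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	clusterv1 "open-cluster-management.io/api/cluster/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type ImportReconciler struct {
	client.Client
}

func (r *ImportReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Triggered by the creation of a ManagedCluster on the hub.
	var mc clusterv1.ManagedCluster
	if err := r.Get(ctx, req.NamespacedName, &mc); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Wait for a way to connect to the spoke: here, a kubeconfig Secret in
	// the cluster namespace (the secret name is an assumed convention).
	var kubeconfig corev1.Secret
	key := types.NamespacedName{Namespace: mc.Name, Name: mc.Name + "-kubeconfig"}
	if err := r.Get(ctx, key, &kubeconfig); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Connect to the managed cluster, deploy the klusterlet operator, and
	// create the Klusterlet CR to initiate registration.
	return ctrl.Result{}, deployKlusterlet(ctx, kubeconfig.Data["kubeconfig"], mc.Name)
}

// deployKlusterlet is a hypothetical helper: build a client from the spoke
// kubeconfig, apply the klusterlet operator manifests, and create the
// Klusterlet CR.
func deployKlusterlet(ctx context.Context, kubeconfigBytes []byte, clusterName string) error {
	// elided
	return nil
}
```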

Member

I mean you will still need to run the "clusteradm operator" on the hub/spoke, and you still need to handle the versioning and lifecycle of that operator. It is not clear to me why a user cannot just install the OCM operators directly. This needs some clarification.

Author

Everything you said is true. The value the proposed operator adds is consolidation and minimizing manual steps. Users would have a single cluster, with network connectivity to all other clusters, that acts similarly to a CAPI management cluster. Say you have 20 clusters that you want to link together: with this approach, you would only need to install the multi-cluster operator on one cluster and apply a single CR.

Member
@qiujian16 Jun 6, 2025

The goal is very similar to what we did in the capi-importer work: https://github.com/open-cluster-management-io/ocm/tree/main/pkg/registration/hub/importer. The difference is that the join happens when a ManagedCluster is created on the hub. Would you take a look to see if the existing approach is good enough and can be extended to support your goal?

Author

Thanks for sharing that @qiujian16. A generic import provider that enqueues import requests when kubeconfig secrets with a certain annotation are created could be a valid alternative.

The biggest concern I have with that is that the registration controller image will become very large once support for arbitrary managed cluster kubeconfigs is added (due to aws-iam-authenticator, gcloud, python, etc.). The Cluster API kubeconfigs all use a token, but requiring a long-lived token for all spoke cluster kubeconfigs is a major limitation. Many orgs will be unwilling to take that approach for security reasons.

By splitting the proposed functionality into a separate, optional operator, we can avoid blowing up the size of the registration controller. Also, by using clusteradm within the proposed operator, we avoid code duplication. I can see that the importer is already doing some of the same things that occur during clusteradm join.


```go
// MultiClusterSpec defines the desired state of MultiCluster.
type MultiClusterSpec struct {
	Hub Hub `json:"hub"`
	// ... (remaining fields as shown above)
}
```
Member

Is this necessary? I think the user can just apply a ClusterManager resource.

Author

That assumes that the OCM hub operators are already installed to reconcile the ClusterManager. How would the OCM hub components be initialized in the scenario you're describing?

By running `clusteradm init`, the OCM hub components are installed and the ClusterManager is created.
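
For illustration, the hub-bootstrapping step in the proposed operator could be a thin wrapper around that command. A minimal sketch, assuming the controller shells out to clusteradm as described elsewhere in this proposal:

```go
package controller

import (
	"context"
	"fmt"
	"os/exec"
)

// initHub runs `clusteradm init` against the hub, installing the OCM hub
// components and creating the ClusterManager. Error handling is simplified.
func initHub(ctx context.Context, hubKubeconfigPath string) error {
	cmd := exec.CommandContext(ctx, "clusteradm", "init",
		"--wait", "--kubeconfig", hubKubeconfigPath)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("clusteradm init: %w: %s", err, out)
	}
	// clusteradm init prints the `clusteradm join` command, including the
	// hub token, which the controller would capture for joining spokes.
	return nil
}
```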

### Goals

- Provide declarative lifecycle management of an OCM multi-cluster
- Simplify and streamline the OCM onboarding and maintenance experience
Member

Users will need to install this clusteradm operator and API as the first step. Would you mind adding a bit more detail about how that installation would work in the KEP? For example, will it be via a Helm chart, a new clusteradm subcommand, or something else?

Author

The operator we've developed is currently installed via Helm. We could certainly extend clusteradm to support installing that Helm chart as well. Maybe `clusteradm fleet-operator-init`? Any command name proposals?

Member

Helm chart sounds good. We probably don't need the clusteradm sub-command for a while.
