KEP-5241: Beta Feature Gate Promotion Requirements #5242
@@ -0,0 +1,246 @@
# KEP-5241: Beta Feature Gate Promotion Requirements

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [User Stories (Optional)](#user-stories-optional)
- [Story 1](#story-1)
- [Story 2](#story-2)
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Prerequisite testing updates](#prerequisite-testing-updates)
- [Unit tests](#unit-tests)
- [Integration tests](#integration-tests)
- [e2e tests](#e2e-tests)
- [Graduation Criteria](#graduation-criteria)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
- [Monitoring Requirements](#monitoring-requirements)
- [Dependencies](#dependencies)
- [Scalability](#scalability)
- [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- [ ] e2e Tests for all Beta API Operations (endpoints)
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

<!--
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
-->

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

Feature gates must meet all functional, security, and testing requirements, and all identified
issues and gaps must be resolved, before the gates are enabled by default.
The only valid GA criterion is “all issues and gaps identified as feedback during beta are resolved”.

## Motivation

Feature gates that are enabled by default are enabled in every production Kubernetes cluster in the world.
We must avoid turning every production cluster into a testing cluster for unstable or incomplete features.
Even feature gates that make flags accessible but require a secondary configuration to use must be
stable, because it is unrealistic to expect everyone to understand the graduation stage of every flag
in each release: the only stages that really matter are "takes enabling an explicit alpha feature gate"
and "my production cluster accepts this as valid by default".

### Goals

* Feature gates must meet all functional, security, and testing requirements, and all identified
  issues and gaps must be resolved, before the gates are enabled by default.
* The only valid GA criterion is “all issues and gaps identified as feedback during beta are resolved”.

### Non-Goals

* Changing the rules for beta APIs being off by default.
* Changing the imperfect mechanisms we have for API evolution.

## Proposal

Kubernetes feature gates have three levels: GA, Beta, and Alpha.
1. GA means that a feature gate is unconditionally enabled in all production Kubernetes clusters and
   that the feature cannot be disabled.
2. Beta means that a feature gate is usually enabled by default in all production Kubernetes clusters
   and that the feature can be disabled.
   Exceptions exist for entirely new APIs and some node features, but this is broadly the case.
3. Alpha means that a feature gate is disabled by default in all production Kubernetes clusters and
   can be optionally enabled by setting the `--feature-gates` command line argument.

Review comment on lines +186 to +188: This tends to confuse readers and new developers who are not familiar with the duality of API versions (v1alpha1, v1beta1, v1) and feature gate stages (alpha, beta, GA), and with the coupling of both lifecycles. Because of the restriction that beta APIs are disabled by default, feature gates that are beta and depend on beta API versions are disabled by default too.
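
The three levels listed in the Proposal can be sketched as a small model. This is only an illustration of the definitions as this document states them, not the Kubernetes implementation: the `Stage`, `FeatureGate`, and `is_enabled` names, the `off_by_default_exception` field, and the example gate names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class Stage(Enum):
    ALPHA = "alpha"
    BETA = "beta"
    GA = "GA"


@dataclass(frozen=True)
class FeatureGate:
    name: str
    stage: Stage
    # Hypothetical field for the exceptions mentioned in the text (for example,
    # a beta gate tied to an entirely new API), which stay off by default.
    off_by_default_exception: bool = False

    @property
    def enabled_by_default(self) -> bool:
        if self.stage is Stage.ALPHA:
            return False
        if self.stage is Stage.BETA:
            return not self.off_by_default_exception
        return True  # GA

    @property
    def can_be_disabled(self) -> bool:
        # Per the definitions in this document, GA gates cannot be turned off.
        return self.stage is not Stage.GA


def is_enabled(gate: FeatureGate, overrides: Mapping[str, bool]) -> bool:
    """Resolve a gate's effective state from its default plus admin overrides."""
    if gate.name not in overrides:
        return gate.enabled_by_default
    requested = overrides[gate.name]
    if not requested and not gate.can_be_disabled:
        raise ValueError(f"{gate.name} is GA and cannot be disabled")
    return requested
```

In this model, only a Beta gate is both on by default and still able to be switched off, which is the safety valve the rest of the proposal relies on.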

Making the jump to GA (cannot be disabled) without actual field experience is irresponsible.
The first time a feature gate is enabled by default in production Kubernetes clusters, we must
have a way to disable the feature in case of unexpected stability, performance, or security issues.

Enabling incomplete features in production Kubernetes clusters by default is irresponsible.
Features that are known to be incomplete naturally bring additional stability, performance, and security issues.
Once a feature has been enabled by default in production Kubernetes clusters, adding to it carries
greater risk for upgrading clusters and for the ecosystem.
Workloads and other platform extensions can easily have come to rely on the feature.
If adding those capabilities causes a stability, performance, or security problem, the
cost of disabling the feature in a cluster becomes significantly greater, and disabling it breaks existing
clusters, workloads, and use cases.
This posture makes upgrades higher risk than necessary.

To balance these concerns, we are changing how we evaluate Beta and GA stability criteria.
The only valid GA criterion is “all issues and gaps identified as feedback during beta are resolved”.
Promotion from Beta to GA must be zero-diff for the release.
This means that Beta criteria must include all functional, security, and testing requirements, along
with resolving all issues and gaps identified prior to beta.

Review comment: I think we should also add monitoring, and I know that @lmktfy has mentioned requiring satisfactory documentation in beta before we promote to GA.
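
The rule above can be expressed as a simple check over a KEP's stated criteria. The sketch below is a hypothetical illustration, not tooling that exists in kubernetes/enhancements: the function name, its parameters, and the way criteria are represented as strings are all assumptions made for the example.

```python
# The single GA criterion this document allows, quoted from the text.
ALLOWED_GA_CRITERION = (
    "all issues and gaps identified as feedback during beta are resolved"
)
# Requirement areas that must already be complete at Beta.
REQUIRED_BETA_AREAS = {"functional", "security", "testing"}


def check_graduation_criteria(beta_areas_met, unresolved_pre_beta_issues, ga_criteria):
    """Return a list of problems; an empty list means the criteria follow the rule."""
    problems = []
    missing = REQUIRED_BETA_AREAS - set(beta_areas_met)
    if missing:
        problems.append("beta criteria missing: " + ", ".join(sorted(missing)))
    if unresolved_pre_beta_issues > 0:
        problems.append(
            f"{unresolved_pre_beta_issues} issue(s) or gap(s) unresolved before beta"
        )
    # Anything other than the allowed statement implies work deferred to GA.
    extra = [c for c in ga_criteria if c.strip().lower() != ALLOWED_GA_CRITERION]
    if extra:
        problems.append("GA criteria beyond beta feedback: " + "; ".join(extra))
    return problems
```

A KEP that defers any new work to GA (for example, a hypothetical "add metrics" item) would be flagged, because that work would have to land before the gate is enabled by default.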

Larger features can be phased in over time by bringing separate feature gates through alpha, beta, and GA.
Each feature gate needs to meet the beta and GA criteria for completeness, functionality, security, and testing.
After meeting the criteria for being enabled by default, and at the SIG's discretion, the new feature gate could be
set to enabled by default in the release in which it is introduced.
Importantly, the features need to behave in a way that allows old and new clients to interoperate, and new additions
to larger features must be independently disablable, each with its own path to GA.
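
As a minimal sketch of what "independently disablable" means in practice, the example below puts a later addition behind its own gate. The feature, the gate names `Widget` and `WidgetExtras`, and the `render_widget` function are hypothetical and exist only for illustration.

```python
def render_widget(gates):
    """Hypothetical feature whose later addition sits behind its own gate.

    `gates` maps gate names to explicit on/off overrides; missing names fall
    back to the defaults assumed here.
    """
    # Original gate, assumed to be enabled by default.
    if not gates.get("Widget", True):
        return None
    output = {"widget": "basic"}
    # Later addition, gated separately and assumed off by default for now.
    if gates.get("WidgetExtras", False):
        output["extras"] = "enabled"
    return output
```

Turning off only `WidgetExtras` restores the earlier behavior without touching `Widget`, so a problem in the addition never forces an administrator to disable the feature that workloads already rely on.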

### Risks and Mitigations

#### What if I need to add capability to my feature?
To handle this situation, we described above how to add a second feature gate for the new behavior.
This provides a mechanism for adding needed capability while ensuring that
cluster admins never end up stuck after an upgrade because they rely on v1.Y-1 behavior that new capability
in v1.Y broke under the same feature gate.

#### Who will make sure that new KEPs follow the promotion rules?
We'll adjust the KEP template to indicate the allowed criteria, so authors should notice.
SIG approvers should enforce those standards.
PRR approvers can be a final backstop.

Review comment: Only for new features. Authors of existing KEPs will only notice if they actively sync with the KEP template (rarely done) or if we announce this KEP widely.

### Graduation Criteria

Once merged, this document is our new position until it is superseded by another position statement.

## Drawbacks

### This may slow the rate at which new features are promoted.
For this to be true, we would have to have previously enabled feature gates in production that were knowingly
incomplete in functionality, security, or testing, or that had known bugs.
We hope this was not the common case, but if it was common enough to have an impact, we're pleased that
the result is preventing incomplete feature gates from being enabled in production clusters.

## Alternatives

None proposed so far.

@@ -0,0 +1,31 @@
title: Beta Feature Gate Promotion Requirements
kep-number: 5241
authors:
  - "@deads2k"
owning-sig: sig-architecture
participating-sigs:
status: implemented
creation-date: 2025-04-02
reviewers:
  - "@liggitt"
  - "@thockin"
approvers:
  - "@johnbelamaric"
  - "@dims"
  - "@derekwaynecarr"

see-also:
  - "/keps/sig-architecture/3136-beta-apis-off-by-default"
replaces:

stage: stable

latest-milestone: "v1.34"

milestone:
  stable: "v1.34"

feature-gates:
disable-supported: false

metrics:

Review comment: We have two levels here: GA, and GA locked by default. Without being locked by default, it can be disabled, right?