Description
Running the following benchmark on my laptop:
const openapipath = "api/openapi-spec/swagger.json" // https://github.com/kubernetes/kubernetes/blob/master/api/openapi-spec/swagger.json

func BenchmarkJsonUnmarshalSwagger(b *testing.B) {
	content, err := os.ReadFile(openapipath)
	if err != nil {
		b.Fatalf("Failed to open file: %v", err)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		t := spec.Swagger{}
		err := json.Unmarshal(content, &t)
		if err != nil {
			b.Fatalf("Failed to unmarshal: %v", err)
		}
	}
}

func BenchmarkJsonUnmarshalInterface(b *testing.B) {
	content, err := os.ReadFile(openapipath)
	if err != nil {
		b.Fatalf("Failed to open file: %v", err)
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		t := map[string]interface{}{}
		err := json.Unmarshal(content, &t)
		if err != nil {
			b.Fatalf("Failed to unmarshal: %v", err)
		}
	}
}
Yields the following results:
apelisse ~/code/kubernetes/bench_api_parsing> go test -bench=. .
goos: darwin
goarch: amd64
pkg: k8s.io/kubernetes/bench_api_parsing
cpu: Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
BenchmarkJsonUnmarshalSwagger-8 3 529833040 ns/op
BenchmarkJsonUnmarshalInterface-8 37 29475772 ns/op
PASS
ok k8s.io/kubernetes/bench_api_parsing 5.013s
For readability purposes, that's about 530ms and 29ms respectively, roughly an 18x difference.
For context, this is about spec.Swagger, the OpenAPI v2 definition, which is mostly a clone of go-openapi.
After a short investigation, the problem seems fairly obvious: the arbitrary vendor extensions (as defined by OpenAPI) force the JSON to be deserialized multiple times, at many different levels within the object, causing the deserialization into spec.Swagger to reach O(n²) complexity (my maths is probably dubious).
Vendor extensions can appear at many different layers in the OpenAPI object, e.g. in:
- spec.Swagger
- spec.Header
- spec.Paths
- spec.Operations
- And many others ...
The problem, or lack of good solutions, comes from the rigid API (UnmarshalJSON(data []byte) error) that forces the custom unmarshaler to receive a byte slice rather than an already decoded or intermediate representation. Deserialization methods that use more flexible APIs, like the YAML v3 parser (UnmarshalYAML(value *yaml.Node) error), do not suffer from the same problem, as highlighted through #279 from @alexzielenski.
This bug, which was improperly understood until now, has had various consequences on the entire Kubernetes ecosystem for the last 5 years:
- Because deserializing into spec.Swagger was unacceptably slow for frequently invoked command-line tools, kubectl decided to use gnostic/protobuf even though the gnostic type is grossly unusable.
- "Add direct conversion from Gnostic v2 types to spec.Swagger" #283 was written to transform gnostic into spec.Swagger efficiently, but the Swagger to gnostic conversion would also be needed, as well as an OpenAPI v3 version.
- Many issues have been written about poor Kubernetes apiserver performance related to parsing/serializing OpenAPI into/from JSON within the server, with various work-arounds like lazy-marshaling, e.g. "Lazy marshaling for OpenAPI v2 spec" #251.
Much of this was noticed by customers, users and Kubernetes providers, as the evidence shows:
- https://jonnylangefeld.com/blog/the-kubernetes-discovery-cache-blessing-and-curse
- https://blog.upbound.io/scaling-kubernetes-to-thousands-of-crds/
- https://www.youtube.com/watch?v=jYiLN0vmncw
- Found by @alexzielenski in the CRD GA Kep:
Note: The Custom Resource Definition suggested maximum limit was selected not due to the above SLI/SLOs, but instead due to the latency OpenAPI publishing, which is a background process that occurs asychroniously each time a Custom Resource Definition schema is updated. For 500 Custom Resource Definitions it takes slightly over 35 seconds for a definition change to be visible via the OpenAPI spec endpoint.
For now, the solution discussed with @liggitt is to create a new UnmarshalUnstructured(interface{}) error interface that could replace the slow UnmarshalJSON interface, maybe like the following:
// UnmarshalJSON unmarshals a swagger spec from json
func (s *Swagger) UnmarshalJSON(data []byte) error {
	var sw Swagger
	// Decode the bytes only once, into a generic value.
	var i interface{}
	if err := json.Unmarshal(data, &i); err != nil {
		return err
	}
	// Populate both embedded parts from the already decoded value.
	if err := FromUnstructured(i, &sw.SwaggerProps); err != nil {
		return err
	}
	if err := FromUnstructured(i, &sw.VendorExtensible); err != nil {
		return err
	}
	*s = sw
	return nil
}
And FromUnstructured would automatically call the UnmarshalUnstructured methods when available. One drawback is that this approach forces the JSON to be deserialized into a map[string]interface{} first and then copied, which is possibly slower than deserializing into the object directly.
One final remark: the exact same problem also applies to serialization/marshaling, though it is less critical.