
Deserializing into spec.Swagger is almost 20x slower than deserializing into map[string]interface{} #315

Description

@apelisse

Running the following benchmark on my laptop:

package bench_api_parsing

// Assumed imports for the snippet; spec is the kube-openapi spec package.
import (
	"encoding/json"
	"os"
	"testing"

	"k8s.io/kube-openapi/pkg/validation/spec"
)

const openapipath = "api/openapi-spec/swagger.json" // https://github.com/kubernetes/kubernetes/blob/master/api/openapi-spec/swagger.json

func BenchmarkJsonUnmarshalSwagger(b *testing.B) {
	content, err := os.ReadFile(openapipath)
	if err != nil {
		b.Fatalf("Failed to open file: %v", err)
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		t := spec.Swagger{}
		err := json.Unmarshal(content, &t)
		if err != nil {
			b.Fatalf("Failed to unmarshal: %v", err)
		}
	}
}

func BenchmarkJsonUnmarshalInterface(b *testing.B) {
	content, err := os.ReadFile(openapipath)
	if err != nil {
		b.Fatalf("Failed to open file: %v", err)
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		t := map[string]interface{}{}
		err := json.Unmarshal(content, &t)
		if err != nil {
			b.Fatalf("Failed to unmarshal: %v", err)
		}
	}
}

Yields the following results:

apelisse ~/code/kubernetes/bench_api_parsing> go test -bench=. .
goos: darwin
goarch: amd64
pkg: k8s.io/kubernetes/bench_api_parsing
cpu: Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
BenchmarkJsonUnmarshalSwagger-8     	       3	 529833040 ns/op
BenchmarkJsonUnmarshalInterface-8   	      37	  29475772 ns/op
PASS
ok  	k8s.io/kubernetes/bench_api_parsing	5.013s

For readability, that's 529 ms vs. 29 ms per operation, respectively.

For context, this is about spec.Swagger, the OpenAPI v2 definition, which is mostly a clone of the go-openapi types.

After a short investigation, the problem seems fairly obvious: the arbitrary vendor extensions (as defined by OpenAPI) force the JSON to be deserialized multiple times, at many different levels within the object, causing deserialization into spec.Swagger to approach O(n²) complexity (my math is probably dubious).
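
Roughly, the current pattern looks like this (a simplified sketch, not the exact spec-package code; assumes import ( "encoding/json"; "strings" )):

// Simplified sketch of the existing pattern: the same byte slice is parsed
// once for the typed properties and again, in full, for the extensions.
func (s *Swagger) UnmarshalJSON(data []byte) error {
	var sw Swagger
	if err := json.Unmarshal(data, &sw.SwaggerProps); err != nil { // first full parse
		return err
	}
	if err := json.Unmarshal(data, &sw.VendorExtensible); err != nil { // second full parse
		return err
	}
	*s = sw
	return nil
}

func (v *VendorExtensible) UnmarshalJSON(data []byte) error {
	// Yet another full parse of the same bytes, only to pick out the "x-" keys.
	var d map[string]interface{}
	if err := json.Unmarshal(data, &d); err != nil {
		return err
	}
	for k, val := range d {
		if strings.HasPrefix(strings.ToLower(k), "x-") {
			if v.Extensions == nil {
				v.Extensions = map[string]interface{}{}
			}
			v.Extensions[k] = val
		}
	}
	return nil
}

Every nested type that embeds VendorExtensible repeats this, so inner portions of the document end up being parsed again and again.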

Vendor extensions can appear at many different layers of the OpenAPI object (the root Swagger object, Info, Paths, Operations, Parameters, Responses, Schemas, etc.).

The problem, or the lack of good solutions, comes from the rigid API (UnmarshalJSON(data []byte) error) that forces the custom unmarshaler to receive a raw byte slice rather than an already decoded or intermediate representation. Deserialization methods that use more flexible APIs, like the YAML v3 parser's (UnmarshalYAML(value *yaml.Node) error), do not suffer from the same problem, as highlighted in #279 by @alexzielenski.
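
By contrast, with the yaml.v3 style of API the parser hands the unmarshaler an already-built node tree, so extracting the extensions is a walk over parsed nodes rather than another parse. A minimal sketch, assuming gopkg.in/yaml.v3 and the same VendorExtensible shape (this is only an illustration, not code from #279):

// Sketch only (assumes import ( "fmt"; "strings"; "gopkg.in/yaml.v3" )).
func (v *VendorExtensible) UnmarshalYAML(value *yaml.Node) error {
	if value.Kind != yaml.MappingNode {
		return fmt.Errorf("expected a mapping node, got kind %v", value.Kind)
	}
	// A mapping node stores keys and values as alternating entries in Content.
	for i := 0; i+1 < len(value.Content); i += 2 {
		key, val := value.Content[i], value.Content[i+1]
		if !strings.HasPrefix(strings.ToLower(key.Value), "x-") {
			continue
		}
		var decoded interface{}
		if err := val.Decode(&decoded); err != nil {
			return err
		}
		if v.Extensions == nil {
			v.Extensions = map[string]interface{}{}
		}
		v.Extensions[key.Value] = decoded
	}
	return nil
}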

This bug, which was poorly understood until now, has had various consequences for the entire Kubernetes ecosystem over the last five years:

  1. Because deserializing into spec.Swagger was unacceptably slow for frequently invoked command-line tools, kubectl decided to use gnostic/protobuf even though the gnostic type is grossly unusable.
  2. Add direct conversion from Gnostic v2 types to spec.Swagger #283 was written to transform gnostic into spec.Swagger efficiently, but the reverse Swagger-to-gnostic conversion would also be needed, as well as an OpenAPI v3 version.
  3. Many issues have been filed about poor Kubernetes apiserver performance related to parsing/serializing OpenAPI to/from JSON within the server, with various workarounds like lazy marshaling, e.g. Lazy marshaling for OpenAPI v2 spec #251.

Much of this was noticed by customers, users, and Kubernetes providers, as the evidence shows:

Note: The Custom Resource Definition suggested maximum limit was selected not due to the above SLIs/SLOs, but instead due to the latency of OpenAPI publishing, which is a background process that occurs asynchronously each time a Custom Resource Definition schema is updated. For 500 Custom Resource Definitions it takes slightly over 35 seconds for a definition change to be visible via the OpenAPI spec endpoint.

For now, the solution discussed with @liggitt is to create a new UnmarshalUnstructured(interface{}) error interface that could replace the slow UnmarshalJSON interface, maybe like the following:

// UnmarshalJSON unmarshals a swagger spec from json
func (s *Swagger) UnmarshalJSON(data []byte) error {
	var sw Swagger
	// Parse the JSON only once, into a generic value.
	var i interface{}
	if err := json.Unmarshal(data, &i); err != nil {
		return err
	}
	if err := FromUnstructured(i, &sw.SwaggerProps); err != nil {
		return err
	}
	if err := FromUnstructured(i, &sw.VendorExtensible); err != nil {
		return err
	}
	*s = sw
	return nil
}

FromUnstructured would then automatically call the UnmarshalUnstructured method when available. One drawback is that this forces the data to be deserialized into a map[string]interface{} first and then copied, which is possibly slower than deserializing into the object directly.
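
To make that concrete, here is a rough sketch of what the interface and one implementation could look like; the UnstructuredUnmarshaler name and the exact dispatch inside FromUnstructured are assumptions, not an agreed design:

// Hypothetical interface; the name and signature follow the proposal above.
type UnstructuredUnmarshaler interface {
	UnmarshalUnstructured(data interface{}) error
}

// Sketch of how VendorExtensible could consume the already-decoded value
// directly, with no further JSON parsing involved.
// (assumes import ( "fmt"; "strings" ))
func (v *VendorExtensible) UnmarshalUnstructured(data interface{}) error {
	m, ok := data.(map[string]interface{})
	if !ok {
		return fmt.Errorf("expected map[string]interface{}, got %T", data)
	}
	for k, val := range m {
		if !strings.HasPrefix(strings.ToLower(k), "x-") {
			continue
		}
		if v.Extensions == nil {
			v.Extensions = map[string]interface{}{}
		}
		v.Extensions[k] = val
	}
	return nil
}

FromUnstructured would then check for this interface first and fall back to its generic field-copying behavior otherwise.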

One final remark: the exact same problem also applies to serialization/marshaling, though it is less critical there.
