Sporadic Keycloak issues

For a while, we have been receiving sporadic reports about Keycloak not working properly, both via Alertmanager and various other communication channels.

Investigation today revealed that this is likely related to the `vault-agent` sidecar container that runs in every Keycloak pod. This container regularly crashes with the following error:

```sh
2025-03-24T19:23:58.026Z [ERROR] agent: runtime error encountered:
  error=
  | template server: vault.write(internal-tls/issue/internal-tls -> fb6ab102): vault.write(internal-tls/issue/internal-tls -> fb6ab102): Error making API request.
  |
  | URL: PUT http://vault.vault.svc:8200/v1/internal-tls/issue/internal-tls
  | Code: 400. Errors:
  |
  | * cannot satisfy request, as TTL would result in notAfter of 2025-07-22T19:23:58.023842036Z that is beyond the expiration of the CA certificate at 2025-06-26T23:39:49Z
   exitCode=1
Error encountered during run, refer to logs for more details.
```

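The error shows that the requested certificate would outlive the CA that signs it: the requested `notAfter` (2025-07-22) lies beyond the expiration of the CA certificate behind `internal-tls` (2025-06-26), so Vault refuses to issue it. Presumably, the CA certificate was configured with a lifetime of 1 year when Vault was installed.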
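To make the mismatch concrete, here is a small sketch that redoes the arithmetic from the error message. The timestamps are copied from the log above and truncated to whole seconds; the function names are made up for illustration. The "failure onset" is only an inference and assumes the same TTL has been requested all along, which the issue does not confirm.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the vault-agent error above (truncated to seconds).
REQUEST_TIME = datetime(2025, 3, 24, 19, 23, 58, tzinfo=timezone.utc)
REQUESTED_NOT_AFTER = datetime(2025, 7, 22, 19, 23, 58, tzinfo=timezone.utc)
CA_NOT_AFTER = datetime(2025, 6, 26, 23, 39, 49, tzinfo=timezone.utc)


def implied_ttl(request_time: datetime, requested_not_after: datetime) -> timedelta:
    """TTL that the request effectively asked for."""
    return requested_not_after - request_time


def max_allowed_ttl(request_time: datetime, ca_not_after: datetime) -> timedelta:
    """Longest TTL that still ends before the CA certificate expires."""
    return ca_not_after - request_time


def failure_onset(ca_not_after: datetime, ttl: timedelta) -> datetime:
    """Earliest time at which a request with this TTL outlives the CA.

    Assumption: the TTL was constant over time (hypothetical, not confirmed).
    """
    return ca_not_after - ttl


if __name__ == "__main__":
    ttl = implied_ttl(REQUEST_TIME, REQUESTED_NOT_AFTER)
    print("requested TTL:     ", ttl)
    print("max allowed TTL:   ", max_allowed_ttl(REQUEST_TIME, CA_NOT_AFTER))
    print("failing since about:", failure_onset(CA_NOT_AFTER, ttl).isoformat())
```

With the values from the log, the requested TTL works out to 120 days, while the CA certificate had only about 94 days left at the time of the request. Under the constant-TTL assumption, requests would have started failing around 2025-02-26.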

In the 43 days since the Keycloak pod was created, it has accumulated 3892 restarts, which is roughly 90 per day on average.

Keycloak's own logs show no significant problems during the same timeframe.


## Action items

- [ ] Fix the current issue
- [ ] Document how to fix this issue in the future in a runbook
- [ ] Expand the documentation in `kubernetes/namespaces/vault/README.md` as applicable


## Out of scope for now

- Configure metrics endpoint for Vault to monitor for CA certificate lifetime (will open separate issue)
- Police DevOps members to take alerts seriously (electric shock therapy conflicts with Chris' pacemaker)
- Remove Keycloak
- Remove Vault


Sporadic Keycloak issues #573
