Sporadic Keycloak issues #573

Open
@jchristgit

Description

For a while, we have been receiving sporadic reports about Keycloak not working properly, both via Alertmanager and various other communication channels.

Investigation today revealed that this is likely related to the vault-agent sidecar container that runs in every Keycloak pod. This container regularly crashes with the following error:

2025-03-24T19:23:58.026Z [ERROR] agent: runtime error encountered:
  error=
  | template server: vault.write(internal-tls/issue/internal-tls -> fb6ab102): vault.write(internal-tls/issue/internal-tls -> fb6ab102): Error making API request.
  |
  | URL: PUT http://vault.vault.svc:8200/v1/internal-tls/issue/internal-tls
  | Code: 400. Errors:
  |
  | * cannot satisfy request, as TTL would result in notAfter of 2025-07-22T19:23:58.023842036Z that is beyond the expiration of the CA certificate at 2025-06-26T23:39:49Z
   exitCode=1
Error encountered during run, refer to logs for more details.

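The error shows that Vault refuses to issue the certificate because the requested TTL would place its notAfter (2025-07-22) after the expiration of the CA certificate (2025-06-26). The Vault CA certificate is therefore presumably the problem here; it might have been configured with an expiration of 1 year when Vault was installed.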
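To make the timing concrete, here is a minimal Python sketch that reproduces the comparison Vault reports, using only the three timestamps from the log above. It assumes the log line's timestamp approximates the time of the issue request; the 120-day TTL it derives is inferred from the log, not read from the actual role configuration. The helper names `parse_utc` and `can_issue` are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp, dropping any fractional seconds."""
    base = ts.rstrip("Z").split(".")[0]
    return datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def can_issue(not_after: datetime, ca_expiry: datetime) -> bool:
    """A leaf certificate must not outlive the CA that signs it."""
    return not_after <= ca_expiry


# Timestamps copied from the vault-agent error above.
requested_at = parse_utc("2025-03-24T19:23:58.026Z")       # log timestamp (assumed ~ request time)
not_after = parse_utc("2025-07-22T19:23:58.023842036Z")    # notAfter the request would produce
ca_expiry = parse_utc("2025-06-26T23:39:49Z")              # CA certificate expiration

requested_ttl = not_after - requested_at   # TTL implied by the request
ca_remaining = ca_expiry - requested_at    # longest TTL the CA could still cover
overshoot = not_after - ca_expiry          # how far the request exceeds the CA

print(f"implied TTL:       {requested_ttl}")
print(f"CA lifetime left:  {ca_remaining}")
print(f"overshoot:         {overshoot}")
print(f"request allowed:   {can_issue(not_after, ca_expiry)}")
```

Under these assumptions, the implied TTL is 120 days while the CA had about 94 days left at the time of the request, so any request with that TTL fails until the CA certificate is renewed or the TTL is shortened.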

The Keycloak pod was created 43 days ago and has been restarted 3892 times since, which averages roughly 90 restarts per day.

Keycloak itself logged no major problems during the same timeframe.

Action items

  • Fix the current issue
  • Document how to fix this issue in the future in a runbook
  • Expand the documentation in kubernetes/namespaces/vault/README.md as applicable

Out of scope for now

  • Configure metrics endpoint for Vault to monitor for CA certificate lifetime (will open separate issue)
  • Police DevOps members to take alerts seriously (electric shock therapy conflicts with Chris' pacemaker)
  • Remove Keycloak
  • Remove Vault

Metadata

Assignees

Labels

  • component: networking - An issue relating to a host networking (e.g. DNS, WireGuard, SSH)
  • component: services - An issue relating to a Python Discord service (e.g. Bot, Site, Lancebot)
  • group: docs - Issues and pull requests related to our documentation
  • group: kubernetes - Issues and pull requests related to the Kubernetes setup

Projects

Status

Up next
