Secrets Management Strategy for Hybrid Kubernetes & Platform Workloads

Status

Superseded by ADR-0020. This ADR remains as historical context for the original governance pattern and the move to external secret stores plus pull-based synchronisation.


1. Context

HybridOps.Studio spans:

  • On-prem RKE2 clusters running on enterprise hypervisors (ADR-0202, ADR-0204).
  • Supporting platform services (for example Jenkins, NetBox, PostgreSQL) and future cloud clusters.

Early iterations used:

  • Raw Kubernetes Secret manifests with inline values in non-production.
  • Ad-hoc .env files during bootstrap phases.

This approach is hard to scale and hard to audit:

  • Secrets can leak into Git history or local artefacts.
  • Rotation requires manual edits across multiple places.
  • It is hard to prove where secret values live and how they are controlled.

The platform requires a governed approach that:

  • Keeps secret values out of Git.
  • Uses external, audit-capable secret stores as the steady-state source of truth.
  • Works across on-prem and cloud deployments.
  • Integrates with GitOps and CI/CD workflows.
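As an illustration of the first requirement, a repository check could flag manifests that still carry inline values. The following is a minimal sketch, not the platform's actual tooling: the function names `has_inline_secret` and `scan_tree` are hypothetical, and the detection is a line-based heuristic (no YAML parser), so it only recognises unquoted, top-level `kind: Secret` with a top-level `data:` or `stringData:` key.

```python
import re
from pathlib import Path

# Matches a top-level "kind: Secret" line (not ExternalSecret, SecretStore, etc.).
KIND_SECRET = re.compile(r"^kind:\s*Secret\s*$", re.MULTILINE)
# Matches a top-level "data:" or "stringData:" key, which is where inline values live.
INLINE_VALUES = re.compile(r"^(data|stringData):", re.MULTILINE)
# Splits a multi-document YAML file on "---" separator lines.
DOC_SEPARATOR = re.compile(r"^---\s*$", re.MULTILINE)


def has_inline_secret(manifest_text: str) -> bool:
    """Return True if any YAML document looks like a Secret with inline values."""
    for doc in DOC_SEPARATOR.split(manifest_text):
        if KIND_SECRET.search(doc) and INLINE_VALUES.search(doc):
            return True
    return False


def scan_tree(root: str) -> list[str]:
    """Return the sorted paths of .yaml/.yml files under root that look like inline Secrets."""
    offenders = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in {".yaml", ".yml"}:
            if has_inline_secret(path.read_text(encoding="utf-8")):
                offenders.append(str(path))
    return sorted(offenders)
```

A pre-commit hook or CI job could fail when `scan_tree` returns a non-empty list. Because the check is heuristic, it complements rather than replaces a dedicated secret scanner.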

2. Decision

HybridOps.Studio adopts the following secrets management strategy:

  • External secret stores are authoritative for steady-state secret values

      • Azure Key Vault is the primary store for application and platform secret values (ADR-0020, ADR-0502).
      • Kubernetes Secret objects are runtime projections and are not primary storage.

  • Kubernetes receives secrets via pull-based synchronisation

      • External Secrets Operator (ESO) syncs secret values from Azure Key Vault into Kubernetes Secret objects (ADR-0502).
      • Git contains ExternalSecret resources and wiring, not secret values.

  • Git holds references, not values

      • Plaintext .env files are restricted to local bootstrap connectivity and are not committed.
      • An encrypted vault bundle (control/secrets.vault.env) is permitted to enable non-interactive bootstrap, CI, and DR automation. It must not be used as a steady-state application secret store.

  • Technology-specific implementation lives in category ADRs

      • ADR-0502 defines the primary RKE2 implementation (AKV + ESO).
      • ADR-0020 defines the overall hierarchy (AKV steady state, encrypted vault bundle for bootstrap/CI/DR automation, SOPS optional policy artefact).

This ADR defines the governance pattern. Implementation specifics are handled in ADR-0502 and ADR-0020.
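To make "Git holds references, not values" concrete, the sketch below builds an ExternalSecret manifest as a Python dictionary and prints it as JSON (which `kubectl apply -f` also accepts). It is an illustration under stated assumptions, not the manifest defined in ADR-0502: the helper `external_secret`, the resource name `netbox-db`, the namespace `netbox`, the store name `azure-key-vault`, the Key Vault secret name `netbox-db-password`, the one-hour refresh interval, and the `external-secrets.io/v1beta1` API version are all hypothetical and should be matched to the installed ESO release.

```python
import json


def external_secret(name: str, namespace: str, store: str, keys: dict[str, str]) -> dict:
    """Build an ExternalSecret that maps Secret keys to secret names in the external store."""
    return {
        # Assumption: adjust the API version to the ESO release in use.
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",  # how often ESO re-reads the external store
            "secretStoreRef": {"kind": "ClusterSecretStore", "name": store},
            "target": {"name": name},  # the Kubernetes Secret that ESO creates
            "data": [
                {"secretKey": key, "remoteRef": {"key": remote_name}}
                for key, remote_name in sorted(keys.items())
            ],
        },
    }


# Hypothetical example: only the name of the Key Vault entry appears, never its value.
manifest = external_secret(
    name="netbox-db",
    namespace="netbox",
    store="azure-key-vault",
    keys={"password": "netbox-db-password"},
)
print(json.dumps(manifest, indent=2))
```

The committed artefact names where the value lives and which Secret key it should populate. Rotating the value in Azure Key Vault therefore requires no change in Git.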


3. Rationale

  • Security and auditability

      • External secret stores provide RBAC and audit logs.
      • Rotation is performed at the source of truth rather than across multiple manifests.

  • Separation of concerns

      • Git holds desired wiring; secret stores hold values.
      • Clusters and pipelines consume secrets but do not own the source of truth.

  • Hybrid readiness

      • The approach applies consistently across on-prem and cloud clusters.

  • Evidence and traceability

      • Clear artefact trail for secret creation, sync, and consumption without exposing secret values.

4. Consequences

4.1 Positive consequences

  • Reduced risk of secret leakage into Git repositories.
  • Consistent pattern across environments: external store + operator.
  • Clear ADR layering:

      • Governance pattern (this ADR, historical).
      • Concrete implementation (ADR-0502).
      • End-to-end secrets hierarchy and runner-driven automation model (ADR-0020).

4.2 Negative consequences / risks

  • Operational dependency on external secret stores and the operator.
  • Bootstrap requires an initial trust anchor (service principal, token, or runner identity).
  • Additional documentation required for onboarding, rotation, and validation flows.

Mitigations:

  • Runbooks for operator bootstrap, identity setup, and end-to-end validation.
  • CI workflows that validate secret sync health without printing secret values.
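A CI check of the second kind could inspect only the status of ExternalSecret resources and never read Secret data. The sketch below assumes its input is the parsed output of `kubectl get externalsecrets --all-namespaces -o json` and that ESO reports a `Ready` condition on each resource. The function name `sync_problems` and the sample resources are hypothetical.

```python
def sync_problems(external_secrets: dict) -> list[str]:
    """Return one message per ExternalSecret that is not Ready.

    Only metadata and status conditions are read, so no secret value
    can end up in the CI log.
    """
    problems = []
    for item in external_secrets.get("items", []):
        meta = item.get("metadata", {})
        ident = f"{meta.get('namespace', '?')}/{meta.get('name', '?')}"
        conditions = item.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None:
            problems.append(f"{ident}: no Ready condition reported")
        elif ready.get("status") != "True":
            problems.append(f"{ident}: not ready ({ready.get('reason', 'unknown')})")
    return problems


# Hypothetical sample in the shape of a kubectl list response.
sample = {
    "items": [
        {
            "metadata": {"namespace": "netbox", "name": "netbox-db"},
            "status": {"conditions": [{"type": "Ready", "status": "True", "reason": "SecretSynced"}]},
        },
        {
            "metadata": {"namespace": "jenkins", "name": "jenkins-admin"},
            "status": {"conditions": [{"type": "Ready", "status": "False", "reason": "SecretSyncedError"}]},
        },
    ]
}
for message in sync_problems(sample):
    print(message)
```

A pipeline step could exit non-zero whenever the returned list is non-empty, which surfaces broken synchronisation without touching the synced Secret objects.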

5. Alternatives considered

  • Plain Kubernetes Secrets with Git-stored values: rejected due to leakage risk and poor audit trail.
  • Per-cluster bespoke approaches: rejected due to operational complexity and inconsistent teaching/evidence story.
  • Single universal vault product as baseline: deferred; AKV-based patterns carry less operational overhead and align with target deployments.

6. Implementation notes

  • Platform ADRs

      • ADR-0502 defines the primary implementation: Azure Key Vault + External Secrets Operator for RKE2 workloads.
      • ADR-0020 defines bootstrap/CI/DR automation via encrypted vault bundle and optional policy-driven SOPS artefacts.

  • Code and configuration

      • deploy/*/secrets/ holds ExternalSecret resources (or equivalent) and secret wiring.
      • ESO is installed as part of the RKE2 platform bootstrap.

  • Bootstrap and automation inputs

      • control/secrets.vault.env may be used as a narrowly scoped encrypted bundle for non-interactive automation.
      • Decryption occurs at runtime only and is scoped to a single command execution.

  • Validation and evidence

      • Evidence 4 captures KMS entries, ESO sync behaviour, and workloads consuming secrets without exposing values.
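The runtime-only decryption described under "Bootstrap and automation inputs" can be pictured as follows. This is a sketch under assumptions, not the mechanism defined in ADR-0020: the decryption tool is left as a caller-supplied command (`decrypt_cmd`) that is assumed to print KEY=VALUE lines to standard output, and the helpers `parse_env` and `run_with_bundle` are hypothetical. The decrypted values exist only in memory and in the environment of one child process; nothing is written to disk and the parent environment is left unchanged.

```python
import os
import subprocess
import sys


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines, comments, and lines without '='."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def run_with_bundle(decrypt_cmd: list[str], command: list[str]) -> int:
    """Decrypt into memory, run one command with the values in its environment, return its exit code."""
    decrypted = subprocess.run(
        decrypt_cmd, check=True, capture_output=True, text=True
    ).stdout
    child_env = {**os.environ, **parse_env(decrypted)}
    return subprocess.run(command, env=child_env).returncode


# Stand-in for the real decryption tool: a command that prints KEY=VALUE lines.
fake_decrypt = [sys.executable, "-c", "print('DEMO_TOKEN=example-value')"]
# Child command that succeeds only if it can see the injected variable.
check = [sys.executable, "-c", "import os, sys; sys.exit(0 if os.environ.get('DEMO_TOKEN') else 1)"]
exit_code = run_with_bundle(fake_decrypt, check)
```

Passing values through the child's environment keeps the scope to a single command, in line with the note above. A real runner would substitute its own decryption command for `fake_decrypt`.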

7. Operational impact and validation

Operational impact:

  • Secrets are managed primarily in the external secret store.
  • .env files and ad-hoc secrets remain temporary bootstrap artefacts only.
  • Operator health is included in monitoring and validation workflows.

Validation:

  • Runbooks under ../ops/runbooks/security/ describe onboarding and rotation flows.
  • Proof artefacts referenced here and in ADR-0502 capture end-to-end secret sync and consumption.


Maintainer: HybridOps.Studio
License: MIT-0 for code, CC-BY-4.0 for documentation unless otherwise stated.