Secrets Strategy: Azure Key Vault primary; encrypted vault bundle for bootstrap/CI/DR; HashiCorp Vault optional later
Status: Accepted — Centralises steady-state secret values in Azure Key Vault and uses a tightly scoped encrypted vault bundle for bootstrap, CI, and DR automation without introducing a second live secrets system.
Related guide: Secrets lifecycle and responsibilities
Related implementation ADRs:
- ADR-0502 – Use External Secrets Operator with Azure Key Vault for Application Secrets
- ADR-0501 – PostgreSQL on Dedicated VM with DR Replication
1. Context
HybridOps.Studio needs a consistent approach to managing secrets across:
- Ctrl-01 (the on-premises control node).
- Proxmox-based infrastructure (Packer / Terraform / Ansible).
- Cloud providers (Azure, GCP).
- The RKE2-based Kubernetes platform and workloads running on it.
The portfolio must:
- Demonstrate zero-touch automation (no manual UI configuration).
- Avoid hardcoded credentials in playbooks, pipelines, or templates.
- Support constrained single-operator environments while staying aligned with enterprise patterns.
- Provide a credible story for disaster recovery without introducing unnecessary complexity.
Earlier sketches considered multiple overlapping secret mechanisms:
- Per-provider bootstrap env files on Ctrl-01.
- Azure Key Vault (AKV).
- Potential future adoption of HashiCorp Vault.
This ADR narrows that down to a clear hierarchy and single source of truth for steady-state application and platform secret values. Day-to-day operational detail is covered in Secrets lifecycle and responsibilities.
2. Decision
- Azure Key Vault (AKV) is the single source of truth for steady-state secret values for:
- Jenkins, NetBox, and application secrets.
- RKE2 and in-cluster workload secrets (via operators), as implemented in ADR-0502.
- Bootstrap connectivity artefacts (per-provider credentials) are used only for infrastructure access, not as an application secret store. In HyOps.Core these are emitted under the runtime root:
<root>/credentials/
  azure.credentials.tfvars
  gcp.credentials.tfvars
  proxmox.credentials.tfvars
- RKE2 uses AKV via External Secrets Operator as the runtime secrets layer (per ADR-0502):
- External Secrets Operator syncs from AKV into Kubernetes Secrets.
- Reloader (or equivalent) restarts pods when secrets/configs change.
- Kubernetes RBAC, Pod Security and NetworkPolicies govern access.
- Bootstrap, CI, and DR automation may use an encrypted vault bundle for non-interactive secret injection, with the vault password stored out-of-band (for example in GitHub Actions secrets or Jenkins credentials).
- HyOps uses <root>/vault/bootstrap.vault.env (Ansible Vault encrypted env-format file).
- Stored outside Git by default (local workstation or automation runner storage); decrypted only at runtime for a single command execution.
- hyops secrets ensure can generate missing values for bootstrap/labs/CI/DR, and hyops secrets akv-sync can sync selected AKV secrets into the vault bundle when needed.
- DR runs may provision or recover foundation infrastructure, promote database replicas, and perform DNS cutover using runner-provided credentials decrypted at runtime.
- This does not change steady state: application and platform secret values remain in AKV when available.
- An optional break-glass artefact (SOPS) is permitted for policy-driven DR scenarios: docs/secrets/secrets.dr.enc.yaml
- Used only when required by DR policy or when the vault bundle scope is intentionally insufficient for a recovery scenario.
- Not used in normal operations.
- HashiCorp Vault remains an optional, future enhancement:
- May be introduced later for dynamic database credentials or advanced cloud auth.
- Must not change the principle that there is a single active source of truth for steady-state secret values at any point in time.
Rather than mirroring AKV into a second live secrets platform, platform and application secrets remain in AKV in steady state. The encrypted vault bundle is constrained to bootstrap, CI, and DR automation and must not be used as a steady-state application secret store.
3. Rationale
- Clarity and simplicity: AKV as the steady-state store avoids drift, confusion, and duplicated rotation logic.
- Alignment with target audience: Azure AD + AKV is a realistic enterprise path.
- Bootstrap vs. runtime separation: Per-provider bootstrap env files support early provisioning without becoming an application secret store.
- Evidence-focused: Clear artefact trail across infra/env/*, Terraform state, AKV policy, ESO resources, and workload consumption.
- DR without over-engineering: DR automation can run from an external runner using the encrypted vault bundle; SOPS remains optional and policy-driven.
- Non-interactive automation without click-ops: The vault bundle enables repeatable bootstrap/CI/DR runs without embedding plaintext values in pipelines or templates.
4. Consequences
4.1 Positive consequences
- Clear hierarchy:
- Bootstrap connectivity: <root>/credentials/*.
- Steady-state secret values: AKV (with RKE2 consumption via ADR-0502).
- Bootstrap/CI/DR injection: <root>/vault/bootstrap.vault.env (encrypted; decrypted only at runtime).
- Optional break-glass: SOPS artefact (policy-driven).
- Reduced risk of secrets drift or multiple conflicting sources of truth.
- Compatible with future Vault adoption without parallel live secret stores.
4.2 Negative consequences / risks
- Strong dependency on AKV for steady-state operations:
- If AKV or Azure are unavailable, platform operations may be degraded.
- Additional work required to implement ESO and access controls correctly (RBAC, Pod Security, NetworkPolicies).
- DR execution depends on an external runner and out-of-band vault password storage (for example GitHub Actions secrets).
- Optional SOPS artefact requires secure key handling and documented procedure.
- The vault bundle introduces a second protected secret container, but it is scoped to bootstrap/CI/DR automation and does not change the steady-state source-of-truth hierarchy.
5. Alternatives considered
- Single giant .env file for all providers:
- Rejected due to mixed concerns and increased risk of application secrets leaking into bootstrap artefacts.
- HashiCorp Vault as the initial primary store:
- Deferred due to operational weight in a constrained environment; remains optional once AKV-based patterns are stable and evidence-backed.
6. Implementation notes
6.1 Bootstrap connectivity artefacts
- Created and maintained by:
- hyops init <target> (writes bootstrap config and runtime credentials under <root>/config/ and <root>/credentials/)
- Legacy workspaces may still use per-provider env/tfvars files (for example under infra/env/).
- Consumed by:
- Packer (Proxmox templates).
- Terraform (Proxmox SDN/VMs, Azure/GCP infra, AKV).
- Ansible (Ctrl-01, Jenkins, NetBox, RKE2 nodes).
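As a rough illustration of how a consumer might confirm the bootstrap artefacts are present before invoking Packer, Terraform, or Ansible, the sketch below checks for the three per-provider files named in this ADR. The helper name `missing_credentials` is hypothetical and is not part of the hyops CLI; the only details taken from this document are the directory name and the file names.

```python
from pathlib import Path

# File names as listed in this ADR under <root>/credentials/.
EXPECTED_CREDENTIAL_FILES = (
    "azure.credentials.tfvars",
    "gcp.credentials.tfvars",
    "proxmox.credentials.tfvars",
)


def missing_credentials(root: Path) -> list[str]:
    """Return the expected credential files that are absent under <root>/credentials/.

    Hypothetical helper for illustration; it only checks presence, not content.
    """
    cred_dir = root / "credentials"
    return [
        name
        for name in EXPECTED_CREDENTIAL_FILES
        if not (cred_dir / name).is_file()
    ]
```

A pipeline step could call `missing_credentials(Path(runtime_root))` and stop early with a clear message if the returned list is non-empty, rather than letting a later tool fail on an unreadable variable file.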
6.2 AKV integration
- Terraform modules create AKV, secrets, and access policies.
- RKE2 consumes secrets via External Secrets Operator (ADR-0502).
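To make the ESO consumption path more concrete, the following sketch builds an ExternalSecret manifest that maps keys of a Kubernetes Secret to secret names in AKV. It is an assumption-laden example, not the manifest defined in ADR-0502: the resource name, namespace, store name, AKV secret name, refresh interval, and the `external-secrets.io/v1beta1` API version are all placeholders that would need to match the installed operator and the real store.

```python
import json


def external_secret(name: str, namespace: str, store: str, keys: dict[str, str]) -> dict:
    """Build an ExternalSecret manifest.

    `keys` maps each key in the resulting Kubernetes Secret to the name of
    the AKV secret that supplies its value.
    """
    return {
        # Assumption: adjust to the API version served by the installed ESO release.
        "apiVersion": "external-secrets.io/v1beta1",
        "kind": "ExternalSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",  # placeholder sync interval
            "secretStoreRef": {"kind": "ClusterSecretStore", "name": store},
            "target": {"name": name},  # Kubernetes Secret that ESO creates
            "data": [
                {"secretKey": k8s_key, "remoteRef": {"key": akv_name}}
                for k8s_key, akv_name in keys.items()
            ],
        },
    }


# All names below are hypothetical.
manifest = external_secret(
    name="netbox-db",
    namespace="netbox",
    store="akv-store",
    keys={"password": "netbox-db-password"},
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML. Generating the manifest from a small mapping keeps the secret values themselves out of Git: only AKV secret names appear in the rendered resource.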
6.3 Encrypted vault bundle (bootstrap/CI/DR)
- HyOps uses <root>/vault/bootstrap.vault.env (Ansible Vault encrypted env-format file) to avoid interactive prompts.
- Workstations bootstrap a local password provider via hyops vault bootstrap (stores password in pass under hybridops/ansible-vault by default).
- Secrets can be managed with:
- Generate missing values: hyops secrets ensure --env <env> ...
- Explicit overrides: hyops secrets set --env <env> ...
- Enterprise sync (AKV -> vault bundle): hyops secrets akv-sync --env <env> --vault-name <name> ...
- Automation runners store/provide the vault password out-of-band (for example Jenkins credentials or GitHub Actions secrets).
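The "decrypted only at runtime for a single command execution" behaviour can be sketched as follows. This is not the hyops implementation, which this ADR does not show; it is a minimal illustration that assumes the bundle contains plain KEY=VALUE lines and that the vault password is exposed to the runner as a file or executable accepted by `ansible-vault view --vault-password-file`. The function names are hypothetical.

```python
import os
import subprocess


def decrypt_bundle(bundle_path: str, password_file: str) -> str:
    """Return the decrypted bundle text; no plaintext file is written to disk."""
    result = subprocess.run(
        ["ansible-vault", "view", "--vault-password-file", password_file, bundle_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def run_with_secrets(command: list[str], secrets: dict[str, str]) -> int:
    """Run one command with the secrets set in the child environment only."""
    child_env = {**os.environ, **secrets}
    return subprocess.run(command, env=child_env).returncode
```

A runner would chain the three steps, for example `run_with_secrets(cmd, parse_env(decrypt_bundle(bundle, password_file)))`. Because the values are passed through the child process environment rather than exported in the parent shell or written to a temporary file, they are not visible to later pipeline steps.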
6.4 Optional SOPS artefact
- docs/secrets/secrets.dr.enc.yaml may be created with SOPS when required.
- DR runbook defines how and when to decrypt and use it.
7. Operational impact and validation
- Runbooks cover ctrl-01 bootstrap, AKV provisioning/rotation checks, ESO sync validation, and DR execution via external runner.
- Evidence includes Terraform plans/state, ESO manifests/logs, workload logs confirming injection, and output/logs/ artefacts from bootstrap and DR runs.
8. References
- Guide: Secrets lifecycle and responsibilities
- ADR: ADR-0502 – Use External Secrets Operator with Azure Key Vault for Application Secrets
- ADR: ADR-0501 – PostgreSQL on Dedicated VM with DR Replication
- How-to: Provision ctrl-01
- Related cost & telemetry considerations: Cost & Telemetry
Maintainer: HybridOps.Studio
License: MIT-0 for code, CC-BY-4.0 for documentation unless otherwise stated.