Deploy PostgreSQL HA (HyOps Blueprint)¶
Purpose: Deploy a complete on-prem PostgreSQL HA cluster through a single governed blueprint run.
Owner: Platform engineering
Trigger: New environment bring-up, baseline foundation buildout, or controlled environment reset
Impact: Consumes shared SDN/NetBox authority and converges image, VM infrastructure, and Patroni + etcd cluster software
Severity: P2
Pre-reqs: Proxmox init complete for target env, vault decrypt works, NetBox authority is ready (authoritative IPAM), SSH key access exists, and required DB secrets are available in vault.
Rollback strategy: Destroy modules in reverse order or run controlled rebuild from the same blueprint inputs.
Context¶
Blueprint ref: onprem/postgresql-ha@v1
Location: hybridops-core/blueprints/onprem/postgresql-ha@v1/blueprint.yml
Default step flow:
1. core/onprem/template-image (Rocky 9)
2. platform/onprem/platform-vm (IPAM from NetBox + shared SDN state)
3. platform/onprem/postgresql-ha
Preconditions and safety checks¶
Path behavior:

- Installed `hyops` (via `install.sh`) can be run from any working directory.
- Source-checkout usage should export `HYOPS_CORE_ROOT=/path/to/hybridops-core`.

- Ensure NetBox authority is ready (required for IPAM)
This blueprint allocates Postgres VM IPs from NetBox (no hardcoded per-VM IPs) and consumes the shared SDN authority state.
The VM step uses the `vnetenvdata` bridge alias, which HyOps resolves from `--env`:
- dev -> vnetddev (VLAN 21)
- staging -> vnetdstg (VLAN 31)
- prod -> vnetdprd (VLAN 41)
- shared -> vnetdata (shared platform data VLAN)
For a shared platform PostgreSQL HA (for example a NetBox authority DB), run this blueprint with `--env shared`; `vnetenvdata` resolves to `vnetdata` automatically.
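The env-to-bridge mapping above can be sketched as a small shell helper. This is a minimal sketch for illustration only: the `resolve_vnetenvdata` function name is hypothetical, while the env names, bridge aliases, and VLANs are taken from the list above.

```shell
# Hypothetical helper mirroring HyOps' --env -> vnetenvdata resolution
resolve_vnetenvdata() {
  case "$1" in
    dev)     echo vnetddev ;;  # VLAN 21
    staging) echo vnetdstg ;;  # VLAN 31
    prod)    echo vnetdprd ;;  # VLAN 41
    shared)  echo vnetdata ;;  # shared platform data VLAN
    *) echo "unknown env: $1" >&2; return 1 ;;
  esac
}

resolve_vnetenvdata shared   # prints: vnetdata
```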
Default shared-client allowlist:

- The blueprint includes an explicit `allowed_clients` entry for the shared NetBox authority VM data IP (10.12.0.11/32, user/database `netbox`) so the `netbox-ha-cutover` flow works out of the box against shared PostgreSQL HA.
- For stricter or non-NetBox use cases, override `allowed_clients` in your blueprint inputs/overlay to match the exact client CIDRs that should be permitted in `pg_hba.conf`.
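An override along these lines could be staged as a local overlay file. This is a sketch only: the overlay file name and exact key layout are assumptions, not the blueprint's confirmed schema; the `allowed_clients` key and the 10.12.0.11/32 `netbox` entry come from the runbook.

```shell
# Sketch: write a local overlay narrowing allowed_clients to the shared
# NetBox authority client (file name and key layout are assumptions)
cat > ./postgresql-ha.overlay.yml <<'EOF'
allowed_clients:
  - cidr: 10.12.0.11/32   # shared NetBox authority VM data IP
    user: netbox
    database: netbox
EOF
```

Pass the overlay via whatever inputs/overlay mechanism your blueprint invocation uses.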
By default, HyOps expects the NetBox authority in `--env shared`. If it is not ready yet, run:
hyops blueprint deploy --env shared --ref onprem/bootstrap-netbox@v1 --execute
This also seeds the shared SDN foundation and NetBox IPAM/inventory datasets consumed by this blueprint.
If this authority is missing, blueprint preflight blocks platform/onprem/platform-vm with:
- contract failed: netbox authority not ready (platform/onprem/netbox, status=missing) (authority_root=.../envs/shared)
- Ensure required secrets exist in runtime vault:
hyops secrets ensure --env dev \
PATRONI_SUPERUSER_PASSWORD \
PATRONI_REPLICATION_PASSWORD \
NETBOX_DB_PASSWORD
- Validate and plan blueprint:
hyops blueprint validate --ref onprem/postgresql-ha@v1
hyops blueprint plan --ref onprem/postgresql-ha@v1
- Run preflight:
hyops blueprint preflight --env dev --ref onprem/postgresql-ha@v1
Steps¶
- Execute full blueprint
hyops blueprint deploy --env dev \
--ref onprem/postgresql-ha@v1 \
--execute
If HyOps detects existing step state (rerun/replacement risk), it may prompt for confirmation before executing the blueprint. Use `--yes` for non-interactive runs.
- Observe progress during long phases

  - Each module step prints `progress: logs=...` with the active log file path.
  - Long-running phases also print a one-time log-watch hint and heartbeat status lines.
- Evidence and streamed logs are written under `$HOME/.hybridops/envs/<env>/logs/module/...`
Optional live terminal streaming:
hyops --verbose blueprint deploy --env dev \
--ref onprem/postgresql-ha@v1 \
--execute
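If you prefer to follow the evidence files directly instead of verbose streaming, a small helper can locate the newest log in a given module log directory. A minimal sketch: `latest_log` is a hypothetical name, and the directory layout is the one shown above.

```shell
# Hypothetical helper: print the most recently modified .log file in a directory
latest_log() {
  ls -t "$1"/*.log 2>/dev/null | head -n 1
}

# Usage sketch (point it at the specific module log directory from the layout above):
# tail -f "$(latest_log /path/to/envs/dev/logs/module/dir)"
```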
- Cut over NetBox to consume PostgreSQL HA contract (optional, recommended)
If foundation NetBox currently points at `platform/onprem/postgresql-core`, run:
hyops blueprint deploy --env shared \
--ref onprem/netbox-ha-cutover@v1 \
--execute
Verification¶
- Verify module state
cat "$HOME/.hybridops/envs/dev/state/modules/platform__onprem__postgresql-ha/latest.json"
Success indicators:
- Blueprint summary ends with `status=ok`.
- `platform/onprem/postgresql-ha` state shows `status: ok`.
- `outputs.cap.db.postgresql_ha` is `ready`.
- `outputs.cluster_vip` matches the reserved VIP (when configured).
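The state-file indicators above can be checked mechanically. A sketch, not part of HyOps itself: `state_ok` is a hypothetical helper, and the nesting of `outputs.cap.db.postgresql_ha` is assumed from the dotted paths listed above.

```shell
# Hypothetical check: exit 0 only when latest.json shows the success indicators
state_ok() {
  python3 - "$1" <<'EOF'
import json, sys
s = json.load(open(sys.argv[1]))
ok = (s.get("status") == "ok"
      and s.get("outputs", {}).get("cap", {}).get("db", {}).get("postgresql_ha") == "ready")
sys.exit(0 if ok else 1)
EOF
}

# Usage sketch:
# state_ok "$HOME/.hybridops/envs/dev/state/modules/platform__onprem__postgresql-ha/latest.json"
```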
Note on `cluster_vip` and `allowed_clients`:
- The blueprint no longer hardcodes VIP/client CIDRs.
- Set them via blueprint file overlay (or module-level env overrides) per environment.
- Recommended: reserve the VIP in NetBox before deployment.

- Functional smoke check (example)
From a host that has management network reachability:
nc -vz 10.12.0.55 5432
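If `nc` is unavailable, bash can perform the same TCP reachability check via its `/dev/tcp` pseudo-files. A sketch under those assumptions: `pg_port_open` is a hypothetical helper name, 10.12.0.55 is the example VIP from above, and the helper requires bash (not plain sh).

```shell
# Hypothetical helper: succeed only if a TCP connect to host:port works (bash-only)
pg_port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Usage sketch:
# pg_port_open 10.12.0.55 5432 && echo "postgres port reachable"
```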
Troubleshooting¶
- If deploy stops before execution with `preflight_status=failed`, fix required failures and rerun.
- For SSH timeout/connectivity errors, ensure management network routing or configure a bastion/proxy jump.
- For missing secret errors, seed required keys via `hyops secrets ensure` / `hyops secrets set`.
- Recent driver failures include an `open: <evidence>/<driver>.log` hint in the error summary. Start with that file first.
- For NetBox API reachability errors in IPAM mode:
  - Ensure NetBox is `status=ok` in the NetBox authority env (default: `shared`).
  - If your workstation is not routed to the management subnet, HyOps may auto-tunnel the NetBox API via the Proxmox host (requires `hyops init proxmox --bootstrap` and working SSH access).
Fallback validation path (when shared authority is intentionally not bootstrapped yet):
- Use module-level lifecycle tests against existing PG VM inventory: `hyops preflight/apply/destroy/apply --module platform/onprem/postgresql-ha`
- Reuse generated inputs from: `~/.hybridops/envs/dev/work/blueprint-inputs/onprem_postgresql-ha_v1/postgresql_ha.inputs.yml`