
Deploy PostgreSQL HA (HyOps Blueprint)

Purpose: Deploy a complete on-prem PostgreSQL HA cluster through a single governed blueprint run.
Owner: Platform engineering
Trigger: New environment bring-up, baseline foundation buildout, or controlled environment reset
Impact: Consumes shared SDN/NetBox authority and converges image, VM infrastructure, and Patroni + etcd cluster software
Severity: P2
Pre-reqs: Proxmox init complete for target env, vault decrypt works, NetBox authority is ready (authoritative IPAM), SSH key access exists, and required DB secrets are available in vault.
Rollback strategy: Destroy modules in reverse order or run controlled rebuild from the same blueprint inputs.

Context

Blueprint ref: onprem/postgresql-ha@v1
Location: hybridops-core/blueprints/onprem/postgresql-ha@v1/blueprint.yml

Default step flow:

  1. core/onprem/template-image (Rocky 9)
  2. platform/onprem/platform-vm (IPAM from NetBox + shared SDN state)
  3. platform/onprem/postgresql-ha

Preconditions and safety checks

Path behavior:

  • When installed via install.sh, hyops can be run from any working directory.
  • When running from a source checkout, export HYOPS_CORE_ROOT=/path/to/hybridops-core.

Ensure NetBox authority is ready (required for IPAM):

This blueprint allocates Postgres VM IPs from NetBox (no hardcoded per-VM IPs) and consumes the shared SDN authority state. The VM step uses the vnetenvdata bridge alias, which HyOps resolves from --env:

  • dev -> vnetddev (VLAN 21)
  • staging -> vnetdstg (VLAN 31)
  • prod -> vnetdprd (VLAN 41)
  • shared -> vnetdata (shared platform data VLAN)

For a shared platform PostgreSQL HA (for example a NetBox authority DB), run this blueprint in shared; vnetenvdata resolves to vnetdata automatically.
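The alias resolution above amounts to a per-environment lookup. As an illustrative sketch only (this is not HyOps internals, just the mapping restated as code):

```python
# Illustrative sketch: how the vnetenvdata alias maps per --env.
# Bridge names and VLANs are taken from the table above.
VNET_BY_ENV = {
    "dev": "vnetddev",      # VLAN 21
    "staging": "vnetdstg",  # VLAN 31
    "prod": "vnetdprd",     # VLAN 41
    "shared": "vnetdata",   # shared platform data VLAN
}

def resolve_vnet(env: str) -> str:
    """Return the data bridge for an environment, per the mapping above."""
    if env not in VNET_BY_ENV:
        raise ValueError(f"unknown env: {env!r}")
    return VNET_BY_ENV[env]

print(resolve_vnet("dev"))  # vnetddev
```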

Default shared-client allowlist:

  • The blueprint includes an explicit allowed_clients entry for the shared NetBox authority VM data IP (10.12.0.11/32, user/database netbox) so the netbox-ha-cutover flow works out of the box against shared PostgreSQL HA.
  • For stricter or non-NetBox use cases, override allowed_clients in your blueprint inputs/overlay to match the exact client CIDRs that should be permitted in pg_hba.conf.

By default, HyOps expects NetBox authority in --env shared. If it is not ready yet, run:

hyops blueprint deploy --env shared --ref onprem/bootstrap-netbox@v1 --execute

This also seeds the shared SDN foundation and NetBox IPAM/inventory datasets consumed by this blueprint.

If this authority is missing, blueprint preflight blocks platform/onprem/platform-vm with:

contract failed: netbox authority not ready (platform/onprem/netbox, status=missing) (authority_root=.../envs/shared)

  1. Ensure required secrets exist in runtime vault:

hyops secrets ensure --env dev \
  PATRONI_SUPERUSER_PASSWORD \
  PATRONI_REPLICATION_PASSWORD \
  NETBOX_DB_PASSWORD

  2. Validate and plan blueprint:

hyops blueprint validate --ref onprem/postgresql-ha@v1
hyops blueprint plan --ref onprem/postgresql-ha@v1

  3. Run preflight:

hyops blueprint preflight --env dev --ref onprem/postgresql-ha@v1

Steps

  1. Execute full blueprint
hyops blueprint deploy --env dev \
  --ref onprem/postgresql-ha@v1 \
  --execute

If HyOps detects existing step state (rerun/replacement risk), it may prompt for confirmation before executing the blueprint. Use --yes for non-interactive runs.

  2. Observe progress during long phases

  • Each module step prints progress: logs=... with the active log file path.
  • Long-running phases also print a one-time log-watch hint and heartbeat status lines.
  • Evidence and streamed logs are written under:
    $HOME/.hybridops/envs/<env>/logs/module/...

Optional live terminal streaming:

hyops --verbose blueprint deploy --env dev \
  --ref onprem/postgresql-ha@v1 \
  --execute

  3. Cut over NetBox to consume PostgreSQL HA contract (optional, recommended)

If the foundation NetBox instance currently points at platform/onprem/postgresql-core, run:

hyops blueprint deploy --env shared \
  --ref onprem/netbox-ha-cutover@v1 \
  --execute

Verification

  1. Verify module state
cat "$HOME/.hybridops/envs/dev/state/modules/platform__onprem__postgresql-ha/latest.json"

Success indicators:

  • Blueprint summary ends with status=ok.
  • platform/onprem/postgresql-ha state shows status: ok.
  • outputs.cap.db.postgresql_ha is ready.
  • outputs.cluster_vip matches the reserved VIP (when configured).
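The indicators above can be checked programmatically against the parsed latest.json. A minimal sketch, assuming the field names listed above; the exact nesting of outputs.cap.db.postgresql_ha is an assumption, so adjust the path if your state schema differs:

```python
import json
import os

def postgresql_ha_ready(state: dict) -> bool:
    """Apply the success indicators from this runbook to a parsed latest.json.

    Assumed schema: top-level "status" plus outputs.cap.db.postgresql_ha;
    adjust if your state files are shaped differently.
    """
    outputs = state.get("outputs", {})
    cap = outputs.get("cap", {}).get("db", {}).get("postgresql_ha")
    return state.get("status") == "ok" and bool(cap)

state_path = os.path.expanduser(
    "~/.hybridops/envs/dev/state/modules/"
    "platform__onprem__postgresql-ha/latest.json"
)
if os.path.exists(state_path):
    with open(state_path) as f:
        print("postgresql-ha ready:", postgresql_ha_ready(json.load(f)))
```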

Note on cluster_vip and allowed_clients:

  • The blueprint no longer hardcodes VIP/client CIDRs.
  • Set them via blueprint file overlay (or module-level env overrides) per environment.
  • Recommended: reserve VIP in NetBox before deployment.

  2. Functional smoke check (example)

From a host that has management network reachability:

nc -vz 10.12.0.55 5432
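If nc is unavailable on the checking host, the same reachability test can be done with a few lines of Python (a rough equivalent of nc -vz; the VIP below is the example address from the smoke check, not a fixed value):

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Rough equivalent of `nc -vz host port`: try to open a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example target: the VIP used in the smoke check above.
# print(tcp_reachable("10.12.0.55", 5432))
```

Note this only proves the listener is reachable; it does not authenticate or confirm Patroni leader state.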

Troubleshooting

  • If deploy stops before execution with preflight_status=failed, fix required failures and rerun.
  • For SSH timeout/connectivity errors, ensure management network routing or configure bastion/proxy jump.
  • For missing secret errors, seed required keys via hyops secrets ensure / hyops secrets set.
  • Recent driver failures include an open: <evidence>/<driver>.log hint in the error summary. Start with that file first.
  • For NetBox API reachability errors in IPAM mode:
    • Ensure NetBox is status=ok in the NetBox authority env (default: shared).
    • If your workstation is not routed to the management subnet, HyOps may auto-tunnel the NetBox API via the Proxmox host (requires hyops init proxmox --bootstrap and working SSH access).
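For the bastion/proxy-jump case above, one common approach is a standard OpenSSH client config entry that routes management-subnet SSH through the Proxmox host (the host pattern, jump host, and user below are hypothetical; substitute your own):

```
# ~/.ssh/config — route PG VM SSH via a jump host (names hypothetical)
Host 10.12.0.*
    ProxyJump root@proxmox-host.example.internal
    User rocky
```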

Fallback validation path (when shared authority is intentionally not bootstrapped yet):

  • Use module-level lifecycle tests against existing PG VM inventory:
    • hyops preflight/apply/destroy/apply --module platform/onprem/postgresql-ha
  • Reuse generated inputs from:
    • ~/.hybridops/envs/dev/work/blueprint-inputs/onprem_postgresql-ha_v1/postgresql_ha.inputs.yml

References