Operate PostgreSQL HA Backup (pgBackRest) (HyOps)

Purpose: Configure pgBackRest backups + WAL archiving for an existing Patroni PostgreSQL HA cluster.
Owner: Platform engineering
Trigger: After PostgreSQL HA bootstrap, before onboarding stateful services, and during DR drills
Impact: Enables backups and WAL archive shipping, and creates scheduled backup jobs (optionally to both repo1 and repo2)
Severity: P2
Pre-reqs: platform/onprem/postgresql-ha is healthy, backup repo reachable from DB nodes, vault decrypt works, Ansible deps installed.
Rollback strategy: Destroy this module to disable scheduled backups (best-effort).

Context

This runbook covers module-level operations for:

  • Module: platform/onprem/postgresql-ha-backup
  • Driver: config/ansible
  • Upstream automation: vitabaks.autobase (Ansible Galaxy)
  • Scope: configure backups on existing PostgreSQL HA nodes (no VM provisioning)

This module is intentionally "bring-your-own object store":

  • It does not create repository storage.
  • It does not provision cloud resources.
  • Optional: HyOps can provision repository infra. The generic object-repo modules are preferred:
      • GCS: org/gcp/object-repo (bucket + service account, no SA keys)
      • AWS: org/aws/object-repo (bucket + IAM user, no access keys)
      • Azure Blob: org/azure/object-repo (storage account + container, no account keys)
  • Compatibility wrappers remain supported:
      • org/gcp/pgbackrest-repo
      • org/aws/pgbackrest-repo
      • org/azure/pgbackrest-repo
  • Recommended wiring: set repo_state_ref in backup inputs so backend/repo fields are derived from module state instead of duplicated manually.
  • Preferred GCP orchestration path: use blueprint dr/postgresql-ha-backup-gcp@v1 so HyOps provisions the repository first and then configures backup against repo_state_ref.
  • Optional secondary copy wiring:
      • secondary_enabled: true
      • secondary_repo_state_ref (recommended), or explicit secondary_backend + secondary_* fields.

Preconditions and safety checks

  • Installed hyops (via install.sh) can be run from any working directory.
  • If you want to use shipped example overlays, set:
export HYOPS_CORE_ROOT="${HYOPS_CORE_ROOT:-$HOME/.hybridops/core/app}"

For source checkout usage, set HYOPS_CORE_ROOT to your hybridops-core checkout root instead.

  • Correct environment selected (--env shared|dev|staging|prod).
  • PostgreSQL HA cluster is ready:
      • $HOME/.hybridops/envs/<env>/state/modules/platform__onprem__postgresql-ha/latest.json has status=ok
  • Backup repository is reachable from DB nodes (firewall/routing).
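The readiness precondition above can be checked from a shell before running anything else. The following is a minimal sketch, not part of HyOps: it assumes latest.json carries a top-level "status" string (as the status=ok check implies), and HYOPS_ENVS_ROOT is a hypothetical override added only so the function can be pointed at a test directory.

```shell
# Print the "status" value recorded in a module's latest.json.
# Assumption: the file contains a "status": "<value>" pair at the top level.
# HYOPS_ENVS_ROOT is a hypothetical override, not a documented HyOps variable.
module_status() (
  env_name="$1"
  module="$2"
  root="${HYOPS_ENVS_ROOT:-$HOME/.hybridops/envs}"
  # platform/onprem/postgresql-ha -> platform__onprem__postgresql-ha
  slot="$(printf '%s' "$module" | sed 's|/|__|g')"
  state_file="$root/$env_name/state/modules/$slot/latest.json"
  if [ ! -r "$state_file" ]; then
    echo "missing state file: $state_file" >&2
    exit 1
  fi
  # Naive text match: prints the first "status" value found in the file.
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$state_file" | head -n 1
)
```

Usage: `[ "$(module_status dev platform/onprem/postgresql-ha)" = "ok" ] || echo "PostgreSQL HA is not ready"`. A JSON-aware tool would be more robust than the text match if one is available on the operator host.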

Select a backend in your overlay:

  • backend: s3 (AWS S3 or S3-compatible)
  • backend: gcs (Google Cloud Storage)
  • backend: azure (Azure Blob Storage)

Or set:

  • repo_state_ref: org/gcp/object-repo (or aws/azure equivalent)
      • Backend and repository location fields are then resolved from state.
  • If the repository was created in a dedicated state slot, use the instance-qualified form:
      • repo_state_ref: org/gcp/object-repo#postgresql_ha_backup_repo

For optional secondary copy (repo2), also set:

  • secondary_enabled: true
  • secondary_repo_state_ref: org/azure/object-repo (or gcp/aws equivalent)
  • Keep repo_mismatch_action: fail as default safety.
  • Use repo_mismatch_action: reset only when intentionally re-initializing stale pgBackRest stanza metadata after a cluster rebuild.

Secrets (must be explicitly provided, not randomly generated):

  • Backend s3:
      • PG_BACKUP_S3_ACCESS_KEY_ID
      • PG_BACKUP_S3_SECRET_ACCESS_KEY
  • Backend gcs:
      • PG_BACKUP_GCS_SA_JSON (service account JSON content)
  • Backend azure:
      • PG_BACKUP_AZURE_ACCOUNT_KEY (storage account key)

If secondary copy is enabled, also provide credentials for the secondary backend (default secret names):

  • Secondary s3:
      • PG_BACKUP_SECONDARY_S3_ACCESS_KEY_ID
      • PG_BACKUP_SECONDARY_S3_SECRET_ACCESS_KEY
  • Secondary gcs:
      • PG_BACKUP_SECONDARY_GCS_SA_JSON
  • Secondary azure:
      • PG_BACKUP_SECONDARY_AZURE_ACCOUNT_KEY
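Because these secrets are never generated for you, it can help to confirm they are present in the shell before seeding them from the environment. The sketch below only restates the secret names listed above; the function names required_secrets and missing_secrets are hypothetical helpers, and the check covers shell variables only, not the contents of the runtime vault.

```shell
# Print the secret names this runbook lists for a backend.
# Usage: required_secrets <s3|gcs|azure> [secondary]
required_secrets() (
  backend="$1"
  prefix="PG_BACKUP"
  if [ "${2:-}" = "secondary" ]; then
    prefix="PG_BACKUP_SECONDARY"
  fi
  case "$backend" in
    s3)    echo "${prefix}_S3_ACCESS_KEY_ID ${prefix}_S3_SECRET_ACCESS_KEY" ;;
    gcs)   echo "${prefix}_GCS_SA_JSON" ;;
    azure) echo "${prefix}_AZURE_ACCOUNT_KEY" ;;
    *)     echo "unknown backend: $backend" >&2; exit 2 ;;
  esac
)

# Print the listed secrets that are unset or empty in the current shell.
# Prints an empty line when nothing is missing.
missing_secrets() (
  names="$(required_secrets "$@")" || exit 2
  missing=""
  for name in $names; do
    # Indirect lookup; names come from the fixed list above, so eval is safe here.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  echo "${missing# }"
)
```

For example, `missing_secrets azure secondary` prints PG_BACKUP_SECONDARY_AZURE_ACCOUNT_KEY when that variable has not been set.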

Seed secrets into runtime vault (recommended).

Optional (GCS): provision repo bucket + service account

HYOPS_INPUT_project_state_ref=org/gcp/project-factory \
HYOPS_INPUT_bucket_name=hyops-dev-pgbackrest-a1 \
hyops apply --env <env> \
  --module org/gcp/object-repo \
  --inputs "$HYOPS_CORE_ROOT/modules/org/gcp/object-repo/examples/inputs.min.yml"

If the project is external to HyOps, clear project_state_ref in the input file and set project_id explicitly instead.

Recommended GCS bucket pattern:

  • hyops-<env>-pgbackrest-<suffix>
  • Example: hyops-dev-pgbackrest-a1
  • Keep bucket names lowercase and globally unique within GCS.
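The naming pattern above can be generated and sanity-checked locally. This is a small sketch under stated assumptions: pgbackrest_bucket_name is a hypothetical helper, and it enforces only a conservative subset of the GCS naming rules (lowercase letters, digits, hyphens, at most 63 characters). Global uniqueness cannot be verified offline.

```shell
# Build hyops-<env>-pgbackrest-<suffix> and reject obviously invalid names.
# Usage: pgbackrest_bucket_name <env> <suffix>
pgbackrest_bucket_name() (
  LC_ALL=C  # keep [a-z] ASCII-only regardless of the caller's locale
  name="hyops-$1-pgbackrest-$2"
  if [ "${#name}" -gt 63 ]; then
    echo "too long (${#name} > 63): $name" >&2
    exit 1
  fi
  case "$name" in
    *[!a-z0-9-]*)
      echo "only lowercase letters, digits, and hyphens are accepted here: $name" >&2
      exit 1 ;;
    *-|*--*)
      # A trailing or doubled hyphen means the env or suffix was empty
      # (this also rejects deliberate double hyphens, which is stricter than GCS).
      echo "empty env or suffix: $name" >&2
      exit 1 ;;
  esac
  printf '%s\n' "$name"
)
```

`pgbackrest_bucket_name dev a1` prints hyops-dev-pgbackrest-a1, which can then be passed as HYOPS_INPUT_bucket_name.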

Optional (AWS): provision S3 repo bucket + IAM user

hyops apply --env <env> \
  --module org/aws/object-repo \
  --inputs "$HYOPS_CORE_ROOT/modules/org/aws/object-repo/examples/inputs.min.yml"

Optional (Azure): provision Blob repo storage

hyops apply --env <env> \
  --module org/azure/object-repo \
  --inputs "$HYOPS_CORE_ROOT/modules/org/azure/object-repo/examples/inputs.min.yml"

Secrets examples:

# S3
hyops secrets set --env <env> \
  PG_BACKUP_S3_ACCESS_KEY_ID='...' \
  PG_BACKUP_S3_SECRET_ACCESS_KEY='...'

# GCS
export PG_BACKUP_GCS_SA_JSON="$(cat sa.json)"
hyops secrets set --env <env> --from-env PG_BACKUP_GCS_SA_JSON

# Azure Blob
hyops secrets set --env <env> PG_BACKUP_AZURE_ACCOUNT_KEY='...'

Steps

  1. Select an overlay

      • S3-compatible: $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.minio.yml
      • GCS: $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.gcs.yml
      • GCS (consume current platform/onprem/postgresql-ha state): $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.gcs.ha-state.yml
      • GCS (explicit inventory_groups, no inventory_state_ref): $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.gcs.explicit-inventory.yml
      • Azure Blob: $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.azure.yml
      • GCS primary + Azure secondary: $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.gcs.azure-secondary.yml
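To avoid retyping the long example paths, the overlay choice can be wrapped in a small lookup. This is only a convenience sketch: overlay_for and its short keys (s3, gcs, gcs-ha-state, and so on) are hypothetical labels invented here, while the file names are the shipped examples listed above.

```shell
# Map a short key to one of the shipped example overlays.
# Usage: INPUTS="$(overlay_for gcs)"
overlay_for() (
  base="${HYOPS_CORE_ROOT:-$HOME/.hybridops/core/app}/modules/platform/onprem/postgresql-ha-backup/examples"
  case "$1" in
    s3)                     file="inputs.minio.yml" ;;
    gcs)                    file="inputs.gcs.yml" ;;
    gcs-ha-state)           file="inputs.gcs.ha-state.yml" ;;
    gcs-explicit-inventory) file="inputs.gcs.explicit-inventory.yml" ;;
    azure)                  file="inputs.azure.yml" ;;
    gcs-azure-secondary)    file="inputs.gcs.azure-secondary.yml" ;;
    *) echo "unknown overlay key: $1" >&2; exit 2 ;;
  esac
  printf '%s\n' "$base/$file"
)
```

The result can be assigned to INPUTS and reused unchanged across the preflight, apply, and backup commands.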

Optional (recommended): bind repository module state without duplicating bucket/container values.

export HYOPS_INPUT_repo_state_ref=org/gcp/object-repo
# alternatives:
# export HYOPS_INPUT_repo_state_ref=org/aws/object-repo
# export HYOPS_INPUT_repo_state_ref=org/azure/object-repo

# if the repo was provisioned with a dedicated state instance:
# export HYOPS_INPUT_repo_state_ref=org/gcp/object-repo#postgresql_ha_backup_repo

# optional repo2 state-driven copy target:
# export HYOPS_INPUT_secondary_enabled=true
# export HYOPS_INPUT_secondary_repo_state_ref=org/azure/object-repo

  2. Preflight

INPUTS="$HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.minio.yml"

hyops preflight --env <env> --strict \
  --module platform/onprem/postgresql-ha-backup \
  --inputs "$INPUTS"

  3. Configure backups (install + reconcile)

hyops apply --env <env> \
  --module platform/onprem/postgresql-ha-backup \
  --inputs "$INPUTS"

  4. Trigger on-demand full backup (optional)

HYOPS_INPUT_repo_state_ref=org/gcp/object-repo \
HYOPS_INPUT_apply_mode=backup \
hyops apply --env <env> \
  --module platform/onprem/postgresql-ha-backup \
  --inputs "$INPUTS"

If the repository path was reused from an older PostgreSQL cluster and backup mode fails with a pgBackRest system-id mismatch, run one controlled recovery pass:

HYOPS_INPUT_repo_state_ref=org/gcp/object-repo \
HYOPS_INPUT_apply_mode=backup \
HYOPS_INPUT_repo_mismatch_action=reset \
hyops apply --env <env> \
  --module platform/onprem/postgresql-ha-backup \
  --inputs "$INPUTS"

Then return to the normal backup command without repo_mismatch_action=reset.
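One way to make sure the reset setting never outlives a single run is to scope it to a subshell. The wrapper below is a sketch, not a HyOps feature: run_backup is a hypothetical helper, HYOPS_BIN is a hypothetical override that allows a dry run against a stub, and repo_state_ref is expected to come from the caller's environment or the inputs file as in the commands above.

```shell
# Run backup mode once; the optional third argument "reset" applies
# repo_mismatch_action=reset to this invocation only.
# Usage: run_backup <env> <inputs-file> [reset]
run_backup() (
  env_name="$1"
  inputs="$2"
  bin="${HYOPS_BIN:-hyops}"
  # The function body is a subshell, so these settings never reach the caller's shell.
  export HYOPS_INPUT_apply_mode=backup
  case "${3:-}" in
    reset) export HYOPS_INPUT_repo_mismatch_action=reset ;;
    "")    unset HYOPS_INPUT_repo_mismatch_action ;;  # fall back to the module default
    *)     echo "unsupported action: $3" >&2; exit 2 ;;
  esac
  "$bin" apply --env "$env_name" \
    --module platform/onprem/postgresql-ha-backup \
    --inputs "$inputs"
)
```

With this shape, `run_backup dev "$INPUTS" reset` performs the controlled recovery pass, and a later `run_backup dev "$INPUTS"` is the normal backup command again even if the variable had been exported earlier in the session.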

  5. Verify state and evidence

cat "$HOME/.hybridops/envs/<env>/state/modules/platform__onprem__postgresql-ha-backup/latest.json"

Check:

  • status is ok
  • outputs.cap.db.postgresql_ha_backup is ready
  • outputs.pgbackrest_repo matches backend settings

Common issues

dependency platform/onprem/postgresql-ha is not ready

Cause: PostgreSQL HA is missing or not healthy.

Fix:

  • Deploy platform/onprem/postgresql-ha first and ensure status=ok.

S3 permission errors during stanza-create / archive / backup

Cause: wrong credentials or bucket policy.

Fix:

  • Confirm access key has list/read/write permissions to the bucket.
  • Confirm endpoint/region/uri_style match your S3 implementation.

backup and archive info files exist but do not match the database

Cause: a rebuilt PostgreSQL cluster is reusing repository path metadata from a different system-id.

Fix:

  • Preferred: keep repo_mismatch_action: fail, rotate to a clean repo path.
  • Controlled recovery: set repo_mismatch_action: reset for one run to re-initialize stanza metadata.

archive.info cannot be opened during first backup after reset

Cause: pgBackRest archive metadata was removed during controlled reset and needs stanza initialization before backup.

Fix:

  • Re-run the same backup command with repo_mismatch_action=reset.
  • HyOps backup mode now performs stanza initialization before pgbackrest check.
  • After the first successful run, remove repo_mismatch_action=reset.

GCS authentication errors during stanza-create / archive / backup

Cause: missing/invalid service account JSON, or bucket IAM policy.

Fix:

  • Confirm PG_BACKUP_GCS_SA_JSON is present in vault for selected env.
  • Confirm service account has read/write bucket permissions.

Azure authentication errors during stanza-create / archive / backup

Cause: missing/invalid storage account key, wrong account/container, or network ACL restrictions.

Fix:

  • Confirm PG_BACKUP_AZURE_ACCOUNT_KEY is present in vault for selected env.
  • Confirm azure_storage_account and azure_container are correct.
  • Confirm storage firewall/network policy allows DB nodes.

References