Operate PostgreSQL HA Backup (pgBackRest) (HyOps)
Purpose: Configure pgBackRest backups + WAL archiving for an existing Patroni PostgreSQL HA cluster.
Owner: Platform engineering
Trigger: After PostgreSQL HA bootstrap, before onboarding stateful services, and during DR drills
Impact: Enables backups and WAL archive shipping, and creates scheduled backup jobs (optionally targeting both repo1 and repo2)
Severity: P2
Pre-reqs: platform/onprem/postgresql-ha is healthy, the backup repository is reachable from the DB nodes, vault decryption works, and Ansible dependencies are installed.
Rollback strategy: Destroy this module to disable scheduled backups (best-effort).
Context
This runbook covers module-level operations for:
- Module: platform/onprem/postgresql-ha-backup
- Driver: config/ansible
- Upstream automation: vitabaks.autobase (Ansible Galaxy)
- Scope: configure backups on existing PostgreSQL HA nodes (no VM provisioning)
This module is intentionally "bring-your-own object store":
- It does not create repository storage.
- It does not provision cloud resources.
- Optional: separate HyOps modules can provision the repository infrastructure. The generic object-repo modules are preferred:
  - GCS: org/gcp/object-repo (bucket + service account, no SA keys)
  - AWS: org/aws/object-repo (bucket + IAM user, no access keys)
  - Azure Blob: org/azure/object-repo (storage account + container, no account keys)
- Compatibility wrappers remain supported: org/gcp/pgbackrest-repo, org/aws/pgbackrest-repo, org/azure/pgbackrest-repo.
- Recommended wiring: set repo_state_ref in the backup inputs so that backend and repository fields are derived from module state instead of being duplicated manually.
- Preferred GCP orchestration path: use the blueprint dr/postgresql-ha-backup-gcp@v1, so that HyOps provisions the repository first and then configures backups against repo_state_ref.
- Optional secondary copy wiring: secondary_enabled: true with secondary_repo_state_ref (recommended), or an explicit secondary_backend plus the secondary_* fields.
Preconditions and safety checks
- Installed hyops (via install.sh) can be run from any working directory.
- If you want to use the shipped example overlays, set:
export HYOPS_CORE_ROOT="${HYOPS_CORE_ROOT:-$HOME/.hybridops/core/app}"
For source-checkout usage, set HYOPS_CORE_ROOT to the root of your hybridops-core checkout instead.
- The correct environment is selected (--env shared|dev|staging|prod).
- The PostgreSQL HA cluster is ready:
  - $HOME/.hybridops/envs/<env>/state/modules/platform__onprem__postgresql-ha/latest.json has status=ok
- The backup repository is reachable from the DB nodes (firewall/routing).
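You can check the status=ok precondition before any apply. Below is a minimal sketch with two hypothetical helpers, module_state_file and require_status_ok; neither is a hyops command. It assumes three things: the "/" to "__" directory naming shown in the path above, a top-level status field in latest.json, and python3 on the operator host.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the hyops CLI): locate a module's state file
# and gate on its top-level "status" field before applying the backup module.

# Map a module ref such as platform/onprem/postgresql-ha to its latest.json path,
# following the "/" -> "__" naming seen under ~/.hybridops/envs/<env>/state/modules/.
module_state_file() {
  local env="$1" module="$2"
  printf '%s/.hybridops/envs/%s/state/modules/%s/latest.json\n' \
    "$HOME" "$env" "${module//\//__}"
}

# Return 0 only if the JSON file exists and its top-level "status" is "ok".
# Assumes python3 is available on the operator host for JSON parsing.
require_status_ok() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "state file not found: $file" >&2
    return 1
  fi
  python3 - "$file" <<'PY'
import json, sys
with open(sys.argv[1]) as fh:
    status = json.load(fh).get("status")
if status != "ok":
    print(f"status is {status!r}, expected 'ok'", file=sys.stderr)
    sys.exit(1)
PY
}
```

For example, `require_status_ok "$(module_state_file dev platform/onprem/postgresql-ha)"` returns non-zero until the HA module reports status=ok.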
Select a backend in your overlay:
- backend: s3 (AWS S3 or S3-compatible)
- backend: gcs (Google Cloud Storage)
- backend: azure (Azure Blob Storage)
Or set:
repo_state_ref: org/gcp/object-repo (or the aws/azure equivalent)
- Backend and repository location fields are then resolved from state.
- If the repository was created in a dedicated state slot, use the instance-qualified form:
repo_state_ref: org/gcp/object-repo#postgresql_ha_backup_repo
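Backend fields are derived from the referenced module, so a mismatched ref is easy to miss. The hypothetical repo_ref_backend helper below inspects only the ref string. It splits off an optional #instance suffix and maps the org/gcp, org/aws and org/azure repo modules (and their pgbackrest-repo wrappers) to gcs, s3 and azure. It does not read state, and this runbook does not say where instance-qualified state is stored.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check (not a hyops command): split a repo_state_ref into
# module and optional instance, and infer which backend it should resolve to.
# HyOps itself resolves backend fields from module state; this only catches typos
# such as pairing an org/aws ref with a gcs overlay.
repo_ref_backend() {
  local ref="$1"
  local module="${ref%%#*}"   # text before the first '#'
  local instance=""
  if [[ "$ref" == *"#"* ]]; then
    instance="${ref#*#}"      # text after the first '#'
  fi
  local backend
  case "$module" in
    org/gcp/object-repo|org/gcp/pgbackrest-repo)     backend=gcs ;;
    org/aws/object-repo|org/aws/pgbackrest-repo)     backend=s3 ;;
    org/azure/object-repo|org/azure/pgbackrest-repo) backend=azure ;;
    *) echo "unrecognised repo module: $module" >&2; return 1 ;;
  esac
  # Print "<backend> <module> <instance>", using "-" when no instance is given.
  printf '%s %s %s\n' "$backend" "$module" "${instance:--}"
}
```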
For an optional secondary copy (repo2), also set:
secondary_enabled: true
secondary_repo_state_ref: org/azure/object-repo (or the gcp/aws equivalent)
- Keep repo_mismatch_action: fail as the default safety setting.
- Use repo_mismatch_action: reset only when intentionally re-initializing stale pgBackRest stanza metadata after a cluster rebuild.
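The reset value rewrites stanza metadata, so it is worth validating the value before a run. Below is a minimal sketch with a hypothetical helper, check_repo_mismatch_action. It assumes that fail and reset are the only supported values, since those are the two documented here.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of hyops): accept only the two repo_mismatch_action
# values documented in this runbook, and make the destructive one loud.
check_repo_mismatch_action() {
  local action="${1:-fail}"   # treat unset/empty as the safe default
  case "$action" in
    fail)
      return 0 ;;
    reset)
      echo "WARNING: repo_mismatch_action=reset re-initializes pgBackRest stanza metadata;" >&2
      echo "WARNING: use it for one controlled run only, then return to 'fail'." >&2
      return 0 ;;
    *)
      echo "unsupported repo_mismatch_action: $action (expected fail|reset)" >&2
      return 1 ;;
  esac
}
```

For example, run `check_repo_mismatch_action "${HYOPS_INPUT_repo_mismatch_action:-}"` before an apply.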
Secrets (must be provided explicitly, not randomly generated):
- Backend s3: PG_BACKUP_S3_ACCESS_KEY_ID, PG_BACKUP_S3_SECRET_ACCESS_KEY
- Backend gcs: PG_BACKUP_GCS_SA_JSON (service account JSON content)
- Backend azure: PG_BACKUP_AZURE_ACCOUNT_KEY (storage account key)
If the secondary copy is enabled, also provide the secondary backend credentials (default names):
- Secondary s3: PG_BACKUP_SECONDARY_S3_ACCESS_KEY_ID, PG_BACKUP_SECONDARY_S3_SECRET_ACCESS_KEY
- Secondary gcs: PG_BACKUP_SECONDARY_GCS_SA_JSON
- Secondary azure: PG_BACKUP_SECONDARY_AZURE_ACCOUNT_KEY
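The credential names above follow one pattern per role. The hypothetical helpers below encode that pattern. They then report which required variables are unset in the current shell, before you seed them into the vault. They assume the default names listed here and do not query the vault itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hyops): list the secret names this runbook
# requires for a backend, using the default names for repo1 (primary) and repo2.
required_secrets() {
  local backend="$1" role="${2:-primary}" prefix="PG_BACKUP"
  [ "$role" = "secondary" ] && prefix="PG_BACKUP_SECONDARY"
  case "$backend" in
    s3)    echo "${prefix}_S3_ACCESS_KEY_ID ${prefix}_S3_SECRET_ACCESS_KEY" ;;
    gcs)   echo "${prefix}_GCS_SA_JSON" ;;
    azure) echo "${prefix}_AZURE_ACCOUNT_KEY" ;;
    *)     echo "unknown backend: $backend" >&2; return 1 ;;
  esac
}

# Print each required name that is unset or empty in the current shell.
# Returns 1 if any are missing, 0 if all are present.
missing_secrets() {
  local names name missing=0
  names="$(required_secrets "$@")" || return 1
  for name in $names; do
    if [ -z "${!name:-}" ]; then   # indirect expansion: value of the variable named $name
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_secrets gcs` checks repo1 and `missing_secrets azure secondary` checks repo2.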
Seed secrets into the runtime vault (recommended).
Optional (GCS): provision repo bucket + service account
HYOPS_INPUT_project_state_ref=org/gcp/project-factory \
HYOPS_INPUT_bucket_name=hyops-dev-pgbackrest-a1 \
hyops apply --env <env> \
--module org/gcp/object-repo \
--inputs "$HYOPS_CORE_ROOT/modules/org/gcp/object-repo/examples/inputs.min.yml"
If the project is external to HyOps, clear project_state_ref in the input file, omit the HYOPS_INPUT_project_state_ref override from the command, and set project_id explicitly instead.
Recommended GCS bucket pattern: hyops-<env>-pgbackrest-<suffix>
- Example: hyops-dev-pgbackrest-a1
- Keep bucket names lowercase and globally unique within GCS.
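Below is a sketch for composing names that follow the recommended pattern, using a hypothetical pgbackrest_bucket_name function. It lowercases its inputs and applies a stricter subset of the GCS bucket naming rules: lowercase letters, digits and hyphens only, 3 to 63 characters, alphanumeric at both ends. Only GCS can confirm global uniqueness, when it creates the bucket.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hyops): build a bucket name following the
# recommended hyops-<env>-pgbackrest-<suffix> pattern, lowercased.
# The regex is a deliberately strict subset of GCS naming rules (no dots or
# underscores); it cannot check global uniqueness.
pgbackrest_bucket_name() {
  local env="$1" suffix="$2" name
  name="$(printf 'hyops-%s-pgbackrest-%s' "$env" "$suffix" | tr '[:upper:]' '[:lower:]')"
  if [[ ! "$name" =~ ^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$ ]]; then
    echo "invalid bucket name: $name" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

For example, `HYOPS_INPUT_bucket_name="$(pgbackrest_bucket_name dev a1)"` yields hyops-dev-pgbackrest-a1.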
Optional (AWS): provision S3 repo bucket + IAM user
hyops apply --env <env> \
--module org/aws/object-repo \
--inputs "$HYOPS_CORE_ROOT/modules/org/aws/object-repo/examples/inputs.min.yml"
Optional (Azure): provision Blob repo storage
hyops apply --env <env> \
--module org/azure/object-repo \
--inputs "$HYOPS_CORE_ROOT/modules/org/azure/object-repo/examples/inputs.min.yml"
Secrets examples:
# S3
hyops secrets set --env <env> \
PG_BACKUP_S3_ACCESS_KEY_ID='...' \
PG_BACKUP_S3_SECRET_ACCESS_KEY='...'
# GCS
export PG_BACKUP_GCS_SA_JSON="$(cat sa.json)"
hyops secrets set --env <env> --from-env PG_BACKUP_GCS_SA_JSON
# Azure Blob
hyops secrets set --env <env> PG_BACKUP_AZURE_ACCOUNT_KEY='...'
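A malformed PG_BACKUP_GCS_SA_JSON tends to surface later, as an authentication error during stanza-create. The hypothetical check_gcs_sa_json helper below parses the variable before it is seeded. It assumes python3 and checks only the shape of a service account key file (type, client_email, private_key), not whether the key is still active.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check (not part of hyops): confirm a variable holds parseable
# service account key JSON before seeding it into the vault.
# Assumes python3 on the operator host; only the JSON shape is checked.
check_gcs_sa_json() {
  local var="${1:-PG_BACKUP_GCS_SA_JSON}"
  local value="${!var:-}"   # indirect expansion: value of the variable named $var
  if [ -z "$value" ]; then
    echo "$var is empty or unset" >&2
    return 1
  fi
  printf '%s' "$value" | python3 -c '
import json, sys
try:
    doc = json.load(sys.stdin)
except ValueError as exc:
    sys.exit(f"not valid JSON: {exc}")
if not isinstance(doc, dict) or doc.get("type") != "service_account":
    sys.exit("JSON is not a service account key (type != service_account)")
for key in ("client_email", "private_key"):
    if not doc.get(key):
        sys.exit(f"missing field: {key}")
'
}
```

For example: `check_gcs_sa_json && hyops secrets set --env <env> --from-env PG_BACKUP_GCS_SA_JSON`. To check the repo2 credential, pass PG_BACKUP_SECONDARY_GCS_SA_JSON as the argument.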
Steps
- Select an overlay
  - S3-compatible: $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.minio.yml
  - GCS: $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.gcs.yml
  - GCS (consume current platform/onprem/postgresql-ha state): $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.gcs.ha-state.yml
  - GCS (explicit inventory_groups, no inventory_state_ref): $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.gcs.explicit-inventory.yml
  - Azure Blob: $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.azure.yml
  - GCS primary + Azure secondary: $HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.gcs.azure-secondary.yml
Optional (recommended): bind repository module state without duplicating bucket/container values.
export HYOPS_INPUT_repo_state_ref=org/gcp/object-repo
# alternatives:
# export HYOPS_INPUT_repo_state_ref=org/aws/object-repo
# export HYOPS_INPUT_repo_state_ref=org/azure/object-repo
# if the repo was provisioned with a dedicated state instance:
# export HYOPS_INPUT_repo_state_ref=org/gcp/object-repo#postgresql_ha_backup_repo
# optional repo2 state-driven copy target:
# export HYOPS_INPUT_secondary_enabled=true
# export HYOPS_INPUT_secondary_repo_state_ref=org/azure/object-repo
- Preflight
# Set INPUTS to the overlay selected above (the S3-compatible overlay is shown as an example).
INPUTS="$HYOPS_CORE_ROOT/modules/platform/onprem/postgresql-ha-backup/examples/inputs.minio.yml"
hyops preflight --env <env> --strict \
--module platform/onprem/postgresql-ha-backup \
--inputs "$INPUTS"
- Configure backups (install + reconcile)
hyops apply --env <env> \
--module platform/onprem/postgresql-ha-backup \
--inputs "$INPUTS"
- Trigger an on-demand full backup (optional; the example binds a GCS repository through repo_state_ref, so substitute your own reference)
HYOPS_INPUT_repo_state_ref=org/gcp/object-repo \
HYOPS_INPUT_apply_mode=backup \
hyops apply --env <env> \
--module platform/onprem/postgresql-ha-backup \
--inputs "$INPUTS"
If the repository path was reused from an older PostgreSQL cluster and backup mode fails with a pgBackRest system-id mismatch, run one controlled recovery pass:
HYOPS_INPUT_repo_state_ref=org/gcp/object-repo \
HYOPS_INPUT_apply_mode=backup \
HYOPS_INPUT_repo_mismatch_action=reset \
hyops apply --env <env> \
--module platform/onprem/postgresql-ha-backup \
--inputs "$INPUTS"
Then return to the normal backup command without repo_mismatch_action=reset.
- Verify state and evidence
cat "$HOME/.hybridops/envs/<env>/state/modules/platform__onprem__postgresql-ha-backup/latest.json"
Check:
- status is ok
- outputs.cap.db.postgresql_ha_backup is ready
- outputs.pgbackrest_repo matches your backend settings
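You can script the checklist. The hypothetical state_get helper below reads one dotted field from latest.json and assumes python3. This runbook does not say whether cap.db.postgresql_ha_backup is a single flat key under outputs or a set of nested objects. The lookup therefore tries the longest literal key first at each level, so it accepts either layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of hyops): print one field from a module's
# latest.json given a dotted path such as outputs.cap.db.postgresql_ha_backup.
# Strings print as-is; other values print as JSON. Exits non-zero if not found.
state_get() {
  python3 - "$1" "$2" <<'PY'
import json, sys

def lookup(node, parts):
    if not parts:
        return node
    if not isinstance(node, dict):
        raise KeyError(".".join(parts))
    # Prefer the longest prefix that exists as a literal key, e.g. "cap.db.x".
    for i in range(len(parts), 0, -1):
        key = ".".join(parts[:i])
        if key in node:
            try:
                return lookup(node[key], parts[i:])
            except KeyError:
                continue
    raise KeyError(".".join(parts))

with open(sys.argv[1]) as fh:
    doc = json.load(fh)
try:
    value = lookup(doc, sys.argv[2].split("."))
except KeyError as exc:
    sys.exit(f"path not found: {exc}")
print(value if isinstance(value, str) else json.dumps(value))
PY
}
```

For example, point STATE at the latest.json above and run `[ "$(state_get "$STATE" status)" = ok ] && [ "$(state_get "$STATE" outputs.cap.db.postgresql_ha_backup)" = ready ]`. Then compare the output of `state_get "$STATE" outputs.pgbackrest_repo` against your overlay.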
Common issues
dependency platform/onprem/postgresql-ha is not ready
Cause: PostgreSQL HA is missing or not healthy.
Fix:
- Deploy platform/onprem/postgresql-ha first and ensure its state reports status=ok.
S3 permission errors during stanza-create / archive / backup
Cause: wrong credentials or bucket policy.
Fix:
- Confirm the access key has list/read/write permissions on the bucket.
- Confirm endpoint/region/uri_style match your S3 implementation.
backup and archive info files exist but do not match the database
Cause: a rebuilt PostgreSQL cluster is reusing repository path metadata from a different system-id.
Fix:
- Preferred: keep repo_mismatch_action: fail and rotate to a clean repository path.
- Controlled recovery: set repo_mismatch_action: reset for one run to re-initialize the stanza metadata.
archive.info cannot be opened during first backup after reset
Cause: the controlled reset removed pgBackRest archive metadata, so the stanza must be initialized again before a backup can run.
Fix:
- Re-run the same backup command with repo_mismatch_action=reset.
- HyOps backup mode now performs stanza initialization before running pgbackrest check.
- After the first successful run, remove repo_mismatch_action=reset.
GCS authentication errors during stanza-create / archive / backup
Cause: missing or invalid service account JSON, or a bucket IAM policy that does not grant access.
Fix:
- Confirm PG_BACKUP_GCS_SA_JSON is present in the vault for the selected env.
- Confirm the service account has read/write permissions on the bucket.
Azure authentication errors during stanza-create / archive / backup
Cause: missing/invalid storage account key, wrong account/container, or network ACL restrictions.
Fix:
- Confirm PG_BACKUP_AZURE_ACCOUNT_KEY is present in the vault for the selected env.
- Confirm azure_storage_account and azure_container are correct.
- Confirm the storage firewall/network policy allows the DB nodes.
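You can rule out some naming mistakes offline before investigating keys or network policy. Below is a minimal sketch with a hypothetical check_azure_names function. It encodes Azure's naming rules for storage accounts (3 to 24 lowercase letters and digits) and for blob containers (3 to 63 lowercase letters, digits and single hyphens, starting and ending with a letter or digit). A name that passes can still be the wrong account or container.

```shell
#!/usr/bin/env bash
# Hypothetical offline check (not part of hyops): catch malformed
# azure_storage_account / azure_container values before a run fails on auth.
# It cannot detect a wrong-but-valid name, a bad key, or firewall rules.
check_azure_names() {
  local account="$1" container="$2" rc=0
  local account_re='^[a-z0-9]{3,24}$'
  local container_re='^[a-z0-9]+(-[a-z0-9]+)*$'   # no leading, trailing or double hyphens
  if [[ ! $account =~ $account_re ]]; then
    echo "invalid azure_storage_account: '$account'" >&2
    rc=1
  fi
  if (( ${#container} < 3 || ${#container} > 63 )) || [[ ! $container =~ $container_re ]]; then
    echo "invalid azure_container: '$container'" >&2
    rc=1
  fi
  return "$rc"
}
```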