PostgreSQL DR Operating Model (Restore vs Warm Standby vs Multi-Cloud)

Purpose

Define a professional, deterministic DR model for PostgreSQL that:

avoids split-brain risk,
keeps operations predictable for packaged/tarball users,
supports decision-service-driven failover target selection,
scales to multi-cloud without forcing dual-primary or dual-warm complexity.

Recommended default (current standard)

Single active primary (on-prem during normal operations).
Continuous pgBackRest backup + WAL archive to one primary object repository backend (gcs, azure, or s3).
One warm-standby failover target (GCP or Azure), selected by policy/decision service.
Optional secondary backup copy (cross-cloud object replication/copy), for repository survivability.

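This gives a strong DR posture while keeping a single writable primary and a single failover target, which avoids the split-brain and orchestration risk of running multiple standbys.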

DR modes

Mode A: Backup-restore DR (baseline, always required)

On a DR event, provision or reuse the target cluster and restore from the pgBackRest repository.
Best for lower cost and simpler operations.
Higher RTO than warm-standby promotion.

Mode B: Warm-standby promotion DR (enterprise fast path)

Keep one target cloud cluster in read-only recovery posture.
On a DR event, promote the standby and cut traffic over to it.
Lower RTO, higher steady-state cost.

Mode C: Dual warm-standby (advanced, not default)

Two warm standbys in different clouds.
Not recommended by default due to cost, orchestration complexity, and increased operational blast radius.

Multi-cloud configuration

If the decision service may choose between GCP and Azure, use:

One active warm standby target at a time (selected by policy),
plus secondary backup copy to the other cloud.

Do not run dual read-only PostgreSQL clusters unless you have a strict, tested requirement and the operational capacity for it.
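
The placement rule above (one active warm standby, with the secondary backup copy sent to the other cloud) can be sketched as a small helper. This is a minimal illustration, not the HybridOps implementation: `MultiCloudPlan` and `plan_multicloud` are hypothetical names, and the sketch assumes GCP and Azure are the only candidate targets.

```python
from dataclasses import dataclass

# Candidate failover clouds named in this document.
CLOUDS = ("gcp", "azure")


@dataclass(frozen=True)
class MultiCloudPlan:
    """Hypothetical plan shape: one warm standby and one backup-copy target."""

    warm_standby_cloud: str
    secondary_backup_cloud: str


def plan_multicloud(selected_cloud: str) -> MultiCloudPlan:
    """Put the warm standby in the selected cloud and the backup copy in the other."""
    if selected_cloud not in CLOUDS:
        raise ValueError(f"unsupported cloud: {selected_cloud!r}")
    # With exactly two candidates, the "other" cloud is unambiguous.
    other = next(cloud for cloud in CLOUDS if cloud != selected_cloud)
    return MultiCloudPlan(
        warm_standby_cloud=selected_cloud,
        secondary_backup_cloud=other,
    )
```

Because the plan holds exactly one `warm_standby_cloud`, a dual warm-standby layout cannot be expressed by accident; it would need a deliberate change to the type.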

Tiered DR configurations (SME, schools, enterprise)

PostgreSQL DR is available in three tiered configurations to match cost, recovery objectives, and operational capacity.

Default lane: backup-restore DR

Recommended for:

SMEs
schools
cost-sensitive environments
teams without 24x7 database specialists

Shape:

On-prem Patroni HA primary
Continuous pgBackRest backup + WAL archive to one object repository
Restore into self-managed cloud VMs during DR
Controlled failback to on-prem

Why this is the default:

lowest steady-state cost
easiest to explain and support
cleanest tarball-safe story
strongest fit for packaged deployments and reviewable drills

Premium lane: managed warm-standby DR

Recommended for:

enterprises with stricter RTO/RPO needs
customers willing to pay for lower recovery time and higher steady-state cost

Shape:

On-prem Patroni HA primary
One managed cloud PostgreSQL standby target
Controlled promotion during DR
Explicit failback by reseed or reverse replication

This is an upgrade path, not the default posture for all customers.

Advanced lane: multi-cloud resilience

Recommended only when justified by policy, budget, and operational maturity.

Shape:

one active warm standby target at a time
optional secondary backup copy to another cloud
decision-service-driven target selection, added at a later stage

Do not make dual warm-standby or dual read-only the default product shape.

Decision service contract (target state)

The decision service should output at least:

dr_mode: restore | warm_promote | deny
dr_target_cloud: gcp | azure
repo_backend: gcs | azure | s3
enable_secondary_backup_copy: true | false
rationale: string
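
To make the contract concrete, the field list above can be sketched as a validated record. This is an illustrative sketch under stated assumptions, not the decision service's actual schema: `DrDecision`, `parse_decision`, and `_require_choice` are hypothetical names, and the non-empty `rationale` check is an assumption that the source does not state.

```python
from dataclasses import dataclass, fields

# Allowed values, copied from the contract's field list.
DR_MODES = ("restore", "warm_promote", "deny")
TARGET_CLOUDS = ("gcp", "azure")
REPO_BACKENDS = ("gcs", "azure", "s3")


def _require_choice(name: str, value: object, allowed: tuple) -> None:
    """Raise ValueError when value is not one of the allowed choices."""
    if value not in allowed:
        raise ValueError(f"{name} must be one of {allowed}, got {value!r}")


@dataclass(frozen=True)
class DrDecision:
    """Minimum decision-service output (hypothetical type name)."""

    dr_mode: str
    dr_target_cloud: str
    repo_backend: str
    enable_secondary_backup_copy: bool
    rationale: str

    def __post_init__(self) -> None:
        _require_choice("dr_mode", self.dr_mode, DR_MODES)
        _require_choice("dr_target_cloud", self.dr_target_cloud, TARGET_CLOUDS)
        _require_choice("repo_backend", self.repo_backend, REPO_BACKENDS)
        if not isinstance(self.enable_secondary_backup_copy, bool):
            raise TypeError("enable_secondary_backup_copy must be a bool")
        # Assumption: an empty rationale is rejected so decisions stay reviewable.
        if not isinstance(self.rationale, str) or not self.rationale.strip():
            raise ValueError("rationale must be a non-empty string")


def parse_decision(payload: dict) -> DrDecision:
    """Build a DrDecision from a payload, ignoring any extra keys."""
    required = [f.name for f in fields(DrDecision)]
    missing = [name for name in required if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return DrDecision(**{name: payload[name] for name in required})
```

Extra keys are ignored on purpose, since the contract lists the minimum output ("at least") rather than a closed schema.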

HybridOps DR workflows then consume these outputs to select:

failover blueprint/inputs,
repository state reference (repo_state_ref),
whether to run secondary copy automation.
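
The three selections above can be sketched as one mapping step. This is a hedged sketch: the blueprint identifiers, the `select_workflow` function, and the shape of `repo_state_refs` are all hypothetical, and treating `deny` as "run no workflow" is an assumption rather than documented behaviour.

```python
from typing import Optional

# Hypothetical blueprint identifiers; the real names are not given in this document.
BLUEPRINTS = {
    ("restore", "gcp"): "example-restore-gcp",
    ("restore", "azure"): "example-restore-azure",
    ("warm_promote", "gcp"): "example-promote-gcp",
    ("warm_promote", "azure"): "example-promote-azure",
}


def select_workflow(decision: dict, repo_state_refs: dict) -> Optional[dict]:
    """Map decision-service outputs to workflow inputs.

    decision: a dict carrying the contract fields (dr_mode, dr_target_cloud, ...).
    repo_state_refs: hypothetical lookup from repo_backend to a repo_state_ref.
    Returns None when the decision is "deny" (assumed to mean: do nothing).
    """
    if decision["dr_mode"] == "deny":
        return None
    key = (decision["dr_mode"], decision["dr_target_cloud"])
    if key not in BLUEPRINTS:
        raise ValueError(f"no blueprint for {key}")
    return {
        "blueprint": BLUEPRINTS[key],
        "repo_state_ref": repo_state_refs[decision["repo_backend"]],
        "run_secondary_copy": bool(decision["enable_secondary_backup_copy"]),
    }
```

Keeping the mapping in a plain lookup table means the same decision always yields the same workflow inputs, which matches the deterministic intent of the model.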

Implementation guidance

Treat PostgreSQL DR as two product lanes: a baseline self-managed restore lane and a premium managed warm-standby lane.
Keep existing failover/failback blueprints as deterministic restore paths.
Keep platform/postgresql-ha-backup as the standard backup configuration layer.
Keep the client-facing endpoint contract the same across both lanes: endpoint_dns_name, endpoint_target, endpoint_target_type, endpoint_host, endpoint_port, endpoint_cutover_required.
Keep the current GCP/Azure restore blueprints as the first shipped DR path to prove end-to-end recovery before introducing managed-DB replication.
Use optional pgBackRest repo2 in platform/postgresql-ha-backup for the secondary backup copy: set secondary_enabled: true, plus either secondary_repo_state_ref (recommended) or secondary_backend with the secondary_* inputs.
Keep the secondary copy explicit and policy-driven, not coupled to the primary DB write path.
Object repositories are provisioned with Terraform provider-native resources (AWS/GCP/Azure official providers), then consumed via repo_state_ref/secondary_repo_state_ref.
For rebuilt clusters reusing the same backup path, keep repo_mismatch_action=fail by default and use repo_mismatch_action=reset only for controlled stanza re-initialization.
Treat warm-standby as an upgrade path, not mandatory for all users.
Treat failback from managed PostgreSQL as a controlled reseed/new-lineage operation unless and until reverse replication is explicitly implemented and tested.
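
One way to keep the endpoint contract identical across both lanes is to check each lane's outputs against the same field list. The sketch below assumes lane outputs arrive as a plain dict; `ENDPOINT_FIELDS` repeats the six field names from the guidance above, while `missing_endpoint_fields` and the sample values in the usage note are hypothetical.

```python
from typing import List

# The six client-facing endpoint fields named in the implementation guidance.
ENDPOINT_FIELDS = (
    "endpoint_dns_name",
    "endpoint_target",
    "endpoint_target_type",
    "endpoint_host",
    "endpoint_port",
    "endpoint_cutover_required",
)


def missing_endpoint_fields(outputs: dict) -> List[str]:
    """Return contract fields absent from a lane's outputs, in contract order."""
    return [name for name in ENDPOINT_FIELDS if name not in outputs]
```

A drill or CI check could call `missing_endpoint_fields` on the outputs of both the restore lane and the managed warm-standby lane and fail if either list is non-empty, for example when a lane emits `endpoint_host` but omits `endpoint_cutover_required`.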
