
PostgreSQL DR Product Lanes

Purpose: Define the clean product and architecture split for PostgreSQL disaster recovery in HybridOps so that the platform remains supportable, tarball-safe, and commercially clear for SMEs, schools, and enterprise customers.

This standard complements:

It is not an ADR. It describes the current product and architecture boundary that shipped modules and blueprints should follow.

1. Product lanes

HybridOps SHOULD present PostgreSQL DR in three lanes.

1.1 Baseline lane: self-managed restore DR

Target fit:

  • SMEs
  • schools
  • cost-sensitive customers

Architecture:

  • On-prem PostgreSQL HA primary (Patroni + etcd)
  • pgBackRest backup and WAL archive to one object repository
  • DR restore to self-managed cloud VMs
  • Controlled failback to on-prem
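The backup leg of this lane is a standard pgBackRest setup pointed at one object repository. As a minimal sketch only: the stanza name (`main`), bucket name, endpoint, and paths below are placeholders for illustration, not values shipped by `platform/onprem/postgresql-ha-backup`.

```ini
; /etc/pgbackrest/pgbackrest.conf -- illustrative fragment; all names
; ("main", "example-dr-repo") are placeholders, not shipped defaults.
[global]
repo1-type=s3
repo1-s3-bucket=example-dr-repo
repo1-s3-endpoint=s3.eu-west-1.amazonaws.com
repo1-s3-region=eu-west-1
repo1-path=/pgbackrest
repo1-retention-full=2

[main]
pg1-path=/var/lib/postgresql/16/main
```

On the cluster side, WAL archiving is wired through PostgreSQL's standard `archive_command`, e.g. `archive_command = 'pgbackrest --stanza=main archive-push %p'`.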

Commercial posture:

  • default offer
  • lowest steady-state cost
  • strongest packaged/tarball fit

1.2 Premium lane: managed warm-standby DR

Target fit:

  • customers with tighter RTO/RPO requirements
  • customers willing to pay for lower recovery time and higher operational sophistication

Architecture:

  • On-prem PostgreSQL HA primary
  • one managed cloud PostgreSQL standby target
  • controlled promotion during DR
  • controlled failback by reseed or reverse replication
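"Controlled promotion" in this lane implies an explicit, guarded sequence rather than automatic failover. The sketch below illustrates that ordering under stated assumptions; the function and step names are hypothetical, not part of any shipped HybridOps contract.

```python
# Illustrative DR promotion sequence for the managed warm-standby lane.
# All step names are hypothetical; the real runbook is defined by the
# shipped promotion blueprint.

def plan_dr_promotion(primary_reachable: bool, standby_lag_known: bool) -> list[str]:
    """Return the ordered DR promotion steps, refusing to promote
    while the on-prem primary still answers (split-brain guard)."""
    if primary_reachable:
        raise RuntimeError("primary still reachable; DR promotion refused")
    steps = ["record declared-disaster evidence"]
    if not standby_lag_known:
        steps.append("capture last-known replication position")
    steps += [
        "promote managed standby to read-write",
        "repoint application endpoints to the managed instance",
        "fence the old on-prem primary before any failback",
    ]
    return steps
```

The split-brain guard is the design point: promotion is only planned once the primary is confirmed unreachable, which keeps the decision explicit and auditable.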

Commercial posture:

  • premium or enterprise add-on
  • not the default packaged story

1.3 Advanced lane: multi-cloud resilience

Target fit:

  • customers with explicit policy or regulatory requirements
  • customers with operational capacity for multi-cloud DR

Architecture:

  • one active warm standby target at a time
  • optional secondary backup copy in a second cloud
  • decision-service-driven target selection, introduced later (see section 7)

Commercial posture:

  • advanced offering
  • not required for baseline product success

2. Default product rule

HybridOps MUST ship and document the baseline self-managed restore lane as the default PostgreSQL DR path.

Rationale:

  • simplest to explain
  • easiest to support
  • lowest cost for the target market
  • best fit for tarball-first product packaging
  • already aligned with current module and blueprint contracts

Managed PostgreSQL DR MUST be implemented as a separate lane, not as a hidden variation of the self-managed restore path.

3. Module and blueprint boundaries

3.1 Baseline lane

Baseline DR SHOULD use these contracts:

  • backup configuration:
      • platform/onprem/postgresql-ha-backup
  • object repository:
      • org/gcp/object-repo
      • org/aws/object-repo
      • org/azure/object-repo
  • failover restore blueprint:
      • dr/postgresql-ha-failover-gcp@v1
      • future equivalent restore blueprints for other clouds
  • failback blueprint:
      • dr/postgresql-ha-failback-onprem@v1

This lane SHOULD consume repo_state_ref and inventory_state_ref rather than duplicating bucket names or VM IPs.
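Consuming state refs instead of literals can be pictured as below. The state-document layout (`outputs.bucket_name`) is an assumption for illustration; the real contract is whatever the shipped object-repo module emits.

```python
# Sketch of consuming a repo_state_ref instead of hardcoding bucket names.
# The state layout shown ("outputs" / "bucket_name") is an assumed shape,
# not a documented HybridOps schema.
import json

def resolve_repo_bucket(repo_state: dict) -> str:
    """Pull the backup bucket name out of a repo state document,
    so blueprints never embed literal bucket names."""
    return repo_state["outputs"]["bucket_name"]

# Usage: the state document would normally come from the object-repo
# module's published output, not an inline string.
state = json.loads('{"outputs": {"bucket_name": "dr-backups-eu"}}')
bucket = resolve_repo_bucket(state)
```

The same pattern applies to `inventory_state_ref` for VM addresses: the blueprint resolves identities at run time rather than freezing them into its own inputs.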

3.2 Premium managed lane

Managed DR SHOULD be modeled as separate modules and blueprints, for example:

  • managed database provisioning:
      • org/gcp/cloudsql-postgresql or equivalent
  • source preparation:
      • platform/onprem/postgresql-dr-source
  • managed standby/replication setup:
      • org/gcp/cloudsql-external-replica or equivalent
  • DR promotion blueprint:
      • dr/postgresql-cloudsql-promote-gcp@v1
  • failback blueprint:
      • dr/postgresql-cloudsql-failback-onprem@v1

These names are target-state examples, not currently shipped contracts.

Managed DR MUST NOT overload:

  • platform/onprem/postgresql-ha
  • dr/postgresql-ha-failover-gcp@v1

with provider-managed database semantics.

4. Failback rules

4.1 Baseline self-managed lane

Failback SHOULD remain:

  • maintenance-window based
  • explicit
  • evidence-driven

The on-prem target SHOULD be treated as a rebuilt or re-seeded cluster, not as an attempt to resume an old ambiguous timeline.

4.2 Managed lane

Failback from managed PostgreSQL SHOULD default to one of:

  • controlled export / reseed into a fresh on-prem leader, then rebuild replicas
  • reverse replication into on-prem when explicitly implemented and tested

Managed failback MUST be treated as a new lineage / reseed operation unless reverse replication is an explicitly shipped and validated feature.
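The rule above reduces to a small default-deny decision, sketched here with a hypothetical flag name; no such flag is a shipped contract.

```python
def choose_failback_mode(reverse_replication_validated: bool) -> str:
    """Default managed failback to a fresh-lineage reseed; allow reverse
    replication only when it is an explicitly shipped, tested feature.
    The flag name is illustrative, not a real HybridOps setting."""
    if reverse_replication_validated:
        return "reverse-replication"
    return "reseed-new-lineage"
```

Treating reseed as the default keeps timeline lineage unambiguous: the on-prem cluster restarts from a known point rather than inheriting an uncertain replication history.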

5. Destructive operations

Backup repository purge MUST NOT be bundled into normal rerun or routine destroy flows.

Destructive backup cleanup SHOULD be:

  • separate
  • explicit
  • opt-in
  • clearly confirmed

Routine destroy for backup configuration MAY disable schedules or cluster-side configuration, but SHOULD NOT imply deletion of repository data.

6. RTO/RPO positioning

Baseline self-managed restore DR:

  • higher RTO than warm standby
  • RPO bounded by backup/WAL position
  • best price-to-operability balance

Premium managed warm standby DR:

  • lower RTO
  • better RPO, typically bounded by replication lag
  • higher steady-state cost
  • more complex failback

HybridOps marketing and documentation SHOULD describe these tradeoffs explicitly and avoid presenting managed DR as “better in every way.”

7. Decision service boundary

Decision service SHOULD be introduced only after the baseline and premium DR lanes are independently proven.

Decision service SHOULD select among already-tested options, for example:

  • target cloud
  • DR mode
  • whether to enable secondary backup copy

Decision service MUST NOT be used to compensate for unproven or underspecified DR paths.
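The boundary can be expressed as a default-deny selection over a fixed set of proven options. The target names and set below are hypothetical illustrations, not a statement of which paths are currently validated.

```python
# Hypothetical set of DR targets with independently proven blueprints;
# the actual contents would be whatever HybridOps has shipped and tested.
VALIDATED_TARGETS = {"gcp", "aws"}

def select_dr_target(preferred: str,
                     validated: set[str] = VALIDATED_TARGETS) -> str:
    """A decision service chooses only among already-proven options;
    an unproven target is an error, never a silent fallback."""
    if preferred not in validated:
        raise ValueError(f"DR target '{preferred}' has no validated blueprint")
    return preferred
```

Raising on an unvalidated target, rather than improvising a path, is exactly the MUST NOT above: the decision service routes between tested lanes, it does not paper over missing ones.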