# PostgreSQL DR Product Lanes
Purpose: Define the clean product and architecture split for PostgreSQL disaster recovery in HybridOps so that the platform remains supportable, tarball-safe, and commercially clear for SMEs, schools, and enterprise customers.
This standard complements:
- PostgreSQL DR Operating Model (Restore vs Warm Standby vs Multi-Cloud)
- Object Repository Contract
- Runtime root and packaging
It is not an ADR (Architecture Decision Record); it describes the current product and architecture boundary that shipped modules and blueprints should follow.
## 1. Product lanes
HybridOps SHOULD present PostgreSQL DR in three lanes.
### 1.1 Baseline lane: self-managed restore DR
Target fit:
- SMEs
- schools
- cost-sensitive customers
Architecture:
- On-prem PostgreSQL HA primary (Patroni + etcd)
- pgBackRest backup and WAL archive to one object repository
- DR restore to self-managed cloud VMs
- Controlled failback to on-prem
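As an illustration of the backup leg of this architecture, a pgBackRest configuration might look like the sketch below. Bucket, stanza, region, and path names are hypothetical placeholders; in practice the object repository details would come from the Object Repository Contract rather than being hard-coded.

```ini
# /etc/pgbackrest/pgbackrest.conf -- illustrative sketch; all names are placeholders
[global]
repo1-type=s3
repo1-s3-bucket=example-dr-backup-bucket
repo1-s3-endpoint=s3.eu-west-1.amazonaws.com
repo1-s3-region=eu-west-1
repo1-path=/pgbackrest
repo1-retention-full=2

[main]
pg1-path=/var/lib/postgresql/16/main
```

With Patroni in the topology above, the corresponding WAL archiving would typically be configured through Patroni's managed PostgreSQL parameters, with an `archive_command` of the form `pgbackrest --stanza=main archive-push %p`.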
Commercial posture:
- default offer
- lowest steady-state cost
- strongest packaged/tarball fit
### 1.2 Premium lane: managed warm-standby DR
Target fit:
- customers with tighter RTO/RPO requirements
- customers willing to pay for lower recovery time and higher operational sophistication
Architecture:
- On-prem PostgreSQL HA primary
- one managed cloud PostgreSQL standby target
- controlled promotion during DR
- controlled failback by reseed or reverse replication
Commercial posture:
- premium or enterprise add-on
- not the default packaged story
### 1.3 Advanced lane: multi-cloud resilience
Target fit:
- customers with explicit policy or regulatory requirements
- customers with operational capacity for multi-cloud DR
Architecture:
- one active warm standby target at a time
- optional secondary backup copy in a second cloud
- later decision-service-driven target selection
Commercial posture:
- advanced offering
- not required for baseline product success
## 2. Default product rule
HybridOps MUST ship and document the baseline self-managed restore lane as the default PostgreSQL DR path.
Rationale:
- simplest to explain
- easiest to support
- lowest cost for the target market
- best fit for tarball-first product packaging
- already aligned with current module and blueprint contracts
Managed PostgreSQL DR MUST be implemented as a separate lane, not as a hidden variation of the self-managed restore path.
## 3. Module and blueprint boundaries
### 3.1 Baseline lane
Baseline DR SHOULD use these contracts:
- backup configuration:
    - `platform/onprem/postgresql-ha-backup`
- object repository:
    - `org/gcp/object-repo`
    - `org/aws/object-repo`
    - `org/azure/object-repo`
- failover restore blueprint:
    - `dr/postgresql-ha-failover-gcp@v1`
- future equivalent restore blueprints for other clouds
- failback blueprint:
    - `dr/postgresql-ha-failback-onprem@v1`
This lane SHOULD consume `repo_state_ref` and `inventory_state_ref` rather than duplicating bucket names or VM IPs.
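As an illustration of that rule, a failover invocation might resolve everything from state references. The YAML shape and reference values below are hypothetical, not a shipped schema; only the `repo_state_ref` and `inventory_state_ref` input names and the blueprint identifier come from the contracts above.

```yaml
# Hypothetical invocation sketch -- field layout and reference values are illustrative.
blueprint: dr/postgresql-ha-failover-gcp@v1
inputs:
  # Resolves bucket name, prefix, and credential references from the
  # object-repo module's recorded state -- no bucket names duplicated here.
  repo_state_ref: org/gcp/object-repo
  # Resolves restore target hosts and addresses from recorded inventory
  # state -- no VM IPs duplicated here.
  inventory_state_ref: dr/gcp-restore-inventory
```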
### 3.2 Premium managed lane
Managed DR SHOULD be modeled as separate modules and blueprints, for example:
- managed database provisioning:
    - `org/gcp/cloudsql-postgresql` or equivalent
- source preparation:
    - `platform/onprem/postgresql-dr-source`
- managed standby/replication setup:
    - `org/gcp/cloudsql-external-replica` or equivalent
- DR promotion blueprint:
    - `dr/postgresql-cloudsql-promote-gcp@v1`
- failback blueprint:
    - `dr/postgresql-cloudsql-failback-onprem@v1`
These names are target-state examples, not currently shipped contracts.
Managed DR MUST NOT overload the following contracts with provider-managed database semantics:
- `platform/onprem/postgresql-ha`
- `dr/postgresql-ha-failover-gcp@v1`
## 4. Failback rules
### 4.1 Baseline self-managed lane
Failback SHOULD remain:
- maintenance-window based
- explicit
- evidence-driven
The on-prem target SHOULD be treated as a rebuilt or re-seeded cluster, not as an attempt to resume an old, ambiguous timeline.
### 4.2 Managed lane
Failback from managed PostgreSQL SHOULD default to one of:
- controlled export / reseed into a fresh on-prem leader, then rebuild replicas
- reverse replication into on-prem when explicitly implemented and tested
Managed failback MUST be treated as a new lineage / reseed operation unless reverse replication is an explicitly shipped and validated feature.
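The rule above can be sketched as a small planning helper. The function name and step names are illustrative only, not shipped blueprint identifiers; the point is that reverse replication is eligible only when explicitly validated, and the default path is a new-lineage reseed.

```python
def managed_failback_plan(reverse_replication_validated: bool) -> list[str]:
    """Sketch of the managed-lane failback decision (illustrative step names)."""
    if reverse_replication_validated:
        # Reverse replication is only eligible when it is an explicitly
        # shipped and tested feature, never an ad-hoc DR improvisation.
        return [
            "establish-reverse-replication-to-onprem",
            "controlled-switchover-in-maintenance-window",
            "rebuild-onprem-replicas",
        ]
    # Default: treat failback as a new lineage via export/reseed into a
    # fresh on-prem leader, then rebuild replicas from that leader.
    return [
        "export-from-managed-primary",
        "reseed-fresh-onprem-leader",
        "rebuild-onprem-replicas",
    ]
```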
## 5. Destructive operations
Backup repository purge MUST NOT be bundled into normal rerun or routine destroy flows.
Destructive backup cleanup SHOULD be:
- separate
- explicit
- opt-in
- clearly confirmed
Routine destroy for backup configuration MAY disable schedules or cluster-side configuration, but SHOULD NOT imply repository data deletion.
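One way to enforce this separation is to gate repository purge behind a distinct, explicitly confirmed code path that routine destroy never reaches. The function, action names, and confirmation phrase below are hypothetical.

```python
def plan_destroy(purge_repository: bool = False, confirmation: str = "") -> list[str]:
    """Sketch of a destroy planner that never bundles repository purge
    into routine teardown (all names are illustrative)."""
    # Routine destroy: disable schedules and cluster-side config only;
    # repository data is left untouched.
    actions = ["disable-backup-schedules", "remove-cluster-backup-config"]
    if purge_repository:
        # Destructive cleanup is separate, explicit, opt-in, and must be
        # clearly confirmed before it is even planned.
        if confirmation != "PURGE-BACKUP-REPOSITORY":
            raise ValueError("repository purge requires explicit confirmation")
        actions.append("purge-backup-repository")
    return actions
```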
## 6. RTO/RPO positioning
Baseline self-managed restore DR:
- higher RTO than warm standby
- RPO bounded by backup/WAL position
- best price-to-operability balance
Premium managed warm standby DR:
- lower RTO
- better RPO, typically bounded by replication lag
- higher steady-state cost
- more complex failback
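The RPO difference between the two lanes can be made concrete with back-of-envelope bounds. The numbers below are hypothetical illustrations, not product guarantees.

```python
def restore_lane_rpo_s(archive_timeout_s: int, push_latency_s: int) -> int:
    # Restore DR: worst-case loss is the WAL not yet forced out by
    # archive_timeout plus the time to push that segment to the repository.
    return archive_timeout_s + push_latency_s

def warm_standby_rpo_s(replication_lag_s: int) -> int:
    # Warm standby: worst-case loss is the observed replication lag.
    return replication_lag_s

# Hypothetical values: 60 s archive_timeout, 15 s push latency, 2 s lag.
print(restore_lane_rpo_s(60, 15))  # 75
print(warm_standby_rpo_s(2))       # 2
```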
HybridOps marketing and documentation SHOULD state these tradeoffs explicitly and avoid presenting managed DR as "better in every way."
## 7. Decision service boundary
A decision service SHOULD be introduced only after the baseline and premium DR lanes are independently proven.
It SHOULD then select among already-tested options, for example:
- target cloud
- DR mode
- whether to enable secondary backup copy
A decision service MUST NOT be used to compensate for unproven or underspecified DR paths.