Failback PostgreSQL Cloud SQL DR to On-Prem (HyOps Blueprint)
- Purpose: Gate the failback decision and then repoint the stable PostgreSQL service endpoint back to the on-prem PostgreSQL HA lane.
- Owner: Platform engineering / SRE
- Trigger: End of a managed-cloud DR event or failback drill
- Impact: Applications are redirected back to the on-prem PostgreSQL HA endpoint.
- Severity: P1
- Pre-reqs: The managed cloud primary has been fenced, the on-prem PostgreSQL HA lane has already been rebuilt or reseeded, and DNS authority credentials are available.
- Rollback strategy: If the manual gate is not confirmed, do nothing. If cutback is unsafe, keep service on the managed DR primary until the on-prem target is re-verified.
Context
Blueprint ref: dr/postgresql-cloudsql-failback-onprem@v1
Location: hybridops-core/blueprints/dr/postgresql-cloudsql-failback-onprem@v1/blueprint.yml
Default step flow:
1. core/shared/manual-gate
2. platform/network/dns-routing
3. platform/network/dns-routing (apply_mode=status, live PowerDNS verification)
Important:
- this blueprint does not rebuild the on-prem cluster for you
- rebuild, reseed, or reverse-sync work must already be complete before the manual gate is confirmed
- this keeps the product honest until reverse replication automation is explicitly shipped and tested
- DNS cutback consumes endpoint_host from platform/postgresql-ha#postgresql_restore_onprem_failback because the route uses an A record
- if the rebuilt failback lane reused the original on-prem source addresses and those hosts were later fenced during the managed-cloud promote drill, restart Patroni and republish platform/postgresql-ha#postgresql_restore_onprem_failback before trusting its endpoint_host for DNS cutback
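One way to sanity-check endpoint_host after restarting Patroni is to ask Patroni itself whether that address currently holds the leader lock. The sketch below is illustrative only and is not part of the blueprint. It assumes endpoint_host points directly at a Patroni node whose REST API is reachable on the default port 8008; if the endpoint is a VIP or proxy, query the member nodes instead. The function names `patroni_http_status` and `endpoint_is_leader` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not shipped with hyops.
# Patroni's REST API answers GET /primary with HTTP 200 only on the node
# that is running as primary and holds the leader lock.

# Print the HTTP status code returned by <host>:<port>/primary (port defaults to 8008).
# curl prints 000 when the connection fails.
patroni_http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$1:${2:-8008}/primary"
}

# Succeed only if the host reports itself as the Patroni primary.
endpoint_is_leader() {
  [ "$(patroni_http_status "$1" "$2")" = "200" ]
}
```

For example, `endpoint_is_leader 10.10.0.5 || echo "do not trust endpoint_host yet"`, where the address stands in for whatever endpoint_host the republished state reports.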
Manual gate expectations
Set the manual gate only after all of these are already true:
- managed_primary_fenced=true
- onprem_target_rebuilt=true
- onprem_primary_writable=true
- failback_approved=true
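The four conditions can be checked mechanically before anyone confirms the gate. The following is a minimal sketch, not something the blueprint provides: `gate_ready` is a hypothetical helper, and passing the flags as key=value arguments is an assumption about how an operator might record them.

```shell
#!/usr/bin/env bash
# Hypothetical pre-gate check; not part of hyops.
# Succeeds only when every required flag is passed as <flag>=true.

gate_ready() {
  local required="managed_primary_fenced onprem_target_rebuilt onprem_primary_writable failback_approved"
  local flag arg found missing=0
  for flag in $required; do
    found=0
    for arg in "$@"; do
      if [ "$arg" = "${flag}=true" ]; then
        found=1
        break
      fi
    done
    if [ "$found" -eq 0 ]; then
      # Report each unmet condition so the operator sees all gaps at once.
      echo "gate blocked: ${flag} is not true" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A missing flag is treated the same as a false one, so an incomplete checklist blocks the gate rather than passing it by omission.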
Validate and execute
hyops blueprint validate --ref dr/postgresql-cloudsql-failback-onprem@v1
hyops blueprint preflight --env dev --ref dr/postgresql-cloudsql-failback-onprem@v1
hyops blueprint deploy --env dev --ref dr/postgresql-cloudsql-failback-onprem@v1 --execute
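The three commands are meant to run in order, and a deploy should not follow a failed validate or preflight. A small wrapper can enforce that. This is a sketch under one assumption: that `hyops` exits non-zero when a step fails. The wrapper name `run_failback` is hypothetical; the commands and flags are exactly the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; runs the documented commands in order and stops at the first failure.
# Assumes hyops returns a non-zero exit code when a step fails.

REF="dr/postgresql-cloudsql-failback-onprem@v1"

run_failback() {
  local env_name="${1:-dev}"
  hyops blueprint validate --ref "$REF" || return 1
  hyops blueprint preflight --env "$env_name" --ref "$REF" || return 1
  hyops blueprint deploy --env "$env_name" --ref "$REF" --execute || return 1
}
```

Calling `run_failback dev` reproduces the sequence above. The deploy step still stops at the manual gate, so the wrapper does not bypass the approval.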
Verify
Confirm:
- manual gate state is cap.control.manual_gate = confirmed
- DNS cutback state is cap.network.dns_routing = ready
- DNS status step succeeded with dns.status = live-ok
- the published record now targets the on-prem PostgreSQL HA endpoint contract
- application writes land only on the restored on-prem primary
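The last two checks can also be confirmed from outside the blueprint. The sketch below is an independent spot check, not a replacement for the blueprint's own status step. It assumes `dig` and `psql` are installed and that the record holds a single A value; the function names, the record name, and the address in the usage example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical post-cutback checks; not part of hyops.

# Print the A record data for <name>, optionally asking a specific <server>.
resolve_a() {
  if [ -n "$2" ]; then
    dig +short "@$2" "$1" A
  else
    dig +short "$1" A
  fi
}

# Succeed only if <name> resolves to exactly the <expected> address.
dns_targets_endpoint() {
  local record="$1" expected="$2" server="$3" answers
  answers="$(resolve_a "$record" "$server")"
  [ "$answers" = "$expected" ]
}

# Print t or f: pg_is_in_recovery() is false (f) on a primary, true (t) on a standby.
recovery_state() {
  psql -h "$1" -U "$2" -d postgres -Atc 'SELECT pg_is_in_recovery();'
}

# Succeed only if the server at <host> is not in recovery.
primary_not_in_recovery() {
  [ "$(recovery_state "$1" "$2")" = "f" ]
}
```

For example, `dns_targets_endpoint pg.example.internal 10.10.0.5 && primary_not_in_recovery 10.10.0.5 postgres`. Passing the authoritative server as the third argument to `dns_targets_endpoint` avoids reading a stale answer from a caching resolver. A server that is out of recovery can accept writes, but this check alone does not prove that applications are writing only there; connection counts or application logs on the managed DR primary are still needed for that.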