Promote PostgreSQL Cloud SQL DR in GCP (HyOps Blueprint)
- Purpose: Gate the cutover decision, then repoint the stable PostgreSQL service endpoint to the managed GCP DR lane.
- Owner: Platform engineering / SRE
- Trigger: Controlled DR event or promotion drill, after the managed standby has already been promoted with provider-native controls.
- Impact: Applications are redirected to the managed GCP PostgreSQL endpoint.
- Severity: P1
- Pre-reqs: The managed standby lane already exists, provider-native promotion has been completed, the old primary is fenced, and DNS authority credentials are available.
- Rollback strategy: If the manual gate has not been confirmed, do nothing. If the DNS cutover completes incorrectly, restore the DNS target only after re-verifying write authority and split-brain safety.
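The rollback rule above depends on re-verifying write authority before the DNS target moves back. One way to do that is to ask every candidate endpoint whether it is in recovery: PostgreSQL's `pg_is_in_recovery()` returns `f` only on a server that accepts writes. The sketch below is a minimal POSIX shell helper, assuming you have already collected one `host=result` pair per endpoint; `count_writable` and `single_writer` are hypothetical names, not HyOps commands.

```sh
# Hypothetical helpers (not part of HyOps).
# Each argument is a "host=result" pair, where result is the output of:
#   psql -h HOST -Atc 'SELECT pg_is_in_recovery()'
# "f" means the server is a writable primary; "t" means it is in recovery.
# Any other value (e.g. empty because the host was unreachable) is counted
# as not writable, so the fence itself still has to be verified separately.

count_writable() {
  n=0
  for pair in "$@"; do
    result=${pair#*=}   # keep only the text after the first "="
    if [ "$result" = "f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Succeeds only when exactly one endpoint reports itself writable.
single_writer() {
  [ "$(count_writable "$@")" -eq 1 ]
}
```

For example, `single_writer "old-primary=" "cloudsql=f"` succeeds, while two `f` results indicate split-brain and fail. An unreachable host yields an empty result and is counted as not writable, which is why the fence still needs its own confirmation.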
Context
Blueprint ref: dr/postgresql-cloudsql-promote-gcp@v1
Location: hybridops-core/blueprints/dr/postgresql-cloudsql-promote-gcp@v1/blueprint.yml
Default step flow:

1. `org/gcp/cloudsql-external-replica` (`apply_mode=status`, required state `RUNNING`)
2. `core/shared/manual-gate`
3. `platform/network/dns-routing`
4. `platform/network/dns-routing` (`apply_mode=status`, live PowerDNS verification)
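The first step only lets the flow continue when the managed standby's DMS migration job reports `RUNNING`. If you want to reproduce that check by hand before running the blueprint, a minimal sketch is shown below; `require_running` is a hypothetical helper, and the job name and region in the comment are placeholders for your own values.

```sh
# Hypothetical helper: fail fast unless the DMS job state is exactly RUNNING,
# mirroring the "required state RUNNING" condition on the first step.
# The state string can be fetched with (JOB and REGION are placeholders):
#   gcloud database-migration migration-jobs describe JOB \
#     --region=REGION --format='value(state)'
require_running() {
  state=$1
  if [ "$state" = "RUNNING" ]; then
    echo "standby ok: state=$state"
    return 0
  fi
  echo "standby not ready: state=${state:-<empty>}" >&2
  return 1
}
```

Any other value, including an empty string from a failed `gcloud` call, is treated as not ready, which keeps the check fail-closed like the step it mirrors.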
Important:

- This blueprint does not perform the provider-native promotion action for you.
- It exists to make the fencing and approval decision explicit and auditable.
- It fails before the manual gate if the managed standby DMS job is not `RUNNING`.
- The `managed_standby_status` step is state-backed, but it no longer skips a live re-check during deploy. Status steps in DR blueprints are expected to rerun, so historical success state cannot substitute for a live status check.
- A hard source fence before DMS promotion can leave replication-slot cleanup to be done manually later on the old source. When this happens, record it explicitly in the manual gate evidence.
- Verify that the hosts you are fencing still belong to the original source lane. If `platform/onprem/platform-vm#postgres_ha_vms` has already been destroyed and its old addresses have been reused for a rebuilt failback lane, do not record those rebuilt hosts as proof that the old source primary was fenced.
- DNS cutover consumes `endpoint_host` from `org/gcp/cloudsql-external-replica#managed_standby`, because the route uses an `A` record.
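Because the route is an `A` record, the `endpoint_host` value handed to the DNS step has to be an IPv4 address, not a hostname. A minimal POSIX shell pre-check could look like this; `is_ipv4` is a hypothetical helper and is not part of the blueprint.

```sh
# Hypothetical pre-check: an A record can only hold an IPv4 address,
# so reject hostnames and malformed dotted quads before DNS cutover.
is_ipv4() {
  addr=$1
  # Only digits and dots; no leading, trailing, or doubled dots.
  case $addr in
    *[!0-9.]* | .* | *. | *..*) return 1 ;;
  esac
  # Split on "." into this function's positional parameters.
  old_ifs=$IFS
  IFS=.
  set -- $addr
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    # Reject empty fields and values above 255.
    [ -n "$octet" ] && [ "$octet" -le 255 ] || return 1
  done
  return 0
}
```

Running it against the value before deploy turns a misconfigured hostname into an early, explicit failure instead of a broken record.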
Manual gate expectations
Set the manual gate only after all of these are already true:
- `source_primary_fenced=true`
- `managed_target_promoted=true`
- `application_cutover_approved=true`
If any of those statements are still uncertain, do not execute the blueprint. In particular, confirm the fenced hosts still map to the source state you intend to retire, not to a separately rebuilt failback target that reused the same IPs.
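One way to make the address-reuse check concrete is to compare a stable host identity rather than an IP. The sketch assumes you recorded an identity for each source-lane host while it still existed (for example its `/etc/machine-id`, which a freshly provisioned VM normally regenerates) and that you read the same identity from each host as you fence it. `verify_fence_targets` and its input formats are hypothetical.

```sh
# Hypothetical check: each fenced address must still carry the identity that
# was recorded for the original source lane.
#   stdin:     recorded inventory, one "ip identity" line per source host
#   arguments: "ip=identity" pairs observed on the hosts being fenced
verify_fence_targets() {
  inventory=$(cat)
  status=0
  for pair in "$@"; do
    ip=${pair%%=*}
    observed=${pair#*=}
    # Look up the identity recorded for this address, if any.
    expected=$(printf '%s\n' "$inventory" | awk -v ip="$ip" '$1 == ip { print $2; exit }')
    if [ -z "$expected" ]; then
      echo "$ip: not in recorded source inventory" >&2
      status=1
    elif [ "$expected" != "$observed" ]; then
      echo "$ip: identity changed ($expected -> $observed); address reused?" >&2
      status=1
    fi
  done
  return $status
}
```

A changed identity at a known address, or an address missing from the inventory, should be treated as "not proven fenced" rather than recorded as manual gate evidence.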
Validate and execute
```sh
hyops blueprint validate --ref dr/postgresql-cloudsql-promote-gcp@v1
hyops blueprint preflight --env dev --file /home/user/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-promote-gcp.yml
hyops blueprint deploy --env dev --file /home/user/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-promote-gcp.yml --execute
```
Preflight should succeed without operator approval. The deploy path still stops unless the manual gate inputs are changed to `confirm: true` and all gate assertions are set to `true`.
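The gate behaviour described above can be modelled as a fail-closed check: nothing proceeds unless `confirm` is `true` and every assertion is `true`. This is a hypothetical sketch for reasoning about gate inputs, not the HyOps implementation of `core/shared/manual-gate`.

```sh
# Hypothetical model of the manual gate decision.
#   $1:          the confirm value
#   remaining:   "name=value" gate assertions
# The gate opens only when confirm is "true" AND every assertion is "=true";
# any other value, including a missing or misspelled one, keeps it closed.
gate_open() {
  [ "$#" -ge 1 ] || return 1     # no confirm value at all: closed
  confirm=$1
  shift
  [ "$confirm" = "true" ] || return 1
  [ "$#" -gt 0 ] || return 1     # no assertions supplied: closed
  for assertion in "$@"; do
    case $assertion in
      *=true) ;;                 # asserted
      *) echo "gate closed: $assertion" >&2; return 1 ;;
    esac
  done
  return 0
}
```

Treating anything other than the literal string `true` as closed means a typo in a gate input blocks deploy instead of silently passing.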
Verify
Confirm:

- The standby status step succeeded and reported the DMS lane as `RUNNING`.
- The manual gate state is `cap.control.manual_gate = confirmed`.
- If you rerun `managed_standby_status` after provider-native promotion, that status instance should now flip to `status=error` instead of preserving the earlier green `RUNNING` snapshot.
- The DNS cutover state is `cap.network.dns_routing = ready`.
- The DNS status step succeeded with `dns.status = live-ok`.
- The published record now targets the managed Cloud SQL endpoint.
- Application writes land only on the promoted GCP primary.
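If you collect the state values from the checklist above into `key = value` lines, the three state assertions can be checked in one pass. The input format and the `check_cutover_state` name are assumptions for illustration; how HyOps exposes its state is not covered here.

```sh
# Hypothetical post-cutover check. stdin: "key = value" lines collected from
# blueprint state. The keys and expected values come from the Verify list.
# Prints each mismatch to stderr and fails if any is found.
check_cutover_state() {
  state=$(cat)
  status=0
  for want in \
    'cap.control.manual_gate=confirmed' \
    'cap.network.dns_routing=ready' \
    'dns.status=live-ok'
  do
    key=${want%%=*}
    expected=${want#*=}
    # Split each line on "=", strip whitespace, and return the matching value.
    actual=$(printf '%s\n' "$state" | awk -F '=' -v k="$key" '
      { gsub(/[ \t]/, "", $1); gsub(/[ \t]/, "", $2) }
      $1 == k { print $2; exit }')
    if [ "$actual" != "$expected" ]; then
      echo "$key: expected $expected, got ${actual:-<missing>}" >&2
      status=1
    fi
  done
  return $status
}
```

The DNS item can also be spot-checked independently with `dig @SERVER RECORD A +short`, where `SERVER` and `RECORD` are placeholders for your PowerDNS server and the published name.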