Promote PostgreSQL Cloud SQL DR in GCP (HyOps Blueprint)

Purpose: Gate the cutover decision and then repoint the stable PostgreSQL service endpoint to the managed GCP DR lane.
Owner: Platform engineering / SRE
Trigger: Controlled DR event or promotion drill after the managed standby has already been promoted with provider-native controls
Impact: Applications are redirected to the managed GCP PostgreSQL endpoint.
Severity: P1
Pre-reqs: The managed standby lane already exists, provider-native promotion has been completed, the old primary is fenced, and DNS authority credentials are available.
Rollback strategy: If the manual gate has not been confirmed, do nothing. If DNS cutover completes incorrectly, restore the DNS target only after re-verifying write authority and split-brain safety.

Context

Blueprint ref: dr/postgresql-cloudsql-promote-gcp@v1
Location: hybridops-core/blueprints/dr/postgresql-cloudsql-promote-gcp@v1/blueprint.yml

Default step flow:

org/gcp/cloudsql-external-replica (apply_mode=status, required state RUNNING)
core/shared/manual-gate
platform/network/dns-routing
platform/network/dns-routing (apply_mode=status, live PowerDNS verification)

Important:

this blueprint does not perform the provider-native promotion action for you
it exists to make the fencing and approval decision explicit and auditable
it now fails before the manual gate if the managed standby DMS job is not RUNNING
the managed_standby_status step is state-backed, but deploy no longer skips its live re-check: status steps in DR blueprints are expected to rerun, so a historical success state cannot substitute for a live status check
a hard source fence before DMS promotion can leave replication-slot cleanup to be done manually later on the old source; record that explicitly in the manual gate evidence when it occurs
verify that the hosts you are fencing still belong to the original source lane. If platform/onprem/platform-vm#postgres_ha_vms has already been destroyed and its old addresses have been reused for a rebuilt failback lane, do not record those rebuilt hosts as proof that the old source primary was fenced
DNS cutover consumes endpoint_host from org/gcp/cloudsql-external-replica#managed_standby because the route uses an A record

Manual gate expectations

Set the manual gate only after all of these are already true:

source_primary_fenced=true
managed_target_promoted=true
application_cutover_approved=true

If any of those statements are still uncertain, do not execute the blueprint. In particular, confirm the fenced hosts still map to the source state you intend to retire, not to a separately rebuilt failback target that reused the same IPs.
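
The three assertions above can be checked mechanically before anyone touches the gate inputs. The following is a minimal sketch, assuming the operator records gate evidence in a plain key=value file; both the evidence file and the gate_ready helper are hypothetical and are not part of the blueprint or the hyops CLI.

```shell
# Hypothetical helper: succeeds only if every gate assertion is recorded
# as "<name>=true" on its own line in the given evidence file.
gate_ready() {
  local evidence_file="$1" key
  for key in source_primary_fenced managed_target_promoted application_cutover_approved; do
    # -x matches the whole line, so "key=true-ish" or "key=false" do not pass.
    if ! grep -qx "${key}=true" "$evidence_file"; then
      echo "not confirmed: ${key}" >&2
      return 1
    fi
  done
}
```

A call such as gate_ready ./gate-evidence.txt (path is illustrative) returns non-zero and names the first unconfirmed assertion, which makes it usable as a guard in front of the deploy command.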

Validate and execute

hyops blueprint validate --ref dr/postgresql-cloudsql-promote-gcp@v1
hyops blueprint preflight --env dev --file /home/user/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-promote-gcp.yml
hyops blueprint deploy --env dev --file /home/user/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-promote-gcp.yml --execute

Preflight should succeed without operator approval. The deploy path still stops unless the manual gate inputs are changed to confirm: true and all gate assertions are set to true.
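
The three commands above can be chained so that deploy never runs after a failed validate or preflight. This is a sketch only: the run_promotion function and the HYOPS_BIN override are hypothetical conveniences, and the hyops subcommands and flags are exactly those shown above.

```shell
# Hypothetical wrapper: run validate, preflight, and deploy in order,
# stopping at the first command that fails.
# HYOPS_BIN is an assumed override (defaults to "hyops") so the wrapper
# can be dry-run against a stub such as "echo".
run_promotion() {
  local env_name="$1" blueprint_file="$2"
  local hyops="${HYOPS_BIN:-hyops}"

  "$hyops" blueprint validate --ref dr/postgresql-cloudsql-promote-gcp@v1 || return 1
  "$hyops" blueprint preflight --env "$env_name" --file "$blueprint_file" || return 1
  # Deploy still stops at the manual gate unless the gate inputs are confirmed.
  "$hyops" blueprint deploy --env "$env_name" --file "$blueprint_file" --execute
}
```

Running HYOPS_BIN=echo run_promotion dev /home/user/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-promote-gcp.yml prints the three commands without executing them, which is a cheap way to review the arguments before a drill.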

Verify

Confirm:

standby status step succeeded and reported the DMS lane RUNNING
manual gate state is cap.control.manual_gate = confirmed
if you rerun managed_standby_status after provider-native promotion, that status instance should now flip to status=error instead of preserving the earlier green RUNNING snapshot
DNS cutover state is cap.network.dns_routing = ready
DNS status step succeeded with dns.status = live-ok
the published record now targets the managed Cloud SQL endpoint
application writes land only on the promoted GCP primary
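
The published-record check in the list above can be scripted against the authoritative server. The sketch below assumes the record is a single A record and that the managed endpoint is known as an IP address; the function name and the RECORD_NAME, POWERDNS_SERVER, and ENDPOINT_IP variables are hypothetical placeholders.

```shell
# Hypothetical helper: reads resolved addresses on stdin (one per line) and
# succeeds only if the answer set is exactly the expected endpoint address.
record_targets_endpoint() {
  local expected="$1" answers
  # De-duplicate and drop blank lines so an empty answer never matches.
  answers="$(sort -u | sed '/^[[:space:]]*$/d')"
  [ -n "$answers" ] && [ "$answers" = "$expected" ]
}

# Example invocation (all three variables are placeholders):
#   dig +short @"$POWERDNS_SERVER" "$RECORD_NAME" A | record_targets_endpoint "$ENDPOINT_IP"
```

Requiring an exact match means a record that still carries the old primary's address alongside the new one fails the check rather than passing on a partial match.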
