Establish PostgreSQL Cloud SQL Standby in GCP (HyOps Blueprint)

- Purpose: Stand up the managed GCP DR lane without cutting production traffic.
- Owner: Platform engineering / SRE
- Trigger: Planned DR readiness work or recurring standby validation drill
- Impact: A managed GCP standby exists and publishes a normalized endpoint contract, but applications continue to use the current primary.
- Severity: P2
- Pre-reqs: `hyops init gcp` is complete, the on-prem PostgreSQL HA lane is healthy, the Patroni replication credential exists in runtime vault/env, the selected GCP project/network path is already reachable, and the source-contract step points at the current authoritative on-prem PostgreSQL state instance.
- Rollback strategy: Destroy the managed standby lane only; no application cutover reversal is needed because traffic is not changed by this blueprint.
Context

- Blueprint ref: `dr/postgresql-cloudsql-standby-gcp@v1`
- Location: `hybridops-core/blueprints/dr/postgresql-cloudsql-standby-gcp@v1/blueprint.yml`
- Default step flow: `platform/onprem/postgresql-dr-source` → `org/gcp/cloudsql-external-replica` → `org/gcp/cloudsql-external-replica` (`apply_mode=status`, required state `RUNNING`)
Important:

- this blueprint does not promote or cut over traffic
- the Cloud SQL target is created through the DMS destination connection profile path used by `org/gcp/cloudsql-external-replica`
- the resulting state publishes the same endpoint fields used by `platform/postgresql-ha`
- the blueprint now finishes with a live DMS status verification step, so a stale earlier green status cannot stand in for a currently broken standby lane
- `org/gcp/cloudsql-external-replica#managed_standby` is historical establish evidence, not the live readiness signal by itself; use `#managed_standby_status` for current lane truth
Required operator inputs

At minimum, provide:

- `project_state_ref` or `project_id`
- `network_state_ref` or `private_network`
- `region`
- `source_connection_profile_name`
- `destination_connection_profile_name`
- `migration_job_name`
- `source_replication_user`
- `source_replication_password_env`
Recommended defaults:

- `source_replication_password_env: PATRONI_REPLICATION_PASSWORD`
- `source_ssl_type: NONE` for the current on-prem PostgreSQL HA lane, unless you have explicitly provisioned DMS TLS material
- `datamigration.googleapis.com` enabled in the destination GCP project before `apply_mode=establish`
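The API requirement above can be gated with a small pre-check. This is a hedged sketch: the service list literal stands in for real `gcloud services list --enabled --project "$PROJECT_ID" --format="value(config.name)"` output, which this snippet does not invoke itself.

```shell
# Illustrative stand-in for captured `gcloud services list --enabled` output;
# in practice, populate enabled_apis from the destination project.
enabled_apis="compute.googleapis.com
datamigration.googleapis.com
servicenetworking.googleapis.com"

# Refuse to proceed to apply_mode=establish unless the DMS API is enabled.
if printf '%s\n' "$enabled_apis" | grep -qx 'datamigration.googleapis.com'; then
  echo "DMS API enabled"
else
  echo "enable datamigration.googleapis.com before apply_mode=establish" >&2
fi
```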
Optional but recommended:
endpoint_dns_namewhen you want the managed lane to publish a stable cutover record targetgcloud_active_accountwhen the runner must assert the exact operator account- if
connectivity_mode: reverse-sshis used across the site-extension path, ensure the runner can already reach the on-prem source host and port - when the source subnet returns through a non-HyOps upstream gateway, enable consumer SNAT on
platform/network/vyos-site-extension-onprem - for Cloud SQL private-IP lanes, include the destination private service access range in that SNAT set when the on-prem side can observe it as the effective source
- ensure the authoritative on-prem
platform/postgresql-halane grants the replication user access from the effective translated source address that PostgreSQL will actually see
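The last point above means the `pg_hba.conf` entry on the on-prem primary must match the translated (SNAT) address, not the runner's own IP. A hedged sketch; the address, user name, database scope, and auth method are all placeholders, and DMS's pglogical-based replication connects to regular databases rather than the physical `replication` pseudo-database:

```conf
# pg_hba.conf on the authoritative on-prem primary (illustrative values).
# 203.0.113.10/32 stands in for the effective translated source address
# PostgreSQL actually sees; replication_user stands in for the user named
# by source_replication_user.
host  all  replication_user  203.0.113.10/32  scram-sha-256
```

Reload PostgreSQL (or let the HA lane reconcile) after changing this file so the rule takes effect.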
Authoritative source rule:

- if the active on-prem lane is published as an explicit state instance, set the source-contract step `inventory_state_ref` and `db_state_ref` to that exact `platform/postgresql-ha#<instance>` value
- do not rely on a stale bare `platform/onprem/postgresql-ha` or `platform/postgresql-ha` latest slot when the active lane has already moved
- when that authoritative instance was created by restore or failback, reconcile it once with `platform/postgresql-ha` using `apply_mode=maintenance`, `pglogical_enable=true`, and `pending_restart=true` before building the managed standby lane
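The pinning rule above can be sketched as a step override in the env-scoped blueprint file. The surrounding step structure and the instance name are illustrative assumptions; only `inventory_state_ref` and `db_state_ref` are the fields named in the rule:

```yaml
# Illustrative fragment only; the shipped blueprint's actual step layout
# may differ. Pin both refs to the exact authoritative instance, never a
# bare latest slot.
steps:
  - module: platform/onprem/postgresql-dr-source
    inputs:
      inventory_state_ref: "platform/postgresql-ha#pg-ha-after-failback"  # placeholder instance name
      db_state_ref: "platform/postgresql-ha#pg-ha-after-failback"
```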
Prepare the env-scoped blueprint file

```shell
hyops blueprint init --env dev \
  --ref dr/postgresql-cloudsql-standby-gcp@v1 \
  --dest-name dr-postgresql-cloudsql-standby-gcp.yml
```

The shipped blueprint is a scaffold. Edit `~/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-standby-gcp.yml` and replace every `CHANGE_ME_*` value before preflight or deploy.
Set at minimum:
- `region`
- `project_state_ref` or `project_id`
- `network_state_ref` or `private_network`
- `source_connection_profile_name`
- `destination_connection_profile_name`
- `migration_job_name`
- `source_replication_user`
- `source_replication_password_env`
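A minimal sketch of how those fields might look once filled in. The keys come from the list above; every value, state-ref shape, and name here is an illustrative placeholder, not a recommendation:

```yaml
# Illustrative values only; replace every CHANGE_ME_* in the shipped scaffold.
region: us-central1
project_state_ref: "org/gcp/project#dr"        # or project_id: my-dr-project
network_state_ref: "org/gcp/network#dr-vpc"    # or private_network: <VPC self-link>
source_connection_profile_name: onprem-pg-source
destination_connection_profile_name: cloudsql-standby-dest
migration_job_name: pg-standby-establish
source_replication_user: replication_user
source_replication_password_env: PATRONI_REPLICATION_PASSWORD
```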
Validate and execute

```shell
hyops blueprint validate --file "$HOME/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-standby-gcp.yml"
hyops blueprint preflight --env dev \
  --file "$HOME/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-standby-gcp.yml"
hyops blueprint deploy --env dev \
  --file "$HOME/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-standby-gcp.yml" \
  --execute
```
Verify

Confirm the replica state publishes:

- `cap.db.managed_external_replica = established`
- `managed_replication_established = true`
- `managed_replication_ready_for_cutover = true` once the DMS job reaches `RUNNING`
- `endpoint_target`, `endpoint_port`, `endpoint_cutover_required`
Also confirm the status state instance:

- `org/gcp/cloudsql-external-replica#managed_standby_status` is `status: ok`
- published `migration_job_state` is `RUNNING`
If `endpoint_dns_name` is blank, `endpoint_target` will be the Cloud SQL private host and `endpoint_cutover_required=true`.
Current managed-standby verification

Use these checks when you need the current managed DR lane truth:

```shell
hyops show module org/gcp/cloudsql-external-replica#managed_standby --env dev
hyops show module org/gcp/cloudsql-external-replica#managed_standby_status --env dev
hyops show module platform/network/dns-routing#postgresql_dns_status_cloudsql_failback_onprem --env dev
```
Expected:

- `managed_replication_established: true`
- the status instance reports the migration job as `RUNNING`
- the DNS status output still shows the currently active service location separately from standby readiness
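The first two expectations can be combined into a simple readiness gate. This sketch does not invoke `hyops` itself; the variable literals are illustrative stand-ins for fields read from the `#managed_standby` and `#managed_standby_status` outputs above:

```shell
# Values below would come from the two state outputs; literals are illustrative.
managed_replication_established="true"   # from #managed_standby
migration_job_state="RUNNING"            # from #managed_standby_status

# Treat the lane as ready only when the historical establish evidence and
# the live DMS status agree.
if [ "$managed_replication_established" = "true" ] \
   && [ "$migration_job_state" = "RUNNING" ]; then
  echo "managed standby lane: ready"
else
  echo "managed standby lane: NOT ready" >&2
  exit 1
fi
```

Note that DNS still answers the separate question of where traffic currently goes; a ready standby lane does not imply cutover has happened.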