
Establish PostgreSQL Cloud SQL Standby in GCP (HyOps Blueprint)

  • Purpose: Stand up the managed GCP DR lane without cutting production traffic.

  • Owner: Platform engineering / SRE

  • Trigger: Planned DR readiness work or recurring standby validation drill

  • Impact: A managed GCP standby exists and publishes a normalized endpoint contract, but applications continue to use the current primary.
  • Severity: P2

  • Pre-reqs: hyops init gcp is complete; the on-prem PostgreSQL HA lane is healthy; the Patroni replication credential exists in runtime vault/env; the selected GCP project/network path is already reachable; and the source-contract step points at the current authoritative on-prem PostgreSQL state instance.

  • Rollback strategy: Destroy the managed standby lane only; no application cutover reversal is needed because traffic is not changed by this blueprint.

Context

Blueprint ref: dr/postgresql-cloudsql-standby-gcp@v1
Location: hybridops-core/blueprints/dr/postgresql-cloudsql-standby-gcp@v1/blueprint.yml

Default step flow:

  1. platform/onprem/postgresql-dr-source
  2. org/gcp/cloudsql-external-replica
  3. org/gcp/cloudsql-external-replica (apply_mode=status, required state RUNNING)

Important:

  • this blueprint does not promote or cut over traffic
  • the Cloud SQL target is created through the DMS destination connection profile path used by org/gcp/cloudsql-external-replica
  • the resulting state publishes the same endpoint fields used by platform/postgresql-ha
  • the blueprint now finishes with a live DMS status verification step, so a stale earlier green status cannot stand in for a currently broken standby lane
  • org/gcp/cloudsql-external-replica#managed_standby is historical establish evidence, not the live readiness signal by itself; use #managed_standby_status for current lane truth

Required operator inputs

At minimum, provide:

  • project_state_ref or project_id
  • network_state_ref or private_network
  • region
  • source_connection_profile_name
  • destination_connection_profile_name
  • migration_job_name
  • source_replication_user
  • source_replication_password_env

Recommended defaults:

  • source_replication_password_env: PATRONI_REPLICATION_PASSWORD
  • source_ssl_type: NONE for the current on-prem PostgreSQL HA lane unless you have explicitly provisioned DMS TLS material
  • datamigration.googleapis.com enabled in the destination GCP project before apply_mode=establish
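The API prerequisite above can be satisfied with standard gcloud commands. A minimal sketch, where PROJECT_ID is a placeholder for your destination GCP project:

```shell
# Enable the Database Migration Service API in the destination project
# before running apply_mode=establish. PROJECT_ID is a placeholder.
PROJECT_ID="my-dr-project"
SERVICE="datamigration.googleapis.com"

if command -v gcloud >/dev/null 2>&1; then
  gcloud services enable "$SERVICE" --project "$PROJECT_ID"
  # Confirm it is listed among the project's enabled services.
  gcloud services list --enabled --project "$PROJECT_ID" \
    --filter="name:${SERVICE}"
else
  echo "gcloud unavailable; would enable ${SERVICE} in ${PROJECT_ID}"
fi
```

Enabling the API is idempotent, so it is safe to run this in the same automation that performs the establish step.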

Optional but recommended:

  • endpoint_dns_name when you want the managed lane to publish a stable cutover record target
  • gcloud_active_account when the runner must assert the exact operator account
  • if connectivity_mode: reverse-ssh is used across the site-extension path, ensure the runner can already reach the on-prem source host and port
  • when the source subnet returns through a non-HyOps upstream gateway, enable consumer SNAT on platform/network/vyos-site-extension-onprem
  • for Cloud SQL private-IP lanes, include the destination private service access range in that SNAT set when the on-prem side can observe it as the effective source
  • ensure the authoritative on-prem platform/postgresql-ha lane grants the replication user access from the effective translated source address that PostgreSQL will actually see
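The reachability and replication-grant items above can be pre-checked from the runner before deploying. This is a sketch: the hostname, port, and replication user are placeholders, and it assumes the runner has nc and psql available:

```shell
# Pre-checks for the replication path. Hostnames, port, and user are
# placeholders; substitute your actual on-prem source and credentials.
SOURCE_HOST="pg-primary.onprem.example"   # placeholder
SOURCE_PORT=5432
REPL_USER="replicator"                    # placeholder

# 1) Can the runner reach the on-prem source (reverse-ssh / SNAT path)?
if command -v nc >/dev/null 2>&1; then
  nc -z -w 5 "$SOURCE_HOST" "$SOURCE_PORT" \
    && echo "tcp reachable" || echo "tcp unreachable"
fi

# 2) Does the source accept a replication connection from the effective
#    (possibly SNAT-translated) address? Uses PATRONI_REPLICATION_PASSWORD.
if command -v psql >/dev/null 2>&1; then
  PGPASSWORD="$PATRONI_REPLICATION_PASSWORD" psql \
    "host=$SOURCE_HOST port=$SOURCE_PORT user=$REPL_USER dbname=postgres replication=database" \
    -c "IDENTIFY_SYSTEM;" || echo "replication handshake failed"
fi
```

If step 2 fails while step 1 succeeds, the usual cause is a pg_hba.conf grant that lists the runner's untranslated address instead of the effective SNAT source.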

Authoritative source rule:

  • if the active on-prem lane is published as an explicit state instance, set the source-contract step inventory_state_ref and db_state_ref to that exact platform/postgresql-ha#<instance> value
  • do not rely on a stale bare platform/onprem/postgresql-ha or platform/postgresql-ha latest slot when the active lane has already moved
  • when that authoritative instance was created by restore or failback, reconcile it once with platform/postgresql-ha using apply_mode=maintenance, pglogical_enable=true, and pending_restart=true before building the managed standby lane
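Pinning the source contract to an explicit instance might look like the fragment below. This is illustrative only: the step layout follows the shipped scaffold, and failback_20240501 is a hypothetical instance name; the two state-ref keys are the ones named in the rule above.

```yaml
# Illustrative fragment; failback_20240501 is a hypothetical instance name.
- step: platform/onprem/postgresql-dr-source
  inventory_state_ref: platform/postgresql-ha#failback_20240501
  db_state_ref: platform/postgresql-ha#failback_20240501
```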

Prepare the env-scoped blueprint file

hyops blueprint init --env dev \
  --ref dr/postgresql-cloudsql-standby-gcp@v1 \
  --dest-name dr-postgresql-cloudsql-standby-gcp.yml

The shipped blueprint is a scaffold. Edit:

  • ~/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-standby-gcp.yml
  • replace every CHANGE_ME_* value before preflight or deploy

Set at minimum:

  • region
  • project_state_ref or project_id
  • network_state_ref or private_network
  • source_connection_profile_name
  • destination_connection_profile_name
  • migration_job_name
  • source_replication_user
  • source_replication_password_env
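Filled in, the minimum set might look like this; every value shown is a placeholder, and the exact nesting follows the shipped scaffold rather than this sketch:

```yaml
# All values are placeholders; replace with your own project and names.
region: europe-west1
project_id: my-dr-project
private_network: projects/my-dr-project/global/networks/dr-vpc
source_connection_profile_name: onprem-pg-source
destination_connection_profile_name: cloudsql-pg-dest
migration_job_name: pg-standby-dev
source_replication_user: replicator
source_replication_password_env: PATRONI_REPLICATION_PASSWORD
```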

Validate and execute

hyops blueprint validate --file "$HOME/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-standby-gcp.yml"
hyops blueprint preflight --env dev \
  --file "$HOME/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-standby-gcp.yml"
hyops blueprint deploy --env dev \
  --file "$HOME/.hybridops/envs/dev/config/blueprints/dr-postgresql-cloudsql-standby-gcp.yml" \
  --execute

Verify

Confirm the replica state publishes:

  • cap.db.managed_external_replica = established
  • managed_replication_established = true
  • managed_replication_ready_for_cutover = true once the DMS job reaches RUNNING
  • endpoint_target
  • endpoint_port
  • endpoint_cutover_required

Also confirm the status state instance:

  • org/gcp/cloudsql-external-replica#managed_standby_status is status: ok
  • published migration_job_state is RUNNING

If endpoint_dns_name is blank, endpoint_target will be the Cloud SQL private host and endpoint_cutover_required=true.
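The field checks above can be scripted. This sketch assumes the state renders as YAML-style key: value lines; the here-doc is a stand-in for the real `hyops show module` output:

```shell
# Parse the published state for the fields listed above. The here-doc is a
# stand-in; in practice pipe the `hyops show module ... --env dev` output.
state=$(cat <<'EOF'
cap.db.managed_external_replica: established
managed_replication_established: true
managed_replication_ready_for_cutover: true
endpoint_target: 10.20.0.5
endpoint_port: 5432
endpoint_cutover_required: true
EOF
)

# Extract one value by key from "key: value" lines.
get() { printf '%s\n' "$state" | awk -F': ' -v k="$1" '$1 == k {print $2}'; }

[ "$(get managed_replication_established)" = "true" ] && \
[ "$(get managed_replication_ready_for_cutover)" = "true" ] && \
  echo "standby contract OK (endpoint $(get endpoint_target):$(get endpoint_port))" || \
  echo "standby contract incomplete"
```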

Current managed-standby verification

Run these checks to get the current state of the managed DR lane:

hyops show module org/gcp/cloudsql-external-replica#managed_standby --env dev
hyops show module org/gcp/cloudsql-external-replica#managed_standby_status --env dev
hyops show module platform/network/dns-routing#postgresql_dns_status_cloudsql_failback_onprem --env dev

Expected:

  • managed_replication_established: true
  • the status instance reports the migration job as RUNNING
  • the DNS status output still shows the currently active service location separately from standby readiness
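During a drill it can help to poll the live status instance until the job reports RUNNING. A sketch using only the documented command, assuming the status output contains a "migration_job_state: <STATE>" line:

```shell
# Poll the live status instance for up to ~5 minutes. Assumes the output
# contains a "migration_job_state: <STATE>" line; adjust the grep otherwise.
if command -v hyops >/dev/null 2>&1; then
  for i in $(seq 1 30); do
    if hyops show module org/gcp/cloudsql-external-replica#managed_standby_status --env dev \
        | grep -q 'migration_job_state: RUNNING'; then
      echo "standby lane RUNNING"; break
    fi
    echo "attempt $i: not RUNNING yet"; sleep 10
  done
else
  echo "hyops unavailable; skipping live poll"
fi
```

Remember that a RUNNING job is the live readiness signal; the #managed_standby establish evidence alone is not.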