Managed PostgreSQL DR with Cloud SQL

Overview

Managed PostgreSQL DR with Cloud SQL is the lower-overhead recovery path for PostgreSQL. The primary database stays on-prem until promotion is approved, while Cloud SQL provides a managed recovery lane.

It suits teams that want a credible cloud recovery option without operating a full second PostgreSQL HA estate throughout the year.

Case study

  • Context: the on-prem PostgreSQL HA estate needed a tested cloud recovery option. Running a second self-managed HA estate in GCP year-round was considered too costly for the current operational stage.
  • Challenge: cloud standby capacity had to be replication-ready without requiring a permanently staffed second database estate, and promotion and failback both had to remain under explicit operator control.
  • Approach: Cloud SQL as managed standby via org/gcp/cloudsql-external-replica. Logical replication keeps the standby current. Promotion is a discrete operator action. Return on-prem follows dr/postgresql-cloudsql-failback-onprem@v1 as a separate controlled step.
  • Outcome: the recorded drill established the managed standby, completed failback, returned DNS to 10.12.0.32, and demonstrated a lower-overhead recovery lane with explicit control at each step.

The case study covers Cloud SQL external replica setup, controlled promotion, failback on-prem, and final DNS confirmation after the recovery drill.

Outcome

This recovery lane reduces day-to-day operating effort while keeping promotion and return under explicit control.

  • The authoritative source remains on-prem until promotion is approved.
  • Cloud SQL provides the managed recovery lane instead of a permanently active second primary estate.
  • Final DNS and service position remain visible after failback.

Operating model

  • On-prem PostgreSQL HA stays the live write source in normal operation.
  • Cloud SQL remains prepared as the managed recovery lane.
  • Promotion is explicit and controlled.
  • Return on-prem is a separate controlled step.

Compared with the self-managed HA recovery cycle, this lane trades architectural symmetry for lower day-to-day operating effort.
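The operating model above can be read as a small state machine with an approval gate. The following is a minimal sketch under stated assumptions, not the HybridOps implementation: the names Lane, apply_step, and the approved_by parameter are hypothetical and exist only to illustrate that promotion and return are separate, explicitly approved steps.

```python
from enum import Enum
from typing import Optional


class Lane(Enum):
    # Hypothetical state names, for illustration only.
    ONPREM_PRIMARY = "onprem_primary"        # normal operation: on-prem is the write source
    CLOUDSQL_PROMOTED = "cloudsql_promoted"  # the Cloud SQL standby has been promoted


class ApprovalRequired(Exception):
    """Raised when a step is attempted without a named approver."""


# Promotion and return are distinct transitions; neither implies the other.
TRANSITIONS = {
    ("promote", Lane.ONPREM_PRIMARY): Lane.CLOUDSQL_PROMOTED,
    ("failback", Lane.CLOUDSQL_PROMOTED): Lane.ONPREM_PRIMARY,
}


def apply_step(current: Lane, action: str, approved_by: Optional[str]) -> Lane:
    """Return the next state, refusing unapproved or out-of-order steps."""
    if not approved_by:
        raise ApprovalRequired(f"{action!r} needs an explicit operator approval")
    try:
        return TRANSITIONS[(action, current)]
    except KeyError:
        raise ValueError(f"{action!r} is not valid from {current.name}") from None


state = Lane.ONPREM_PRIMARY
state = apply_step(state, "promote", approved_by="operator-a")
state = apply_step(state, "failback", approved_by="operator-a")
print(state.name)  # ONPREM_PRIMARY
```

Keeping the two transitions as separate table entries means a failback can never happen as a side effect of promotion, which mirrors the separation described above.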

Architecture

Figure: Managed PostgreSQL DR with Cloud SQL architecture, showing the on-prem write source, the managed Cloud SQL standby, the explicit promotion step, and the separate controlled return to on-prem.

The managed standby removes the need to operate a second PostgreSQL HA estate year-round. Promotion and failback remain discrete, reviewable operator actions.

Recovery sequence

  1. The on-prem source is assessed and the Cloud SQL standby is brought to a replication-ready state.
  2. The standby is promoted only through an explicit, approved operator action.
  3. Service returns on-prem through an isolated failback path.
  4. DNS and final service position are checked after the return.

Platform state

  • Cloud SQL instances list from the recorded drill, showing the source instance, external primary contract, and managed standby together.
  • Cloud SQL standby overview for hyops-dev-netbox-standby1, showing the managed standby posture and data-transfer view from the drill window.
  • Recorded gcloud Cloud SQL standby describe output from the drill window, showing the standby instance name, region, state, activation policy, and private address.

IP addresses, hostnames, and instance identifiers visible in screenshots and recordings reflect the ephemeral infrastructure provisioned during the recorded exercise.
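The fields named in the describe output above can be pulled from the JSON form of the same command (gcloud sql instances describe INSTANCE --format=json). The sketch below assumes that JSON shape; the region and address in the sample are placeholders rather than values from the drill, and summarize_instance is a hypothetical helper name.

```python
import json
from typing import Any, Dict

# Sample input. The instance name comes from the drill; the region, state,
# activation policy, and address are PLACEHOLDERS, not recorded drill values.
SAMPLE_DESCRIBE = """
{
  "name": "hyops-dev-netbox-standby1",
  "region": "REGION-PLACEHOLDER",
  "state": "RUNNABLE",
  "settings": {"activationPolicy": "ALWAYS"},
  "ipAddresses": [{"type": "PRIVATE", "ipAddress": "192.0.2.10"}]
}
"""


def summarize_instance(describe_json: str) -> Dict[str, Any]:
    """Extract name, region, state, activation policy, and private address."""
    doc = json.loads(describe_json)
    private = [
        entry.get("ipAddress")
        for entry in doc.get("ipAddresses", [])
        if entry.get("type") == "PRIVATE"
    ]
    return {
        "name": doc.get("name"),
        "region": doc.get("region"),
        "state": doc.get("state"),
        "activation_policy": doc.get("settings", {}).get("activationPolicy"),
        "private_address": private[0] if private else None,
    }


summary = summarize_instance(SAMPLE_DESCRIBE)
print(summary["name"], summary["state"])  # hyops-dev-netbox-standby1 RUNNABLE
```

Missing fields come back as None rather than raising, so the summary can still be printed when an instance has no private address assigned.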

Where it fits

  • When a credible cloud recovery option matters more than full infrastructure symmetry.
  • When a managed standby is preferable to operating a second Patroni estate.
  • When lower ongoing database operations overhead is worth an intentionally asymmetric recovery design.

Implementation

  • Source assessment: platform/onprem/postgresql-dr-source checks the on-prem estate before any standby operation begins.
  • Managed standby: Cloud SQL external replica keeps a replication-ready copy without a second self-managed cluster.
  • Promotion path: dr/postgresql-cloudsql-promote-gcp@v1 keeps promotion a discrete step requiring explicit operator approval.
  • Failback path: dr/postgresql-cloudsql-failback-onprem@v1 returns the service on-prem as a separate controlled operation.
  • DNS layer: platform/network/dns-routing manages the cutover record at each stage.
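As an illustration of the source assessment step, the sketch below checks the standard PostgreSQL prerequisites for logical replication on a source server: wal_level set to logical, and at least one replication slot and one WAL sender available. The checks that platform/onprem/postgresql-dr-source actually runs are not listed in this document, so treat this as an assumption about what such an assessment might include. The function name and the settings dictionary (values as returned by SHOW) are hypothetical.

```python
from typing import Dict, List


def logical_replication_findings(settings: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the basics are in place."""
    findings: List[str] = []
    if settings.get("wal_level") != "logical":
        findings.append("wal_level must be 'logical'")
    for name in ("max_replication_slots", "max_wal_senders"):
        # A missing setting is treated as 0 so that it is reported.
        if int(settings.get(name, "0")) < 1:
            findings.append(f"{name} must be at least 1")
    return findings


ready = {"wal_level": "logical", "max_replication_slots": "10", "max_wal_senders": "10"}
print(logical_replication_findings(ready))  # []

not_ready = {"wal_level": "replica", "max_replication_slots": "10", "max_wal_senders": "10"}
print(logical_replication_findings(not_ready))  # ["wal_level must be 'logical'"]
```

Returning findings instead of a single boolean lets an operator see every blocking setting in one pass before any standby operation begins.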

Key components

  • Source posture and assessment: platform/onprem/postgresql-dr-source
  • Managed standby path: org/gcp/cloudsql-external-replica
  • Promote workflow: dr/postgresql-cloudsql-promote-gcp@v1
  • Controlled return workflow: dr/postgresql-cloudsql-failback-onprem@v1
  • DNS cutover layer: platform/network/dns-routing

References

Implementation references
  • platform/onprem/postgresql-dr-source
  • org/gcp/cloudsql-external-replica#managed_standby
  • platform/network/dns-routing#postgresql_dns_status_cloudsql_failback_onprem

What was verified

Verified during the recorded HybridOps v1.0.1 managed standby drill: the standby was established, the controlled promotion path was tested, and the isolated failback drill was completed.