DR Failover to Cloud¶
Executive summary¶
This showcase demonstrates a practical disaster recovery (DR) pattern where:
- Primary workloads run on-prem in a Proxmox / Kubernetes environment.
- A warm standby environment exists in the cloud.
- Controlled DNS cutover is used to switch traffic during an incident.
- Evidence is captured for both failover and failback.
The goal is to show a realistic, cost-aware DR approach suitable for SMEs and larger environments.
Case study – how this was used in practice¶
- Context: Hybrid lab environment used to rehearse DR scenarios and support portfolio evidence.
- Challenge: Show a DR strategy that is realistic, testable and not purely theoretical.
- Approach: Built a minimal viable DR pattern: warm standby in cloud, DNS-based cutover, and explicit runbooks for failover/failback.
- Outcome: Clear, repeatable DR steps with artefacts suitable for auditors, assessors and engineering leaders.
Related decisions (for example):
Demo¶
Video walkthrough¶
- Video: https://www.youtube.com/watch?v=DR_FAILOVER_DEMO
The demo walks through:
- Normal operation, with the on-prem environment serving traffic.
- Simulated incident impacting on-prem services.
- DNS cutover to the cloud environment.
- Validation of application health and data paths in the cloud.
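The final validation step can be sketched as a small health probe run against the cloud environment after cutover. This is a minimal sketch rather than the showcase's actual tooling: the function names are hypothetical, and it assumes the application exposes an HTTP endpoint that returns 200 when healthy.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection errors, timeouts, non-2xx responses and bad URLs all count as unhealthy.
        return False


def validate_endpoints(
    urls: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> dict[str, bool]:
    """Probe each URL once and return a url -> healthy mapping."""
    return {url: probe(url) for url in urls}


def all_healthy(results: dict[str, bool]) -> bool:
    """True only if at least one endpoint was checked and none failed."""
    return bool(results) and all(results.values())
```

During a drill, the mapping returned by validate_endpoints could be saved alongside the other evidence. The probe argument is injectable so the logic can be exercised without network access.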
Screenshots¶


Architecture¶
- High-level diagram:

- Key components:
  - On-prem: Proxmox / Kubernetes cluster hosting primary workloads.
  - Cloud: Minimal standby environment sized for DR, not steady-state peak.
  - Control plane: DNS provider (and optionally load balancer) used to switch traffic.
  - State: Either replicated or re-hydrated from backups, depending on scenario.
Optional detailed diagrams:
Implementation highlights¶
- Warm standby rather than always-on multi-region active/active to control cost.
- DNS-driven cutover with clear TTL and rollback considerations.
- Runbooks for both failover and failback, with evidence captured for each exercise.
- Designed to be demonstrable in a home lab while mapping to real-world DR expectations.
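The TTL and rollback considerations can be made concrete with a short planning sketch. Resolvers that honour TTLs may keep serving the old record for up to one TTL after a change, so a common approach is to lower the TTL ahead of a drill, wait for the old TTL to expire, and only then switch the record. The record name, addresses and the 60-second drill TTL below are illustrative assumptions, not values from this showcase, and the code only builds a plan; it does not call any DNS provider API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DnsRecord:
    name: str
    target: str
    ttl_seconds: int


def worst_case_propagation(record: DnsRecord) -> int:
    """Upper bound, in seconds, before TTL-honouring resolvers see a change."""
    return record.ttl_seconds


def plan_cutover(current: DnsRecord, standby_target: str, drill_ttl: int = 60) -> list[str]:
    """Build an ordered list of cutover steps, ending with the rollback action."""
    steps: list[str] = []
    if current.ttl_seconds > drill_ttl:
        # Lower the TTL first so the later switch propagates quickly.
        steps.append(
            f"Lower TTL of {current.name} from {current.ttl_seconds}s to {drill_ttl}s"
        )
        steps.append(
            f"Wait {current.ttl_seconds}s for the old TTL to expire from caches"
        )
    effective_ttl = min(current.ttl_seconds, drill_ttl)
    steps.append(f"Point {current.name} at {standby_target}")
    steps.append(f"Wait up to {effective_ttl}s, then validate the standby")
    steps.append(f"Rollback: point {current.name} back at {current.target}")
    return steps
```

For example, plan_cutover(DnsRecord("app.example.com", "203.0.113.10", 3600), "198.51.100.20") yields five steps, starting with the TTL reduction. Some resolvers clamp or ignore TTLs, so the bound is a planning estimate rather than a guarantee.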
Assets and source¶
- GitHub folder for this showcase: https://github.com/hybridops-studio/hybridops-studio/tree/main/showcases/dr-failover-to-cloud
- Infra and scripts:
  - infra/terraform/ – DR infrastructure definitions.
  - core/ansible/ – configuration for DR roles.
  - ./scripts/ – helper scripts used during DR drills.
- Evidence:
  - ./evidence/ – screenshots, logs and exported dashboards from DR test runs.
Academy track (if applicable)¶
In the Academy, this scenario can be turned into a guided DR exercise where learners:
- Execute the failover and validate application health.
- Perform a controlled failback and verify state consistency.
- Capture and review evidence to support an internal DR review.
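Evidence capture for such an exercise can be as simple as an append-only, timestamped log. The sketch below is hypothetical: the field names and the JSON Lines layout are assumptions for illustration and are not the format used in the showcase's ./evidence/ folder.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path

VALID_PHASES = ("failover", "failback")


def record_evidence(
    log_path: Path, phase: str, step: str, outcome: str, detail: str = ""
) -> dict:
    """Append one timestamped evidence entry as a JSON line and return it."""
    if phase not in VALID_PHASES:
        raise ValueError(f"phase must be one of {VALID_PHASES}, got {phase!r}")
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "phase": phase,
        "step": step,
        "outcome": outcome,
        "detail": detail,
    }
    # Create the evidence directory on first use, then append without rewriting history.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def load_evidence(log_path: Path) -> list[dict]:
    """Read all entries back in the order they were recorded."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Keeping failover and failback entries in one ordered log makes it straightforward to review the full exercise afterwards.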
Role-based lens (optional)¶
- Platform Engineer / SRE: focuses on reliability, observability and safe procedures.
- Infrastructure Engineer: sees practical replication and failover mechanics.
- Engineering Manager / Hiring Manager: sees a realistic DR approach rather than a purely theoretical slide.