# Runner-Local DR Execution Model

## Purpose
Describe the intended operating path for HybridOps DR and burst workflows when execution is performed by a shared runner in or near the target environment.
## Why this exists

Workstation-driven DR is not an acceptable default because:
- the operator laptop may not have private cloud reachability
- the operator laptop may be unavailable during an incident
- public IP on every target VM is not the right steady-state answer
- DR should remain executable even when on-prem is partially or fully unavailable
## Recommended execution chain
1. Observability signals a condition.
2. Decision service evaluates policy and emits a decision artifact.
3. Workflow runner validates the decision, approvals, and prerequisites.
4. Runner executes the selected HyOps blueprint/module path.
5. Data services are restored or promoted.
6. Required secrets are synchronized.
7. GitOps reconciles stateless workloads.
8. Traffic cutover occurs.
9. Backup and observability are re-established.
10. Evidence is published.
Traffic cutover SHOULD prefer internal DNS endpoint updates against the shared internal authority rather than application-by-application raw IP rewrites.
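The cutover preference above can be sketched as a single record update at the shared internal DNS authority. This is an illustrative sketch only: `plan_cutover`, `apply_cutover`, `FakeDNS`, and all zone/record names are hypothetical stand-ins, not HybridOps APIs.

```python
# Hypothetical sketch: cut traffic over by repointing one internal DNS
# alias at the shared authority, instead of rewriting raw IPs per app.

def plan_cutover(zone: str, alias: str, new_target: str) -> dict:
    """Return a single DNS change set describing the cutover."""
    return {
        "zone": zone,              # e.g. the shared internal zone
        "record": alias,           # the name applications already resolve
        "type": "CNAME",
        "new_target": new_target,  # restored/promoted service endpoint
    }

def apply_cutover(change: dict, dns_client) -> None:
    # One authoritative update; applications keep resolving the same name.
    dns_client.upsert(change["zone"], change["record"],
                      change["type"], change["new_target"])

class FakeDNS:
    """Stand-in for the shared internal DNS authority."""
    def __init__(self):
        self.records = {}
    def upsert(self, zone, record, rtype, target):
        self.records[(zone, record)] = (rtype, target)

dns = FakeDNS()
change = plan_cutover("internal.example", "db.dr.internal.example",
                      "pg-replica.gcp.internal.example")
apply_cutover(change, dns)
```

Because every application resolves the alias rather than a raw IP, the cutover is one change at the authority instead of N per-application rewrites.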
## Control-plane pattern

The runner SHOULD be outside the primary on-prem failure domain.
Examples:
- shared cloud runner
- on-prem runner on the restored management network
- edge execution node
- dedicated automation VM in the cloud hub
Current HybridOps reference implementation:
- bootstrap a private GCP runner in the hub core subnet via `networking/gcp-ops-runner@v1`
- bootstrap an on-prem runner on the management network via `networking/onprem-ops-runner@v1`
- bootstrap the runner host software via `platform/linux/ops-runner`
- dispatch env-scoped blueprint overlays to that runner via `hyops runner blueprint preflight` and `hyops runner blueprint deploy`
- use that runner for blueprints that declare `execution_plane: runner-local`
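A blueprint opting into this dispatch path might carry a header like the following. Only `execution_plane: runner-local` comes from this document; the surrounding field names are illustrative assumptions, not the actual HybridOps blueprint schema.

```yaml
# Hypothetical blueprint overlay header (illustrative field names).
# Only `execution_plane: runner-local` is taken from this document.
name: dr-restore-postgres        # assumed field
execution_plane: runner-local    # blueprint runs on the dispatched runner
```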
The runner SHOULD have private reachability to the chosen target subnet/VPC.
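A minimal reachability preflight for the runner could look like this sketch; `runner_can_reach` is a hypothetical helper, not a HybridOps API, and a real preflight would also verify DNS, credentials, and routes into the target subnet/VPC.

```python
import socket

def runner_can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Preflight check: can this runner open a TCP connection to the target?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe a local listener standing in for a private target VM.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
ok = runner_can_reach("127.0.0.1", port)
listener.close()
```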
## GCP project-role model

For GCP, the runner-local model is clearer when project roles are explicit:
- **host/network project**
    - Shared VPC
    - Cloud Router / NAT
    - runner VM placement
    - private DR VM placement
- **control project**
    - env-scoped secret authority such as GCP Secret Manager
    - env-scoped object repositories such as backup buckets
- **workload project**
    - optional later role for env-specific service projects
This means the runner can execute in the host/network project while still reading secrets or backup state from the env control project.
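The project-role split can be expressed as explicit data the runner resolves against. This is a sketch under assumed names: the project IDs, the `PROJECT_ROLES` table, and `project_for` are all illustrative, not HybridOps state.

```python
# Hypothetical sketch of the GCP project-role split described above.
# Project IDs and the resolver are illustrative assumptions.

PROJECT_ROLES = {
    "host": {
        "project": "acme-hub-net",       # Shared VPC, NAT, runner/DR VMs
        "provides": ["vpc", "nat", "runner-vm", "dr-vm"],
    },
    "control": {
        "project": "acme-prod-ctl",      # secret authority, backup buckets
        "provides": ["secrets", "backup-bucket"],
    },
}

def project_for(capability: str) -> str:
    """Resolve which project serves a capability for the runner."""
    for role in PROJECT_ROLES.values():
        if capability in role["provides"]:
            return role["project"]
    raise KeyError(f"no project provides {capability!r}")
```

With this split, the runner VM lives in the host/network project while secret reads and backup restores resolve to the env control project.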
## Access patterns

### Preferred
- runner-local private reachability
### Acceptable fallback
- explicit bastion
- provider-specific access such as GCP IAP
### Last-resort drill mode
- temporary public IPs with restricted firewall rules
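The three tiers above amount to an explicit, ordered preference. A sketch of that selection, with illustrative mode names (only the tiers themselves come from this document):

```python
# Hypothetical sketch: access mode is an explicit, ordered choice,
# never an implicit workstation default. Mode names are illustrative.

ACCESS_MODES = [
    "runner-local",     # preferred: private reachability from the runner
    "bastion",          # acceptable fallback: explicit bastion
    "iap",              # acceptable fallback: provider access (e.g. GCP IAP)
    "public-ip-drill",  # last resort: temporary public IP, tight firewall
]

def select_access_mode(available: set) -> str:
    """Pick the most preferred access mode that is actually available."""
    for mode in ACCESS_MODES:
        if mode in available:
            return mode
    raise RuntimeError("no acceptable access path to target")
```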
## Normal hybrid vs DR

Normal workload burst may rely on:
- BGP/IPsec
- routed overlays
- private site-to-cloud connectivity
DR must still work if the source on-prem site is down.
Therefore:
- normal hybrid connectivity is valuable
- but DR execution cannot depend on that source site remaining alive
## Practical HyOps implications

HybridOps blueprints and modules SHOULD be designed so that:
- cloud DR inventories do not assume workstation reachability
- access mode is explicit
- repository, network, and project data are consumed from state
- step preflight fails before execution when access prerequisites are missing
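The last point, preflight failing before execution, can be sketched as a hard gate: all access prerequisites are checked up front, and the step body never starts when any are missing. `preflight`, `run_step`, and the prerequisite keys are hypothetical illustrations.

```python
# Hypothetical sketch of "preflight fails before execution": every
# access prerequisite is validated up front, and execution never
# starts on failure. Key names are illustrative assumptions.

class PreflightError(Exception):
    pass

def preflight(inputs: dict) -> None:
    required = ["access_mode", "target_subnet", "repo_state"]
    missing = [k for k in required if not inputs.get(k)]
    if missing:
        raise PreflightError(f"missing prerequisites: {missing}")

def run_step(inputs: dict) -> str:
    preflight(inputs)   # gate: fail here, not halfway through execution
    return "executed"
```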
HybridOps runner dispatch SHOULD:
- treat the local runtime as the authority
- stage a disposable runtime bundle onto the runner
- execute `hyops` on the runner against that staged runtime
- sync resulting state, logs, meta, and artifacts back to the local runtime
- write local runner evidence for every dispatch
- remove the local staged runner bundle and dispatch env after successful runs by default, so one-shot synced secrets do not linger in the runtime work directory
- allow explicit one-shot env forwarding for secrets, resolving requested keys from the selected runtime vault first and allowing operator-shell overrides during the transition to fully managed secret synchronization
- allow a pre-dispatch secret-source refresh so the runtime vault cache can be repopulated from an external authority such as HashiCorp Vault or GCP Secret Manager before the runner job is staged
- project Terraform Cloud authentication from the selected runtime when local `hyops init terraform-cloud` has already established a usable token, so TFC-backed remote jobs do not require a second interactive login on the target runner
- when Terraform Cloud is used as the backend for runner-executed workflows, treat it as the state backend and ensure HyOps-managed workspaces execute in `local` mode on the runner rather than on Terraform Cloud infrastructure
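The stage/execute/sync/cleanup lifecycle above can be sketched as one dispatch function whose cleanup always runs, so one-shot synced secrets never outlive the job. All names here (`dispatch`, `FakeRunner`, the method names) are hypothetical, not the HybridOps implementation.

```python
# Hypothetical sketch of the dispatch lifecycle: stage a disposable
# bundle, execute, sync results back, then clean up unconditionally.

def dispatch(runner, job: dict) -> dict:
    bundle = runner.stage(job)            # disposable runtime bundle
    try:
        result = runner.execute(bundle)   # hyops runs on the runner
        runner.sync_back(result)          # state, logs, meta, artifacts
        return result
    finally:
        runner.cleanup(bundle)            # one-shot secrets never linger

class FakeRunner:
    """Stand-in runner that records the lifecycle order."""
    def __init__(self):
        self.events = []
    def stage(self, job):
        self.events.append("stage")
        return {"job": job}
    def execute(self, bundle):
        self.events.append("execute")
        return {"status": "ok"}
    def sync_back(self, result):
        self.events.append("sync")
    def cleanup(self, bundle):
        self.events.append("cleanup")

r = FakeRunner()
out = dispatch(r, {"blueprint": "dr-restore"})
```

The `finally` block is the load-bearing detail: cleanup of the staged bundle and dispatch env happens whether the run succeeds or fails.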
## Future integration points

Expected future integrations:
- decision service emits `dr_mode`, `target_cloud`, `execution_plane`, and related policy fields
- runner consumes those fields and selects the right blueprint
- secret sync from managed secret stores is performed as a distinct execution phase
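Blueprint selection from those policy fields might look like the following sketch. The field names `dr_mode`, `target_cloud`, and `execution_plane` come from this document; the blueprint table and `select_blueprint` are illustrative assumptions.

```python
# Hypothetical sketch: the runner maps decision-service policy fields
# onto a blueprint. The lookup table is an illustrative assumption.

BLUEPRINTS = {
    ("failover", "gcp"): "gcp-dr-failover",
    ("restore",  "gcp"): "gcp-dr-restore",
}

def select_blueprint(decision: dict) -> str:
    """Validate the decision artifact and pick the matching blueprint."""
    if decision.get("execution_plane") != "runner-local":
        raise ValueError("decision does not target runner-local execution")
    return BLUEPRINTS[(decision["dr_mode"], decision["target_cloud"])]
```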
## Guardrails

Do:
- keep target VMs private by default
- use a shared runner or explicit bastion
- keep access mode visible in workflow inputs and evidence
Do not:
- rely on the operator workstation as the product default
- assume convenience bastion inference is valid for cloud DR
- make public IP per target node the standard operating posture
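The guardrails about visible access mode and public-IP posture can be enforced mechanically; this sketch is a hypothetical validator, with all names (`evidence_record`, the input keys) assumed for illustration.

```python
# Hypothetical sketch of two guardrails: access mode must be explicit
# in workflow inputs and evidence, and a public-IP posture is rejected
# unless the run is an explicit drill. Names are illustrative.

def evidence_record(workflow: str, inputs: dict) -> dict:
    if "access_mode" not in inputs:
        raise ValueError("access_mode must be explicit in workflow inputs")
    if inputs["access_mode"] == "public-ip" and not inputs.get("drill"):
        raise ValueError("public IP per target node is drill-only")
    # Access mode travels with the evidence, not just the inputs.
    return {"workflow": workflow, "access_mode": inputs["access_mode"]}
```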
## Outcome

If this model is followed, HybridOps can offer:
- a clean DR story for SMEs and schools
- a supportable default execution posture
- a path to more advanced decision-driven recovery later