Runner-Local DR Execution Model

Purpose

Describe the intended operating path for HybridOps DR and burst workflows when execution is performed by a shared runner in or near the target environment.

Why this exists

Workstation-driven DR is not an acceptable default because:

  • the operator laptop may not have private cloud reachability
  • the operator laptop may be unavailable during an incident
  • public IP on every target VM is not the right steady-state answer
  • DR should remain executable even when on-prem is partially or fully unavailable

Execution flow

The intended end-to-end sequence is:

  1. Observability signals a condition.
  2. Decision service evaluates policy and emits a decision artifact.
  3. Workflow runner validates the decision, approvals, and prerequisites.
  4. Runner executes the selected HyOps blueprint/module path.
  5. Data services are restored or promoted.
  6. Required secrets are synchronized.
  7. GitOps reconciles stateless workloads.
  8. Traffic cutover occurs.
  9. Backup and observability are re-established.
  10. Evidence is published.

Traffic cutover SHOULD prefer updating internal DNS records on the shared internal authority over rewriting raw IPs application by application.
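The runner-side portion of the flow above (steps 3 through 10) can be sketched as an ordered phase list behind a hard decision gate. Everything here is illustrative; the `Decision` shape and phase names are hypothetical stand-ins, not real HyOps interfaces.

```python
# Illustrative sketch of the runner-side DR flow; the Decision structure and
# phase names are hypothetical, not the actual HyOps API.
from dataclasses import dataclass

PHASES = [
    "validate_decision",                  # step 3: decision, approvals, prerequisites
    "execute_blueprint",                  # step 4: selected blueprint/module path
    "restore_data_services",              # step 5
    "sync_secrets",                       # step 6
    "gitops_reconcile",                   # step 7: stateless workloads
    "traffic_cutover",                    # step 8: prefer internal DNS updates
    "reestablish_backup_observability",   # step 9
    "publish_evidence",                   # step 10
]

@dataclass
class Decision:
    dr_mode: str
    target_cloud: str
    approved: bool

def run_dr(decision: Decision) -> list[str]:
    """Run all phases in order; refuse to start if the decision is not approved."""
    if not decision.approved:
        raise RuntimeError("preflight failed: decision not approved")
    completed = []
    for phase in PHASES:
        completed.append(phase)  # a real runner would dispatch blueprint steps here
    return completed
```

The point of the gate is that nothing executes on an unapproved decision artifact; the ordering makes the dependency between data restore, secret sync, and cutover explicit.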

Control-plane pattern

The runner SHOULD be outside the primary on-prem failure domain.

Examples:

  • shared cloud runner
  • on-prem runner on the restored management network
  • edge execution node
  • dedicated automation VM in the cloud hub

Current HybridOps reference implementation:

  • bootstrap a private GCP runner in the hub core subnet via networking/gcp-ops-runner@v1
  • bootstrap an on-prem runner on the management network via networking/onprem-ops-runner@v1
  • bootstrap the runner host software via platform/linux/ops-runner
  • dispatch env-scoped blueprint overlays to that runner via:
      • hyops runner blueprint preflight
      • hyops runner blueprint deploy
  • use that runner for blueprints that declare execution_plane: runner-local

The runner SHOULD have private reachability to the chosen target subnet/VPC.
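A blueprint that opts into this path would declare its execution plane in its manifest. The fragment below is illustrative only: `execution_plane: runner-local` is taken from the text above, while every other key and value is a hypothetical schema, not the real HyOps blueprint format.

```yaml
# Illustrative blueprint manifest fragment; only execution_plane is from this doc.
blueprint: dr-cloud-restore            # hypothetical blueprint name
execution_plane: runner-local          # executed on the shared runner, not a workstation
runner:
  module: networking/gcp-ops-runner@v1 # private GCP runner in the hub core subnet
access_mode: private                   # explicit, visible in workflow inputs and evidence
```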

GCP project-role model

For GCP, the runner-local model is clearer when project roles are explicit:

  • host/network project
      • Shared VPC
      • Cloud Router / NAT
      • runner VM placement
      • private DR VM placement
  • control project
      • env-scoped secret authority such as GCP Secret Manager
      • env-scoped object repositories such as backup buckets
  • workload project
      • optional later role for env-specific service projects

This means the runner can execute in the host/network project while still reading secrets or backup state from the env control project.
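The cross-project relationship can be made concrete with a small role map. The project names and role labels below are hypothetical; only the split itself (runner in host/network, secret and backup authority in control) comes from the text.

```python
# Illustrative GCP project-role map; project names and role labels are invented.
ROLES = {
    "host-network": {"shared_vpc", "cloud_router_nat", "runner_vm", "dr_vm"},
    "control":      {"secret_manager", "backup_buckets"},
    "workload":     {"env_service_project"},  # optional later role
}

def can_read_secrets(exec_project: str, secret_project: str) -> bool:
    """Cross-project read: the runner executes in one project and reads
    secret/backup state from the env control project."""
    return ("runner_vm" in ROLES[exec_project]
            and "secret_manager" in ROLES[secret_project])
```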

Access patterns

Preferred

  • runner-local private reachability

Acceptable fallback

  • explicit bastion
  • provider-specific access such as GCP IAP

Last-resort drill mode

  • temporary public IPs with restricted firewall rules
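The preference order above can be sketched as a strict fallback chain that refuses to proceed rather than silently exposing targets. The `AccessMode` names and the boolean probe inputs are hypothetical, assumed for illustration.

```python
# Illustrative access-mode selection mirroring the stated preference order;
# the mode names and probe flags are hypothetical.
from enum import Enum

class AccessMode(Enum):
    RUNNER_LOCAL = "runner-local"     # preferred: private reachability
    BASTION = "bastion"               # acceptable: explicit bastion or GCP IAP
    TEMP_PUBLIC = "temporary-public"  # last resort, drill mode only

def select_access_mode(private_ok: bool, bastion_ok: bool, drill: bool) -> AccessMode:
    if private_ok:
        return AccessMode.RUNNER_LOCAL
    if bastion_ok:
        return AccessMode.BASTION
    if drill:
        return AccessMode.TEMP_PUBLIC  # temporary public IPs, restricted firewall rules
    raise RuntimeError("no acceptable access path; refuse rather than expose targets")
```

Failing closed in the final branch is the point: outside a drill, the absence of a private or bastion path is an error, not a cue to open public access.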

Normal hybrid vs DR

Normal workload bursting may rely on:

  • BGP/IPsec
  • routed overlays
  • private site-to-cloud connectivity

DR must still work if the source on-prem site is down.

Therefore:

  • normal hybrid connectivity is valuable
  • but DR execution cannot depend on that source site remaining alive

Practical HyOps implications

HybridOps blueprints and modules SHOULD be designed so that:

  • cloud DR inventories do not assume workstation reachability
  • access mode is explicit
  • repository, network, and project data are consumed from state
  • step preflight fails before execution when access prerequisites are missing
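The design rules above reduce to a preflight gate that rejects a run before anything executes. The field names (`access_mode`, the `state` keys) are hypothetical here, chosen only to mirror the bullets.

```python
# Illustrative preflight gate for the rules above; field names are hypothetical.
def preflight(inputs: dict) -> list[str]:
    """Return a list of blocking errors; an empty list means execution may proceed."""
    errors = []
    if inputs.get("access_mode") is None:
        errors.append("access mode must be explicit")
    state = inputs.get("state", {})
    for key in ("repository", "network", "project"):
        if key not in state:
            # consumed from state, never assumed from workstation context
            errors.append(f"missing {key} in state")
    return errors
```

A dispatcher would refuse to stage the run whenever `preflight()` returns a non-empty list, which is what "fails before execution" means in practice.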

HybridOps runner dispatch SHOULD:

  • treat the local runtime as the authority
  • stage a disposable runtime bundle onto the runner
  • execute hyops on the runner against that staged runtime
  • sync resulting state, logs, meta, and artifacts back to the local runtime
  • write local runner evidence for every dispatch
  • remove the local staged runner bundle and dispatch env after successful runs by default, so one-shot synced secrets do not linger in the runtime work directory
  • allow explicit one-shot env forwarding for secrets, resolving requested keys from the selected runtime vault first and allowing operator-shell overrides during the transition to fully managed secret synchronization
  • allow a pre-dispatch secret-source refresh so the runtime vault cache can be repopulated from an external authority such as HashiCorp Vault or GCP Secret Manager before the runner job is staged
  • project Terraform Cloud authentication from the selected runtime when local hyops init terraform-cloud has already established a usable token, so TFC-backed remote jobs do not require a second interactive login on the target runner
  • when Terraform Cloud is used as the backend for runner-executed workflows, treat it as the state backend and ensure HyOps-managed workspaces execute in local mode on the runner rather than on Terraform Cloud infrastructure
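The dispatch lifecycle in the SHOULD list above can be sketched as stage, execute, sync back, write evidence, then clean up, with cleanup guaranteed even on failure. Every function and object name here is a hypothetical stand-in, not the real hyops dispatch internals.

```python
# Illustrative dispatch lifecycle; runner/runtime methods are hypothetical
# stand-ins for the real hyops dispatch machinery.
def dispatch(runner, runtime, cleanup: bool = True) -> dict:
    bundle = runtime.stage_bundle(runner)    # stage a disposable runtime bundle
    try:
        result = runner.execute(bundle)      # run hyops on the runner against it
        runtime.sync_back(result)            # state, logs, meta, artifacts
        runtime.write_evidence(result)       # local runner evidence per dispatch
        return result
    finally:
        if cleanup:
            # default removal so one-shot synced secrets do not linger
            runner.remove(bundle)
```

Keeping removal in a `finally` block enforces the "do not linger" default even when a phase fails mid-run; `cleanup=False` would be an explicit operator opt-out.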

Future integration points

Expected future integrations:

  • decision service emits dr_mode, target_cloud, execution_plane, and related policy fields
  • runner consumes those fields and selects the right blueprint
  • secret sync from managed secret stores is performed as a distinct execution phase
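Consuming the decision artifact might look like the sketch below. The field names `dr_mode`, `target_cloud`, and `execution_plane` come from the text above; the JSON shape and the blueprint naming convention are invented for illustration.

```python
# Illustrative decision-artifact consumption; only the field names come from
# this doc, the blueprint naming convention is hypothetical.
import json

def select_blueprint(artifact_json: str) -> str:
    decision = json.loads(artifact_json)
    if decision["execution_plane"] != "runner-local":
        raise ValueError("this runner only serves runner-local blueprints")
    # hypothetical convention: blueprint named <dr_mode>-<target_cloud>
    return f"{decision['dr_mode']}-{decision['target_cloud']}"
```

The guard keeps the runner honest about its execution plane: a decision targeting some other plane is rejected instead of being run in the wrong place.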

Guardrails

Do:

  • keep target VMs private by default
  • use a shared runner or explicit bastion
  • keep access mode visible in workflow inputs and evidence

Do not:

  • rely on the operator workstation as the product default
  • assume convenience bastion inference is valid for cloud DR
  • make public IP per target node the standard operating posture

Outcome

If this model is followed, HybridOps can offer:

  • a clean DR story for SMEs and schools
  • a supportable default execution posture
  • a path to more advanced decision-driven recovery later