DR Execution and Access Model

Purpose: Define the clean execution and access model for HybridOps disaster recovery and workload burst operations so recovery does not depend on an operator workstation.

This standard applies to:

  • DR blueprints
  • burst-to-cloud blueprints
  • decision service integrations
  • automation runners that invoke hyops

It complements:

0. Provider composition rule

HybridOps SHOULD model runner enablement as four separate concerns:

  • provider-specific egress adapter
  • provider-specific compute placement
  • provider-specific access adapter
  • generic runner bootstrap

That means:

  • platform/linux/ops-runner remains generic
  • GCP/Azure/AWS/Proxmox handle VM placement in their own modules
  • provider-specific egress is handled separately from runner bootstrap
  • runner blueprints compose these layers instead of hiding them inside one module
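
As a rough illustration of the composition rule above, a runner blueprint can be modelled as four explicit layers. This is a minimal sketch, not HybridOps code: every class name, field name, and example value is hypothetical, and only the module path platform/linux/ops-runner comes from this standard.

```python
from dataclasses import dataclass


# All names below are hypothetical illustrations, not HybridOps identifiers.
@dataclass(frozen=True)
class EgressAdapter:
    provider: str
    kind: str  # e.g. a NAT-style egress path; the value is an assumption


@dataclass(frozen=True)
class ComputePlacement:
    provider: str
    subnet: str


@dataclass(frozen=True)
class AccessAdapter:
    provider: str
    mode: str


@dataclass(frozen=True)
class RunnerBootstrap:
    # The bootstrap layer carries no provider field: it stays generic.
    module: str = "platform/linux/ops-runner"


@dataclass(frozen=True)
class RunnerBlueprint:
    # The blueprint composes the four layers instead of hiding them.
    egress: EgressAdapter
    placement: ComputePlacement
    access: AccessAdapter
    bootstrap: RunnerBootstrap


generic_bootstrap = RunnerBootstrap()

gcp_runner = RunnerBlueprint(
    egress=EgressAdapter(provider="gcp", kind="nat"),
    placement=ComputePlacement(provider="gcp", subnet="example-runner-subnet"),
    access=AccessAdapter(provider="gcp", mode="runner-local"),
    bootstrap=generic_bootstrap,
)

azure_runner = RunnerBlueprint(
    egress=EgressAdapter(provider="azure", kind="nat"),
    placement=ComputePlacement(provider="azure", subnet="example-runner-subnet"),
    access=AccessAdapter(provider="azure", mode="runner-local"),
    bootstrap=generic_bootstrap,
)
```

Both blueprints reuse the same bootstrap value, which is the point of keeping that layer provider-neutral.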

0.1 GCP project-role rule

When HybridOps uses GCP, documentation and inputs SHOULD distinguish project roles instead of implying that one GCP project owns every DR asset.

Preferred roles:

  • host/network project
      • Shared VPC
      • Cloud Router / NAT
      • private runner placement
  • control project
      • env-scoped secret authority adapters such as GCP Secret Manager
      • env-scoped object repositories such as backup buckets
      • other env-scoped control artifacts
  • workload project
      • optional future location for env-scoped compute or service projects when Shared VPC attachment is intentionally in use

Norms:

  • runner placement MAY live in the host/network project
  • secrets and object repositories MAY live in the control project
  • documents and blueprints MUST NOT collapse these roles into a generic project_id story when different projects are intentionally used
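
One way to keep the project roles distinct in inputs is to name each role explicitly instead of passing a single project_id. The sketch below assumes hypothetical field names, asset keys, and project IDs; the asset-to-role mapping shows one layout the norms permit (they use MAY), not a required one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GcpProjectRoles:
    # Hypothetical input shape: one field per project role.
    host_network_project: str
    control_project: str
    workload_project: Optional[str] = None  # optional future location


# One permitted layout (an assumption for illustration): which role owns which asset.
ASSET_ROLE = {
    "shared_vpc": "host_network_project",
    "cloud_router_nat": "host_network_project",
    "runner": "host_network_project",
    "secrets": "control_project",
    "backup_bucket": "control_project",
}


def project_for(roles: GcpProjectRoles, asset: str) -> str:
    """Resolve the project that owns an asset; raises KeyError for unknown assets."""
    return getattr(roles, ASSET_ROLE[asset])


roles = GcpProjectRoles(
    host_network_project="example-net-host",
    control_project="example-dr-control",
)
```

With this shape, a blueprint asks for the project that owns the runner or the secrets, and the answer can differ without any input being renamed.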

1. Default execution rule

HybridOps DR and burst workflows MUST default to runner-local execution, not workstation-direct execution.

Meaning:

  • the workflow is triggered by policy/decision
  • execution happens from a controlled runner in or near the chosen target environment
  • the operator workstation is optional and MUST NOT be the assumed control plane
  • runner provisioning and runner bootstrap are distinct concerns

This is the default product posture for SMEs, schools, and enterprise customers.

2. Planes

HybridOps SHOULD model DR and burst around four planes.

2.1 Decision plane

  • evaluates health, policy, and target selection
  • emits a decision artifact
  • does not perform infrastructure changes directly

2.2 Execution plane

  • runs HyOps blueprints/modules
  • captures evidence
  • applies approvals and safety gates

2.3 Connectivity plane

  • provides private reachability between sites and clouds during normal operation
  • may use BGP/IPsec, WireGuard, or equivalent overlay/underlay patterns
  • MUST NOT be assumed to survive an on-prem outage during DR

2.4 Workload plane

  • reconciles stateless workloads from GitOps sources
  • consumes restored/promoted data services
  • performs cutover after data and platform readiness checks pass

3. Access modes

HybridOps SHOULD support explicit access modes for cloud and DR targets.

3.1 runner-local (preferred default)

The automation runner has private L3 reachability to the target subnet or VPC.

Use when:

  • running DR in GCP, Azure, or AWS
  • bursting workloads into cloud
  • executing from a shared control plane

Advantages:

  • no per-VM public IP requirement
  • cleanest enterprise posture
  • strongest fit for pipeline-driven operations

Norm:

  • a runner-local target still needs explicit outbound egress for bootstrap and tool delivery

3.2 private-overlay

Reachability is provided through a private inter-site path such as:

  • BGP over IPsec
  • Cloud VPN
  • WireGuard overlay
  • equivalent routed hybrid connectivity

Use when:

  • on-prem and cloud need continuous hybrid connectivity
  • workload burst depends on private east-west paths

Norm:

  • useful for normal hybrid operation
  • not sufficient alone as the DR execution assumption, because on-prem may be unavailable

3.3 bastion-explicit

An explicit bastion is provided by contract and used intentionally.

Use when:

  • runner-local private reachability is not yet available
  • a controlled hop host is acceptable

Norms:

  • bastion usage MUST be explicit
  • auto-inferred bastions MUST NOT be assumed for cloud DR targets

3.4 gcp-iap

Google Cloud IAP TCP forwarding may be used as a provider-specific access mode.

Use when:

  • targets are private-only GCE VMs
  • the product is operating in GCP

Norm:

  • this is valid as a provider-specific enhancement
  • it MUST remain optional, not the cross-cloud default

3.5 public-ephemeral

Temporary public access to target VMs for a drill or constrained fallback.

Norms:

  • drill-only or break-glass
  • MUST be time-bounded
  • MUST use tightly scoped firewall rules
  • MUST NOT be the shipped default for DR database nodes
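
The access modes and their norms can be sketched as an enumeration plus a small validator. This is an illustrative sketch only: the request fields are hypothetical, and treating "tightly scoped" as "at least one CIDR and no 0.0.0.0/0" is an assumption, not a definition from this standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import List, Optional, Tuple


class AccessMode(Enum):
    RUNNER_LOCAL = "runner-local"
    PRIVATE_OVERLAY = "private-overlay"
    BASTION_EXPLICIT = "bastion-explicit"
    GCP_IAP = "gcp-iap"
    PUBLIC_EPHEMERAL = "public-ephemeral"


@dataclass(frozen=True)
class AccessRequest:
    # Hypothetical request shape; runner-local is the preferred default.
    mode: AccessMode = AccessMode.RUNNER_LOCAL
    provider: str = "gcp"
    bastion_host: Optional[str] = None
    expires_at: Optional[datetime] = None
    allowed_cidrs: Tuple[str, ...] = ()


def validate_access(req: AccessRequest) -> List[str]:
    """Return a list of norm violations; an empty list means the request passes."""
    errors: List[str] = []
    if req.mode is AccessMode.BASTION_EXPLICIT and not req.bastion_host:
        errors.append("bastion-explicit requires an explicitly configured bastion")
    if req.mode is AccessMode.GCP_IAP and req.provider != "gcp":
        errors.append("gcp-iap is only valid for GCP targets")
    if req.mode is AccessMode.PUBLIC_EPHEMERAL:
        if req.expires_at is None:
            errors.append("public-ephemeral must be time-bounded")
        # Assumption: "tightly scoped" means at least one CIDR and no open range.
        if not req.allowed_cidrs or "0.0.0.0/0" in req.allowed_cidrs:
            errors.append("public-ephemeral requires tightly scoped firewall rules")
    return errors


# Example: a two-hour drill window restricted to a single documentation-range address.
drill = AccessRequest(
    mode=AccessMode.PUBLIC_EPHEMERAL,
    expires_at=datetime.now(timezone.utc) + timedelta(hours=2),
    allowed_cidrs=("203.0.113.10/32",),
)
```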

4. Product rules

HybridOps applies the following rules, each at the requirement level it states.

  1. Cloud DR blueprints SHOULD default to private-only compute nodes.
  2. Cloud DR workflows MUST NOT assume the operator laptop can reach private cloud IPs.
  3. ssh_proxy_jump_auto or equivalent convenience logic MUST be limited to on-prem or local-lab scenarios.
  4. Cloud DR and burst workflows SHOULD prefer runner-local execution.
  5. Public IP on every DR VM MUST NOT be the default product posture.
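
Several of these rules lend themselves to a pre-flight check on blueprint inputs. The sketch below is a minimal example under stated assumptions: the blueprint is a plain dict, and the keys target, execution, and public_ip_default are hypothetical. Only ssh_proxy_jump_auto is a name taken from this standard.

```python
from typing import Dict, List

# Hypothetical target classes where convenience proxy-jump logic is allowed (rule 3).
PROXY_JUMP_ALLOWED_TARGETS = {"on-prem", "local-lab"}


def check_blueprint(bp: Dict[str, object]) -> List[str]:
    """Return the product rules that a blueprint's inputs would violate."""
    violations: List[str] = []
    target = bp.get("target", "cloud")

    # Rule 2: cloud DR must not assume the operator laptop reaches private IPs.
    if target == "cloud" and bp.get("execution", "runner-local") == "workstation-direct":
        violations.append("rule 2: cloud DR must not rely on workstation-direct execution")

    # Rule 3: auto proxy-jump is limited to on-prem or local-lab scenarios.
    if bp.get("ssh_proxy_jump_auto", False) and target not in PROXY_JUMP_ALLOWED_TARGETS:
        violations.append("rule 3: ssh_proxy_jump_auto is limited to on-prem or local-lab")

    # Rule 5: a public IP on every DR VM must not be the default.
    if target == "cloud" and bp.get("public_ip_default", False):
        violations.append("rule 5: public IP on every DR VM must not be the default")

    return violations
```

For example, check_blueprint({"target": "cloud", "ssh_proxy_jump_auto": True}) reports a rule 3 violation, while the same flag with target set to "local-lab" passes.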

5. Control-plane posture

HybridOps SHOULD use a shared control plane outside the primary on-prem failure domain.

The shared control plane may host:

  • workflow runner
  • evidence collection
  • decision service
  • policy/approval engine
  • future secret sync orchestration

This control plane is distinct from:

  • the source on-prem site
  • the DR target site

Reference shape:

  • create the runner host with a platform VM blueprint or module
  • bootstrap the runner host with the shipped runner bootstrap module
  • use a cloud-side runner for failover and an on-prem runner for failback
  • then execute DR or burst workflows from that runner
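
The reference shape can be read as an ordered plan whose runner location depends on the direction of the operation. The following sketch uses hypothetical names throughout and only restates the bullets above as data.

```python
from enum import Enum
from typing import List


class Direction(Enum):
    FAILOVER = "failover"
    FAILBACK = "failback"


# From the reference shape: cloud-side runner for failover, on-prem runner for failback.
RUNNER_SITE = {
    Direction.FAILOVER: "cloud",
    Direction.FAILBACK: "on-prem",
}


def reference_steps(direction: Direction) -> List[str]:
    """Return the ordered steps: provision, bootstrap, then execute from the runner."""
    site = RUNNER_SITE[direction]
    return [
        f"create {site} runner host with a platform VM blueprint or module",
        f"bootstrap {site} runner host with the runner bootstrap module",
        f"execute {direction.value} workflow from the {site} runner",
    ]
```

Provisioning and bootstrap stay separate entries in the plan, matching the rule that they are distinct concerns.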

6. Decision-service boundary

Decision service MUST:

  • observe
  • evaluate
  • select
  • emit a decision

Decision service MUST NOT:

  • directly create infrastructure
  • directly mutate cloud or on-prem resources
  • bypass workflow approvals and evidence collection

Execution runners consume decision outputs and invoke HyOps.
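
The boundary can be illustrated with a decision artifact that is pure data, and a runner-side function that turns it into an execution plan only after approval. This is a sketch under assumptions: the artifact fields, the JSON encoding, and the plan shape are all hypothetical, and the actual hyops invocation is deliberately left out.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class Decision:
    # Hypothetical artifact fields; the decision service emits data, nothing else.
    decision_id: str
    action: str   # e.g. "failover", "failback", "burst"
    target: str   # selected target environment
    reason: str   # summary of the health/policy evaluation


def emit_decision(decision: Decision) -> str:
    """Decision plane: serialize the decision. No infrastructure calls happen here."""
    return json.dumps(asdict(decision), sort_keys=True)


def plan_execution(artifact: str, approved: bool) -> Dict[str, str]:
    """Execution plane: consume the artifact and refuse to proceed without approval."""
    decision = Decision(**json.loads(artifact))
    if not approved:
        raise PermissionError(f"decision {decision.decision_id} has not been approved")
    # The runner would invoke HyOps from here; that call is outside this sketch.
    return {
        "decision_id": decision.decision_id,
        "action": decision.action,
        "target": decision.target,
    }


artifact = emit_decision(
    Decision("d-001", "failover", "gcp", "primary site health checks failing")
)
```

Keeping the artifact serializable means it can also be stored as evidence alongside the run that consumed it.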

7. Data and secret handling

During DR:

  • data recovery/promotion MUST happen before workload cutover
  • secrets sync MUST be a separate, explicit phase
  • secret authority transitions MUST be documented and evidence-backed

External secret sync SHOULD plug into the execution plane after the DR target has been selected and before application cutover. HashiCorp Vault is the preferred neutral authority; cloud-native secret stores remain valid adapters.
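
The ordering constraints in this section form a partial order, which a workflow can check before it runs. The sketch below uses hypothetical phase names. It encodes only the constraints stated above and deliberately leaves the relative order of data recovery and secret sync open, because this standard does not fix it.

```python
from typing import List, Tuple

# (earlier phase, later phase) pairs taken from this section; phase names are hypothetical.
MUST_PRECEDE: List[Tuple[str, str]] = [
    ("target-selected", "secret-sync"),
    ("secret-sync", "workload-cutover"),
    ("data-recovery", "workload-cutover"),
]


def ordering_violations(phases: List[str]) -> List[str]:
    """Return a message for each missing phase or broken ordering constraint."""
    position = {name: index for index, name in enumerate(phases)}
    problems: List[str] = []
    for earlier, later in MUST_PRECEDE:
        if earlier not in position or later not in position:
            problems.append(f"missing phase: {earlier} and {later} must both be present")
        elif position[earlier] >= position[later]:
            problems.append(f"{earlier} must run before {later}")
    return problems


valid_plan = ["target-selected", "data-recovery", "secret-sync", "workload-cutover"]
bad_plan = ["target-selected", "workload-cutover", "data-recovery", "secret-sync"]
```

Here valid_plan passes with no violations, while bad_plan is rejected because cutover runs before both data recovery and secret sync.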

For the target market:

  • default posture: private-only DR targets, runner-local execution
  • normal hybrid posture: BGP/IPsec or equivalent private interconnect
  • drill fallback: explicit bastion or temporary public access only when justified

This gives the cleanest story for:

  • SMEs
  • schools
  • enterprise customers who want a credible upgrade path later