Provision On-Prem Ops Runner (HyOps Blueprint)

  • Purpose: Provision and bootstrap a shared on-prem runner VM on the management network so failback and local platform workflows can run from inside the restored on-prem site.

  • Owner: Platform engineering / SRE

  • Trigger: On-prem control-plane bootstrap, failback pipeline preparation, or runner rebuild

  • Impact: Creates one shared on-prem runner VM and bootstraps the HybridOps runner toolchain on it for failback or local execution
  • Severity: P2

  • Pre-reqs: Proxmox and the on-prem management network are reachable; the current on-prem Linux runner template contract (currently ubuntu-22.04) is available or buildable; and operators have an actual, non-placeholder management IP and gateway for the runner VM.

  • Rollback strategy: Run platform/linux/ops-runner#onprem_ops_runner_bootstrap with runner_state: absent if needed, then destroy platform/onprem/platform-vm#onprem_ops_runner.

Context

Blueprint ref: networking/onprem-ops-runner@v1

Location (example file): ~/.hybridops/envs/<env>/config/blueprints/onprem-ops-runner.yml

Step flow:

  1. core/onprem/template-image#template_image_ubuntu_22_04 (template_key: ubuntu-22.04)
  2. platform/onprem/platform-vm#onprem_ops_runner
  3. platform/linux/ops-runner#onprem_ops_runner_bootstrap

This blueprint is the preferred on-prem execution bootstrap for:

  • runner-local failback workflows
  • steady-state on-prem platform maintenance
  • local control-plane work that should not run from an operator workstation

The runner bootstrap step is intentionally rerun on every deploy. A previously green bootstrap state is not sufficient evidence that the runner is correctly bootstrapped after a VM rebuild or an IP/gateway correction.

The runner path is intentionally pinned to Ubuntu LTS. That keeps the on-prem execution host aligned with the validated vyos-vm-images builder posture used by core/shared/vyos-image-build, instead of depending on a Rocky-specific Packer fallback.

The blueprint uses Proxmox-init bastion auto-detection for first-hop SSH readiness checks and bootstrap, with the Proxmox jump user set explicitly to root. Operators therefore do not need direct workstation reachability to the on-prem management subnet, as long as Proxmox itself remains reachable.
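
When troubleshooting, the same first-hop path can be probed by hand with OpenSSH's jump-host option. The sketch below is illustrative and is not what the blueprint runs internally: the function name, the runner login user, and the host values are all assumptions, and only the root jump user comes from this runbook.

```shell
# Hypothetical helper (not part of hyops): probe SSH to the runner VM
# through the Proxmox host used as a jump host.
# Arguments: <proxmox-host> <runner-user> <runner-ip>
check_runner_via_proxmox() {
  proxmox_host=$1
  runner_user=$2   # assumption: the runner's login user is site-specific
  runner_ip=$3

  # -J routes the connection through the jump host as root.
  # BatchMode makes ssh fail instead of prompting for a password;
  # ConnectTimeout bounds the wait if a hop is unreachable.
  ssh -o BatchMode=yes -o ConnectTimeout=10 \
    -J "root@${proxmox_host}" \
    "${runner_user}@${runner_ip}" true
}
```

A zero exit status from a call such as `check_runner_via_proxmox <proxmox-host> <runner-user> <runner-ip>` suggests the jump path works from the current machine; a failure does not by itself show which hop is at fault.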

Preconditions and safety checks

  1. Validate the shipped blueprint and create an env-scoped overlay:

    hyops blueprint init --env dev \
      --ref networking/onprem-ops-runner@v1 \
      --dest-name onprem-ops-runner.yml
    
  2. Edit the overlay and replace the placeholders:

       • CHANGE_ME_RUNNER_IP_CIDR
       • CHANGE_ME_RUNNER_GATEWAY

  3. Validate and preflight:

    hyops blueprint validate --ref networking/onprem-ops-runner@v1
    
    hyops blueprint preflight --env dev \
      --file "$HOME/.hybridops/envs/dev/config/blueprints/onprem-ops-runner.yml"
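
Preflight is more useful once the overlay no longer contains placeholder values. A minimal sketch of a check for leftover placeholders follows. The function name is hypothetical and not part of hyops, and the check only assumes that placeholders keep the CHANGE_ME_ prefix shown above.

```shell
# Hypothetical helper (not part of hyops): succeed only if the overlay
# file is readable and contains no CHANGE_ME_ placeholders.
overlay_has_no_placeholders() {
  overlay=$1

  if [ ! -r "$overlay" ]; then
    echo "cannot read overlay: $overlay" >&2
    return 2
  fi

  # grep -n prints each offending line with its line number.
  if grep -n 'CHANGE_ME_' "$overlay"; then
    echo "overlay still contains placeholders: $overlay" >&2
    return 1
  fi
  return 0
}
```

For example, `overlay_has_no_placeholders "$HOME/.hybridops/envs/dev/config/blueprints/onprem-ops-runner.yml"` can be run before the preflight command.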
    

Execute

From a source checkout during development:

    HYOPS_CORE_ROOT=/path/to/hybridops-core \
    hyops blueprint deploy --env dev \
      --file "$HOME/.hybridops/envs/dev/config/blueprints/onprem-ops-runner.yml" \
      --execute

From a packaged install:

    hyops blueprint deploy --env dev \
      --file "$HOME/.hybridops/envs/dev/config/blueprints/onprem-ops-runner.yml" \
      --execute
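
The validate, preflight, and deploy commands from this runbook can be chained so that a failed check stops the deploy. The following is a minimal sketch for a packaged install. It uses only the hyops invocations shown in this runbook; the wrapper function name is hypothetical.

```shell
# Hypothetical wrapper (not part of hyops): run validate, preflight and
# deploy in order, stopping at the first command that fails.
# Argument: <env> (for example: dev)
deploy_onprem_runner() {
  env_name=$1
  overlay="$HOME/.hybridops/envs/${env_name}/config/blueprints/onprem-ops-runner.yml"

  # Each && only continues when the previous command exited with status 0.
  hyops blueprint validate --ref networking/onprem-ops-runner@v1 &&
    hyops blueprint preflight --env "$env_name" --file "$overlay" &&
    hyops blueprint deploy --env "$env_name" --file "$overlay" --execute
}
```

Calling `deploy_onprem_runner dev` returns the exit status of the first failing step, or of the deploy itself when both checks pass.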

Runner dispatch preserves HYOPS_ENV for remote jobs, so env-scoped naming stays intact on shared metal. That means physical VM names on Proxmox continue to come out as environment-prefixed names such as dev-pgha-01, rather than collapsing to bare logical keys like pgha-01.
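
The effect of preserving HYOPS_ENV can be illustrated with a small sketch. This is not HybridOps code: the function is hypothetical and the prefix rule is an assumption inferred from the dev-pgha-01 example above.

```shell
# Illustration only (not HybridOps code): derive a physical VM name from
# a logical key, assuming the rule is "<env>-<logical key>".
physical_vm_name() {
  logical_key=$1
  # ${HYOPS_ENV:+...} expands to "<env>-" only when HYOPS_ENV is set and
  # non-empty; otherwise the name collapses to the bare logical key.
  printf '%s%s\n' "${HYOPS_ENV:+${HYOPS_ENV}-}" "$logical_key"
}
```

With HYOPS_ENV set to dev, `physical_vm_name pgha-01` prints dev-pgha-01. With HYOPS_ENV unset it prints pgha-01, which is the collapse that preserving the variable avoids.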

Verify

    hyops state show --env dev --module platform/onprem/platform-vm#onprem_ops_runner
    hyops state show --env dev --module platform/linux/ops-runner#onprem_ops_runner_bootstrap

Expected:

  • platform/onprem/platform-vm#onprem_ops_runner is status: ok
  • platform/linux/ops-runner#onprem_ops_runner_bootstrap is status: ok
  • one VM named like platform-shared-runner-01
  • HybridOps installed under /opt/hybridops/core
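
The two state checks can be scripted so that a non-ok module fails loudly. This is a minimal sketch that assumes the output of `hyops state show` contains a line of the form `status: ok`. That output format is an assumption, not confirmed by this runbook, and the function name is hypothetical.

```shell
# Hypothetical helper (not part of hyops): check that both runner modules
# report "status: ok". Argument: <env> (for example: dev)
verify_runner_state() {
  env_name=$1
  rc=0

  for module in \
    'platform/onprem/platform-vm#onprem_ops_runner' \
    'platform/linux/ops-runner#onprem_ops_runner_bootstrap'
  do
    # Assumption: the state output has a line that reads "status: ok".
    if hyops state show --env "$env_name" --module "$module" |
       grep -Eq '^[[:space:]]*status:[[:space:]]*ok[[:space:]]*$'; then
      echo "ok: $module"
    else
      echo "NOT ok: $module" >&2
      rc=1
    fi
  done

  return "$rc"
}
```

`verify_runner_state dev` returns non-zero if either module is missing the expected line. It does not check the VM name or the install path, which still need a manual look.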

How this fits failback

Use this runner as the execution host for failback workflows.

That keeps the failback posture consistent with cloud failover:

  • operator workstation is only the dispatcher
  • real execution happens on the target-side runner
  • failback does not depend on ad hoc SSH from a laptop

References