Operate RKE2 Cluster Module (HyOps)

Purpose: Install/validate/destroy RKE2 on provisioned VMs through a single module lifecycle.
Owner: Platform engineering
Trigger: Cluster build, rebuild, or cleanup for an environment
Impact: Changes Kubernetes control-plane/worker runtime on target VMs
Severity: P2
Pre-reqs: platform/onprem/platform-vm state is ok; vault decrypt works; SSH reachability exists (direct or bastion).
Rollback strategy: Run module destroy, then re-apply using known-good inputs.

Context

Module ref: platform/onprem/rke2-cluster

This module consumes VM inventory from module state and uses the Ansible driver to converge RKE2 roles.

Preconditions and safety checks

Path behavior:

  • Installed hyops (via install.sh) can be run from any working directory.
  • If you want to use the shipped example overlays, set:
export HYOPS_CORE_ROOT="${HYOPS_CORE_ROOT:-$HOME/.hybridops/core/app}"

For source checkout usage, set HYOPS_CORE_ROOT to your hybridops-core checkout root instead.
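Before pointing commands at the example overlays, it can help to confirm that the chosen root actually contains them. The following is a minimal sketch; `check_example_inputs` is a hypothetical helper (not a hyops command) and it assumes the example file lives at the path used later in this runbook.

```shell
# Hypothetical helper: verify that a core root ships the typical example inputs.
# Argument 1 (optional): root to check; defaults to HYOPS_CORE_ROOT, then the installed location.
check_example_inputs() {
  cei_root="${1:-${HYOPS_CORE_ROOT:-$HOME/.hybridops/core/app}}"
  cei_inputs="$cei_root/modules/platform/onprem/rke2-cluster/examples/inputs.typical.yml"
  if [ -f "$cei_inputs" ]; then
    printf 'found: %s\n' "$cei_inputs"
  else
    printf 'missing: %s\n' "$cei_inputs" >&2
    return 1
  fi
}
```

Calling `check_example_inputs` with no argument checks the current `HYOPS_CORE_ROOT`; passing a checkout path checks that tree instead.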

  1. Ensure required secret key exists:
hyops secrets ensure --env dev RKE2_TOKEN
  2. Ensure VM inventory module is ready:
cat "$HOME/.hybridops/envs/dev/state/modules/platform__onprem__platform-vm/latest.json"

Expect "status": "ok".

  3. Validate module inputs before apply:
hyops preflight --env dev --strict \
  --module platform/onprem/rke2-cluster \
  --inputs "$HYOPS_CORE_ROOT/modules/platform/onprem/rke2-cluster/examples/inputs.typical.yml"
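The precondition checks above can be chained so that a failure stops the sequence early. This is a sketch only: `state_is_ok` and `run_preconditions` are hypothetical helper names, the `hyops` invocations are copied from the checks above, and the status test is a plain text match on the state file rather than a real JSON parse.

```shell
# Hypothetical helper: true when the state file contains "status": "ok".
# Simple text match; it does not distinguish top-level from nested keys.
state_is_ok() {
  [ -f "$1" ] && grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"' "$1"
}

# Hypothetical helper: run the three precondition checks in order.
# Argument 1: environment name (for example dev). Argument 2: path to the inputs file.
run_preconditions() {
  pc_env="$1"
  pc_inputs="$2"
  pc_state="$HOME/.hybridops/envs/$pc_env/state/modules/platform__onprem__platform-vm/latest.json"

  hyops secrets ensure --env "$pc_env" RKE2_TOKEN || return 1

  if ! state_is_ok "$pc_state"; then
    printf 'platform-vm state is not ok: %s\n' "$pc_state" >&2
    return 1
  fi

  hyops preflight --env "$pc_env" --strict \
    --module platform/onprem/rke2-cluster \
    --inputs "$pc_inputs"
}
```

For example, `run_preconditions dev "$HYOPS_CORE_ROOT/modules/platform/onprem/rke2-cluster/examples/inputs.typical.yml"` returns non-zero as soon as any of the three checks fails.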

Steps

  1. Apply (typical)
hyops apply --env dev \
  --module platform/onprem/rke2-cluster \
  --inputs "$HYOPS_CORE_ROOT/modules/platform/onprem/rke2-cluster/examples/inputs.typical.yml"
  2. Verify module state and kubeconfig path
cat "$HOME/.hybridops/envs/dev/state/modules/platform__onprem__rke2-cluster/latest.json"

Expect:

  • "status": "ok"
  • outputs.cap.k8s.rke2 = "ready"
  • outputs.kubeconfig_path exists

  3. Verify cluster nodes
KUBECONFIG="$HOME/.hybridops/envs/dev/state/kubeconfigs/rke2.yaml" kubectl get nodes -o wide
  4. Destroy (cleanup)
hyops destroy --env dev \
  --module platform/onprem/rke2-cluster \
  --inputs "$HYOPS_CORE_ROOT/modules/platform/onprem/rke2-cluster/examples/inputs.typical.yml"
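The node verification in the steps above relies on reading the `kubectl get nodes` table by eye. As a rough sketch, the same check can be reduced to a list of nodes that are not ready. `not_ready_nodes` is a hypothetical helper; it assumes the standard `--no-headers` column layout in which the second column is STATUS.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin and
# print the name of every node whose STATUS column is not exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are therefore listed as well.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage (commented out so that sourcing this file has no side effects):
# KUBECONFIG="$HOME/.hybridops/envs/dev/state/kubeconfigs/rke2.yaml" \
#   kubectl get nodes --no-headers | not_ready_nodes
```

Empty output means every node reported `Ready`; any printed name is a node worth inspecting before moving on.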

Verification

Success indicators:

  • Apply exits 0 and writes module state to:
      • $HOME/.hybridops/envs/<env>/state/modules/platform__onprem__rke2-cluster/latest.json
  • Evidence path is printed during run under:
      • $HOME/.hybridops/envs/<env>/logs/module/platform__onprem__rke2-cluster/<run_id>/
  • Driver log file exists:
      • ansible.log
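The file-based success indicators can be checked mechanically once the run ID is known. The sketch below uses a hypothetical helper, `check_run_evidence`, and assumes that `ansible.log` sits directly inside the run's evidence directory; adjust the path if your layout differs.

```shell
# Hypothetical helper: confirm the module state file and the driver log exist.
# Argument 1: environment name. Argument 2: run ID as printed during the run.
# Assumption: ansible.log is located directly under the <run_id> evidence directory.
check_run_evidence() {
  ev_base="$HOME/.hybridops/envs/$1"
  ev_state="$ev_base/state/modules/platform__onprem__rke2-cluster/latest.json"
  ev_log="$ev_base/logs/module/platform__onprem__rke2-cluster/$2/ansible.log"
  ev_rc=0
  for ev_file in "$ev_state" "$ev_log"; do
    if [ -f "$ev_file" ]; then
      printf 'ok       %s\n' "$ev_file"
    else
      printf 'missing  %s\n' "$ev_file"
      ev_rc=1
    fi
  done
  return "$ev_rc"
}
```

The function reports every file rather than stopping at the first gap, so a single call shows which indicator is missing.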

Troubleshooting

  • Connectivity failures (cannot reach ...:22):
      • Ensure the workstation has L3 reachability to the target VMs, or configure bastion settings (ssh_proxy_jump_*).
  • Inventory state not ready (status=destroyed or missing):
      • Re-apply platform/onprem/platform-vm first.
  • Long-running phases:
      • Follow the printed progress: logs=... path.
      • Optional: export HYOPS_PROGRESS_INTERVAL_S=30
  • First converge is usually slower after a destroy/rebuild because each node pulls the RKE2 runtime and control-plane images and waits for CNI readiness:
      • This is normal when nodes have cold image caches.
      • If convergence is consistently too slow, increase VM sizing (recommended baseline: control-plane 4 vCPU / 8 GiB, worker 2 vCPU / 4 GiB) and verify registry egress and DNS latency.
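For the connectivity case, a quick direct probe of the SSH port can separate a network problem from a module problem. This is a sketch under assumptions: `ssh_port_open` is a hypothetical helper, it requires bash (for the `/dev/tcp` pseudo-device) and the coreutils `timeout` command, and it tests direct reachability only, not the bastion path.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens within the timeout.
# Argument 1: host. Argument 2: port (default 22). Argument 3: timeout in seconds (default 5).
# Requires bash and coreutils `timeout`; does not exercise ssh_proxy_jump_* settings.
ssh_port_open() {
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "${2:-22}" 2>/dev/null
}
```

A non-zero result from `ssh_port_open <vm-address>` on the workstation suggests checking routing or firewalling, or falling back to the bastion settings, before re-running the module.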

References