
Deploy RKE2 + Workloads (HyOps Blueprint)

  • Purpose: Deploy a complete on-prem RKE2 cluster with GitOps-ready Argo CD and External Secrets Operator bootstrap through a single governed blueprint run. Owner: Platform engineering

  • Trigger: New environment bring-up, full rebuild, or controlled E2E validation

  • Impact: Provisions VM infrastructure, installs RKE2, bootstraps Argo CD with the workloads root Application, and provisions the on-prem GCP Secret Manager service account secret required for ESO to begin reconciling platform secrets
  • Severity: P1

  • Pre-reqs: Proxmox init complete for target env, vault decrypt works, NetBox authority is ready (authoritative IPAM), SSH key access exists, RKE2_TOKEN in vault, HYOPS_GSM_SA_KEY_JSON in vault.

  • Rollback strategy: Destroy modules in reverse order or run a controlled rebuild from the same blueprint inputs.

Context

Blueprint ref: onprem/rke2-workloads@v1
Location: hybridops-core/blueprints/onprem/rke2-workloads@v1/blueprint.yml

Public workload baseline consumed after bootstrap:

  • clusters/onprem
  • clusters/burst
  • platform/external-secrets
  • platform/secret-stores

The public workload repo now treats the ESO and cluster secret-store path as part of the baseline platform surface, not as an internal-only extension. On-prem and burst do not use the same secret bootstrap method:

  • onprem/rke2-workloads@v1 uses platform/k8s/gsm-bootstrap
  • gcp/gke-burst@v1 uses platform/k8s/gcp-secret-store

Step flow:

#  Step ID                Module                          Phase
1  template_image_rocky9  core/onprem/template-image      bootstrap
2  rke2_vms               platform/onprem/platform-vm     bootstrap
3  rke2_cluster           platform/onprem/rke2-cluster    operations
4  gitops_workloads       platform/k8s/argocd-bootstrap   operations
5  gsm_bootstrap          platform/k8s/gsm-bootstrap      operations

Steps 4 and 5 run on the controller (localhost) against the cluster kubeconfig; no remote SSH is required from that point forward. After step 5 completes, ArgoCD owns the cluster state and GitOps takeover is complete.

Preconditions and safety checks

Path behavior

  • When hyops is installed via install.sh, it can be run from any working directory.
  • When running from a source checkout, export HYOPS_CORE_ROOT=/path/to/hybridops-core.
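For the source-checkout case, a minimal sketch (the checkout path is an example, not a required location):

```shell
# Point hyops at the source checkout before invoking any blueprint command.
export HYOPS_CORE_ROOT="$HOME/src/hybridops-core"
echo "$HYOPS_CORE_ROOT"
```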

1. Ensure NetBox authority is ready

This blueprint allocates VM IPs from NetBox (no hardcoded per-VM IPs) and consumes the shared SDN authority state.

Network placement defaults:

  • RKE2 control-plane VMs: vnetmgmt (operator/API reachability)
  • RKE2 worker VMs: vnetmgmt

If NetBox authority is not ready yet, run:

hyops blueprint deploy --env shared --ref onprem/bootstrap-netbox@v1 --execute

2. Ensure required secrets exist in the runtime vault

# RKE2 cluster join token (auto-generated if missing)
hyops secrets ensure --env dev RKE2_TOKEN

# GCP Service Account key JSON for ESO → GCP Secret Manager
# This must be set manually — it cannot be generated automatically.
hyops secrets set --env dev HYOPS_GSM_SA_KEY_JSON "$(cat /path/to/gcp-sa-key.json)"

HYOPS_GSM_SA_KEY_JSON must be the JSON key for a GCP Service Account with roles/secretmanager.secretAccessor on the target GCP project. This is the trust anchor for External Secrets Operator and is the only secret that cannot be sourced from GCP Secret Manager itself.
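If the Service Account does not exist yet, one way to mint such a key is sketched below. This is an assumption-laden example, not part of the blueprint: the project ID and SA name are placeholders, and you need IAM admin rights on the project. The gcloud calls are guarded so the snippet is safe to paste on machines without gcloud installed.

```shell
# Placeholders -- replace with your real project and a naming convention you own.
PROJECT_ID="my-gcp-project"
SA_NAME="hyops-eso"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

if command -v gcloud >/dev/null 2>&1; then
  # Create the SA, grant read access to Secret Manager, and export a JSON key.
  gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID"
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member "serviceAccount:${SA_EMAIL}" \
    --role roles/secretmanager.secretAccessor
  gcloud iam service-accounts keys create /tmp/gcp-sa-key.json \
    --iam-account "$SA_EMAIL"
fi
echo "$SA_EMAIL"
```

The resulting /tmp/gcp-sa-key.json is what you feed to hyops secrets set as HYOPS_GSM_SA_KEY_JSON; delete the local copy afterwards.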

Confirm both keys are present in the vault:

hyops secrets list --env dev | grep -E 'RKE2_TOKEN|HYOPS_GSM_SA_KEY_JSON'

3. Set the GCP project ID in the on-prem cluster overlay

The on-prem ClusterSecretStore manifest uses a project ID placeholder that must be patched for each on-prem cluster target. Edit:

hybridops-workloads-src/hybridops-workloads/apps/platform/secret-stores/manifests/overlays/onprem/cluster-secret-store-patch.yaml

Replace REPLACE_GCP_PROJECT_ID with the actual GCP project ID and commit before running the blueprint.
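The substitution can be scripted. The sketch below demonstrates it on a scratch copy so it is safe to run anywhere; the manifest fragment shown is illustrative, and the real edit targets the overlay path above followed by a commit:

```shell
# Demonstrate the placeholder substitution on a scratch file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
spec:
  provider:
    gcpsm:
      projectID: REPLACE_GCP_PROJECT_ID
EOF
sed -i 's/REPLACE_GCP_PROJECT_ID/my-gcp-project/' "$tmp"
grep projectID "$tmp"
```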

4. Validate and plan

hyops blueprint validate --ref onprem/rke2-workloads@v1
hyops blueprint plan --ref onprem/rke2-workloads@v1

5. Run preflight

hyops blueprint preflight --env dev --ref onprem/rke2-workloads@v1

Steps

1. Execute full blueprint

hyops blueprint deploy --env dev \
  --ref onprem/rke2-workloads@v1 \
  --execute

If HyOps detects existing step state (rerun/replacement risk), it may prompt for confirmation. Use --yes for non-interactive runs.

2. Observe progress

  • HyOps prints a preflight summary before step execution begins.
  • Each module step prints progress: logs=... with the active log file path.
  • Long-running phases print a one-time log-watch hint and heartbeat status lines.

Optional live terminal streaming:

hyops --verbose blueprint deploy --env dev \
  --ref onprem/rke2-workloads@v1 \
  --execute

3. Verify cluster state after RKE2 (step 3)

KUBECONFIG="$HOME/.hybridops/envs/dev/state/kubeconfigs/rke2.yaml" \
  kubectl get nodes -o wide

4. Verify ArgoCD after gitops_workloads (step 4)

KUBECONFIG="$HOME/.hybridops/envs/dev/state/kubeconfigs/rke2.yaml" \
  kubectl -n argocd get applications

The root application hyops-workloads-root should be present and syncing.

For the current on-prem public baseline, confirm the secret-management applications are present as well:

KUBECONFIG="$HOME/.hybridops/envs/dev/state/kubeconfigs/rke2.yaml" \
  kubectl -n argocd get applications | grep -E 'external-secrets|secret-stores'

5. Verify GSM secret after gsm_bootstrap (step 5)

KUBECONFIG="$HOME/.hybridops/envs/dev/state/kubeconfigs/rke2.yaml" \
  kubectl -n external-secrets get secret gsm-sa-credentials

Expected output: the secret exists with a credentials.json key.
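To inspect the key itself, pull it from the live cluster with kubectl -n external-secrets get secret gsm-sa-credentials -o jsonpath='{.data.credentials\.json}' and pipe it through base64 -d. The decode path is illustrated below with a stand-in value (not a real key), so it runs without cluster access:

```shell
# Simulate the stored secret value: Kubernetes base64-encodes data fields.
encoded=$(printf '%s' '{"type":"service_account"}' | base64)

# Decoding must yield valid JSON with service-account fields.
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"
```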

Verification

Success indicators:

  • Blueprint summary ends with status=ok.
  • All five step modules report status: ok.
  • platform/k8s/argocd-bootstrap output: cap.gitops.argocd = ready.
  • platform/k8s/gsm-bootstrap output: cap.k8s.gsm-bootstrap = ready.
  • ArgoCD root application hyops-workloads-root is present in the argocd namespace and reconciling.
  • platform-external-secrets and platform-secret-stores applications are present in ArgoCD for the on-prem target.
  • gsm-sa-credentials secret is present in the external-secrets namespace.
  • After ESO sync wave completes, ExternalSecret resources (for example platform-keycloak-secrets in the keycloak namespace) report Ready = True.

Run record paths:

  • Blueprint step records and module logs: $HOME/.hybridops/envs/<env>/logs/module/...
  • Cluster kubeconfig: $HOME/.hybridops/envs/<env>/state/kubeconfigs/rke2.yaml
  • Module state files: $HOME/.hybridops/envs/<env>/state/modules/platform__k8s__*/latest.json
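A quick way to pull the last recorded status out of a module state file is sketched below. The field name ("status") and the concrete module directory are assumptions about the state-file schema, not verified against the HyOps source; the snippet falls back to a sample record when no real run exists on the machine:

```shell
# gsm-bootstrap state path follows the platform__k8s__* pattern above (assumed).
state_file="$HOME/.hybridops/envs/dev/state/modules/platform__k8s__gsm-bootstrap/latest.json"

extract_status() {
  # Pull the value of a top-level "status" field out of a JSON document.
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$@"
}

if [ -f "$state_file" ]; then
  status=$(extract_status "$state_file")
else
  # Fallback sample record so the extraction is visible without a real run.
  status=$(printf '%s' '{"status":"ok"}' | extract_status)
fi
echo "status=$status"
```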

Troubleshooting

Preflight fails before execution: Fix reported failures and rerun. Use --skip-preflight only for deliberate break-glass runs.

Step 3 (rke2_cluster) SSH timeout: Ensure management network routing is in place or configure bastion/proxy jump.

Step 4 (gitops_workloads) ArgoCD install hangs: Check the argocd-wait-timeout-s input (default 300 s). On first install over a slow registry connection, increase to 420 s via blueprint inputs.

Step 5 (gsm_bootstrap) fails with the assertion HYOPS_GSM_SA_KEY_JSON is empty: the key is not present in the bootstrap vault. Run hyops secrets set --env dev HYOPS_GSM_SA_KEY_JSON "$(cat ...)" and re-run the step.

ESO ClusterSecretStore not becoming Ready after GitOps takeover:

  • Confirm the gsm-sa-credentials secret exists in the external-secrets namespace.
  • Confirm the GCP project ID patch is applied correctly in the onprem cluster overlay and has been committed and pushed.
  • Check ArgoCD sync status for the platform-secret-stores application.
  • Confirm the GCP SA has roles/secretmanager.secretAccessor on the target project.

If you are working on the GKE burst target instead, follow GKE burst baseline. The burst path uses Workload Identity and does not rely on gsm-sa-credentials.

ExternalSecret remains NotReady:

  • Confirm the secret exists in GCP Secret Manager with the exact name expected by the allowlist (hyops-{env}-platform-{KEY}).
  • Run hyops secrets gsm-persist --env dev --scope shared to seed any missing secrets.
  • Check ESO operator logs: kubectl -n external-secrets logs -l app.kubernetes.io/name=external-secrets.

NetBox API reachability errors in IPAM mode:

  • Ensure NetBox is status=ok in the NetBox authority env (default: shared).
  • If your workstation is not routed to the management subnet, HyOps may auto-tunnel the NetBox API via the Proxmox host (requires hyops init proxmox --bootstrap and working SSH access).

Re-running individual steps

To re-run a single step after a failure (for example after fixing vault secrets):

hyops blueprint deploy --env dev \
  --ref onprem/rke2-workloads@v1 \
  --step gsm_bootstrap \
  --execute

References