Deploy RKE2 + Workloads (HyOps Blueprint)
- Purpose: Deploy a complete on-prem RKE2 cluster with GitOps-ready Argo CD and External Secrets Operator bootstrap through a single governed blueprint run.
- Owner: Platform engineering
- Trigger: New environment bring-up, full rebuild, or controlled E2E validation
- Impact: Provisions VM infrastructure, installs RKE2, bootstraps Argo CD with the workloads root Application, and provisions the on-prem GCP Secret Manager service account secret required for ESO to begin reconciling platform secrets
- Severity: P1
- Pre-reqs: Proxmox init complete for target env, vault decrypt works, NetBox authority is ready (authoritative IPAM), SSH key access exists, RKE2_TOKEN in vault, HYOPS_GSM_SA_KEY_JSON in vault.
- Rollback strategy: Destroy modules in reverse order or run a controlled rebuild from the same blueprint inputs.
Context
Blueprint ref: onprem/rke2-workloads@v1
Location: hybridops-core/blueprints/onprem/rke2-workloads@v1/blueprint.yml
Public workload baseline consumed after bootstrap:
- clusters/onprem
- clusters/burst
- platform/external-secrets
- platform/secret-stores
The public workload repo now treats the ESO and cluster secret-store path as part of the baseline platform surface, not as an internal-only extension. On-prem and burst do not use the same secret bootstrap method:
- onprem/rke2-workloads@v1 uses platform/k8s/gsm-bootstrap
- gcp/gke-burst@v1 uses platform/k8s/gcp-secret-store
Step flow:
| # | Step ID | Module | Phase |
|---|---|---|---|
| 1 | template_image_rocky9 | core/onprem/template-image | bootstrap |
| 2 | rke2_vms | platform/onprem/platform-vm | bootstrap |
| 3 | rke2_cluster | platform/onprem/rke2-cluster | operations |
| 4 | gitops_workloads | platform/k8s/argocd-bootstrap | operations |
| 5 | gsm_bootstrap | platform/k8s/gsm-bootstrap | operations |
Steps 4 and 5 run on the controller (localhost) against the cluster kubeconfig, so no remote SSH is required from that point forward. After step 5 completes, Argo CD owns the cluster state and GitOps takeover is complete.
Preconditions and safety checks
Path behavior
- Installed hyops (via install.sh) can be run from any working directory.
- Source-checkout usage should export HYOPS_CORE_ROOT=/path/to/hybridops-core.
1. Ensure NetBox authority is ready
This blueprint allocates VM IPs from NetBox (no hardcoded per-VM IPs) and consumes the shared SDN authority state.
Network placement defaults:
- RKE2 control-plane VMs: vnetmgmt (operator/API reachability)
- RKE2 worker VMs: vnetmgmt
If NetBox authority is not ready yet, run:
hyops blueprint deploy --env shared --ref onprem/bootstrap-netbox@v1 --execute
2. Ensure required secrets exist in the runtime vault
# RKE2 cluster join token (auto-generated if missing)
hyops secrets ensure --env dev RKE2_TOKEN
# GCP Service Account key JSON for ESO → GCP Secret Manager
# This must be set manually — it cannot be generated automatically.
hyops secrets set --env dev HYOPS_GSM_SA_KEY_JSON "$(cat /path/to/gcp-sa-key.json)"
HYOPS_GSM_SA_KEY_JSON must be the JSON key for a GCP Service Account with roles/secretmanager.secretAccessor on the target GCP project. This is the trust anchor for External Secrets Operator and is the only secret that cannot be sourced from GCP Secret Manager itself.
Confirm both keys are present in the vault:
hyops secrets list --env dev | grep -E 'RKE2_TOKEN|HYOPS_GSM_SA_KEY_JSON'
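Before storing the key, it can help to confirm that the file has the shape of a service account key rather than some other credential type. The sketch below is not part of HyOps: the function name looks_like_sa_key is hypothetical, and the check only greps for fields that GCP service account key files contain (type, client_email, private_key). It does not prove the key is valid or that the role binding exists.

```shell
# Hypothetical helper (not a HyOps command): cheap shape check for a
# GCP service account key file before it is stored in the vault.
looks_like_sa_key() {
  local file="$1"
  [ -r "$file" ] || { echo "cannot read: $file" >&2; return 1; }
  # Service account keys carry "type": "service_account" plus these fields.
  grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$file" || return 1
  grep -q '"client_email"' "$file" || return 1
  grep -q '"private_key"' "$file" || return 1
}
```

A possible use is to gate the set command on it: looks_like_sa_key /path/to/gcp-sa-key.json && hyops secrets set --env dev HYOPS_GSM_SA_KEY_JSON "$(cat /path/to/gcp-sa-key.json)".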
3. Set the GCP project ID in the on-prem cluster overlay
The on-prem ClusterSecretStore manifest uses a project ID placeholder that must be patched for each on-prem cluster target. Edit:
hybridops-workloads-src/hybridops-workloads/apps/platform/secret-stores/manifests/overlays/onprem/cluster-secret-store-patch.yaml
Replace REPLACE_GCP_PROJECT_ID with the actual GCP project ID and commit before running the blueprint.
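The replacement can be scripted so that a leftover placeholder is caught before the commit. This is a minimal sketch under stated assumptions: patch_project_id is a hypothetical helper, and the format check reflects the general GCP project ID rules (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen).

```shell
# Hypothetical helper: substitute the placeholder in a manifest and
# fail if the project ID is malformed or any placeholder remains.
patch_project_id() {
  local file="$1" project="$2" tmp
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  printf '%s\n' "$project" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$' || {
    echo "invalid project ID: $project" >&2
    return 1
  }
  tmp="$(mktemp)" || return 1
  if sed "s/REPLACE_GCP_PROJECT_ID/${project}/g" "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place so the file mode is preserved
  fi
  rm -f "$tmp"
  # Succeeds only when no placeholder is left behind.
  ! grep -q 'REPLACE_GCP_PROJECT_ID' "$file"
}
```

Reviewing the result with git diff before committing keeps the change visible in the workloads repo history.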
4. Validate and plan
hyops blueprint validate --ref onprem/rke2-workloads@v1
hyops blueprint plan --ref onprem/rke2-workloads@v1
5. Run preflight
hyops blueprint preflight --env dev --ref onprem/rke2-workloads@v1
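The validate, plan, and preflight commands can be chained so that a failure in one stops the sequence. The sketch below assumes that hyops exits non-zero when a check fails, which this runbook does not state explicitly; pre_deploy_checks is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: run validate, plan, and preflight in order and
# stop at the first failure. Assumes hyops exits non-zero on failure.
pre_deploy_checks() {
  local env="$1" ref="$2"
  hyops blueprint validate --ref "$ref" &&
    hyops blueprint plan --ref "$ref" &&
    hyops blueprint preflight --env "$env" --ref "$ref"
}
```

For this runbook the call would be pre_deploy_checks dev onprem/rke2-workloads@v1.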
Steps
1. Execute full blueprint
hyops blueprint deploy --env dev \
--ref onprem/rke2-workloads@v1 \
--execute
If HyOps detects existing step state (rerun/replacement risk), it may prompt for confirmation. Use --yes for non-interactive runs.
2. Observe progress
- HyOps prints a preflight summary before step execution begins.
- Each module step prints progress: logs=... with the active log file path.
- Long-running phases print a one-time log-watch hint and heartbeat status lines.
Optional live terminal streaming:
hyops --verbose blueprint deploy --env dev \
--ref onprem/rke2-workloads@v1 \
--execute
3. Verify cluster state after RKE2 (step 3)
KUBECONFIG="$HOME/.hybridops/envs/dev/state/kubeconfigs/rke2.yaml" \
kubectl get nodes -o wide
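For scripted runs, kubectl wait can block until every node reports Ready instead of relying on a visual check. This is a sketch, not HyOps behavior: wait_for_nodes is a hypothetical name, the 300s default timeout is an arbitrary assumption, and the dev kubeconfig path is only used when KUBECONFIG is not already set.

```shell
# Hypothetical helper: block until all nodes are Ready, or time out.
# The 300s default is an assumption; tune it for the environment.
wait_for_nodes() {
  local timeout="${1:-300s}"
  KUBECONFIG="${KUBECONFIG:-$HOME/.hybridops/envs/dev/state/kubeconfigs/rke2.yaml}" \
    kubectl wait --for=condition=Ready nodes --all --timeout="$timeout"
}
```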
4. Verify ArgoCD after gitops_workloads (step 4)
KUBECONFIG="$HOME/.hybridops/envs/dev/state/kubeconfigs/rke2.yaml" \
kubectl -n argocd get applications
The root application hyops-workloads-root should be present and syncing.
For the current on-prem public baseline, confirm the secret-management applications are present as well:
KUBECONFIG="$HOME/.hybridops/envs/dev/state/kubeconfigs/rke2.yaml" \
kubectl -n argocd get applications | grep -E 'external-secrets|secret-stores'
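The Application list shows presence, and the sync and health fields can also be read directly. The sketch below assumes KUBECONFIG is already exported as in the commands above; the helper names are hypothetical, while .status.sync.status and .status.health.status are the standard Argo CD Application status fields.

```shell
# Hypothetical helper: print "<sync>/<health>" for an Argo CD Application.
argocd_app_status() {
  local app="$1"
  kubectl -n argocd get applications "$app" \
    -o jsonpath='{.status.sync.status}/{.status.health.status}'
}

# Succeeds only when the Application is both Synced and Healthy.
argocd_app_ok() {
  [ "$(argocd_app_status "$1")" = "Synced/Healthy" ]
}
```

Shortly after bootstrap the root application may legitimately report a progressing state, so a failing argocd_app_ok hyops-workloads-root is a prompt to look again rather than proof of a fault.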
5. Verify GSM secret after gsm_bootstrap (step 5)
KUBECONFIG="$HOME/.hybridops/envs/dev/state/kubeconfigs/rke2.yaml" \
kubectl -n external-secrets get secret gsm-sa-credentials
Expected output: the secret exists with a credentials.json key.
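The presence of the credentials.json key can be checked without printing its contents. A minimal sketch, assuming KUBECONFIG is exported; secret_has_key is a hypothetical helper. The dot in the key name has to be escaped for kubectl's jsonpath, which the helper does with sed.

```shell
# Hypothetical helper: succeed if a Secret has a non-empty data key.
secret_has_key() {
  local ns="$1" name="$2" key="$3" escaped value
  # jsonpath needs dots inside a key name escaped: credentials\.json
  escaped="$(printf '%s' "$key" | sed 's/\./\\./g')"
  value="$(kubectl -n "$ns" get secret "$name" \
    -o "jsonpath={.data.${escaped}}")" || return 1
  # The value is base64 data; only its presence is tested, never printed.
  [ -n "$value" ]
}
```

For this step the call would be secret_has_key external-secrets gsm-sa-credentials credentials.json.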
Verification
Success indicators:
- Blueprint summary ends with status=ok.
- All five step modules report status: ok.
- platform/k8s/argocd-bootstrap output: cap.gitops.argocd = ready.
- platform/k8s/gsm-bootstrap output: cap.k8s.gsm-bootstrap = ready.
- ArgoCD root application hyops-workloads-root is present in the argocd namespace and reconciling.
- platform-external-secrets and platform-secret-stores applications are present in ArgoCD for the on-prem target.
- gsm-sa-credentials secret is present in the external-secrets namespace.
- After ESO sync wave completes, ExternalSecret resources (for example platform-keycloak-secrets in the keycloak namespace) report Ready = True.
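The last indicator can be checked across all namespaces in one pass. The sketch below lists every ExternalSecret whose Ready condition is anything other than True; it assumes KUBECONFIG is exported and that the ESO CRDs are installed, and the function name is hypothetical.

```shell
# Hypothetical helper: print "<namespace>/<name>" for each ExternalSecret
# whose Ready condition is missing or not "True".
not_ready_external_secrets() {
  kubectl get externalsecrets --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' |
    awk 'NF && $2 != "True" { print $1 }'
}
```

Empty output means every ExternalSecret found reports Ready = True; it does not confirm that the expected resources exist in the first place.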
Run record paths:
- Blueprint step records and module logs: $HOME/.hybridops/envs/<env>/logs/module/...
- Cluster kubeconfig: $HOME/.hybridops/envs/<env>/state/kubeconfigs/rke2.yaml
- Module state files: $HOME/.hybridops/envs/<env>/state/modules/platform__k8s__*/latest.json
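When the active log path has scrolled out of the terminal, the most recently modified file under the module log directory is usually the one to follow. This is a sketch only: newest_module_log is a hypothetical helper, and it makes no assumption about the directory layout below logs/module beyond searching it recursively.

```shell
# Hypothetical helper: print the most recently modified file under the
# module log directory for an environment.
newest_module_log() {
  local env="$1" dir newest="" f
  dir="$HOME/.hybridops/envs/$env/logs/module"
  [ -d "$dir" ] || return 1
  while IFS= read -r f; do
    [ -n "$f" ] || continue
    # -nt is true when the left file is newer than the right one.
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then newest="$f"; fi
  done <<EOF
$(find "$dir" -type f)
EOF
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

A typical use is tail -f "$(newest_module_log dev)".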
Troubleshooting
Preflight fails before execution:
Fix reported failures and rerun. Use --skip-preflight only for deliberate break-glass runs.
Step 3 (rke2_cluster) SSH timeout:
Ensure management network routing is in place, or configure a bastion/proxy jump.
Step 4 (gitops_workloads) ArgoCD install hangs:
Check the argocd-wait-timeout-s input (default 300 s). On first install over a slow registry connection, increase to 420 s via blueprint inputs.
Step 5 (gsm_bootstrap) assertion fails: HYOPS_GSM_SA_KEY_JSON is empty:
The key is not present in the bootstrap vault. Run hyops secrets set --env dev HYOPS_GSM_SA_KEY_JSON "$(cat ...)" and re-run the step.
ESO ClusterSecretStore not becoming Ready after GitOps takeover:
- Confirm gsm-sa-credentials secret exists in the external-secrets namespace.
- Confirm the GCP project ID patch is applied correctly in the onprem cluster overlay and has been committed and pushed.
- Check ArgoCD sync status for the platform-secret-stores application.
- Confirm the GCP SA has roles/secretmanager.secretAccessor on the target project.
If you are working on the GKE burst target instead, follow GKE burst baseline. The burst path uses Workload Identity and does not rely on gsm-sa-credentials.
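For the on-prem IAM check in the list above, the project policy can be queried with gcloud. This is a sketch under assumptions: sa_has_secret_accessor is a hypothetical helper, the service account email must be supplied by the operator (it is the client_email field of the key file), and the query only sees bindings made directly on the project, not ones inherited from a folder or organization or granted on individual secrets.

```shell
# Hypothetical helper: succeed if the service account has the
# secretAccessor role bound directly on the project.
sa_has_secret_accessor() {
  local project="$1" sa_email="$2" out
  local role="roles/secretmanager.secretAccessor"
  out="$(gcloud projects get-iam-policy "$project" \
    --flatten="bindings[].members" \
    --filter="bindings.role:${role} AND bindings.members:serviceAccount:${sa_email}" \
    --format="value(bindings.role)")" || return 1
  [ -n "$out" ]
}
```

The caller needs permission to read the project IAM policy; a failure here can therefore mean either a missing binding or insufficient access for the person running the check.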
ExternalSecret remains NotReady:
- Confirm the secret exists in GCP Secret Manager with the exact name expected by the allowlist (hyops-{env}-platform-{KEY}).
- Run hyops secrets gsm-persist --env dev --scope shared to seed any missing secrets.
- Check ESO operator logs: kubectl -n external-secrets logs -l app.kubernetes.io/name=external-secrets.
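The naming check in the first bullet can be made mechanical. The sketch below builds the expected name from the hyops-{env}-platform-{KEY} pattern and asks Secret Manager whether it exists. Both function names are hypothetical, EXAMPLE_KEY in the usage line is a made-up key, and the helper substitutes the key verbatim because this runbook does not say whether HyOps transforms it.

```shell
# Hypothetical helper: build the expected Secret Manager name from the
# documented pattern hyops-{env}-platform-{KEY} (key used verbatim).
gsm_secret_name() {
  local env="$1" key="$2"
  printf 'hyops-%s-platform-%s\n' "$env" "$key"
}

# Hypothetical helper: succeed if that secret exists in the project.
gsm_secret_exists() {
  local project="$1" env="$2" key="$3"
  gcloud secrets describe "$(gsm_secret_name "$env" "$key")" \
    --project "$project" >/dev/null 2>&1
}
```

For example, gsm_secret_name dev EXAMPLE_KEY prints hyops-dev-platform-EXAMPLE_KEY.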
NetBox API reachability errors in IPAM mode:
- Ensure NetBox is status=ok in the NetBox authority env (default: shared).
- If your workstation is not routed to the management subnet, HyOps may auto-tunnel NetBox API via the Proxmox host (requires hyops init proxmox --bootstrap and working SSH access).
Re-running individual steps
To re-run a single step after a failure (for example after fixing vault secrets):
hyops blueprint deploy --env dev \
--ref onprem/rke2-workloads@v1 \
--step gsm_bootstrap \
--execute
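A small wrapper can guard against typos in the step ID before anything is executed. This is a sketch: rerun_step and is_known_step are hypothetical helpers, and the step list is copied from the step flow table in this runbook, so it would need updating if the blueprint changes.

```shell
# Step IDs as listed in the step flow table of this runbook.
BLUEPRINT_STEPS="template_image_rocky9 rke2_vms rke2_cluster gitops_workloads gsm_bootstrap"

# Hypothetical helper: succeed if the step ID is one of the known steps.
is_known_step() {
  local step="$1" s
  for s in $BLUEPRINT_STEPS; do
    [ "$s" = "$step" ] && return 0
  done
  return 1
}

# Hypothetical wrapper: re-run a single step after validating its ID.
rerun_step() {
  local env="$1" step="$2"
  is_known_step "$step" || { echo "unknown step: $step" >&2; return 2; }
  hyops blueprint deploy --env "$env" \
    --ref onprem/rke2-workloads@v1 \
    --step "$step" \
    --execute
}
```

With this in place, rerun_step dev gsm_bootstrap runs the same command as above, while a misspelled step ID fails before hyops is invoked.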