
Bootstrap an RKE2 Cluster from Proxmox Templates

This HOWTO shows how to use Packer-built Proxmox templates, Terraform, and Ansible to bootstrap the main RKE2 cluster for HybridOps.Studio, and where to capture proof artefacts for Evidence 4.

It assumes that the infrastructure, code layout, and credentials described in the Prerequisites section below are already in place.

1. Objectives

By the end of this HOWTO you will be able to:

  • Use Terraform to create RKE2 control-plane and worker VMs from Proxmox templates.
  • Use Ansible to install and configure RKE2 on those nodes.
  • Verify that the cluster is healthy with kubectl.
  • Store logs and artefacts under output/artifacts/infra/rke2/ so another engineer or assessor can verify the process.

2. Prerequisites

2.1 Infrastructure and access

You should have:

  • A Proxmox cluster reachable from your control node and/or Jenkins agent.
  • Packer-built templates available in Proxmox, for example:
      • tpl-ubuntu-22.04 for RKE2 nodes.
  • Network configuration aligned with ADR-0015 (for example, management and workload networks).
  • SSH access to the nodes via cloud-init or injected keys.

2.2 Code layout

The exact paths may vary, but this HOWTO assumes:

  • infra/terraform/live-v1/rke2-cluster/ — Terraform configuration for the RKE2 VMs.
  • core/ansible/rke2/ — Ansible playbook and inventory for RKE2 installation.
  • output/artifacts/infra/rke2/ — proof artefacts captured during bootstrap.

Adjust paths to match your actual repository layout.

2.3 Credentials and environment

You will need:

  • Proxmox API credentials (token ID and secret) available as environment variables or Terraform variables.
  • SSH keys or credentials for Ansible to connect to the new nodes.
  • kubectl available on the control node, configured to talk to the RKE2 cluster once bootstrapped.

3. Prepare Terraform variables

  1. Navigate to the RKE2 Terraform directory, for example:
cd infra/terraform/live-v1/rke2-cluster/
  2. Copy the example variables file if necessary:
cp terraform.tfvars.example terraform.tfvars
  3. Edit terraform.tfvars to reference the correct Proxmox templates and node sizes, for example:
proxmox_endpoint     = "https://<PROXMOX_IP>:8006/api2/json"
proxmox_token_id     = "automation@pam!infra-token"
proxmox_token_secret = "<SECRET>"

rke2_template_name = "tpl-ubuntu-22.04"

rke2_controlplane_count = 3
rke2_worker_count       = 2

rke2_network_name = "vmbr0"
  4. Save the file and keep it out of version control if it contains secrets.
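One way to keep the token out of version control while still making it available to Terraform (a sketch; assumes a git repository and relies on Terraform's standard TF_VAR_<name> environment-variable convention):

```shell
# Ignore the tfvars file that holds secrets (idempotent append).
grep -qx 'terraform.tfvars' .gitignore 2>/dev/null || echo 'terraform.tfvars' >> .gitignore

# Alternatively, drop the secret line from terraform.tfvars entirely and
# supply it via the environment; Terraform maps TF_VAR_<name> to var.<name>.
export TF_VAR_proxmox_token_secret="<SECRET>"
```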

4. Create RKE2 VMs with Terraform

  1. Initialise the Terraform working directory:
terraform init
  2. Review the plan to see which VMs will be created:
terraform plan -out=tfplan-rke2-bootstrap
  3. Apply the plan:
terraform apply tfplan-rke2-bootstrap
  4. Wait for Terraform to complete. When finished, it should output:

  • IP addresses or hostnames of the control-plane nodes.
  • IP addresses or hostnames of the worker nodes.

  5. Capture the plan and apply output into the proof folder, for example:

PROOF_DIR="output/artifacts/infra/rke2/$(date -Iseconds)"
mkdir -p "$PROOF_DIR"
terraform show tfplan-rke2-bootstrap > "$PROOF_DIR/terraform-plan.txt"

Capture the timestamp once in a variable as shown: running date -Iseconds separately for each command produces a different folder name every time. Adjust the folder name pattern if you already have a standard.
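The node addresses from step 4 can also be captured as structured evidence with terraform output (a sketch; the output name controlplane_ips is hypothetical and depends on what your module actually declares):

```shell
# Record all declared Terraform outputs as JSON alongside the plan text.
PROOF_DIR="output/artifacts/infra/rke2/$(date -Iseconds)"
mkdir -p "$PROOF_DIR"
terraform output -json > "$PROOF_DIR/terraform-outputs.json"

# Or inspect a single output (hypothetical name) when building the
# Ansible inventory in the next section:
terraform output -json controlplane_ips
```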


5. Install and configure RKE2 with Ansible

  1. Navigate to the Ansible RKE2 playbook directory, for example:
cd core/ansible/rke2/
  2. Ensure your inventory file lists the control-plane and worker nodes created by Terraform, for example:
[rke2_controlplane]
cp-01 ansible_host=10.0.0.11
cp-02 ansible_host=10.0.0.12
cp-03 ansible_host=10.0.0.13

[rke2_workers]
wk-01 ansible_host=10.0.1.21
wk-02 ansible_host=10.0.1.22
  3. Run the Ansible playbook to install and configure RKE2, saving the run log into the proof folder in the same pass:

PROOF_DIR="output/artifacts/infra/rke2/$(date -Iseconds)"
mkdir -p "$PROOF_DIR"
ansible-playbook -i inventory.ini site-rke2.yml | tee "$PROOF_DIR/ansible-rke2-bootstrap.log"

  4. Monitor the output for:

  • Installation of RKE2 server and agent components.
  • Configuration of systemd services.
  • Retrieval of the kubeconfig file to a known location (for example, /etc/rancher/rke2/rke2.yaml on the first control-plane node or copied back to the control node).
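If your playbook leaves the kubeconfig on the first control-plane node rather than copying it back, it can be fetched manually (a sketch; assumes SSH access as the cloud-init user, here "ubuntu", to the cp-01 address from the inventory above, and that the file is readable over scp — it may need sudo on the node first):

```shell
# Copy the RKE2 kubeconfig from the first control-plane node.
scp ubuntu@10.0.0.11:/etc/rancher/rke2/rke2.yaml ~/.kube/rke2-hybridops.yaml

# RKE2 writes the server address as 127.0.0.1; rewrite it to the
# node's reachable address and lock down the file permissions.
sed -i 's/127.0.0.1/10.0.0.11/' ~/.kube/rke2-hybridops.yaml
chmod 600 ~/.kube/rke2-hybridops.yaml
```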

6. Verify the RKE2 cluster

  1. On the control node (or your workstation with access), configure kubectl to use the RKE2 kubeconfig, for example:
export KUBECONFIG=~/.kube/rke2-hybridops.yaml
  2. Check node status:
kubectl get nodes -o wide

You should see all control-plane and worker nodes in the Ready state.

  3. Verify core components (namespaces and pods), for example:
kubectl get ns
kubectl get pods -A
  4. Capture selected kubectl outputs into the proof folder, reusing a single timestamped directory:
PROOF_DIR="output/artifacts/infra/rke2/$(date -Iseconds)"
mkdir -p "$PROOF_DIR"
kubectl get nodes -o wide > "$PROOF_DIR/kubectl-get-nodes.txt"
kubectl get pods -A > "$PROOF_DIR/kubectl-get-pods-all.txt"

These commands provide a minimum evidence set that the cluster is up and functioning.
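Rather than eyeballing the node list, readiness can be asserted mechanically before capturing evidence (a sketch using standard kubectl flags; the 5-minute timeout is an arbitrary choice):

```shell
# Block until every node reports the Ready condition, or fail after 5 minutes.
kubectl wait --for=condition=Ready node --all --timeout=300s

# Fail loudly if any pod is crash-looping in any namespace.
if kubectl get pods -A | grep -q CrashLoopBackOff; then
    echo "crash-looping pods detected" >&2
    exit 1
fi
```

A non-zero exit from either check gives a CI job (for example, a Jenkins stage) a clean failure signal instead of a silent partial bootstrap.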


7. Optional: Tag the cluster for DR and cost modelling

If your Terraform or Ansible roles support tagging or labelling for DR and cost modelling:

  1. Ensure nodes or namespaces are labelled with environment and role, for example:
kubectl label node cp-01 env=onprem role=controlplane
kubectl label node wk-01 env=onprem role=worker
  2. Confirm that these labels appear in Prometheus metrics and cost artefacts, where applicable.

This helps link the cluster to DR and cost decisions described in ADR-0701 and ADR-0801.
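The labels applied above can be confirmed directly from kubectl before checking Prometheus (standard flags; the env and role keys match the example above):

```shell
# Show the env and role labels as columns for every node.
kubectl get nodes -L env -L role

# List only on-prem worker nodes via a label selector.
kubectl get nodes -l env=onprem,role=worker
```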


8. Validation checklist

Use this checklist to confirm bootstrap is complete:

  • [ ] Terraform created the expected RKE2 control-plane and worker VMs from the correct Proxmox template.
  • [ ] Ansible successfully installed and started RKE2 on all nodes.
  • [ ] kubectl get nodes shows all nodes in Ready state.
  • [ ] Core namespaces and system pods are running without crash loops.
  • [ ] Evidence artefacts (Terraform plan/apply output, Ansible logs, kubectl snapshots) exist under output/artifacts/infra/rke2/.
  • [ ] Any DR or cost labels/metadata are applied if you are using them.

References


Maintainer: HybridOps.Studio
License: MIT-0 for code, CC-BY-4.0 for documentation