RKE2 Runs on Full VMs (Rocky Linux 9 Base) with Simple LB and Storage

Note: This ADR is superseded by ADR-0204 – RKE2 Runs on Rocky VMs on Enterprise Hypervisors. It is retained for historical context only.

Status

Superseded — earlier RKE2-on-VM baseline retained for history; replaced by ADR-0204 which generalises the pattern across enterprise hypervisors (for example Proxmox, VMware).

1. Context

Earlier prototypes deployed RKE2 on:

  • LXC containers.
  • Ubuntu-based VMs.

While lightweight, these approaches introduced issues:

  • Kernel and cgroup limits inside LXCs impacted Kubernetes components.
  • Inconsistent mandatory access control behaviour on Ubuntu (AppArmor by default, rather than SELinux) affected Longhorn and Cilium.
  • Snapshot and export incompatibilities across hypervisors during DR tests.

To achieve a realistic enterprise baseline without RHEL subscription costs, Rocky Linux 9.x was selected as:

  • Binary compatible with RHEL.
  • Long-term supported (Rocky Linux 9 is scheduled to receive updates until May 2032).
  • Straightforward to automate with Packer, Terraform and Ansible.

This ADR originally captured the first stable decision to standardise RKE2 nodes on Rocky Linux 9 VMs before the broader, hypervisor-agnostic view in ADR-0204.

2. Decision (historical)

  • RKE2 control-plane and worker nodes run as full VMs based on Rocky Linux 9.x.
  • VMs are built via Packer and provisioned via Terraform with cloud-init.
  • A simple on-prem load balancer and storage stack (MetalLB + Longhorn) underpin the cluster.

This decision remains technically accurate, but the canonical reference for RKE2 node placement is now ADR-0204.
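
As an illustration of the cloud-init step in the decision above, the following is a minimal sketch of how per-node user-data could be rendered before being handed to Terraform. The function name, the `ansible` admin user and the example hostname are assumptions made for this sketch and are not taken from the original templates. Only base access is configured; installing RKE2 is left to configuration management.

```python
import json


def render_user_data(hostname, ssh_public_key, admin_user="ansible"):
    """Build cloud-init user-data for one RKE2 VM (hypothetical helper).

    JSON is emitted after the "#cloud-config" header because, for a simple
    structure like this one, it is also valid YAML.
    """
    if not hostname or not ssh_public_key:
        raise ValueError("hostname and ssh_public_key are required")
    cloud_config = {
        "hostname": hostname,
        "users": [
            {
                "name": admin_user,  # assumed account name
                "groups": "wheel",  # sudo-capable group on Rocky Linux
                "sudo": "ALL=(ALL) NOPASSWD:ALL",
                "lock_passwd": True,  # key-based login only
                "ssh_authorized_keys": [ssh_public_key],
            }
        ],
    }
    return "#cloud-config\n" + json.dumps(cloud_config, indent=2) + "\n"


if __name__ == "__main__":
    # Placeholder key and hostname, for illustration only.
    print(render_user_data("rke2-cp-01", "ssh-ed25519 AAAA-placeholder ops@example"))
```

Keeping the user-data this small means the same image can serve control-plane and worker nodes, with role-specific configuration applied later.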

3. Rationale (historical)

Rocky Linux 9 was chosen because it offers:

  • Enterprise familiarity — RHEL-compatible behaviour for assessors and trainees.
  • Portability — VMs export cleanly across hypervisors and clouds.
  • Predictability — Stable SELinux, systemd and kernel interfaces.
  • Governance alignment — Matches common security baselines in ITIL/ISO-aligned organisations.

The move away from LXC/Ubuntu for RKE2 was driven by:

  • Kernel and cgroup restrictions in LXC that complicated Kubernetes operations.
  • AppArmor/SELinux differences that made storage and CNI behaviour less predictable.
  • The desire for a single, consistent base OS for observability, GitOps and DR drills.

4. Consequences

4.1 Positive

  • Predictable, enterprise-grade behaviour for RKE2 nodes.
  • Improved snapshot and clone compatibility across hypervisors.
  • Lower licensing cost than commercial RHEL while keeping RHEL-compatible behaviour.

4.2 Negative / risks

  • Larger base image and slightly longer build times compared to ultra-minimal distros.
  • Need to manage Rocky Linux 9 lifecycle (kernel pinning, mirror configuration) explicitly.

5. Alternatives considered

  • Ubuntu-based RKE2 nodes: rejected due to observed SELinux/AppArmor and storage quirks with Longhorn and Cilium.
  • LXC-based RKE2 nodes: rejected for long-term use due to kernel and cgroup limitations and a more complex support story.
  • RHEL proper: suitable for some enterprises, but it adds unnecessary licensing complexity to HybridOps.Studio’s blueprint.

These alternatives are still valid experimentation targets but are not part of the standard blueprint.

6. Implementation notes (historical)

Originally, this decision appeared in:

  • Packer templates for rocky-9-rke2-base images.
  • Terraform modules that allocate RKE2 VMs with Rocky 9 as the base OS.
  • Ansible roles used to install and configure RKE2 on those VMs.

After ADR-0204, those same artefacts are referenced from the newer ADR as part of a broader, hypervisor-neutral design.

7. Operational impact and validation

Operationally, this ADR supported:

  • Consistent behaviour for RKE2 clusters on Proxmox.
  • Easier DR experiments involving snapshot/clone/restore and export to other hypervisors.
  • Teaching material that assumed a RHEL-like OS without licensing friction.

Validation was performed via:

  • RKE2 cluster bring-up and upgrade tests.
  • Longhorn storage and CNI tests on Rocky 9.
  • DR drills using VM-level snapshots and exports.

These validation stories are now primarily referenced from ADR-0204 and associated runbooks.
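
A node-level preflight check can complement the validation runs listed above by confirming that a VM really is a Rocky Linux 9 host with SELinux active before cluster tests start. The sketch below is illustrative only; the function names are hypothetical and it is not part of the original runbooks. It relies on the standard `/etc/os-release` format and the `getenforce` command.

```python
import subprocess


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_rocky_9(fields):
    """Return True when the parsed fields describe a Rocky Linux 9.x host."""
    major = fields.get("VERSION_ID", "").split(".")[0]
    return fields.get("ID") == "rocky" and major == "9"


def selinux_mode():
    """Return the output of getenforce, or None if it cannot be run."""
    try:
        result = subprocess.run(
            ["getenforce"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()
```

On a node, the check could be run by reading `/etc/os-release`, passing its contents to `parse_os_release`, and then failing the run when `is_rocky_9` returns False or `selinux_mode()` does not report the expected mode.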

Maintainer: HybridOps.Studio
License: MIT-0 for code, CC-BY-4.0 for documentation unless otherwise stated.