
LXC Containers for Lightweight Workloads on Proxmox

Status

Accepted — LXC containers are used for lightweight, non-critical helper workloads on Proxmox.
Core control-plane components (ctrl-01), shared databases (PostgreSQL) and Kubernetes nodes run on full VMs, per ADR-0012, ADR-0501 and ADR-0202/ADR-0204 respectively.


1. Context

HybridOps.Studio runs on an on-premises hypervisor (initially Proxmox, but patterns are portable to other enterprise hypervisors) and needs to balance:

  • Realistic enterprise patterns (full VMs for critical components, clear OS baselines).
  • Homelab constraints (single node or small clusters, limited CPU/RAM).
  • A credible DR story where:
    • Control plane and stateful tiers are VM-based and portable.
    • Lightweight helpers do not consume disproportionate resources.

Earlier experiments used LXC too aggressively, including:

  • Running PostgreSQL in LXC (ADR-0013, now superseded by ADR-0501).
  • Considering LXC for broader control-plane workloads.

Subsequent ADRs clarified:

  • ADR-0012 — ctrl-01 runs as a VM with cloud-init.
  • ADR-0017 — OS baseline (Rocky/Ubuntu/Windows) for infra and control layers.
  • ADR-0202 / ADR-0204 — RKE2 runs on Rocky Linux VMs on enterprise hypervisors.
  • ADR-0501 — PostgreSQL runs on a dedicated VM with DR replication, not in LXC.

This ADR narrows the LXC story to something that is both realistic and safe:
helpers, tools and teaching workloads only, not shared infrastructure state.


2. Decision

HybridOps.Studio adopts the following pattern:

  • Full VMs (via Packer + cloud-init per ADR-0016) are used for:
    • Control plane (ctrl-01), PostgreSQL, RKE2 nodes, NetBox (initially) and other core services.
  • LXC containers on Proxmox are used only for:
    • Lightweight, non-critical helper services.
    • Academy / demo workloads that benefit from higher density.
    • Short-lived tools that do not hold authoritative state.

Typical in-scope LXC workloads:

  • Docs / site preview containers.
  • Log shippers, small exporters, or protocol test helpers.
  • Small demo apps that do not store critical data.
  • “Utility shells” for teaching Linux/networking concepts.

Explicitly out of scope for LXC:

  • ctrl-01 and other control nodes (see ADR-0012).
  • RKE2 control-plane and worker nodes (see ADR-0204).
  • Shared PostgreSQL instances or other primary databases (see ADR-0501).
  • Anything considered part of the authoritative state or DR tier.

LXC provisioning is:

  • Driven via Terraform (Proxmox provider) or Proxmox API directly.
  • Kept deliberately simpler than the VM pipeline (no Packer for LXC images).
  • Documented as a complementary option, not a primary building block.
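
As a sketch of the "Proxmox API directly" path, a minimal wrapper might just build the form payload for the Proxmox VE create-container endpoint (`POST /api2/json/nodes/{node}/lxc`). All concrete values here (VMID, hostname, template path, storage and bridge names) are illustrative assumptions, not values from this repository:

```python
# Minimal sketch of a Proxmox API wrapper for helper containers.
# Field names follow the Proxmox VE LXC create endpoint; every
# concrete value below is an illustrative assumption.

def lxc_create_payload(vmid, hostname, template,
                       cores=1, memory_mb=512, disk_gb=4,
                       storage="local-lvm", bridge="vmbr0"):
    """Build the form payload for creating a helper LXC container."""
    return {
        "vmid": vmid,
        "hostname": hostname,
        "ostemplate": template,            # a standard Proxmox template, not a Packer image
        "cores": cores,
        "memory": memory_mb,               # Proxmox expects MiB
        "rootfs": f"{storage}:{disk_gb}",  # rootfs on Proxmox storage
        "net0": f"name=eth0,bridge={bridge},ip=dhcp",
        "unprivileged": 1,                 # helpers do not need privileged mode
        "tags": "helper;non-critical",     # mirror the NetBox inventory marking
    }


payload = lxc_create_payload(
    vmid=210,
    hostname="docs-preview-01",
    template="local:vztmpl/rockylinux-9-default_20240101_amd64.tar.xz",
)
# A real wrapper would POST this to
# https://<proxmox-host>:8006/api2/json/nodes/<node>/lxc
# with an API token; that network call is deliberately omitted here.
```

Keeping the wrapper to payload construction plus one authenticated POST is what keeps this path "deliberately simpler" than the Packer-based VM pipeline.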

3. Rationale

3.1 Why keep LXC at all?

  • Density for helpers and teaching
    • Homelab resources are finite; LXC allows more “small helpers” and demo nodes without ballooning VM count.
  • Realism with guard rails
    • Many enterprises still run light tooling (exporters, small services) on less isolated platforms, but keep the core control plane and stateful services on full VMs or managed services.
  • Operational clarity
    • By explicitly scoping LXC to helpers, the platform story stays clean:
      • VMs = control, state, cluster nodes.
      • LXC = helpers and demos.

3.2 Why not use LXC for PostgreSQL or control plane?

  • Isolation and DR expectations
    • Control-node and primary database patterns must be easily portable between Proxmox, VMware and cloud.
    • VM images and VM-level snapshots are more portable and better understood in enterprise DR.
  • Kernel and cgroup subtleties
    • LXC introduces host-kernel coupling that can surprise people during upgrades and DR tests.
  • Evidence clarity
    • For assessors, it is cleaner to say:
      • “Authoritative state lives on a VM with DR replication” (ADR-0501), and
      • “Kubernetes is stateless compute atop that.”

4. Consequences

4.1 Positive consequences

  • Clear, opinionated boundary
    • VMs for control/state, LXC for helpers only.
  • Resource efficiency
    • Docs previews, small exporters and Academy demo nodes can all run in LXCs without burning VM slots.
  • Better storytelling
    • Easier to tell a clean DR and portability story: “If a hypervisor dies, we care about restoring VMs; LXCs are nice-to-have.”

4.2 Negative consequences / risks

  • Two provisioning models
    • The team must understand both VM and LXC provisioning flows.
  • Potential misuse
    • Without discipline, someone might again place stateful workloads into LXC “because it’s lighter”.
  • Kernel coupling
    • LXC shares the host kernel, so a kernel regression can affect many helpers at once (acceptable for non-critical roles, but still a consideration).

Mitigations:

  • Document the decision matrix (VM vs LXC) in OS / platform guides.
  • Require architectural review before putting any new workload into LXC.
  • Keep LXC inventory clearly marked in NetBox / docs as “helper / non-critical”.
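
The VM-vs-LXC decision matrix above can be encoded as a small helper so reviews start from the same rules. The field names (`holds_authoritative_state`, etc.) are hypothetical, chosen here to mirror the criteria in sections 1 and 2 of this ADR:

```python
# Hypothetical encoding of this ADR's VM-vs-LXC decision matrix.
# Field names are illustrative; the rules mirror the in/out-of-scope
# lists in section 2.

from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    holds_authoritative_state: bool = False  # DR tier, per ADR-0501
    is_control_plane: bool = False           # ctrl-01 and friends, per ADR-0012
    is_cluster_node: bool = False            # RKE2 nodes, per ADR-0204


def choose_platform(w: Workload) -> str:
    """Return 'vm' for control/state/cluster workloads, 'lxc' for helpers."""
    if w.holds_authoritative_state or w.is_control_plane or w.is_cluster_node:
        return "vm"
    return "lxc"


assert choose_platform(Workload("db-01", holds_authoritative_state=True)) == "vm"
assert choose_platform(Workload("docs-preview")) == "lxc"
```

A function like this does not replace the architectural review, but it gives the review a single place where the boundary is written down.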

5. Alternatives considered

  • No LXC at all (VMs only)
    • Simpler to reason about, but:
      • Reduces density on a small homelab node.
      • Makes some Academy examples more expensive to run concurrently.
  • “LXC-first” pattern (including databases and control plane)
    • Rejected:
      • Harder to tell a portable DR story.
      • Kernel/cgroup edge cases for tools like Kubernetes, Longhorn, etc.
      • Conflicts with ADR-0012 and ADR-0501.
  • Move all helper workloads into RKE2 pods instead of LXC
    • Conceptually attractive, but:
      • Some helpers are explicitly pre-cluster or used to debug the cluster itself.
      • You still want a place to run tooling when RKE2 is unhealthy.

6. Implementation notes

  • LXC containers are provisioned via:
    • Terraform proxmox_lxc resources, or
    • A small Proxmox API wrapper, where appropriate for demos.
  • Base images:
    • Prefer the same OS family as the VM baseline (Rocky / Ubuntu) per ADR-0017.
    • Use Proxmox standard templates rather than building LXC images with Packer.
  • Storage:
    • For helpers and demos, rootfs and any small data live on Proxmox storage.
    • Do not use LXC for authoritative data; shared databases and critical state live on VMs per ADR-0501.
  • Inventory:
    • NetBox and docs should mark LXC nodes clearly as role = helper / tier = non-critical.
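
The inventory rule above is also easy to lint. The sketch below checks a flat inventory export for LXC nodes missing the helper/non-critical marking; the record shape is a hypothetical flat dict, not the NetBox API schema:

```python
# Illustrative check that every LXC entry in an inventory export is
# marked role=helper / tier=non-critical, per this ADR. The record
# shape is a hypothetical flat dict, not the NetBox API schema.

def unmarked_lxc_nodes(inventory):
    """Return names of LXC nodes missing the helper/non-critical marking."""
    return [
        node["name"]
        for node in inventory
        if node.get("platform") == "lxc"
        and not (node.get("role") == "helper"
                 and node.get("tier") == "non-critical")
    ]


inventory = [
    {"name": "ctrl-01", "platform": "vm", "role": "control"},
    {"name": "docs-preview-01", "platform": "lxc",
     "role": "helper", "tier": "non-critical"},
    {"name": "exporter-01", "platform": "lxc", "role": "helper"},  # tier missing
]
# exporter-01 is the only node flagged: its tier marking is missing.
```

Running such a check in CI against a NetBox export would catch helpers that drift out of the agreed marking.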

7. Operational impact and validation

Operational impact:

  • The platform team must:
    • Monitor LXC node resource usage to avoid contention with core VMs.
    • Ensure that no critical services are silently moved into LXC.
    • Keep a simple runbook for creating, updating and retiring helper containers.
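
The contention check in the first bullet can be sketched as a simple memory-budget test. The 10% host headroom figure and the memory-only view are assumptions for illustration, not tuned values:

```python
# Illustrative memory-budget check for LXC helpers on a Proxmox node.
# The headroom fraction and the memory-only focus are assumptions,
# not values from this repository.

def lxc_fits_budget(host_total_mb, vm_reserved_mb, lxc_allocations_mb,
                    headroom=0.10):
    """True if LXC allocations fit after core VMs and host headroom."""
    budget = host_total_mb - vm_reserved_mb - host_total_mb * headroom
    return sum(lxc_allocations_mb) <= budget


# 64 GiB node with 48 GiB reserved for core VMs: a few small helpers fit.
print(lxc_fits_budget(65536, 49152, [512, 1024, 512]))
```

In practice the same arithmetic could run against live `memory` values from the Proxmox API before a new helper container is approved.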

Validation:

  • Runbooks (to be created or updated):
    • runbook_lxc-container-provisioning.md — create/update/destroy helper containers.
  • Evidence:
    • Screenshots and logs in ../evidence/evidence-02-platform-lxc-lightweight-workloads.md showing:
      • Helper services running in LXC.
      • Core services (ctrl-01, db-01, RKE2 nodes) running as VMs.
  • Review trigger:
    • Revisit this ADR if:
      • The platform moves away from Proxmox to another hypervisor where LXC is not available, or
      • All helper workloads move into RKE2 permanently, making LXC redundant.

8. References

  • ADR-0012 — ctrl-01 runs as a VM with cloud-init.
  • ADR-0016 — Packer + cloud-init VM image pipeline.
  • ADR-0017 — OS baseline (Rocky / Ubuntu / Windows) for infra and control layers.
  • ADR-0202 / ADR-0204 — RKE2 on Rocky Linux VMs on enterprise hypervisors.
  • ADR-0501 — PostgreSQL on a dedicated VM with DR replication (supersedes ADR-0013).

Maintainer: HybridOps.Studio
License: MIT-0 for code, CC-BY-4.0 for documentation unless otherwise stated.