Status: Accepted (2025-12-02)

Use Longhorn as RKE2 Storage Layer for Stateful Kubernetes Workloads

1. Context

HybridOps.Studio separates compute and state:

Compute: RKE2 clusters running on Proxmox.
State: critical relational data (for example, NetBox) on PostgreSQL in LXC (db-01), as per ADR-0013.

For Kubernetes-native workloads that require persistent volumes, the obvious options each fall short:

Local hostPath volumes are brittle and tie a workload to a single node.
NFS is simple but introduces a separate single point of failure (SPOF) and operational overhead.
Ceph and similar systems are powerful but heavier than needed for a homelab-scale environment.

We need:

A simple, Kubernetes-native, replicated block storage solution for RKE2.
Good observability and straightforward recovery procedures.
A pattern that can be explained in consulting and Academy material as a pragmatic choice for labs and small clusters.

2. Decision

HybridOps.Studio adopts Longhorn as the primary RKE2 storage layer for stateful Kubernetes workloads that do not require a dedicated external database.

RKE2 clusters are configured with Longhorn as the default StorageClass for PVCs where appropriate.
Critical system-of-record data (for example, NetBox DB) remains on PostgreSQL LXC (db-01).
Non-critical or self-contained workloads (for example, demo apps, ephemeral services) may use Longhorn-backed PVCs.

3. Rationale

3.1 Why Longhorn?

Purpose-built for Kubernetes as a distributed block storage system.
Easy to operate in small clusters: UI and metrics are built in, and no separate Ceph cluster is required.
Supports volume replication across nodes, plus snapshots and backups to external endpoints (for example, object storage).

This makes it a good balance between functional robustness and operational simplicity in a homelab or small-cluster scenario.

3.2 Why not “everything in Longhorn”?

HybridOps.Studio keeps relational state (for example, NetBox) on PostgreSQL in LXC because:

It simplifies backup and DR procedures for system-of-record data (ADR-0013).
It allows RKE2 and Jenkins to remain largely stateless, which keeps disaster recovery (DR) and bursting scenarios simple.
It demonstrates a realistic split between cluster-local storage for workloads and externally managed databases for critical state.

4. Consequences

4.1 Positive

Better storage for stateful workloads on RKE2: replicated volumes and simple snapshot/backup options.
Clear separation of storage strategies: PostgreSQL in LXC for system-of-record data, and Longhorn for Kubernetes-native state that can be recreated or restored independently.
Teaching value: shows how labs and small teams can adopt a more robust storage layer without implementing Ceph.

4.2 Negative / trade-offs

Additional component to operate: Longhorn must be upgraded and monitored, and node disk usage and replication factors must be managed.
Not a substitute for full-scale enterprise storage: for very large clusters or mission-critical workloads, clients may still need more advanced or managed storage solutions.
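
To make the disk-usage trade-off concrete: each replica holds a full copy of a volume, so a volume of size S with R replicas can consume up to S × R of raw disk across the cluster. The sketch below is a worst-case estimate only. The function names, the example disk sizes, and the 25% headroom are assumptions for illustration, not values from this ADR.

```python
def raw_capacity_needed(volume_sizes_gib, replicas):
    """Worst-case raw disk (GiB) if every replica of every volume fills up."""
    return sum(volume_sizes_gib) * replicas


def fits(node_disks_gib, volume_sizes_gib, replicas, reserve_fraction=0.25):
    """Check whether the volumes fit while keeping some headroom free.

    reserve_fraction is a hypothetical safety margin, not a Longhorn setting.
    """
    usable = sum(node_disks_gib) * (1 - reserve_fraction)
    return raw_capacity_needed(volume_sizes_gib, replicas) <= usable


if __name__ == "__main__":
    nodes = [200, 200, 200]  # hypothetical: three nodes with 200 GiB each
    print(raw_capacity_needed([20, 50, 10], replicas=3))
    print(fits(nodes, [20, 50, 10], replicas=3))
    print(fits(nodes, [100, 80], replicas=3))
```

Because volumes are usually only partly full, real usage tends to sit below this bound. The estimate is mainly useful for deciding whether a replica count is affordable before volumes are created.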

5. Implementation

5.1 Cluster configuration

Longhorn is installed into the RKE2 cluster using the recommended method for the distribution.
A Longhorn-backed StorageClass (for example, longhorn) is created and may be set as default where appropriate.
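
As a rough illustration of the StorageClass described above, the following sketch builds the manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts. The provisioner `driver.longhorn.io` and the `numberOfReplicas` and `staleReplicaTimeout` parameters follow Longhorn's documented StorageClass example. The replica count, timeout, reclaim policy, and the helper name are assumptions for illustration, not settings taken from this ADR.

```python
import json


def longhorn_storage_class(name="longhorn", replicas=3, default=False):
    """Build a Longhorn-backed StorageClass manifest (illustrative values)."""
    annotations = {}
    if default:
        # Standard Kubernetes annotation marking the cluster default StorageClass.
        annotations["storageclass.kubernetes.io/is-default-class"] = "true"
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name, "annotations": annotations},
        "provisioner": "driver.longhorn.io",
        "allowVolumeExpansion": True,
        "reclaimPolicy": "Delete",  # assumption; "Retain" may suit some workloads
        "volumeBindingMode": "Immediate",
        "parameters": {
            # StorageClass parameter values must be strings.
            "numberOfReplicas": str(replicas),
            "staleReplicaTimeout": "2880",  # minutes; assumed value
        },
    }


if __name__ == "__main__":
    print(json.dumps(longhorn_storage_class(default=True), indent=2))
```

Only one StorageClass in a cluster should carry the default annotation, so `default=True` is best reserved for clusters where Longhorn is meant to serve every PVC that does not name a class.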

5.2 Workload guidance

Sample/demo apps use Longhorn-backed PVCs for any persistent data they require.
Documentation clearly indicates which workloads rely on Longhorn and which rely on external databases or other storage.
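
A demo workload's claim on Longhorn-backed storage might look like the sketch below. The claim name `demo-app-data`, the namespace, and the 1Gi size are hypothetical; the StorageClass name `longhorn` is the example name used in section 5.1.

```python
import json


def longhorn_pvc(name, size="1Gi", namespace="default", storage_class="longhorn"):
    """Build a PersistentVolumeClaim that requests a Longhorn-backed volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Block volumes are typically mounted by a single node at a time.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # "demo-app-data" is a hypothetical claim name.
    print(json.dumps(longhorn_pvc("demo-app-data"), indent=2))
```

Setting `storageClassName` explicitly, rather than relying on the cluster default, also makes it easy to see from the manifest alone which workloads depend on Longhorn.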

5.3 Backup and DR

Longhorn’s snapshot and backup features are configured for regular backups of key workloads, with optional backup to an external endpoint (for example, S3-compatible storage).
These backups are complementary to PostgreSQL backups (for db-01) and infrastructure-as-code rebuilds.
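
One way to schedule the regular snapshots and backups mentioned above is Longhorn's `RecurringJob` custom resource. The sketch below assumes the `longhorn.io/v1beta2` API version and the `longhorn-system` namespace; both should be checked against the installed Longhorn release. The job names, cron schedules, and retention counts are hypothetical, and a `backup` task only works once a backup target has been configured in Longhorn.

```python
import json


def longhorn_recurring_job(name, task, cron, retain, groups=("default",)):
    """Build a Longhorn RecurringJob manifest for a snapshot or backup schedule."""
    # Longhorn defines further task types; this sketch covers only these two.
    if task not in ("snapshot", "backup"):
        raise ValueError(f"unsupported task in this sketch: {task}")
    return {
        "apiVersion": "longhorn.io/v1beta2",  # assumed; verify for your release
        "kind": "RecurringJob",
        "metadata": {"name": name, "namespace": "longhorn-system"},
        "spec": {
            "cron": cron,             # standard five-field cron expression
            "task": task,
            "groups": list(groups),   # volume groups the job applies to
            "retain": retain,         # how many snapshots/backups to keep
            "concurrency": 1,         # volumes processed in parallel
        },
    }


if __name__ == "__main__":
    jobs = [
        longhorn_recurring_job("hourly-snapshot", "snapshot", "0 * * * *", retain=24),
        longhorn_recurring_job("nightly-backup", "backup", "0 2 * * *", retain=7),
    ]
    for job in jobs:
        print(json.dumps(job, indent=2))
```

Snapshots stay on the cluster's own disks, so only the backup job protects against losing the cluster itself.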

6. Operational considerations

Longhorn metrics should be scraped by Prometheus and included in platform dashboards.
Alerts should be defined for disk pressure, replica failures, and volume health.
Academy content should show:
Creating a PVC that uses Longhorn.
Inspecting Longhorn volumes.
Performing a basic restore from a snapshot.
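
The alert areas named in this section (disk pressure, replica failures, volume health) could be expressed as Prometheus rules along the following lines. The metric names `longhorn_node_storage_usage_bytes`, `longhorn_node_storage_capacity_bytes`, and `longhorn_volume_robustness` (where 2 means degraded and 3 means faulted) are taken from Longhorn's published metrics and example alert rules, and should be verified against the deployed version. The alert names, the 70% threshold, and the 5-minute hold are assumptions. A degraded volume is used here as a proxy for a failed replica.

```python
import json


def longhorn_alert_rules(disk_usage_pct=70, hold="5m"):
    """Return a Prometheus rule group for the three alert areas listed above."""

    def rule(name, expr, severity, summary):
        return {
            "alert": name,
            "expr": expr,
            "for": hold,
            "labels": {"severity": severity},
            "annotations": {"summary": summary},
        }

    usage = (
        "(longhorn_node_storage_usage_bytes"
        " / longhorn_node_storage_capacity_bytes) * 100"
    )
    return {
        "groups": [
            {
                "name": "longhorn",
                "rules": [
                    rule("LonghornNodeDiskPressure", f"{usage} > {disk_usage_pct}",
                         "warning", "Longhorn node storage usage is high"),
                    rule("LonghornVolumeDegraded", "longhorn_volume_robustness == 2",
                         "warning", "Longhorn volume is degraded (replica missing)"),
                    rule("LonghornVolumeFaulted", "longhorn_volume_robustness == 3",
                         "critical", "Longhorn volume is faulted"),
                ],
            }
        ]
    }


if __name__ == "__main__":
    # JSON is a subset of YAML, so the output has the shape of a rule file.
    print(json.dumps(longhorn_alert_rules(), indent=2))
```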

7. References

Maintainer: HybridOps.Studio
License: MIT-0 for code, CC-BY-4.0 for documentation