
Docker Engine baseline for control nodes and container hosts

Status

Accepted — Docker Engine and Docker Compose v2 are installed and managed via a standard baseline on supported Linux hosts (for example ctrl-01, build agents, and helper nodes). Kubernetes (RKE2) remains the primary runtime for platform and application workloads (see ADR-0202), while Docker provides a consistent host-level container capability.


1. Context

HybridOps.Studio needs a consistent way to run containers on hosts that are not themselves Kubernetes worker nodes, including:

  • The primary control node (ctrl-01), which runs:
    • Packer, Terraform, Ansible and related automation.
    • The Jenkins controller (see ADR-0603) as a containerised service.
  • Build / utility hosts that:
    • Run short-lived helper containers.
    • Execute CI tasks that depend on Docker images.
  • Occasional demo or helper workloads where a full RKE2 cluster would be disproportionate.

Early experiments used a mix of:

  • Distro-provided docker.io packages.
  • Ad-hoc Docker installs without a clear standard.
  • Manual steps on individual hosts.

This led to:

  • Version drift between hosts.
  • Unclear expectations about the presence of Docker and the Compose plugin.
  • Fragile CI scripts that behaved differently across environments.

At the same time, ADR-0202 establishes RKE2 as the primary runtime for platform and application workloads. The Docker decision must:

  • Support RKE2 rather than compete with it.
  • Keep container usage outside Kubernetes small, focused and auditable.
  • Be easy to reproduce and reason about in evidence and runbooks.

2. Decision

HybridOps.Studio adopts a standard Docker baseline implemented by the hybridops.common.docker_engine Ansible role with the following rules:

  • Scope and purpose
    • Docker Engine is treated as host-level plumbing for:
      • Control nodes (for example ctrl-01).
      • Build agents and utility hosts.
      • Lightweight helpers that do not justify a dedicated Kubernetes workload.
    • RKE2 remains the primary runtime for platform and application workloads (see ADR-0202).

  • Supported platforms
    • Ubuntu 22.04 LTS (jammy) on the Debian family.
    • Rocky Linux 9 / RHEL 9 on the RedHat family.
    • Other distributions must provide their own baseline; this ADR does not cover them.

  • Installation source
    • Use the official Docker CE repositories, not arbitrary distro packages:
      • Configure Docker’s APT repository on Ubuntu 22.04.
      • Configure Docker’s YUM repository on Rocky/RHEL 9.
    • Install:
      • docker-ce
      • docker-ce-cli
      • containerd.io
      • docker-compose-plugin (Compose v2)

  • Service management
    • Ensure the docker service is:
      • Installed.
      • Started when docker_engine_state == "present".
      • Optionally enabled on boot via docker_engine_enable.

  • User access
    • Manage membership of the docker group via docker_engine_users so that nominated users can run Docker without full root access.

  • Unsupported platforms
    • On OS families outside Debian and RedHat, the role fails fast with a clear message, and callers must:
      • Provide a distro-specific Docker baseline, or
      • Avoid relying on host-level Docker on that platform.

  • Compose usage
    • The standard interface is docker compose (the Compose v2 plugin), not the legacy standalone docker-compose binary.

This decision is implemented and enforced by:

  • The hybridops.common.docker_engine role.
  • Guard tasks in dependent roles (for example hybridops.app.jenkins_controller) that validate Docker availability before proceeding.
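As a sketch of how a consuming playbook might apply this baseline (the role name and the three variables come from this ADR; the host group, user names, and values shown are illustrative, not taken from the real inventory):

```yaml
# Hypothetical playbook applying the Docker Engine baseline.
# docker_engine_state, docker_engine_enable and docker_engine_users are the
# variables documented in this ADR; the values here are examples only.
- name: Apply Docker Engine baseline
  hosts: docker_hosts
  become: true
  roles:
    - role: hybridops.common.docker_engine
      vars:
        docker_engine_state: present
        docker_engine_enable: true
        docker_engine_users:
          - jenkins
          - automation
```

On supported hosts this leaves the docker service installed and started, with the nominated users in the docker group; on other OS families the role fails fast as described above.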

3. Rationale

3.1 Why standardise Docker at all?

Even with RKE2 as the primary runtime, Docker remains important for:

  • Control-plane tools:
    • The Jenkins controller (see ADR-0603).
    • Small helper services that are easier to run as a single container on ctrl-01 than as full Kubernetes deployments.
  • Build and testing workflows:
    • Building images or running tools that assume a local Docker daemon.
    • CI pipelines that rely on docker-in-docker or local image operations.

Without a baseline, each host drifts in:

  • Docker version and configuration.
  • Availability of the Compose plugin.
  • Permissions and group membership.

A standard Docker baseline:

  • Simplifies runbooks and troubleshooting.
  • Enables reliable evidence capture (for example, docker version proofs).
  • Reduces “it works on this host but not that one” failure modes.

3.2 Why official Docker CE repositories?

Distro packages (docker.io, older docker-ce builds, or forked variants) can:

  • Lag behind upstream releases.
  • Be packaged with different defaults and support horizons.
  • Behave inconsistently across distros.

Using official Docker CE repositories:

  • Aligns behaviour across Ubuntu and Rocky.
  • Provides predictable upgrade paths and documentation.
  • Is more realistic for enterprise environments where Docker CE is standard.
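A minimal sketch of the Ubuntu 22.04 side of this decision, using standard Ansible apt modules; the repository URL and suite follow Docker's published install instructions, but the actual task layout inside hybridops.common.docker_engine may differ:

```yaml
# Illustrative tasks for Ubuntu 22.04 (jammy); not a copy of the real role.
- name: Add Docker's official GPG key
  ansible.builtin.apt_key:
    url: https://download.docker.com/linux/ubuntu/gpg
    state: present

- name: Configure the Docker CE APT repository
  ansible.builtin.apt_repository:
    repo: "deb https://download.docker.com/linux/ubuntu jammy stable"
    state: present

- name: Install Docker Engine, CLI, containerd and the Compose v2 plugin
  ansible.builtin.apt:
    name:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-compose-plugin
    state: present
    update_cache: true
```

The Rocky/RHEL 9 path is analogous with yum_repository and dnf modules pointing at Docker's CentOS/RHEL repository.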

3.3 Why limit supported platforms?

Supporting “every Linux under the sun” would:

  • Increase testing and maintenance overhead.
  • Make the role harder to reason about in the homelab + reference implementation.

By explicitly targeting:

  • Ubuntu 22.04 (jammy), and
  • Rocky Linux 9 / RHEL 9,

HybridOps.Studio:

  • Matches the OS choices already made for control nodes and RKE2 (see ADR-0012, ADR-0202, ADR-0204).
  • Keeps the role small, understandable, and testable.
  • Encourages contributors to add platform support only when they can also add tests and evidence.

4. Consequences

4.1 Positive consequences

  • Consistent container baseline
    • Control nodes and Docker-capable hosts share a predictable Docker setup.
    • Jenkins and other Docker-dependent tooling run against the same baseline.

  • Simplified operations and documentation
    • Runbooks and HOWTOs can assume:
      • The docker daemon is present.
      • docker compose is available.
      • Service and repository configuration are known.
    • Debug steps (for example, docker version) produce comparable output across environments.

  • Evidence-friendly behaviour
    • The docker_engine role can emit version and configuration details as part of platform evidence (for example, Evidence 4).
    • It is easier to demonstrate “this is the Docker baseline in use” to reviewers.

  • Clear guardrails
    • Unsupported OS families fail early with a clear error instead of silently drifting or partially installing Docker.

4.2 Negative consequences and risks

  • Reduced flexibility on unsupported distros
    • Teams that want to run HybridOps.Studio roles on other Linux flavours must either:
      • Adopt Ubuntu/Rocky for Docker hosts, or
      • Build and maintain their own Docker baseline outside this ADR.

  • Coupling to Docker CE packaging
    • Changes in Docker’s official repositories (for example, package renames or EOL decisions) could break the baseline and require ADR updates and automation changes.

  • Surface area for security management
    • The Docker daemon and docker group memberships are additional security surfaces that must be monitored and audited, especially on multi-tenant hosts.

Mitigations:

  • Keep the docker_engine role small and well-tested so changes are low-risk.
  • Treat Docker hosts that run control-plane workloads (for example ctrl-01) as part of the core platform with appropriate hardening and monitoring.
  • Encourage consumers to use RKE2 for ongoing application workloads and keep Docker usage tightly scoped.

5. Alternatives considered

5.1 No standard role (each playbook installs Docker ad hoc)

  • Pros:
    • Maximum flexibility per scenario.
  • Cons:
    • Version and configuration drift.
    • Repetition and copy-paste across multiple roles and playbooks.
    • Harder to prove and document the baseline for evidence and reviewers.

5.2 Use only distro-provided Docker packages

  • Pros:
    • Simpler story for some distributions.
    • No extra repositories to manage.
  • Cons:
    • Versions often lag upstream.
    • Behaviour differs across distros.
    • Less realistic for environments that already standardise on Docker CE.

5.3 “Kubernetes only” (no host-level Docker standard)

  • Pros:
    • Fewer components to maintain.
    • Cleaner story around RKE2 as the single runtime.
  • Cons:
    • Does not cover control nodes and small hosts that need container support.
    • Makes it harder to run tools like the Jenkins controller in a familiar way on ctrl-01.

6. Implementation notes

  • The baseline is implemented by the hybridops.common.docker_engine Ansible role:
    • Used in bootstrap and CI pipelines to prepare hosts.
    • Exposes simple variables (docker_engine_state, docker_engine_enable, docker_engine_users) to stay close to Ansible conventions.
  • Dependent roles:
    • Should not attempt to install Docker themselves.
    • May include guard tasks that assert Docker and Compose are present (for example, the Jenkins controller role) and fail with clear guidance if not.
  • OS support:
    • Introducing support for a new distribution requires:
      • An explicit change to this ADR (or a follow-up ADR).
      • Tests, runbooks, and evidence updates.
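A guard in a dependent role might look like the following sketch; the task names and failure message are illustrative and not taken from the real Jenkins controller role, but the checks use docker version and docker compose version, the interfaces this ADR standardises on:

```yaml
# Illustrative guard tasks for a role that depends on the Docker baseline.
- name: Check that Docker Engine is available
  ansible.builtin.command: docker version
  register: docker_check
  changed_when: false
  failed_when: false

- name: Check that Compose v2 is available
  ansible.builtin.command: docker compose version
  register: compose_check
  changed_when: false
  failed_when: false

- name: Fail with guidance if the baseline is missing
  ansible.builtin.assert:
    that:
      - docker_check.rc == 0
      - compose_check.rc == 0
    fail_msg: >-
      Docker Engine and/or the Compose v2 plugin are missing on this host.
      Apply the hybridops.common.docker_engine baseline first (see this ADR).
```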

7. Operational impact and validation

Operational impact:

  • Platform and SRE teams must:
    • Include Docker Engine baseline checks in health dashboards for Docker hosts.
    • Monitor docker service state and group membership on critical nodes.
    • Coordinate Docker upgrades as part of planned maintenance.

Validation:

  • Runbooks and HOWTOs should cover:
    • “Bootstrap Docker baseline on ctrl-01 and build agents”.
    • “Troubleshoot Docker Engine baseline issues” (for example, repo problems, version mismatches, missing Compose plugin).
  • Evidence folders (for example output/artifacts/infra/docker/) can capture:
    • docker version output from representative hosts.
    • Package lists and repository configuration.
    • Screenshots or logs showing dependent services (for example the Jenkins controller) running on top of this baseline.
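Evidence capture can itself be automated; a minimal sketch, assuming an Ansible play delegated to the control node and using the evidence directory given above (the file name pattern is an assumption):

```yaml
# Illustrative evidence-capture tasks; the artifact directory comes from this
# ADR, the file naming convention here is hypothetical.
- name: Capture docker version output for evidence
  ansible.builtin.command: docker version
  register: docker_version_out
  changed_when: false

- name: Write the evidence file on the control node
  ansible.builtin.copy:
    content: "{{ docker_version_out.stdout }}\n"
    dest: "output/artifacts/infra/docker/docker-version-{{ inventory_hostname }}.txt"
  delegate_to: localhost
```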

Validation is considered successful when:

  • The Docker baseline can be applied consistently to supported hosts via a single role.
  • Jenkins controller and other Docker-dependent roles run without local Docker-specific bootstrapping.
  • Evidence for Docker version and configuration can be produced on demand for reviews and audits.

8. References