Docker Engine baseline for control nodes and container hosts
Status
Accepted — Docker Engine and Docker Compose v2 are installed and managed via a
standard baseline on supported Linux hosts (for example ctrl-01, build
agents, and helper nodes). Kubernetes (RKE2) remains the primary runtime for
platform and application workloads (see ADR-0202), while Docker provides a
consistent host-level container capability.
1. Context
HybridOps.Studio needs a consistent way to run containers on hosts that are not themselves Kubernetes worker nodes, including:
- The primary control node (ctrl-01), which runs:
  - Packer, Terraform, Ansible and related automation.
  - Jenkins controller (see ADR-0603) as a containerised service.
- Build / utility hosts that:
  - Run short-lived helper containers.
  - Execute CI tasks that depend on Docker images.
- Occasional demo or helper workloads where a full RKE2 cluster would be disproportionate.
Early experiments used a mix of:
- Distro-provided docker.io packages.
- Ad-hoc Docker installs without a clear standard.
- Manual steps on individual hosts.
This led to:
- Version drift between hosts.
- Unclear expectations about the presence of Docker and the Compose plugin.
- Fragile CI scripts that behaved differently across environments.
At the same time, ADR-0202 establishes RKE2 as the primary runtime for platform and application workloads. The Docker decision must:
- Support RKE2 rather than compete with it.
- Keep container usage outside Kubernetes small, focused and auditable.
- Be easy to reproduce and reason about in evidence and runbooks.
2. Decision
HybridOps.Studio adopts a standard Docker baseline implemented by the
hybridops.common.docker_engine Ansible role with the following rules:
- Scope and purpose
  - Docker Engine is treated as host-level plumbing for:
    - Control nodes (for example ctrl-01).
    - Build agents and utility hosts.
    - Lightweight helpers that do not justify a dedicated Kubernetes workload.
  - RKE2 remains the primary runtime for platform and application workloads (see ADR-0202).
- Supported platforms
  - Ubuntu 22.04 LTS (jammy) on the Debian family.
  - Rocky Linux 9 / RHEL 9 on the RedHat family.
  - Other distributions must provide their own baseline; this ADR does not cover them.
- Installation source
  - Use the official Docker CE repositories, not arbitrary distro packages:
    - Configure Docker’s APT repository on Ubuntu 22.04.
    - Configure Docker’s YUM repository on Rocky/RHEL 9.
  - Install:
    - docker-ce
    - docker-ce-cli
    - containerd.io
    - docker-compose-plugin (Compose v2)
- Service management
  - Ensure the docker service is:
    - Installed.
    - Started when docker_engine_state == "present".
    - Optionally enabled on boot via docker_engine_enable.
- User access
  - Manage membership of the docker group via docker_engine_users so that nominated users can run Docker without sudo. Membership of the docker group is effectively root-equivalent on the host, which is why it is granted only to nominated users.
- Unsupported platforms
  - On OS families outside Debian and RedHat, the role fails fast with a clear message, and callers must:
    - Provide a distro-specific Docker baseline, or
    - Avoid relying on host-level Docker on that platform.
- Compose usage
  - The standard interface is docker compose (Compose v2 plugin), not the legacy standalone docker-compose binary.
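The supported-platform and fail-fast rules above can be illustrated with a small check. This is a minimal sketch, not the role's implementation: it assumes the os-release fields ID and VERSION_ID are an acceptable stand-in for the OS-family facts the role uses, and the names SUPPORTED, parse_os_release and docker_baseline_family are hypothetical.

```python
# Hypothetical sketch: the real role decides on OS family inside Ansible;
# this table only mirrors the supported platforms listed in this ADR.
SUPPORTED = {
    # os-release ID -> (OS family, required version, compare major version only)
    "ubuntu": ("Debian", "22.04", False),
    "rocky": ("RedHat", "9", True),
    "rhel": ("RedHat", "9", True),
}


def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")
    return fields


def docker_baseline_family(os_release_text):
    """Return the OS family for a supported host, or fail fast with a clear message."""
    fields = parse_os_release(os_release_text)
    distro = fields.get("ID", "unknown")
    version = fields.get("VERSION_ID", "")
    if distro not in SUPPORTED:
        raise RuntimeError(
            f"Docker baseline does not support distribution {distro!r}: "
            "provide a distro-specific baseline or avoid host-level Docker."
        )
    family, required, major_only = SUPPORTED[distro]
    # Rocky/RHEL report minor releases (for example 9.x), so only the major part is compared.
    found = version.split(".")[0] if major_only else version
    if found != required:
        raise RuntimeError(
            f"Docker baseline supports {distro} {required}, found {version or 'unknown'}."
        )
    return family
```

On a real host the input would typically be the contents of /etc/os-release. Raising an exception rather than returning a flag mirrors the "fail fast with a clear message" behaviour described for unsupported platforms.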
This decision is implemented and enforced by:
- The hybridops.common.docker_engine role.
- Guard tasks in dependent roles (for example hybridops.app.jenkins_controller) that validate Docker availability before proceeding.
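A guard of the kind mentioned above could look roughly like the following. It is an illustrative sketch only: the actual guard tasks live in Ansible roles, and the names run_command and docker_baseline_problems are hypothetical. It relies on two CLI calls, docker version --format '{{.Server.Version}}' and docker compose version --short, and treats a non-zero exit code as "not available".

```python
import subprocess


def run_command(argv):
    """Run a command and return (exit code, stripped stdout)."""
    try:
        proc = subprocess.run(list(argv), capture_output=True, text=True, timeout=30)
    except FileNotFoundError:
        return 127, ""  # binary not on PATH
    except subprocess.TimeoutExpired:
        return 124, ""  # command hung, treat as unavailable
    return proc.returncode, proc.stdout.strip()


def docker_baseline_problems(run=run_command):
    """Return human-readable problems; an empty list means the guard passes.

    The runner is injectable so the check can be exercised without Docker.
    """
    problems = []
    # Fails when the CLI is missing or the daemon cannot be reached by this user.
    code, _ = run(["docker", "version", "--format", "{{.Server.Version}}"])
    if code != 0:
        problems.append(
            "Docker Engine is not installed or the daemon is not reachable; "
            "apply hybridops.common.docker_engine before this role."
        )
    # Compose v2 is a CLI plugin, so it is invoked as a docker subcommand.
    code, _ = run(["docker", "compose", "version", "--short"])
    if code != 0:
        problems.append(
            "Compose v2 plugin (docker compose) is missing; "
            "install docker-compose-plugin via the Docker baseline."
        )
    return problems
```

A dependent role or script would call docker_baseline_problems() and stop with the returned messages if the list is non-empty, instead of attempting to install Docker itself.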
3. Rationale
3.1 Why standardise Docker at all?
Even with RKE2 as the primary runtime, Docker remains important for:
- Control-plane tools:
  - Jenkins controller (see ADR-0603).
  - Small helper services that are easier to run as a single container on ctrl-01 than as full Kubernetes deployments.
- Build and testing workflows:
  - Building images or running tools that assume a local Docker daemon.
  - CI pipelines that rely on docker-in-docker or local image operations.
Without a baseline, each host drifts in:
- Docker version and configuration.
- Availability of the Compose plugin.
- Permissions and group membership.
A standard Docker baseline:
- Simplifies runbooks and troubleshooting.
- Enables reliable evidence capture (for example, docker version proofs).
- Reduces “it works on this host but not that one” failure modes.
3.2 Why official Docker CE repositories?
Distro packages (docker.io, older docker-ce builds, or forked variants) can:
- Lag behind upstream releases.
- Be packaged with different defaults and support horizons.
- Behave inconsistently across distros.
Using official Docker CE repositories:
- Aligns behaviour across Ubuntu and Rocky.
- Provides predictable upgrade paths and documentation.
- Is more realistic for enterprise environments where Docker CE is standard.
3.3 Why limit supported platforms?
Supporting “every Linux under the sun” would:
- Increase testing and maintenance overhead.
- Make the role harder to reason about in the homelab + reference implementation.
By explicitly targeting:
- Ubuntu 22.04 (jammy), and
- Rocky Linux 9 / RHEL 9,
HybridOps.Studio:
- Matches the OS choices already made for control nodes and RKE2 (see ADR-0012, ADR-0202, ADR-0204).
- Keeps the role small, understandable, and testable.
- Encourages contributors to add platform support only when they can also add tests and evidence.
4. Consequences
4.1 Positive consequences
- Consistent container baseline
  - Control nodes and Docker-capable hosts share a predictable Docker setup.
  - Jenkins and other Docker-dependent tooling run against the same baseline.
- Simplified operations and documentation
  - Runbooks and HOWTOs can assume:
    - docker daemon present.
    - docker compose available.
    - Known service and repo configuration.
  - Debug steps (for example, docker version) produce comparable output across environments.
- Evidence-friendly behaviour
  - The docker_engine role can emit version and configuration details as part of platform evidence (for example, Evidence 4).
  - Easier to demonstrate “this is the Docker baseline in use” to reviewers.
- Clear guardrails
  - Unsupported OS families fail early with a clear error instead of silently drifting or partially installing Docker.
4.2 Negative consequences and risks
- Reduced flexibility on unsupported distros
  - Teams that want to run HybridOps.Studio roles on other Linux flavours must either:
    - Adopt Ubuntu/Rocky for Docker hosts, or
    - Build and maintain their own Docker baseline outside this ADR.
- Coupling to Docker CE packaging
  - Changes in Docker’s official repositories (for example, package renames or EOL decisions) could break the baseline and require ADR updates and automation changes.
- Surface area for security management
  - Docker daemon and group memberships are additional security surfaces that must be monitored and audited, especially on multi-tenant hosts.
Mitigations:
- Keep the docker_engine role small and well-tested so changes are low-risk.
- Treat Docker hosts that run control-plane workloads (for example ctrl-01) as part of the core platform with appropriate hardening and monitoring.
- Encourage consumers to use RKE2 for ongoing application workloads and keep Docker usage tightly scoped.
5. Alternatives considered
5.1 No standard role (each playbook installs Docker ad hoc)
- Pros:
  - Maximum flexibility per scenario.
- Cons:
  - Version and configuration drift.
  - Repetition and copy-paste in multiple roles/playbooks.
  - Harder to prove and document the baseline for evidence and reviewers.
5.2 Use only distro-provided Docker packages
- Pros:
  - Simpler story for some distributions.
  - No extra repositories to manage.
- Cons:
  - Versions often lag upstream.
  - Behaviour differs across distros.
  - Less realistic for environments that already standardise on Docker CE.
5.3 “Kubernetes only” (no host-level Docker standard)
- Pros:
  - Fewer components to maintain.
  - Cleaner story around RKE2 as the single runtime.
- Cons:
  - Does not cover control nodes and small hosts that need container support.
  - Makes it harder to run tools like Jenkins controller in a familiar way on ctrl-01.
6. Implementation notes
- The baseline is implemented by the hybridops.common.docker_engine Ansible role:
  - Used in bootstrap and CI pipelines to prepare hosts.
  - Exposes simple variables (docker_engine_state, docker_engine_enable, docker_engine_users) to stay close to Ansible conventions.
- Dependent roles:
  - Should not attempt to install Docker themselves.
  - May include guard tasks that assert Docker and Compose are present (for example, the Jenkins controller role) and fail with clear guidance if not.
- OS support:
  - Introducing support for a new distribution requires:
    - An explicit change to this ADR (or a follow-up ADR).
    - Tests, runbooks, and evidence updates.
7. Operational impact and validation
Operational impact:
- Platform and SRE teams must:
  - Include Docker Engine baseline checks in health dashboards for Docker hosts.
  - Monitor docker service state and group membership on critical nodes.
  - Coordinate Docker upgrades as part of planned maintenance.
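Group-membership monitoring can be as simple as comparing the current members of the docker group with the nominated list. The sketch below is one possible shape for such a check, not part of the role; docker_group_members and audit_docker_group are hypothetical names, and the nominated set stands in for whatever docker_engine_users holds for the host.

```python
import grp  # Unix-only standard library module


def docker_group_members(group="docker"):
    """Return the supplementary members of a group, or an empty set if it is absent.

    Users whose primary group is the given group are not listed in gr_mem.
    """
    try:
        return set(grp.getgrnam(group).gr_mem)
    except KeyError:
        return set()


def audit_docker_group(actual, nominated):
    """Compare actual members against the nominated users for this host."""
    return {
        # Present on the host but not nominated: candidates for removal.
        "unexpected": sorted(set(actual) - set(nominated)),
        # Nominated but not yet members: the baseline has not converged.
        "missing": sorted(set(nominated) - set(actual)),
    }
```

A monitoring job might call audit_docker_group(docker_group_members(), nominated_users) and raise an alert whenever the "unexpected" list is non-empty on a critical node.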
Validation:
- Runbooks and HOWTOs should cover:
  - “Bootstrap Docker baseline on ctrl-01 and build agents”.
  - “Troubleshoot Docker Engine baseline issues” (for example, repo problems, version mismatches, missing Compose plugin).
- Evidence folders (for example output/artifacts/infra/docker/) can capture:
  - docker version output from representative hosts.
  - Package lists and repository configuration.
  - Screenshots or logs showing dependent services (for example Jenkins controller) running on top of this baseline.
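As an illustration of the evidence capture described above, the output of docker version --format '{{json .}}' from each host can be reduced to a small record and grouped by version to spot drift. This is a hedged sketch under stated assumptions: the function names are hypothetical, and it assumes the JSON exposes Client.Version and Server.Version, with Server possibly null when the daemon is unreachable.

```python
import json


def summarise_docker_version(host, raw_json):
    """Reduce `docker version --format '{{json .}}'` output to an evidence record."""
    data = json.loads(raw_json)
    client = data.get("Client") or {}
    server = data.get("Server") or {}  # treated as empty if the daemon did not answer
    return {
        "host": host,
        "client_version": client.get("Version"),
        "server_version": server.get("Version"),
    }


def versions_in_use(records):
    """Group hosts by server version; more than one key suggests version drift."""
    by_version = {}
    for record in records:
        by_version.setdefault(str(record["server_version"]), []).append(record["host"])
    return {version: sorted(hosts) for version, hosts in sorted(by_version.items())}
```

The resulting records could be written as JSON files into the evidence folder alongside the raw command output, so that reviewers see both the original proof and a comparable summary.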
Validation is considered successful when:
- The Docker baseline can be applied consistently to supported hosts via a single role.
- Jenkins controller and other Docker-dependent roles run without local Docker-specific bootstrapping.
- Evidence for Docker version and configuration can be produced on demand for reviews and audits.
8. References
- Docker Hub: hybridops/ci-agent-tools
- ADR-0001 – ADR Process and Conventions
- ADR-0012 – Control node runs as a VM (Cloud-Init); LXC reserved for light helpers
- ADR-0202 – Adopt RKE2 as primary runtime for platform and applications
- ADR-0204 – RKE2 runs on Rocky VMs on enterprise hypervisors
- ADR-0603 – Run Jenkins controller on control node, agents on RKE2
- ADR-0607 – Standardize CI agent tools image for Docker and RKE2 agents
- Docker Engine documentation
- Docker Compose v2 documentation