Environments & Guardrails
HybridOps.Studio runs across three clearly defined environments (dev, staging, prod), with explicit guardrails around who can change what, where, and when.
This page is aimed at engineering leaders, hiring managers, and other decision-makers who need to see how risk is controlled without slowing teams down. It also serves as background for any formal reviewers who want to understand how the Environment Guard Framework (EGF) is applied in practice.
This page sits alongside:
- ADR-0600 – Environment Guard Framework
- ADR-0002 – NetBox-Driven Inventory
- Secrets Lifecycle
- Ansible Collections and Roles Index
Together, they describe how environments are defined, how automation is governed, and how inventory and secrets behave across tiers.
1. Why environments and guardrails exist
The platform is designed to be:
- Experiment-friendly in dev.
- Realistic but reversible in staging.
- Tightly controlled in prod.
To get there, we combine:
- Environment tiers – dev / staging / prod with different risk expectations.
- Guardrails – policy plus automation that enforce those expectations.
- Evidence – logs and reports that prove the guardrails actually fired.
The goal is simple:
Engineers can move quickly, but nobody can “accidentally prod” without hitting visible checks, approvals, and audit trails.
2. Environment tiers at a glance
| Environment | Purpose | Risk profile | Typical users |
|---|---|---|---|
| dev | Everyday changes, experiments | Low – intentionally changeable | Individual engineers, CI |
| staging | Pre-prod, UAT, integration testing | Medium – controlled | Teams, release managers |
| prod | Live platform, external users | High – tightly gated | SREs, senior engineers, on-call |
Key differences:
- Approvals – prod changes can require explicit approvals and justifications.
- Windows – prod deploys are restricted to maintenance windows by default.
- Observability – higher logging and alerting expectations in staging/prod.
- Rollbacks – DR and rollback drills are always staged before prod adoption.
These expectations are encoded in automation, not just written on a wiki.
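As a rough illustration of what "encoded in automation" can look like, the sketch below expresses the tier table as data that a guard step could look up. The `EnvPolicy` class, its field names, and the specific flag values are assumptions made for this example; they are not taken from the hybridops.common collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvPolicy:
    """Hypothetical per-environment policy record (illustrative only)."""
    risk: str
    requires_approval: bool
    window_restricted: bool


# Assumed values, loosely derived from the tier table above.
POLICIES = {
    "dev": EnvPolicy(risk="low", requires_approval=False, window_restricted=False),
    "staging": EnvPolicy(risk="medium", requires_approval=False, window_restricted=True),
    "prod": EnvPolicy(risk="high", requires_approval=True, window_restricted=True),
}


def policy_for(env: str) -> EnvPolicy:
    """Return the policy for an environment, rejecting unknown names."""
    try:
        return POLICIES[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None
```

Keeping the policy in one lookup means an unknown or misspelled environment fails loudly instead of silently inheriting permissive defaults.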
For a deeper look at how platform-level policies are defined and enforced, see:
3. The Environment Guard Framework (EGF)
The Environment Guard Framework is the set of roles and patterns that enforce environment rules at run time. It lives primarily in the hybridops.common Ansible collection and is defined in detail in ADR-0600 – Environment Guard Framework.
At a high level, the EGF pipeline is:
flowchart LR
    A[env_guard<br/>Governance / risk] --> B[gen_inventory<br/>Placeholder inventory]
    B --> C[host_selector<br/>Host targeting]
    C --> D[ip_mapper<br/>Env-aware IP mapping]
    D --> E[connectivity_test<br/>Connectivity + artefacts]
    E --> F[deploy<br/>Application / infra change]
In CI and other non-interactive runs, env_guard reads the target environment from variables (for example env) or environment variables such as HOS_ENV, and runs in a fully non-interactive mode.
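The non-interactive lookup described above might be sketched as follows. This is a minimal illustration, not the role's actual logic: the function name `resolve_env`, the precedence of the explicit `env` variable over `HOS_ENV`, and the case normalisation are all assumptions.

```python
import os

VALID_ENVS = ("dev", "staging", "prod")


def resolve_env(extra_vars, environ=None):
    """Pick the target environment without prompting.

    Assumed precedence: an explicit `env` variable wins over HOS_ENV.
    """
    environ = os.environ if environ is None else environ
    value = extra_vars.get("env") or environ.get("HOS_ENV")
    if value is None:
        raise ValueError("no target environment: set env or HOS_ENV")
    value = value.strip().lower()
    if value not in VALID_ENVS:
        raise ValueError(f"unsupported environment: {value!r}")
    return value
```

Failing when neither source is set, rather than defaulting to an environment, keeps a misconfigured CI job from landing somewhere unintended.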
- env_guard – validates the target environment, performs risk scoring, enforces maintenance windows, and generates a Correlation ID (CID) for the run.
- gen_inventory – (legacy / bridge) generates environment-specific inventories from structured data where NetBox is not yet the source of truth.
- host_selector – chooses which hosts are in scope, with multiple selection methods (manual, group-based, hierarchical, bulk).
- ip_mapper – resolves placeholder addresses into environment-specific IPs when needed. In NetBox-first setups this becomes a bridge/fallback role.
- connectivity_test – runs multi-protocol reachability checks (ICMP, SSH, HTTP/HTTPS) and writes JSON/JSONL artefacts before any risky changes.
- deploy – application or infrastructure deployment roles that apply the actual change under the same CID.
All of these steps are CID-aware: logs and artefacts can be tied back to a single execution across roles and pipelines.
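To make the CID idea concrete, the sketch below runs a list of stages in order, stamps every log entry with one shared CID, and stops at the first failing stage. The CID format (a random UUID in hex), the `run_pipeline` helper, and the log entry shape are assumptions for illustration; the real roles are Ansible roles, not Python callables.

```python
import uuid


def new_cid():
    """Generate a correlation ID. The format is an assumption."""
    return uuid.uuid4().hex


def run_pipeline(stages, cid=None):
    """Run (name, step) pairs in order under a single CID.

    Each step receives the CID and returns True on success.
    Later stages are skipped once a stage fails.
    """
    cid = cid or new_cid()
    log = []
    for name, step in stages:
        ok = bool(step(cid))
        log.append({"cid": cid, "stage": name, "ok": ok})
        if not ok:
            break
    return log
```

For example, if a stand-in for connectivity_test returns False, the returned log contains entries for env_guard and connectivity_test only, both carrying the same CID, and the deploy stage never runs.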
For role-level details and where they are used, see the Ansible Collections and Roles Index.
4. NetBox as source of truth
Inventory and addressing decisions are not hard-coded into playbooks.
Instead, NetBox is used as the source of truth for:
- Sites, VLANs, prefixes, and addresses.
- Devices, VMs, and their primary IPs.
- Tags, roles, and other selectors used by automation.
Ansible then consumes NetBox-backed dynamic inventories such as:
- deployment/inventories/core/netbox.yml
- deployment/inventories/env/dev/netbox.yml (and equivalents for staging/prod)
That means:
- Environment guardrails are enforced on top of a single, consistent inventory model.
- DR rebuilds can reconstruct inventory deterministically from NetBox.
- Evidence (CSV exports, seed logs, Ansible runs) all point back to the same authority.
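The sketch below shows, under simplifying assumptions, how tag and role selectors from a single inventory model can scope a run. It works on plain dictionaries rather than the NetBox API, and the record fields (`name`, `tags`, `role`, `primary_ip`) are hypothetical stand-ins for whatever the dynamic inventory actually exposes.

```python
def select_hosts(records, env, role=None):
    """Return sorted host names tagged for `env`, optionally filtered by role.

    Records without a primary IP are skipped because they cannot be targeted.
    """
    selected = []
    for rec in records:
        if env not in rec.get("tags", ()):
            continue
        if role is not None and rec.get("role") != role:
            continue
        if not rec.get("primary_ip"):
            continue
        selected.append(rec["name"])
    return sorted(selected)
```

Because the result depends only on the inventory records and the selectors, the same inputs always yield the same host list, which is the property a deterministic DR rebuild relies on.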
NetBox’s role as the source of truth is captured in ADR-0002 – NetBox-Driven Inventory.
For how NetBox data is exported/imported and surfaced as evidence, see also:
5. How guardrails show up in day-to-day work
5.1 Engineering work in dev
Scenario: an engineer wants to change RKE2 configuration or try a new addon.
- They target dev explicitly (-e env=dev) or via CI defaults.
- env_guard confirms that dev is low-risk, always-open, and does not require manual approval.
- host_selector scopes the change to dev control-plane nodes.
- connectivity_test runs and records a CID-tagged snapshot of reachability.
- The change is applied, and follow-up checks see the impact.
If something fails, it fails early in dev with clear logs; prod is untouched.
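A CID-tagged reachability snapshot of the kind described in this scenario could be produced along these lines. This is a sketch covering a TCP probe only; the record fields, the function names, and the JSONL layout are assumptions, and the real connectivity_test role also covers ICMP and HTTP/HTTPS.

```python
import json
import socket
from datetime import datetime, timezone


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def connectivity_records(cid, targets, probe=tcp_reachable):
    """Probe each (host, port) pair and tag every result with the CID."""
    return [
        {
            "cid": cid,
            "host": host,
            "port": port,
            "reachable": bool(probe(host, port)),
            "checked_at": datetime.now(timezone.utc).isoformat(),
        }
        for host, port in targets
    ]


def to_jsonl(records):
    """Serialise records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
```

The `probe` parameter is injectable so the record format can be exercised without touching the network, for instance with a stub that reports every host as reachable.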
5.2 Rehearsal and sign-off in staging
Scenario: a release candidate or infrastructure change is deployed to staging.
- CI or a release playbook sets env=staging.
- env_guard:
  - Computes a risk score based on environment, number of hosts, and timing.
  - Validates that the run is within staging business hours.
  - Issues a CID that appears in logs and Markdown audit reports.
- host_selector uses group-based or hierarchical selection to target only the relevant staging clusters.
- connectivity_test and the application playbooks run against staging only.
This creates an artefact trail that shows exactly what went to staging and when, including pre- and post-checks tied to a single CID.
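The risk score mentioned in the steps above combines environment, host count, and timing. The weights and thresholds in the sketch below are invented for illustration and do not reflect the scoring used by env_guard; only the three inputs come from this page.

```python
# Hypothetical base weights per environment.
ENV_WEIGHT = {"dev": 1, "staging": 3, "prod": 6}


def risk_score(env, host_count, in_window):
    """Combine environment, blast radius, and timing into one number.

    Higher is riskier. All weights and thresholds here are assumptions.
    """
    score = ENV_WEIGHT[env]
    if host_count > 10:
        score += 2  # wide blast radius
    elif host_count > 1:
        score += 1  # more than a single host
    if not in_window:
        score += 2  # running outside the expected time window
    return score
```

With these assumed weights, a five-host staging run inside business hours scores 4, while a twenty-host prod run outside its window scores 10.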
5.3 Controlled change in prod
Scenario: a production change (security patch, config adjustment, rollout) needs to be executed on live systems.
- The engineer must explicitly target env=prod or use a CI job wired to prod. There is no “silent” prod path.
- env_guard:
  - Detects prod and assigns a high risk score.
  - Enforces the maintenance window (for example, weekend daytime or an agreed change slot).
  - Requires justification text and, depending on configuration, can require a manual confirmation step (for example, typing a confirmation phrase).
  - Emits a CID that will be attached to all subsequent logs and reports.
- host_selector limits scope (for example, prod control plane only, or a specific service slice).
- connectivity_test runs; if the environment already looks unhealthy, the run can be blocked before making it worse.
- Only then do deployment roles run, under the same CID.
From an engineering-lead / hiring-manager perspective this means:
- Every production change is intentional, scoped, and timestamped.
- You can answer “who changed what, where, and when?” using concrete artefacts, not guesswork.
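The prod gate described in this section can be sketched as a function that returns the list of reasons a run should be blocked. The window definition (Saturday and Sunday, 08:00 to 18:00), the confirmation phrase, and the function names are assumptions chosen for the example, not the configured values.

```python
from datetime import datetime

# Hypothetical window: Saturday (5) and Sunday (6), 08:00 up to 18:00.
WINDOW_DAYS = {5, 6}
WINDOW_HOURS = range(8, 18)


def in_maintenance_window(now: datetime) -> bool:
    """True when `now` falls inside the assumed weekend daytime window."""
    return now.weekday() in WINDOW_DAYS and now.hour in WINDOW_HOURS


def prod_gate(now, justification, confirmation, expected_phrase="DEPLOY TO PROD"):
    """Return a list of blocking problems; an empty list means proceed."""
    problems = []
    if not in_maintenance_window(now):
        problems.append("outside maintenance window")
    if not justification.strip():
        problems.append("missing justification")
    if confirmation != expected_phrase:
        problems.append("confirmation phrase mismatch")
    return problems
```

Returning every failed check at once, rather than stopping at the first, gives the engineer a complete picture of what is missing before they retry.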
6. What this gives engineering leaders and hiring managers
From a leadership or hiring-manager point of view, the environment model and guardrails translate to:
- Operational discipline – changes to prod are gated, logged, and explainable without adding friction to dev and staging work.
- Clear separation of concerns – developers can experiment in dev, teams can validate in staging, and only proven changes move into prod with appropriate controls.
- Audit-ready artefacts – CIDs, logs, JSON/JSONL outputs, and Markdown reports give you hard evidence for incident reviews, due diligence, and compliance conversations.
- Reduced onboarding risk – new engineers inherit guardrails by default; you are not relying on everyone remembering “the rules” before they run their first playbook.
For concrete artefact locations and how they are packaged into evidence packs, see:
7. Where ADR-0600 fits
The detailed design of the Environment Guard Framework is captured in:
- ADR-0600 – Environment Guard Framework
  Describes the env_guard role, risk model, maintenance windows, correlation IDs, and how gen_inventory, host_selector, ip_mapper, and connectivity_test are chained together.
- ADR-0002 – NetBox-Driven Inventory
  Explains how NetBox is used as the source of truth for IPAM and inventory.
- Ansible Collections and Roles Index
  Shows where the EGF-related roles live, how they are tested, and how they are composed into end-to-end pipelines.
These sit under Architecture & Decisions and Reference in the docs and are cross-linked from HOWTOs and runbooks wherever the framework is used.
8. Summary
- The platform runs across dev, staging, and prod, each with a clearly defined risk profile.
- The Environment Guard Framework makes those profiles real by enforcing environment, scope, timing, and approvals in automation.
- NetBox acts as the source of truth for inventory and addressing so guardrails always operate on a consistent view of the estate.
- Every meaningful change, especially in prod, is intentional, governed, and traceable through CID-tagged artefacts.
This is the same discipline you would expect from a production platform in a commercial setting; the difference here is that it is also fully documented.