Environments & Guardrails

HybridOps.Studio runs across three clearly defined environments (dev, staging, prod), with explicit guardrails around who can change what, where, and when.

This page is aimed at engineering leaders, hiring managers, and other decision-makers who need to see how risk is controlled without slowing teams down. It also serves as background for any formal reviewers who want to understand how the Environment Guard Framework (EGF) is applied in practice.

This page sits alongside:

Together, they describe how environments are defined, how automation is governed, and how inventory and secrets behave across tiers.


1. Why environments and guardrails exist

The platform is designed to be:

  • Experiment-friendly in dev.
  • Realistic but reversible in staging.
  • Tightly controlled in prod.

To get there, we combine:

  • Environment tiers – dev / staging / prod with different risk expectations.
  • Guardrails – policy plus automation that enforce those expectations.
  • Evidence – logs and reports that prove the guardrails actually fired.

The goal is simple:

Engineers can move quickly, but nobody can “accidentally prod” without hitting visible checks, approvals, and audit trails.


2. Environment tiers at a glance

| Environment | Purpose | Risk profile | Typical users |
|---|---|---|---|
| dev | Everyday changes, experiments | Low – intentionally changeable | Individual engineers, CI |
| staging | Pre-prod, UAT, integration testing | Medium – controlled | Teams, release managers |
| prod | Live platform, external users | High – tightly gated | SREs, senior engineers, on-call |

Key differences:

  • Approvals – prod changes can require explicit approvals and justifications.
  • Windows – prod deploys are restricted to maintenance windows by default.
  • Observability – higher logging and alerting expectations in staging/prod.
  • Rollbacks – DR and rollback drills are always rehearsed in staging before a change is adopted in prod.

These expectations are encoded in automation, not just written on a wiki.
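As an illustration of what "encoded in automation" can look like, the tier table can be expressed as data that a guard step looks up before doing anything else. This is a minimal sketch, not the EGF's actual configuration: `TierPolicy`, `TIER_POLICIES`, `policy_for`, and the field names are hypothetical, and the values simply mirror the expectations listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPolicy:
    """Risk expectations for one environment tier (hypothetical shape)."""
    risk: str                # "low", "medium" or "high"
    requires_approval: bool  # explicit approval and justification needed
    time_restricted: bool    # runs limited to agreed hours or windows


# Hypothetical values that mirror the table above, not real EGF settings.
TIER_POLICIES = {
    "dev": TierPolicy(risk="low", requires_approval=False, time_restricted=False),
    "staging": TierPolicy(risk="medium", requires_approval=False, time_restricted=True),
    "prod": TierPolicy(risk="high", requires_approval=True, time_restricted=True),
}


def policy_for(env: str) -> TierPolicy:
    """Return the policy for a tier, rejecting anything that is not a known tier."""
    try:
        return TIER_POLICIES[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None
```

Keeping the policy in one table means an unknown or misspelled environment fails loudly instead of falling through to a default.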

For a deeper look at how platform-level policies are defined and enforced, see:


3. The Environment Guard Framework (EGF)

The Environment Guard Framework is the set of roles and patterns that enforce environment rules at run time. It lives primarily in the hybridops.common Ansible collection and is defined in detail in:

At a high level, the EGF pipeline is:

flowchart LR
    A[env_guard<br/>Governance / risk] --> B[gen_inventory<br/>Placeholder inventory]
    B --> C[host_selector<br/>Host targeting]
    C --> D[ip_mapper<br/>Env-aware IP mapping]
    D --> E[connectivity_test<br/>Connectivity + artefacts]
    E --> F[deploy<br/>Application / infra change]

In CI and other non-interactive runs, env_guard reads the target environment from an Ansible variable (for example env) or from an environment variable such as HOS_ENV, and runs without any interactive prompts.
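The lookup described here can be sketched in a few lines. This is an illustration only: the real role is an Ansible role, the function name `resolve_env` is hypothetical, and the precedence shown (explicit variable first, then HOS_ENV) is an assumption rather than documented behaviour.

```python
import os
from typing import Mapping, Optional

VALID_ENVS = ("dev", "staging", "prod")


def resolve_env(extra_vars: Mapping[str, str],
                environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the target environment without prompting.

    Assumed precedence: an explicit `env` variable wins over HOS_ENV.
    """
    environ = os.environ if environ is None else environ
    candidate = extra_vars.get("env") or environ.get("HOS_ENV")
    if candidate is None:
        raise ValueError("no target environment: pass env or set HOS_ENV")
    candidate = candidate.strip().lower()
    if candidate not in VALID_ENVS:
        raise ValueError(f"unknown environment: {candidate!r}")
    return candidate
```

Failing when nothing is set, rather than defaulting to a tier, matches the principle that a run never reaches an environment it was not explicitly pointed at.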

  • env_guard – validates the target environment, performs risk scoring, enforces maintenance windows, and generates a Correlation ID (CID) for the run.
  • gen_inventory – (legacy / bridge) generates environment-specific inventories from structured data where NetBox is not yet the source of truth.
  • host_selector – chooses which hosts are in scope, with multiple selection methods (manual, group-based, hierarchical, bulk).
  • ip_mapper – resolves placeholder addresses into environment-specific IPs when needed. In NetBox-first setups this becomes a bridge/fallback role.
  • connectivity_test – runs multi-protocol reachability checks (ICMP, SSH, HTTP/HTTPS) and writes JSON/JSONL artefacts before any risky changes.
  • deploy – application or infrastructure deployment roles that apply the actual change under the same CID.

All of these steps are CID-aware: logs and artefacts can be tied back to a single execution across roles and pipelines.
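To make the CID idea concrete, the sketch below threads one identifier through every stage and stops at the first failure. It is not the collection's implementation: the CID format (a truncated UUID), the record fields, and the `run_pipeline` helper are all assumptions for illustration. Only the stage names come from the pipeline above.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Callable, List

STAGES = ["env_guard", "gen_inventory", "host_selector",
          "ip_mapper", "connectivity_test", "deploy"]


def new_cid() -> str:
    """Generate a correlation ID (hypothetical format: 12 hex characters)."""
    return uuid.uuid4().hex[:12]


def run_pipeline(env: str, stage_fn: Callable[[str, str], bool]) -> List[str]:
    """Run each stage in order under one CID and return JSONL log lines.

    `stage_fn(stage, env)` stands in for the real role and returns True on success.
    """
    cid = new_cid()
    lines = []
    for stage in STAGES:
        ok = stage_fn(stage, env)
        lines.append(json.dumps({
            "cid": cid,
            "env": env,
            "stage": stage,
            "ok": ok,
            "ts": datetime.now(timezone.utc).isoformat(),
        }))
        if not ok:
            break  # later stages, including deploy, never run
    return lines
```

Because every record carries the same `cid`, filtering a log on that one value reconstructs the whole run, including the point at which it stopped.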

For role-level details and where they are used, see:


4. NetBox as source of truth

Inventory and addressing decisions are not hard-coded into playbooks.

Instead, NetBox is used as the source of truth for:

  • Sites, VLANs, prefixes, and addresses.
  • Devices, VMs, and their primary IPs.
  • Tags, roles, and other selectors used by automation.

Ansible then consumes NetBox-backed dynamic inventories such as:

  • deployment/inventories/core/netbox.yml
  • deployment/inventories/env/dev/netbox.yml (and equivalents for staging/prod)

That means:

  • Environment guardrails are enforced on top of a single, consistent inventory model.
  • DR rebuilds can reconstruct inventory deterministically from NetBox.
  • Evidence (CSV exports, seed logs, Ansible runs) all point back to the same authority.
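The "deterministic rebuild" point can be shown with a small sketch: given the same source records, grouping and sorting always produce the same inventory, regardless of the order in which records arrive. The record shape below (`name`, `primary_ip`, `tags`) is a simplified, hypothetical stand-in and not the NetBox API or the output of the Ansible inventory plugin.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def build_inventory(devices: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Group device names by tag, with sorted groups and sorted members."""
    groups = defaultdict(list)
    for device in devices:
        if not device.get("primary_ip"):
            continue  # a host with no primary IP cannot be targeted
        for tag in device.get("tags", []):
            groups[tag].append(device["name"])
    # Sorting both keys and members makes the result independent of input order.
    return {tag: sorted(names) for tag, names in sorted(groups.items())}
```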

NetBox’s role as the source of truth is captured in:

For how NetBox data is exported/imported and surfaced as evidence, see also:


5. How guardrails show up in day-to-day work

5.1 Engineering work in dev

Scenario: an engineer wants to change RKE2 configuration or try a new addon.

  • They target dev explicitly (-e env=dev) or via CI defaults.
  • env_guard confirms that dev is low-risk, always-open, and does not require manual approval.
  • host_selector scopes the change to dev control-plane nodes.
  • connectivity_test runs and records a CID-tagged snapshot of reachability.
  • The change is applied, and follow-up checks see the impact.

If something fails, it fails early in dev with clear logs; prod is untouched.
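The CID-tagged reachability snapshot in this scenario might look roughly like the sketch below. It covers only a plain TCP connect (for example to an SSH port), not the ICMP or HTTP checks the role also performs, and the function names and record fields are hypothetical.

```python
import json
import socket
from datetime import datetime, timezone
from typing import Iterable, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


def snapshot(cid: str, targets: Iterable[Tuple[str, int]], path: str) -> int:
    """Append one JSONL record per target to `path`; return the failure count."""
    failures = 0
    with open(path, "a", encoding="utf-8") as fh:
        for host, port in targets:
            ok = tcp_reachable(host, port)
            if not ok:
                failures += 1
            fh.write(json.dumps({
                "cid": cid,
                "host": host,
                "port": port,
                "ok": ok,
                "ts": datetime.now(timezone.utc).isoformat(),
            }) + "\n")
    return failures
```

A caller can refuse to continue when the returned failure count is non-zero, which is how a pre-check stops a change before it touches an already unhealthy environment.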


5.2 Rehearsal and sign-off in staging

Scenario: a release candidate or infrastructure change is deployed to staging.

  • CI or a release playbook sets env=staging.
  • env_guard:
      • Computes a risk score based on environment, number of hosts, and timing.
      • Validates that the run is within staging business hours.
      • Issues a CID that appears in logs and Markdown audit reports.
  • host_selector uses group-based or hierarchical selection to target only the relevant staging clusters.
  • connectivity_test and the application playbooks run against staging only.

This creates an artefact trail that shows exactly what went to staging and when, including pre- and post-checks tied to a single CID.
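The risk score mentioned in this scenario combines environment, host count, and timing. The sketch below shows one way those three inputs could be combined. The weights, thresholds, and the definition of business hours (weekdays, 09:00 to 17:00) are invented for illustration and are not the values env_guard uses.

```python
from datetime import datetime

# Hypothetical weights: higher tiers start from a higher baseline.
ENV_WEIGHT = {"dev": 1, "staging": 3, "prod": 6}


def in_business_hours(when: datetime) -> bool:
    """Assumed definition: Monday to Friday, 09:00 to 17:00."""
    return when.weekday() < 5 and 9 <= when.hour < 17


def risk_score(env: str, host_count: int, when: datetime) -> int:
    """Combine environment, blast radius, and timing into a single score."""
    score = ENV_WEIGHT[env]
    if host_count > 10:
        score += 2  # wide blast radius
    elif host_count > 1:
        score += 1
    if not in_business_hours(when):
        score += 1  # fewer people around to notice a problem
    return score
```

For example, a five-host staging run on a Wednesday morning scores 3 + 1 = 4 under these assumed weights, while the same run late on a Saturday scores 5.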


5.3 Controlled change in prod

Scenario: a production change (security patch, config adjustment, rollout) needs to be executed on live systems.

  • The engineer must explicitly target env=prod or use a CI job wired to prod. There is no “silent” prod path.
  • env_guard:
      • Detects prod and assigns a high risk score.
      • Enforces the maintenance window (for example, weekend daytime or an agreed change slot).
      • Requires justification text and, depending on configuration, can require a manual confirmation step (for example, typing a confirmation phrase).
      • Emits a CID that will be attached to all subsequent logs and reports.
  • host_selector limits scope (for example, prod control plane only, or a specific service slice).
  • connectivity_test runs; if the environment already looks unhealthy, the run can be blocked before making it worse.
  • Only then do deployment roles run, under the same CID.

From an engineering-lead / hiring-manager perspective this means:

  • Every production change is intentional, scoped, and timestamped.
  • You can answer “who changed what, where, and when?” using concrete artefacts, not guesswork.
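The prod checks described in this section (maintenance window, justification, optional confirmation phrase) can be sketched as a single gate that either passes silently or raises. The window definition (Saturday or Sunday, 08:00 to 18:00), the `GuardError` type, and the function names are hypothetical; the real checks live in the env_guard role.

```python
from datetime import datetime
from typing import Optional


class GuardError(Exception):
    """Raised when a prod run fails a guardrail check."""


def in_maintenance_window(when: datetime) -> bool:
    """Assumed window: Saturday or Sunday, 08:00 to 18:00."""
    return when.weekday() >= 5 and 8 <= when.hour < 18


def check_prod_gate(when: datetime,
                    justification: str,
                    confirmation: Optional[str] = None,
                    required_phrase: Optional[str] = None) -> None:
    """Raise GuardError unless every prod precondition is met."""
    if not in_maintenance_window(when):
        raise GuardError("outside the maintenance window")
    if not justification or not justification.strip():
        raise GuardError("a justification is required for prod changes")
    # The confirmation step is optional and only enforced when configured.
    if required_phrase is not None and confirmation != required_phrase:
        raise GuardError("confirmation phrase does not match")
```

Raising on the first failed check keeps the failure reason unambiguous in the run log.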

6. What this gives engineering leaders and hiring managers

From a leadership or hiring-manager point of view, the environment model and guardrails translate to:

  • Operational discipline
    Changes to prod are gated, logged, and explainable without adding friction to dev and staging work.

  • Clear separation of concerns
    Developers can experiment in dev, teams can validate in staging, and only proven changes move into prod with appropriate controls.

  • Audit-ready artefacts
    CIDs, logs, JSON/JSONL outputs, and Markdown reports give you hard evidence for incident reviews, due diligence, and compliance conversations.

  • Reduced onboarding risk
    New engineers inherit guardrails by default; you are not relying on everyone remembering “the rules” before they run their first playbook.

For concrete artefact locations and how they are packaged into evidence packs, see:


7. Where ADR-0600 fits

The detailed design of the Environment Guard Framework is captured in:

These sit under Architecture & Decisions and Reference in the docs and are cross-linked from HOWTOs and runbooks wherever the framework is used.


8. Summary

  • The platform runs across dev, staging, and prod, each with a clearly defined risk profile.
  • The Environment Guard Framework makes those profiles real by enforcing environment, scope, timing, and approvals in automation.
  • NetBox acts as the source of truth for inventory and addressing so guardrails always operate on a consistent view of the estate.
  • Every meaningful change, especially in prod, is intentional, governed, and traceable through CID-tagged artefacts.

This is the same discipline you would expect from a production platform in a commercial setting; the difference here is that it is also fully documented.