Environments & Guardrails
HybridOps treats an environment (env) as a logical lane, an isolated execution context, rather than as a provider-specific slice.
This means one env can legitimately contain the on-prem, cloud, edge, and control-plane state for the same lane, while `shared` holds only the assets intentionally reused across lanes.
This page sits alongside:
- ADR-0600 – Environment Guard Framework
- ADR-0002 – NetBox-Driven Inventory
- Secrets Lifecycle
- Ansible collections
Those documents cover how environments are defined, how automation is governed, and how inventory and secrets behave across lanes.
Environment binding rule
HybridOps treats an env as a logical lane.
That means:
- `dev` is expected to hold the `dev` on-prem, cloud, and edge state for the same lane
- `shared` is the normal cross-env authority for assets intentionally shared between lanes
- other cross-env state references are exceptional and should be used only for controlled drills or migrations
Examples:
- Good steady-state pattern:
  - `dev` on-prem platform + `dev` GCP burst + `dev` DR resources, all in `--env dev`
- Good shared-authority pattern:
  - `dev` DNS or runner flows consuming `shared` control-plane state
- Not the normal operating model:
  - long-lived `dev-onprem` and `dev-gcp` envs wired together through cross-env refs
Non-shared cross-env state resolution fails fast unless the run sets allow_cross_env_state=true explicitly.
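The binding rule reduces to one small decision per state reference. A minimal sketch, assuming a hypothetical `check_state_ref` helper (not part of the hyops CLI or its documented internals): same-lane and `shared` references pass, and anything else needs the explicit `allow_cross_env_state` opt-in or fails before any state is read.

```python
class CrossEnvStateError(Exception):
    """Raised when a run reads another lane's state without opting in."""


def check_state_ref(run_env: str, ref_env: str,
                    allow_cross_env_state: bool = False) -> str:
    """Classify a state reference from run_env to ref_env, or fail fast.

    Hypothetical helper that mirrors the binding rule on this page.
    """
    if ref_env == run_env:
        return "same-lane"           # normal: a lane reads its own state
    if ref_env == "shared":
        return "shared-authority"    # normal: intentionally shared assets
    if allow_cross_env_state:
        return "cross-env-override"  # exceptional: drills or migrations
    raise CrossEnvStateError(
        f"env '{run_env}' may not read state from env '{ref_env}' "
        "unless allow_cross_env_state=true is set for this run"
    )


if __name__ == "__main__":
    print(check_state_ref("dev", "shared"))  # shared-authority
    try:
        check_state_ref("dev", "prod")
    except CrossEnvStateError as exc:
        print(f"blocked: {exc}")
```

In this sketch the override is a per-run argument rather than a lane setting, so a long-lived `dev-onprem` to `dev-gcp` dependency would have to opt in on every single run.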
For routine inspection of the resulting env posture, prefer:
```shell
hyops show init --env <env>
hyops show module --env <env> <module-ref[#instance]>
hyops show env --env <env>
```
1. Why environments and guardrails exist
The platform is designed so that:
- routine experimentation can happen without touching live service lanes
- controlled drills and migrations can happen explicitly
- shared authorities remain stable across those lanes
To achieve this, HybridOps combines:
- logical lanes such as `shared`, `dev`, `drill`, `staging`, and `prod`
- guardrails that enforce scope, approvals, timing, and cross-env boundaries
- run records that make each execution reviewable afterward
2. Common lane patterns
HybridOps does not force a fixed three-env model. The common lane patterns are:
| Lane | Purpose | Notes |
|---|---|---|
| `shared` | Intentional shared authorities | DNS, runners, control-plane services, or other assets meant to be reused |
| `dev` | Integrated working lane | Can include on-prem, cloud, and edge state for the same delivery lane |
| `drill` | Controlled rehearsal lane | Used for DR proofs, recovery drills, and destructive validation |
| `staging` | Pre-production lane | Optional, when a team needs a formal pre-prod promotion path |
| `prod` | Live service lane | Highest guardrails and approval expectations |
The important rule is not the lane name. It is whether the lane is:
- integrated and self-contained
- intentionally shared
- or an explicitly controlled exception such as a drill or migration
For a deeper look at how platform-level policies are defined and enforced, see:
3. The Environment Guard Framework (EGF)
The Environment Guard Framework is the set of roles and patterns that enforce environment rules at run time. It lives primarily in the `hybridops.common` Ansible collection and is defined in detail in ADR-0600 – Environment Guard Framework.
At a high level, the EGF pipeline is:
```mermaid
flowchart LR
  A[env_guard<br/>Governance / risk] --> B[gen_inventory<br/>Placeholder inventory]
  B --> C[host_selector<br/>Host targeting]
  C --> D[ip_mapper<br/>Env-aware IP mapping]
  D --> E[connectivity_test<br/>Connectivity + records]
  E --> F[deploy<br/>Application / infra change]
```
In CI and other non-interactive runs, `env_guard` reads the target environment from a variable (for example `env`) or from an environment variable such as `HOS_ENV`, and does not prompt for input.
- `env_guard` – validates the target environment, performs risk scoring, enforces maintenance windows, and generates a Correlation ID (CID) for the run.
- `gen_inventory` – (legacy / bridge) generates environment-specific inventories from structured data where NetBox is not yet the source of truth.
- `host_selector` – chooses which hosts are in scope, with multiple selection methods (manual, group-based, hierarchical, bulk).
- `ip_mapper` – resolves placeholder addresses into environment-specific IPs when needed. In NetBox-first setups this becomes a bridge/fallback role.
- `connectivity_test` – runs multi-protocol reachability checks (ICMP, SSH, HTTP/HTTPS) and writes JSON/JSONL run records before any risky changes.
- `deploy` – application or infrastructure deployment roles that apply the actual change under the same CID.
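For non-interactive runs, target resolution can be pictured as follows. This is a sketch, not `env_guard` itself: the precedence (an explicit `env` variable before `HOS_ENV`), the lane list, and the function name `resolve_target_env` are assumptions. There is deliberately no fallback lane, so a run that names no target stops instead of prompting or guessing.

```python
import os
from collections.abc import Mapping

# Common lanes from this page; HybridOps does not force a fixed set.
KNOWN_LANES = frozenset({"shared", "dev", "drill", "staging", "prod"})


def resolve_target_env(run_vars: Mapping[str, str],
                       environ: Mapping[str, str] = os.environ,
                       known_lanes: frozenset = KNOWN_LANES) -> str:
    """Resolve the target lane for a CI or other non-interactive run.

    Assumed precedence: an explicit run variable `env` (e.g. from -e env=dev)
    wins over the HOS_ENV environment variable. There is no default lane.
    """
    raw = run_vars.get("env") or environ.get("HOS_ENV")
    if not raw:
        # Never prompt and never guess: a missing target is a hard failure.
        raise ValueError("no target env: pass -e env=<lane> or set HOS_ENV")
    env = raw.strip().lower()
    if env not in known_lanes:
        raise ValueError(f"unknown env {env!r}")
    return env


if __name__ == "__main__":
    print(resolve_target_env({}, {"HOS_ENV": "staging"}))  # staging
```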
All of these steps are CID-aware: logs and run records can be tied back to a single execution across roles and pipelines.
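The CID-aware chaining can be sketched as a loop over stages that share one correlation ID and append one JSONL record per stage. The stage names mirror the EGF roles, but `run_pipeline`, the record fields, and the UUID-based CID format are assumptions for illustration; the real steps are Ansible roles, not Python callables.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Callable

# A stage takes the shared run context and returns True on success.
Stage = Callable[[dict], bool]


def run_pipeline(env: str, stages: list[tuple[str, Stage]], record_path: str) -> str:
    """Run EGF-style stages in order under a single correlation ID (CID).

    Every stage appends one JSON line to record_path tagged with the CID.
    The first failing stage stops the run, so later stages never execute.
    """
    cid = uuid.uuid4().hex  # assumed CID format
    context = {"env": env, "cid": cid}
    with open(record_path, "a", encoding="utf-8") as records:
        for name, stage in stages:
            ok = bool(stage(context))
            records.write(json.dumps({
                "cid": cid,
                "env": env,
                "stage": name,
                "ok": ok,
                "ts": datetime.now(timezone.utc).isoformat(),
            }) + "\n")
            if not ok:
                raise RuntimeError(f"stage {name!r} failed (cid={cid})")
    return cid
```

Because the failing stage is recorded before the run stops, a failed `connectivity_test` leaves its own record under the CID while `deploy` never starts.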
For role-level details and where the roles are used, see Ansible collections.
4. NetBox as source of truth
Inventory and addressing decisions are not hard-coded into playbooks.
Instead, NetBox is used as the source of truth for:
- Sites, VLANs, prefixes, and addresses.
- Devices, VMs, and their primary IPs.
- Tags, roles, and other selectors used by automation.
Ansible then consumes NetBox-backed dynamic inventories such as:
- `deployment/inventories/core/netbox.yml`
- `deployment/inventories/env/dev/netbox.yml` (and equivalents for staging/prod)
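A run typically needs both the core inventory and the lane's own inventory. A sketch, assuming the two are always combined and that other lanes follow the `env/<env>/netbox.yml` pattern shown for `dev`; `inventory_sources`, `ansible_playbook_args`, and the `site.yml` playbook name are hypothetical. `ansible-playbook` accepts `-i` more than once, so both sources can be passed side by side.

```python
from pathlib import PurePosixPath

# Layout from the examples above; other lanes are assumed to follow env/<env>/.
INVENTORY_ROOT = PurePosixPath("deployment/inventories")


def inventory_sources(env: str) -> list:
    """Core NetBox inventory plus the lane-specific one (assumed pairing)."""
    return [
        INVENTORY_ROOT / "core" / "netbox.yml",
        INVENTORY_ROOT / "env" / env / "netbox.yml",
    ]


def ansible_playbook_args(env: str, playbook: str) -> list:
    """Build an ansible-playbook argv; -i may be repeated to merge sources."""
    args = ["ansible-playbook"]
    for source in inventory_sources(env):
        args += ["-i", str(source)]
    args += ["-e", f"env={env}", playbook]
    return args


if __name__ == "__main__":
    print(" ".join(ansible_playbook_args("dev", "site.yml")))
```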
That means:
- Environment guardrails are enforced on top of a single, consistent inventory model.
- DR rebuilds can reconstruct inventory deterministically from NetBox.
- Run records (CSV exports, seed logs, Ansible runs) all point back to the same authority.
NetBox's role as the source of truth is captured in ADR-0002 – NetBox-Driven Inventory.
For how NetBox data is exported/imported and surfaced in the reference map, see also:
4. Lane and guardrail behavior in practice
4.1 Engineering work in dev
Scenario: an engineer wants to change RKE2 configuration or try a new addon.
- They target `dev` explicitly (`-e env=dev`) or via CI defaults.
- `env_guard` confirms that `dev` is low-risk, always open, and does not require manual approval.
- `host_selector` scopes the change to `dev` control-plane nodes.
- `connectivity_test` runs and records a CID-tagged snapshot of reachability.
- The change is applied, and follow-up checks confirm its impact.
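Host scoping in this scenario can be sketched as "filter by lane first, then by explicit names or by group". The `Host` type, the `control_plane` group name, and `select_hosts` are hypothetical; only manual and group-based selection are shown (not the hierarchical or bulk methods `host_selector` also supports), and in this sketch an unscoped call is rejected rather than treated as "all hosts".

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Host:
    name: str
    env: str
    groups: FrozenSet[str] = field(default_factory=frozenset)


def select_hosts(hosts: List[Host], env: str, *,
                 group: Optional[str] = None,
                 names: Optional[Set[str]] = None) -> List[Host]:
    """Scope a run to one lane, then narrow by explicit names or by group."""
    in_lane = [h for h in hosts if h.env == env]  # never leave the target lane
    if names is not None:  # manual selection
        missing = names - {h.name for h in in_lane}
        if missing:
            raise ValueError(f"hosts not in env {env!r}: {sorted(missing)}")
        return [h for h in in_lane if h.name in names]
    if group is not None:  # group-based selection
        return [h for h in in_lane if group in h.groups]
    raise ValueError("pass group= or names= to scope the run")


if __name__ == "__main__":
    inventory = [
        Host("dev-cp-1", "dev", frozenset({"control_plane"})),
        Host("dev-wk-1", "dev", frozenset({"workers"})),
        Host("prod-cp-1", "prod", frozenset({"control_plane"})),
    ]
    print([h.name for h in select_hosts(inventory, "dev", group="control_plane")])
    # ['dev-cp-1']
```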
If something fails, it fails early in dev; prod is untouched.
4.2 Rehearsal in drill or sign-off in staging
Scenario: a release candidate or infrastructure change is deployed to staging.
- CI or a release playbook sets `env=drill` or `env=staging`.
- `env_guard`:
  - Computes a risk score based on environment, number of hosts, and timing.
  - Validates that the run is within staging business hours.
  - Issues a CID that appears in logs and Markdown audit reports.
- `host_selector` uses group-based or hierarchical selection to target only the relevant staging clusters.
- `connectivity_test` and the application playbooks run against staging only.
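The risk inputs named above (environment, number of hosts, timing) and the staging business-hours check can be combined as in this sketch. The weights, the 09:00 to 17:00 Monday to Friday window, and the function names are assumptions; ADR-0600 defines the actual risk model.

```python
from datetime import datetime

# Hypothetical weights; ADR-0600 defines the real risk model.
ENV_WEIGHT = {"dev": 10, "drill": 20, "staging": 30, "prod": 60}


def within_business_hours(when: datetime, start_hour: int = 9,
                          end_hour: int = 17) -> bool:
    """Monday to Friday, start_hour <= hour < end_hour, in when's local time."""
    return when.weekday() < 5 and start_hour <= when.hour < end_hour


def risk_score(env: str, host_count: int, when: datetime) -> int:
    """Combine lane, blast radius, and timing into one number."""
    score = ENV_WEIGHT[env]
    score += 2 * min(host_count, 20)  # more hosts, larger blast radius (capped)
    if not within_business_hours(when):
        score += 15                   # assumed penalty for off-hours runs
    return score


def check_staging_window(when: datetime) -> None:
    """Refuse staging runs outside business hours."""
    if not within_business_hours(when):
        raise PermissionError(
            f"staging runs are limited to business hours, got {when:%a %H:%M}")


if __name__ == "__main__":
    wednesday_10am = datetime(2024, 6, 5, 10, 0)
    print(risk_score("staging", 3, wednesday_10am))  # 36
```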
This creates a run trail that shows exactly what was exercised and when, including pre- and post-checks tied to a single CID.
4.3 Controlled change in prod
Scenario: a production change (security patch, config adjustment, rollout) needs to be executed on live systems.
- The engineer must explicitly target `env=prod` or use a CI job wired to prod. There is no "silent" prod path.
- `env_guard`:
  - Detects prod and assigns a high risk score.
  - Enforces the maintenance window (for example, weekend daytime or an agreed change slot).
  - Requires justification text and, depending on configuration, can require a manual confirmation step (for example, typing a confirmation phrase).
  - Emits a CID that will be attached to all subsequent logs and reports.
- `host_selector` limits scope (for example, prod control plane only, or a specific service slice).
- `connectivity_test` runs; if the environment already looks unhealthy, the run can be blocked before making it worse.
- Only then do deployment roles run, under the same CID.
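The prod checks can be sketched as a single gate that collects every failed condition before refusing the run. `MaintenanceWindow`, `prod_gate`, the weekend 08:00 to 18:00 slot, and the confirmation-phrase handling are illustrative assumptions, not the `env_guard` configuration format.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class MaintenanceWindow:
    weekdays: FrozenSet[int]  # 0 = Monday ... 6 = Sunday
    start: time
    end: time

    def contains(self, when: datetime) -> bool:
        return when.weekday() in self.weekdays and self.start <= when.time() < self.end


# Illustrative "weekend daytime" slot; real windows come from configuration.
WEEKEND_DAYTIME = MaintenanceWindow(frozenset({5, 6}), time(8, 0), time(18, 0))


def prod_gate(when: datetime, justification: str, *,
              window: MaintenanceWindow = WEEKEND_DAYTIME,
              required_phrase: Optional[str] = None,
              confirmation: Optional[str] = None) -> bool:
    """Allow a prod run only if every configured condition holds."""
    problems = []
    if not window.contains(when):
        problems.append("outside the maintenance window")
    if not justification.strip():
        problems.append("justification text is required")
    if required_phrase is not None and confirmation != required_phrase:
        problems.append("confirmation phrase does not match")
    if problems:
        # Report every failed condition at once instead of one per attempt.
        raise PermissionError("prod run blocked: " + "; ".join(problems))
    return True
```

Leaving `required_phrase` unset skips the confirmation step, which matches the "depending on configuration" wording above.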
From an operating perspective this means:
- Every production change is intentional, scoped, and timestamped.
- You can answer "who changed what, where, and when?" using concrete run records, not guesswork.
5. Where ADR-0600 fits
The detailed design of the Environment Guard Framework is captured in:
- ADR-0600 – Environment Guard Framework: describes the `env_guard` role, risk model, maintenance windows, correlation IDs, and how `gen_inventory`, `host_selector`, `ip_mapper`, and `connectivity_test` are chained together.
- ADR-0002 – NetBox-Driven Inventory: explains how NetBox is used as the source of truth for IPAM and inventory.
- Ansible collections: shows where the EGF-related roles live, how they are tested, and how they are composed into end-to-end pipelines.
These sit under Architecture & Decisions and Reference in the docs and are cross-linked from HOWTOs and runbooks wherever the framework is used.
6. Summary
- Environments are logical lanes, not provider slices.
- `shared` is the normal shared authority; other cross-env state is exceptional.
- The Environment Guard Framework makes lane boundaries, approvals, scope, and timing enforceable in automation.
- NetBox remains the source of truth for inventory and addressing where platform automation depends on it.
- Each meaningful run is intentional, governed, and traceable through CID-tagged run records.