Environments & Guardrails

HybridOps treats an environment as a logical lane (an isolated execution environment), not as a provider-specific slice.

That means one env can legitimately contain the on-prem, cloud, edge, and control-plane state for the same lane, while shared holds only the assets intentionally reused across lanes.

This page sits alongside:

Those documents cover how environments are defined, how automation is governed, and how inventory and secrets behave across lanes.

Environment binding rule

HybridOps treats an env as a logical lane.

That means:

dev is expected to hold the dev on-prem, cloud, and edge state for the same lane
shared is the normal cross-env authority for assets intentionally shared between lanes
other cross-env state references are exceptional and should be used only for controlled drills or migrations

Examples:

Good steady-state pattern: dev on-prem platform + dev GCP burst + dev DR resources all in --env dev
Good shared-authority pattern: dev DNS or runner flows consuming shared control-plane state
Not the normal operating model: long-lived dev-onprem and dev-gcp envs wired together through cross-env refs

Non-shared cross-env state resolution fails fast unless the run sets allow_cross_env_state=true explicitly.
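The fail-fast rule can be pictured with a small sketch. This is a hypothetical model of the behavior described above, not the HybridOps implementation: the function name `resolve_state_env` and the exception `CrossEnvStateError` are invented for illustration, and only the `allow_cross_env_state` flag and the `shared` lane come from this page.

```python
class CrossEnvStateError(RuntimeError):
    """Raised when a run references another lane's state without opting in."""


def resolve_state_env(run_env: str, ref_env: str,
                      allow_cross_env_state: bool = False) -> str:
    """Return the env whose state may be read, or fail fast.

    Hypothetical helper: same-env and shared references always resolve;
    any other cross-env reference needs the explicit opt-in flag.
    """
    if ref_env == run_env or ref_env == "shared":
        return ref_env
    if allow_cross_env_state:
        # Explicit exception, e.g. a controlled drill or migration.
        return ref_env
    raise CrossEnvStateError(
        f"run in env '{run_env}' references state in env '{ref_env}'; "
        "set allow_cross_env_state=true to permit this explicitly"
    )
```

Under these assumptions, a dev run reading `shared` or `dev` state resolves normally, while a dev run reading `prod` state raises unless the flag is set.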

For routine inspection of the resulting env posture, prefer:

hyops show init --env <env>
hyops show module --env <env> <module-ref[#instance]>
hyops show env --env <env>

1. Why environments and guardrails exist

The platform is designed so that:

routine experimentation can happen without touching live service lanes
controlled drills and migrations can happen explicitly
shared authorities remain stable across those lanes

To achieve this, HybridOps combines:

logical lanes such as shared, dev, drill, staging, and prod
guardrails that enforce scope, approvals, timing, and cross-env boundaries
run records that make each execution reviewable afterward

2. Common lane patterns

HybridOps does not force a fixed three-env model. The common lane patterns are:

| Lane | Purpose | Notes |
| --- | --- | --- |
| `shared` | Intentional shared authorities | DNS, runners, control-plane services, or other assets meant to be reused |
| `dev` | Integrated working lane | Can include on-prem, cloud, and edge state for the same delivery lane |
| `drill` | Controlled rehearsal lane | Used for DR proofs, recovery drills, and destructive validation |
| `staging` | Pre-production lane | Optional, when a team needs a formal pre-prod promotion path |
| `prod` | Live service lane | Highest guardrails and approval expectations |

The important rule is not the lane name. It is whether the lane is:

integrated and self-contained
intentionally shared
or an explicitly controlled exception such as a drill or migration

For a deeper look at how platform-level policies are defined and enforced, see:

3. The Environment Guard Framework (EGF)

The Environment Guard Framework is the set of roles and patterns that enforce environment rules at run time. It lives primarily in the hybridops.common Ansible collection and is defined in detail in:

ADR-0600 – Environment Guard Framework

At a high level, the EGF pipeline is:

flowchart LR
    A[env_guard<br/>Governance / risk] --> B[gen_inventory<br/>Placeholder inventory]
    B --> C[host_selector<br/>Host targeting]
    C --> D[ip_mapper<br/>Env-aware IP mapping]
    D --> E[connectivity_test<br/>Connectivity + records]
    E --> F[deploy<br/>Application / infra change]

In CI and other non-interactive runs, env_guard takes the target environment from an Ansible variable (for example env) or from an environment variable such as HOS_ENV, so no prompt is needed.
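A minimal sketch of that lookup follows. The precedence shown (explicit variable first, then HOS_ENV) and the decision to fail when neither is set are assumptions made for illustration; the function name `resolve_target_env` is hypothetical and is not part of the hybridops.common collection.

```python
import os
from typing import Mapping, Optional


def resolve_target_env(extra_vars: Mapping[str, str],
                       environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick the target environment without prompting.

    Assumed precedence: the `env` variable wins over HOS_ENV.
    """
    environ = os.environ if environ is None else environ
    value = extra_vars.get("env") or environ.get("HOS_ENV")
    if not value:
        # A non-interactive run cannot ask, so it stops instead of guessing.
        raise ValueError("no target environment: set env or HOS_ENV")
    return value
```

Failing when nothing is set mirrors the page's intent that there is no silent default lane, although the real role may behave differently.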

env_guard – validates the target environment, performs risk scoring, enforces maintenance windows, and generates a Correlation ID (CID) for the run.
gen_inventory – (legacy / bridge) generates environment-specific inventories from structured data where NetBox is not yet the source of truth.
host_selector – chooses which hosts are in scope, with multiple selection methods (manual, group-based, hierarchical, bulk).
ip_mapper – resolves placeholder addresses into environment-specific IPs when needed. In NetBox-first setups this becomes a bridge/fallback role.
connectivity_test – runs multi-protocol reachability checks (ICMP, SSH, HTTP/HTTPS) and writes JSON/JSONL run records before any risky changes.
deploy – application or infrastructure deployment roles that apply the actual change under the same CID.

All of these steps are CID-aware: logs and run records can be tied back to a single execution across roles and pipelines.
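The chaining idea can be sketched in a few lines. This is an illustrative model only: the real pipeline is built from Ansible roles, and in HybridOps the CID is generated by env_guard, whereas this sketch mints it in a hypothetical runner (`run_pipeline`) to keep the example short. The record fields are also assumptions.

```python
import uuid
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Stage = Tuple[str, Callable[[Context], None]]


def run_pipeline(env: str, stages: List[Stage]) -> List[Dict[str, str]]:
    """Run stages in order under one correlation ID; stop at the first failure."""
    cid = uuid.uuid4().hex  # simplification: the runner mints the CID here
    context: Context = {"env": env, "cid": cid}
    records: List[Dict[str, str]] = []
    for name, stage in stages:
        record = {"cid": cid, "env": env, "stage": name, "status": "ok"}
        try:
            stage(context)
        except Exception as exc:  # any stage failure blocks the later stages
            record["status"] = "failed"
            record["error"] = str(exc)
            records.append(record)
            break
        records.append(record)
    return records
```

Because every record carries the same `cid`, a failed connectivity check and the guard decision that preceded it can be joined afterward, and a stage such as deploy never runs once an earlier stage has failed.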

For role-level details and where they are used, see:

Ansible collections

4. NetBox as source of truth

Inventory and addressing decisions are not hard-coded into playbooks.

Instead, NetBox is used as the source of truth for:

Sites, VLANs, prefixes, and addresses.
Devices, VMs, and their primary IPs.
Tags, roles, and other selectors used by automation.

Ansible then consumes NetBox-backed dynamic inventories such as:

deployment/inventories/core/netbox.yml
deployment/inventories/env/dev/netbox.yml (and equivalents for staging/prod)

That means:

Environment guardrails are enforced on top of a single, consistent inventory model.
DR rebuilds can reconstruct inventory deterministically from NetBox.
Run records (CSV exports, seed logs, Ansible runs) all point back to the same authority.
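To make the "deterministic rebuild" point concrete, here is a small sketch that turns NetBox-like records into the JSON shape Ansible accepts from inventory scripts. It does not call NetBox; the record fields, the `env-<lane>` tag convention, and the function name `build_inventory` are assumptions for illustration, and addresses are shown without a prefix length for simplicity.

```python
from typing import Iterable


def build_inventory(records: Iterable[dict], env: str) -> dict:
    """Build an Ansible-style JSON inventory for one lane.

    Hosts are sorted by name so the same source data always yields
    the same inventory, regardless of the order records arrive in.
    """
    tag = f"env-{env}"  # hypothetical tag convention
    hosts = sorted(
        (r for r in records if tag in r.get("tags", []) and r.get("primary_ip")),
        key=lambda r: r["name"],
    )
    return {
        env: {"hosts": [r["name"] for r in hosts]},
        "_meta": {
            "hostvars": {r["name"]: {"ansible_host": r["primary_ip"]} for r in hosts}
        },
    }
```

Records without a primary IP are skipped in this sketch, since a host that cannot be addressed would only fail later in the run.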

NetBox's role as the source of truth is captured in:

ADR-0002 – NetBox-Driven Inventory

For how NetBox data is exported/imported and surfaced in the reference map, see also:

Reference map

4. Lane and guardrail behavior in practice

4.1 Engineering work in `dev`

Scenario: an engineer wants to change RKE2 configuration or try a new addon.

They target dev explicitly (-e env=dev) or via CI defaults.
env_guard confirms that dev is low-risk, always-open, and does not require manual approval.
host_selector scopes the change to dev control-plane nodes.
connectivity_test runs and records a CID-tagged snapshot of reachability.
The change is applied, and follow-up checks confirm its effect.

If something fails, it fails early in dev; prod is untouched.
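The reachability snapshot in the steps above can be sketched as follows. This is a simplified stand-in for connectivity_test, not its implementation: it only attempts TCP connections (which covers SSH and HTTP/HTTPS ports but not ICMP), and the record fields and function names are assumptions.

```python
import json
import socket
import time
from typing import Callable, Dict, Iterable, List, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reachability_records(cid: str, env: str,
                         targets: Iterable[Tuple[str, int]],
                         probe: Callable[[str, int], bool] = tcp_reachable
                         ) -> List[Dict[str, object]]:
    """Probe each target and return one CID-tagged record per check."""
    return [
        {"cid": cid, "env": env, "host": host, "port": port,
         "reachable": probe(host, port), "checked_at": time.time()}
        for host, port in targets
    ]


def to_jsonl(records: Iterable[Dict[str, object]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
```

The `probe` parameter exists so the record logic can be exercised without touching the network; a caller would append the `to_jsonl` output to whatever run-record file the pipeline uses.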

4.2 Rehearsal in `drill` or sign-off in `staging`

Scenario: a release candidate or infrastructure change is rehearsed in drill or deployed to staging for sign-off.

CI or a release playbook sets env=drill or env=staging.
env_guard:
    Computes a risk score based on environment, number of hosts, and timing.
    Validates that the run falls within the target lane's permitted hours (for example, staging business hours).
    Issues a CID that appears in logs and Markdown audit reports.
host_selector uses group-based or hierarchical selection to target only the relevant clusters in that lane.
connectivity_test and the application playbooks run against that lane only.

This creates a run trail that shows exactly what was exercised and when, including pre- and post-checks tied to a single CID.
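A risk score built from environment, host count, and timing might look like the sketch below. The weights, thresholds, and business hours are invented for illustration; the actual risk model is defined in ADR-0600 and may differ in every detail.

```python
from datetime import datetime, time

# Hypothetical per-lane base weights; not taken from ADR-0600.
ENV_WEIGHT = {"dev": 1, "drill": 2, "staging": 3, "prod": 5}


def risk_score(env: str, host_count: int, when: datetime,
               business_start: time = time(9), business_end: time = time(17)) -> int:
    """Combine lane, blast radius, and timing into a single integer score."""
    score = ENV_WEIGHT.get(env, 5)  # unknown lanes are treated as highest risk
    if host_count > 10:
        score += 2
    elif host_count > 1:
        score += 1
    in_hours = when.weekday() < 5 and business_start <= when.time() < business_end
    if not in_hours:
        score += 2  # runs outside weekday business hours score higher
    return score
```

Treating an unrecognized lane as high risk is a deliberate choice in this sketch: a typo in the env name should make a run harder to approve, not easier.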

4.3 Controlled change in `prod`

Scenario: a production change (security patch, config adjustment, rollout) needs to be executed on live systems.

The engineer must explicitly target env=prod or use a CI job wired to prod. There is no "silent" prod path.
env_guard:
    Detects prod and assigns a high risk score.
    Enforces the maintenance window (for example, weekend daytime or an agreed change slot).
    Requires justification text and, depending on configuration, can require a manual confirmation step (for example, typing a confirmation phrase).
    Emits a CID that will be attached to all subsequent logs and reports.
host_selector limits scope (for example, prod control plane only, or a specific service slice).
connectivity_test runs; if the environment already looks unhealthy, the run can be blocked before making it worse.
Only then do deployment roles run, under the same CID.
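The prod checks performed by env_guard in the steps above can be sketched as a single gate function. Everything specific here is an assumption: the weekend-daytime window is taken from the example in the text, while the hours, the confirmation phrase, and the names `ChangeRequest` and `gate_violations` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class ChangeRequest:
    env: str
    justification: str = ""
    confirmation: Optional[str] = None


def gate_violations(req: ChangeRequest, when: datetime,
                    window_days: Tuple[int, ...] = (5, 6),   # Saturday, Sunday
                    window_hours: Tuple[int, int] = (8, 18),  # daytime, assumed
                    phrase: str = "CONFIRM PROD") -> List[str]:
    """Return the reasons a run must be blocked; an empty list means it may proceed."""
    if req.env != "prod":
        return []  # this sketch only models the prod gate
    problems: List[str] = []
    in_window = (when.weekday() in window_days
                 and window_hours[0] <= when.hour < window_hours[1])
    if not in_window:
        problems.append("outside maintenance window")
    if not req.justification.strip():
        problems.append("missing justification")
    if req.confirmation != phrase:
        problems.append("confirmation phrase not provided")
    return problems
```

Returning all violations at once, rather than stopping at the first, gives the operator a complete list to fix before retrying.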

From an operating perspective this means:

Every production change is intentional, scoped, and timestamped.
You can answer "who changed what, where, and when?" using concrete run records, not guesswork.

5. Where ADR-0600 fits

The detailed design of the Environment Guard Framework is captured in:

ADR-0600 – Environment Guard Framework: describes the env_guard role, risk model, maintenance windows, correlation IDs, and how gen_inventory, host_selector, ip_mapper, and connectivity_test are chained together.
ADR-0002 – NetBox-Driven Inventory: explains how NetBox is used as the source of truth for IPAM and inventory.
Ansible collections: shows where the EGF-related roles live, how they are tested, and how they are composed into end-to-end pipelines.

These sit under Architecture & Decisions and Reference in the docs and are cross-linked from HOWTOs and runbooks wherever the framework is used.

6. Summary

Environments are logical lanes, not provider slices.
shared is the normal shared authority; other cross-env state is exceptional.
The Environment Guard Framework makes lane boundaries, approvals, scope, and timing enforceable in automation.
NetBox remains the source of truth for inventory and addressing where platform automation depends on it.
Each meaningful run is intentional, governed, and traceable through CID-tagged run records.