
Environments & Guardrails

HybridOps treats an environment as a logical lane (an isolated execution environment), not as a provider-specific slice.

That means one env can legitimately contain the on-prem, cloud, edge, and control-plane state for the same lane, while shared holds only the assets intentionally reused across lanes.

This page sits alongside:

Those documents cover how environments are defined, how automation is governed, and how inventory and secrets behave across lanes.


Environment binding rule

HybridOps treats an env as a logical lane.

That means:

  • dev is expected to hold the dev on-prem, cloud, and edge state for the same lane
  • shared is the normal cross-env authority for assets intentionally shared between lanes
  • other cross-env state references are exceptional and should be used only for controlled drills or migrations

Examples:

  • Good steady-state pattern:
      • dev on-prem platform + dev GCP burst + dev DR resources all in --env dev
  • Good shared-authority pattern:
      • dev DNS or runner flows consuming shared control-plane state
  • Not the normal operating model:
      • long-lived dev-onprem and dev-gcp envs wired together through cross-env refs

Non-shared cross-env state resolution fails fast unless the run sets allow_cross_env_state=true explicitly.
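
The fail-fast rule above can be sketched as a small resolution function. This is a minimal illustration, not the HybridOps implementation: the function name, the exception class, and the error text are hypothetical; only the shared lane and the allow_cross_env_state flag come from this page.

```python
SHARED_ENV = "shared"  # the lane other envs may read from without opting in


class CrossEnvStateError(RuntimeError):
    """Raised when a run references another lane's state without opting in."""


def resolve_state_env(run_env: str, ref_env: str,
                      allow_cross_env_state: bool = False) -> str:
    """Return the env whose state may be read, or fail fast (hypothetical helper)."""
    # Same-lane and shared references are the normal operating model.
    if ref_env == run_env or ref_env == SHARED_ENV:
        return ref_env
    # Any other cross-env reference needs the explicit opt-in.
    if allow_cross_env_state:
        return ref_env
    raise CrossEnvStateError(
        f"run in env '{run_env}' references state in env '{ref_env}'; "
        "set allow_cross_env_state=true to allow this explicitly"
    )
```

Under these assumptions, a dev run reading shared state resolves normally, while a dev run reading prod state raises unless the flag is set for that run.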

For routine inspection of the resulting env posture, prefer:

  • hyops show init --env <env>
  • hyops show module --env <env> <module-ref[#instance]>
  • hyops show env --env <env>

1. Why environments and guardrails exist

The platform is designed so that:

  • routine experimentation can happen without touching live service lanes
  • controlled drills and migrations can happen explicitly
  • shared authorities remain stable across those lanes

To achieve this, HybridOps combines:

  • logical lanes such as shared, dev, drill, staging, and prod
  • guardrails that enforce scope, approvals, timing, and cross-env boundaries
  • run records that make each execution reviewable afterward

2. Common lane patterns

HybridOps does not force a fixed three-env model. The common lane patterns are:

| Lane    | Purpose                       | Notes                                                                     |
|---------|-------------------------------|---------------------------------------------------------------------------|
| shared  | Intentional shared authorities | DNS, runners, control-plane services, or other assets meant to be reused  |
| dev     | Integrated working lane       | Can include on-prem, cloud, and edge state for the same delivery lane     |
| drill   | Controlled rehearsal lane     | Used for DR proofs, recovery drills, and destructive validation           |
| staging | Pre-production lane           | Optional, when a team needs a formal pre-prod promotion path              |
| prod    | Live service lane             | Highest guardrails and approval expectations                              |

The important rule is not the lane name. It is whether the lane is:

  • integrated and self-contained
  • intentionally shared
  • or an explicitly controlled exception such as a drill or migration

For a deeper look at how platform-level policies are defined and enforced, see:


3. The Environment Guard Framework (EGF)

The Environment Guard Framework is the set of roles and patterns that enforce environment rules at run time. It lives primarily in the hybridops.common Ansible collection and is defined in detail in:

At a high level, the EGF pipeline is:

flowchart LR
    A[env_guard<br/>Governance / risk] --> B[gen_inventory<br/>Placeholder inventory]
    B --> C[host_selector<br/>Host targeting]
    C --> D[ip_mapper<br/>Env-aware IP mapping]
    D --> E[connectivity_test<br/>Connectivity + records]
    E --> F[deploy<br/>Application / infra change]

In CI and other non-interactive runs, env_guard reads the target environment from variables (for example env) or from environment variables such as HOS_ENV, and proceeds without prompting.
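
The lookup described above can be sketched as follows. The precedence (explicit env variable first, then HOS_ENV), the function name, and the default lane set are assumptions for illustration; the page does not state how env_guard orders the two sources or which lanes a deployment allows.

```python
import os
from typing import Mapping, Optional

# Example lane names taken from this page; a real deployment may define others.
EXAMPLE_LANES = frozenset({"shared", "dev", "drill", "staging", "prod"})


def resolve_target_env(extra_vars: Mapping[str, str],
                       environ: Optional[Mapping[str, str]] = None,
                       known_envs: frozenset = EXAMPLE_LANES) -> str:
    """Pick the target env for a non-interactive run (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Assumed precedence: an explicit variable wins over HOS_ENV.
    candidate = extra_vars.get("env") or environ.get("HOS_ENV")
    if not candidate:
        raise ValueError("no target environment: pass env=<lane> or set HOS_ENV")
    if candidate not in known_envs:
        raise ValueError(f"unknown environment: {candidate!r}")
    return candidate
```

Failing when neither source is set keeps a CI job from silently falling back to a default lane.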

  • env_guard – validates the target environment, performs risk scoring, enforces maintenance windows, and generates a Correlation ID (CID) for the run.
  • gen_inventory – (legacy / bridge) generates environment-specific inventories from structured data where NetBox is not yet the source of truth.
  • host_selector – chooses which hosts are in scope, with multiple selection methods (manual, group-based, hierarchical, bulk).
  • ip_mapper – resolves placeholder addresses into environment-specific IPs when needed. In NetBox-first setups this becomes a bridge/fallback role.
  • connectivity_test – runs multi-protocol reachability checks (ICMP, SSH, HTTP/HTTPS) and writes JSON/JSONL run records before any risky changes.
  • deploy – application or infrastructure deployment roles that apply the actual change under the same CID.

All of these steps are CID-aware: logs and run records can be tied back to a single execution across roles and pipelines.
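
To make the CID-aware ordering concrete, here is a minimal sketch of a stage runner that stops at the first failing stage and keeps one CID for the whole run. The CID format, the context dictionary, and the stage callables are hypothetical. For brevity the CID is created before the first stage, whereas in the framework it is env_guard that generates it.

```python
import uuid
from typing import Any, Callable, Dict, List, Tuple

# A stage receives the shared run context and returns True when it passes.
Stage = Callable[[Dict[str, Any]], bool]


def new_cid() -> str:
    """Hypothetical CID format; the real format is not specified on this page."""
    return f"cid-{uuid.uuid4().hex[:12]}"


def run_pipeline(env: str, stages: List[Tuple[str, Stage]]) -> Dict[str, Any]:
    """Run stages in order under one CID and stop at the first failure."""
    ctx: Dict[str, Any] = {"cid": new_cid(), "env": env,
                           "completed": [], "failed_at": None}
    for name, stage in stages:
        if not stage(ctx):
            ctx["failed_at"] = name  # later stages, including deploy, never run
            break
        ctx["completed"].append(name)
    return ctx


# Example: a failing connectivity check blocks the deploy stage.
stages = [
    ("env_guard", lambda ctx: ctx["env"] in {"dev", "staging", "prod"}),
    ("connectivity_test", lambda ctx: False),
    ("deploy", lambda ctx: True),
]
result = run_pipeline("dev", stages)
```

In this example result["completed"] contains only env_guard and result["failed_at"] names connectivity_test, so the change is never applied.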

For role-level details and where they are used, see:


4. NetBox as source of truth

Inventory and addressing decisions are not hard-coded into playbooks.

Instead, NetBox is used as the source of truth for:

  • Sites, VLANs, prefixes, and addresses.
  • Devices, VMs, and their primary IPs.
  • Tags, roles, and other selectors used by automation.

Ansible then consumes NetBox-backed dynamic inventories such as:

  • deployment/inventories/core/netbox.yml
  • deployment/inventories/env/dev/netbox.yml (and equivalents for staging/prod)

That means:

  • Environment guardrails are enforced on top of a single, consistent inventory model.
  • DR rebuilds can reconstruct inventory deterministically from NetBox.
  • Run records (CSV exports, seed logs, Ansible runs) all point back to the same authority.
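
As an illustration of the deterministic-rebuild point, the sketch below turns NetBox-style records into an Ansible-style inventory for one lane. The record shape and the env-<lane> tag convention are assumptions, not the project's actual NetBox schema; in practice the NetBox-backed dynamic inventory files listed above do this work.

```python
from typing import Any, Dict, Iterable, Mapping


def build_inventory(records: Iterable[Mapping[str, Any]],
                    env: str) -> Dict[str, Dict[str, Dict[str, Dict[str, str]]]]:
    """Build a per-lane inventory from NetBox-like records (hypothetical shape)."""
    env_tag = f"env-{env}"  # assumed tag convention, for illustration only
    hosts = {
        rec["name"]: {"ansible_host": rec["primary_ip"]}
        # Sorting by name makes the output independent of input order.
        for rec in sorted(records, key=lambda rec: rec["name"])
        if env_tag in rec["tags"]
    }
    return {env: {"hosts": hosts}}


records = [
    {"name": "cp-02", "primary_ip": "10.10.0.12", "tags": ["env-dev"]},
    {"name": "cp-01", "primary_ip": "10.10.0.11", "tags": ["env-dev", "rke2"]},
    {"name": "cp-01-prod", "primary_ip": "10.20.0.11", "tags": ["env-prod"]},
]
dev_inventory = build_inventory(records, "dev")
```

Because the same records always yield the same inventory, a rebuild from NetBox data does not depend on the order in which records are returned.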

NetBox's role as the source of truth is captured in:

For how NetBox data is exported/imported and surfaced in the reference map, see also:


4. Lane and guardrail behavior in practice

4.1 Engineering work in dev

Scenario: an engineer wants to change RKE2 configuration or try a new addon.

  • They target dev explicitly (-e env=dev) or via CI defaults.
  • env_guard confirms that dev is low-risk, always-open, and does not require manual approval.
  • host_selector scopes the change to dev control-plane nodes.
  • connectivity_test runs and records a CID-tagged snapshot of reachability.
  • The change is applied, and follow-up checks confirm its impact.

If something fails, it fails early in dev; prod is untouched.
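
A CID-tagged reachability snapshot of the kind connectivity_test records could look like the sketch below. The field names and file layout are hypothetical, and only a plain TCP check is shown; the actual role also covers ICMP and HTTP/HTTPS and defines its own record schema.

```python
import json
import socket
from datetime import datetime, timezone
from typing import Any, Dict


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name resolution failed
        return False


def make_record(cid: str, env: str, host: str, port: int,
                reachable: bool) -> Dict[str, Any]:
    """Build one run record; field names are illustrative, not the real schema."""
    return {
        "cid": cid,
        "env": env,
        "host": host,
        "port": port,
        "reachable": reachable,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


def append_jsonl(path: str, record: Dict[str, Any]) -> None:
    """Append one JSON object per line so records from a run can be streamed."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
```

A run would call tcp_check for each in-scope host, wrap the result with make_record under the run's CID, and append it before any change is applied.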


4.2 Rehearsal in drill or sign-off in staging

Scenario: a release candidate or infrastructure change is rehearsed in drill or deployed to staging for sign-off.

  • CI or a release playbook sets env=drill or env=staging.
  • env_guard:
      • Computes a risk score based on environment, number of hosts, and timing.
      • Validates that the run falls within the lane's permitted hours (for example, staging business hours).
      • Issues a CID that appears in logs and Markdown audit reports.
  • host_selector uses group-based or hierarchical selection to target only the relevant clusters in the target lane.
  • connectivity_test and the application playbooks run against that lane only.

This creates a run trail that shows exactly what was exercised and when, including pre- and post-checks tied to a single CID.
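
The risk score mentioned in this scenario combines environment, host count, and timing. The sketch below shows one way such a score could be composed; the weights, the 0 to 100 scale, and the business-hours window are invented for illustration and are not the values env_guard uses.

```python
from datetime import datetime
from typing import Tuple

# Illustrative per-lane base weights (assumed, not the framework's values).
ENV_WEIGHT = {"dev": 0, "drill": 20, "staging": 30, "prod": 60}


def risk_score(env: str, host_count: int, when: datetime,
               business_hours: Tuple[int, int] = (9, 17)) -> int:
    """Combine lane, scope, and timing into a 0-100 score (hypothetical)."""
    score = ENV_WEIGHT.get(env, 60)  # unknown lanes are treated as high risk
    score += min(host_count, 20)     # wider scope adds up to 20 points
    start, end = business_hours
    in_hours = when.weekday() < 5 and start <= when.hour < end
    if not in_hours:
        score += 20                  # out-of-hours runs add risk
    return min(score, 100)


# Five staging hosts on a Wednesday morning.
staging_score = risk_score("staging", 5, datetime(2024, 1, 10, 10, 0))
```

With these assumed weights staging_score is 35, and the same run against 50 prod hosts late on a Saturday would reach the cap of 100.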


4.3 Controlled change in prod

Scenario: a production change (security patch, config adjustment, rollout) needs to be executed on live systems.

  • The engineer must explicitly target env=prod or use a CI job wired to prod. There is no "silent" prod path.
  • env_guard:
      • Detects prod and assigns a high risk score.
      • Enforces the maintenance window (for example, weekend daytime or an agreed change slot).
      • Requires justification text and, depending on configuration, can require a manual confirmation step (for example, typing a confirmation phrase).
      • Emits a CID that will be attached to all subsequent logs and reports.
  • host_selector limits scope (for example, prod control plane only, or a specific service slice).
  • connectivity_test runs; if the environment already looks unhealthy, the run can be blocked before making it worse.
  • Only then do deployment roles run, under the same CID.

From an operating perspective this means:

  • Every production change is intentional, scoped, and timestamped.
  • You can answer "who changed what, where, and when?" using concrete run records, not guesswork.
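
The prod checks in this scenario (maintenance window, justification, confirmation phrase) can be sketched as a single gate that returns every reason a run would be blocked. The dataclass, the default weekend-daytime window, and the confirmation phrase are hypothetical stand-ins for whatever a given deployment configures.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class ChangeRequest:
    env: str
    justification: str
    confirmation: Optional[str]
    when: datetime


def prod_gate_errors(req: ChangeRequest,
                     window_days: FrozenSet[int] = frozenset({5, 6}),  # Sat, Sun
                     window_hours: Tuple[int, int] = (9, 17),
                     required_phrase: str = "CONFIRM PROD") -> List[str]:
    """Return the reasons a prod run is blocked; an empty list means proceed."""
    errors: List[str] = []
    if req.env != "prod":
        return errors  # this gate only applies to the live service lane
    start, end = window_hours
    in_window = (req.when.weekday() in window_days
                 and start <= req.when.hour < end)
    if not in_window:
        errors.append("outside maintenance window")
    if not req.justification.strip():
        errors.append("missing justification")
    if req.confirmation != required_phrase:
        errors.append("confirmation phrase mismatch")
    return errors
```

Collecting all failures at once, rather than stopping at the first, gives the operator a complete picture of what must be fixed before the run can be retried.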

5. Where ADR-0600 fits

The detailed design of the Environment Guard Framework is captured in:

These sit under Architecture & Decisions and Reference in the docs and are cross-linked from HOWTOs and runbooks wherever the framework is used.


6. Summary

  • Environments are logical lanes, not provider slices.
  • shared is the normal shared authority; other cross-env state is exceptional.
  • The Environment Guard Framework makes lane boundaries, approvals, scope, and timing enforceable in automation.
  • NetBox remains the source of truth for inventory and addressing where platform automation depends on it.
  • Each meaningful run is intentional, governed, and traceable through CID-tagged run records.