Bootstrap NetBox Foundation (HyOps Blueprint)

  • Purpose: Produce a working on-prem NetBox foundation using a governed, repeatable blueprint run.

  • Owner: Platform engineering

  • Trigger: New environment bootstrap, platform rebuild, or lab environment reset

  • Impact: Builds images, provisions VMs, configures PostgreSQL and NetBox, then seeds NetBox with foundation IPAM/VM inventory
  • Severity: P2

  • Pre-reqs: Proxmox init completed for the environment, vault decrypt working, Ansible deps installed, blueprint inputs reviewed.

  • Rollback strategy: Destroy affected modules with the same overlays, or run hyops rebuild per-module.

Context

This runbook executes the product-level blueprint:

  • Blueprint: onprem/bootstrap-netbox@v1
  • Location (source checkout): hybridops-core/blueprints/onprem/bootstrap-netbox@v1/blueprint.yml

It sequences modules so that:

  1. SDN is converged
  2. Template image is built (if needed)
  3. PostgreSQL + NetBox VMs are provisioned
  4. PostgreSQL is configured
  5. NetBox is configured and validated
  6. Foundation SDN prefixes and bootstrap VMs are synced into NetBox (strict sync-only seed pass after NetBox API is ready)

Bootstrap NetBox seed behavior:

  • The blueprint no longer re-applies SDN/VM infrastructure to seed NetBox.
  • The NetBox module performs a sync-only seed pass after NetBox is online and the API is reachable from the controller.
  • The seed pass is strict: bootstrap fails if foundation dataset sync cannot complete.
  • The final VM inventory seed pass auto-creates the NetBox virtualization cluster (onprem-core) if missing.
  • HyOps also auto-creates a default NetBox cluster type (hyops-managed) unless an explicit type is provided via environment (NETBOX_CLUSTER_TYPE_ID, or NETBOX_CLUSTER_TYPE_NAME / NETBOX_CLUSTER_TYPE_SLUG).
  • VM sync enriches imported records with provider metadata when available (VM sizing, interface MAC, external ID, and role).
  • For provider linkage, HyOps attempts to auto-create the NetBox VM custom field external_id on first sync (can be overridden/disabled via environment).
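If you want to pin the cluster type rather than rely on auto-creation, the override is just an exported variable (variable names are from this runbook; the values below are illustrative only):

```shell
# Pin the NetBox cluster type HyOps uses instead of auto-creating "hyops-managed".
# The ID value here is illustrative, not a real NetBox object ID.
export NETBOX_CLUSTER_TYPE_ID=3

# Or select by name/slug instead of ID:
# export NETBOX_CLUSTER_TYPE_NAME="lab-virtualization"
# export NETBOX_CLUSTER_TYPE_SLUG="lab-virtualization"

echo "$NETBOX_CLUSTER_TYPE_ID"
```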

Preconditions and safety checks

  • Installed hyops (via install.sh) can be run from any working directory.
  • For source-checkout usage, export HYOPS_CORE_ROOT=/path/to/hybridops-core.
  • NetBox authority environment selected (default: shared).
  • Proxmox environment init completed for this env:
    • hyops init proxmox --env <env> --bootstrap ...
  • Ansible deps installed into the runtime root for the environment:
    • hybridops-core/tools/setup/setup-ansible.sh
  • NetBox bootstrap secrets available via shell env or runtime vault:
    • NETBOX_DB_PASSWORD
    • NETBOX_SECRET_KEY
    • NETBOX_SUPERUSER_PASSWORD
    • NETBOX_API_TOKEN
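Before a run, a quick sanity check over the four keys above can fail fast (a sketch; hyops can also source these from the runtime vault, which this shell-env check does not cover):

```shell
# Fail fast if any required NetBox bootstrap secret is missing from the shell
# environment. Key names are taken from this runbook.
missing=0
for key in NETBOX_DB_PASSWORD NETBOX_SECRET_KEY NETBOX_SUPERUSER_PASSWORD NETBOX_API_TOKEN; do
  eval "val=\${$key:-}"          # POSIX-portable indirect lookup
  if [ -z "$val" ]; then
    echo "missing: $key"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "all NetBox bootstrap secrets present"
fi
```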

Bootstrap credential behavior (important):

  • NetBox login defaults to username admin.
  • HyOps applies NETBOX_SUPERUSER_PASSWORD as the authoritative password for the admin user on apply (existing user passwords are reconciled by default).
  • Use only the value itself when logging in (not NETBOX_SUPERUSER_PASSWORD=...).

NetBox authority model:

  • By default, HyOps treats NetBox as a shared authority hosted at $HOME/.hybridops/envs/shared.
  • To override the authority (advanced):
    • export HYOPS_NETBOX_AUTHORITY_ENV=<env>
    • export HYOPS_NETBOX_AUTHORITY_ROOT=/path/to/runtime/root
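The documented defaults can be sketched as plain shell fallbacks (the exact resolution order inside hyops is an assumption; the default env name and root path are from this runbook):

```shell
# Resolve the NetBox authority root the way the documented defaults suggest:
# env falls back to "shared", root falls back to $HOME/.hybridops/envs/<env>.
AUTHORITY_ENV="${HYOPS_NETBOX_AUTHORITY_ENV:-shared}"
AUTHORITY_ROOT="${HYOPS_NETBOX_AUTHORITY_ROOT:-$HOME/.hybridops/envs/$AUTHORITY_ENV}"
echo "$AUTHORITY_ROOT"
```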

Recommended: seed required keys in the env vault before running the blueprint:

hyops secrets ensure --env shared \
  NETBOX_DB_PASSWORD \
  NETBOX_SECRET_KEY \
  NETBOX_SUPERUSER_PASSWORD \
  NETBOX_API_TOKEN

Note on “chicken-and-egg”:

  • This blueprint intentionally uses static foundation IPs (for the NetBox VM and its database VM). NetBox cannot allocate its own IPs before it exists.
  • After this blueprint finishes, NetBox is online and pre-seeded with the foundation SDN prefixes and bootstrap VMs.
  • Later blueprints can use NetBox authoritative IPAM to allocate VM IPs without hardcoding them.

Install/update Ansible runtime deps for the env:

# If you installed via install.sh (default runs setup-all), this is already done.
# To (re)install Ansible Galaxy deps for an env:
hyops setup ansible --env shared

Steps

  1. Review the blueprint defaults (recommended)

This blueprint is self-contained for the default lab network it defines:

  • vnetmgmt (management) on 10.10.0.0/24
  • vnetdata (shared services/data) on 10.12.0.0/24
  • reserved static IPs for foundation VMs:
    • pgcore-01 on vnetdata (10.12.0.10)
    • netbox-01 dual-homed:
      • vnetmgmt (10.10.0.11) for operator/API access
      • vnetdata (10.12.0.11) for shared-service/data traffic; PostgreSQL HA allowlists both 10.10.0.11/32 and 10.12.0.11/32 for the NetBox client

If you need to change the network plan or reserved IPs for your environment:

  • Prefer creating an env-scoped copy with hyops blueprint init --env <env> --ref onprem/bootstrap-netbox@v1 and then running that overlay with --file (do not edit the installed release payload in place).

  2. Validate and plan

    hyops blueprint validate --ref onprem/bootstrap-netbox@v1
    hyops blueprint plan --ref onprem/bootstrap-netbox@v1
    
  3. Preflight (contracts + module resolvability)

    hyops blueprint preflight --env shared \
      --ref onprem/bootstrap-netbox@v1
    

Optional strict NetBox API gate:

  • By default, blueprint contracts enforce NetBox state readiness for authority/IPAM steps.
  • To also require live NetBox API reachability and token validity during blueprint preflight/deploy, set:
    • policy.netbox_live_api_check: true in the blueprint manifest.
  • Use strict mode when the operator/control-node network path to NetBox is expected to be stable.
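For illustration, the flag would sit in the manifest roughly like this (the surrounding structure of blueprint.yml is an assumption; only the policy.netbox_live_api_check key is from this runbook):

```yaml
# Illustrative fragment of blueprint.yml — placement of the strict NetBox API gate.
policy:
  netbox_live_api_check: true
```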

  4. Execute

    hyops blueprint deploy --env shared \
      --ref onprem/bootstrap-netbox@v1 \
      --execute
    

hyops blueprint deploy --execute performs a blueprint-level preflight gate before execution. If required checks fail, no step is executed. Use --skip-preflight only for controlled break-glass scenarios.

When rerunning the same blueprint in the same environment, HyOps may prompt for confirmation if existing step states indicate a rerun/replacement path. Use --yes for non-interactive/CI runs.

During long phases, HyOps prints progress markers and the active log file path:

  • progress: logs=...
  • progress: phase=preflight|apply|destroy|...
  • one-time log watch hints (for example watch logs: tail -f ...)

Heartbeat interval can be adjusted:

export HYOPS_PROGRESS_INTERVAL_S=30

Optional live terminal streaming for troubleshooting:

hyops --verbose blueprint deploy --env shared \
  --ref onprem/bootstrap-netbox@v1 \
  --execute

Failure summaries now usually include an open: <run-record-path>/<driver>.log hint. Start there first (terragrunt.log, ansible.log, or packer.log depending on the failing step).
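When chasing a failure, a tiny helper that picks the most recently written driver log saves a few keystrokes (a sketch; the demo runs against a throwaway fixture, not a real run record — the terragrunt/ansible log names are from this runbook):

```shell
# Print the newest *.log file in a run-record directory (mtime order, newest first).
latest_log() {
  ls -t "$1"/*.log 2>/dev/null | head -n 1
}

# Demo against a throwaway fixture directory, not a real run record.
demo=$(mktemp -d)
touch "$demo/terragrunt.log"
sleep 1
touch "$demo/ansible.log"
latest_log "$demo"   # prints the path to ansible.log, the newest file
```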

  5. Verify state and run records
    ls -la $HOME/.hybridops/envs/shared/state/modules
    

Check for:

  • core__onprem__network-sdn/latest.json
  • core__onprem__template-image/latest.json
  • platform__onprem__platform-vm/latest.json
  • platform__onprem__postgresql-core/latest.json
  • platform__onprem__netbox/latest.json
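A controller-side loop can sweep all five state files at once (module file names are from the list above; the top-level "status" JSON key is taken from the Verification section, but the exact layout of latest.json is otherwise an assumption):

```shell
# Report, per expected module, whether latest.json exists and claims status "ok".
check_modules() {
  root="$HOME/.hybridops/envs/shared/state/modules"
  for m in core__onprem__network-sdn core__onprem__template-image \
           platform__onprem__platform-vm platform__onprem__postgresql-core \
           platform__onprem__netbox; do
    f="$root/$m/latest.json"
    if [ -f "$f" ] && grep -q '"status": *"ok"' "$f"; then
      echo "ok: $m"
    else
      echo "CHECK: $m"
    fi
  done
}
check_modules
```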

Bootstrap timing note:

  • Fresh template clones (pgcore-01, netbox-01) can take 1-3 minutes before SSH is reachable on first boot.
  • The blueprint now includes built-in connectivity wait for the pgcore and netbox steps, so operators should not need to rerun just because SSH is not ready in the first few seconds.
  • PostgreSQL bootstrap now refreshes Debian-family APT metadata before package installation, so a fresh Ubuntu clone does not fail simply because package indexes are older than the mirror contents.
  • First-run NetBox migrations on a clean database can take several minutes. The default readiness budget now allows for that longer bootstrap window, and background worker/housekeeping services wait for the web service to become healthy before starting.
  • The sync-only NetBox seed step now uses a controller-local SSH tunnel over the same target access path as the NetBox host configuration step, so bootstrap does not require direct HTTP routing from the controller to the NetBox management IP.
  • NetBox housekeeping now runs as a one-shot maintenance task with restart: on-failure, which prevents a false-green restart loop after successful completion.
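For debugging, the built-in connectivity wait can be approximated controller-side; a minimal sketch, assuming nc is available and SSH listens on port 22:

```shell
# Poll until TCP/22 answers on a host, with an overall deadline in seconds
# (default 180s). This mimics, but is not, the blueprint's built-in wait.
wait_for_ssh() {
  host="$1"
  deadline=$(( $(date +%s) + ${2:-180} ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if nc -z -w 2 "$host" 22 2>/dev/null; then
      echo "ssh reachable: $host"
      return 0
    fi
    sleep 5
  done
  echo "timeout waiting for ssh: $host" >&2
  return 1
}

# Usage (IP from the reserved foundation plan above):
# wait_for_ssh 10.12.0.10
```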

Verification

Primary state:

  • $HOME/.hybridops/envs/<env>/state/modules/platform__onprem__netbox/latest.json

Success indicators:

  • status is ok
  • outputs.netbox_url is present
  • outputs.cap.ipam.netbox is ready
  • outputs.netbox_url should resolve to the management IP by default (for example http://10.10.0.11:8000/)
  • NetBox contains:
    • IPAM prefix/VLAN/range records for the bootstrap SDN foundation (vnetmgmt + vnetdata)
    • virtualization cluster onprem-core
    • VMs pgcore-01 and netbox-01
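To script the netbox_url check, a sed one-liner is usually enough; the demo below runs against an inline fixture because the only thing this sketch assumes about latest.json is the outputs.netbox_url key named in the indicators above:

```shell
# Extract outputs.netbox_url from a module state file. Fixture stands in for
# $HOME/.hybridops/envs/<env>/state/modules/platform__onprem__netbox/latest.json.
state=$(mktemp)
cat > "$state" <<'EOF'
{"status": "ok", "outputs": {"netbox_url": "http://10.10.0.11:8000/"}}
EOF
sed -n 's/.*"netbox_url" *: *"\([^"]*\)".*/\1/p' "$state"
# prints http://10.10.0.11:8000/
```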

Post-actions and clean-up

  • Record the blueprint run in the change record with:
    • blueprint ref
    • run-record paths ($HOME/.hybridops/envs/<env>/logs/...)
  • If this was a transient test, destroy the modules (or rebuild environment).

After successful bootstrap, NetBox can become authoritative for IP allocation:

  • Example blueprints that can consume NetBox IPAM:
    • onprem/rke2@v1
    • onprem/postgresql-ha@v1
    • onprem/netbox-ha-cutover@v1 (after PostgreSQL HA is ready, migrates the NetBox DB contract from pgcore to HA)
    • onprem/eve-ng@v1

References