
Operate Proxmox SDN (core/onprem/network-sdn)

Demonstration

Recorded walkthrough of a full SDN deploy and validation sequence using this runbook.


Reference Scenario

The Reusable Proxmox SDN Foundation reference scenario shows a live SDN deployment result with structured run records and validation output.


Operational procedures for managing the Proxmox SDN and DHCP configuration produced by:

  • module: core/onprem/network-sdn
  • driver: Terragrunt (internal), executed via hyops for structured run records
  • Terraform module: hybridops-tech/sdn/proxmox (pinned in packs)

Conventions:

  • <PROXMOX_HOST> is the Proxmox management endpoint (IP or DNS name).
  • Prefer hyops commands over raw terragrunt to keep state and run records consistent.
  • SDN zone_name is env-scoped to avoid collisions when using --env <env>.

Recommended operating modes:

  • host-routed mode
      • enable_host_l3 = true
      • enable_snat = true
      • enable_dhcp = true
      • Use for bootstrap foundations, academy, labs, and small-site deployments.
  • edge-routed mode
      • enable_host_l3 = false
      • enable_snat = false
      • enable_dhcp = false
      • Use when VyOS or another edge tier owns north-south routing and egress.

Brownfield caution:

  • Do not point the module at manually created SDN objects and assume it will safely “adopt” them.
  • Safe brownfield options are:
      • create a new module-owned zone/VNet set, or
      • import/cut over existing SDN objects into Terraform state deliberately.
  • destroy is safe for the zone managed in module state; it is not a safe takeover mechanism for hand-built SDN.

1. Deployment

1.1 Initial deployment

# Optional: validate readiness and inputs before mutating the target
hyops preflight --env <env> --target proxmox
hyops preflight --env <env> --strict --module core/onprem/network-sdn --inputs <inputs.yml>

# Review changes (recommended)
hyops plan --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>

# Apply configuration
hyops apply --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>

This provisions (defaults; adjust for your inputs):

  • SDN zone derived from inputs.zone_name (default base: hybzone, env-scoped in Proxmox when --env is set).
  • VNets and subnets (default: vnetmgmt, vnetobs, vnetdata, vnetlab).
  • Optional host-level L3 gateways, NAT, and dnsmasq-based DHCP, depending on enable_host_l3, enable_snat, and enable_dhcp in module inputs.

1.2 Update configuration

Typical updates:

  • Add or adjust VNets/subnets.
  • Change DHCP ranges or DNS domain.
  • Enable/disable host L3, NAT, or DHCP.

Edit your inputs overlay file (passed via --inputs), then:

hyops plan --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>
hyops apply --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>

HybridOps will converge SDN objects and host configuration in place.

1.2.1 Force host-side reconcile (same-input drift recovery)

If host-side SDN state drifts (for example a vnet* interface exists but the expected gateway IP is missing) and your topology inputs are unchanged, use the explicit reconcile token instead of changing unrelated inputs:

HYOPS_INPUT_host_reconcile_nonce="$(date -u +%Y%m%dT%H%M%SZ)" \
hyops apply --env shared --module core/onprem/network-sdn --inputs <inputs.yml>

This forces the host-side gateway/NAT/DHCP setup scripts to re-run while keeping the SDN topology inputs unchanged.

1.3 Destroy (lab only)

hyops destroy --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>

Warning: This removes SDN VNets, subnets, and host DHCP/NAT configuration. Only use for lab tear-down or controlled rebuilds.

Destroy scope is zone-scoped, not a blanket Proxmox network wipe. The module removes:

  • the SDN zone/VNet/subnet objects it manages
  • gateway addresses derived from that zone's gateway state
  • NAT rules tagged for that zone
  • DHCP units/config for that zone

It does not intentionally remove unrelated bridges, unrelated SDN zones, or NAT/DHCP state outside the managed zone.


2. Validation

2.1 Quick health check

# Inspect latest module outputs for this env
cat ~/.hybridops/envs/<env>/state/modules/core__onprem__network-sdn/latest.json

# Check SDN zones on Proxmox
ssh root@<PROXMOX_HOST> 'pvesh get /cluster/sdn/zones'

# Check VNet bridges (defaults; adjust for your vnet names)
ssh root@<PROXMOX_HOST> 'ip link show | grep -E "vnet(mgmt|obs|data|lab)"'

# Check gateways and routes
ssh root@<PROXMOX_HOST> 'ip -4 addr show | grep "inet 10\."'
ssh root@<PROXMOX_HOST> 'ip route | grep 10\.'

Expected:

  • Zone present and active (note: zone ID may be env-scoped, for example dhybzone).
  • Bridges for configured VNets (default: vnetmgmt, vnetobs, vnetdata, vnetlab).
  • Gateways for configured subnets (default: 10.10.0.1, 10.11.0.1, 10.12.0.1, 10.50.0.1) when host L3 is enabled.
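The gateway part of this health check can be scripted. A minimal sketch, assuming host-routed mode with the default subnets (adjust the gateway list to your inputs); the function only parses captured `ip -4 addr` output, so it can run anywhere:

```shell
# Scan captured `ip -4 addr` output for the expected gateway addresses.
# Usage: ssh root@<PROXMOX_HOST> 'ip -4 addr show' | check_gateways
check_gateways() {
  local input gw missing=0
  input=$(cat)                      # read the captured output from stdin
  for gw in 10.10.0.1 10.11.0.1 10.12.0.1 10.50.0.1; do
    if ! printf '%s\n' "$input" | grep -q "inet ${gw}/"; then
      echo "MISSING gateway ${gw}"
      missing=1
    fi
  done
  return "$missing"
}
```

Empty output (exit 0) means every expected gateway is configured on the node.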

2.2 DHCP validation (when enabled)

# List per-VNet DHCP units
ssh root@<PROXMOX_HOST> 'systemctl list-units "dnsmasq@hybridops-sdn-dhcp-*"'

# Check DHCP listeners on port 67
ssh root@<PROXMOX_HOST> 'ss -ulpn | grep ":67" || true'

# Inspect generated per-VNet config
ssh root@<PROXMOX_HOST> 'ls -1 /etc/dnsmasq.d/dhcp-hybridops-sdn-dhcp-*.conf || true'

For each DHCP-enabled subnet you should see:

  • A corresponding dnsmasq@hybridops-sdn-dhcp-*.service unit.
  • A matching dhcp-hybridops-sdn-dhcp-*.conf file.
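The unit and config names follow a predictable pattern, so you can derive the expected name per subnet instead of grepping for it. A sketch, assuming the naming pattern shown in the service-level examples later in this runbook (verify it against your node before scripting around it):

```shell
# Derive the expected per-subnet dnsmasq unit name from the VNet name,
# the effective (env-scoped) zone ID, and the subnet CIDR.
# Pattern assumed: dnsmasq@hybridops-sdn-dhcp-<vnet>-<zone>-<cidr-dashed>.service
dhcp_unit_name() {
  local vnet="$1" zone="$2" cidr="$3" suffix
  suffix=$(printf '%s' "$cidr" | tr './' '--')   # 10.10.0.0/24 -> 10-10-0-0-24
  printf 'dnsmasq@hybridops-sdn-dhcp-%s-%s-%s.service\n' "$vnet" "$zone" "$suffix"
}

dhcp_unit_name vnetmgmt dhybzone 10.10.0.0/24
# -> dnsmasq@hybridops-sdn-dhcp-vnetmgmt-dhybzone-10-10-0-0-24.service
```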

2.3 Test from a VM

On a VM attached to vnetmgmt:

# 1. Gateway reachable
ping 10.10.0.1

# 2. Internet reachable (NAT working)
ping 8.8.8.8

# 3. DNS resolution
nslookup google.com || dig google.com

# 4. DHCP lease obtained
ip addr show | grep "inet 10.10."

If DHCP is disabled but L3+NAT are enabled, you should still see:

  • Default route via 10.10.0.1 (when configured statically).
  • Successful ping to 8.8.8.8.

2.4 Current shared-lane verification

Use these checks when you need current proof of the shared SDN baseline rather than just module deployment history:

hyops show module core/onprem/network-sdn --env shared
ssh root@10.10.0.1 'hostname && pvesh get /cluster/sdn/zones --output-format json'
ssh root@10.10.0.1 'ip -brief link | grep -E "vnet(data|dev|ddev|mgmt)"'

Expected:

  • the shared SDN module is status=ok
  • the Proxmox SDN zone shybzone is present
  • the expected vnet* interfaces are UP

3. DHCP and L3/NAT behaviour

The SDN module treats L3, NAT, and DHCP as separate host-side concerns:

  • enable_host_l3 controls whether gateways (.1 per subnet) are configured.
  • enable_snat controls NAT out via the uplink (typically vmbr0).
  • enable_dhcp controls creation of per-VNet dnsmasq units.

3.1 DHCP behaviour at a glance

Per-subnet behaviour is driven by the combination of:

  • Global flags: enable_host_l3, enable_dhcp.
  • Subnet fields: dhcp_range_start, dhcp_range_end, optional dhcp_enabled.
  enable_host_l3 | enable_dhcp | Subnet flags / ranges                                   | Result
  -------------- | ----------- | ------------------------------------------------------- | -------------------------------------------------------
  false          | false       | anything                                                | Pure L2 SDN only. No host gateways, no NAT, no DHCP.
  true           | false       | ranges optional                                         | Host has .1 gateway per subnet, optional SNAT, no DHCP.
  true           | true        | dhcp_range_start + dhcp_range_end, dhcp_enabled omitted | DHCP enabled for that subnet (implicit, “ranges = on”).
  true           | true        | ranges set, dhcp_enabled = true                         | DHCP enabled for that subnet (explicit).
  true           | true        | ranges set, dhcp_enabled = false                        | DHCP disabled – ranges treated as documentation only.

Guardrails enforced by the module:

  • enable_dhcp = true requires enable_host_l3 = true so dnsmasq can bind to VNet interfaces.
  • If dhcp_enabled = true, both dhcp_range_start and dhcp_range_end must be set.
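The per-subnet decision and both guardrails can be sketched as a small shell function. This is an illustration of the documented behaviour, not the module's actual implementation:

```shell
# Sketch of the per-subnet DHCP decision described above.
# Arguments: enable_host_l3 enable_dhcp ranges_set dhcp_enabled
#   ranges_set:   "true" when both dhcp_range_start and dhcp_range_end are set
#   dhcp_enabled: "true", "false", or "omitted"
# Prints "dhcp=on" or "dhcp=off"; returns 1 on a guardrail violation.
subnet_dhcp_state() {
  local l3="$1" dhcp="$2" ranges="$3" flag="$4"
  # Guardrail: enable_dhcp = true requires enable_host_l3 = true.
  if [ "$dhcp" = true ] && [ "$l3" != true ]; then
    echo "error: enable_dhcp=true requires enable_host_l3=true" >&2
    return 1
  fi
  # Guardrail: explicit dhcp_enabled = true requires both range bounds.
  if [ "$flag" = true ] && [ "$ranges" != true ]; then
    echo "error: dhcp_enabled=true requires both range bounds" >&2
    return 1
  fi
  # DHCP is on when globally enabled, ranges are set, and not explicitly disabled.
  if [ "$dhcp" = true ] && [ "$ranges" = true ] && [ "$flag" != false ]; then
    echo dhcp=on
  else
    echo dhcp=off
  fi
}
```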

4. Routine DHCP management

4.1 Service-level view

# List all HybridOps SDN DHCP units
ssh root@<PROXMOX_HOST> 'systemctl list-units "dnsmasq@hybridops-sdn-dhcp-*"'

# Focus on a single VNet (replace <ZONE_NAME> with the effective zone ID, for example dhybzone)
ssh root@<PROXMOX_HOST> 'systemctl status dnsmasq@hybridops-sdn-dhcp-vnetmgmt-<ZONE_NAME>-10-10-0-0-24.service'

4.2 Restart DHCP for all SDN VNets

ssh root@<PROXMOX_HOST> '
  systemctl list-unit-files "dnsmasq@hybridops-sdn-dhcp-*" --no-legend \
    | awk "{print \$1}" \
    | while read -r unit; do
        [ -n "$unit" ] || continue
        systemctl restart "$unit" || true
      done
'

4.3 Inspect leases (pattern)

Lease files are per-instance. A typical location pattern:

ssh root@<PROXMOX_HOST> '
  ls -1 /var/lib/misc/dnsmasq.hybridops-sdn-dhcp-*leases 2>/dev/null || true
'

If needed, document exact lease file names once the first leases have been issued on your node.
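When a lease file does exist, each line follows the standard dnsmasq format (`<expiry-epoch> <mac> <ip> <hostname> <client-id>`). A small formatter for quick inspection (a sketch; point it at whatever lease file the listing above reveals):

```shell
# Summarise dnsmasq lease lines as "ip mac hostname" columns.
# Reads lease file(s) given as arguments, or stdin when none are given.
summarise_leases() {
  awk '{ printf "%-15s %-17s %s\n", $3, $2, $4 }' "$@"
}

# Example:
#   ssh root@<PROXMOX_HOST> 'cat /var/lib/misc/dnsmasq.hybridops-sdn-dhcp-*leases' | summarise_leases
```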


5. Troubleshooting

5.1 DHCP unit will not start

Symptoms

  • systemctl status dnsmasq@hybridops-sdn-dhcp-… shows failed.
  • Logs report unknown interface vnet… or FAILED to start up.
  • GUI shows DHCP status as failed for the SDN zone.

Diagnosis

ssh root@<PROXMOX_HOST> '
  systemctl status "dnsmasq@hybridops-sdn-dhcp-*" --no-pager
  journalctl -u "dnsmasq@hybridops-sdn-dhcp-*" -n 50 --no-pager
  ip link show | grep -E "vnet(mgmt|obs|data|lab)" || true
'

5.2 Proxmox GUI shows SDN error even though traffic works

Current expected fix path:

  • use hybridops-tech/sdn/proxmox v0.1.5 or newer
  • rerun core/onprem/network-sdn

What changed in v0.1.5:

  • the host-side status helper now normalises generated vnet* stanzas in /etc/network/interfaces.d/sdn to inet static with the derived gateway address
  • Proxmox SDN status then agrees with the actual host gateway state instead of showing a false red error

This is non-destructive. It does not change the declared topology; it only keeps the generated host interface file and the live gateway state in agreement.

Common causes

  • pvesh set /cluster/sdn was run while VNets were still converging.
  • VNet interfaces were removed or renamed.
  • Another DHCP server is already bound to port 67.

Fix (pattern)

  1. Confirm VNets exist:

    ssh root@<PROXMOX_HOST> '
      pvesh get /cluster/sdn/zones
      pvesh get /cluster/sdn/vnets
    '

  2. Re-apply the stack so gateways and DHCP units are recreated:

    hyops apply --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>

  3. If a conflicting DHCP service is present, disable it:

    ssh root@<PROXMOX_HOST> '
      ss -ulpn | grep ":67" || true
      systemctl stop isc-dhcp-server 2>/dev/null || true
      systemctl disable isc-dhcp-server 2>/dev/null || true
    '


5.3 VNet bridges missing or stale

Symptoms

  • ip link show does not list expected vnet* interfaces.
  • Proxmox SDN UI shows VNets in error or deleted state.
  • After hyops destroy, kernel interfaces persist.

Diagnosis

ssh root@<PROXMOX_HOST> '
  pvesh get /cluster/sdn
  ip link show | grep vnet || true
'

Fix (lab-safe pattern)

ssh root@<PROXMOX_HOST> '
  for v in vnetmgmt vnetobs vnetdata vnetlab; do
    ip link set "$v" down 2>/dev/null || true
    ip link delete "$v" 2>/dev/null || true
  done

  ifreload -a || true
  pvesh set /cluster/sdn || true
'

Use with care if the node runs other SDN zones or VNets with different naming conventions.


5.4 GUI mismatch vs actual state

On some Proxmox VE builds, the SDN GUI can show stale information about gateways or DHCP status.

  • Prefer CLI checks (pvesh, ip addr, iptables -t nat -S, systemctl, ss).
  • Treat the GUI as advisory rather than authoritative.

6. Operational recommendations

  • Prefer hyops plan/apply–driven changes to SDN over manual edits.
  • Reserve hyops destroy for full lab tear-downs.
  • Keep DHCP optional in environments where another IPAM/DHCP solution is used.
  • Capture CLI output from validation and troubleshooting commands as part of the run record for observability and DR runbooks.

7. Operating modes

Use the SDN module in one of two intentional ways.

7.1 Host-routed mode

Best for:

  • bootstrap foundations
  • academy or lab environments
  • smaller sites where the Proxmox host can safely provide L3/NAT/DHCP

Typical input posture:

enable_host_l3: true
enable_snat: true
enable_dhcp: true

In this mode the Proxmox node owns:

  • gateway IPs on vnet*
  • optional SNAT via the uplink bridge
  • optional dnsmasq DHCP
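A hypothetical inputs overlay for this posture might look as follows. The flags and per-subnet DHCP fields are the ones documented in this runbook; the surrounding structure (the `vnets:` nesting and field placement) is illustrative only — check the pinned hybridops-tech/sdn/proxmox module variables for the real schema:

```yaml
# Illustrative host-routed overlay (schema shape is an assumption).
zone_name: hybzone
enable_host_l3: true
enable_snat: true
enable_dhcp: true
vnets:
  vnetmgmt:
    subnet: 10.10.0.0/24
    dhcp_range_start: 10.10.0.100
    dhcp_range_end: 10.10.0.200
```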

7.2 Edge-routed mode

Best for:

  • production environments
  • HybridOps WAN/VyOS edge designs
  • sites where north-south routing should not live on the hypervisor

Typical input posture:

enable_host_l3: false
enable_snat: false
enable_dhcp: false

In this mode Proxmox SDN is kept to segmentation and bridge/VLAN orchestration, while routing and DHCP are delegated to the edge/network layer.

Preferred production guidance:

  • let the edge tier own egress and north-south routing
  • use Proxmox SDN mainly for segmentation unless you explicitly want host-routed subnets

8. Brownfield adoption

If the site already has manually created Proxmox SDN objects:

  • do not simply point the module at the same names and assume destroy will sort it out
  • either create a new module-owned zone, or import the existing SDN objects into state as part of a planned cutover

Destroy is scoped to the zone managed by the module state, but that still means it will remove the SDN objects and host-side services for that module-owned zone.

9. Destroy semantics

hyops destroy --module core/onprem/network-sdn removes only the SDN zone and host-side services owned by that module instance.

It does remove, for that zone:

  • module-managed SDN zone/VNet/subnet objects
  • gateway IPs for that zone
  • NAT rules tagged for that zone
  • dnsmasq DHCP units/configs for that zone

It does not intentionally remove:

  • unrelated Proxmox bridges
  • unrelated SDN zones
  • NAT rules without the module's zone tags
  • DHCP services/configs for other zones
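To double-check this scoping after a destroy, capture host state on the node and filter it for the zone name locally. A rough sketch (the assumption that module-owned NAT rules and interfaces mention the zone name is illustrative — adjust the filter to what your node's output actually shows):

```shell
# Pass through any captured line still mentioning the module-owned zone.
# Empty output suggests no zone-tagged leftovers survived the destroy.
leftover_for_zone() {
  local zone="$1"
  grep -i -- "$zone" || true    # || true: no matches is the desired result
}

# Example:
#   ssh root@<PROXMOX_HOST> 'iptables -t nat -S; ip -4 addr show' | leftover_for_zone dhybzone
```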


Further reading