Operate Proxmox SDN (core/onprem/network-sdn)

Operational procedures for managing the Proxmox SDN and DHCP configuration produced by:

  • module: core/onprem/network-sdn
  • driver: Terragrunt (internal), executed via hyops for evidence-first runs
  • Terraform module: hybridops-studio/sdn/proxmox (pinned in packs)

Conventions:

  • <PROXMOX_HOST> is the Proxmox management endpoint (IP or DNS name).
  • Prefer hyops commands over raw terragrunt to keep state/evidence consistent.
  • SDN zone_name is env-scoped to avoid collisions when using --env <env>.

1. Deployment

1.1 Initial deployment

# Optional: validate readiness and inputs before mutating the target
hyops preflight --env <env> --target proxmox
hyops preflight --env <env> --strict --module core/onprem/network-sdn --inputs <inputs.yml>

# Review changes (recommended)
hyops plan --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>

# Apply configuration
hyops apply --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>

This provisions (defaults; adjust for your inputs):

  • SDN zone derived from inputs.zone_name (default base: hybzone, env-scoped in Proxmox when --env is set).
  • VNets and subnets (default: vnetmgmt, vnetobs, vnetdata, vnetlab).
  • Optional host-level L3 gateways, NAT, and dnsmasq-based DHCP, depending on enable_host_l3, enable_snat, and enable_dhcp in module inputs.

1.2 Update configuration

Typical updates:

  • Add or adjust VNets/subnets.
  • Change DHCP ranges or DNS domain.
  • Enable/disable host L3, NAT, or DHCP.

Edit your inputs overlay file (passed via --inputs), then:

hyops plan --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>
hyops apply --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>

HybridOps will converge SDN objects and host configuration in place.

1.2.1 Force host-side reconcile (same-input drift recovery)

If host-side SDN state drifts (for example, a vnet* interface exists but its expected gateway IP is missing) while your topology inputs are unchanged, set the explicit reconcile nonce instead of changing unrelated inputs:

HYOPS_INPUT_host_reconcile_nonce="$(date -u +%Y%m%dT%H%M%SZ)" \
hyops apply --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>

This forces the host-side gateway/NAT/DHCP setup scripts to re-run while keeping the SDN topology inputs unchanged.

1.3 Destroy (lab only)

hyops destroy --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>

Warning: This removes SDN VNets, subnets, and host DHCP/NAT configuration. Only use for lab tear-down or controlled rebuilds.


2. Validation

2.1 Quick health check

# Inspect latest module outputs for this env
cat ~/.hybridops/envs/<env>/state/modules/core__onprem__network-sdn/latest.json

# Check SDN zones on Proxmox
ssh root@<PROXMOX_HOST> 'pvesh get /cluster/sdn/zones'

# Check VNet bridges (defaults; adjust for your vnet names)
ssh root@<PROXMOX_HOST> 'ip link show | grep -E "vnet(mgmt|obs|data|lab)"'

# Check gateways and routes
ssh root@<PROXMOX_HOST> 'ip -4 addr show | grep "inet 10\."'
ssh root@<PROXMOX_HOST> 'ip route | grep "10\."'

Expected:

  • Zone present and active (note: zone ID may be env-scoped, for example dhybzone).
  • Bridges for configured VNets (default: vnetmgmt, vnetobs, vnetdata, vnetlab).
  • Gateways for configured subnets (default: 10.10.0.1, 10.11.0.1, 10.12.0.1, 10.50.0.1) when host L3 is enabled.
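
The gateway expectation above can be checked mechanically. The following is a minimal sketch, assuming the default gateway addresses listed above; missing_gateways is a hypothetical helper name, not part of hyops.

```shell
# Hypothetical helper (not part of hyops): report expected gateway IPs that
# are NOT configured on the host.
# Reads `ip -4 -o addr show` output on stdin; expected IPs are arguments.
missing_gateways() {
  addrs="$(cat)"
  for gw in "$@"; do
    # Match "inet <gw>/" as a fixed string so 10.10.0.1 does not match 10.10.0.10.
    if ! printf '%s\n' "$addrs" | grep -Fq "inet ${gw}/"; then
      printf '%s\n' "$gw"
    fi
  done
}

# Usage (defaults; adjust for your subnets):
#   ssh root@<PROXMOX_HOST> 'ip -4 -o addr show' \
#     | missing_gateways 10.10.0.1 10.11.0.1 10.12.0.1 10.50.0.1
```

Empty output means every expected gateway is present. Any address printed is a candidate for the host-side reconcile in 1.2.1.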

2.2 DHCP validation (when enabled)

# List per-VNet DHCP units
ssh root@<PROXMOX_HOST> 'systemctl list-units "dnsmasq@hybridops-sdn-dhcp-*"' 

# Check DHCP listeners on port 67
ssh root@<PROXMOX_HOST> 'ss -ulpn | grep ":67" || true'

# Inspect generated per-VNet config
ssh root@<PROXMOX_HOST> 'ls -1 /etc/dnsmasq.d/dhcp-hybridops-sdn-dhcp-*.conf || true'

For each DHCP-enabled subnet you should see:

  • A corresponding dnsmasq@hybridops-sdn-dhcp-*.service unit.
  • A matching dhcp-hybridops-sdn-dhcp-*.conf file.
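
The unit/config pairing above can be compared with a short script. This is a sketch only: dhcp_pair_mismatches is a hypothetical helper, and it assumes the instance suffix is identical in the unit name and the config file name, which this runbook implies but does not state.

```shell
# Hypothetical helper (not part of hyops): list DHCP instances that have a
# unit but no config file, or a config file but no unit.
# ASSUMPTION: the <suffix> is identical in
#   dnsmasq@hybridops-sdn-dhcp-<suffix>.service and
#   dhcp-hybridops-sdn-dhcp-<suffix>.conf
# Usage: dhcp_pair_mismatches "<systemctl list-units output>" "<ls -1 output>"
dhcp_pair_mismatches() {
  units="$(printf '%s\n' "$1" \
    | sed -n 's/.*dnsmasq@hybridops-sdn-dhcp-\([^ ]*\)\.service.*/\1/p' | sort -u)"
  confs="$(printf '%s\n' "$2" \
    | sed -n 's/.*dhcp-hybridops-sdn-dhcp-\([^ ]*\)\.conf.*/\1/p' | sort -u)"
  for u in $units; do
    printf '%s\n' "$confs" | grep -Fxq "$u" || printf 'unit-without-conf %s\n' "$u"
  done
  for c in $confs; do
    printf '%s\n' "$units" | grep -Fxq "$c" || printf 'conf-without-unit %s\n' "$c"
  done
  return 0
}

# Usage:
#   unit_list="$(ssh root@<PROXMOX_HOST> 'systemctl list-units --no-legend "dnsmasq@hybridops-sdn-dhcp-*"')"
#   conf_list="$(ssh root@<PROXMOX_HOST> 'ls -1 /etc/dnsmasq.d/dhcp-hybridops-sdn-dhcp-*.conf 2>/dev/null || true')"
#   dhcp_pair_mismatches "$unit_list" "$conf_list"
```

No output means every unit has a matching config file and vice versa.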

2.3 Test from a VM

On a VM attached to vnetmgmt:

# 1. Gateway reachable
ping 10.10.0.1

# 2. Internet reachable (NAT working)
ping 8.8.8.8

# 3. DNS resolution
nslookup google.com || dig google.com

# 4. DHCP lease obtained
ip addr show | grep "inet 10\.10\."

If DHCP is disabled but L3+NAT are enabled, you should still see:

  • Default route via 10.10.0.1 (when configured statically).
  • Successful ping to 8.8.8.8.

3. DHCP and L3/NAT behaviour

The SDN module treats L3, NAT, and DHCP as separate host-side concerns:

  • enable_host_l3 controls whether gateways (.1 per subnet) are configured.
  • enable_snat controls NAT out via the uplink (typically vmbr0).
  • enable_dhcp controls creation of per-VNet dnsmasq units.

3.1 DHCP behaviour at a glance

Per-subnet behaviour is driven by the combination of:

  • Global flags: enable_host_l3, enable_dhcp.
  • Subnet fields: dhcp_range_start, dhcp_range_end, optional dhcp_enabled.

| enable_host_l3 | enable_dhcp | Subnet flags / ranges | Result |
|----------------|-------------|-----------------------|--------|
| false | false | anything | Pure L2 SDN only. No host gateways, no NAT, no DHCP. |
| true | false | ranges optional | Host has .1 gateway per subnet, optional SNAT, no DHCP. |
| true | true | dhcp_range_start + dhcp_range_end, dhcp_enabled omitted | DHCP enabled for that subnet (implicit, “ranges = on”). |
| true | true | ranges set, dhcp_enabled = true | DHCP enabled for that subnet (explicit). |
| true | true | ranges set, dhcp_enabled = false | DHCP disabled; ranges are treated as documentation only. |

Guardrails enforced by the module:

  • enable_dhcp = true requires enable_host_l3 = true so dnsmasq can bind to VNet interfaces.
  • If dhcp_enabled = true, both dhcp_range_start and dhcp_range_end must be set.
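
The table and guardrails above can be read as a small decision function. The sketch below restates them in shell for illustration; it is not the module's implementation. subnet_dhcp_state is a hypothetical name, and treating a subnet with missing ranges and no dhcp_enabled flag as "disabled" is an assumption, since the table does not cover that case.

```shell
# Hypothetical helper restating the table and guardrails above.
# Usage: subnet_dhcp_state <enable_host_l3> <enable_dhcp> <range_start> <range_end> [dhcp_enabled]
# Pass "" for an unset range; omit dhcp_enabled (or pass "") when it is not set.
# Prints "enabled" or "disabled"; prints "error: ..." and returns 1 on a violation.
subnet_dhcp_state() {
  l3="$1"; dhcp="$2"; start="$3"; end="$4"; flag="${5:-}"

  # Guardrail: DHCP needs host L3 so dnsmasq can bind to VNet interfaces.
  if [ "$dhcp" = "true" ] && [ "$l3" != "true" ]; then
    echo "error: enable_dhcp requires enable_host_l3"; return 1
  fi
  # Guardrail: an explicit dhcp_enabled = true needs both range bounds.
  if [ "$flag" = "true" ] && { [ -z "$start" ] || [ -z "$end" ]; }; then
    echo "error: dhcp_enabled requires dhcp_range_start and dhcp_range_end"; return 1
  fi
  # Global switch off, or explicit per-subnet opt-out: no DHCP.
  if [ "$dhcp" != "true" ] || [ "$flag" = "false" ]; then
    echo "disabled"; return 0
  fi
  # Both ranges present (implicit or explicit): DHCP on.
  # ASSUMPTION: missing ranges without an explicit flag means no DHCP.
  if [ -n "$start" ] && [ -n "$end" ]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}
```

For example, subnet_dhcp_state true true 10.10.0.100 10.10.0.200 false prints "disabled", matching the last table row.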

4. Routine DHCP management

4.1 Service-level view

# List all HybridOps SDN DHCP units
ssh root@<PROXMOX_HOST> 'systemctl list-units "dnsmasq@hybridops-sdn-dhcp-*"' 

# Focus on a single VNet (replace <ZONE_NAME> with the effective zone ID, for example dhybzone)
ssh root@<PROXMOX_HOST> 'systemctl status dnsmasq@hybridops-sdn-dhcp-vnetmgmt-<ZONE_NAME>-10-10-0-0-24.service'
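
To avoid typing unit names by hand, the name can be built from its parts. This is a sketch under an assumption: the pattern <vnet>-<zone>-<CIDR with "." and "/" replaced by "-"> is inferred from the single example above, so confirm it against the systemctl list-units output on your node. dhcp_unit_name is a hypothetical helper.

```shell
# Hypothetical helper (not part of hyops): build the per-subnet unit name.
# ASSUMPTION: naming pattern inferred from the example unit above.
# Usage: dhcp_unit_name <vnet> <zone> <cidr>
dhcp_unit_name() {
  # Turn 10.10.0.0/24 into 10-10-0-0-24.
  cidr_slug="$(printf '%s' "$3" | sed 's|[./]|-|g')"
  printf 'dnsmasq@hybridops-sdn-dhcp-%s-%s-%s.service\n' "$1" "$2" "$cidr_slug"
}

# Usage:
#   ssh root@<PROXMOX_HOST> "systemctl status $(dhcp_unit_name vnetmgmt dhybzone 10.10.0.0/24)"
```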

4.2 Restart DHCP for all SDN VNets

ssh root@<PROXMOX_HOST> '
  systemctl list-unit-files "dnsmasq@hybridops-sdn-dhcp-*" --no-legend \
    | awk "{print \$1}" \
    | while read -r unit; do
        [ -n "$unit" ] || continue
        systemctl restart "$unit" || true
      done
'

4.3 Inspect leases (pattern)

Lease files are per-instance. A typical location pattern:

ssh root@<PROXMOX_HOST> '
  ls -1 /var/lib/misc/dnsmasq.hybridops-sdn-dhcp-*leases 2>/dev/null || true
'

If needed, document exact lease file names once the first leases have been issued on your node.


5. Troubleshooting

5.1 DHCP unit will not start

Symptoms

  • systemctl status dnsmasq@hybridops-sdn-dhcp-… shows failed.
  • Logs report unknown interface vnet… or FAILED to start up.
  • GUI shows DHCP status as failed for the SDN zone.

Diagnosis

ssh root@<PROXMOX_HOST> '
  systemctl status "dnsmasq@hybridops-sdn-dhcp-*" --no-pager
  journalctl -u "dnsmasq@hybridops-sdn-dhcp-*" -n 50 --no-pager
  ip link show | grep -E "vnet(mgmt|obs|data|lab)" || true
'

Common causes

  • pvesh set /cluster/sdn was run while VNets were still converging.
  • VNet interfaces were removed or renamed.
  • Another DHCP server is already bound to port 67.

Fix (pattern)

  1. Confirm VNets exist:
ssh root@<PROXMOX_HOST> '
  pvesh get /cluster/sdn/zones
  pvesh get /cluster/sdn/vnets
'
  2. Re-apply the stack so gateways and DHCP units are recreated:
hyops apply --env <env> --module core/onprem/network-sdn --inputs <inputs.yml>
  3. If a conflicting DHCP service is present, disable it:
ssh root@<PROXMOX_HOST> '
  ss -ulpn | grep ":67" || true
  systemctl stop isc-dhcp-server 2>/dev/null || true
  systemctl disable isc-dhcp-server 2>/dev/null || true
'
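
Before stopping a service, it helps to confirm which process actually holds port 67. The following sketch extracts process names from ss -ulpn output; dhcp_port_owners is a hypothetical helper, and it assumes the usual ss column layout (local address in the fourth column, process details in a users:((...)) field).

```shell
# Hypothetical helper (not part of hyops): print the names of processes
# bound to UDP port 67. Reads `ss -ulpn` output on stdin.
# ASSUMPTION: column 4 is "Local Address:Port" and process details appear
# as users:(("name",pid=...,fd=...)).
dhcp_port_owners() {
  awk '$4 ~ /:67$/' \
    | grep -o '("[^"]*"' \
    | tr -d '("' \
    | sort -u
}

# Usage:
#   ssh root@<PROXMOX_HOST> 'ss -ulpn' | dhcp_port_owners
```

If the output lists anything other than dnsmasq (for example dhcpd), that service is the likely conflict.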

5.2 VNet bridges missing or stale

Symptoms

  • ip link show does not list expected vnet* interfaces.
  • Proxmox SDN UI shows VNets in error or deleted state.
  • After hyops destroy, kernel interfaces persist.

Diagnosis

ssh root@<PROXMOX_HOST> '
  pvesh get /cluster/sdn
  ip link show | grep vnet || true
'

Fix (lab-safe pattern)

ssh root@<PROXMOX_HOST> '
  for v in vnetmgmt vnetobs vnetdata vnetlab; do
    ip link set "$v" down 2>/dev/null || true
    ip link delete "$v" 2>/dev/null || true
  done

  ifreload -a || true
  pvesh set /cluster/sdn || true
'

Use with care if the node runs other SDN zones or VNets with different naming conventions.
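
One way to apply that care is to preview which of the expected interfaces actually exist before deleting anything. This is a minimal sketch; present_vnets is a hypothetical helper and the names passed to it are the defaults used in this runbook.

```shell
# Hypothetical helper (not part of hyops): print which of the given VNet
# names exist as kernel interfaces. Reads `ip -o link show` output on stdin.
present_vnets() {
  # Field 2 of `ip -o link show` is "<name>:" or "<name>@<parent>:".
  links="$(awk '{ sub(/:$/, "", $2); sub(/@.*/, "", $2); print $2 }')"
  for v in "$@"; do
    printf '%s\n' "$links" | grep -Fxq "$v" && printf '%s\n' "$v"
  done
  return 0
}

# Usage (defaults; adjust for your vnet names):
#   ssh root@<PROXMOX_HOST> 'ip -o link show' \
#     | present_vnets vnetmgmt vnetobs vnetdata vnetlab
```

Only names passed as arguments are ever reported, so interfaces belonging to other zones are left out of the preview.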


5.3 GUI mismatch vs actual state

On some Proxmox VE builds, the SDN GUI can show stale information about gateways or DHCP status.

  • Prefer CLI checks (pvesh, ip addr, iptables -t nat -S, systemctl, ss).
  • Treat the GUI as advisory rather than authoritative.

6. Operational recommendations

  • Prefer changes driven by hyops plan/apply over manual SDN edits.
  • Reserve hyops destroy for full lab tear-downs.
  • Keep DHCP optional in environments where another IPAM/DHCP solution is used.
  • Capture CLI output from validation and troubleshooting commands as evidence for observability and DR runbooks.
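
Capturing CLI output as evidence can be as simple as a wrapper that timestamps each run. The sketch below assumes you choose the target directory yourself; capture_evidence and its file naming are hypothetical and unrelated to the evidence layout hyops itself produces.

```shell
# Hypothetical helper (not part of hyops): run a command and save its output
# to a UTC-timestamped file. Prints the file path; returns the command's status.
# Usage: capture_evidence <dir> <label> <command> [args...]
capture_evidence() {
  dir="$1"; label="$2"; shift 2
  mkdir -p "$dir" || return 1
  out="$dir/$(date -u +%Y%m%dT%H%M%SZ)-$label.txt"
  # Record the command line first, then the command's stdout and stderr.
  { printf '$ %s\n' "$*"; "$@" 2>&1; } > "$out"
  status=$?
  printf '%s\n' "$out"
  return "$status"
}

# Usage:
#   capture_evidence ./evidence sdn-zones ssh root@<PROXMOX_HOST> 'pvesh get /cluster/sdn/zones'
```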