Linux Edge WAN with strongSwan and FRR for Hybrid Cloud Connectivity

Status

Accepted. A Linux-based WAN edge using strongSwan (swanctl) for route-based IPsec and FRR for BGP is adopted as the standard pattern for site-to-cloud connectivity.


1. Context

HybridOps.Studio requires reliable, cost-effective connectivity between on-premises/colocation sites and cloud providers (GCP, Azure). Commercial SD-WAN appliances add licensing cost and vendor lock-in that are not justified at the platform's scale.

Requirements:

  • Route-based IPsec compatible with cloud-native VPN gateways (GCP HA VPN, Azure VPN Gateway)
  • Dynamic routing via BGP for automatic failover and prefix exchange
  • Narrow traffic selectors, so that IPsec policies do not capture management traffic on shared hosts
  • Deterministic configuration via Ansible for repeatability and evidence

Constraints:

  • Must run on standard Linux (Debian/Ubuntu) without proprietary software
  • Must support dual-tunnel HA patterns matching cloud VPN gateway designs
  • Configuration must be auditable and version-controlled

2. Decision

Adopt strongSwan with swanctl configuration and FRR for BGP as the standard WAN edge stack:

  • IPsec: strongSwan swanctl with route-based VTI interfaces
  • Routing: FRR bgpd with strict prefix-list filtering
  • Tunnels: Dual VTI interfaces per site for HA (matches GCP HA VPN / Azure active-active)
  • Traffic selectors: Narrow selectors based on advertised/imported prefixes
  • Automation: Ansible roles wan_edge (configuration) and wan_validate (verification)

3. Rationale

strongSwan + swanctl over other IPsec implementations:

  • Native Linux, widely deployed, active maintenance
  • swanctl provides declarative configuration (vs legacy ipsec.conf)
  • VTI support for route-based tunnels required by cloud providers
  • Mark-based SA selection avoids policy conflicts with management traffic

FRR over other routing daemons:

  • Industry-standard BGP implementation
  • Integrated vtysh for operational familiarity
  • Prefix-list and route-map support for policy control
  • Active community, Debian/Ubuntu packages available

Dual-tunnel HA pattern:

  • Matches GCP HA VPN interface model (two tunnels, two inside /30s)
  • Provides redundancy without complex failover scripts
  • BGP handles path selection automatically
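
The addressing behind the two inside /30s can be sketched as follows. This is a minimal illustration, not the role's actual logic: the interface names, the 169.254.10.x ranges, and the convention that the first usable host belongs to the cloud side are all assumptions made for the example.

```python
import ipaddress


def tunnel_endpoints(inside_cidr):
    """Return (cloud_side, edge_side) addresses for one inside /30.

    Which side takes the first host address is an assumption here;
    the cloud gateway configuration decides this in practice.
    """
    net = ipaddress.ip_network(inside_cidr)
    if net.prefixlen != 30:
        raise ValueError(f"{inside_cidr} is not a /30")
    first, second = net.hosts()  # a /30 has exactly two usable hosts
    return str(first), str(second)


# Hypothetical inside ranges, one per tunnel interface.
TUNNELS = {"vti0": "169.254.10.0/30", "vti1": "169.254.10.4/30"}

plan = {name: tunnel_endpoints(cidr) for name, cidr in TUNNELS.items()}
for name, (cloud, edge) in plan.items():
    print(f"{name}: BGP neighbor {cloud}, local address {edge}")
```

Each tunnel gets its own BGP session over its own /30, so losing one tunnel removes one path and leaves the other session untouched.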

Narrow traffic selectors:

  • Prevents IPsec policies from capturing management (SSH) traffic
  • Allows shared-host deployments where tunnel and management coexist
  • Matches effective behavior in production: only routed prefixes traverse the tunnel
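
To make the selector argument concrete, the sketch below models a traffic selector pair as two prefix lists and tests whether a flow would be captured. It is a simplified model (it ignores marks, ports, and protocols), and all addresses and prefixes are hypothetical. The `local_ts` and `remote_ts` names follow the swanctl.conf child settings; how the wan_edge template actually derives them is not shown in this document.

```python
import ipaddress


def in_any(address, prefixes):
    """True if address falls inside at least one of the prefixes."""
    addr = ipaddress.ip_address(address)
    return any(addr in ipaddress.ip_network(p) for p in prefixes)


def captures(local_ts, remote_ts, src, dst):
    """True if an outbound flow src -> dst matches the selector pair."""
    return in_any(src, local_ts) and in_any(dst, remote_ts)


def build_selectors(advertised, imported):
    """Render comma-separated selector values from prefix lists."""
    return {"local_ts": ",".join(advertised), "remote_ts": ",".join(imported)}


# Hypothetical prefixes: what the site advertises and what it imports.
ADVERTISED = ["10.10.0.0/24"]
IMPORTED = ["10.20.0.0/16"]

# Hypothetical management flow: SSH reply from the host to an admin workstation.
MGMT_SRC, MGMT_DST = "203.0.113.10", "198.51.100.7"

wide = captures(["0.0.0.0/0"], ["0.0.0.0/0"], MGMT_SRC, MGMT_DST)
narrow = captures(ADVERTISED, IMPORTED, MGMT_SRC, MGMT_DST)
selectors = build_selectors(ADVERTISED, IMPORTED)
print(wide, narrow, selectors)
```

With catch-all selectors the management flow matches the IPsec policy; with selectors limited to the exchanged prefixes it does not, while site-to-cloud traffic still matches.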

4. Consequences

4.1 Positive consequences

  • Zero licensing cost for WAN edge functionality
  • Consistent configuration across sites via Ansible
  • Cloud-provider agnostic (same pattern for GCP, Azure, AWS)
  • Full observability via standard Linux tools (ip xfrm, vtysh, journalctl)
  • Testable locally with WAN simulator before production deployment

4.2 Negative consequences / risks

  • Requires Linux networking expertise for troubleshooting
  • No vendor support; community and internal knowledge required
  • BGP misconfiguration can cause routing loops or blackholes
  • IPsec rekeying during high traffic may cause brief packet loss

Mitigations:

  • wan_validate role provides automated health checks
  • Strict prefix-lists prevent route leaks
  • Smoke tests validate configuration before production apply

5. Alternatives considered

Commercial SD-WAN (Cisco, Fortinet, Palo Alto)

  • Rejected: Licensing cost prohibitive for platform scale
  • Rejected: Vendor lock-in conflicts with multi-cloud strategy

WireGuard

  • Rejected: No native BGP integration
  • Rejected: Not supported by GCP/Azure VPN gateways for site-to-cloud

OpenVPN

  • Rejected: TLS-based, not compatible with cloud IPsec gateways
  • Rejected: Performance inferior to kernel IPsec

Libreswan

  • Considered: Similar capability to strongSwan
  • Rejected: Less active development, smaller community

6. Implementation notes

Ansible roles:

  • hybridops.network.wan_edge — strongSwan, VTI, FRR configuration
  • hybridops.network.wan_validate — IPsec, BGP, route, reachability checks

Key files:

  • roles/wan_edge/templates/swanctl.conf.j2 — IPsec configuration
  • roles/wan_edge/templates/frr.conf.j2 — BGP configuration
  • roles/wan_edge/defaults/main.yml — tunable defaults

Configuration flow:

  1. Packages installed (strongswan-swanctl, frr)
  2. VTI interfaces created with marks matching IPsec SA
  3. swanctl.conf rendered with narrow traffic selectors
  4. frr.conf rendered with prefix-lists and peer-group
  5. Services enabled; handlers restart them on configuration change
  6. CHILD_SAs verified as installed before the role completes
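
Step 2 of the flow above can be sketched as a function that builds the iproute2 commands for one VTI interface. The interface name, addresses, and mark value are hypothetical, and the role may create interfaces differently (for example via systemd-networkd or netplan). The sketch only builds the command lists and does not execute them. The `key` on the interface is the value that would have to match `mark_in`/`mark_out` on the corresponding swanctl child; the `disable_policy` sysctl is a commonly used companion setting for VTI and is included as an assumption.

```python
def vti_commands(name, local, remote, mark, inside_addr):
    """Build (not run) the commands that create one VTI interface.

    mark must equal the mark configured on the matching CHILD_SA,
    otherwise traffic entering the interface finds no SA.
    """
    return [
        ["ip", "link", "add", name, "type", "vti",
         "local", local, "remote", remote, "key", str(mark)],
        ["ip", "addr", "add", inside_addr, "dev", name],
        ["ip", "link", "set", name, "up"],
        # Assumption: skip the per-interface policy check, as is common for VTI.
        ["sysctl", "-w", f"net.ipv4.conf.{name}.disable_policy=1"],
    ]


# Hypothetical values for the first tunnel of a site.
commands = vti_commands(
    name="vti0",
    local="203.0.113.10",
    remote="198.51.100.20",
    mark=42,
    inside_addr="169.254.10.2/30",
)
for cmd in commands:
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes the output easy to review or diff before anything touches the host.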

7. Operational impact and validation

Validation role checks:

  • CHILD_SA count matches tunnel count
  • No SAs in transient state (REKEYING, DELETING)
  • BGP neighbors established (not Active/Idle)
  • Accepted prefix count >= 1 per neighbor
  • Expected routes present in BGP table
  • End-to-end ping to remote loopbacks
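
The first four checks above can be approximated in a few lines. This is a sketch, not the wan_validate implementation: the input shapes are assumptions. CHILD_SA states are passed as a plain name-to-state mapping, and the BGP input is modelled on the JSON form of the FRR summary output, with `ipv4Unicast`, `peers`, `state`, and `pfxRcd` as assumed key names that should be verified against the FRR version in use.

```python
# States the validation treats as transient, as listed in the checks above.
TRANSIENT_STATES = {"REKEYING", "DELETING"}


def check_child_sas(child_states, tunnel_count):
    """child_states maps CHILD_SA name -> state (hypothetical input shape)."""
    problems = []
    if len(child_states) != tunnel_count:
        problems.append(
            f"expected {tunnel_count} CHILD_SAs, found {len(child_states)}"
        )
    for name, state in sorted(child_states.items()):
        if state in TRANSIENT_STATES:
            problems.append(f"{name} is in transient state {state}")
    return problems


def check_bgp(summary, min_prefixes=1):
    """summary is the decoded BGP summary; key names are assumptions."""
    problems = []
    peers = summary.get("ipv4Unicast", {}).get("peers", {})
    if not peers:
        problems.append("no BGP neighbors found")
    for addr, peer in sorted(peers.items()):
        state = peer.get("state", "unknown")
        if state != "Established":
            problems.append(f"{addr} is {state}, not Established")
        elif peer.get("pfxRcd", 0) < min_prefixes:
            problems.append(f"{addr} accepted fewer than {min_prefixes} prefixes")
    return problems


# Hypothetical sample: one healthy tunnel, one mid-rekey with a down BGP session.
sample_sas = {"gcp-tun0": "INSTALLED", "gcp-tun1": "REKEYING"}
sample_bgp = {"ipv4Unicast": {"peers": {
    "169.254.10.1": {"state": "Established", "pfxRcd": 3},
    "169.254.10.5": {"state": "Active", "pfxRcd": 0},
}}}

sa_problems = check_child_sas(sample_sas, tunnel_count=2)
bgp_problems = check_bgp(sample_bgp)
print(sa_problems + bgp_problems)
```

Returning a list of problems instead of raising on the first failure lets a single run report every failed check.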

Smoke test:

  • Local WAN simulator with two VMs
  • Exercises full IPsec + BGP + routing chain
  • Run via make test.local ROLE=wan_edge

Production monitoring:

  • swanctl --list-sas for IPsec state
  • vtysh -c "show bgp summary" for BGP state
  • Prometheus exporters for both strongSwan and FRR (not yet deployed; planned as a future enhancement)


Maintainer: HybridOps.Studio
License: MIT-0 for code, CC-BY-4.0 for documentation