Skip to content

Internal DNS Authority and Service Endpoint Model

Purpose

Define the clean DNS model for HybridOps so workloads and operators do not depend on raw service IPs during normal operation, failover, or failback.

This standard complements:

1. Required split

HybridOps MUST separate:

  1. DNS authority
  2. the system that hosts authoritative zones and answers for service records
  3. DNS cutover automation
  4. the system that updates records during failover, failback, or traffic shift
  5. Source-of-truth metadata
  6. the system that records IPAM, ownership, and desired endpoint data

These are related but not the same component.

2. Default authority model

HybridOps SHOULD use:

  • Cloudflare DNS for public internet zones
  • examples:
    • hybridops.tech
    • docs.hybridops.tech
    • learn.hybridops.tech
  • shared internal authoritative DNS for private platform and workload zones
  • recommended first implementation: PowerDNS Authoritative

Reason:

  • Cloudflare is well-suited for public zones and public web properties
  • internal DR and service discovery should remain provider-neutral
  • internal DNS must work across on-prem, GCP, Azure, and AWS
  • the first internal authority should not force the product into one cloud's DNS service

The recommended first internal DNS authority for HybridOps is:

  • PowerDNS Authoritative
  • hosted in the shared control plane
  • managed through its API

Recommended first topology:

  • primary writable PowerDNS on a dedicated shared control-plane host in Hetzner
  • secondary read-only PowerDNS on-prem via AXFR/IXFR
  • optional additional secondary in a cloud control plane later

This keeps internal DNS:

  • outside the primary on-prem failure domain
  • close to the shared control plane and runner plane
  • available locally on-prem for steady-state reads even when the shared link is degraded

This is preferred over:

  • NetBox as the DNS engine
  • GCP Cloud DNS as the primary internal authority
  • Azure DNS as the primary internal authority
  • Route 53 as the primary internal authority

Those remain valid adapters or forwarders later, but not the first shared authority.

4. NetBox boundary

NetBox SHOULD remain:

  • IPAM and infrastructure source of truth
  • endpoint metadata source
  • ownership and inventory source

NetBox SHOULD NOT be treated as the first authoritative DNS engine for HybridOps DR.

If richer DNS modeling is desired later, it MAY be added as metadata support, but HybridOps should still keep authoritative internal DNS and DNS cutover as separate concerns.

5. Service endpoint rule

Applications and platform services MUST consume stable DNS names, not raw service IPs.

Examples:

  • postgres.dev.hyops.internal
  • netbox.shared.hyops.internal
  • argocd.dev.hyops.internal
  • thanos.shared.hyops.internal

That means:

  • normal operation points the name at the primary endpoint
  • failover points the same name at the DR endpoint
  • failback points the same name back to the restored primary site

6. Zone model

HybridOps SHOULD standardize on:

  • one shared internal zone for private service endpoints
  • one clear naming scheme for env-scoped records

Reference pattern:

  • zone: hyops.internal
  • env-scoped records:
  • postgres.dev.hyops.internal
  • postgres.staging.hyops.internal
  • postgres.prod.hyops.internal
  • shared service records:
  • netbox.shared.hyops.internal
  • vault.shared.hyops.internal

7. Automation rule

HybridOps DNS automation SHOULD operate in two layers:

  1. service modules and DR blueprints publish the desired endpoint contract
  2. DNS automation updates the authoritative DNS service accordingly

The existing dns-routing capability belongs to layer 2.

It is cutover/update automation, not the DNS authority itself.

8. Forwarding and resolution

Cloud and on-prem environments MAY use provider-native forwarders or conditional resolvers to reach the shared internal authority.

Examples:

  • on-prem recursive resolvers forwarding hyops.internal
  • GCP forwarding zones forwarding hyops.internal
  • Azure/AWS equivalents later

The authority remains shared; forwarding is per-environment resolution plumbing.

Recommended first resolution posture:

  • on-prem recursive resolvers forward hyops.internal to the shared PowerDNS primary/secondary set
  • GCP and later Azure/AWS use forwarding or conditional resolver rules for hyops.internal
  • applications never query the authoritative servers directly

9. Product rules

HybridOps MUST follow these rules:

  1. Public web DNS and internal platform DNS MUST remain separate.
  2. DR-critical service endpoints MUST use DNS names, not raw node IPs.
  3. Internal DNS authority MUST live outside the primary on-prem failure domain.
  4. DNS cutover automation MUST NOT be confused with DNS hosting.
  5. The tarball MUST ship contracts and automation hooks, but MUST NOT depend on a public website as the runtime DNS source of truth.

For the current HybridOps product stage:

  1. keep Cloudflare for public zones
  2. stand up shared PowerDNS primary on a dedicated Hetzner shared control-plane host
  3. add an on-prem PowerDNS secondary
  4. standardize hyops.internal naming
  5. let DR blueprints publish endpoint DNS names
  6. let DNS cutover automation update the shared internal authority

This is the cleanest path that remains tarball-safe, cross-cloud, and supportable.

The first executable control path for step 5 is now:

  • platform/network/dns-routing
  • with provider: powerdns-api
  • using the shared internal PowerDNS authority as the authoritative target
  • consuming the shared PowerDNS authority state by default, with explicit endpoint overrides reserved for break-glass cases