
HybridOps.Studio treats Terraform and Terragrunt as first-class automation layers for a governed platform, not as ad-hoc scripts. Stack naming and operator ergonomics must reflect that standard.

Normalize Terragrunt live stacks via generated alias tree

Status

Accepted — infra/terraform/live-v1/stacks is the canonical alias tree for Terragrunt stacks, generated by a script that normalises live paths into stable stack names.


1. Context

HybridOps.Studio uses Terraform and Terragrunt to manage a growing set of infrastructure stacks across:

  • On-prem (Proxmox and related services).
  • Cloud platforms (Azure, GCP).
  • Multiple layers (foundation, platform, workloads).
  • Multiple environments (for example dev, staging, prod).

The infra/terraform/live-v1/ layout reflects this structure with nested directories such as:

  • cloud/azure/environments/dev/00-foundation/...
  • cloud/gcp/org/00-project-factory/...
  • onprem/proxmox/dev/10-platform/...

Terragrunt stacks are addressed by path. Without an alias layer, operators would have to call targets such as:

  • make plan STACK=cloud/azure/environments/dev/10-platform/core/ctrl-01
  • make apply STACK=onprem/proxmox/dev/20-workloads/k8s-nodes

This creates several problems:

  • Poor ergonomics — paths are long, error-prone to type, and easy to mis-copy.
  • Brittle contracts — any change to directory layout breaks CLI habits, runbooks, and CI pipelines.
  • Difficult documentation — examples must either embed long paths or explain mapping rules in prose.
  • Cross-platform inconsistency — on-prem and cloud stacks cannot easily share a common naming pattern.

The control/tools/provision/terragrunt/gen-stacks-symlinks.sh script (name may evolve) currently uses a heuristic path_to_name function to generate short names such as:

  • cloud/azure/environments/dev/10-platform/core/ctrl-01 → az-dev-platform-core-ctrl-01
  • onprem/proxmox/dev/20-workloads/k8s-nodes → pve-dev-workloads-k8s

These names are then exposed as symlinks under infra/terraform/live-v1/stacks/, and referenced by Makefile targets and runbooks (STACK=<name>).

This ADR records the decision to treat that alias tree as canonical for operator-facing stack selection, rather than as an incidental helper.


2. Decision

HybridOps.Studio standardises on a generated alias tree for Terragrunt stacks:

  1. Canonical alias directory

    • infra/terraform/live-v1/stacks/ is the only operator-facing entrypoint for Terragrunt stacks.
    • Each alias under stacks/ is a symlink pointing to a live stack directory that contains a terragrunt.hcl.

  2. Path normalisation

    • Alias names are generated from live paths using a deterministic normalisation function that:
      • Strips high-level prefixes such as cloud/, onprem/, and environments/.
      • Collapses common provider and platform terms, for example:
        • proxmox → pve
        • vmware → vmw
        • azure → az
        • key-vault → keyvault
        • resource-group → rg
        • k8s-nodes → k8s
      • Converts / and _ to -, collapses duplicate dashes, and trims trailing separators.
    • The resulting alias is unique within stacks/. If collisions appear in future, they must be resolved by adjusting the mapping or layout, not by bypassing the alias layer.

  3. Script ownership

    • Alias generation is implemented by a single script under control/tools/provision/terragrunt/, currently named gen-stacks-symlinks.sh.
    • The script:
      • Scans infra/terraform/live-v1/ for terragrunt.hcl files.
      • Computes alias names via path_to_name.
      • Re-creates the stacks/ directory on each run, ensuring it is derived state.

  4. Integration points

    • The Terragrunt Makefile and wrapper scripts treat stacks/ as the default resolution mechanism for STACK:
      • make plan STACK=pve-core-vm-linux
      • make apply STACK=az-dev-platform-core-ctrl-01
    • Runbooks, HOWTOs, and ADRs reference stacks using these aliases, not their raw paths.
    • CI/CD jobs for Terraform/Terragrunt use aliases when selecting stacks.

  5. Contracts

    • Any change to the alias mapping function is treated as a breaking change and must be:
      • Captured in this ADR (or a successor).
      • Reflected in runbooks and CI pipelines.
      • Validated against existing aliases to avoid accidental collisions.
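
The path normalisation rules in this section can be sketched as a small shell function. This is a hypothetical illustration, not the contents of gen-stacks-symlinks.sh: the rule order, the use of sed -E, and the plain substring replacements are assumptions. The rule that drops numeric layer prefixes (10-platform becomes platform) is inferred from the example aliases in the Context section and is not among the listed rules.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of path_to_name; the real function may differ.
# Rules, in order:
#   1. strip the leading cloud/ or onprem/ prefix
#   2. drop any environments/ segment
#   3. drop numeric layer prefixes such as 10- (assumption, inferred from examples)
#   4. abbreviate provider and platform terms (plain substring replacement)
#   5. convert / and _ to -, collapse repeated dashes, trim leading/trailing dashes
path_to_name() {
  printf '%s\n' "$1" | sed -E \
    -e 's#^(cloud|onprem)/##' \
    -e 's#(^|/)environments/#\1#g' \
    -e 's#(^|/)[0-9]+-#\1#g' \
    -e 's#proxmox#pve#g' \
    -e 's#vmware#vmw#g' \
    -e 's#azure#az#g' \
    -e 's#key-vault#keyvault#g' \
    -e 's#resource-group#rg#g' \
    -e 's#k8s-nodes#k8s#g' \
    -e 's#[/_]#-#g' \
    -e 's#-+#-#g' \
    -e 's#^-+##' \
    -e 's#-+$##'
}

path_to_name cloud/azure/environments/dev/10-platform/core/ctrl-01
path_to_name onprem/proxmox/dev/20-workloads/k8s-nodes
```

Under these assumptions the two calls print az-dev-platform-core-ctrl-01 and pve-dev-workloads-k8s, matching the examples given in the Context section.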

3. Rationale

3.1 Operator ergonomics

Operator and learner workflows should prefer short, meaningful stack identifiers over long filesystem paths. Patterns such as:

  • STACK=pve-core-vm-linux
  • STACK=az-dev-foundation-hub

are easier to remember, type, and explain than the underlying Terragrunt paths.

This aligns with HybridOps.Studio’s goals of:

  • Clear, repeatable run commands in documentation.
  • Onboarding-friendly naming for Academy content.
  • Reduced cognitive load when switching between on-prem and cloud stacks.

3.2 Stability vs layout evolution

The live tree must remain free to evolve:

  • New layers (00-foundation, 10-platform, 20-workloads).
  • New environment naming or additional clouds.
  • Organisational changes (e.g. org/ vs environments/ nesting).

By treating stacks/ as a derived alias layer:

  • Internal layout can change, as long as aliases are maintained or migrated deliberately.
  • CI/CD configurations and runbooks can remain stable, referencing only aliases.
  • Future refactors (for example live-v2/) can co-exist while preserving stacks/ contracts.

3.3 Traceability and evidence

Having a single script generate aliases creates a clear trace:

  • The mapping from live paths to stack names is visible and testable.
  • Evidence runs can record both the alias and the underlying path.
  • ADRs and runbooks can show a concise STACK= value plus a pointer to the canonical directory.

This supports HybridOps.Studio’s evidence-first posture: every stack execution can be tied back to a specific alias and underlying path at a given commit.

3.4 Alternatives and future integration

The alias scheme also creates a straightforward bridge to:

  • HCP Terraform workspace naming.
  • Future GitOps controllers that need stable, human-readable identifiers.
  • Cross-referencing between observability dashboards and stack executions.

Rather than each tool inventing its own naming conventions, the alias tree becomes the single source for stack names.


4. Consequences

4.1 Positive consequences

  • Simplified commands
    Operators run make plan STACK=... with short, consistent identifiers.

  • Decoupled layout
    Internal directory changes do not immediately break CLI usage or runbooks, provided aliases remain stable.

  • Consistent cross-domain naming
    On-prem (pve-*) and cloud (az-*, gcp-*) stacks follow a common pattern, improving mental mapping and Academy teaching materials.

  • Better documentation
    Runbooks, HOWTOs, and ADRs can reference stack aliases in prose, keeping examples readable and copy-paste friendly.

  • Derivable state
    stacks/ is generated on demand and can be safely deleted and recreated, which simplifies troubleshooting and onboarding.

4.2 Negative consequences / risks

  • Alias drift
    If the alias script is not run after adding or moving stacks, stacks/ may become stale. This is mitigated by:
      • Wiring alias generation into Make targets (for example make terragrunt.stacks).
      • Running it in CI as part of validation.

  • Collision risk
    The normalisation scheme may generate the same alias for different paths (for example if two stacks differ only by segments that are stripped or compressed). This risk is addressed by:
      • Keeping path_to_name intentionally conservative.
      • Adding lint checks to detect collisions.
      • Adjusting layout or naming when collisions are detected.

  • Implied contract
    Once documented and referenced in ADRs, aliases become part of the platform contract. Renaming an alias requires a migration plan (for example deprecation window and updates in runbooks).

  • Tooling dependency
    Some minimal tooling (shell, find, sed) is now part of the Terragrunt workflow. This is acceptable for the target Linux-based operator environments.
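
The collision lint mentioned under "Collision risk" could look roughly like the sketch below. The function name find_alias_collisions and the tab-separated "alias, path" input format are hypothetical; the ADR does not specify how the lint is implemented.

```shell
#!/usr/bin/env bash
# find_alias_collisions (hypothetical helper): reads "alias<TAB>path" lines
# on stdin and prints every alias that appears on more than one line.
find_alias_collisions() {
  cut -f1 | sort | uniq -d
}

# Example: two different live paths that normalise to the same alias.
printf '%s\t%s\n' \
  pve-dev-core onprem/proxmox/dev/core \
  az-dev-hub   cloud/azure/environments/dev/hub \
  pve-dev-core onprem/proxmox/environments/dev/core |
  find_alias_collisions
```

A CI job could fail whenever this prints anything, which blocks a change until the layout or the mapping is adjusted.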


5. Alternatives considered

  1. Use raw Terragrunt paths everywhere

    • Pros: No additional script or symlink layer; Terragrunt works out of the box.
    • Cons: CLI commands become long and fragile; documentation and runbooks are harder to read; changing directory layout breaks existing usage.
    • Outcome: Rejected. The operator experience and documentation clarity are insufficient.

  2. Hard-code a mapping in Makefiles or wrapper scripts

    • Pros: Central place to define short names.
    • Cons: Difficult to keep in sync with new stacks; invites duplication across Makefiles and scripts; no obvious way to inspect the current mapping on disk.
    • Outcome: Rejected. Too manual and error-prone for a growing platform.

  3. Use Terragrunt generate blocks or custom inputs

    • Pros: Keeps logic closer to Terragrunt configuration.
    • Cons: Still leaves operators with raw paths; does not address CLI ergonomics directly; more complex to inspect from the filesystem.
    • Outcome: Rejected for this concern; can still be used alongside aliases for other purposes.

  4. Name HCP Terraform workspaces as the primary ID

    • Pros: Aligns with remote state backends and policy runs.
    • Cons: Tightly couples local workflows to HCP Terraform; does not help in offline or local-only runs; introduces a separate indirection layer for operators.
    • Outcome: Deferred. Workspace naming may align with aliases, but aliases remain the primary operator-facing construct.

6. Implementation notes

  • Script location

    • The alias generation script lives under:
      • control/tools/provision/terragrunt/gen-stacks-symlinks.sh
    • It is invoked by:
      • A Makefile target such as terragrunt.stacks in the Terragrunt tooling Makefile.
      • CI jobs that validate the tree and detect drift.

  • Live tree

    • The script treats infra/terraform/live-v1/ as the source of truth for live stacks.
    • It ignores:
      • .terragrunt-cache/
      • .terraform/
      • The stacks/ directory itself.

  • Alias naming

    • The path_to_name function:
      • Removes known prefixes as described in the Decision section.
      • Applies provider-specific abbreviations.
      • Normalises separators and dashes.
    • Any change to this function must be treated as a deliberate evolution of the contract and validated against existing aliases.

  • Idempotency

    • On each run, the script:
      • Deletes the existing stacks/ directory.
      • Recreates aliases based on the current live tree.
    • This ensures stacks/ is always derived from the current repository state.

  • Testing

    • A simple CI step can:
      • Run the alias generation script.
      • Assert that stacks/ is non-empty.
      • Optionally compare the alias set against a known snapshot for regression detection.
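
The scan, ignore, and rebuild steps in these notes can be sketched as follows. This is an illustrative outline, not the actual gen-stacks-symlinks.sh: regenerate_stacks and alias_for are hypothetical names, alias_for is a simplified stand-in for path_to_name, and skipping a terragrunt.hcl at the root of the live tree is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for path_to_name (assumption): only converts separators to dashes.
alias_for() { printf '%s\n' "$1" | sed 's#[/_]#-#g'; }

# regenerate_stacks LIVE_ROOT: delete LIVE_ROOT/stacks and rebuild it as
# symlinks, one per directory that contains a terragrunt.hcl.
regenerate_stacks() {
  local live_root="$1"
  local stacks_dir="$live_root/stacks"
  local hcl rel

  # Refuse to run against a missing root so rm -rf never sees a bad path.
  if [ ! -d "$live_root" ]; then
    echo "error: live root '$live_root' not found" >&2
    return 1
  fi

  rm -rf "$stacks_dir"
  mkdir -p "$stacks_dir"

  # Prune cache directories and the alias tree itself, then list stack configs.
  find "$live_root" \
    \( -name .terragrunt-cache -o -name .terraform -o -path "$stacks_dir" \) -prune \
    -o -type f -name terragrunt.hcl -print |
  while IFS= read -r hcl; do
    rel="$(dirname "${hcl#"$live_root"/}")"
    # Assumption: a terragrunt.hcl at the root of the live tree is not a stack.
    if [ "$rel" = "." ]; then continue; fi
    # Relative targets keep the tree relocatable; ln fails if the alias exists.
    ln -s "../$rel" "$stacks_dir/$(alias_for "$rel")"
  done
}
```

Because the directory is removed before each rebuild, repeated runs produce the same result. Since ln -s refuses to overwrite an existing name, two paths that normalise to the same alias make the run fail instead of silently replacing a link.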

7. Operational impact and validation

  • Operator workflows

    • Runbooks and how-to guides instruct operators to:
      • Use STACK=<alias> when working with Terragrunt via Make targets and wrapper scripts.
      • Avoid referencing long live paths directly in day-to-day commands.

  • CI/CD

    • Pipelines that plan, apply, and destroy stacks reference aliases in their configuration (for example STACK=pve-core-vm-linux), simplifying configuration across environments.

  • Evidence

    • Execution logs and artefacts for Terragrunt runs:
      • Record the alias used.
      • Optionally log the resolved path for traceability.
    • Evidence folders such as output/artifacts/platform/terraform/terragrunt-stacks/ can organise runs by alias.

  • Failure modes

    • If alias generation fails (for example due to permissions or missing tools), CI and local Make targets can:
      • Surface a clear error message.
      • Point operators to the runbook for regenerating aliases.
    • If alias collisions occur, the CI lints can block changes until naming is resolved.
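
As an illustration of the evidence and failure-mode points above, a wrapper script might resolve STACK to its live path before invoking Terragrunt. The sketch below is hypothetical: resolve_stack, the error wording, and the log line format are assumptions, not part of the existing tooling.

```shell
#!/usr/bin/env bash
# resolve_stack STACKS_DIR ALIAS (hypothetical helper): print the physical
# live path behind an alias, or fail with guidance if the alias is unusable.
resolve_stack() {
  local stacks_dir="$1" alias_name="$2"
  local link="$stacks_dir/$alias_name"
  if [ ! -L "$link" ] || [ ! -f "$link/terragrunt.hcl" ]; then
    echo "error: unknown or stale stack alias '$alias_name';" \
         "regenerate aliases (for example make terragrunt.stacks)" >&2
    return 1
  fi
  # cd -P follows the symlink so the logged path is the real stack directory.
  (cd -P "$link" && pwd -P)
}

# Usage in a wrapper, recording both alias and resolved path as evidence:
#   path="$(resolve_stack infra/terraform/live-v1/stacks "$STACK")" || exit 1
#   printf 'stack_alias=%s stack_path=%s\n' "$STACK" "$path"
```

Logging both values lets a run be traced from the short alias back to the directory that was actually planned or applied.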

8. References