HybridOps.Studio treats Terraform and Terragrunt as first-class automation layers for a governed platform, not as ad-hoc scripts. Stack naming and operator ergonomics must reflect that standard.
Normalize Terragrunt live stacks via generated alias tree
Status
Accepted. The infra/terraform/live-v1/stacks/ directory is the canonical alias tree for Terragrunt stacks, generated by a script that normalises live paths into stable stack names.
1. Context
HybridOps.Studio uses Terraform and Terragrunt to manage a growing set of infrastructure stacks across:
- On-prem (Proxmox and related services).
- Cloud platforms (Azure, GCP).
- Multiple layers (foundation, platform, workloads).
- Multiple environments (for example dev, staging, prod).
The infra/terraform/live-v1/ layout reflects this structure with nested directories such as:
- cloud/azure/environments/dev/00-foundation/...
- cloud/gcp/org/00-project-factory/...
- onprem/proxmox/dev/10-platform/...
Terragrunt stacks are addressed by path. Without an alias layer, operators would have to call targets such as:
- make plan STACK=cloud/azure/environments/dev/10-platform/core/ctrl-01
- make apply STACK=onprem/proxmox/dev/20-workloads/k8s-nodes
This creates several problems:
- Poor ergonomics — paths are long, error-prone to type, and easy to mis-copy.
- Brittle contracts — any change to directory layout breaks CLI habits, runbooks, and CI pipelines.
- Difficult documentation — examples must either embed long paths or explain mapping rules in prose.
- Cross-platform inconsistency — on-prem and cloud stacks cannot easily share a common naming pattern.
The control/tools/provision/terragrunt/gen-stacks-symlinks.sh script (name may evolve) currently uses a heuristic path_to_name function to generate short names such as:
- cloud/azure/environments/dev/10-platform/core/ctrl-01 → az-dev-platform-core-ctrl-01
- onprem/proxmox/dev/20-workloads/k8s-nodes → pve-dev-workloads-k8s
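The following is a minimal sketch of a path_to_name-style normaliser in bash with sed. It is an illustration, not the actual function in gen-stacks-symlinks.sh, and it applies the rules listed in the Decision section. Two parts are assumptions. First, the rule that drops two-digit layer prefixes such as 10- and 20- is inferred from the examples above, not taken from the written rule list. Second, the abbreviation table is limited to the terms the ADR names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a path_to_name-style normaliser; the real function in
# gen-stacks-symlinks.sh may differ. Input is a stack path relative to
# infra/terraform/live-v1/, output is a dash-separated alias.
#
# Rules, applied in order:
#   1. Strip the leading cloud/ or onprem/ prefix and any environments/ segment.
#   2. Drop two-digit layer prefixes (00-, 10-, 20-); assumption inferred from examples.
#   3. Abbreviate whole path segments (proxmox -> pve, azure -> az, k8s-nodes -> k8s, ...).
#   4. Turn / and _ into -, collapse repeated dashes, trim leading/trailing dashes.
path_to_name() {
  printf '%s\n' "$1" | sed -E \
    -e 's#^(cloud|onprem)/##' \
    -e 's#(^|/)environments/#\1#g' \
    -e 's#(^|/)[0-9]{2}-#\1#g' \
    -e 's#(^|/)proxmox(/|$)#\1pve\2#g' \
    -e 's#(^|/)vmware(/|$)#\1vmw\2#g' \
    -e 's#(^|/)azure(/|$)#\1az\2#g' \
    -e 's#(^|/)key-vault(/|$)#\1keyvault\2#g' \
    -e 's#(^|/)resource-group(/|$)#\1rg\2#g' \
    -e 's#(^|/)k8s-nodes(/|$)#\1k8s\2#g' \
    -e 's#[/_]+#-#g' \
    -e 's#-+#-#g' \
    -e 's#^-+##' \
    -e 's#-+$##'
}
```

Run against the two paths above, this sketch yields az-dev-platform-core-ctrl-01 and pve-dev-workloads-k8s. Because the abbreviations match whole path segments only, a term such as azure inside a longer segment name is left unchanged. This keeps the mapping conservative.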
These names are exposed as symlinks under infra/terraform/live-v1/stacks/ and referenced by Makefile targets and runbooks (STACK=<name>).
This ADR records the decision to treat that alias tree as canonical for operator-facing stack selection, rather than as an incidental helper.
2. Decision
HybridOps.Studio standardises on a generated alias tree for Terragrunt stacks:
- Canonical alias directory
  - infra/terraform/live-v1/stacks/ is the only operator-facing entrypoint for Terragrunt stacks.
  - Each alias under stacks/ is a symlink pointing to a live stack directory that contains a terragrunt.hcl.
- Path normalisation
  - Alias names are generated from live paths using a deterministic normalisation function that:
    - Strips high-level prefixes such as cloud/, onprem/, and environments/.
    - Collapses common provider and platform terms, for example:
      - proxmox → pve
      - vmware → vmw
      - azure → az
      - key-vault → keyvault
      - resource-group → rg
      - k8s-nodes → k8s
    - Converts / and _ to -, collapses duplicate dashes, and trims trailing separators.
  - The resulting alias is unique within stacks/. If collisions appear in future, they must be resolved by adjusting the mapping or layout, not by bypassing the alias layer.
- Script ownership
  - Alias generation is implemented by a single script under control/tools/provision/terragrunt/, currently named gen-stacks-symlinks.sh.
  - The script:
    - Scans infra/terraform/live-v1/ for terragrunt.hcl files.
    - Computes alias names via path_to_name.
    - Re-creates the stacks/ directory on each run, ensuring it is derived state.
- Integration points
  - The Terragrunt Makefile and wrapper scripts treat stacks/ as the default resolution mechanism for STACK:
    - make plan STACK=pve-core-vm-linux
    - make apply STACK=az-dev-platform-core-ctrl-01
  - Runbooks, HOWTOs, and ADRs reference stacks using these aliases, not their raw paths.
  - CI/CD jobs for Terraform/Terragrunt use aliases when selecting stacks.
- Contracts
  - Any change to the alias mapping function is treated as a breaking change and must be:
    - Captured in this ADR (or a successor).
    - Reflected in runbooks and CI pipelines.
    - Validated against existing aliases to avoid accidental collisions.
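The scan-and-recreate behaviour under Script ownership can be sketched as two bash functions. This is a hypothetical illustration, not the contents of gen-stacks-symlinks.sh, and the function names are invented. list_live_stacks finds directories containing terragrunt.hcl. It skips caches and stacks/ itself, and, as an added assumption, a terragrunt.hcl at the live root (treated as shared configuration rather than a stack). rebuild_stacks reads alias/path pairs on stdin and recreates stacks/ from scratch, so the tree stays derived state.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the scan and rebuild steps; names are illustrative.

# Print each live stack directory (relative to ROOT) containing terragrunt.hcl,
# pruning .terragrunt-cache/, .terraform/ and the generated stacks/ tree.
list_live_stacks() {
  local root="${1%/}" f dir
  find "$root" \( -name .terragrunt-cache -o -name .terraform -o -path "$root/stacks" \) -prune \
    -o -type f -name terragrunt.hcl -print |
    while IFS= read -r f; do
      dir="${f%/terragrunt.hcl}"
      # Assumption: a terragrunt.hcl directly under ROOT is shared config, not a stack.
      [ "$dir" = "$root" ] && continue
      printf '%s\n' "${dir#"$root"/}"
    done
}

# Read "alias<TAB>relative-path" lines on stdin and rebuild ROOT/stacks/ from scratch.
rebuild_stacks() {
  local root="${1%/}" alias rel
  # Guard against an empty or missing root so rm -rf can never target /stacks.
  [ -n "$root" ] && [ -d "$root" ] || { echo "rebuild_stacks: bad root '$1'" >&2; return 1; }
  local stacks="$root/stacks"
  rm -rf -- "$stacks"
  mkdir -p -- "$stacks"
  while IFS=$'\t' read -r alias rel; do
    [ -n "$alias" ] || continue
    if [ ! -f "$root/$rel/terragrunt.hcl" ]; then
      echo "rebuild_stacks: no terragrunt.hcl in $rel" >&2
      return 1
    fi
    # Refuse to overwrite: a second path mapping to the same alias is a collision.
    if [ -e "$stacks/$alias" ] || [ -L "$stacks/$alias" ]; then
      echo "rebuild_stacks: alias collision on '$alias' ($rel)" >&2
      return 1
    fi
    # Relative target, resolved from inside stacks/, so the tree survives a repo move.
    ln -s "../$rel" "$stacks/$alias"
  done
}

# Example wiring, assuming some path_to_name function is defined:
#   list_live_stacks infra/terraform/live-v1 |
#     while IFS= read -r p; do printf '%s\t%s\n' "$(path_to_name "$p")" "$p"; done |
#     rebuild_stacks infra/terraform/live-v1
```

Deleting and recreating stacks/ on every run, rather than patching it, means a moved or removed stack cannot leave a stale alias behind.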
3. Rationale
3.1 Operator ergonomics
Operator and learner workflows should prefer short, meaningful stack identifiers over long filesystem paths. Patterns such as STACK=pve-core-vm-linux and STACK=az-dev-foundation-hub are easier to remember, type, and explain than the underlying Terragrunt paths.
This aligns with HybridOps.Studio’s goals of:
- Clear, repeatable run commands in documentation.
- Onboarding-friendly naming for Academy content.
- Reduced cognitive load when switching between on-prem and cloud stacks.
3.2 Stability vs layout evolution
The live tree must remain free to evolve:
- New layers (00-foundation, 10-platform, 20-workloads).
- New environment naming or additional clouds.
- Organisational changes (e.g. org/ vs environments/ nesting).
By treating stacks/ as a derived alias layer:
- Internal layout can change, as long as aliases are maintained or migrated deliberately.
- CI/CD configurations and runbooks can remain stable, referencing only aliases.
- Future refactors (for example live-v2/) can coexist while preserving stacks/ contracts.
3.3 Traceability and evidence
Having a single script generate aliases creates a clear trace:
- The mapping from live paths to stack names is visible and testable.
- Evidence runs can record both the alias and the underlying path.
- ADRs and runbooks can show a concise STACK= value plus a pointer to the canonical directory.
This supports HybridOps.Studio’s evidence-first posture: every stack execution can be tied back to a specific alias and underlying path at a given commit.
3.4 Alternatives and future integration
The alias scheme also creates a straightforward bridge to:
- HCP Terraform workspace naming.
- Future GitOps controllers that need stable, human-readable identifiers.
- Cross-referencing between observability dashboards and stack executions.
Rather than each tool inventing its own naming conventions, the alias tree becomes the single source for stack names.
4. Consequences
4.1 Positive consequences
- Simplified commands
  Operators run make plan STACK=... with short, consistent identifiers.
- Decoupled layout
  Internal directory changes do not immediately break CLI usage or runbooks, provided aliases remain stable.
- Consistent cross-domain naming
  On-prem (pve-*) and cloud (az-*, gcp-*) stacks follow a common pattern, improving mental mapping and Academy teaching materials.
- Better documentation
  Runbooks, HOWTOs, and ADRs can reference stack aliases in prose, keeping examples readable and copy-paste friendly.
- Derivable state
  stacks/ is generated on demand and can be safely deleted and recreated, which simplifies troubleshooting and onboarding.
4.2 Negative consequences / risks
- Alias drift
  If the alias script is not run after adding or moving stacks, stacks/ may become stale. This is mitigated by:
  - Wiring alias generation into Make targets (for example make terragrunt.stacks).
  - Running it in CI as part of validation.
- Collision risk
  The normalisation scheme may generate the same alias for different paths (for example, if two stacks differ only by segments that are stripped or compressed). This risk is addressed by:
  - Keeping path_to_name intentionally conservative.
  - Adding lint checks to detect collisions.
  - Adjusting layout or naming when collisions are detected.
- Implied contract
  Once documented and referenced in ADRs, aliases become part of the platform contract. Renaming an alias requires a migration plan (for example, a deprecation window and runbook updates).
- Tooling dependency
  Some minimal tooling (shell, find, sed) is now part of the Terragrunt workflow. This is acceptable for the target Linux-based operator environments.
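A collision lint can be sketched as a small bash function around awk. It is hypothetical: the function name and the tab-separated "alias, path" input format are assumptions, and awk is an extra tool beside the shell, find, and sed already listed. The function reads every computed alias with its source path and fails when any alias is claimed by more than one path. Each clashing path is printed, so the report points straight at the layout or mapping that needs adjusting.

```shell
#!/usr/bin/env bash
# Hypothetical collision lint: reads "alias<TAB>path" lines on stdin and exits
# non-zero, listing the clashing paths, if any alias is claimed more than once.
check_alias_collisions() {
  awk -F '\t' '
    NF >= 2 {
      count[$1]++
      paths[$1] = paths[$1] "\n  " $2   # accumulate every path for this alias
    }
    END {
      status = 0
      for (a in count) {
        if (count[a] > 1) {
          printf "alias collision: %s%s\n", a, paths[a]
          status = 1
        }
      }
      exit status
    }'
}
```

Running this in CI before stacks/ is rebuilt lets a colliding change fail with a readable message. Otherwise the problem would surface as a half-built alias tree.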
5. Alternatives considered
- Use raw Terragrunt paths everywhere
  - Pros: No additional script or symlink layer; Terragrunt works out of the box.
  - Cons: CLI commands become long and fragile; documentation and runbooks are harder to read; changing directory layout breaks existing usage.
  - Outcome: Rejected. The operator experience and documentation clarity are insufficient.
- Hard-code a mapping in Makefiles or wrapper scripts
  - Pros: Central place to define short names.
  - Cons: Difficult to keep in sync with new stacks; invites duplication across Makefiles and scripts; no obvious way to inspect the current mapping on disk.
  - Outcome: Rejected. Too manual and error-prone for a growing platform.
- Use Terragrunt generate blocks or custom inputs
  - Pros: Keeps logic closer to Terragrunt configuration.
  - Cons: Still leaves operators with raw paths; does not address CLI ergonomics directly; more complex to inspect from the filesystem.
  - Outcome: Rejected for this concern; can still be used alongside aliases for other purposes.
- Use HCP Terraform workspace names as the primary ID
  - Pros: Aligns with remote state backends and policy runs.
  - Cons: Tightly couples local workflows to HCP Terraform; does not help in offline or local-only runs; introduces a separate indirection layer for operators.
  - Outcome: Deferred. Workspace naming may align with aliases, but aliases remain the primary operator-facing construct.
6. Implementation notes
- Script location
  - The alias generation script lives under:
    control/tools/provision/terragrunt/gen-stacks-symlinks.sh
  - It is invoked by:
    - A Makefile target such as terragrunt.stacks in the Terragrunt tooling Makefile.
    - CI jobs that validate the tree and detect drift.
- Live tree
  - The script treats infra/terraform/live-v1/ as the source of truth for live stacks.
  - It ignores:
    - .terragrunt-cache/
    - .terraform/
    - The stacks/ directory itself.
- Alias naming
  - The path_to_name function:
    - Removes known prefixes as described in the Decision section.
    - Applies provider-specific abbreviations.
    - Normalises separators and dashes.
  - Any change to this function must be treated as a deliberate evolution of the contract and validated against existing aliases.
- Idempotency
  - On each run, the script:
    - Deletes the existing stacks/ directory.
    - Recreates aliases based on the current live tree.
  - This ensures stacks/ is always derived from the current repository state.
- Testing
  - A simple CI step can:
    - Run the alias generation script.
    - Assert that stacks/ is non-empty.
    - Optionally compare the alias set against a known snapshot for regression detection.
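The CI checks in the Testing step could be written as the bash function below. It would run after the generation script. It is a hypothetical sketch: the function name, the snapshot file, and its format (one alias per line, sorted in the C locale) are assumptions. It fails when stacks/ is missing or empty. When a snapshot exists, it diffs the current alias set against it, so an unintended rename or a disappeared alias shows up in review.

```shell
#!/usr/bin/env bash
# Hypothetical CI check: stacks/ must be non-empty and, if a snapshot file
# exists, its alias names must match the snapshot exactly.
check_stacks_snapshot() {
  local stacks="$1" snapshot="$2" current
  # Fail fast if generation produced nothing (or the directory is missing).
  if [ -z "$(ls -A "$stacks" 2>/dev/null)" ]; then
    echo "stacks/ is missing or empty: $stacks" >&2
    return 1
  fi
  # The alias set is the sorted list of entry names under stacks/.
  current="$(ls -1 "$stacks" | LC_ALL=C sort)"
  # The snapshot comparison is optional; skip it when no snapshot is committed.
  if [ -f "$snapshot" ]; then
    if ! diff -u "$snapshot" <(printf '%s\n' "$current"); then
      echo "alias set differs from snapshot $snapshot" >&2
      return 1
    fi
  fi
}

# Refreshing the snapshot after an intentional mapping change (illustrative):
#   ls -1 infra/terraform/live-v1/stacks | LC_ALL=C sort > stacks.snapshot
```

Committing the snapshot next to the script means that any change to the alias mapping appears as a visible diff, which fits the rule that mapping changes are breaking changes.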
7. Operational impact and validation
- Operator workflows
  - Runbooks and how-to guides instruct operators to:
    - Use STACK=<alias> when working with Terragrunt via Make targets and wrapper scripts.
    - Avoid referencing long live paths directly in day-to-day commands.
- CI/CD
  - Pipelines that plan, apply, and destroy stacks reference aliases in their configuration (for example STACK=pve-core-vm-linux), simplifying configuration across environments.
- Evidence
  - Execution logs and artefacts for Terragrunt runs:
    - Record the alias used.
    - Optionally log the resolved path for traceability.
  - Evidence folders such as output/artifacts/platform/terraform/terragrunt-stacks/ can organise runs by alias.
- Failure modes
  - If alias generation fails (for example due to permissions or missing tools), CI and local Make targets can:
    - Surface a clear error message.
    - Point operators to the runbook for regenerating aliases.
  - If alias collisions occur, CI lints can block changes until naming is resolved.
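A wrapper step could tie these points together, as in the hypothetical bash sketch below. It resolves STACK through stacks/ and fails with a clear message when the alias is unknown. It also writes an evidence record holding the alias, the resolved physical path, and the current commit. The function names, the key=value record format, and the timestamped file name are illustrative assumptions. Only the evidence folder path comes from this ADR.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper helpers; names and the record format are illustrative only.

# Resolve a STACK alias under STACKS_DIR to the physical live directory.
resolve_stack() {
  local stacks="$1" alias="$2"
  # Reject empty or path-like aliases so STACK cannot escape stacks/.
  case "$alias" in
    '' | . | .. | */*)
      echo "invalid STACK alias '$alias'" >&2
      return 1
      ;;
  esac
  if [ ! -f "$stacks/$alias/terragrunt.hcl" ]; then
    echo "unknown STACK alias '$alias' in $stacks; regenerate aliases (see the stack aliases runbook)" >&2
    return 1
  fi
  # cd -P follows the symlink; pwd -P prints the physical path.
  (cd -P -- "$stacks/$alias" && pwd -P)
}

# Write one evidence record (alias, resolved path, commit) under OUT_DIR/<alias>/.
record_evidence() {
  local out_dir="$1" alias="$2" path="$3" commit
  # Fall back to "unknown" outside a git checkout or when git is unavailable.
  commit="$(git -C "$path" rev-parse HEAD 2>/dev/null || echo unknown)"
  mkdir -p -- "$out_dir/$alias"
  printf 'alias=%s\npath=%s\ncommit=%s\n' "$alias" "$path" "$commit" \
    >"$out_dir/$alias/$(date -u +%Y%m%dT%H%M%SZ).env"
}

# Illustrative use from a Make target or wrapper script:
#   dir="$(resolve_stack infra/terraform/live-v1/stacks "$STACK")" &&
#     record_evidence output/artifacts/platform/terraform/terragrunt-stacks "$STACK" "$dir" &&
#     (cd "$dir" && terragrunt plan)
```

Records are keyed by alias, and each one stores the resolved path. A run can therefore be traced back to its live directory even after the layout behind the alias has changed. Two runs of the same alias within one second would share a file name in this sketch.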
8. References
- ADRs:
  - ADR-0001 – ADR Process and Conventions
- Runbooks and HOWTOs:
  - Runbook – Terragrunt stack aliases and Make targets
- Evidence:
- Implementation:
  - control/tools/provision/terragrunt/gen-stacks-symlinks.sh
  - infra/terraform/live-v1/stacks/