Terraform Execution Modes and HCP Workspace Governance for Multi-Cloud and On-Prem

Status: Accepted

1. Context

HybridOps.Studio uses Terraform and Terragrunt across:

  • On-prem infrastructure (currently Proxmox, future additional on-prem platforms).
  • Cloud platforms (Azure today, GCP/AWS later).

HCP Terraform is used as:

  • The system of record for Terraform state.
  • The control plane for policy enforcement (Sentinel/OPA), RBAC, and audit history.
  • The workspace catalogue for platform components.

However, execution constraints differ by platform:

  • Proxmox and other on-prem APIs live on private networks that HCP Terraform runners cannot reach.
  • Cloud providers are reachable from HCP Terraform’s SaaS runners and benefit from fully managed remote execution.

The platform must:

  • Keep all state and governance centralised in HCP Terraform.
  • Allow on-prem plans/applies to run locally on the control node while still using HCP Terraform workspaces and state.
  • Avoid manual drift, in which a workspace's execution mode is left in an invalid or undesired state.

2. Decision

  1. HCP Terraform is the single source of truth for Terraform state, runs, and policy across HybridOps.Studio.

  2. Execution mode is selected by platform type:

    • On-prem (Proxmox and similar):
      • Workspaces use local execution.
      • Terraform runs from the control node / Terragrunt runner inside the private network.
      • Workspaces still use the HCP backend for state and governance.
    • Cloud (Azure, GCP, AWS):
      • Workspaces use remote execution.
      • Plans/applies run on HCP Terraform infrastructure.

  3. Workspace execution mode is enforced by automation:

    • A script, configure-workspace-execution-mode.sh, is the single implementation point for setting execution mode.
    • Proxmox Terragrunt roots add an after_hook on terraform init that calls this script to enforce local execution for their workspace.
    • Cloud Terragrunt roots may enforce or validate remote execution in a similar way.

  4. Workspace naming remains consistent:

    • Workspace names follow the pattern hybridops-<provider>-<rel_path>.
    • The same naming is used by Terragrunt, the execution-mode script, and documentation.
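The naming pattern above can be sketched in shell. This mirrors the replace() call used in the Terragrunt hook; the provider and module path values here are illustrative examples, not an exhaustive list:

```shell
# Derive a workspace name from a provider and a module path by
# replacing path separators with dashes (same transform as the
# Terragrunt replace() call in the after_hook).
provider="proxmox"
rel_path="core/00-foundation/network-sdn"
workspace="hybridops-${provider}-$(printf '%s' "$rel_path" | tr '/' '-')"
echo "$workspace"  # hybridops-proxmox-core-00-foundation-network-sdn
```

The same derivation applies to cloud roots, only with a different provider segment (for example, azure).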

3. Rationale

  • Network reachability: HCP Terraform runners cannot reach private Proxmox APIs. Local execution on the control node is the only viable option while still using HCP as backend.
  • Governance: Keeping state, policies, and run metadata in one HCP organisation avoids split-brain between local state and SaaS state.
  • Repeatability: Centralising execution-mode logic in a single script prevents ad hoc per-workspace configuration.
  • Clarity for assessors and reviewers: The model is easy to explain: “HCP is always the source of truth; execution mode is chosen by where the API lives.”

4. Execution model

4.1 Workspace execution-mode script

Location:

  • control/tools/provision/terragrunt/configure-workspace-execution-mode.sh

Responsibilities:

  • Ensure a given HCP Terraform workspace exists with the desired execution mode (local or remote).
  • Be idempotent: safe to run on every terraform init from Terragrunt.
  • Log outcome clearly for evidence and debugging.
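The responsibilities above can be sketched as follows. This is a hypothetical outline, not the actual script: the function names and control flow are assumptions, while the endpoints and the execution-mode attribute come from the HCP Terraform Workspaces API:

```shell
# Sketch: ensure a workspace exists with the desired execution mode.
# PATCH is idempotent (a no-op when the mode already matches), so this
# is safe to run on every `terraform init`; fall back to POST (create)
# when the workspace does not yet exist.
TFC_API="${TFC_API:-https://app.terraform.io/api/v2}"

# Build the JSON:API payload for creating or patching a workspace.
ws_payload() {
  name="$1"; mode="$2"
  printf '{"data":{"type":"workspaces","attributes":{"name":"%s","execution-mode":"%s"}}}' \
    "$name" "$mode"
}

ensure_workspace() {
  ws="$1"; org="$2"; mode="$3"; token="$4"
  curl -fsS -X PATCH \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/vnd.api+json" \
    -d "$(ws_payload "$ws" "$mode")" \
    "${TFC_API}/organizations/${org}/workspaces/${ws}" \
  || curl -fsS -X POST \
    -H "Authorization: Bearer ${token}" \
    -H "Content-Type: application/vnd.api+json" \
    -d "$(ws_payload "$ws" "$mode")" \
    "${TFC_API}/organizations/${org}/workspaces"
}
```

Echoing the outcome of each branch (reconciled, created, or failed) gives the loggable evidence the responsibilities call for.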

Usage pattern (called by Terragrunt, not directly by users):

```shell
./configure-workspace-execution-mode.sh <workspace_name> <org> <execution_mode>
```

  • workspace_name – HCP Terraform workspace name.
  • org – HCP Terraform organisation (default: hybridops-studio).
  • execution_mode – local or remote.

Prerequisites:

  • HCP Terraform credentials via init-tfc-env.sh, which populates ~/.terraform.d/credentials.tfrc.json.
  • Network connectivity from the control node to app.terraform.io (or the relevant HCP endpoint).
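A minimal preflight check for the first prerequisite could look like this. The function name is an assumption; the credentials file path is the standard location the Terraform CLI reads:

```shell
# Hypothetical preflight: confirm the credentials that init-tfc-env.sh
# populates are present before any Terragrunt hook needs them.
check_tfc_credentials() {
  cred_file="${HOME}/.terraform.d/credentials.tfrc.json"
  if [ ! -f "$cred_file" ]; then
    echo "missing ${cred_file}; run init-tfc-env.sh first" >&2
    return 1
  fi
  # Cheap sanity check: the file should carry a token for the HCP endpoint.
  if ! grep -q 'app.terraform.io' "$cred_file"; then
    echo "no app.terraform.io entry in ${cred_file}" >&2
    return 1
  fi
  echo "credentials ok"
}
```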

4.2 Terragrunt integration for Proxmox

Proxmox live root (infra/terraform/live-v1/onprem/proxmox) includes a terraform block with an after_hook on init:

```hcl
terraform {
  after_hook "configure_workspace_execution_mode" {
    commands = ["init"]
    execute  = [
      "${get_repo_root()}/control/tools/provision/terragrunt/configure-workspace-execution-mode.sh",
      "hybridops-${local.provider}-${replace(local.rel_path, "/", "-")}",
      "hybridops-studio",
      "local"
    ]
  }
}
```

This guarantees that:

  • The workspace exists with the expected name.
  • Execution mode is always corrected to local before any plan/apply.
  • HCP still holds the backend state and history for Proxmox modules.

4.3 Cloud workspaces

Cloud roots (for example, Azure) use:

  • remote execution, where plans and applies run on HCP Terraform infrastructure.
  • The same naming convention and backend organisation.
  • Optional use of the same script to assert/repair remote execution mode if needed.
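One way to exercise that optional enforcement is to mirror the Proxmox hook with the mode flipped. This is a sketch assuming cloud roots expose the same locals; it is not taken from the actual cloud configuration:

```hcl
terraform {
  after_hook "assert_remote_execution_mode" {
    commands = ["init"]
    execute  = [
      "${get_repo_root()}/control/tools/provision/terragrunt/configure-workspace-execution-mode.sh",
      "hybridops-${local.provider}-${replace(local.rel_path, "/", "-")}",
      "hybridops-studio",
      "remote"
    ]
  }
}
```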

5. Implications and risks

Positive

  • Consistent governance: Single HCP org for state and policy across on-prem and cloud.
  • Clear separation of concerns: Execution location is decoupled from state storage.
  • Operational safety: Proxmox workspaces can never be accidentally left in remote execution mode, which would fail at runtime.
  • Auditable: The execution-mode script and Terragrunt hooks provide repeatable behaviour and loggable evidence.

Risks / trade-offs

  • More moving parts: Adds another helper script to the provisioning toolchain.
  • Partial failure scenarios: If the execution-mode script fails, init may still succeed locally; operators must read logs and fix the workspace manually.
  • HCP dependency: Even local execution for Proxmox depends on HCP availability for state operations.

Mitigations:

  • Keep the script small, tested, and documented under control/tools/provision/terragrunt.
  • Make failure paths explicit and log clearly when workspace execution mode could not be reconciled.
  • Provide a runbook for diagnosing HCP workspace issues when terraform init hooks fail.

6. Alternatives considered

  1. HCP-only remote execution for all platforms
    Rejected: Proxmox and similar on-prem APIs are not reachable from public HCP runners.

  2. Local state for on-prem, HCP for cloud only
    Rejected: Splits the state/governance model; makes cross-environment reasoning and evidence more complex.

  3. Per-workspace manual configuration in the HCP UI
    Rejected: Error-prone, hard to automate, difficult to reproduce for assessors or colleagues.

The chosen approach keeps HCP as the common control plane while adapting execution to network realities.

7. Validation and evidence

  • Proxmox SDN module (onprem/proxmox/core/00-foundation/network-sdn):
    • Runs via Terragrunt with the Proxmox workspace in local execution mode.
    • Uses the HCP Terraform backend for state.
  • Proxmox Packer templates and env init scripts:
    • Use the same naming conventions and backend organisation.
    • Provide logs and evidence artefacts under output/ for assessors.
  • Future evidence items will include screenshots or CLI transcripts showing:
    • The HCP workspace in local mode with successful runs for Proxmox.
    • The HCP workspace in remote mode for Azure modules with the same naming scheme.

Maintainer: HybridOps.Studio
License: MIT-0 for code, CC BY 4.0 for documentation unless otherwise stated.