# Cost Model
HybridOps treats cost as a first-class signal alongside availability and performance (ADR-0801). Budget constraints can block or gate DR and burst actions in the same way that signal failures can. This is by design, not a side effect.
## How it works
Every pipeline run emits a cost record to `<runtime-root>/logs/cost/<env>/<component>-<run_id>.json`. Each record carries the estimated spend for that run, broken down into compute, storage, and network, together with the standard attribution tags.
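The path convention can be sketched in Python. This is a minimal sketch that assumes the record is a plain dict already carrying `env`, `component`, and `run_id`. The helper names `cost_record_path` and `emit_cost_record` are hypothetical and not part of HybridOps.

```python
import json
import tempfile
from pathlib import Path


def cost_record_path(runtime_root: Path, env: str, component: str, run_id: str) -> Path:
    """Build <runtime-root>/logs/cost/<env>/<component>-<run_id>.json (hypothetical helper)."""
    return runtime_root / "logs" / "cost" / env / f"{component}-{run_id}.json"


def emit_cost_record(runtime_root: Path, record: dict) -> Path:
    """Write one cost record as JSON, creating the per-environment directory if needed."""
    path = cost_record_path(runtime_root, record["env"], record["component"], record["run_id"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2))
    return path


# Usage: write a record under a throwaway runtime root.
with tempfile.TemporaryDirectory() as tmp:
    written = emit_cost_record(Path(tmp), {"run_id": "1234", "env": "dev", "component": "rke2"})
    print(written.relative_to(tmp).as_posix())  # logs/cost/dev/rke2-1234.json
```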
Before DR or burst actions execute, the Cost Decision Service reads the pending cost record, evaluates it against the active policy, and returns one of three outcomes:
| Decision | Meaning |
|---|---|
| `ALLOW` | Action is within budget: proceed |
| `DENY` | Action would breach a guardrail: blocked |
| `SIMULATE_ONLY` | Budget not confirmed: run in proposal mode |
The decision is recorded alongside the action record regardless of outcome.
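The following is a minimal sketch of the three outcomes. It assumes a dict-shaped action record. `CostDecision`, `attach_decision`, `should_execute`, and the `cost_decision` field name are all hypothetical. The point it illustrates is that the decision is attached for every outcome, not only for `ALLOW`.

```python
import json
from enum import Enum


class CostDecision(str, Enum):
    ALLOW = "ALLOW"                  # within budget: proceed
    DENY = "DENY"                    # would breach a guardrail: blocked
    SIMULATE_ONLY = "SIMULATE_ONLY"  # budget not confirmed: run in proposal mode


def attach_decision(action_record: dict, decision: CostDecision, reason: str) -> dict:
    """Return a copy of the action record with the cost decision attached, whatever the outcome."""
    return {**action_record, "cost_decision": {"decision": decision.value, "reason": reason}}


def should_execute(decision: CostDecision) -> bool:
    """Only ALLOW runs the action for real; DENY blocks it and SIMULATE_ONLY only proposes it."""
    return decision is CostDecision.ALLOW


record = attach_decision({"action": "dr-failover", "run_id": "1234"}, CostDecision.DENY, "over guardrail")
print(json.dumps(record))
# {"action": "dr-failover", "run_id": "1234", "cost_decision": {"decision": "DENY", "reason": "over guardrail"}}
print(should_execute(CostDecision.DENY))  # False
```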
## Attribution tags
These tags are applied consistently across Terraform resources, Packer builds, and CI pipelines. They map to Azure tags and GCP labels as appropriate.
| Tag | Example values |
|---|---|
| `cost:env` | `dev`, `staging`, `prod` |
| `cost:owner` | `hybridops-tech` |
| `cost:component` | `ctrl01`, `rke2`, `netbox`, `edge` |
| `cost:run_id` | CI build number or UUID |
| `cost:purpose` | `dr-test`, `burst`, `baseline`, `deploy` |
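Azure tag names generally accept `:`, but GCP label keys are stricter. They may contain only lowercase letters, digits, underscores, and dashes, must start with a lowercase letter, and are limited to 63 characters. This sketch assumes `:` is mapped to `_` for GCP, so `cost:env` becomes `cost_env`. The function name and the mapping rule are hypothetical. The sketch is also conservative: it replaces the international characters that GCP would otherwise allow.

```python
import re
from typing import Dict

# GCP label keys and values: lowercase letters, digits, '_' and '-' only, at most
# 63 characters; keys must start with a lowercase letter. Anything else becomes '_'.
_NOT_ALLOWED = re.compile(r"[^a-z0-9_-]")


def to_gcp_labels(tags: Dict[str, str]) -> Dict[str, str]:
    """Map attribution tags such as cost:env to GCP-safe labels such as cost_env (hypothetical rule)."""
    labels = {}
    for key, value in tags.items():
        label_key = _NOT_ALLOWED.sub("_", key.lower())[:63]
        label_value = _NOT_ALLOWED.sub("_", value.lower())[:63]
        if not label_key or not label_key[0].isalpha():
            raise ValueError(f"cannot map tag key {key!r} to a GCP label key")
        labels[label_key] = label_value
    return labels


print(to_gcp_labels({"cost:env": "prod", "cost:owner": "hybridops-tech", "cost:run_id": "1234"}))
# {'cost_env': 'prod', 'cost_owner': 'hybridops-tech', 'cost_run_id': '1234'}
```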
## Policy configuration
Cost thresholds live in environment policy files, not in service code. The relevant fields are:
```yaml
policy:
  decision:
    prefer_cloud_with_credits: true        # Prefer clouds with active credits
    cloud_priority: ["azure", "gcp", "onprem"]
    max_cost_per_hour_usd: 5               # Per-hour guardrail (USD)
```
When `prefer_cloud_with_credits` is `true`, the service favours clouds with active credits. `cloud_priority` sets the preference order when multiple targets are viable. `max_cost_per_hour_usd` is the hard guardrail: a `DENY` is issued if projected hourly spend exceeds this threshold.
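These rules can be sketched as follows, under several assumptions:

- The record's monthly estimate is converted to an hourly figure using an average of 730 hours per month.
- The caller supplies the set of clouds with active credits and a `budget_confirmed` flag.
- A guardrail breach takes precedence over an unconfirmed budget.

`DecisionPolicy`, `choose_target`, and `decide` are hypothetical names, not the Cost Decision Service's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

HOURS_PER_MONTH = 730  # assumption: average hours per month, used to convert monthly estimates


@dataclass
class DecisionPolicy:
    prefer_cloud_with_credits: bool
    cloud_priority: List[str]
    max_cost_per_hour_usd: float


def choose_target(policy: DecisionPolicy, viable: Set[str], with_credits: Set[str]) -> Optional[str]:
    """Pick a target: credited clouds first (if preferred), otherwise cloud_priority order."""
    ordered = [c for c in policy.cloud_priority if c in viable]
    if policy.prefer_cloud_with_credits:
        credited = [c for c in ordered if c in with_credits]
        if credited:
            return credited[0]
    return ordered[0] if ordered else None


def decide(policy: DecisionPolicy, record: dict, budget_confirmed: bool) -> str:
    """Return ALLOW, DENY, or SIMULATE_ONLY for one cost record."""
    projected_hourly = record["estimated_monthly_cost_usd"] / HOURS_PER_MONTH
    if projected_hourly > policy.max_cost_per_hour_usd:
        return "DENY"  # hard guardrail, checked first
    if not budget_confirmed:
        return "SIMULATE_ONLY"  # proposal mode
    return "ALLOW"


policy = DecisionPolicy(prefer_cloud_with_credits=True,
                        cloud_priority=["azure", "gcp", "onprem"],
                        max_cost_per_hour_usd=5)
print(choose_target(policy, viable={"azure", "gcp"}, with_credits={"gcp"}))  # gcp
print(decide(policy, {"estimated_monthly_cost_usd": 12.34}, budget_confirmed=True))  # ALLOW
```

With credits preferred, a credited cloud wins even when it sits lower in `cloud_priority`. The priority order only breaks ties among credited clouds, or applies directly when none have credits.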
## Cost record schema
```json
{
  "run_id": "<RUN_ID>",
  "timestamp": "<ISO8601>",
  "env": "<ENV>",
  "owner": "<OWNER>",
  "component": "<COMPONENT>",
  "purpose": "<PURPOSE>",
  "estimated_monthly_cost_usd": 12.34,
  "currency": "USD",
  "details": {
    "compute": 8.50,
    "storage": 3.20,
    "network": 0.64
  },
  "source": "terraform-plan"
}
```
Estimates are derived from `terraform plan` JSON output or static price tables. The schema is the same whether it is emitted by a CI pipeline or read by the Cost Decision Service.
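`terraform show -json <planfile>` emits a `resource_changes` array. Each entry has a resource `type` and a `change.actions` list. The sketch below combines that output with a static price table to fill the cost fields of the schema. The attribution fields would come from the tags.

- The price table entries are illustrative placeholders, not real Azure prices.
- Counting only resources whose actions include `create` is a simplifying assumption.
- `PRICE_TABLE` and `estimate_from_plan` are hypothetical names.

```python
import json

# Illustrative static price table: resource type -> (category, monthly USD).
# These figures are placeholders, not real Azure prices.
PRICE_TABLE = {
    "azurerm_linux_virtual_machine": ("compute", 8.50),
    "azurerm_managed_disk": ("storage", 3.20),
    "azurerm_public_ip": ("network", 0.64),
}


def estimate_from_plan(plan_json: str) -> dict:
    """Fill the cost fields of a cost record from `terraform show -json` plan output."""
    plan = json.loads(plan_json)
    details = {"compute": 0.0, "storage": 0.0, "network": 0.0}
    for rc in plan.get("resource_changes", []):
        if "create" not in rc["change"]["actions"]:
            continue  # simplification: only resources being created are priced
        entry = PRICE_TABLE.get(rc["type"])
        if entry is None:
            continue  # unpriced resource types are skipped in this sketch
        category, monthly_usd = entry
        details[category] += monthly_usd
    details = {k: round(v, 2) for k, v in details.items()}
    return {
        "estimated_monthly_cost_usd": round(sum(details.values()), 2),
        "currency": "USD",
        "details": details,
        "source": "terraform-plan",
    }


plan = {"resource_changes": [
    {"type": "azurerm_linux_virtual_machine", "change": {"actions": ["create"]}},
    {"type": "azurerm_public_ip", "change": {"actions": ["no-op"]}},
]}
print(estimate_from_plan(json.dumps(plan))["estimated_monthly_cost_usd"])  # 8.5
```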
## When a guardrail is breached
A DENY from the Cost Decision Service does not indicate a failure: it indicates the constraint is working. The cost guardrail breach runbook defines three response paths:
- Degraded mode: operate with minimal footprint, read-only services, no retry loops, until the next budget window.
- Override: document the request, obtain explicit approval, re-run with the override flag set. All override records are stored alongside the original decision.
- Postpone: defer the action until the budget resets; update cost and DR plans accordingly.
In all cases, the decision payload (`cost-decision.json`), the chosen path, and any approval records are written to `<runtime-root>/logs/cost/` and cross-referenced in `<runtime-root>/logs/dr/`.
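The following is a minimal sketch of recording a breach response. Several details are assumptions, not the documented HybridOps format:

- A per-run subdirectory under `<runtime-root>/logs/cost/<env>/`.
- A small pointer file under `<runtime-root>/logs/dr/<env>/`.
- The `response.json` and `-cost-ref.json` file names.
- The path identifiers `degraded`, `override`, and `postpone`.

Requiring at least one approval record for an override follows from the runbook's explicit-approval step. `record_breach_response` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical identifiers for the three runbook response paths.
RESPONSE_PATHS = {"degraded", "override", "postpone"}


def record_breach_response(
    runtime_root: Path,
    env: str,
    run_id: str,
    decision: dict,
    path: str,
    approvals: Optional[list] = None,
) -> Path:
    """Write the decision payload, chosen path, and approvals, then cross-reference from the DR log."""
    if path not in RESPONSE_PATHS:
        raise ValueError(f"unknown response path: {path!r}")
    if path == "override" and not approvals:
        # An override needs explicit approval; nothing is written without it.
        raise ValueError("override requires at least one approval record")

    cost_dir = runtime_root / "logs" / "cost" / env / run_id  # assumed layout
    cost_dir.mkdir(parents=True, exist_ok=True)
    (cost_dir / "cost-decision.json").write_text(json.dumps(decision, indent=2))
    response = {"path": path, "approvals": approvals or []}
    (cost_dir / "response.json").write_text(json.dumps(response, indent=2))

    # Cross-reference: a pointer file in the DR log back to the cost evidence.
    dr_dir = runtime_root / "logs" / "dr" / env
    dr_dir.mkdir(parents=True, exist_ok=True)
    ref = {"run_id": run_id, "cost_evidence": cost_dir.relative_to(runtime_root).as_posix()}
    (dr_dir / f"{run_id}-cost-ref.json").write_text(json.dumps(ref, indent=2))
    return cost_dir
```

Because the override check runs before any file is written, an unapproved override leaves no partial records behind.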