Cost Model

HybridOps treats cost as a first-class signal alongside availability and performance (ADR-0801). Budget constraints can block or gate DR and burst actions the same way signal failures can: not as a side-effect, but by design.


How it works

Every pipeline run emits a cost record to <runtime-root>/logs/cost/<env>/<component>-<run_id>.json. The cost record carries estimated spend for that run (compute, storage, network) with standard attribution tags.
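
As an illustration of the path convention above, the following is a minimal sketch of how a pipeline step might write its cost record. The function names `cost_record_path` and `write_cost_record` are hypothetical and not part of HybridOps; only the directory layout and the record keys come from this page.

```python
import json
from pathlib import Path


def cost_record_path(runtime_root: Path, env: str, component: str, run_id: str) -> Path:
    """Build <runtime-root>/logs/cost/<env>/<component>-<run_id>.json."""
    return Path(runtime_root) / "logs" / "cost" / env / f"{component}-{run_id}.json"


def write_cost_record(runtime_root: Path, record: dict) -> Path:
    """Write one cost record as JSON and return the path it was written to.

    Hypothetical helper: the real emitter may differ. It only assumes the
    record carries the env, component and run_id keys shown in the schema.
    """
    path = cost_record_path(
        runtime_root, record["env"], record["component"], record["run_id"]
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

Deriving the file name from fields inside the record keeps the path and the payload from drifting apart.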

Before DR or burst actions execute, the Cost Decision Service reads the pending cost record, evaluates it against the active policy, and returns one of three outcomes:

| Decision | Meaning |
| --- | --- |
| ALLOW | Action is within budget: proceed |
| DENY | Action would breach a guardrail: blocked |
| SIMULATE_ONLY | Budget not confirmed; run in proposal mode |

The decision is recorded alongside the action record regardless of outcome.
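
The three outcomes can be sketched as a small pure function. This is an illustration under stated assumptions, not the Cost Decision Service itself: the names `evaluate`, `decide_and_record`, `budget_confirmed` and `cost_decision` are hypothetical, and the sketch assumes an unconfirmed budget is checked before the guardrail comparison.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"
    SIMULATE_ONLY = "SIMULATE_ONLY"


def evaluate(projected_usd: float, guardrail_usd: float, budget_confirmed: bool) -> Decision:
    """Return one of the three outcomes for a pending action."""
    # Assumption: without a confirmed budget, nothing runs for real.
    if not budget_confirmed:
        return Decision.SIMULATE_ONLY
    # Spend strictly above the guardrail is blocked.
    if projected_usd > guardrail_usd:
        return Decision.DENY
    return Decision.ALLOW


def decide_and_record(action: dict, projected_usd: float,
                      guardrail_usd: float, budget_confirmed: bool) -> dict:
    """Attach the decision to a copy of the action record, whatever the outcome."""
    decision = evaluate(projected_usd, guardrail_usd, budget_confirmed)
    return {**action, "cost_decision": decision.value}
```

Because the decision is attached on every branch, a DENY leaves the same audit trail as an ALLOW.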


Attribution tags

These tags are applied consistently across Terraform resources, Packer builds, and CI pipelines. They map to Azure tags and GCP labels as appropriate.

| Tag | Example values |
| --- | --- |
| cost:env | dev, staging, prod |
| cost:owner | hybridops-tech |
| cost:component | ctrl01, rke2, netbox, edge |
| cost:run_id | CI build number or UUID |
| cost:purpose | dr-test, burst, baseline, deploy |

Policy configuration

Cost thresholds live in environment policy files, not in service code. The relevant fields:

policy:
  decision:
    prefer_cloud_with_credits: true   # Favour clouds with active credits
    cloud_priority: ["azure", "gcp", "onprem"]
    max_cost_per_hour_usd: 5          # Per-hour guardrail

When prefer_cloud_with_credits is true, the service favours clouds with active credits. cloud_priority controls the preference order when multiple targets are viable. max_cost_per_hour_usd is the hard guardrail: a DENY is issued if projected spend exceeds this threshold.
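
How the three fields interact can be sketched as follows. This is a hedged illustration, not the service's actual logic: the `Candidate` type, its `has_credits` flag and the `pick_target` function are hypothetical. The sketch also assumes that the credits preference outranks `cloud_priority`, which then breaks ties. This page does not state that precedence.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    target: str                       # e.g. "azure", "gcp", "onprem"
    projected_cost_per_hour_usd: float
    has_credits: bool                 # hypothetical flag: active credits on this target


def pick_target(policy: dict, candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the preferred viable target, or None when every target is over budget."""
    decision = policy["policy"]["decision"]
    limit = decision["max_cost_per_hour_usd"]
    priority = decision["cloud_priority"]
    prefer_credits = decision["prefer_cloud_with_credits"]

    # Hard guardrail: drop unknown targets and anything above the hourly limit.
    viable = [
        c for c in candidates
        if c.target in priority and c.projected_cost_per_hour_usd <= limit
    ]
    if not viable:
        return None  # the caller would translate this into a DENY

    def rank(c: Candidate) -> tuple:
        # Assumption: credits come first, then position in cloud_priority.
        credit_rank = 0 if (prefer_credits and c.has_credits) else 1
        return (credit_rank, priority.index(c.target))

    return min(viable, key=rank)
```

With the policy shown above, a GCP candidate holding credits would be chosen over an Azure candidate without them, even though Azure is listed first in `cloud_priority`.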


Cost record schema

{
  "run_id": "<RUN_ID>",
  "timestamp": "<ISO8601>",
  "env": "<ENV>",
  "owner": "<OWNER>",
  "component": "<COMPONENT>",
  "purpose": "<PURPOSE>",
  "estimated_monthly_cost_usd": 12.34,
  "currency": "USD",
  "details": {
    "compute": 8.50,
    "storage": 3.20,
    "network": 0.64
  },
  "source": "terraform-plan"
}

Estimates are derived from terraform plan JSON output or static price tables. The schema is the same whether emitted by a CI pipeline or read by the Cost Decision Service.
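
Since the same schema is shared by emitter and reader, a small validation step on either side can catch malformed records early. The sketch below is illustrative only: `validate_cost_record` is a hypothetical helper, and the check that the `details` entries sum to `estimated_monthly_cost_usd` is an assumption inferred from the sample values (8.50 + 3.20 + 0.64 = 12.34), not a documented rule.

```python
import math

# Top-level keys taken from the schema sample above.
REQUIRED_KEYS = (
    "run_id", "timestamp", "env", "owner", "component", "purpose",
    "estimated_monthly_cost_usd", "currency", "details", "source",
)


def validate_cost_record(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in record]
    if problems:
        return problems

    # Assumption: the breakdown in "details" adds up to the headline estimate.
    breakdown = sum(record["details"].values())
    total = record["estimated_monthly_cost_usd"]
    if not math.isclose(breakdown, total, abs_tol=0.005):
        problems.append(f"details sum to {breakdown:.2f}, expected {total:.2f}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a CI step report every issue in a single run.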


When a guardrail is breached

A DENY from the Cost Decision Service does not indicate a failure: it indicates the constraint is working. The cost guardrail breach runbook defines three response paths:

  1. Degraded mode: operate with minimal footprint, read-only services, no retry loops, until the next budget window.
  2. Override: document the request, obtain explicit approval, re-run with the override flag set. All override records are stored alongside the original decision.
  3. Postpone: defer the action until the budget resets; update cost and DR plans accordingly.

In all cases, the decision payload (cost-decision.json), the chosen path, and any approval records are written to <runtime-root>/logs/cost/ and cross-referenced in <runtime-root>/logs/dr/.
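
The three response paths and the record kept for each can be sketched as below. The enum, the function `build_breach_record` and the field names `response_path` and `approval` are hypothetical. The sketch only encodes what the list above states: an override requires explicit approval, and every path is stored with the original decision.

```python
from enum import Enum
from typing import Optional


class ResponsePath(Enum):
    DEGRADED = "degraded"
    OVERRIDE = "override"
    POSTPONE = "postpone"


def build_breach_record(decision: dict, path: ResponsePath,
                        approval: Optional[dict] = None) -> dict:
    """Combine the original decision payload with the chosen response path.

    Hypothetical shape: the real audit record may use different field names.
    """
    # An override is only valid with an explicit approval attached.
    if path is ResponsePath.OVERRIDE and not approval:
        raise ValueError("override requires an explicit approval record")
    return {
        "decision": decision,          # contents of cost-decision.json
        "response_path": path.value,
        "approval": approval,          # None for degraded and postpone
    }
```

Refusing to build an override record without approval keeps the approval requirement enforced in code rather than by convention.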