Cost & Budget Management
AI inference costs can spiral fast when agents run unsupervised. Open Astra solves this with real-time cost dashboards, per-agent budget caps, and automatic enforcement — so you always know exactly what you're spending and no single agent can blow your budget. Most teams reduce their inference costs 30–50% in the first month just by seeing the breakdown.
Why this matters
- Predictable spend — hard caps prevent runaway agents from burning through your inference budget overnight
- Per-agent attribution — know exactly which agent is costing you money, down to the tool call
- Provider visibility — see spend by provider (Claude, OpenAI, Groq, etc.) to optimize your provider mix
- Budget inheritance — when agents spawn sub-agents, budgets flow downward with strict limits
Cost dashboard
The cost dashboard aggregates spend across your workspace by agent, provider, and time period. The default lookback is 3 months.
```
# Get workspace cost summary (default: last 3 months)
curl http://localhost:3000/costs \
  -H "Authorization: Bearer ${JWT_TOKEN}"

# Response
{
  "workspaceId": "ws_abc123",
  "totalCost": 42.17,
  "byAgent": {
    "research-agent": 18.50,
    "code-agent": 14.22,
    "support-agent": 9.45
  },
  "byProvider": {
    "claude": 28.90,
    "openai": 10.12,
    "groq": 3.15
  },
  "byMonth": [
    { "month": "2026-01", "cost": 15.30 },
    { "month": "2026-02", "cost": 14.80 },
    { "month": "2026-03", "cost": 12.07 }
  ]
}
```

Detailed breakdown
For finer-grained analysis, use the breakdown endpoint with optional date range filtering.
```
# Detailed breakdown with date range
curl "http://localhost:3000/costs/breakdown?startDate=2026-02-01&endDate=2026-02-28" \
  -H "Authorization: Bearer ${JWT_TOKEN}"

# Response
{
  "meta": {
    "workspaceId": "ws_abc123",
    "months": 3,
    "startDate": "2026-02-01",
    "endDate": "2026-02-28",
    "generatedAt": "2026-03-07T12:00:00.000Z"
  },
  "totalCost": 14.80,
  "byAgent": { ... },
  "byProvider": { ... },
  "byTool": { ... }
}
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| months | integer | 3 | Lookback period in months (used when no date range is specified) |
| startDate | string | (none) | Start of the date range, as an ISO 8601 date (YYYY-MM-DD) |
| endDate | string | (none) | End of the date range, as an ISO 8601 date (YYYY-MM-DD) |
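To script against these endpoints, a client only needs to build the query string and then post-process the byAgent or byProvider maps. The sketch below uses only the Python standard library; breakdown_request and shares are hypothetical helper names, and it assumes the server is at the http://localhost:3000 address used in the curl examples. Because the parameter table says months only applies when no date range is given, the helper leaves months out whenever startDate or endDate is set.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the API is served at the address used in the curl examples.
BASE_URL = "http://localhost:3000"


def breakdown_request(token: str, *, months: Optional[int] = None,
                      start_date: Optional[str] = None,
                      end_date: Optional[str] = None) -> Request:
    """Build (but do not send) a GET /costs/breakdown request."""
    params: dict[str, str] = {}
    if start_date or end_date:
        # A date range is present, so months would be ignored; leave it out.
        if start_date:
            params["startDate"] = start_date  # ISO date, e.g. "2026-02-01"
        if end_date:
            params["endDate"] = end_date
    elif months is not None:
        params["months"] = str(months)
    url = f"{BASE_URL}/costs/breakdown"
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def shares(costs: dict[str, float]) -> dict[str, float]:
    """Turn a byAgent/byProvider map into percentage shares (1 decimal place)."""
    total = sum(costs.values())
    if total == 0:
        return {key: 0.0 for key in costs}
    return {key: round(100 * value / total, 1) for key, value in costs.items()}
```

For example, applying shares to the byProvider map from the summary response gives about 68.5% for claude, 24.0% for openai, and 7.5% for groq. To actually send a request, pass the Request object to urllib.request.urlopen.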
Budget caps
Every agent can have resource constraints defined in astra.yml or via the API. Budgets control tokens, cost, tool calls, execution time, and spawn permissions.
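Each of these dimensions can be enforced with a running counter per request. The following is a minimal sketch, assuming usage is reported incrementally (after each completion or tool call). Budget, BudgetTracker, and BudgetExceeded are hypothetical names chosen to mirror the astra.yml keys, not Open Astra's API.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a charge would push usage past a limit."""


@dataclass(frozen=True)
class Budget:
    # Hypothetical mirror of the astra.yml keys (maxTotalTokens, maxCostCents, ...).
    max_total_tokens: int
    max_cost_cents: int
    max_tool_calls: int
    max_duration_ms: int
    max_child_agents: int


@dataclass
class BudgetTracker:
    budget: Budget
    tokens: int = 0
    cost_cents: float = 0.0
    tool_calls: int = 0
    child_agents: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, *, tokens: int = 0, cost_cents: float = 0.0,
               tool_calls: int = 0, child_agents: int = 0) -> None:
        """Record usage only if every limit still holds afterwards (all-or-nothing)."""
        elapsed_ms = (time.monotonic() - self.started) * 1000
        checks = [
            ("tokens", self.tokens + tokens, self.budget.max_total_tokens),
            ("cost (cents)", self.cost_cents + cost_cents, self.budget.max_cost_cents),
            ("tool calls", self.tool_calls + tool_calls, self.budget.max_tool_calls),
            ("child agents", self.child_agents + child_agents, self.budget.max_child_agents),
            ("duration (ms)", elapsed_ms, self.budget.max_duration_ms),
        ]
        for name, used, limit in checks:
            if used > limit:
                raise BudgetExceeded(f"{name}: {used} exceeds limit {limit}")
        # Only commit once every check has passed.
        self.tokens += tokens
        self.cost_cents += cost_cents
        self.tool_calls += tool_calls
        self.child_agents += child_agents
```

With the research-agent values from the astra.yml example (50,000 tokens, 50 cents, 20 tool calls, 60,000 ms, 0 child agents), any charge(child_agents=1) raises BudgetExceeded. Checking every limit before updating any counter keeps a rejected charge from partially consuming the budget. Duration is only checked when usage is charged, so a real enforcer would also need a timer.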
```yaml
# astra.yml: per-agent budget constraints
agents:
  - id: research-agent
    budget:
      maxTotalTokens: 50000
      maxCostCents: 50      # $0.50 per request
      maxToolCalls: 20
      maxDuration: 60000    # 60 seconds
      maxChildAgents: 0     # cannot spawn sub-agents
```

System hard caps
These limits cannot be exceeded by any agent or swarm, regardless of configuration:
```yaml
# System hard caps (cannot be exceeded by any agent)
maxPromptTokens: 500000
maxCompletionTokens: 100000
maxTotalTokens: 600000
maxCostCents: 500        # $5.00 per swarm
maxToolCalls: 100
maxDuration: 300000      # ms (5 minutes)
maxSpawnDepth: 5
maxChildAgents: 20
```

Role-based defaults
When an agent has a role but no explicit budget, these defaults apply:

| Role | Tokens | Cost | Tool calls | Duration | Spawn sub-agents |
|---|---|---|---|---|---|
| researcher | 50,000 | $0.50 | 20 | 60 s | No |
| analyst | 80,000 | $1.00 | 15 | 90 s | No |
| writer | 30,000 | $0.30 | 5 | 30 s | No |
| mediator | 40,000 | $0.40 | 5 | 60 s | No |

Budget inheritance
When a parent agent spawns a sub-agent, budgets follow strict inheritance rules:
- Child budget is always ≤ parent budget (hard constraint)
- maxSpawnDepth decrements by 1 at each level
- Tool permissions: the child's allowed tools are the intersection of its own and its parent's allowed tools, and its denied tools are the union of both denied lists
- Memory permissions use AND logic — child can only have what parent allows
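These rules can be expressed as a pure function from a parent policy and a requested child policy to the child's effective policy. This is a sketch under stated assumptions, not Open Astra's implementation. Policy, derive_child, and can_use are hypothetical names. Memory permissions are modelled as a set of granted names, so AND logic becomes set intersection. The child is clamped against the parent's configured limits, although a stricter runtime might clamp against the parent's remaining budget instead. The prompt-token, completion-token, and child-count caps are omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass

# Numeric caps copied from "System hard caps" (cost in cents, duration in ms).
HARD_CAPS = {
    "max_total_tokens": 600_000,
    "max_cost_cents": 500,
    "max_tool_calls": 100,
    "max_duration_ms": 300_000,
}


@dataclass(frozen=True)
class Policy:
    max_total_tokens: int
    max_cost_cents: int
    max_tool_calls: int
    max_duration_ms: int
    spawn_depth: int                 # levels of sub-agents this agent may still create
    allowed_tools: frozenset[str]
    denied_tools: frozenset[str]
    memory: frozenset[str]           # hypothetical memory permission names, e.g. "read"


def derive_child(parent: Policy, requested: Policy) -> Policy:
    """Clamp a requested child policy to its parent and to the system hard caps."""
    if parent.spawn_depth <= 0:
        raise PermissionError("parent has no spawn depth left")

    def cap(name: str) -> int:
        # Child budget <= parent budget, and never above the hard cap.
        return min(getattr(requested, name), getattr(parent, name), HARD_CAPS[name])

    return Policy(
        max_total_tokens=cap("max_total_tokens"),
        max_cost_cents=cap("max_cost_cents"),
        max_tool_calls=cap("max_tool_calls"),
        max_duration_ms=cap("max_duration_ms"),
        # Spawn depth drops by one per level.
        spawn_depth=min(requested.spawn_depth, parent.spawn_depth - 1),
        # Allowed tools: intersection; denied tools: union.
        allowed_tools=parent.allowed_tools & requested.allowed_tools,
        denied_tools=parent.denied_tools | requested.denied_tools,
        # AND logic: a memory permission survives only if both grant it.
        memory=parent.memory & requested.memory,
    )


def can_use(policy: Policy, tool: str) -> bool:
    """A tool is usable only if it is allowed and not denied."""
    return tool in policy.allowed_tools and tool not in policy.denied_tools
```

For example, a parent with spawn depth 2 can create a child with depth 1. That child can create a grandchild with depth 0, and derive_child then refuses any further spawn.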
Cost ledger
The cost ledger tracks execution metrics at three levels:

| Ledger | Scope | What it tracks |
|---|---|---|
| role_ledger | Global, per agent role | Total executions, success rate, latency, tokens, cost, fitness score |
| user_role_ledger | Per user, per agent role | Same metrics, scoped to individual users |
| execution_log | Per execution | Individual execution records with latency, tokens, and errors |
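The two aggregate ledgers can be derived by folding the per-execution records. The following is a minimal sketch, assuming each execution_log row carries a role, a user id, a success flag, latency, tokens, and cost. Those field names are hypothetical, as are Execution, LedgerEntry, and build_ledgers. The fitness score is left out because this page does not define how it is computed.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Execution:
    """One execution_log record (field names are hypothetical)."""
    role: str
    user_id: str
    success: bool
    latency_ms: float
    tokens: int
    cost: float


@dataclass
class LedgerEntry:
    """Running aggregates for one ledger key."""
    executions: int = 0
    successes: int = 0
    total_latency_ms: float = 0.0
    tokens: int = 0
    cost: float = 0.0

    def add(self, e: Execution) -> None:
        self.executions += 1
        self.successes += int(e.success)
        self.total_latency_ms += e.latency_ms
        self.tokens += e.tokens
        self.cost += e.cost

    @property
    def success_rate(self) -> float:
        return self.successes / self.executions if self.executions else 0.0

    @property
    def avg_latency_ms(self) -> float:
        return self.total_latency_ms / self.executions if self.executions else 0.0


def build_ledgers(log: list[Execution]):
    """Fold execution_log rows into role-level and (user, role)-level ledgers."""
    role_ledger: defaultdict[str, LedgerEntry] = defaultdict(LedgerEntry)
    user_role_ledger: defaultdict[tuple[str, str], LedgerEntry] = defaultdict(LedgerEntry)
    for e in log:
        role_ledger[e.role].add(e)
        user_role_ledger[(e.user_id, e.role)].add(e)
    return dict(role_ledger), dict(user_role_ledger)
```

Storing sums and counts rather than averages lets each new execution update a ledger in constant time. Success rate and mean latency are derived on read.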
Leaderboard
The leaderboard ranks agents by usage and performance. Use it to identify which agents consume the most resources and which perform best.

| Endpoint | Description |
|---|---|
| GET /leaderboard | Top agents for a period (day, week, or month). Default limit: 20; maximum: 100 |
| GET /leaderboard/:agentId | Hourly stats for a specific agent |
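This page lists the allowed periods and the limit bounds but not the query parameter names. The sketch below assumes they are called period and limit. It applies the documented default of 20 and clamps to the documented maximum of 100 on the client. The lower bound of 1, the localhost base URL, and the helper names leaderboard_url and agent_stats_url are all illustrative assumptions.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:3000"      # assumption: local dev server
PERIODS = ("day", "week", "month")      # documented periods
DEFAULT_LIMIT, MAX_LIMIT = 20, 100      # documented default and maximum


def leaderboard_url(period: str, limit: Optional[int] = None) -> str:
    """Build a GET /leaderboard URL; the "period" and "limit" names are assumed."""
    if period not in PERIODS:
        raise ValueError(f"period must be one of {PERIODS}, got {period!r}")
    # Apply the documented default, then clamp to [1, MAX_LIMIT] client-side.
    effective = DEFAULT_LIMIT if limit is None else max(1, min(limit, MAX_LIMIT))
    return f"{BASE_URL}/leaderboard?" + urlencode({"period": period, "limit": effective})


def agent_stats_url(agent_id: str) -> str:
    """Build a GET /leaderboard/:agentId URL, escaping the id as one path segment."""
    return f"{BASE_URL}/leaderboard/{quote(agent_id, safe='')}"
```

Clamping on the client avoids a round trip for an out-of-range limit. Whether the server itself clamps or rejects such values is not stated here.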
Differential privacy budget
For workspaces with differential privacy enabled, Open Astra tracks a privacy budget that limits the amount of information that can be extracted about any individual user. Owners can reset the budget when needed.
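This page does not say how the privacy budget is measured. A common model in differential privacy is an epsilon budget under basic sequential composition. The privacy losses of successive queries that touch a user add up, and a query is refused once the total would exceed a fixed limit. The sketch below assumes that model. PrivacyBudget, the epsilon values, and the per-uid keying (suggested by the uid in the reset request) are illustrative assumptions, not Open Astra's implementation.

```python
from __future__ import annotations


class PrivacyBudget:
    """Per-user epsilon ledger under basic sequential composition (hypothetical)."""

    def __init__(self, epsilon_total: float) -> None:
        self.epsilon_total = epsilon_total
        self.spent: dict[str, float] = {}

    def remaining(self, uid: str) -> float:
        return self.epsilon_total - self.spent.get(uid, 0.0)

    def spend(self, uid: str, epsilon: float) -> bool:
        """Charge one query's epsilon; return False (and charge nothing) if it would overspend."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so float rounding does not reject an exact fit.
        if epsilon > self.remaining(uid) + 1e-12:
            return False
        self.spent[uid] = self.spent.get(uid, 0.0) + epsilon
        return True

    def reset(self, uid: str) -> None:
        """Owner-only in the API; here it simply clears the user's spend."""
        self.spent.pop(uid, None)
```

Under this model, a budget of 1.0 lets queries costing 0.5, 0.3, and 0.2 through for a given uid and refuses the next one until the budget is reset.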
```bash
# Get differential privacy budget status
curl http://localhost:3000/security/dp-budget \
  -H "Authorization: Bearer ${JWT_TOKEN}"

# Reset DP budget (owner only)
curl -X POST http://localhost:3000/security/dp-budget/reset \
  -H "Authorization: Bearer ${JWT_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{ "uid": "uid_alice" }'
```

Related
- Budget Pre-Flight — how budgets are checked before inference
- Quotas — per-agent rate limits and token quotas
- Cost Tagging — per-tool cost attribution