Agents

Health Scorecard

The health scorecard provides a real-time reliability and performance dashboard for each agent. It tracks three dimensions: reliability (uptime, crash rate), efficiency (token spend, tool call ratio), and memory hygiene (staleness, contradiction rate). Scores are normalized to a 0–1 scale and updated after every agent turn.

Metrics

Each agent receives three sub-scores, each computed from underlying signals:

| Score | Formula | Signals used |
| --- | --- | --- |
| reliability | `1 - (crashRate * 0.6 + downtimeRatio * 0.4)` | Consecutive failures, uptime over rolling 24h window |
| efficiency | `1 - clamp((tokenSpend / tokenBudget) * 0.5 + (toolCallRatio - 1) * 0.5, 0, 1)` | Token spend vs. budget, ratio of tool calls to successful outcomes |
| memoryHygiene | `1 - (stalenessRatio * 0.5 + contradictionRate * 0.5)` | Fraction of stale memory entries, rate of contradiction detections |

The composite health score is the weighted average: reliability * 0.4 + efficiency * 0.35 + memoryHygiene * 0.25.
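
As a rough illustration, the formulas above can be written out in Python. This is a minimal sketch, not the runtime's actual implementation: the `Signals` dataclass, the function names, and the sample signal values are assumptions made for the example. Only the formulas and weights come from the table above.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical container for the raw signals; field names mirror the formulas."""
    crash_rate: float
    downtime_ratio: float
    token_spend: float
    token_budget: float
    tool_call_ratio: float
    staleness_ratio: float
    contradiction_rate: float


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def sub_scores(s: Signals) -> dict:
    """Apply the three sub-score formulas from the Metrics table."""
    reliability = 1 - (s.crash_rate * 0.6 + s.downtime_ratio * 0.4)
    efficiency = 1 - clamp(
        (s.token_spend / s.token_budget) * 0.5 + (s.tool_call_ratio - 1) * 0.5, 0, 1
    )
    memory_hygiene = 1 - (s.staleness_ratio * 0.5 + s.contradiction_rate * 0.5)
    return {
        "reliability": reliability,
        "efficiency": efficiency,
        "memoryHygiene": memory_hygiene,
    }


def composite(scores: dict) -> float:
    """Weighted average using the default weights 0.4 / 0.35 / 0.25."""
    return (
        scores["reliability"] * 0.4
        + scores["efficiency"] * 0.35
        + scores["memoryHygiene"] * 0.25
    )


# Made-up signal values, chosen only to exercise the formulas.
example = Signals(
    crash_rate=0.05,
    downtime_ratio=0.10,
    token_spend=400,
    token_budget=1000,
    tool_call_ratio=1.2,
    staleness_ratio=0.20,
    contradiction_rate=0.10,
)
scores = sub_scores(example)
print({name: round(value, 4) for name, value in scores.items()})
print(round(composite(scores), 4))
```

With these sample values the sub-scores come out to 0.93, 0.7, and 0.85, and the composite to 0.8295, which lands in the healthy band.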

Viewing scores

Scorecard data is available via the REST API. Both endpoints return scores rounded to four decimal places along with the raw signal values used to compute them.

Fetch the scorecard for a single agent:

```bash
GET /agents/:id/health

# Response
{
  "agentId": "research-agent",
  "scores": {
    "reliability": 0.9400,
    "efficiency": 0.8120,
    "memoryHygiene": 0.7760,
    "composite": 0.8542
  },
  "status": "healthy",
  "updatedAt": "2025-11-14T09:31:00Z"
}
```

Fetch scorecards for all agents in the workspace:

```bash
GET /agents/health

# Returns an array of scorecard objects, sorted by composite score ascending
# so the most degraded agents appear first.
```
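
A client might consume the list endpoint roughly as follows. This is a sketch under several assumptions: `base_url` is a placeholder for wherever the API is served, no authentication is shown, and each array element is assumed to have the same shape as the single-agent response above. The helper names are hypothetical.

```python
import json
import urllib.request


def fetch_scorecards(base_url: str) -> list:
    """GET {base_url}/agents/health and decode the JSON array of scorecards."""
    with urllib.request.urlopen(f"{base_url}/agents/health") as response:
        return json.load(response)


def below_threshold(scorecards: list, threshold: float) -> list:
    """Return the IDs of agents whose composite is under threshold, worst first."""
    # Re-sort defensively so the result does not depend on server ordering.
    ordered = sorted(scorecards, key=lambda card: card["scores"]["composite"])
    return [
        card["agentId"]
        for card in ordered
        if card["scores"]["composite"] < threshold
    ]
```

For example, `below_threshold(fetch_scorecards(base_url), 0.8)` would list every agent that is not strictly above 0.8, starting with the lowest composite score.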

Score thresholds

The composite score places each agent in one of three status bands:

| Status | Composite score range | Behavior |
| --- | --- | --- |
| Healthy | > 0.8 | No action taken. Agent operates normally. |
| Degraded | 0.6 – 0.8 | Warning emitted on event bus. Alert webhooks fire if configured. |
| Critical | < 0.6 | Alert webhook fires and the agent is flagged for review. Self-healing restarts the agent if selfHealing.enabled is true. |
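
The band boundaries can be expressed as a small function. This is an illustrative sketch: the name `status_for` is hypothetical, and it assumes that scores of exactly 0.6 and 0.8 count as degraded, which follows from the strict `>` and `<` comparisons in the table.

```python
def status_for(composite: float) -> str:
    """Map a composite score to its status band."""
    if composite > 0.8:
        return "healthy"
    if composite < 0.6:
        return "critical"
    # Everything from 0.6 to 0.8 inclusive is treated as degraded.
    return "degraded"
```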

Alerts

Configure webhook alerts to be notified when an agent enters the degraded or critical band. Alerts fire at most once per cooldownMinutes window per agent to prevent notification spam:

```yaml
healthScorecard:
  alerts:
    enabled: true
    cooldownMinutes: 15
    webhooks:
      - url: https://hooks.example.com/openastra
        on:
          - degraded
          - critical
        headers:
          Authorization: Bearer ${WEBHOOK_SECRET}
    # Optional: only alert if a specific sub-score crosses a threshold
    thresholds:
      reliability: 0.7
      efficiency: 0.6
      memoryHygiene: 0.65
```
How scores are calculated

Scores are recomputed at the end of every agent turn over a rolling window of the last 50 turns (configurable via healthScorecard.windowSize). A single failure therefore does not immediately crash the score; its effect is averaged across the window.
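
To make the averaging effect concrete, here is a minimal sketch of a rolling window built on `collections.deque`. The `TurnWindow` class is hypothetical and tracks only whether each turn crashed; the actual runtime presumably records more per turn.

```python
from collections import deque


class TurnWindow:
    """Keep the outcome of the most recent turns and derive crashRate from them."""

    def __init__(self, window_size: int = 50):
        # deque with maxlen drops the oldest turn once the window is full.
        self._turns = deque(maxlen=window_size)

    def record(self, crashed: bool) -> None:
        self._turns.append(crashed)

    def crash_rate(self) -> float:
        """Fraction of turns in the window that ended in an unhandled error."""
        if not self._turns:
            return 0.0
        return sum(self._turns) / len(self._turns)


window = TurnWindow(window_size=50)
for _ in range(49):
    window.record(crashed=False)
window.record(crashed=True)
print(window.crash_rate())
```

One crash in a full 50-turn window gives a crashRate of 0.02, which lowers reliability by only 0.02 * 0.6 = 0.012 if downtimeRatio is unchanged. Once 50 further clean turns have been recorded, the crash falls out of the window and crashRate returns to 0.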

Raw signals are collected passively from the agent runtime:

  • crashRate — fraction of turns in the window that ended in an unhandled error
  • downtimeRatio — fraction of the last 24 hours the agent was in a paused or failed state
  • tokenSpend — total tokens consumed in the window vs. the agent's configured quotas.tokenBudget
  • toolCallRatio — average number of tool calls per turn; values above 1.0 inflate the efficiency penalty
  • stalenessRatio — fraction of the agent's memory entries older than memory.stalenessThresholdDays
  • contradictionRate — fraction of memory writes in the window that triggered a contradiction detection
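
As an example of how one of these signals might be derived, the sketch below computes stalenessRatio from a list of memory entry timestamps. It assumes each entry carries a single timezone-aware timestamp and that "older than" is measured from that timestamp to the current time; whether the runtime uses creation time or last-update time is not specified here, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def staleness_ratio(entry_timestamps: list, threshold_days: int, now: datetime) -> float:
    """Fraction of memory entries older than threshold_days at time now."""
    if not entry_timestamps:
        return 0.0  # no entries means nothing can be stale
    cutoff = now - timedelta(days=threshold_days)
    stale = sum(1 for timestamp in entry_timestamps if timestamp < cutoff)
    return stale / len(entry_timestamps)


now = datetime(2025, 11, 14, tzinfo=timezone.utc)
entries = [now - timedelta(days=age) for age in (40, 31, 10, 5)]
print(staleness_ratio(entries, threshold_days=30, now=now))
```

With a 30-day threshold, two of the four sample entries are stale, so the ratio is 0.5.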

The window size and the composite weights are set under healthScorecard:

```yaml
healthScorecard:
  enabled: true
  windowSize: 50           # Number of turns to include in rolling window
  weights:
    reliability: 0.40
    efficiency: 0.35
    memoryHygiene: 0.25
```