Learning & Best Practices

Anti-Patterns & Gotchas

These are the most common mistakes seen in production Open Astra deployments — with explanations of why they happen and how to fix them.

1. Missing budget pre-flight

Symptom: Agent runs cost $50+ in a single turn, or loops for minutes without a result.

Cause: No budget block in the agent config. Without caps, the agent will call tools as many times as the model decides and use as many tokens as context allows.

```yaml
agents:
  - id: researcher
    # ❌ No budget set — will run until context fills or cost explodes
    model:
      provider: openai
      modelId: gpt-4o
```

```yaml
agents:
  - id: researcher
    # ✅ Hard caps prevent runaway cost
    budget:
      maxTokensPerTurn: 8000
      maxToolCallsPerTurn: 12
      maxCostUsdPerTurn: 0.50
    model:
      provider: openai
      modelId: gpt-4o
```
Budget pre-flight runs before inference — the turn is rejected early if the estimated cost exceeds limits. See Budget Pre-Flight for the full config schema.
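Conceptually, the pre-flight step compares a cost estimate against the configured caps before any inference call is made. A minimal sketch of that idea (the `preflight` function and `TOKEN_PRICE_USD` constant are illustrative assumptions, not Open Astra APIs):

```javascript
// Hypothetical pre-flight check: reject a turn before inference if the
// estimated token usage or cost would exceed the configured budget caps.
const TOKEN_PRICE_USD = 0.00001 // assumed blended per-token price for the sketch

function preflight(estimatedTokens, budget) {
  if (estimatedTokens > budget.maxTokensPerTurn) {
    return { ok: false, reason: 'estimated tokens exceed maxTokensPerTurn' }
  }
  const estimatedCost = estimatedTokens * TOKEN_PRICE_USD
  if (estimatedCost > budget.maxCostUsdPerTurn) {
    return { ok: false, reason: 'estimated cost exceeds maxCostUsdPerTurn' }
  }
  return { ok: true }
}
```

The point is that rejection happens on the estimate, so a runaway turn is stopped before any tokens are spent.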

2. Ignoring blackboard state

Symptom: Swarm agents silently skip steps or produce incomplete results because a dependency wasn't ready when they read the blackboard.

Cause: Agents read shared blackboard keys without checking whether the writer has finished. In concurrent swarms, write order is not guaranteed.

```javascript
// Agent A writes a result
await tools.blackboard_write({ key: 'analysis', value: result })

// Agent B reads — but never checks if the key exists
const data = await tools.blackboard_read({ key: 'analysis' })
// ❌ data may be undefined if Agent A hasn't finished yet
```

```javascript
// Agent B waits for the key with a timeout
const data = await tools.blackboard_read({
  key: 'analysis',
  waitMs: 10000,   // wait up to 10 s
  required: true   // throws if still missing after timeout
})
// ✅ guaranteed to have data or a clear error
```

Always use required: true and a waitMs timeout when reading blackboard keys that depend on another agent. See Blackboard for the full API.
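To see why `waitMs` plus `required` closes the race, here is a sketch of the wait-and-require semantics as a polling loop (the `readWithWait` helper is hypothetical; Open Astra's internal implementation may differ):

```javascript
// Hypothetical sketch: poll a read function until the key appears
// or the timeout elapses, throwing if required and still missing.
async function readWithWait(read, key, { waitMs = 10000, required = false } = {}) {
  const deadline = Date.now() + waitMs
  let value = await read(key)
  while (value === undefined && Date.now() < deadline) {
    await new Promise(resolve => setTimeout(resolve, 100)) // poll every 100 ms
    value = await read(key)
  }
  if (value === undefined && required) {
    throw new Error(`blackboard key "${key}" missing after ${waitMs} ms`)
  }
  return value
}
```

Either the value arrives within the window, or the caller gets an explicit error instead of silently proceeding with `undefined`.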

3. Over-provisioning swarms

Symptom: Swarm tasks cost 10–20× more than expected with no quality improvement.

Cause: Spawning one agent per topic/file/item when a smaller number of batched agents would produce the same result. Each spawned agent has per-turn overhead (context assembly, memory retrieval, post-turn save).

```javascript
// ❌ Spawning 20 specialist agents for a task one agent could handle
const agents = await Promise.all(
  topics.map(t => spawnAgent({ skill: 'researcher', topic: t }))
)
// Cost: 20× per-agent overhead + 20× inference calls
```

```javascript
// ✅ Batch into 3–5 agents with chunked topics
const chunks = chunkArray(topics, Math.ceil(topics.length / 4))
const agents = await Promise.all(
  chunks.map(chunk => spawnAgent({ skill: 'researcher', topics: chunk }))
)
// Cost: 4× overhead, each agent covers multiple topics
```

A rule of thumb: spawn no more agents than there are independent perspectives that actually improve quality. For pure parallelism, batch items into 3–5 agents.
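The batched example above assumes a `chunkArray` helper. If your runtime doesn't provide one, a minimal implementation might look like:

```javascript
// Split an array into consecutive chunks of at most `size` items.
// chunkArray([1, 2, 3, 4, 5], 2) yields [[1, 2], [3, 4], [5]].
function chunkArray(arr, size) {
  const chunks = []
  for (let i = 0; i < arr.length; i += size) {
    chunks.push(arr.slice(i, i + size))
  }
  return chunks
}
```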

4. Skipping ethical check in public-facing agents

Symptom: Agents respond to harmful requests or leak workspace data to unauthorized users.

Cause: Ethical check and workspace grants are opt-in. Agents deployed to channels without these guards have no policy layer.

Always enable Ethical Check and configure Agent Grants for any agent reachable by untrusted users.
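As a rough sketch of what that might look like in the agent config (the `ethicalCheck` and `grants` key names below are illustrative assumptions; consult the Ethical Check and Agent Grants docs for the actual schema):

```yaml
agents:
  - id: support-bot
    # ✅ Policy layer for an agent reachable by untrusted users
    ethicalCheck:
      enabled: true
    grants:
      workspaces:
        - id: public-kb
          access: read   # no write access from public channels
```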

5. Unbounded memory retrieval

Symptom: Slow turns (3–8 s) even for simple questions, and high token counts on every response.

Cause: Default memory retrieval pulls top-K results from all tiers. In a workspace with large knowledge graphs, this can inject thousands of tokens of context that aren't relevant.

Tune memory.retrievalTopK per tier and use Contextual Boosting to weight recent or high-confidence entries. Set memory.budget.maxTokens to hard-cap injected memory.
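A sketch of those settings together (tier names and exact key placement are illustrative; the `memory.retrievalTopK` and `memory.budget.maxTokens` keys come from the text above):

```yaml
agents:
  - id: assistant
    memory:
      retrievalTopK:
        episodic: 5    # cap results from each tier separately
        semantic: 3
      budget:
        maxTokens: 2000   # hard cap on memory tokens injected per turn
```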

6. No compaction strategy for long sessions

Symptom: Agents start failing with context-length errors after extended conversations.

Cause: Session messages grow unbounded if auto-compaction is not configured.

Set compaction.strategy: rolling or compaction.strategy: summary in the agent config. See Compaction Forecast to predict when compaction will trigger.
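For example, a minimal config fragment (only `compaction.strategy` is taken from the text above; any other compaction options would come from the Compaction docs):

```yaml
agents:
  - id: assistant
    compaction:
      strategy: rolling   # or: summary
```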