Introduction
Open Astra is an agent runtime for engineering teams shipping real products. It is not a personal AI assistant — it is the self-hosted backend infrastructure your agents run inside: persistent memory, hierarchical swarms, 15 messaging channels, and a full REST API.
What Open Astra is
A TypeScript runtime that gives you:
- A multi-tenant API server — JWT auth, REST + WebSocket + SSE, per-workspace agent teams
- 5-tier memory — ephemeral sessions → daily notes → user profile → knowledge graph → procedural workflows, backed by PostgreSQL and Typesense with RRF fusion search
- Hierarchical swarms — a root agent decomposes tasks, spawns permission-sandboxed sub-agents, shares state via a blackboard, mediates conflicts with a debate protocol
- 10 inference providers — Grok, Groq, OpenAI, Gemini, Claude, Ollama, vLLM, Bedrock, Mistral, OpenRouter with per-provider prompt caching (up to 90% savings)
- 15 messaging channels — Telegram, Discord, Slack, WhatsApp, Signal, iMessage, Google Chat, Microsoft Teams, LINE, Viber, X, Email, Linear, Jira, Zapier
- 109 built-in skills and 67 tools, auto-discovered at startup with hot-swap on file change
- Workspace files — drop `.md` files into `./workspace/` and they are live on the next agent request
- Self-healing, quotas, approval workflows, deep research, dream mode, persona evolution, heartbeat daemon
- Cost dashboard and diagnostics via `npx astra costs` and `npx astra doctor`
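The workspace-files behavior above can be sketched in a few lines. This is an illustrative stand-in, not the runtime's actual discovery code: the function name and the throwaway directory are assumptions, but the rule it demonstrates is the one stated above (any `.md` file in the workspace directory is picked up on the next request).

```typescript
import { mkdirSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: list the Markdown files an agent would see
// in its workspace directory on the next request.
function discoverWorkspaceDocs(dir: string): string[] {
  return readdirSync(dir)
    .filter((name) => name.endsWith(".md"))
    .sort();
}

// Demo against a temp directory standing in for ./workspace/.
const ws = join(tmpdir(), `astra-ws-${process.pid}`);
mkdirSync(ws, { recursive: true });
writeFileSync(join(ws, "style-guide.md"), "# Style guide\n");
writeFileSync(join(ws, "notes.txt"), "ignored: not Markdown\n");
console.log(discoverWorkspaceDocs(ws)); // only the .md file is surfaced
```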
What it is not
- Not a no-code agent builder or visual workflow tool
- Not a single-user local app (though the standalone CLI works fine for personal use)
- Not a replacement for an LLM provider — it calls providers you configure
- Not a managed cloud service — you deploy and own it
What makes it different
Most agent frameworks are wrappers — thin layers over a single LLM call. Open Astra is a runtime. Five things set it apart:
- Memory that compounds — 5-tier architecture with RRF fusion across Typesense BM25+vector and pgvector. Entries decay by weight over time; duplicates are Jaccard-deduped; graph entities gain confidence from consistent extractions.
- Deterministic context assembly — context is built in a fixed order with defined token budgets. Graph hints inject related entities. Budget pre-flight trims tool schemas before the model call. You can audit exactly what was sent to any inference call.
- True multi-agent orchestration — hierarchical swarms with permission sandboxing. Sub-agents cannot escalate privilege beyond their parent's allow list. Blackboard state sharing and debate protocol for conflict resolution.
- Production infrastructure included — self-healing with exponential backoff, compaction forecasting, per-agent cost tracking, adaptive temperature, async tool dispatch, skill metrics, and 11 cron jobs running on schedule.
- No black boxes — no community plugins, no external MCP servers, no marketplace extensions. Every tool, skill, inference adapter, and memory operation is in this repository. If it runs in your agent loop, you can read it.
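The RRF fusion mentioned above is a standard rank-combination technique. A minimal sketch, with the two input lists standing in for the Typesense BM25+vector results and the pgvector results; the constant `k = 60` is the conventional RRF damping term, and the actual runtime's parameters may differ:

```typescript
// Reciprocal Rank Fusion: each retriever contributes 1/(k + rank + 1)
// per result; summed scores decide the fused order.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// An entry ranked highly by both retrievers wins overall.
const bm25 = ["note:standup", "note:roadmap", "note:oncall"];
const vector = ["note:standup", "note:roadmap", "note:retro"];
console.log(rrfFuse([bm25, vector])[0]); // "note:standup" ranks first
```

Because scores depend only on rank positions, fusion works even though BM25 scores and vector cosine distances live on incomparable scales.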
How it works
From an incoming message to a delivered response in three steps:
- Connect a channel — Slack, X, email, or any of the 15 channels. Agents live where your team already works.
- Agents pull context from memory — 5-tier memory fuses pgvector and Typesense via RRF to surface what's relevant before every response.
- Act, report, and learn — tools run, sub-agents spawn, results come back through the same channel. Memory updates. The loop closes.
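When sub-agents spawn in step 3, the permission sandbox described earlier applies: a child's effective permissions are the intersection of what it requests and what its parent holds. A minimal sketch of that rule; the names and types are illustrative, not the runtime's actual API:

```typescript
type Permission = string;

// A sub-agent is granted only permissions its parent already has,
// so spawning can never escalate privilege.
function grantSubAgent(
  parentAllow: ReadonlySet<Permission>,
  requested: Permission[],
): Set<Permission> {
  const granted = new Set<Permission>();
  for (const p of requested) {
    if (parentAllow.has(p)) granted.add(p); // escalations silently dropped
  }
  return granted;
}

const parent = new Set(["web.search", "memory.read"]);
const child = grantSubAgent(parent, ["web.search", "shell.exec"]);
console.log([...child]); // ["web.search"] (shell.exec was not inherited)
```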
Architecture overview
Every request flows through the same path:
```text
HTTP / WebSocket / Channel Message
  → gateway/index.ts       Express + WS bootstrap
  → agents/loop.ts         Core execution cycle
      1. Resolve session    {uid, surface, surfaceId}
      2. Budget pre-flight  Trim tool schemas if over token budget
      3. Context assembly   SOUL.md + workspace files + system prompt
                            + graph hints + memory (RRF fusion) + history
      4. Inference call     Provider client (10 providers, adaptive temperature)
      5. Tool loop          Execute tool calls (maxToolCallsPerRound, default 8)
                            dependsOn sort → batch if batchable → async if async: true
      6. Post-turn save     Auto-save memory, emit agent.metrics, fire webhooks
                            Compaction forecast (warn 5 turns out)
```

The streaming variant (`loop-stream.ts`) yields an `AsyncGenerator<AgentStreamEvent>` for SSE and WebSocket delivery.
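The `dependsOn` sort in step 5 is a topological ordering: a tool call runs only after the calls it depends on. A sketch of that ordering; the `ToolCall` shape follows the field names in the diagram, while the real scheduler additionally batches batchable calls and dispatches `async: true` calls concurrently:

```typescript
interface ToolCall {
  id: string;
  dependsOn?: string[];
}

// Depth-first visit: emit each call's dependencies before the call itself.
function orderByDependsOn(calls: ToolCall[]): ToolCall[] {
  const byId = new Map<string, ToolCall>();
  for (const c of calls) byId.set(c.id, c);
  const ordered: ToolCall[] = [];
  const seen = new Set<string>();
  const visit = (c: ToolCall): void => {
    if (seen.has(c.id)) return;
    seen.add(c.id);
    for (const dep of c.dependsOn ?? []) {
      const d = byId.get(dep);
      if (d) visit(d); // dependencies execute first
    }
    ordered.push(c);
  };
  for (const c of calls) visit(c);
  return ordered;
}

const plan = orderByDependsOn([
  { id: "summarize", dependsOn: ["fetch"] },
  { id: "fetch" },
]);
console.log(plan.map((c) => c.id)); // ["fetch", "summarize"]
```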
Tech stack
| Layer | Technology |
|---|---|
| Runtime | Node.js 20+, TypeScript (ESM), strict mode |
| Gateway | Express, ws (WebSocket), SSE |
| Database | PostgreSQL 17 with pgvector extension |
| Search | Typesense 27.1 (hybrid BM25 + vector) |
| Validation | Zod — all external data, all tool params, all tool output schemas |
| Auth | JWT (jose) + bcrypt |
| Scheduler | 11 cron jobs (node-cron) including entry weight decay, entity confidence, Jaccard dedup |
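One of the cron jobs above is the Jaccard dedup pass over memory entries. The core comparison is a set-overlap ratio; this sketch uses whitespace tokenization and a 0.8 threshold as illustrative choices, not the runtime's exact parameters:

```typescript
// Jaccard similarity of two texts: |intersection| / |union| of their
// token sets. 1.0 means identical token sets, 0.0 means disjoint.
function jaccard(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\s+/));
  const tb = new Set(b.toLowerCase().split(/\s+/));
  let inter = 0;
  for (const t of ta) if (tb.has(t)) inter++;
  const union = ta.size + tb.size - inter;
  return union === 0 ? 1 : inter / union;
}

// Entries whose similarity clears the threshold are treated as duplicates.
function isDuplicate(a: string, b: string, threshold = 0.8): boolean {
  return jaccard(a, b) >= threshold;
}

console.log(isDuplicate("deploy runs at 9am daily", "deploy runs at 9am daily"));   // true
console.log(isDuplicate("deploy runs at 9am daily", "retro notes from sprint 12")); // false
```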
Next steps
- Why Open Astra — philosophy, key differences, and comparison with other frameworks
- Quick Start — running `npx astra` in under 60 seconds
- Creating Agents — defining agent teams in `astra.yml`
- 5-Tier Memory — how memory is stored, retrieved, and maintained