Introduction
Open Astra is an agent runtime for engineering teams shipping real products. It is not a personal AI assistant — it is the self-hosted backend infrastructure your agents run inside: persistent memory, hierarchical swarms, 15 messaging channels, and a full REST API.
What Open Astra is
A TypeScript runtime that gives you:
- A multi-tenant API server — JWT auth, REST + WebSocket + SSE, per-workspace agent teams
- 5-tier memory — ephemeral sessions → daily notes → user profile → knowledge graph → procedural workflows, backed by PostgreSQL and Typesense with RRF fusion search
- Hierarchical swarms — a root agent decomposes tasks, spawns permission-sandboxed sub-agents, shares state via a blackboard, mediates conflicts with a debate protocol
- 10 inference providers — Grok, Groq, OpenAI, Gemini, Claude, Ollama, vLLM, Bedrock, Mistral, OpenRouter with per-provider prompt caching (up to 90% savings)
- 15 messaging channels — Telegram, Discord, Slack, WhatsApp, Signal, iMessage, Google Chat, Microsoft Teams, LINE, Viber, X, Email, Linear, Jira, Zapier
- 109 built-in skills and 67 tools, auto-discovered at startup with hot-swap on file change
- Workspace files — drop `.md` files into `./workspace/` and they are live on the next agent request
- Self-healing, quotas, approval workflows, deep research, dream mode, persona evolution, heartbeat daemon
- Cost dashboard and diagnostics via `npx astra costs` and `npx astra doctor`
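The workspace-files behavior above can be sketched in a few lines. This is an illustrative stand-in, not the runtime's actual discovery code: the function name and the throwaway directory are assumptions, but the rule it demonstrates is the one stated above (any `.md` file in the workspace directory is picked up on the next request).

```typescript
import { mkdirSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: list the Markdown files an agent would see
// in its workspace directory on the next request.
function discoverWorkspaceDocs(dir: string): string[] {
  return readdirSync(dir)
    .filter((name) => name.endsWith(".md"))
    .sort();
}

// Demo against a temp directory standing in for ./workspace/.
const ws = join(tmpdir(), `astra-ws-${process.pid}`);
mkdirSync(ws, { recursive: true });
writeFileSync(join(ws, "style-guide.md"), "# Style guide\n");
writeFileSync(join(ws, "notes.txt"), "ignored: not Markdown\n");
console.log(discoverWorkspaceDocs(ws)); // only the .md file is surfaced
```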
What it is not
- Not a no-code agent builder or visual workflow tool
- Not a single-user local app (though the standalone CLI works fine for personal use)
- Not a replacement for an LLM provider — it calls providers you configure
- Not a managed cloud service — you deploy and own it
What makes it different
Most agent frameworks are wrappers — thin layers over a single LLM call. Open Astra is a runtime. Five things set it apart:
- Memory that compounds — 5-tier architecture with RRF fusion across Typesense BM25+vector and pgvector. Entries decay by weight over time; duplicates are Jaccard-deduped; graph entities gain confidence from consistent extractions.
- Deterministic context assembly — context is built in a fixed order with defined token budgets. Graph hints inject related entities. Budget pre-flight trims tool schemas before the model call. You can audit exactly what was sent to any inference call.
- True multi-agent orchestration — hierarchical swarms with permission sandboxing. Sub-agents cannot escalate privilege beyond their parent's allow list. Blackboard state sharing and debate protocol for conflict resolution.
- Production infrastructure included — self-healing with exponential backoff, compaction forecasting, per-agent cost tracking, adaptive temperature, async tool dispatch, skill metrics, and 11 cron jobs running on schedule.
- No black boxes — no community plugins, no external MCP servers, no marketplace extensions. Every tool, skill, inference adapter, and memory operation is in this repository. If it runs in your agent loop, you can read it.
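The RRF fusion mentioned above is a standard rank-combination technique. A minimal sketch, with the two input lists standing in for the Typesense BM25+vector results and the pgvector results; the constant `k = 60` is the conventional RRF damping term, and the actual runtime's parameters may differ:

```typescript
// Reciprocal Rank Fusion: each retriever contributes 1/(k + rank + 1)
// per result; summed scores decide the fused order.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// An entry ranked highly by both retrievers wins overall.
const bm25 = ["note:standup", "note:roadmap", "note:oncall"];
const vector = ["note:standup", "note:roadmap", "note:retro"];
console.log(rrfFuse([bm25, vector])[0]); // "note:standup" ranks first
```

Because scores depend only on rank positions, fusion works even though BM25 scores and vector cosine distances live on incomparable scales.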
How it works
From an incoming message to a delivered response in three steps:
- Connect a channel — Slack, X, email, or any of the 15 channels. Agents live where your team already works.
- Agents pull context from memory — 5-tier memory fuses pgvector and Typesense via RRF to surface what's relevant before every response.
- Act, report, and learn — tools run, sub-agents spawn, results come back through the same channel. Memory updates. The loop closes.
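When sub-agents spawn in step 3, the permission sandbox described earlier applies: a child's effective permissions are the intersection of what it requests and what its parent holds. A minimal sketch of that rule; the names and types are illustrative, not the runtime's actual API:

```typescript
type Permission = string;

// A sub-agent is granted only permissions its parent already has,
// so spawning can never escalate privilege.
function grantSubAgent(
  parentAllow: ReadonlySet<Permission>,
  requested: Permission[],
): Set<Permission> {
  const granted = new Set<Permission>();
  for (const p of requested) {
    if (parentAllow.has(p)) granted.add(p); // escalations silently dropped
  }
  return granted;
}

const parent = new Set(["web.search", "memory.read"]);
const child = grantSubAgent(parent, ["web.search", "shell.exec"]);
console.log([...child]); // ["web.search"] (shell.exec was not inherited)
```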
Architecture overview
Every request flows through the same path:
```text
HTTP / WebSocket / Channel Message
  → gateway/index.ts       Express + WS bootstrap
  → agents/loop.ts         Core execution cycle
      1. Resolve session    {uid, surface, surfaceId}
      2. Budget pre-flight  Trim tool schemas if over token budget
      3. Context assembly   SOUL.md + workspace files + system prompt
                            + graph hints + memory (RRF fusion) + history
      4. Inference call     Provider client (10 providers, adaptive temperature)
      5. Tool loop          Execute tool calls (maxToolCallsPerRound, default 8)
                            dependsOn sort → batch if batchable → async if async: true
      6. Post-turn save     Auto-save memory, emit agent.metrics, fire webhooks
                            Compaction forecast (warn 5 turns out)
```

The streaming variant (`loop-stream.ts`) yields an `AsyncGenerator<AgentStreamEvent>` for SSE and WebSocket delivery.
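The `dependsOn` sort in step 5 is a topological ordering: a tool call runs only after the calls it depends on. A sketch of that ordering; the `ToolCall` shape follows the field names in the diagram, while the real scheduler additionally batches batchable calls and dispatches `async: true` calls concurrently:

```typescript
interface ToolCall {
  id: string;
  dependsOn?: string[];
}

// Depth-first visit: emit each call's dependencies before the call itself.
function orderByDependsOn(calls: ToolCall[]): ToolCall[] {
  const byId = new Map<string, ToolCall>();
  for (const c of calls) byId.set(c.id, c);
  const ordered: ToolCall[] = [];
  const seen = new Set<string>();
  const visit = (c: ToolCall): void => {
    if (seen.has(c.id)) return;
    seen.add(c.id);
    for (const dep of c.dependsOn ?? []) {
      const d = byId.get(dep);
      if (d) visit(d); // dependencies execute first
    }
    ordered.push(c);
  };
  for (const c of calls) visit(c);
  return ordered;
}

const plan = orderByDependsOn([
  { id: "summarize", dependsOn: ["fetch"] },
  { id: "fetch" },
]);
console.log(plan.map((c) => c.id)); // ["fetch", "summarize"]
```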
Tech stack
| Layer | Technology |
|---|---|
| Runtime | Node.js 20+, TypeScript (ESM), strict mode |
| Gateway | Express, ws (WebSocket), SSE |
| Database | PostgreSQL 17 with pgvector extension |
| Search | Typesense 27.1 (hybrid BM25 + vector) |
| Validation | Zod — all external data, all tool params, all tool output schemas |
| Auth | JWT (jose) + bcrypt |
| Scheduler | 11 cron jobs (node-cron) including entry weight decay, entity confidence, Jaccard dedup |
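One of the cron jobs above is the Jaccard dedup pass over memory entries. The core comparison is a set-overlap ratio; this sketch uses whitespace tokenization and a 0.8 threshold as illustrative choices, not the runtime's exact parameters:

```typescript
// Jaccard similarity of two texts: |intersection| / |union| of their
// token sets. 1.0 means identical token sets, 0.0 means disjoint.
function jaccard(a: string, b: string): number {
  const ta = new Set(a.toLowerCase().split(/\s+/));
  const tb = new Set(b.toLowerCase().split(/\s+/));
  let inter = 0;
  for (const t of ta) if (tb.has(t)) inter++;
  const union = ta.size + tb.size - inter;
  return union === 0 ? 1 : inter / union;
}

// Entries whose similarity clears the threshold are treated as duplicates.
function isDuplicate(a: string, b: string, threshold = 0.8): boolean {
  return jaccard(a, b) >= threshold;
}

console.log(isDuplicate("deploy runs at 9am daily", "deploy runs at 9am daily"));   // true
console.log(isDuplicate("deploy runs at 9am daily", "retro notes from sprint 12")); // false
```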
Next steps
- Why Open Astra — philosophy, key differences, and comparison with other frameworks
- Quick Start — running `npx astra` in under 60 seconds
- Creating Agents — defining agent teams in `astra.yml`
- 5-Tier Memory — how memory is stored, retrieved, and maintained