Get Started

Introduction

Open Astra is an agent runtime for engineering teams shipping real products. It is not a personal AI assistant — it is the self-hosted backend infrastructure your agents run inside: persistent memory, hierarchical swarms, 15 messaging channels, and a full REST API.

What Open Astra is

A TypeScript runtime that gives you:

  • A multi-tenant API server — JWT auth, REST + WebSocket + SSE, per-workspace agent teams
  • 5-tier memory — ephemeral sessions → daily notes → user profile → knowledge graph → procedural workflows, backed by PostgreSQL and Typesense with RRF fusion search
  • Hierarchical swarms — a root agent decomposes tasks, spawns permission-sandboxed sub-agents, shares state via a blackboard, mediates conflicts with a debate protocol
  • 10 inference providers — Grok, Groq, OpenAI, Gemini, Claude, Ollama, vLLM, Bedrock, Mistral, OpenRouter with per-provider prompt caching (up to 90% savings)
  • 15 messaging channels — Telegram, Discord, Slack, WhatsApp, Signal, iMessage, Google Chat, Microsoft Teams, LINE, Viber, X, Email, Linear, Jira, Zapier
  • 109 built-in skills and 67 tools, auto-discovered at startup with hot-swap on file change
  • Workspace files — drop .md files into ./workspace/ and they are live on the next agent request
  • Self-healing, quotas, approval workflows, deep research, dream mode, persona evolution, heartbeat daemon
  • Cost dashboard and diagnostics via npx astra costs and npx astra doctor

What it is not

  • Not a no-code agent builder or visual workflow tool
  • Not a single-user local app (though the standalone CLI works fine for personal use)
  • Not a replacement for an LLM provider — it calls providers you configure
  • Not a managed cloud service — you deploy and own it

What makes it different

Most agent frameworks are wrappers — thin layers over a single LLM call. Open Astra is a runtime. Five things set it apart:

  • Memory that compounds — 5-tier architecture with RRF fusion across Typesense BM25+vector and pgvector. Entries decay by weight over time; duplicates are Jaccard-deduped; graph entities gain confidence from consistent extractions.
  • Deterministic context assembly — context is built in a fixed order with defined token budgets. Graph hints inject related entities. Budget pre-flight trims tool schemas before the model call. You can audit exactly what was sent to any inference call.
  • True multi-agent orchestration — hierarchical swarms with permission sandboxing. Sub-agents cannot escalate privilege beyond their parent's allow list. Blackboard state sharing and debate protocol for conflict resolution.
  • Production infrastructure included — self-healing with exponential backoff, compaction forecasting, per-agent cost tracking, adaptive temperature, async tool dispatch, skill metrics, and 11 scheduled cron jobs.
  • No black boxes — no community plugins, no external MCP servers, no marketplace extensions. Every tool, skill, inference adapter, and memory operation is in this repository. If it runs in your agent loop, you can read it.
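The Jaccard dedup mentioned above can be illustrated with a token-set similarity check. This is a minimal sketch, not Open Astra's actual implementation; the tokenization and the 0.8 threshold are assumed values for illustration.

```typescript
// Jaccard similarity between two memory entries, treated as token sets:
// |A ∩ B| / |A ∪ B|. Entries above a threshold count as duplicates.
function jaccard(a: string, b: string): number {
  const A = new Set(a.toLowerCase().split(/\s+/));
  const B = new Set(b.toLowerCase().split(/\s+/));
  let inter = 0;
  for (const t of A) if (B.has(t)) inter++;
  return inter / (A.size + B.size - inter);
}

// Assumed threshold — the real cutoff is configurable in the runtime.
const isDuplicate = (a: string, b: string, threshold = 0.8) =>
  jaccard(a, b) >= threshold;

console.log(jaccard("user prefers dark mode", "user prefers dark mode always"));
// 4 shared tokens out of 5 total → 0.8
```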

How it works

From an incoming message to a delivered response in three steps:

  1. Connect a channel — Slack, X, email, or any of the 15 messaging channels. Agents live where your team already works.
  2. Agents pull context from memory — 5-tier memory fuses pgvector and Typesense via RRF to surface what's relevant before every response.
  3. Act, report, and learn — tools run, sub-agents spawn, results come back through the same channel. Memory updates. The loop closes.
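The RRF fusion in step 2 follows the standard reciprocal rank fusion formula: each document scores the sum of 1/(k + rank) across the ranked lists it appears in. A minimal sketch, not the runtime's actual code; k = 60 is the conventional constant from the original RRF paper, assumed here.

```typescript
// Reciprocal Rank Fusion: merge several ranked result lists (e.g. one
// from BM25, one from vector search) into a single ranking.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // rank is 0-based; RRF uses 1-based ranks.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// A hit ranked well by both lists rises to the top:
const bm25 = ["a", "b", "c"];
const vector = ["a", "c", "d"];
console.log(rrfFuse([bm25, vector])); // ["a", "c", "b", "d"]
```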

Architecture overview

Every request flows through the same path:

```text
HTTP / WebSocket / Channel Message
  → gateway/index.ts          Express + WS bootstrap
    → agents/loop.ts          Core execution cycle
        1. Resolve session    {uid, surface, surfaceId}
        2. Budget pre-flight  Trim tool schemas if over token budget
        3. Context assembly   SOUL.md + workspace files + system prompt
                              + graph hints + memory (RRF fusion) + history
        4. Inference call     Provider client (10 providers, adaptive temperature)
        5. Tool loop          Execute tool calls (maxToolCallsPerRound, default 8)
                              dependsOn sort → batch if batchable → async if async: true
        6. Post-turn save     Auto-save memory, emit agent.metrics, fire webhooks
                              Compaction forecast (warn 5 turns out)
```

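Step 5's dependsOn ordering can be sketched as rounds of "ready" calls — a simplified illustration: only the field name dependsOn comes from the diagram above; the round-based scheduler here is an assumption, not the runtime's actual dispatcher.

```typescript
// Simplified tool-dispatch ordering: calls with no unmet dependsOn run
// first; each round executes the ready calls (conceptually batched),
// then unlocks their dependents.
interface ToolCall {
  id: string;
  dependsOn?: string[];
}

function dispatchOrder(calls: ToolCall[]): string[][] {
  const done = new Set<string>();
  const pending = [...calls];
  const rounds: string[][] = [];
  while (pending.length > 0) {
    const ready = pending.filter(c =>
      (c.dependsOn ?? []).every(d => done.has(d)),
    );
    if (ready.length === 0) throw new Error("dependency cycle");
    rounds.push(ready.map(c => c.id));
    for (const c of ready) done.add(c.id);
    for (const c of ready) pending.splice(pending.indexOf(c), 1);
  }
  return rounds;
}

console.log(
  dispatchOrder([
    { id: "fetch" },
    { id: "summarize", dependsOn: ["fetch"] },
    { id: "notify", dependsOn: ["summarize"] },
    { id: "log" },
  ]),
);
// [["fetch", "log"], ["summarize"], ["notify"]]
```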
The streaming variant (loop-stream.ts) yields an AsyncGenerator<AgentStreamEvent> for SSE and WebSocket delivery.
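Consuming that generator might look like the following sketch. The event shape here ("token" deltas and a final "done") is an assumption for illustration, not the actual AgentStreamEvent definition.

```typescript
// Illustrative only: a minimal consumer of an AsyncGenerator stream.
// These AgentStreamEvent variants are assumed, not the real schema.
type AgentStreamEvent =
  | { type: "token"; delta: string }
  | { type: "tool_call"; name: string }
  | { type: "done" };

// Stand-in for the real stream, so the example is self-contained.
async function* fakeStream(): AsyncGenerator<AgentStreamEvent> {
  yield { type: "token", delta: "Hello" };
  yield { type: "token", delta: ", world" };
  yield { type: "done" };
}

async function collect(stream: AsyncGenerator<AgentStreamEvent>) {
  let text = "";
  for await (const ev of stream) {
    if (ev.type === "token") text += ev.delta; // e.g. forward as an SSE chunk
  }
  return text;
}

collect(fakeStream()).then(console.log); // "Hello, world"
```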

Tech stack

  • Runtime — Node.js 20+, TypeScript (ESM), strict mode
  • Gateway — Express, ws (WebSocket), SSE
  • Database — PostgreSQL 17 with pgvector extension
  • Search — Typesense 27.1 (hybrid BM25 + vector)
  • Validation — Zod for all external data, tool params, and tool output schemas
  • Auth — JWT (jose) + bcrypt
  • Scheduler — 11 cron jobs (node-cron), including entry weight decay, entity confidence, and Jaccard dedup
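The entry weight decay job in the scheduler can be sketched as exponential decay — a minimal illustration; the 30-day half-life is an assumed value, not the runtime's configured one.

```typescript
// Exponential decay for memory-entry weights: after one half-life the
// weight halves, so stale entries gradually lose retrieval priority.
// The 30-day half-life is an assumption for illustration.
const HALF_LIFE_DAYS = 30;

function decayedWeight(weight: number, ageDays: number): number {
  return weight * Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

console.log(decayedWeight(1.0, 30)); // 0.5
console.log(decayedWeight(1.0, 60)); // 0.25
```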

Troubleshooting

Pick a symptom to see a targeted diagnosis path:

Next steps