Open Astra

The agent runtime
built for engineers.

A self-hosted agent runtime with vectorized memory across pgvector and Typesense, 10 LLM providers, and the channels your team already uses. Production infrastructure you actually own.

Open Source · Self-Hosted · Full REST API

View Docs → · Star on GitHub

Runs locally · Node 20+ · Docker optional

$ npx astra

Zero community plugins and zero external MCP servers out of the box. Every built-in tool, skill, and memory operation is code you can read and audit. Unless you declare an MCP server yourself, whatever runs in your agent loop is in this repository.

Why this matters →

106

Built-in Skills

Core Tools

10

LLM Providers

Channels

5-tier

Memory System

7-layer

Cache Stack

How it works

From message to action in three steps

Connect a channel

Slack, X, email — or all of them. Agents live where your team already works, no new interface to learn.

Agents pull context from memory

5-tier memory fuses pgvector and Typesense via RRF to surface exactly what's relevant before every response.
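For readers unfamiliar with RRF (Reciprocal Rank Fusion), here is a minimal sketch of the general technique, not Astra's actual code. Each result list contributes 1 / (k + rank) per document, and the contributions are summed. The function name `rrfFuse`, the document ids, and k = 60 (a value commonly used for RRF) are assumptions made for illustration.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d).
// k = 60 is a commonly used default; the value Astra uses is not stated.
type RankedList = string[]; // document ids, best match first

function rrfFuse(lists: RankedList[], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical rankings standing in for a pgvector query and a Typesense query.
const vectorHits: RankedList = ["doc-a", "doc-b", "doc-c"];
const keywordHits: RankedList = ["doc-b", "doc-d", "doc-a"];
const fused = rrfFuse([vectorHits, keywordHits]);
```

Documents that rank well in both lists rise to the top: here `doc-b` and `doc-a` outrank the two documents that appear in only one list.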

Act, report, and learn

Tools run, sub-agents spawn, results come back through the same channel. Memory updates. The loop closes.

What's inside

Everything you need to ship agents to production

5-Tier Memory

Ephemeral → daily notes → user profile → knowledge graph with multi-hop traversal → procedural workflows. RRF fusion search across Typesense and pgvector.
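To make "multi-hop traversal" concrete, the sketch below runs a breadth-first search over a small in-memory adjacency map and returns every entity reachable within a hop limit. It illustrates the idea only; the graph shape, the function name `neighborsWithinHops`, and the entity names are hypothetical, and Astra's storage and query path may look quite different.

```typescript
// Adjacency map: entity id -> ids of directly related entities.
type Graph = Map<string, string[]>;

// Breadth-first search that records how many hops each entity is from `start`.
function neighborsWithinHops(graph: Graph, start: string, maxHops: number): Map<string, number> {
  const hops = new Map<string, number>([[start, 0]]);
  let frontier: string[] = [start];
  for (let depth = 1; depth <= maxHops && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of graph.get(node) ?? []) {
        if (!hops.has(neighbor)) {
          hops.set(neighbor, depth); // first visit is the shortest path
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  hops.delete(start); // report only the neighbors, not the start node
  return hops;
}

// Hypothetical entities for illustration.
const knowledge: Graph = new Map<string, string[]>([
  ["alice", ["acme"]],
  ["acme", ["berlin", "bob"]],
  ["berlin", ["germany"]],
]);
const reachable = neighborsWithinHops(knowledge, "alice", 2);
```

With a limit of two hops, `alice` reaches `acme`, `berlin`, and `bob`, while `germany` (three hops away) is left out.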

Hierarchical Swarms

A root agent decomposes tasks, spawns permission-sandboxed sub-agents, shares state via blackboard, mediates conflicts with a debate protocol, and synthesizes.

10 Inference Providers

Grok, Groq, OpenAI, Gemini, Claude, Ollama, vLLM, Bedrock, Mistral, OpenRouter — all behind one interface with per-provider prompt caching (up to 90% savings).

Workspace Files

Drop AGENTS.md, USER.md, IDENTITY.md, or any .md into ./workspace/. Hot-reloaded on save, SHA-256 checksummed, injected before every agent turn.

Self-Healing

Auto-restart crashed agents with exponential backoff. Auto-compact sessions approaching context limits. Dead session cleanup. All configurable in astra.yml.
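As a rough illustration of exponential backoff, the sketch below doubles the wait before each restart attempt and caps it at a maximum. The function name `backoffDelayMs`, the 1-second base, and the 60-second cap are assumptions; the real settings live in astra.yml and their names are not shown here.

```typescript
// Delay before restart attempt `attempt` (0-based): base * 2^attempt, capped.
// baseMs and maxMs are hypothetical defaults, not Astra's documented values.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Delays for the first seven consecutive crashes.
const delays = [0, 1, 2, 3, 4, 5, 6].map((attempt) => backoffDelayMs(attempt));
```

Production implementations often add random jitter so that many agents crashing together do not all restart at the same instant.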

Deep Research

Autonomous multi-day PhD-style research. Spawns sub-agent swarms for parallel investigation, synthesizes findings, delivers structured reports.

Dream Mode

Overnight simulation using local models only (Ollama). 100% on-device, privacy-first. Generates morning reports with extracted insights and entity updates.

Cost Dashboard

Per-agent, per-user, per-model token spend with trend visualization. Real-time quota enforcement. Available through npx astra costs or the GET /costs endpoint.

Diagnostics

12 checks across 6 categories: connectivity, config, migrations, agents, providers, memory. Run with npx astra doctor.

Heartbeat Daemon

Proactive background agents that monitor triggers — email, RSS, price thresholds, news keywords — and fire an agent when conditions are met. Cron-scheduled, zero manual polling.

Hot Reload

Skills and plugins reload instantly when files change on disk — no gateway restart required. Filesystem watcher with 1000ms debounce. Syntax errors roll back to the previous working version.
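The 1000ms debounce means a burst of saves produces one reload, fired once the file has been quiet for a full second. The sketch below models that timing as a pure function rather than wiring up a real filesystem watcher. It assumes a trailing-edge debounce; the function name `reloadTimes` and the sample timestamps are made up for illustration.

```typescript
// Given sorted file-change timestamps (ms), return the times at which a
// trailing-edge debounce with the given window would trigger a reload.
function reloadTimes(changeTimes: number[], windowMs = 1000): number[] {
  const fires: number[] = [];
  for (let i = 0; i < changeTimes.length; i++) {
    const fireAt = changeTimes[i] + windowMs;
    const isLast = i === changeTimes.length - 1;
    // A later change inside the window resets the timer, so this one never fires.
    if (isLast || changeTimes[i + 1] >= fireAt) {
      fires.push(fireAt);
    }
  }
  return fires;
}

// Three quick saves, then one more save two seconds later.
const saves = [0, 200, 900, 3000];
const reloads = reloadTimes(saves);
```

Four saves collapse into two reloads: one at 1900 ms, after the initial burst settles, and one at 4000 ms.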

Persona Evolution

Agents A/B test prompt variants against each other and automatically promote the best performer. Feedback-driven — explicit ratings, task completion, tool efficiency — with safety guards and full rollback.

Secrets Management

Vault KV v2, AWS Secrets Manager, or env — swappable via one env var. Per-workspace API key overrides let tenants bring their own credentials. Every secret read is audit-logged.

7-Layer Cache Stack

Bloom filter → turn hash → SWR semantic cache (≥ 0.97 similarity) → negative result cache → graph edge LRU → swarm L1 → embedding batcher. Adaptive TTL, per-workspace budget, and version-pinned invalidation. Cache metrics at GET /cache/stats.
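The semantic cache layer can be pictured as a nearest-neighbor lookup with a strict cutoff: reuse a stored response only when the new query's embedding is at least 0.97 similar to a cached one. The sketch below assumes cosine similarity and equal-length vectors. The text above gives only the threshold, so the metric, the type `CacheEntry`, and the function names are assumptions.

```typescript
interface CacheEntry {
  embedding: number[];
  response: string;
}

// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the best cached response at or above the threshold, else undefined.
function semanticLookup(query: number[], entries: CacheEntry[], threshold = 0.97): string | undefined {
  let best: CacheEntry | undefined;
  let bestScore = threshold;
  for (const entry of entries) {
    const score = cosineSimilarity(query, entry.embedding);
    if (score >= bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best?.response;
}

// Toy two-dimensional embeddings; real embeddings have hundreds of dimensions.
const cached: CacheEntry[] = [{ embedding: [1, 0], response: "cached answer" }];
const nearHit = semanticLookup([1, 0.1], cached); // similarity is about 0.995
const farMiss = semanticLookup([1, 1], cached);   // similarity is about 0.707
```

A high threshold trades hit rate for safety: only near-identical queries reuse an answer, and everything else falls through to the layers below.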

Auth Hardening

Zero-downtime JWT key rotation via JWT_SECRET_PREV: tokens signed with the previous key stay valid during rollover. Optional device fingerprint binding ties tokens to User-Agent + IP, so a token replayed from a different client is rejected.
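One common way to implement this kind of rotation is to verify against the current secret first and fall back to the previous one. The sketch below shows that pattern with the verifier passed in as a function, so it makes no claim about which JWT library Astra uses. The toy verifier and token format are purely illustrative and are not real JWT handling; in practice you would pass a real verifier such as `jwt.verify` from the `jsonwebtoken` package.

```typescript
// A verifier returns the decoded payload or throws on a bad signature.
type Verify<T> = (token: string, secret: string) => T;

// Try each configured secret in order; rethrow the last error if none match.
function verifyWithRotation<T>(token: string, secrets: Array<string | undefined>, verify: Verify<T>): T {
  let lastError: unknown = new Error("no signing secrets configured");
  for (const secret of secrets) {
    if (!secret) continue; // the previous secret is optional
    try {
      return verify(token, secret);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Toy stand-in for a real JWT verifier: token is "<payload>.<secret>".
const toyVerify: Verify<string> = (token, secret) => {
  const [payload, signature] = token.split(".");
  if (signature !== secret) throw new Error("bad signature");
  return payload;
};

// Stand-ins for the current secret (variable name assumed) and JWT_SECRET_PREV.
const current = "new-key";
const previous = "old-key";
const fromOldToken = verifyWithRotation("alice.old-key", [current, previous], toyVerify);
const fromNewToken = verifyWithRotation("alice.new-key", [current, previous], toyVerify);
```

Once every token issued under the old key has expired, the previous secret can be unset and only the current key is accepted.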

MCP Server Support

Connect any MCP-compatible server — stdio, SSE, or streamable-HTTP. Tools are auto-registered as mcp:{server}:{tool} and available to all agents. Declare servers in astra.yml, no code required.
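The mcp:{server}:{tool} naming scheme is easy to build and parse. The helpers below are a hypothetical sketch, not Astra's registry code: they assume server names contain no colon while tool names may, and the `github` / `create_issue` names are invented for the example.

```typescript
interface McpToolRef {
  server: string;
  tool: string;
}

// Build the registry name for a tool exposed by an MCP server.
function toRegistryName(server: string, tool: string): string {
  return `mcp:${server}:${tool}`;
}

// Parse a registry name back into its parts; null if it is not an MCP tool.
// Assumption: the server segment has no colon, the tool segment may.
function parseRegistryName(registryName: string): McpToolRef | null {
  const parts = registryName.split(":");
  if (parts.length < 3 || parts[0] !== "mcp") return null;
  const [, server, ...rest] = parts;
  return { server, tool: rest.join(":") };
}

const registered = toRegistryName("github", "create_issue");
const parsed = parseRegistryName(registered);
```

Namespacing by server keeps two servers that both expose a tool with the same name from colliding in the agent's tool list.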

Kubernetes Ready

Production Helm chart with HPA, PodDisruptionBudget, init-container Postgres readiness check, and external secret support. helm install astra ./helm — one command to k8s.

Developer CLI

astra scaffold generates skill, route, and plugin boilerplate. astra bench measures agent latency and cost. astra replay re-runs any session in the terminal. astra migrate manages DB schema — all from the same CLI.

Full Dashboard

Complete visibility into every agent, token, and dollar

A built-in operations dashboard ships with every install — no Grafana, no Datadog, no third-party billing. Query everything via the REST API or npx astra costs in your terminal.

Cost Dashboard

Per-agent, per-provider, per-tool token spend with monthly trends and real-time quota enforcement. Filter by hour, day, week, or month. See exactly where every dollar goes — prompt tokens, completion tokens, and cache savings up to 90%.

GET /costs · GET /costs/breakdown · npx astra costs

Agent Health Scorecard

Composite health score per agent built from reliability (crash rate, uptime), efficiency (token spend vs. budget, tool call ratio), and memory hygiene (staleness, contradictions). Green / yellow / red bands with configurable alert webhooks.

GET /agents/:id/health · GET /agents/health

Trace Viewer

Per-turn execution traces with duration, token counts, tool call breakdown, circuit breaker events, and provider state transitions. Filter by agent, status, or sort by latency.

GET /traces · GET /traces/:id

Team Health

Aggregate latency percentiles (p50/p95/p99), error rates, token efficiency, goal completion, and task throughput across all agents — with trend analysis over hour, day, or week windows.

GET /team-health · GET /team-health/trends
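For readers who want to sanity-check the p50/p95/p99 figures against their own traces, here is a minimal nearest-rank percentile sketch. It is one of several common percentile definitions, and whether Astra uses nearest-rank or interpolation is not stated; the function name and the sample latencies are assumptions.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical per-turn latencies in milliseconds, including one slow outlier.
const latencies = [120, 80, 200, 95, 150, 110, 3000, 130, 100, 90];
const p50 = percentile(latencies, 50);
const p95 = percentile(latencies, 95);
const p99 = percentile(latencies, 99);
```

With only ten samples, p95 and p99 both land on the single 3000 ms outlier, which is why tail percentiles are worth reading alongside the sample count.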

Leaderboard

Agents ranked by usage and performance across configurable periods. Hourly breakdowns per agent with pagination — see who's doing the most work and who's costing the most.

GET /leaderboard

Cache Stats

Hit rates across all 7 cache layers — semantic SWR, bloom filter, turn hash, graph edge LRU, and more. Per-provider cached token counts and cost savings over trailing 24-hour windows.

GET /cache/stats

Workspace Usage

Session counts, aggregate token and cost totals, top 5 agents and tools by usage, and active user counts — all scoped per workspace over configurable time windows up to 365 days.

GET /workspaces/:id/stats

Diagnostics

12 automated checks across connectivity, config, migrations, agents, providers, and memory — with pass/fail status, detailed error context, and JSON export for CI pipelines.

npx astra doctor · GET /diagnostics

Get Started

Get started in minutes

Install with one command and you're running — on your computer or a server.

Quick Start →
