Memory overview

5-Tier Memory System

Open Astra's memory system organizes everything an agent knows into five tiers of increasing abstraction and persistence. Tiers 1–2 capture what happened in conversations. Tiers 3–5 capture what the agent has learned. Before every inference call, all five tiers are searched and the results are fused — so agents get richer context the longer they run.

The five tiers

| Tier | Name | Scope | Lifetime | Backend |
|---|---|---|---|---|
| 1 | Session messages | Session | Until compaction | pgvector HNSW (m=8, ef=32) |
| 2 | Daily notes | User + workspace | Permanent | Typesense hybrid + pgvector |
| 3 | User profile | User | Permanent, incremental updates | PostgreSQL JSON document |
| 4 | Knowledge graph | Workspace | Permanent with temporal decay | pgvector HNSW (m=24, ef=128) |
| 5 | Procedural memory | Workspace | Permanent, reinforced by use | PostgreSQL + prefix index |

Write path — after every turn

Memory is written automatically in the post-turn save phase. You do not need to call any API to populate memory — it happens as a side effect of normal agent conversations.

```text
# Every agent turn — post-turn save phase
1. Auto-extract daily notes   (categorized: decision, outcome, strategy, note, interaction)
2. Update user profile        (incremental merge into the profile JSON document)
3. Upsert graph entities      (extract entities + typed edges, increment confidence scores)
4. Store session message      (embed → pgvector HNSW Tier 1 insert)
5. Emit agent.metrics event   (cost, latency, tool calls)
6. Fire outbound webhooks     (if configured)
```
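A note extracted in step 1 might be modeled as follows. Only the five category values come from the docs above; the record shape and field names are illustrative assumptions:

```typescript
// The five categories are documented; DailyNote's shape is an assumption.
type NoteCategory = "decision" | "outcome" | "strategy" | "note" | "interaction";

interface DailyNote {
  category: NoteCategory;
  text: string;
  createdAt: string; // ISO timestamp
}

const example: DailyNote = {
  category: "decision",
  text: "User chose PostgreSQL over MySQL for the new service.",
  createdAt: new Date().toISOString(),
};
```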

Read path — before every inference call

Memory retrieval runs inside searchAllTiers(), called by assembleContext() before every inference call. All five tiers are queried in parallel, results are fused with RRF, and the top results are injected into the system prompt.

```text
# Before every inference call — context assembly phase
1. Query message              used as the search vector for all tiers
2. Tier 1 — session messages  pgvector cosine similarity (HNSW m=8, ef=32)
3. Tier 2 — daily notes       Typesense BM25 + vector hybrid search
4. Tier 3 — user profile      always injected in full (not searched)
5. Tier 4 — knowledge graph   pgvector cosine similarity (HNSW m=24, ef=128) + graph traversal
6. Tier 5 — procedural        prefix + keyword + semantic similarity
7. RRF fusion                 Reciprocal Rank Fusion across all tier results
8. Apply profile caps         maxContextChunks, minRelevanceScore (if memory profile assigned)
9. Inject into context        formatted blocks inserted before the conversation history
```
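The RRF step in the sequence above can be sketched as follows. This is a minimal sketch, not Open Astra's implementation: the fusion constant k = 60 (the value from the original RRF paper) and the types are assumptions:

```typescript
// Reciprocal Rank Fusion: each result earns 1 / (k + rank) from every tier's
// ranked list it appears in, and scores for the same entry are summed.
type Ranked = { id: string };

function rrfFuse(tierResults: Ranked[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const results of tierResults) {
    results.forEach((r, rank) => {
      // rank is 0-based, so the top result contributes 1 / (k + 1).
      scores.set(r.id, (scores.get(r.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because contributions are summed, an entry surfaced by several tiers outranks an entry that tops only one list, which is why fusion rewards cross-tier agreement without needing the tiers' raw scores to be comparable.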

Tier detail

Tier 1 — Session messages

Raw conversation turns — every message sent to and from an agent. Embedded with text-embedding-3-small (or Gemini if configured) and stored in pgvector with HNSW m=8. Subject to compaction: when the context window fills, older messages are summarized and the originals replaced. See Compaction Forecast.
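Compaction can be sketched as: once a running token estimate exceeds a budget, the oldest messages are replaced by a single summary entry. The token heuristic, budget, and `summarize` callback below are illustrative assumptions, not Open Astra's actual compaction logic:

```typescript
type Msg = { role: string; content: string };

// Hypothetical compaction sketch: drop oldest messages into a summary once
// the estimated token count exceeds the budget.
function compact(messages: Msg[], tokenBudget: number, summarize: (ms: Msg[]) => string): Msg[] {
  const estTokens = (m: Msg) => Math.ceil(m.content.length / 4); // rough ~4 chars/token heuristic
  let total = messages.reduce((n, m) => n + estTokens(m), 0);
  const keep = [...messages];
  const old: Msg[] = [];
  while (total > tokenBudget && keep.length > 1) {
    const m = keep.shift()!; // oldest first
    old.push(m);
    total -= estTokens(m);
  }
  // Replace the removed originals with one summary message, as described above.
  return old.length ? [{ role: "system", content: summarize(old) }, ...keep] : keep;
}
```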

Tier 2 — Daily notes

Structured observations extracted from conversations during the post-turn save. Each note has a category: decision, outcome, strategy, note, or interaction. Notes are searched via Typesense hybrid (BM25 + vector) and also summarized periodically. See Summarization.

Tier 3 — User profile

A single JSON document per user that accumulates structured knowledge: name, timezone, preferences, domain-specific context. Always injected in full — not searched. Updated incrementally as new facts are extracted.

```json
{
  "name": "Alex",
  "timezone": "America/New_York",
  "preferredLanguage": "TypeScript",
  "domains": {
    "engineering": {
      "stack": ["Node", "PostgreSQL", "React"],
      "style": "prefers concise explanations with code examples"
    }
  }
}
```

Tier 4 — Knowledge graph

Entities and typed edges, embedded at m=24, ef=128 (higher quality than session messages because this data is queried more broadly across the workspace). Supports multi-hop traversal and graph hints injection. Edges decay in confidence over time. See Graph Memory.
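Temporal decay of edge confidence can be sketched as exponential decay with a half-life. The 30-day half-life below is an illustrative assumption; the actual schedule is driven by the maintenance jobs described later:

```typescript
// Hypothetical exponential decay of an edge's confidence score.
// halfLifeDays is an assumed parameter, not a documented default.
function decayedConfidence(confidence: number, daysSinceLastSeen: number, halfLifeDays = 30): number {
  return confidence * Math.pow(0.5, daysSinceLastSeen / halfLifeDays);
}
```

Under this model an edge last confirmed 30 days ago retains half its confidence, so stale relationships fade from retrieval instead of being deleted outright.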

Tier 5 — Procedural memory

Learned workflows stored as trigger-action pairs. Matched by prefix → keyword → semantic similarity for fast retrieval of common patterns. Reinforced each time they are successfully applied. See Knowledge Base.
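The prefix → keyword → semantic cascade can be sketched like this. The matcher, its ordering of checks from cheapest to most expensive, and the 0.8 semantic threshold are illustrative assumptions; the real matcher is internal to Open Astra:

```typescript
type Procedure = { trigger: string; action: string };

// Cascade: try the cheap exact-prefix check first, then keyword overlap,
// and only fall back to (injected) semantic similarity last.
function matchProcedure(
  query: string,
  procs: Procedure[],
  semantic: (a: string, b: string) => number
): Procedure | undefined {
  const q = query.toLowerCase();
  // 1. Prefix match.
  let hit = procs.find((p) => q.startsWith(p.trigger.toLowerCase()));
  if (hit) return hit;
  // 2. Keyword match: any trigger word appears in the query.
  hit = procs.find((p) => p.trigger.toLowerCase().split(/\s+/).some((w) => q.includes(w)));
  if (hit) return hit;
  // 3. Semantic fallback above an assumed 0.8 similarity threshold.
  return procs.filter((p) => semantic(q, p.trigger) >= 0.8)[0];
}
```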

Automated maintenance

Eleven cron jobs run on schedule to keep memory accurate and lean. Key memory maintenance jobs:

| Job | Schedule | What it does |
|---|---|---|
| Entry weight decay | Daily | Reduces relevance scores of stale, unused entries over time |
| Jaccard dedup | Daily | Merges near-duplicate memory entries (Jaccard ≥ 0.85 threshold) |
| Entity confidence | Daily | Increments graph entity confidence on repeated extraction, decrements on absence |
| Cold store archival | Weekly | Moves entries below the access threshold to cold storage |
| Memory summarization | Daily | Condenses older daily notes into higher-level summaries |
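The 0.85 threshold the dedup job uses is a Jaccard similarity over the two entries' token sets. Word-level tokenization is an assumption here (the job may shingle text differently), but the metric itself is standard:

```typescript
// Jaccard similarity = |A ∩ B| / |A ∪ B| over word sets.
// Entries scoring >= 0.85 are merge candidates per the dedup job above.
function jaccard(a: string, b: string): number {
  const A = new Set(a.toLowerCase().split(/\s+/));
  const B = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...A].filter((w) => B.has(w)).length;
  const union = new Set([...A, ...B]).size;
  return union === 0 ? 1 : inter / union;
}
```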

Configuring memory retrieval

By default, all five tiers are enabled for all agents with global workspace settings. You can override this per agent using Memory Profiles — controlling which tiers are active, how many results are injected, and the minimum relevance score threshold.
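A memory profile might be shaped like the following. Only `maxContextChunks` and `minRelevanceScore` appear in the docs above; `enabledTiers`, the field layout, and all values are illustrative assumptions:

```typescript
// Hypothetical profile shape — not Open Astra's actual schema.
interface MemoryProfile {
  enabledTiers: (1 | 2 | 3 | 4 | 5)[]; // which tiers to query
  maxContextChunks: number;            // cap on injected results
  minRelevanceScore: number;           // drop results below this score
}

// Example: a lightweight agent that skips graph and procedural memory.
const supportAgentProfile: MemoryProfile = {
  enabledTiers: [1, 2, 3],
  maxContextChunks: 12,
  minRelevanceScore: 0.55,
};
```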

Explore memory in depth

| Topic | What it covers |
|---|---|
| Workspace Memory | File-based context injected from ./workspace/ |
| Knowledge Base | Document ingestion, chunking, and retrieval |
| RAG Pipeline | How documents flow from upload to context injection |
| Graph Memory | Entities, typed edges, traversal, and confidence |
| Tiered Search | RRF fusion, Typesense BM25+vector, pgvector cosine |
| Summarization | Auto-summarization of daily notes and session history |
| Memory Profiles | Per-agent tier config, chunk limits, relevance thresholds |
| Entry Weight Decay | Stale entry scoring and the decay cron job |
| Jaccard Dedup | Near-duplicate detection and merging |
| Semantic Cache | Cost reduction via near-duplicate query caching |
| Cold Store | Archival of low-access entries to save retrieval cost |
| Cross-Workspace | Sharing memory across workspaces |