Memory overview

5-Tier Memory System

Open Astra's memory system organizes everything an agent knows into five tiers of increasing abstraction and persistence. Tiers 1–2 capture what happened in conversations. Tiers 3–5 capture what the agent has learned. Before every inference call, all five tiers are searched and the results are fused — so agents get richer context the longer they run.

The five tiers

| Tier | Name | Scope | Lifetime | Backend |
|---|---|---|---|---|
| 1 | Session messages | Session | Until compaction | pgvector HNSW (m=8, ef=32) |
| 2 | Daily notes | User + workspace | Permanent | Typesense hybrid + pgvector |
| 3 | User profile | User | Permanent, incremental updates | PostgreSQL JSON document |
| 4 | Knowledge graph | Workspace | Permanent with temporal decay | pgvector HNSW (m=24, ef=128) |
| 5 | Procedural memory | Workspace | Permanent, reinforced by use | PostgreSQL + prefix index |

Write path — after every turn

Memory is written automatically in the post-turn save phase. You do not need to call any API to populate memory — it happens as a side effect of normal agent conversations.

```text
# Every agent turn — post-turn save phase
1. Auto-extract daily notes   (categorized: decision, outcome, strategy, note, interaction)
2. Update user profile        (incremental merge into the profile JSON document)
3. Upsert graph entities      (extract entities + typed edges, increment confidence scores)
4. Store session message      (embed → pgvector HNSW Tier 1 insert)
5. Emit agent.metrics event   (cost, latency, tool calls)
6. Fire outbound webhooks     (if configured)
```
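A note extracted in step 1 might be modeled as follows. Only the five category values come from the docs above; the record shape and field names are illustrative assumptions:

```typescript
// The five categories are documented; DailyNote's shape is an assumption.
type NoteCategory = "decision" | "outcome" | "strategy" | "note" | "interaction";

interface DailyNote {
  category: NoteCategory;
  text: string;
  createdAt: string; // ISO timestamp
}

const example: DailyNote = {
  category: "decision",
  text: "User chose PostgreSQL over MySQL for the new service.",
  createdAt: new Date().toISOString(),
};
```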

Read path — before every inference call

Memory retrieval runs inside searchAllTiers(), called by assembleContext() before every inference call. All five tiers are queried in parallel, results are fused with RRF, and the top results are injected into the system prompt.

```text
# Before every inference call — context assembly phase
1. Query message              used as the search vector for all tiers
2. Tier 1 — session messages  pgvector cosine similarity (HNSW m=8, ef=32)
3. Tier 2 — daily notes       Typesense BM25 + vector hybrid search
4. Tier 3 — user profile      always injected in full (not searched)
5. Tier 4 — knowledge graph   pgvector cosine similarity (HNSW m=24, ef=128) + graph traversal
6. Tier 5 — procedural        prefix + keyword + semantic similarity
7. RRF fusion                 Reciprocal Rank Fusion across all tier results
8. Apply profile caps         maxContextChunks, minRelevanceScore (if memory profile assigned)
9. Inject into context        formatted blocks inserted before the conversation history
```
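The RRF step in the sequence above can be sketched as follows. This is a minimal sketch, not Open Astra's implementation: the fusion constant k = 60 (the value from the original RRF paper) and the types are assumptions:

```typescript
// Reciprocal Rank Fusion: each result earns 1 / (k + rank) from every tier's
// ranked list it appears in, and scores for the same entry are summed.
type Ranked = { id: string };

function rrfFuse(tierResults: Ranked[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const results of tierResults) {
    results.forEach((r, rank) => {
      // rank is 0-based, so the top result contributes 1 / (k + 1).
      scores.set(r.id, (scores.get(r.id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because contributions are summed, an entry surfaced by several tiers outranks an entry that tops only one list, which is why fusion rewards cross-tier agreement without needing the tiers' raw scores to be comparable.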

Tier detail

Tier 1 — Session messages

Raw conversation turns — every message sent to and from an agent. Embedded with text-embedding-3-small (or Gemini if configured) and stored in pgvector with HNSW m=8. Subject to compaction: when the context window fills, older messages are summarized and the originals replaced. See Compaction Forecast.
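Compaction can be sketched as: once a running token estimate exceeds a budget, the oldest messages are replaced by a single summary entry. The token heuristic, budget, and `summarize` callback below are illustrative assumptions, not Open Astra's actual compaction logic:

```typescript
type Msg = { role: string; content: string };

// Hypothetical compaction sketch: drop oldest messages into a summary once
// the estimated token count exceeds the budget.
function compact(messages: Msg[], tokenBudget: number, summarize: (ms: Msg[]) => string): Msg[] {
  const estTokens = (m: Msg) => Math.ceil(m.content.length / 4); // rough ~4 chars/token heuristic
  let total = messages.reduce((n, m) => n + estTokens(m), 0);
  const keep = [...messages];
  const old: Msg[] = [];
  while (total > tokenBudget && keep.length > 1) {
    const m = keep.shift()!; // oldest first
    old.push(m);
    total -= estTokens(m);
  }
  // Replace the removed originals with one summary message, as described above.
  return old.length ? [{ role: "system", content: summarize(old) }, ...keep] : keep;
}
```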

Tier 2 — Daily notes

Structured observations extracted from conversations during the post-turn save. Each note has a category: decision, outcome, strategy, note, or interaction. Notes are searched via Typesense hybrid (BM25 + vector) and also summarized periodically. See Summarization.

Tier 3 — User profile

A single JSON document per user that accumulates structured knowledge: name, timezone, preferences, domain-specific context. Always injected in full — not searched. Updated incrementally as new facts are extracted.

```json
{
  "name": "Alex",
  "timezone": "America/New_York",
  "preferredLanguage": "TypeScript",
  "domains": {
    "engineering": {
      "stack": ["Node", "PostgreSQL", "React"],
      "style": "prefers concise explanations with code examples"
    }
  }
}
```

Tier 4 — Knowledge graph

Entities and typed edges, embedded at m=24, ef=128 (higher quality than session messages because this data is queried more broadly across the workspace). Supports multi-hop traversal and graph hints injection. Edges decay in confidence over time. See Graph Memory.
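Temporal decay of edge confidence can be sketched as exponential decay with a half-life. The 30-day half-life below is an illustrative assumption; the actual schedule is driven by the maintenance jobs described later:

```typescript
// Hypothetical exponential decay of an edge's confidence score.
// halfLifeDays is an assumed parameter, not a documented default.
function decayedConfidence(confidence: number, daysSinceLastSeen: number, halfLifeDays = 30): number {
  return confidence * Math.pow(0.5, daysSinceLastSeen / halfLifeDays);
}
```

Under this model an edge last confirmed 30 days ago retains half its confidence, so stale relationships fade from retrieval instead of being deleted outright.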

Tier 5 — Procedural memory

Learned workflows stored as trigger-action pairs. Matched by prefix → keyword → semantic similarity for fast retrieval of common patterns. Reinforced each time they are successfully applied. See Knowledge Base.
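The prefix → keyword → semantic cascade can be sketched like this. The matcher, its ordering of checks from cheapest to most expensive, and the 0.8 semantic threshold are illustrative assumptions; the real matcher is internal to Open Astra:

```typescript
type Procedure = { trigger: string; action: string };

// Cascade: try the cheap exact-prefix check first, then keyword overlap,
// and only fall back to (injected) semantic similarity last.
function matchProcedure(
  query: string,
  procs: Procedure[],
  semantic: (a: string, b: string) => number
): Procedure | undefined {
  const q = query.toLowerCase();
  // 1. Prefix match.
  let hit = procs.find((p) => q.startsWith(p.trigger.toLowerCase()));
  if (hit) return hit;
  // 2. Keyword match: any trigger word appears in the query.
  hit = procs.find((p) => p.trigger.toLowerCase().split(/\s+/).some((w) => q.includes(w)));
  if (hit) return hit;
  // 3. Semantic fallback above an assumed 0.8 similarity threshold.
  return procs.filter((p) => semantic(q, p.trigger) >= 0.8)[0];
}
```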

Automated maintenance

Eleven cron jobs run on schedule to keep memory accurate and lean. Key memory maintenance jobs:

| Job | Schedule | What it does |
|---|---|---|
| Entry weight decay | Daily | Reduces relevance scores of stale, unused entries over time |
| Jaccard dedup | Daily | Merges near-duplicate memory entries (Jaccard ≥ 0.85 threshold) |
| Entity confidence | Daily | Increments graph entity confidence on repeated extraction, decrements on absence |
| Cold store archival | Weekly | Moves entries below the access threshold to cold storage |
| Memory summarization | Daily | Condenses older daily notes into higher-level summaries |
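The 0.85 threshold the dedup job uses is a Jaccard similarity over the two entries' token sets. Word-level tokenization is an assumption here (the job may shingle text differently), but the metric itself is standard:

```typescript
// Jaccard similarity = |A ∩ B| / |A ∪ B| over word sets.
// Entries scoring >= 0.85 are merge candidates per the dedup job above.
function jaccard(a: string, b: string): number {
  const A = new Set(a.toLowerCase().split(/\s+/));
  const B = new Set(b.toLowerCase().split(/\s+/));
  const inter = [...A].filter((w) => B.has(w)).length;
  const union = new Set([...A, ...B]).size;
  return union === 0 ? 1 : inter / union;
}
```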

Configuring memory retrieval

By default, all five tiers are enabled for all agents with global workspace settings. You can override this per agent using Memory Profiles — controlling which tiers are active, how many results are injected, and the minimum relevance score threshold.
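A memory profile might be shaped like the following. Only `maxContextChunks` and `minRelevanceScore` appear in the docs above; `enabledTiers`, the field layout, and all values are illustrative assumptions:

```typescript
// Hypothetical profile shape — not Open Astra's actual schema.
interface MemoryProfile {
  enabledTiers: (1 | 2 | 3 | 4 | 5)[]; // which tiers to query
  maxContextChunks: number;            // cap on injected results
  minRelevanceScore: number;           // drop results below this score
}

// Example: a lightweight agent that skips graph and procedural memory.
const supportAgentProfile: MemoryProfile = {
  enabledTiers: [1, 2, 3],
  maxContextChunks: 12,
  minRelevanceScore: 0.55,
};
```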

Explore memory in depth

| Topic | What it covers |
|---|---|
| Workspace Memory | File-based context injected from ./workspace/ |
| Knowledge Base | Document ingestion, chunking, and retrieval |
| RAG Pipeline | How documents flow from upload to context injection |
| Graph Memory | Entities, typed edges, traversal, and confidence |
| Tiered Search | RRF fusion, Typesense BM25+vector, pgvector cosine |
| Summarization | Auto-summarization of daily notes and session history |
| Memory Profiles | Per-agent tier config, chunk limits, relevance thresholds |
| Entry Weight Decay | Stale entry scoring and the decay cron job |
| Jaccard Dedup | Near-duplicate detection and merging |
| Semantic Cache | Cost reduction via near-duplicate query caching |
| Cold Store | Archival of low-access entries to save retrieval cost |
| Cross-Workspace | Sharing memory across workspaces |