5-Tier Memory System
Open Astra's memory system organizes everything an agent knows into five tiers of increasing abstraction and persistence. Tiers 1–2 capture what happened in conversations. Tiers 3–5 capture what the agent has learned. Before every inference call, all five tiers are searched and the results are fused — so agents get richer context the longer they run.
The five tiers
| Tier | Name | Scope | Lifetime | Backend |
|---|---|---|---|---|
| 1 | Session messages | Session | Until compaction | pgvector HNSW (m=8, ef=32) |
| 2 | Daily notes | User + workspace | Permanent | Typesense hybrid + pgvector |
| 3 | User profile | User | Permanent, incremental updates | PostgreSQL JSON document |
| 4 | Knowledge graph | Workspace | Permanent with temporal decay | pgvector HNSW (m=24, ef=128) |
| 5 | Procedural memory | Workspace | Permanent, reinforced by use | PostgreSQL + prefix index |
Write path — after every turn
Memory is written automatically in the post-turn save phase. You do not need to call any API to populate memory — it happens as a side effect of normal agent conversations.
```
# Every agent turn — post-turn save phase
1. Auto-extract daily notes (categorized: decision, outcome, strategy, note, interaction)
2. Update user profile (incremental merge into the profile JSON document)
3. Upsert graph entities (extract entities + typed edges, increment confidence scores)
4. Store session message (embed → pgvector HNSW Tier 1 insert)
5. Emit agent.metrics event (cost, latency, tool calls)
6. Fire outbound webhooks (if configured)
```
Read path — before every inference call
Memory retrieval runs inside searchAllTiers(), called by assembleContext() before every inference call. All five tiers are queried in parallel, results are fused with RRF, and the top results are injected into the system prompt.
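Reciprocal Rank Fusion needs only each tier's ranked result IDs, not comparable scores, which is why rankings from BM25, cosine similarity, and prefix matching can be combined directly. A minimal sketch (the `k = 60` constant is the conventional default from the RRF literature, not a confirmed Open Astra setting):

```typescript
// Reciprocal Rank Fusion: an entry's fused score is the sum of 1 / (k + rank)
// over every tier ranking it appears in. Entries ranked by several tiers
// outscore entries ranked first by only one.
function rrfFuse(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      // ranks are 1-based: the top hit of a list contributes 1 / (k + 1)
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// "msg-3" appears in all three lists, so it wins despite never ranking first twice:
const fused = rrfFuse([
  ["note-7", "msg-3", "entity-1"], // Tier 2 ranking
  ["msg-3", "proc-2"],             // Tier 1 ranking
  ["msg-3", "entity-1"],           // Tier 4 ranking
]);
```

Because ranks rather than raw scores are fused, no per-tier score normalization is needed.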
```
# Before every inference call — context assembly phase
1. Query message: used as the search vector across all tiers
2. Tier 1 — session messages: pgvector cosine similarity (HNSW m=8, ef=32)
3. Tier 2 — daily notes: Typesense BM25 + vector hybrid search
4. Tier 3 — user profile: always injected in full (not searched)
5. Tier 4 — knowledge graph: pgvector cosine similarity (HNSW m=24, ef=128) + graph traversal
6. Tier 5 — procedural: prefix + keyword + semantic similarity
7. RRF fusion: Reciprocal Rank Fusion across all tier results
8. Apply profile caps: maxContextChunks, minRelevanceScore (if a memory profile is assigned)
9. Inject into context: formatted blocks inserted before the conversation history
```
Tier detail
Tier 1 — Session messages
Raw conversation turns — every message sent to and from an agent. Embedded with text-embedding-3-small (or Gemini if configured) and stored in pgvector with HNSW m=8. Subject to compaction: when context fills, older messages are summarized and the originals are replaced. See Compaction Forecast.
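Tier 1 ranking reduces to cosine similarity between the query embedding and each stored message embedding. In production this runs inside pgvector's HNSW index rather than in application code, but the underlying measure is just:

```typescript
// Cosine similarity: dot product of the two vectors divided by the product of
// their magnitudes. 1 = same direction, 0 = orthogonal (unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The HNSW parameters (m=8, ef=32 for this tier) trade recall for speed; they change which neighbors are visited, not how similarity is scored.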
Tier 2 — Daily notes
Structured observations extracted from conversations during the post-turn save. Each note has a category: decision, outcome, strategy, note, or interaction. Notes are searched via Typesense hybrid (BM25 + vector) and also summarized periodically. See Summarization.
Tier 3 — User profile
A single JSON document per user that accumulates structured knowledge: name, timezone, preferences, domain-specific context. Always injected in full — not searched. Updated incrementally as new facts are extracted.
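Incremental updates can be pictured as a recursive merge of newly extracted facts into the existing document. A sketch under our own assumptions — `deepMerge` is an illustrative helper, and the product's actual conflict rules are not specified here:

```typescript
type Json = { [key: string]: unknown };

// Merge newly extracted facts into the existing profile without clobbering
// unrelated fields: nested objects are merged recursively, everything else
// (scalars, arrays) is overwritten by the newest extraction.
function deepMerge(base: Json, update: Json): Json {
  const out: Json = { ...base };
  for (const [key, value] of Object.entries(update)) {
    const existing = out[key];
    if (
      value && typeof value === "object" && !Array.isArray(value) &&
      existing && typeof existing === "object" && !Array.isArray(existing)
    ) {
      out[key] = deepMerge(existing as Json, value as Json); // recurse into nested objects
    } else {
      out[key] = value; // newest fact wins for scalars and arrays
    }
  }
  return out;
}

// New facts extend the profile; "name" and the existing stack survive untouched:
const merged = deepMerge(
  { name: "Alex", domains: { engineering: { stack: ["Node"] } } },
  { timezone: "America/New_York", domains: { engineering: { style: "concise" } } },
);
```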
```json
{
  "name": "Alex",
  "timezone": "America/New_York",
  "preferredLanguage": "TypeScript",
  "domains": {
    "engineering": {
      "stack": ["Node", "PostgreSQL", "React"],
      "style": "prefers concise explanations with code examples"
    }
  }
}
```
Tier 4 — Knowledge graph
Entities and typed edges, embedded at m=24, ef=128 (higher quality than session messages because this data is queried more broadly across the workspace). Supports multi-hop traversal and graph hints injection. Edges decay in confidence over time. See Graph Memory.
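Multi-hop traversal starts from the entities matched by vector search and follows typed edges outward, skipping edges whose decayed confidence has fallen too low. A sketch with illustrative shapes — the edge structure, hop limit, and 0.3 confidence floor are our assumptions, not documented values:

```typescript
interface Edge { from: string; to: string; type: string; confidence: number }

// Breadth-first traversal: expand the frontier one hop at a time, up to
// maxHops, ignoring edges whose confidence has decayed below minConfidence.
function traverse(seeds: string[], edges: Edge[], maxHops: number, minConfidence = 0.3): Set<string> {
  const visited = new Set<string>(seeds);
  let frontier = seeds;
  for (let hop = 0; hop < maxHops; hop++) {
    const next: string[] = [];
    for (const edge of edges) {
      if (edge.confidence < minConfidence) continue; // decayed edge: pruned
      if (frontier.includes(edge.from) && !visited.has(edge.to)) {
        visited.add(edge.to);
        next.push(edge.to);
      }
    }
    frontier = next;
  }
  return visited;
}

// Two hops reach project-x via alex; the decayed old-vendor edge is pruned:
const reachable = traverse(["acme-corp"], [
  { from: "acme-corp", to: "alex", type: "employs", confidence: 0.9 },
  { from: "alex", to: "project-x", type: "owns", confidence: 0.8 },
  { from: "project-x", to: "old-vendor", type: "uses", confidence: 0.1 },
], 2);
```

Confidence decay thus acts as automatic forgetting: edges that stop being re-extracted eventually drop out of traversal without being deleted.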
Tier 5 — Procedural memory
Learned workflows stored as trigger-action pairs. Matched by prefix → keyword → semantic similarity for fast retrieval of common patterns. Reinforced each time they are successfully applied. See Knowledge Base.
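The matching cascade tries the cheapest matcher first and only falls back to embeddings when cheaper checks miss. A sketch under our own assumptions — the 0.75 threshold and the injected `semanticScore` function are illustrative, not the shipped implementation:

```typescript
interface Procedure { trigger: string; action: string }

// Cascade: exact prefix → keyword overlap → semantic similarity fallback.
function matchProcedure(
  query: string,
  procedures: Procedure[],
  semanticScore: (a: string, b: string) => number,
): Procedure | undefined {
  const q = query.toLowerCase();
  // 1. Prefix: the query starts with the trigger verbatim (cheapest check)
  const byPrefix = procedures.find(p => q.startsWith(p.trigger.toLowerCase()));
  if (byPrefix) return byPrefix;
  // 2. Keyword: every trigger word appears somewhere in the query
  const words = new Set(q.split(/\s+/));
  const byKeyword = procedures.find(p =>
    p.trigger.toLowerCase().split(/\s+/).every(w => words.has(w)));
  if (byKeyword) return byKeyword;
  // 3. Semantic: embedding similarity above a threshold (most expensive)
  const best = procedures
    .map(p => ({ p, score: semanticScore(q, p.trigger) }))
    .sort((a, b) => b.score - a.score)[0];
  return best && best.score >= 0.75 ? best.p : undefined;
}
```

Ordering the cascade this way means the common case (a repeated, near-verbatim trigger) never pays for an embedding call.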
Automated maintenance
Eleven cron jobs run on schedule to keep memory accurate and lean. Key memory maintenance jobs:
| Job | Schedule | What it does |
|---|---|---|
| Entry weight decay | Daily | Reduces relevance scores of stale, unused entries over time |
| Jaccard dedup | Daily | Merges near-duplicate memory entries (Jaccard ≥ 0.85 threshold) |
| Entity confidence | Daily | Increments graph entity confidence on repeated extraction, decrements on absence |
| Cold store archival | Weekly | Moves entries below the access threshold to cold storage |
| Memory summarization | Daily | Condenses older daily notes into higher-level summaries |
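The dedup job's core comparison is Jaccard similarity over the two entries' token sets, with pairs at or above the 0.85 threshold from the table merged. A sketch — the naive whitespace tokenizer is our assumption; the job's actual tokenization is unspecified:

```typescript
// Jaccard similarity: |intersection| / |union| of the two token sets.
// 1 means identical token sets, 0 means no tokens in common.
function jaccard(a: string, b: string): number {
  const setA = new Set(a.toLowerCase().split(/\s+/));
  const setB = new Set(b.toLowerCase().split(/\s+/));
  let intersection = 0;
  for (const token of setA) if (setB.has(token)) intersection++;
  const union = setA.size + setB.size - intersection;
  return union === 0 ? 1 : intersection / union;
}

// The 0.85 threshold from the maintenance table above:
const isDuplicate = (a: string, b: string) => jaccard(a, b) >= 0.85;
```

Token-set Jaccard is order-insensitive, so two notes phrasing the same fact with reordered words still merge, while notes sharing only a few words do not.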
Configuring memory retrieval
By default, all five tiers are enabled for all agents with global workspace settings. You can override this per agent using Memory Profiles — controlling which tiers are active, how many results are injected, and the minimum relevance score threshold.
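A hedged sketch of what a per-agent profile might look like. The field names `maxContextChunks` and `minRelevanceScore` appear on this page; `enabledTiers` and the literal values are our illustrative assumptions, not documented defaults:

```typescript
interface MemoryProfile {
  enabledTiers: (1 | 2 | 3 | 4 | 5)[]; // which tiers are searched for this agent
  maxContextChunks: number;            // cap on injected results after RRF fusion
  minRelevanceScore: number;           // drop fused results scoring below this
}

// A lightweight agent that skips graph and procedural memory entirely:
const supportAgentProfile: MemoryProfile = {
  enabledTiers: [1, 2, 3],
  maxContextChunks: 12,
  minRelevanceScore: 0.4,
};
```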
Explore memory in depth
| Topic | What it covers |
|---|---|
| Workspace Memory | File-based context injected from ./workspace/ |
| Knowledge Base | Document ingestion, chunking, and retrieval |
| RAG Pipeline | How documents flow from upload to context injection |
| Graph Memory | Entities, typed edges, traversal, and confidence |
| Tiered Search | RRF fusion, Typesense BM25+vector, pgvector cosine |
| Summarization | Auto-summarization of daily notes and session history |
| Memory Profiles | Per-agent tier config, chunk limits, relevance thresholds |
| Entry Weight Decay | Stale entry scoring and the decay cron job |
| Jaccard Dedup | Near-duplicate detection and merging |
| Semantic Cache | Cost reduction via near-duplicate query caching |
| Cold Store | Archival of low-access entries to save retrieval cost |
| Cross-Workspace | Sharing memory across workspaces |