Memory

Tiered Search

Before every inference call, Open Astra retrieves relevant memory by searching across all tiers simultaneously and fusing the results using Reciprocal Rank Fusion (RRF). This gives you the precision of keyword search, the recall of semantic search, and the relational power of graph traversal — all in a single query.

Search sources

The search runs against three independent sources in parallel:

| Source | Method | Covers |
| --- | --- | --- |
| Typesense | Hybrid BM25 + vector (multi-search) | Session messages, daily notes, knowledge base, workspace memory |
| pgvector | Cosine similarity (HNSW) | Session messages (m=8/ef=32), graph entities (m=24/ef=128), RAG chunks |
| Graph traversal | Multi-hop edge traversal | Entity relationships (Tier 4) |
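The parallel fan-out can be sketched as follows. This is illustrative only: the three source functions are stubs standing in for the real Typesense, pgvector, and graph clients, and the names are hypothetical, not Open Astra's actual API.

```typescript
type RankedList = string[]; // document ids, best match first

// Stubs standing in for the real search clients (hypothetical).
async function searchTypesense(query: string): Promise<RankedList> {
  return ["doc-a", "doc-b", "doc-c"];
}
async function searchPgvector(query: string): Promise<RankedList> {
  return ["doc-b", "doc-a", "doc-d"];
}
async function traverseGraph(query: string): Promise<RankedList> {
  return ["doc-b", "doc-e"];
}

// All three searches run concurrently, so total latency is bounded by the
// slowest source rather than the sum of all three.
async function searchAllTiers(query: string): Promise<RankedList[]> {
  return Promise.all([
    searchTypesense(query),
    searchPgvector(query),
    traverseGraph(query),
  ]);
}
```

Each source returns its own ranked list; the lists are then fused with RRF as described below.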

Reciprocal Rank Fusion

RRF combines ranked lists from multiple search sources into a single ranked list without needing to know the absolute scores from each source. The formula for each document's RRF score is:

```text
RRF(d) = Σ 1 / (k + rank(d, source))
where k = 60 (constant to prevent top-rank domination)
```

A document that appears in the top-10 of all three sources will consistently outrank a document that appears in only one source, even if it ranks lower in individual sources. This makes RRF robust to differences in scoring scale between Typesense and pgvector.
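A minimal implementation of the formula above might look like this (an illustrative sketch; Open Astra's internal fusion code may differ):

```typescript
// Sum 1 / (k + rank) for every ranked list a document appears in.
// Ranks are 1-based, matching the formula.
function rrfFuse(rankedLists: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      const rank = index + 1;
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Sort descending by fused score to produce the final ranked list.
function rrfRank(rankedLists: string[][], k = 60): string[] {
  return [...rrfFuse(rankedLists, k).entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

For example, `rrfRank([["a", "b"], ["b", "a"], ["b", "c"]])` ranks `"b"` first: it appears in all three lists, so its summed score beats `"a"`, which ranks higher in one list but is missing from the third.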

Typesense runs a BM25 keyword search and a vector search simultaneously, then combines them with a weighted average. The default vector weight is 0.7 (strongly semantic), leaving 0.3 for the keyword score. This can be tuned per collection if needed.
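Conceptually, the weighted average is the following (a sketch of the idea, not Typesense's internals; it assumes both scores are normalized to [0, 1]):

```typescript
// Blend a keyword (BM25) score and a vector score using a single weight.
// vectorWeight = 0.7 matches the default in the settings below.
function hybridScore(
  keywordScore: number,
  vectorScore: number,
  vectorWeight = 0.7,
): number {
  return vectorWeight * vectorScore + (1 - vectorWeight) * keywordScore;
}
```

A document with a perfect semantic match but a mediocre keyword match (e.g. `hybridScore(0.5, 1.0)`) still scores well, which is the intended bias toward semantic retrieval.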

Typesense indexes are kept in sync with PostgreSQL by the post-turn save routine — every new memory write is indexed in Typesense within 50ms.

The pgvector search uses IVFFlat or HNSW indexes depending on collection size and query pattern. Session messages use HNSW with m=8, ef=32 (lower quality, higher throughput — appropriate for short-term memory). Graph entities use HNSW with m=24, ef=128 (higher quality — appropriate for long-term semantic retrieval).

All vectors use the same embedding model to ensure cosine similarity comparisons are meaningful.

Result formatting

After RRF fusion, the top-K results are formatted into a structured block that is injected into the context assembler as the memory layer. Each result includes its tier, type, content excerpt, confidence, and a timestamp:

```text
[Memory: decision | 2026-01-15]
We chose pgvector over Pinecone because it eliminates a separate service dependency...
confidence: 0.95

[Memory: note | 2026-02-10]
Alex prefers concise TypeScript examples over verbose prose explanations.
confidence: 0.87
```
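A formatter producing the block above could be sketched like this. The `MemoryResult` shape and function name are illustrative assumptions, not Open Astra's actual types:

```typescript
// Illustrative result shape (hypothetical, not the real internal type).
interface MemoryResult {
  type: string;       // e.g. "decision", "note"
  date: string;       // ISO date of the memory
  excerpt: string;    // content excerpt
  confidence: number; // 0..1
}

// Render fused results into the structured block injected as the memory layer.
function formatMemoryBlock(results: MemoryResult[]): string {
  return results
    .map(
      (r) =>
        `[Memory: ${r.type} | ${r.date}]\n` +
        `${r.excerpt}\n` +
        `confidence: ${r.confidence.toFixed(2)}`,
    )
    .join("\n\n");
}
```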

Tuning search

```yaml
settings:
  memory:
    searchTopK: 10            # Number of results to retrieve per source
    contextBudgetTokens: 8192 # Max tokens for memory context in assembler
    rrf:
      k: 60                   # RRF constant
      vectorWeight: 0.7       # Weight for vector vs. keyword in Typesense
    minConfidence: 0.5        # Exclude results below this confidence
```