Tiered Search
Before every inference call, Open Astra retrieves relevant memory by searching across all tiers simultaneously and fusing the results using Reciprocal Rank Fusion (RRF). This gives you the precision of keyword search, the recall of semantic search, and the relational power of graph traversal — all in a single query.
Search sources
The search runs against three independent sources in parallel:
| Source | Method | Covers |
|---|---|---|
| Typesense | Hybrid BM25 + vector (multi-search) | Session messages, daily notes, knowledge base, workspace memory |
| pgvector | Cosine similarity (HNSW) | Session messages (m=8/ef=32), graph entities (m=24/ef=128), RAG chunks |
| Graph traversal | Multi-hop edge traversal | Entity relationships (Tier 4) |
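The fan-out over these three sources can be sketched as below. The source functions (`searchTypesense`, `searchPgvector`, `traverseGraph`) are hypothetical stand-ins for the real clients, stubbed here so the shape of the parallel query is clear:

```typescript
// Minimal sketch of the parallel fan-out across the three sources.
// The source functions are hypothetical stubs, not the real Open Astra
// clients; each returns a ranked list of document IDs, best match first.

type RankedList = string[];

async function searchTypesense(query: string): Promise<RankedList> {
  return ["msg-42", "note-7"]; // stub
}
async function searchPgvector(query: string): Promise<RankedList> {
  return ["note-7", "chunk-3"]; // stub
}
async function traverseGraph(query: string): Promise<RankedList> {
  return ["entity-9", "note-7"]; // stub
}

// All three sources run concurrently; fusion happens only after all resolve.
async function searchAllSources(query: string): Promise<RankedList[]> {
  return Promise.all([
    searchTypesense(query),
    searchPgvector(query),
    traverseGraph(query),
  ]);
}
```

Because the sources are independent, total latency is bounded by the slowest source rather than the sum of all three.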
Reciprocal Rank Fusion
RRF combines ranked lists from multiple search sources into a single ranked list without needing to know the absolute scores from each source. The formula for each document's RRF score is:
RRF(d) = Σ 1 / (k + rank(d, source))
where k = 60 (a constant that keeps top-ranked documents from dominating the fused score).

A document that appears in the top 10 of all three sources will consistently outrank a document that appears in only one source, even if it ranks lower in each individual source. This makes RRF robust to differences in scoring scale between Typesense and pgvector.
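The formula above translates to a few lines of code. This is a generic sketch of RRF over per-source ranked lists, not the actual Open Astra implementation:

```typescript
// Reciprocal Rank Fusion: each document accumulates 1 / (k + rank) from
// every source list it appears in, then documents are sorted by total score.

function rrfFuse(rankedLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based in the formula
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort descending by accumulated RRF score.
  return [...scores.keys()].sort((a, b) => scores.get(b)! - scores.get(a)!);
}
```

Note that only ranks matter: a document in position 2 of all three lists beats a document in position 1 of a single list, exactly the robustness property described above.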
Typesense hybrid search
Typesense runs a BM25 keyword search and a vector search simultaneously, then combines them with a weighted average. The default vector weight is 0.7 (strongly semantic) with 0.3 for keyword match. This can be tuned per collection if needed.
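The weighting can be illustrated as follows. Typesense performs this blend internally, so the function and score normalization here are illustrative, not part of any real API:

```typescript
// Illustrative weighted average of normalized keyword and vector scores.
// Typesense computes this internally; this sketch only shows the arithmetic.

function hybridScore(
  keywordScore: number, // BM25 score, assumed normalized to [0, 1]
  vectorScore: number,  // cosine similarity, assumed normalized to [0, 1]
  vectorWeight = 0.7,   // default: 0.7 semantic, 0.3 keyword
): number {
  return vectorWeight * vectorScore + (1 - vectorWeight) * keywordScore;
}
```

With the default weight, a strong semantic match with no keyword overlap still scores 0.7, while an exact keyword match with no semantic similarity caps out at 0.3.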
Typesense indexes are kept in sync with PostgreSQL by the post-turn save routine — every new memory write is indexed in Typesense within 50ms.
pgvector cosine similarity
The pgvector search uses IVFFlat or HNSW indexes depending on collection size and query pattern. Session messages use HNSW with m=8, ef=32 (lower quality, higher throughput — appropriate for short-term memory). Graph entities use HNSW with m=24, ef=128 (higher quality — appropriate for long-term semantic retrieval).
All vectors use the same embedding model to ensure cosine similarity comparisons are meaningful.
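In pgvector terms, those parameters correspond to index DDL and a query-time setting along these lines. The table and column names are placeholders, and whether `ef=32` maps to `ef_construction`, `hnsw.ef_search`, or both is an assumption here:

```sql
-- Illustrative DDL; table/column names are placeholders.
-- HNSW index for session messages (m = 8: smaller graph, faster builds).
CREATE INDEX ON session_messages
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 8, ef_construction = 32);

-- At query time, ef_search bounds the candidate list per lookup.
SET hnsw.ef_search = 32;
SELECT id, content
FROM session_messages
ORDER BY embedding <=> $1  -- cosine distance operator
LIMIT 10;
```

The graph-entity index would use the same shape with `m = 24` and a larger `ef`, trading build time and memory for recall.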
Result formatting
After RRF fusion, the top-K results are formatted into a structured block that is injected into the context assembler as the memory layer. Each result includes its tier, type, content excerpt, confidence, and a timestamp:
```
[Memory: decision | 2026-01-15]
We chose pgvector over Pinecone because it eliminates a separate service dependency...
confidence: 0.95

[Memory: note | 2026-02-10]
Alex prefers concise TypeScript examples over verbose prose explanations.
confidence: 0.87
```
Tuning search
```yaml
settings:
  memory:
    searchTopK: 10            # Number of results to retrieve per source
    contextBudgetTokens: 8192 # Max tokens for memory context in assembler
    rrf:
      k: 60                   # RRF constant
    vectorWeight: 0.7         # Weight for vector vs. keyword in Typesense
    minConfidence: 0.5        # Exclude results below this confidence
```
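Under those settings, the post-fusion filtering step might look like the sketch below. The result shape and function name are assumptions for illustration, not the actual Open Astra API:

```typescript
// Hypothetical post-fusion filtering: drop results below minConfidence,
// then cap the list at topK, mirroring the config keys above.

interface MemoryResult {
  content: string;
  confidence: number;
}

function filterResults(
  results: MemoryResult[], // assumed already sorted by RRF score
  minConfidence = 0.5,
  topK = 10,
): MemoryResult[] {
  return results
    .filter((r) => r.confidence >= minConfidence)
    .slice(0, topK);
}
```

The confidence filter runs before the top-K cut, so low-confidence results never crowd out higher-quality ones within the token budget.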