Memory

Tiered Search

Before every inference call, Open Astra retrieves relevant memory by searching across all tiers simultaneously and fusing the results using Reciprocal Rank Fusion (RRF). This gives you the precision of keyword search, the recall of semantic search, and the relational power of graph traversal — all in a single query.

Search sources

The search runs against three independent sources in parallel:

| Source | Method | Covers |
| --- | --- | --- |
| Typesense | Hybrid BM25 + vector (multi-search) | Session messages, daily notes, knowledge base, workspace memory |
| pgvector | Cosine similarity (HNSW) | Session messages (m=8/ef=32), graph entities (m=24/ef=128), RAG chunks |
| Graph traversal | Multi-hop edge traversal | Entity relationships (Tier 4) |
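The parallel fan-out can be sketched as follows. This is illustrative only: the three source functions are stubs standing in for the real Typesense, pgvector, and graph clients, and the names are hypothetical, not Open Astra's actual API.

```typescript
type RankedList = string[]; // document ids, best match first

// Stubs standing in for the real search clients (hypothetical).
async function searchTypesense(query: string): Promise<RankedList> {
  return ["doc-a", "doc-b", "doc-c"];
}
async function searchPgvector(query: string): Promise<RankedList> {
  return ["doc-b", "doc-a", "doc-d"];
}
async function traverseGraph(query: string): Promise<RankedList> {
  return ["doc-b", "doc-e"];
}

// All three searches run concurrently, so total latency is bounded by the
// slowest source rather than the sum of all three.
async function searchAllTiers(query: string): Promise<RankedList[]> {
  return Promise.all([
    searchTypesense(query),
    searchPgvector(query),
    traverseGraph(query),
  ]);
}
```

Each source returns its own ranked list; the lists are then fused with RRF as described below.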

Reciprocal Rank Fusion

RRF combines ranked lists from multiple search sources into a single ranked list without needing to know the absolute scores from each source. The formula for each document's RRF score is:

```text
RRF(d) = Σ 1 / (k + rank(d, source))
where k = 60 (constant to prevent top-rank domination)
```

A document that appears in the top-10 of all three sources will consistently outrank a document that appears in only one source, even if it ranks lower in individual sources. This makes RRF robust to differences in scoring scale between Typesense and pgvector.
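A minimal implementation of the formula above might look like this (an illustrative sketch; Open Astra's internal fusion code may differ):

```typescript
// Sum 1 / (k + rank) for every ranked list a document appears in.
// Ranks are 1-based, matching the formula.
function rrfFuse(rankedLists: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      const rank = index + 1;
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Sort descending by fused score to produce the final ranked list.
function rrfRank(rankedLists: string[][], k = 60): string[] {
  return [...rrfFuse(rankedLists, k).entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

For example, `rrfRank([["a", "b"], ["b", "a"], ["b", "c"]])` ranks `"b"` first: it appears in all three lists, so its summed score beats `"a"`, which ranks higher in one list but is missing from the third.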

Typesense runs a BM25 keyword search and a vector search simultaneously, then combines them with a weighted average. The default vector weight is 0.7 (strongly semantic), leaving 0.3 for the keyword score. This can be tuned per collection if needed.
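Conceptually, the weighted average is the following (a sketch of the idea, not Typesense's internals; it assumes both scores are normalized to [0, 1]):

```typescript
// Blend a keyword (BM25) score and a vector score using a single weight.
// vectorWeight = 0.7 matches the default in the settings below.
function hybridScore(
  keywordScore: number,
  vectorScore: number,
  vectorWeight = 0.7,
): number {
  return vectorWeight * vectorScore + (1 - vectorWeight) * keywordScore;
}
```

A document with a perfect semantic match but a mediocre keyword match (e.g. `hybridScore(0.5, 1.0)`) still scores well, which is the intended bias toward semantic retrieval.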

Typesense indexes are kept in sync with PostgreSQL by the post-turn save routine — every new memory write is indexed in Typesense within 50ms.

The pgvector search uses IVFFlat or HNSW indexes depending on collection size and query pattern. Session messages use HNSW with m=8, ef=32 (lower quality, higher throughput — appropriate for short-term memory). Graph entities use HNSW with m=24, ef=128 (higher quality — appropriate for long-term semantic retrieval).

All vectors use the same embedding model to ensure cosine similarity comparisons are meaningful.

Result formatting

After RRF fusion, the top-K results are formatted into a structured block that is injected into the context assembler as the memory layer. Each result includes its tier, type, content excerpt, confidence, and a timestamp:

```text
[Memory: decision | 2026-01-15]
We chose pgvector over Pinecone because it eliminates a separate service dependency...
confidence: 0.95

[Memory: note | 2026-02-10]
Alex prefers concise TypeScript examples over verbose prose explanations.
confidence: 0.87
```
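A formatter producing the block above could be sketched like this. The `MemoryResult` shape and function name are illustrative assumptions, not Open Astra's actual types:

```typescript
// Illustrative result shape (hypothetical, not the real internal type).
interface MemoryResult {
  type: string;       // e.g. "decision", "note"
  date: string;       // ISO date of the memory
  excerpt: string;    // content excerpt
  confidence: number; // 0..1
}

// Render fused results into the structured block injected as the memory layer.
function formatMemoryBlock(results: MemoryResult[]): string {
  return results
    .map(
      (r) =>
        `[Memory: ${r.type} | ${r.date}]\n` +
        `${r.excerpt}\n` +
        `confidence: ${r.confidence.toFixed(2)}`,
    )
    .join("\n\n");
}
```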

Tuning search

```yaml
settings:
  memory:
    searchTopK: 10            # Number of results to retrieve per source
    contextBudgetTokens: 8192 # Max tokens for memory context in assembler
    rrf:
      k: 60                   # RRF constant
      vectorWeight: 0.7       # Weight for vector vs. keyword in Typesense
    minConfidence: 0.5        # Exclude results below this confidence
```