Embedding Cache
The embedding cache is a two-layer cache that eliminates redundant embedding generation calls. The same text always produces the same embedding, so caching by content hash avoids re-calling the embedding provider (Gemini API or local n-gram fallback) for previously seen inputs.
Two-layer architecture
┌─────────────────┐    miss    ┌──────────────────┐    miss
│ L1: In-Process  │ ─────────→ │ L2: PostgreSQL   │ ─────────→  Generate
│ LRU Map (2048)  │            │ embedding_cache  │             Embedding
└────────┬────────┘            └────────┬─────────┘
         │ hit                          │ hit
         ↓                              ↓
   Return cached                  Return cached
 (sub-microsecond)                (single query)

| Layer | Storage | Capacity | Latency |
|---|---|---|---|
| L1 | In-process Map (LRU) | 2,048 entries | Sub-microsecond |
| L2 | PostgreSQL embedding_cache | Unlimited | Single-query (~1–5ms) |
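The lookup path through both layers can be sketched as follows. This is a minimal sketch, not the actual implementation: `getEmbedding`, the `L2Store` interface, and the injected `generate` callback are assumptions standing in for the real wiring (a pg client against `embedding_cache`), and L1 eviction is omitted here.

```typescript
import { createHash } from 'crypto'

type Embedding = number[]

// Stand-in for the L2 PostgreSQL layer: an async key-value lookup keyed by
// SHA-256 hash. (Assumption: the real code queries embedding_cache via pg.)
interface L2Store {
  get(hash: string): Promise<Embedding | null>
  put(hash: string, preview: string, embedding: Embedding): Promise<void>
}

const l1 = new Map<string, Embedding>() // in-process cache (LRU eviction omitted)

function hashText(text: string): string {
  return createHash('sha256').update(text).digest('hex')
}

// Two-layer lookup: L1 hit → return; L2 hit → promote to L1 and return;
// full miss → generate, then write through to both layers.
async function getEmbedding(
  text: string,
  l2: L2Store,
  generate: (text: string) => Promise<Embedding>
): Promise<Embedding> {
  const key = hashText(text)

  const inProcess = l1.get(key)
  if (inProcess) return inProcess // L1 hit: sub-microsecond

  const stored = await l2.get(key)
  if (stored) {
    l1.set(key, stored) // promote to L1 for subsequent lookups
    return stored
  }

  const embedding = await generate(text) // full miss: call the provider
  l1.set(key, embedding) // L1 write is synchronous
  void l2.put(key, text.slice(0, 200), embedding) // L2 write is fire-and-forget
  return embedding
}
```

Note the write-through on a full miss: both layers are populated, so a process restart (which empties L1) still hits L2.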
Cache key
The cache key is the SHA-256 hex digest of the raw input text, so identical byte sequences always map to the same key. No normalization is applied: inputs that differ only in whitespace or encoding hash to different keys, so callers that want those to share a cache entry must normalize before lookup.
import { createHash } from 'crypto'
function hashText(text: string): string {
return createHash('sha256').update(text).digest('hex')
}

L1: In-process LRU
The L1 cache uses JavaScript's Map insertion-order guarantee to implement LRU eviction. On access, entries are deleted and re-inserted to move them to the "most recently used" position. When the map exceeds 2,048 entries, the oldest (first-inserted) entry is evicted.
// In-process LRU using Map (insertion-order iteration)
const LRU_SIZE = 2048 // max entries (not bytes)
const cache = new Map() // key: SHA-256 hash, value: embedding
function set(key, value) {
if (cache.has(key)) cache.delete(key) // move to end
cache.set(key, value)
if (cache.size > LRU_SIZE) {
const oldest = cache.keys().next().value
cache.delete(oldest) // evict oldest
}
}
function get(key) {
const value = cache.get(key)
if (value) {
cache.delete(key) // move to end (most recently used)
cache.set(key, value)
}
return value
}

L2: PostgreSQL
The L2 cache stores embeddings as pgvector vector columns keyed by SHA-256 hash. Writes use ON CONFLICT DO NOTHING for idempotency. A text_preview column stores the first 200 characters for human inspection.
-- Table: embedding_cache
CREATE TABLE embedding_cache (
text_hash TEXT PRIMARY KEY, -- SHA-256 hex
text_preview TEXT, -- first 200 chars (for inspection)
embedding vector NOT NULL -- pgvector type
);

Error handling
The cache is designed never to block or break the embedding pipeline: a cache failure degrades to a cache miss, not to a failed request. Failures are silently absorbed.
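That contract can be sketched as below. The function names come from the summary that follows, but the query plumbing (an injected `queryL2` function standing in for a pg client) is an assumption made for the sketch, not the actual implementation.

```typescript
type QueryFn = (sql: string, params: unknown[]) => Promise<{ rows: any[] }>

// Lookup: any failure (connection down, malformed row, ...) degrades to a miss.
async function getCachedEmbedding(
  hash: string,
  queryL2: QueryFn
): Promise<number[] | null> {
  try {
    const { rows } = await queryL2(
      'SELECT embedding FROM embedding_cache WHERE text_hash = $1',
      [hash]
    )
    return rows[0]?.embedding ?? null
  } catch {
    return null // never throws: a broken cache is just a cache miss
  }
}

// Write: fire-and-forget; ON CONFLICT DO NOTHING makes concurrent writes idempotent.
function setCachedEmbedding(
  hash: string,
  preview: string,
  embedding: number[],
  queryL2: QueryFn
): void {
  queryL2(
    'INSERT INTO embedding_cache (text_hash, text_preview, embedding) ' +
      'VALUES ($1, $2, $3) ON CONFLICT DO NOTHING',
    [hash, preview, embedding]
  ).catch((err) => console.warn('embedding_cache write failed:', err))
}
```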
// getCachedEmbedding: silently returns null on any error
// → Never throws, never blocks the caller
// setCachedEmbedding: logs warning on Postgres failure
// → L1 write is synchronous and always succeeds
// → L2 write is fire-and-forget (ON CONFLICT DO NOTHING)

Related
- Tiered Search — uses embeddings for vector similarity across Typesense and pgvector
- Semantic Cache — caches full responses (not embeddings) by query similarity
- Prompt Caching — provider-side prompt prefix caching (different layer)