RAG Pipeline
The Retrieval-Augmented Generation (RAG) pipeline lets you upload documents and make them searchable by agents. Documents are chunked, embedded, and stored in pgvector, then retrieved using cosine similarity search at inference time.
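As a rough illustration of the retrieval math, cosine similarity ranks chunks like this. In practice pgvector computes the distance server-side (via its cosine-distance operator), so the sketch below is only to show the ranking; all names are illustrative:

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks against a query embedding and keep the top K.
function topK(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k: number
): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```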
Document upload
Documents can be uploaded via the REST API or placed in the ./workspace/ directory for automatic ingestion. Supported formats: .txt, .md, .pdf, .docx, .html.
```shell
# Upload a document via the REST API
curl -X POST http://localhost:3000/workspaces/:id/documents \
  -H "Authorization: Bearer ${JWT_TOKEN}" \
  -F "file=@./docs/architecture.pdf" \
  -F "title=System Architecture" \
  -F "tags=architecture,design"

# Response:
# { "documentId": "doc_abc123", "chunks": 47, "status": "processing" }
```

Chunking strategy
Documents are split into chunks using a recursive character splitter with the following defaults:
| Parameter | Default | Description |
|---|---|---|
| Chunk size | 1000 tokens | Target size for each chunk |
| Chunk overlap | 200 tokens | Overlap between adjacent chunks to preserve context |
| Separators | `\n\n`, `\n`, `. `, `, ` | Tried in order; split at paragraph breaks before sentence breaks |
Each chunk is embedded using the configured embedding model and stored in a pgvector column. Chunks also store their position in the original document for citation purposes.
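A minimal sketch of the recursive splitting logic described above, assuming character counts stand in for tokens (the real splitter counts tokens; function names and the hard-split fallback are illustrative):

```typescript
// Separators tried in order: paragraph break, line break, sentence, clause.
const SEPARATORS = ["\n\n", "\n", ". ", ", "];

function splitRecursive(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (text.length <= chunkSize) return [text];
  // Prefer the coarsest separator that actually splits the text.
  for (const sep of SEPARATORS) {
    const parts = text.split(sep);
    if (parts.length > 1) return mergeParts(parts, sep, chunkSize, overlap);
  }
  // No separator found: hard-split with a sliding window of overlap.
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Greedily merge parts back into chunks near the target size, carrying the
// tail of each emitted chunk forward as overlap. (For brevity, a single part
// larger than chunkSize is not re-split here.)
function mergeParts(parts: string[], sep: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const part of parts) {
    const candidate = current ? current + sep + part : part;
    if (candidate.length > chunkSize && current) {
      chunks.push(current);
      current = current.slice(-overlap) + sep + part;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

The overlap carried between adjacent chunks is what preserves context across chunk boundaries at retrieval time.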
Retrieval
At inference time, the query is embedded and a cosine similarity search is performed against all chunks in the workspace; the top-K most similar chunks are added to the model's context. Retrieval is integrated into the tiered memory search, so it runs alongside Typesense's BM25 and vector search, and the result lists are fused with Reciprocal Rank Fusion (RRF).
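The RRF fusion step can be sketched as follows. This is a minimal illustration; the constant `k = 60` is the conventional default from the RRF literature, not necessarily the value this project uses:

```typescript
type Ranked = { id: string };

// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) for every
// item it contains (rank is 1-based), and items are re-sorted by the sum.
function rrfFuse(lists: Ranked[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, i) => {
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

An item that appears high in both the BM25 list and the vector list outranks an item that appears in only one, which is the point of fusing the two retrievers.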
A `document_search` tool call looks like this:

```json
{
  "tool": "document_search",
  "params": {
    "query": "HNSW indexing performance benchmarks",
    "limit": 5,
    "minScore": 0.7
  }
}
```

Embedding providers
Embeddings are generated automatically. The provider is selected based on available API keys:

- If `GEMINI_API_KEY` is set: Gemini `text-embedding-004` (768 dimensions)
- Otherwise, if `OPENAI_API_KEY` is set: OpenAI `text-embedding-3-small` (1536 dimensions)
- Fallback: a local ONNX model (384 dimensions, CPU inference)
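The selection order above can be sketched as follows (the function, type, and local model names are illustrative, not this project's actual API):

```typescript
type EmbeddingProvider = { name: string; dimensions: number };

// Pick a provider in the documented priority order: Gemini, then OpenAI,
// then the local ONNX fallback that needs no API key.
function selectProvider(env: Record<string, string | undefined>): EmbeddingProvider {
  if (env.GEMINI_API_KEY) {
    return { name: "gemini/text-embedding-004", dimensions: 768 };
  }
  if (env.OPENAI_API_KEY) {
    return { name: "openai/text-embedding-3-small", dimensions: 1536 };
  }
  return { name: "local/onnx", dimensions: 384 };
}
```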
Because the providers produce vectors of different dimensions, changing providers requires re-embedding every document; POST /workspaces/:id/documents/reindex triggers this.

Document management
```shell
# List all documents
GET /workspaces/:id/documents

# Get document status
GET /workspaces/:id/documents/:docId

# Delete a document and all its chunks
DELETE /workspaces/:id/documents/:docId

# Re-index all documents (e.g. after changing the embedding model)
POST /workspaces/:id/documents/reindex
```