RAG Pipeline
The Retrieval-Augmented Generation (RAG) pipeline lets you upload documents and make them searchable by agents. Documents are chunked, embedded, and stored in pgvector, then retrieved using cosine similarity search at inference time.
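As a rough illustration of the retrieval math, cosine similarity ranks chunks like this. In practice pgvector computes the distance server-side (via its cosine-distance operator), so the sketch below is only to show the ranking; all names are illustrative:

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks against a query embedding and keep the top K.
function topK(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k: number
): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```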
Document upload
Documents can be uploaded via the REST API or placed in the ./workspace/ directory for automatic ingestion. Supported formats: .txt, .md, .pdf, .docx, .html.
```shell
# Upload a document via the REST API
curl -X POST http://localhost:3000/workspaces/:id/documents \
  -H "Authorization: Bearer ${JWT_TOKEN}" \
  -F "file=@./docs/architecture.pdf" \
  -F "title=System Architecture" \
  -F "tags=architecture,design"

# Response:
# { "documentId": "doc_abc123", "chunks": 47, "status": "processing" }
```

Chunking strategy
Documents are split into chunks using a recursive character splitter with the following defaults:
| Parameter | Default | Description |
|---|---|---|
| Chunk size | 1000 tokens | Target size for each chunk |
| Chunk overlap | 200 tokens | Overlap between adjacent chunks to preserve context |
| Separators | `\n\n`, `\n`, `. `, `, ` | Tried in order; split at paragraph breaks before sentence breaks |
Each chunk is embedded using the configured embedding model and stored in a pgvector column. Chunks also store their position in the original document for citation purposes.
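A minimal sketch of the recursive splitting logic described above, assuming character counts stand in for tokens (the real splitter counts tokens; function names and the hard-split fallback are illustrative):

```typescript
// Separators tried in order: paragraph break, line break, sentence, clause.
const SEPARATORS = ["\n\n", "\n", ". ", ", "];

function splitRecursive(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (text.length <= chunkSize) return [text];
  // Prefer the coarsest separator that actually splits the text.
  for (const sep of SEPARATORS) {
    const parts = text.split(sep);
    if (parts.length > 1) return mergeParts(parts, sep, chunkSize, overlap);
  }
  // No separator found: hard-split with a sliding window of overlap.
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Greedily merge parts back into chunks near the target size, carrying the
// tail of each emitted chunk forward as overlap. (For brevity, a single part
// larger than chunkSize is not re-split here.)
function mergeParts(parts: string[], sep: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const part of parts) {
    const candidate = current ? current + sep + part : part;
    if (candidate.length > chunkSize && current) {
      chunks.push(current);
      current = current.slice(-overlap) + sep + part;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

The overlap carried between adjacent chunks is what preserves context across chunk boundaries at retrieval time.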
Retrieval
At inference time, the query is embedded and a cosine similarity search is performed against all chunks in the workspace; the top-K most similar chunks are added to the model's context. Retrieval is integrated into the tiered memory search, so it runs alongside Typesense's BM25 and vector search, and the result lists are fused with Reciprocal Rank Fusion (RRF).
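The RRF fusion step can be sketched as follows. This is a minimal illustration; the constant `k = 60` is the conventional default from the RRF literature, not necessarily the value this project uses:

```typescript
type Ranked = { id: string };

// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) for every
// item it contains (rank is 1-based), and items are re-sorted by the sum.
function rrfFuse(lists: Ranked[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, i) => {
      scores.set(item.id, (scores.get(item.id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

An item that appears high in both the BM25 list and the vector list outranks an item that appears in only one, which is the point of fusing the two retrievers.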
A `document_search` tool call looks like this:

```json
{
  "tool": "document_search",
  "params": {
    "query": "HNSW indexing performance benchmarks",
    "limit": 5,
    "minScore": 0.7
  }
}
```

Embedding providers
Embeddings are generated automatically. The provider is selected based on available API keys:

- If `GEMINI_API_KEY` is set: Gemini `text-embedding-004` (768 dimensions)
- Otherwise, if `OPENAI_API_KEY` is set: OpenAI `text-embedding-3-small` (1536 dimensions)
- Fallback: a local ONNX model (384 dimensions, CPU inference)
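The selection order above can be sketched as follows (the function, type, and local model names are illustrative, not this project's actual API):

```typescript
type EmbeddingProvider = { name: string; dimensions: number };

// Pick a provider in the documented priority order: Gemini, then OpenAI,
// then the local ONNX fallback that needs no API key.
function selectProvider(env: Record<string, string | undefined>): EmbeddingProvider {
  if (env.GEMINI_API_KEY) {
    return { name: "gemini/text-embedding-004", dimensions: 768 };
  }
  if (env.OPENAI_API_KEY) {
    return { name: "openai/text-embedding-3-small", dimensions: 1536 };
  }
  return { name: "local/onnx", dimensions: 384 };
}
```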
Because the providers produce vectors of different dimensions, changing providers requires re-embedding every document; POST /workspaces/:id/documents/reindex triggers this.

Document management
```shell
# List all documents
GET /workspaces/:id/documents

# Get document status
GET /workspaces/:id/documents/:docId

# Delete a document and all its chunks
DELETE /workspaces/:id/documents/:docId

# Re-index all documents (e.g. after changing the embedding model)
POST /workspaces/:id/documents/reindex
```