Context Assembly
Before every inference call, Open Astra assembles the full context that gets sent to the model. This is done by context/assembler.ts, which builds a ChatMessage[] array in a fixed, deterministic order. Understanding this order helps you write effective system prompts and manage token budgets.
Assembly order
Context is assembled in the following order, from first to last in the message array:
| Layer | Source | Purpose | Token budget |
|---|---|---|---|
| 1. SOUL.md | SOUL.md file in project root | Ethics and behavioral constraints for all agents | ~2K tokens (stable) |
| 2. Workspace files | ./workspace/*.md | Team context, standards, identity docs | Up to 150K total (20K per file) |
| 3. System prompt | Agent's systemPromptTemplate | Agent-specific instructions and persona | ~4K tokens typical |
| 4. Memory context | 5-tier memory search (RRF fusion) | Relevant facts, history, and learned workflows | ~8K tokens typical |
| 5. Conversation history | PostgreSQL session messages | Recent turns in this session | Remainder of context window |
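The five layers above can be sketched as a single assembly function. This is an illustrative sketch, not the actual context/assembler.ts: the `ChatMessage` and `AssemblyInput` shapes here are assumptions for the example.

```typescript
// Hypothetical types; the real assembler's interfaces differ in detail.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface AssemblyInput {
  soul: string;              // SOUL.md, cached at startup
  workspaceFiles: string[];  // ./workspace/*.md, priority files first
  systemPrompt: string;      // rendered systemPromptTemplate
  memoryContext: string;     // formatted top-K memory search results
  history: ChatMessage[];    // session messages from PostgreSQL
}

function assembleContext(input: AssemblyInput): ChatMessage[] {
  const system = [
    input.soul,              // 1. stable prefix
    ...input.workspaceFiles, // 2. shared team/project context
    input.systemPrompt,      // 3. agent persona
    input.memoryContext,     // 4. retrieved memories
  ].join("\n\n");
  // 5. conversation history fills the remaining context window
  return [{ role: "system", content: system }, ...input.history];
}
```

Because the order is deterministic, two calls with identical inputs produce byte-identical prefixes, which is what makes provider-side prompt caching effective.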
SOUL.md — stable prefix
SOUL.md defines the ethical constraints and behavioral guardrails that apply to all agents. It is loaded once at startup and cached in process memory.
Because SOUL.md is the first content in every prompt, it forms a stable prefix: every provider with prompt caching (Claude at 90%, OpenAI at 50–90%, Gemini at 90%, Grok at 75%) can serve it from cache on subsequent calls instead of re-processing it, saving significant cost.
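The load-once behavior can be approximated with a module-level memoized loader. This is a sketch under assumed names; the actual cache in Open Astra may invalidate or reload differently.

```typescript
import { readFileSync } from "node:fs";

// Load-once cache for SOUL.md (hypothetical sketch).
let cachedSoul: string | null = null;

function loadSoul(path = "SOUL.md"): string {
  if (cachedSoul === null) {
    cachedSoul = readFileSync(path, "utf8"); // hits disk only once
  }
  return cachedSoul; // later calls return the in-process copy
}
```

Note that a process restart is required to pick up edits to SOUL.md under this scheme, which is the trade-off a load-once cache makes for a byte-stable prompt prefix.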
Workspace files
Any .md file placed in the ./workspace/ directory is automatically injected into context after SOUL.md. This is the primary mechanism for giving all agents shared context about your team, project, or domain.
Priority files are injected first and always included:
- workspace/AGENTS.md — agent behavior guidelines and capabilities
- workspace/USER.md — user preferences and context
- workspace/IDENTITY.md — agent identity and persona overrides
All other .md files are included after the priority files, sorted alphabetically. Each file has a 20K token budget; the total workspace budget is 150K tokens. Files that exceed the limit are truncated with a warning appended.
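The ordering and truncation rules can be sketched as follows. The function names and the 4-characters-per-token estimate are assumptions for illustration, not the assembler's actual API.

```typescript
// Priority files first (in this fixed order), then the rest alphabetically.
const PRIORITY = ["AGENTS.md", "USER.md", "IDENTITY.md"];
const PER_FILE_TOKENS = 20_000;

function orderWorkspaceFiles(names: string[]): string[] {
  const priority = PRIORITY.filter((p) => names.includes(p));
  const rest = names.filter((n) => !PRIORITY.includes(n)).sort();
  return [...priority, ...rest];
}

// Crude token estimate (~4 chars/token); truncate over-budget files
// and append a warning, as described above.
function truncateToBudget(content: string): string {
  const estTokens = Math.ceil(content.length / 4);
  if (estTokens <= PER_FILE_TOKENS) return content;
  return (
    content.slice(0, PER_FILE_TOKENS * 4) +
    "\n\n[truncated: exceeded 20K token budget]"
  );
}
```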
System prompt template
The agent's systemPromptTemplate is rendered with Handlebars-style variable interpolation and inserted after the workspace files. Available variables:
systemPromptTemplate: |
  You are {{agent.displayName}}, a specialist in {{domain}}.
  Today is {{date}}.
  Workspace: {{workspace.name}}
  User: {{user.name}} ({{user.id}})
  {{#if memory}}
  Context from memory:
  {{memory}}
  {{/if}}

Memory context
Before inference, the assembler runs a search across all 5 memory tiers using the current user's message as the query. Results are fused using Reciprocal Rank Fusion (RRF) across Typesense BM25+vector and pgvector cosine similarity. The top-K results are formatted and inserted into context before the conversation history.
See Memory Search for details on the RRF algorithm and ranking.
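The fusion step can be sketched with the standard RRF formula, where each ranked list contributes 1 / (k + rank) per item. The constant k = 60 is the common default from the RRF literature; the assembler's exact constants are not stated here.

```typescript
// Reciprocal Rank Fusion over multiple ranked result lists
// (e.g. Typesense BM25+vector and pgvector cosine similarity).
function rrfFuse(rankings: string[][], k = 60, topK = 10): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // rank is 0-based here, so the contribution is 1 / (k + rank + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topK)
    .map(([id]) => id);
}
```

An item ranked moderately well by both retrievers (like "b" below) beats an item ranked first by only one, which is the property that makes RRF a robust way to combine lexical and vector search.

```typescript
rrfFuse([["a", "b", "c"], ["b", "c", "d"]]); // → ["b", "c", "a", "d"]
```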
Token budgets and compaction
When the total assembled context would exceed maxContextTokens for the configured model, the assembler applies these strategies in order:
- Trim memory context to the top results that fit
- Truncate workspace files that exceed their per-file budget
- Compact old conversation history via a summarization call (triggered at 85% threshold)
- Drop the oldest raw messages until the total fits
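The ordered-strategies design can be sketched as a loop that applies each strategy only while the context is still over budget. The names below are assumptions for illustration; only one strategy (dropping oldest messages) is shown concretely.

```typescript
// A strategy takes the context and returns a (hopefully smaller) context.
type Strategy = (
  ctx: string[],
  budget: number,
  count: (s: string[]) => number,
) => string[];

function compact(
  ctx: string[],
  budget: number,
  strategies: Strategy[],
  count: (s: string[]) => number = (s) => s.join("").length, // crude token proxy
): string[] {
  for (const strategy of strategies) {
    if (count(ctx) <= budget) break; // stop as soon as the context fits
    ctx = strategy(ctx, budget, count);
  }
  return ctx;
}

// Last-resort strategy: drop the oldest raw messages until the total fits,
// always keeping the system message at index 0.
const dropOldest: Strategy = (ctx, budget, count) => {
  while (ctx.length > 2 && count(ctx) > budget) {
    ctx = [ctx[0], ...ctx.slice(2)];
  }
  return ctx;
};
```

Ordering the strategies from least to most destructive means cheap trims (memory, workspace files) are exhausted before any conversation history is summarized or dropped.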