Context Assembly
Before every inference call, Open Astra assembles the full context that gets sent to the model. This is done by context/assembler.ts, which builds a ChatMessage[] array in a fixed, deterministic order. Understanding this order helps you write effective system prompts and manage token budgets.
Assembly order
Context is assembled in the following order, from first to last in the message array:
| Layer | Source | Purpose | Token budget |
|---|---|---|---|
| 1. SOUL.md | SOUL.md file in project root | Ethics and behavioral constraints for all agents | ~2K tokens (stable) |
| 2. Workspace files | ./workspace/*.md | Team context, standards, identity docs | Up to 150K total (20K per file) |
| 3. System prompt | Agent's systemPromptTemplate | Agent-specific instructions and persona | ~4K tokens typical |
| 4. Memory context | 5-tier memory search (RRF fusion) | Relevant facts, history, and learned workflows | ~8K tokens typical |
| 5. Conversation history | PostgreSQL session messages | Recent turns in this session | Remainder of context window |
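The five layers above can be sketched as a single assembly function. This is an illustrative sketch, not the actual context/assembler.ts: the `ChatMessage` and `AssemblyInput` shapes here are assumptions for the example.

```typescript
// Hypothetical types; the real assembler's interfaces differ in detail.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface AssemblyInput {
  soul: string;              // SOUL.md, cached at startup
  workspaceFiles: string[];  // ./workspace/*.md, priority files first
  systemPrompt: string;      // rendered systemPromptTemplate
  memoryContext: string;     // formatted top-K memory search results
  history: ChatMessage[];    // session messages from PostgreSQL
}

function assembleContext(input: AssemblyInput): ChatMessage[] {
  const system = [
    input.soul,              // 1. stable prefix
    ...input.workspaceFiles, // 2. shared team/project context
    input.systemPrompt,      // 3. agent persona
    input.memoryContext,     // 4. retrieved memories
  ].join("\n\n");
  // 5. conversation history fills the remaining context window
  return [{ role: "system", content: system }, ...input.history];
}
```

Because the order is deterministic, two calls with identical inputs produce byte-identical prefixes, which is what makes provider-side prompt caching effective.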
SOUL.md — stable prefix
SOUL.md defines the ethical constraints and behavioral guardrails that apply to all agents. It is loaded once at startup and cached in process memory.
Because SOUL.md is the first content in every prompt, it forms a stable prefix: every provider with prompt caching (Claude at 90%, OpenAI at 50–90%, Gemini at 90%, Grok at 75%) can serve it from cache on subsequent calls instead of re-processing it, saving significant cost.
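The load-once behavior can be approximated with a module-level memoized loader. This is a sketch under assumed names; the actual cache in Open Astra may invalidate or reload differently.

```typescript
import { readFileSync } from "node:fs";

// Load-once cache for SOUL.md (hypothetical sketch).
let cachedSoul: string | null = null;

function loadSoul(path = "SOUL.md"): string {
  if (cachedSoul === null) {
    cachedSoul = readFileSync(path, "utf8"); // hits disk only once
  }
  return cachedSoul; // later calls return the in-process copy
}
```

Note that a process restart is required to pick up edits to SOUL.md under this scheme, which is the trade-off a load-once cache makes for a byte-stable prompt prefix.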
Workspace files
Any .md file placed in the ./workspace/ directory is automatically injected into context after SOUL.md. This is the primary mechanism for giving all agents shared context about your team, project, or domain.
Priority files are injected first and always included:
- workspace/AGENTS.md — agent behavior guidelines and capabilities
- workspace/USER.md — user preferences and context
- workspace/IDENTITY.md — agent identity and persona overrides
All other .md files are included after the priority files, sorted alphabetically. Each file has a 20K token budget; the total workspace budget is 150K tokens. Files that exceed the limit are truncated with a warning appended.
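The ordering and truncation rules can be sketched as follows. The function names and the 4-characters-per-token estimate are assumptions for illustration, not the assembler's actual API.

```typescript
// Priority files first (in this fixed order), then the rest alphabetically.
const PRIORITY = ["AGENTS.md", "USER.md", "IDENTITY.md"];
const PER_FILE_TOKENS = 20_000;

function orderWorkspaceFiles(names: string[]): string[] {
  const priority = PRIORITY.filter((p) => names.includes(p));
  const rest = names.filter((n) => !PRIORITY.includes(n)).sort();
  return [...priority, ...rest];
}

// Crude token estimate (~4 chars/token); truncate over-budget files
// and append a warning, as described above.
function truncateToBudget(content: string): string {
  const estTokens = Math.ceil(content.length / 4);
  if (estTokens <= PER_FILE_TOKENS) return content;
  return (
    content.slice(0, PER_FILE_TOKENS * 4) +
    "\n\n[truncated: exceeded 20K token budget]"
  );
}
```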
System prompt template
The agent's systemPromptTemplate is rendered with Handlebars-style variable interpolation and inserted after the workspace files. Available variables:
systemPromptTemplate: |
  You are {{agent.displayName}}, a specialist in {{domain}}.
  Today is {{date}}.
  Workspace: {{workspace.name}}
  User: {{user.name}} ({{user.id}})
  {{#if memory}}
  Context from memory:
  {{memory}}
  {{/if}}

Memory context
Before inference, the assembler runs a search across all 5 memory tiers using the current user's message as the query. Results are fused using Reciprocal Rank Fusion (RRF) across Typesense BM25+vector and pgvector cosine similarity. The top-K results are formatted and inserted into context before the conversation history.
See Memory Search for details on the RRF algorithm and ranking.
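The fusion step can be sketched with the standard RRF formula, where each ranked list contributes 1 / (k + rank) per item. The constant k = 60 is the common default from the RRF literature; the assembler's exact constants are not stated here.

```typescript
// Reciprocal Rank Fusion over multiple ranked result lists
// (e.g. Typesense BM25+vector and pgvector cosine similarity).
function rrfFuse(rankings: string[][], k = 60, topK = 10): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // rank is 0-based here, so the contribution is 1 / (k + rank + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topK)
    .map(([id]) => id);
}
```

An item ranked moderately well by both retrievers (like "b" below) beats an item ranked first by only one, which is the property that makes RRF a robust way to combine lexical and vector search.

```typescript
rrfFuse([["a", "b", "c"], ["b", "c", "d"]]); // → ["b", "c", "a", "d"]
```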
Token budgets and compaction
When the total assembled context would exceed maxContextTokens for the configured model, the assembler applies these strategies in order:
- Trim memory context to the top results that fit
- Truncate workspace files that exceed their per-file budget
- Compact old conversation history via a summarization call (triggered at 85% threshold)
- Drop the oldest raw messages until the total fits
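The ordered-strategies design can be sketched as a loop that applies each strategy only while the context is still over budget. The names below are assumptions for illustration; only one strategy (dropping oldest messages) is shown concretely.

```typescript
// A strategy takes the context and returns a (hopefully smaller) context.
type Strategy = (
  ctx: string[],
  budget: number,
  count: (s: string[]) => number,
) => string[];

function compact(
  ctx: string[],
  budget: number,
  strategies: Strategy[],
  count: (s: string[]) => number = (s) => s.join("").length, // crude token proxy
): string[] {
  for (const strategy of strategies) {
    if (count(ctx) <= budget) break; // stop as soon as the context fits
    ctx = strategy(ctx, budget, count);
  }
  return ctx;
}

// Last-resort strategy: drop the oldest raw messages until the total fits,
// always keeping the system message at index 0.
const dropOldest: Strategy = (ctx, budget, count) => {
  while (ctx.length > 2 && count(ctx) > budget) {
    ctx = [ctx[0], ...ctx.slice(2)];
  }
  return ctx;
};
```

Ordering the strategies from least to most destructive means cheap trims (memory, workspace files) are exhausted before any conversation history is summarized or dropped.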