Multi-Tenant Implementation Guide
This guide walks through building a SaaS product on top of Open Astra where each customer gets their own isolated agent team. It covers workspace provisioning, data isolation, cost attribution, and permission boundaries.
Architecture overview
The recommended multi-tenant pattern uses one Open Astra workspace per customer. Workspaces are the natural isolation boundary: each has its own memory, agent configs, secrets, and quotas.
| Boundary | Enforced by | Configurable? |
|---|---|---|
| Memory isolation | Workspace-scoped pgvector and Typesense collections | No — always isolated |
| Agent isolation | Workspace ID on every turn record | No — always isolated |
| Cost attribution | workspace_id on all token usage records | Built-in |
| Secret isolation | Per-workspace secrets store | Yes |
| Cross-workspace reads | Agent Grants — denied by default | Yes — opt-in only |
Provisioning a tenant workspace
When a new customer signs up, create their workspace via the REST API and set spending limits for the plan tier:
```json
# POST /workspaces
{
  "id": "customer-acme",
  "displayName": "ACME Corp",
  "plan": "pro",
  "ownerId": "user_acme_admin",
  "limits": {
    "maxAgents": 10,
    "maxTokensPerMonth": 5000000,
    "maxCostUsdPerMonth": 50.00
  }
}
```

Workspace creation automatically provisions:
- An isolated memory namespace (pgvector schema + Typesense collection)
- A default admin user and role
- A quota enforcement row linked to the `limits` block
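The request body above can be assembled programmatically per plan tier. The sketch below is illustrative: the tier names and non-`pro` limit values are assumptions, not Open Astra defaults.

```python
# Sketch: build the POST /workspaces payload for a plan tier.
# The "starter" limits are hypothetical; "pro" matches the example above.
PLAN_LIMITS = {
    "starter": {"maxAgents": 3, "maxTokensPerMonth": 1_000_000, "maxCostUsdPerMonth": 10.00},
    "pro": {"maxAgents": 10, "maxTokensPerMonth": 5_000_000, "maxCostUsdPerMonth": 50.00},
}

def provisioning_payload(workspace_id: str, display_name: str,
                         owner_id: str, plan: str) -> dict:
    """Assemble the workspace-creation body for the given plan tier."""
    if plan not in PLAN_LIMITS:
        raise ValueError(f"unknown plan: {plan}")
    return {
        "id": workspace_id,
        "displayName": display_name,
        "plan": plan,
        "ownerId": owner_id,
        "limits": PLAN_LIMITS[plan],
    }

payload = provisioning_payload("customer-acme", "ACME Corp", "user_acme_admin", "pro")
```

Keeping limits in one table per tier means a plan change is a single limits update, not a payload rewrite.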
Agent templates per plan
Rather than duplicating agent configs per tenant, use template variables that resolve to workspace-level values at runtime. This means all tenants on the same plan share one agent definition:
```yaml
# astra.yml — agent template used for all tenant workspaces
agents:
  - id: assistant
    displayName: "{{workspace.displayName}} Assistant"
    model:
      provider: openai
      modelId: gpt-4o-mini   # cost-controlled for SaaS
    systemPromptTemplate: |
      You are the AI assistant for {{workspace.displayName}}.
      You only have access to data within this workspace.
      Never reference other customers or other workspaces.
    budget:
      maxTokensPerTurn: 4000
      maxCostUsdPerMonth: "{{workspace.limits.maxCostUsdPerMonth}}"
    permissions:
      workspaceId: "{{workspace.id}}"   # hard-scoped to tenant workspace
```

Plan differences (model tier, token limits) are expressed as workspace-level limit overrides, not separate agent files.
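To make the runtime substitution concrete, here is a minimal sketch of how `{{workspace.*}}` placeholders could resolve against a workspace record. This is an illustration of the mechanism, not Open Astra's actual template engine.

```python
import re

def resolve_template(value: str, workspace: dict) -> str:
    """Replace {{workspace.path.to.field}} placeholders with workspace values."""
    def lookup(match: re.Match) -> str:
        node = {"workspace": workspace}
        for part in match.group(1).split("."):
            node = node[part]  # walk the dotted path
        return str(node)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, value)

ws = {"id": "customer-acme", "displayName": "ACME Corp",
      "limits": {"maxCostUsdPerMonth": 50.0}}
name = resolve_template("{{workspace.displayName}} Assistant", ws)
# name == "ACME Corp Assistant"
```

Because resolution happens per request, a single template file serves every tenant without copying configs.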
Permission boundaries
The most critical security control in a multi-tenant deployment is ensuring agents cannot read or write across workspace boundaries. This is enforced at the database layer by default, but should also be explicitly denied at the grants layer:
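The evaluation logic implied here (deny-by-default, explicit denies winning over allows) can be sketched as follows. This is a hypothetical illustration of the semantics, not Open Astra's grant engine.

```python
def is_allowed(action: str, grants: dict) -> bool:
    """Deny-by-default: an action must be allowed AND not explicitly denied."""
    if action in grants.get("deny", []):
        return False  # explicit deny always wins
    return action in grants.get("allow", [])

# Mirrors the member role in the config below
member = {
    "allow": ["memory:read", "memory:write", "agent:invoke"],
    "deny": ["memory:cross-workspace-read", "workspace:list"],
}
```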
```yaml
# workspace/grants config — prevent cross-tenant reads
workspace:
  grants:
    - role: member
      allow:
        - memory:read    # own workspace only
        - memory:write
        - agent:invoke
      deny:
        - memory:cross-workspace-read
        - workspace:list    # can't enumerate other workspaces
```

Cost attribution and billing
Every token usage record carries a `workspace_id`. Use this to generate per-tenant billing data:
```sql
-- Monthly cost per workspace for billing
SELECT
  w.id AS workspace_id,
  w.display_name,
  SUM(t.cost_usd) AS total_cost_usd,
  SUM(t.input_tokens + t.output_tokens) AS total_tokens,
  COUNT(DISTINCT t.session_id) AS sessions
FROM turns t
JOIN workspaces w ON t.workspace_id = w.id
WHERE t.created_at >= date_trunc('month', now())
GROUP BY w.id, w.display_name
ORDER BY total_cost_usd DESC;
```

Combine with Agent Quotas to enforce hard monthly spend limits per workspace. When a workspace hits its limit, new turns are rejected with a `429 Quota Exceeded` response.
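The hard-cap check described above amounts to comparing month-to-date spend plus the incoming turn's estimated cost against the workspace limit. A minimal sketch, with the function name and signature being assumptions rather than Open Astra API:

```python
def check_quota(month_cost_usd: float, limits: dict,
                turn_cost_usd: float) -> tuple[int, str]:
    """Return an HTTP-style (status, reason) for an incoming turn.

    Sketch only: the real enforcement lives in Open Astra's quota layer.
    """
    if month_cost_usd + turn_cost_usd > limits["maxCostUsdPerMonth"]:
        return 429, "Quota Exceeded"
    return 200, "OK"
```

Checking the estimated turn cost up front, rather than only after the turn completes, keeps a workspace from overshooting its cap on the final turn of the month.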
Onboarding flow
A typical tenant onboarding sequence:
- Create workspace via `POST /workspaces` with plan limits
- Create admin user and issue JWT credentials
- Apply agent template for the plan tier
- Configure channels (Slack, Teams, etc.) if required
- Seed knowledge base with customer-specific documents via Knowledge Base
- Run smoke test: send a test message and verify response + memory write
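The final smoke-test step can be automated against whatever API client you wrap around the REST endpoints. The sketch below uses a hypothetical client interface (`send_message`, `list_memories`); substitute your own wrapper.

```python
def smoke_test(client) -> bool:
    """Onboarding step 6 sketch: send a test message, then verify that
    a reply came back and a memory record was written."""
    reply = client.send_message("assistant", "Hello, smoke test!")
    memories = client.list_memories(limit=1)
    return bool(reply and reply.get("text")) and len(memories) > 0

class FakeClient:
    """Stand-in for a real API client, for local testing of the flow."""
    def send_message(self, agent_id, text):
        return {"text": f"echo: {text}"}
    def list_memories(self, limit):
        return [{"content": "smoke"}]
```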
Scaling considerations
- Database: At 100+ workspaces, consider Typesense collection sharding by workspace prefix to keep search indices manageable.
- Connection pooling: Use PgBouncer or Supabase pooler — each workspace generates independent query load.
- Async memory extraction: At high concurrency, post-turn memory extraction should use a background job queue (Redis + BullMQ) rather than running inline.
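The sharding idea above reduces to a deterministic mapping from workspace ID to shard collection. A sketch, assuming a hash-based scheme (the shard count and collection naming are deployment choices, not Open Astra defaults):

```python
import hashlib

def shard_collection(workspace_id: str, num_shards: int = 8) -> str:
    """Map a workspace to one of N Typesense collections by hashing its ID.
    Deterministic, so all writes and reads for a workspace hit one shard."""
    digest = hashlib.sha256(workspace_id.encode()).hexdigest()
    shard = int(digest, 16) % num_shards
    return f"memories_shard_{shard}"
```

Because the mapping is stable, resharding requires a migration; pick `num_shards` with headroom for growth.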