Multi-Tenant Implementation Guide
This guide walks through building a SaaS product on top of Open Astra where each customer gets their own isolated agent team. It covers workspace provisioning, data isolation, cost attribution, and permission boundaries.
Architecture overview
The recommended multi-tenant pattern uses one Open Astra workspace per customer. Workspaces are the natural isolation boundary: each has its own memory, agent configs, secrets, and quotas.
| Boundary | Enforced by | Configurable? |
|---|---|---|
| Memory isolation | Workspace-scoped pgvector and Typesense collections | No — always isolated |
| Agent isolation | Workspace ID on every turn record | No — always isolated |
| Cost attribution | workspace_id on all token usage records | Built-in |
| Secret isolation | Per-workspace secrets store | Yes |
| Cross-workspace reads | Agent Grants — denied by default | Yes — opt-in only |
Provisioning a tenant workspace
When a new customer signs up, create their workspace via the REST API and set spending limits for the plan tier:
```json
# POST /workspaces
{
  "id": "customer-acme",
  "displayName": "ACME Corp",
  "plan": "pro",
  "ownerId": "user_acme_admin",
  "limits": {
    "maxAgents": 10,
    "maxTokensPerMonth": 5000000,
    "maxCostUsdPerMonth": 50.00
  }
}
```

Workspace creation automatically provisions:
- An isolated memory namespace (pgvector schema + Typesense collection)
- A default admin user and role
- A quota enforcement row linked to the `limits` block
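The request body above can be assembled programmatically per plan tier. The sketch below is illustrative: the tier names and non-`pro` limit values are assumptions, not Open Astra defaults.

```python
# Sketch: build the POST /workspaces payload for a plan tier.
# The "starter" limits are hypothetical; "pro" matches the example above.
PLAN_LIMITS = {
    "starter": {"maxAgents": 3, "maxTokensPerMonth": 1_000_000, "maxCostUsdPerMonth": 10.00},
    "pro": {"maxAgents": 10, "maxTokensPerMonth": 5_000_000, "maxCostUsdPerMonth": 50.00},
}

def provisioning_payload(workspace_id: str, display_name: str,
                         owner_id: str, plan: str) -> dict:
    """Assemble the workspace-creation body for the given plan tier."""
    if plan not in PLAN_LIMITS:
        raise ValueError(f"unknown plan: {plan}")
    return {
        "id": workspace_id,
        "displayName": display_name,
        "plan": plan,
        "ownerId": owner_id,
        "limits": PLAN_LIMITS[plan],
    }

payload = provisioning_payload("customer-acme", "ACME Corp", "user_acme_admin", "pro")
```

Keeping limits in one table per tier means a plan change is a single limits update, not a payload rewrite.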
Agent templates per plan
Rather than duplicating agent configs per tenant, use template variables that resolve to workspace-level values at runtime. This means all tenants on the same plan share one agent definition:
```yaml
# astra.yml — agent template used for all tenant workspaces
agents:
  - id: assistant
    displayName: "{{workspace.displayName}} Assistant"
    model:
      provider: openai
      modelId: gpt-4o-mini   # cost-controlled for SaaS
    systemPromptTemplate: |
      You are the AI assistant for {{workspace.displayName}}.
      You only have access to data within this workspace.
      Never reference other customers or other workspaces.
    budget:
      maxTokensPerTurn: 4000
      maxCostUsdPerMonth: "{{workspace.limits.maxCostUsdPerMonth}}"
    permissions:
      workspaceId: "{{workspace.id}}"   # hard-scoped to tenant workspace
```

Plan differences (model tier, token limits) are expressed as workspace-level limit overrides, not separate agent files.
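To make the runtime substitution concrete, here is a minimal sketch of how `{{workspace.*}}` placeholders could resolve against a workspace record. This is an illustration of the mechanism, not Open Astra's actual template engine.

```python
import re

def resolve_template(value: str, workspace: dict) -> str:
    """Replace {{workspace.path.to.field}} placeholders with workspace values."""
    def lookup(match: re.Match) -> str:
        node = {"workspace": workspace}
        for part in match.group(1).split("."):
            node = node[part]  # walk the dotted path
        return str(node)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, value)

ws = {"id": "customer-acme", "displayName": "ACME Corp",
      "limits": {"maxCostUsdPerMonth": 50.0}}
name = resolve_template("{{workspace.displayName}} Assistant", ws)
# name == "ACME Corp Assistant"
```

Because resolution happens per request, a single template file serves every tenant without copying configs.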
Permission boundaries
The most critical security control in a multi-tenant deployment is ensuring agents cannot read or write across workspace boundaries. This is enforced at the database layer by default, but should also be explicitly denied at the grants layer:
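The evaluation logic implied here (deny-by-default, explicit denies winning over allows) can be sketched as follows. This is a hypothetical illustration of the semantics, not Open Astra's grant engine.

```python
def is_allowed(action: str, grants: dict) -> bool:
    """Deny-by-default: an action must be allowed AND not explicitly denied."""
    if action in grants.get("deny", []):
        return False  # explicit deny always wins
    return action in grants.get("allow", [])

# Mirrors the member role in the config below
member = {
    "allow": ["memory:read", "memory:write", "agent:invoke"],
    "deny": ["memory:cross-workspace-read", "workspace:list"],
}
```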
```yaml
# workspace/grants config — prevent cross-tenant reads
workspace:
  grants:
    - role: member
      allow:
        - memory:read    # own workspace only
        - memory:write
        - agent:invoke
      deny:
        - memory:cross-workspace-read
        - workspace:list    # can't enumerate other workspaces
```

Cost attribution and billing
Every token usage record carries a `workspace_id`. Use this to generate per-tenant billing data:
```sql
-- Monthly cost per workspace for billing
SELECT
  w.id AS workspace_id,
  w.display_name,
  SUM(t.cost_usd) AS total_cost_usd,
  SUM(t.input_tokens + t.output_tokens) AS total_tokens,
  COUNT(DISTINCT t.session_id) AS sessions
FROM turns t
JOIN workspaces w ON t.workspace_id = w.id
WHERE t.created_at >= date_trunc('month', now())
GROUP BY w.id, w.display_name
ORDER BY total_cost_usd DESC;
```

Combine with Agent Quotas to enforce hard monthly spend limits per workspace. When a workspace hits its limit, new turns are rejected with a `429 Quota Exceeded` response.
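The hard-cap check described above amounts to comparing month-to-date spend plus the incoming turn's estimated cost against the workspace limit. A minimal sketch, with the function name and signature being assumptions rather than Open Astra API:

```python
def check_quota(month_cost_usd: float, limits: dict,
                turn_cost_usd: float) -> tuple[int, str]:
    """Return an HTTP-style (status, reason) for an incoming turn.

    Sketch only: the real enforcement lives in Open Astra's quota layer.
    """
    if month_cost_usd + turn_cost_usd > limits["maxCostUsdPerMonth"]:
        return 429, "Quota Exceeded"
    return 200, "OK"
```

Checking the estimated turn cost up front, rather than only after the turn completes, keeps a workspace from overshooting its cap on the final turn of the month.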
Onboarding flow
A typical tenant onboarding sequence:
- Create workspace via `POST /workspaces` with plan limits
- Create admin user and issue JWT credentials
- Apply agent template for the plan tier
- Configure channels (Slack, Teams, etc.) if required
- Seed knowledge base with customer-specific documents via Knowledge Base
- Run smoke test: send a test message and verify response + memory write
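The final smoke-test step can be automated against whatever API client you wrap around the REST endpoints. The sketch below uses a hypothetical client interface (`send_message`, `list_memories`); substitute your own wrapper.

```python
def smoke_test(client) -> bool:
    """Onboarding step 6 sketch: send a test message, then verify that
    a reply came back and a memory record was written."""
    reply = client.send_message("assistant", "Hello, smoke test!")
    memories = client.list_memories(limit=1)
    return bool(reply and reply.get("text")) and len(memories) > 0

class FakeClient:
    """Stand-in for a real API client, for local testing of the flow."""
    def send_message(self, agent_id, text):
        return {"text": f"echo: {text}"}
    def list_memories(self, limit):
        return [{"content": "smoke"}]
```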
Scaling considerations
- Database: At 100+ workspaces, consider Typesense collection sharding by workspace prefix to keep search indices manageable.
- Connection pooling: Use PgBouncer or Supabase pooler — each workspace generates independent query load.
- Async memory extraction: At high concurrency, post-turn memory extraction should use a background job queue (Redis + BullMQ) rather than running inline.
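The sharding idea above reduces to a deterministic mapping from workspace ID to shard collection. A sketch, assuming a hash-based scheme (the shard count and collection naming are deployment choices, not Open Astra defaults):

```python
import hashlib

def shard_collection(workspace_id: str, num_shards: int = 8) -> str:
    """Map a workspace to one of N Typesense collections by hashing its ID.
    Deterministic, so all writes and reads for a workspace hit one shard."""
    digest = hashlib.sha256(workspace_id.encode()).hexdigest()
    shard = int(digest, 16) % num_shards
    return f"memories_shard_{shard}"
```

Because the mapping is stable, resharding requires a migration; pick `num_shards` with headroom for growth.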