Backup & Recovery Runbook

This runbook covers backup procedures for PostgreSQL and Typesense, workspace-scoped memory restore, RTO/RPO targets, and platform-specific guidance for DigitalOcean Managed Databases.

RTO / RPO targets

Scenario	RPO (data loss tolerance)	RTO (downtime tolerance)
Full instance failure	Up to 24 hours (daily backup)	30–60 min (restore + verify)
Accidental workspace deletion	Up to 24 hours (daily backup)	15–30 min (workspace restore)
Corrupt memory data	Up to 24 hours (daily backup)	10–20 min (table-level restore)
DigitalOcean PITR (Point-in-Time Recovery)	Up to 1 minute (WAL streaming)	20–40 min (cluster rebuild)

The default daily backup schedule achieves a 24-hour RPO. To reduce the RPO to minutes, enable WAL archiving or use DigitalOcean's PITR.
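
A target like this only holds if the newest backup really is recent. The sketch below shows one way a monitoring job might check that; the function name `backup_within_rpo` and the example path are hypothetical, not part of Astra.

```bash
# Hypothetical helper: succeeds if FILE exists and was modified within the
# last MAX_MINUTES minutes (1440 = the 24-hour RPO from the table above).
backup_within_rpo() {
  local file="$1" max_minutes="$2"
  [ -f "$file" ] || return 1
  # find prints the path only when the file is OLDER than max_minutes,
  # so empty output means the backup is fresh enough.
  [ -z "$(find "$file" -mmin +"$max_minutes" -print)" ]
}
```

For example, `backup_within_rpo /var/backups/astra/latest.dump 1440 || echo "RPO breached"` could run from cron and feed an alert.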

What to back up

System	Contains	Backup method	Criticality
PostgreSQL	All agent data: sessions, turns, memory, knowledge graph, workspace config	`pg_dump` / PITR	Critical
Typesense	Search index — a derived copy of memory_entries and documents	Collection export (JSONL)	High (can be rebuilt from PG, but takes time)
`astra.yml`	Agent, skill, tool, channel configuration	Git version control	High
Environment variables / secrets	API keys, DB credentials	Secrets manager snapshot	Critical

ℹ Typesense data is a derived index: it can be fully rebuilt from PostgreSQL using npx astra reindex. Back it up to reduce recovery time, but PostgreSQL is the source of truth.

PostgreSQL backup

Use pg_dump for logical backups. Schedule daily runs with a cron job and store the output in object storage (S3, DigitalOcean Spaces):

bash

# Full logical backup (all workspaces)
pg_dump \
  --format=custom \
  --compress=9 \
  --no-acl \
  --no-owner \
  "$DATABASE_URL" \
  -f backup-$(date +%Y%m%d-%H%M%S).dump

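# Workspace-scoped backup (single tenant)
# pg_dump cannot filter rows (it has no --where option), so this dumps the
# workspace-bearing tables in full. Rows are filtered by workspace_id at
# restore time. Note that the file therefore contains other tenants' rows too.
pg_dump \
  --format=custom \
  --table='memory_entries' \
  --table='knowledge_graph_*' \
  --table='sessions' \
  --table='turns' \
  "$DATABASE_URL" \
  -f "ws-acme-$(date +%Y%m%d).dump"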
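
The daily cron run and the upload to object storage could be wrapped in a small script. This is a minimal sketch, assuming the AWS CLI is installed and configured; `BACKUP_BUCKET` and `S3_ENDPOINT_URL` are assumed environment variables and both function names are hypothetical.

```bash
# Hypothetical helper: timestamped dump filename, matching the naming above.
backup_filename() {
  printf 'backup-%s.dump\n' "$(date +%Y%m%d-%H%M%S)"
}

# Dump the database, upload the file, then remove the local copy.
# Returns non-zero (and keeps the local file) if either step fails.
run_daily_backup() {
  local file
  file="$(backup_filename)"
  pg_dump --format=custom --compress=9 --no-acl --no-owner \
    -f "$file" "$DATABASE_URL" || return 1
  # For DigitalOcean Spaces, set S3_ENDPOINT_URL to the Spaces endpoint of
  # your region; leave it unset for AWS S3.
  aws s3 cp "$file" "s3://${BACKUP_BUCKET}/postgres/${file}" \
    ${S3_ENDPOINT_URL:+--endpoint-url "$S3_ENDPOINT_URL"} || return 1
  rm -f "$file"
}
```

A crontab entry such as `0 2 * * * /opt/astra/backup.sh` (path hypothetical) would then call `run_daily_backup` once a day.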

PostgreSQL restore

bash

# Full restore to a new database
pg_restore \
  --clean \
  --if-exists \
  --no-acl \
  --no-owner \
  -d "$DATABASE_URL" \
  backup-20260101-120000.dump

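# Restore a single workspace's memory to a running instance
# 1. Restore into a scratch database. pg_restore always recreates tables under
#    their original names, so restoring straight into the live database would
#    collide with the live tables. STAGING_DATABASE_URL is an assumed variable
#    that points at an empty database.
pg_restore \
  --no-acl \
  --no-owner \
  --table=memory_entries \
  --table=knowledge_graph_nodes \
  --table=knowledge_graph_edges \
  -d "$STAGING_DATABASE_URL" \
  ws-acme-20260101.dump

# 2. Copy the workspace's rows into a staging table on the live database
psql "$STAGING_DATABASE_URL" -c \
  "\copy (SELECT * FROM memory_entries WHERE workspace_id = 'ws_acme') TO 'ws-acme-memory.csv' WITH (FORMAT csv)"
psql "$DATABASE_URL" \
  -c "CREATE TABLE IF NOT EXISTS memory_entries_staging (LIKE memory_entries INCLUDING DEFAULTS)" \
  -c "\copy memory_entries_staging FROM 'ws-acme-memory.csv' WITH (FORMAT csv)"

# 3. Promote staging rows to live (in psql)
INSERT INTO memory_entries SELECT * FROM memory_entries_staging
  WHERE workspace_id = 'ws_acme'
  ON CONFLICT (id) DO NOTHING;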

Restoring a single workspace's memory

To restore one workspace without touching others:

1. Restore the dump into a scratch database (using --table flags), then copy the workspace's rows into staging tables on the live database. pg_restore cannot rename tables or redirect them to another schema.
2. Filter rows by workspace_id and insert them into the live tables with ON CONFLICT DO NOTHING so that newer data is not overwritten.
3. Re-trigger search indexing for the workspace: npx astra reindex --workspace ws_acme.
4. Verify that memory is accessible: send a test agent turn and check the retrieved memory results.
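
The promote step repeats for every table being restored, so it may help to generate the statement rather than retype it. The helper below is a sketch under two assumptions: each table has an `id` primary key, and its staging copy is named `<table>_staging`. The name `promote_sql` is hypothetical.

```bash
# Hypothetical helper: print the promote statement for one table and workspace.
promote_sql() {
  local table="$1" workspace="$2" value
  # Accept only simple identifiers so the values are safe to embed in SQL.
  for value in "$table" "$workspace"; do
    case "$value" in
      ''|*[!A-Za-z0-9_]*)
        echo "invalid table or workspace id: '$value'" >&2
        return 1
        ;;
    esac
  done
  printf "INSERT INTO %s SELECT * FROM %s_staging WHERE workspace_id = '%s' ON CONFLICT (id) DO NOTHING;\n" \
    "$table" "$table" "$workspace"
}
```

The output can be piped straight into psql, for example `promote_sql memory_entries ws_acme | psql "$DATABASE_URL" -v ON_ERROR_STOP=1`.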

Typesense backup

Export each collection as JSONL. Take the export in the same run as the PostgreSQL dump and store the files together, so that both reflect roughly the same point in time:

bash

# Export all collections (run against your Typesense host)
for collection in $(curl -s "http://localhost:8108/collections" \
    -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" | jq -r '.[].name'); do
  curl -s "http://localhost:8108/collections/$collection/documents/export" \
    -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \
    > "typesense-$collection-$(date +%Y%m%d).jsonl"
  echo "Exported $collection"
done
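
A truncated export is easy to miss, so it can be worth comparing each file against the live collection. The sketch below relies on the `num_documents` field that Typesense returns for a collection; both function names are hypothetical, and the counts will differ if documents are written while the export runs.

```bash
# Hypothetical helper: count documents (non-empty lines) in a JSONL export.
jsonl_doc_count() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -c . "$1" || true
}

# Compare an export file with the document count of the live collection.
verify_export() {
  local collection="$1" file="$2" live exported
  live="$(curl -s "http://localhost:8108/collections/$collection" \
    -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" | jq -r '.num_documents')"
  exported="$(jsonl_doc_count "$file")"
  if [ "$live" != "$exported" ]; then
    echo "MISMATCH $collection: live=$live exported=$exported" >&2
    return 1
  fi
  echo "OK $collection: $exported documents"
}
```

Calling `verify_export "$collection" "typesense-$collection-$(date +%Y%m%d).jsonl"` inside the export loop would flag a short file before it is uploaded.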

Typesense restore

bash

# Re-import a collection (collection must exist with correct schema first)
curl -X POST "http://localhost:8108/collections/memory_ws_acme/documents/import?action=upsert" \
  -H "X-TYPESENSE-API-KEY: $TYPESENSE_API_KEY" \
  -H "Content-Type: text/plain" \
  --data-binary @typesense-memory_ws_acme-20260101.jsonl

If the collection schema has changed since the backup was taken, recreate the collection with the new schema before importing. The collection schema is defined in src/search/collections.ts.
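
The import endpoint answers with one JSON result per document and returns HTTP 200 even when individual documents are rejected, so the response body has to be inspected. A minimal sketch of that check, with the hypothetical name `check_import_result`:

```bash
# Hypothetical helper: read an import response on stdin and fail if any
# document was rejected. Each response line is a JSON object with a
# "success" field.
check_import_result() {
  local failed
  # grep -c exits non-zero when nothing matches, so mask that exit status.
  failed="$(grep -c -E '"success" *: *false' || true)"
  if [ "$failed" -gt 0 ]; then
    echo "$failed document(s) failed to import" >&2
    return 1
  fi
}

# Usage: append `| check_import_result` to the curl import command above.
```

Saving the raw response to a file before checking it keeps the per-document error messages available for debugging.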

DigitalOcean Managed Database specifics

DigitalOcean Managed PostgreSQL provides automated daily backups and optional Point-in-Time Recovery (PITR). PITR is available on Business-tier clusters and allows restoring to any second within the last 7 days.

bash

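# DigitalOcean Managed DB: list the automated backups for a cluster
doctl databases backups <database-id>

# Set the maintenance window. This does not trigger a backup: before a risky
# migration, take a manual pg_dump as shown in the PostgreSQL backup section.
doctl databases maintenance-window update <database-id> \
  --day saturday --hour "02:00"

# Restore to a new cluster from a backup
# (confirm the restore flags and the expected timestamp format with
# `doctl databases create --help` before relying on this)
doctl databases create astra-restore \
  --engine pg \
  --version 16 \
  --restore-from-cluster-name astra-db \
  --restore-from-timestamp "2026-01-01T12:00:00Z"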

Key DO-specific notes:

pgvector is pre-installed on DigitalOcean Managed PostgreSQL 14+. You do not need to install it manually after restore.
PITR restores create a new cluster — update DATABASE_URL in your environment after the restore cluster is ready.
Connection pooling (PgBouncer) is a separate service — after restoring to a new cluster, update the pooler to point to the new host.
Backups are stored in the same region as your cluster. For disaster recovery across regions, enable a cross-region read replica or use pg_dump and upload the dump to a Spaces bucket in a different region.
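
Because a restore creates a new cluster, there is a wait before DATABASE_URL can be switched over. One way to script that wait is to poll with pg_isready, as sketched below. The name `wait_for_db` and its defaults are hypothetical, and pg_isready only confirms that the server accepts connections, not that the credentials work.

```bash
# Hypothetical helper: poll until the server at URL accepts connections.
# ATTEMPTS defaults to 30 and DELAY (seconds between attempts) to 10.
wait_for_db() {
  local url="$1" attempts="${2:-30}" delay="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    if pg_isready -q -d "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for_db "$NEW_DATABASE_URL" && echo "cluster is accepting connections"`, where `NEW_DATABASE_URL` is an assumed variable holding the restored cluster's connection string.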

Recommended backup schedule

Frequency	Method	Retention
Hourly	WAL archiving to S3/Spaces (if enabled)	7 days
Daily	`pg_dump` + Typesense JSONL export	30 days
Weekly	Full `pg_dump` snapshot	90 days
Before migrations	Manual `pg_dump`	Keep until migration verified
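
The retention column implies that old dumps are deleted on a schedule. For dumps kept on a local or mounted disk, a find-based prune is one option; the sketch below uses the hypothetical name `prune_backups` and assumes dumps end in `.dump`.

```bash
# Hypothetical helper: delete *.dump files directly inside DIR that are more
# than DAYS days old, printing each path as it is removed.
prune_backups() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -name '*.dump' -mtime +"$days" -print -delete
}
```

`prune_backups /var/backups/astra/daily 30` would match the 30-day retention for daily dumps; the directory is an example path.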

Recovery checklist

1. Stop the gateway (docker compose down astra) to prevent writes during restore.
2. Restore PostgreSQL from backup.
3. Run migrations to ensure the schema is current: npx drizzle-kit migrate.
4. Restore Typesense collections, or rebuild from PG: npx astra reindex.
5. Restart the gateway: docker compose up -d astra.
6. Run npx astra doctor and verify all agents report healthy.
7. Send a test message to a representative agent and verify memory retrieval is working.
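
The scriptable part of the checklist can be chained so that a failed step stops the run. This is a sketch rather than a supported tool: `run` and `run_recovery` are hypothetical names, the script rebuilds Typesense from PostgreSQL instead of importing JSONL, and the final test message stays manual.

```bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper around the checklist. The && chain stops at the first
# failing step. The argument is the dump file to restore.
run_recovery() {
  local backup_file="$1"
  run docker compose down astra &&
    run pg_restore --clean --if-exists --no-acl --no-owner \
      -d "$DATABASE_URL" "$backup_file" &&
    run npx drizzle-kit migrate &&
    run npx astra reindex &&
    run docker compose up -d astra &&
    run npx astra doctor
}
```

Running `DRY_RUN=1 run_recovery backup-20260101-120000.dump` first prints the six commands without executing them. The output includes the value of DATABASE_URL, so treat it as sensitive.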