Experimentation
Open Astra provides tools for testing and comparing agent performance: a prompt playground for ad-hoc testing, batch inference for parallel evaluation, A/B tests for production comparison, parameter sweeps for tuning, and tournaments for head-to-head evaluation.
Prompt playground
Stateless inference endpoint for quick testing — no session, no memory, no tools. Supports all providers. Rate limited to 10 requests per minute per user.
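The per-user rate limit can be illustrated with a sliding-window counter. This is a sketch of the documented policy (10 requests per 60 seconds per user), not the server's actual enforcement code, which is not documented here.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Illustrative sliding-window limiter: at most `limit` requests
    per `window_s` seconds, tracked independently per user."""

    def __init__(self, limit=10, window_s=60.0):
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)  # user_id -> request timestamps

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True
        return False
```

An 11th request inside the same minute is rejected, but requests from other users are unaffected, and capacity frees up as old timestamps age out of the window.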
# Stateless inference (no session, no memory)
curl -X POST http://localhost:3000/playground/chat \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"messages": [{ "role": "user", "content": "Explain HNSW indexing in 3 sentences" }],
"model": { "provider": "claude", "modelId": "claude-sonnet-4-20250514" },
"temperature": 0.3,
"maxTokens": 500
}'
# Rate limit: 10 requests/minute per user
A WebSocket endpoint at /playground/ws/:agentId is also available for interactive agent testing with sessions.
Batch inference
Send 1–50 prompts to an agent in parallel. Concurrency is controlled by maxConcurrency (default 5, max 20).
# Batch inference (1-50 prompts, concurrent)
curl -X POST http://localhost:3000/batch-inference \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"agentId": "analysis-agent",
"prompts": [
"Summarize Q1 revenue trends",
"Summarize Q2 revenue trends",
"Summarize Q3 revenue trends"
],
"maxConcurrency": 5
}'
A/B testing
Compare two agents in production with deterministic user-to-variant assignment. Default traffic split is 50/50.
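Deterministic assignment means the same user always lands in the same variant, with the split roughly matching trafficSplit across users. The docs don't specify the hash function, so this is an illustrative sketch of the general technique, not Open Astra's actual algorithm.

```python
import hashlib

def assign_variant(test_id, user_id, traffic_split=0.5):
    """Hash (test, user) to a uniform value in [0, 1) and bucket it:
    below traffic_split -> variant A, otherwise variant B."""
    digest = hashlib.sha256(f"{test_id}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "A" if bucket < traffic_split else "B"
```

Salting the hash with the test ID ensures a user's bucket in one A/B test is independent of their bucket in another.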
# Create an A/B test
curl -X POST http://localhost:3000/ab-tests \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"name": "Concise vs. Detailed responses",
"agentAId": "concise-agent",
"agentBId": "detailed-agent",
"trafficSplit": 0.5,
"metric": "user_rating"
}'
# Get variant for a user (deterministic hash)
curl "http://localhost:3000/ab-tests/ab_abc123/variant?uid=uid_alice" \
-H "Authorization: Bearer ${JWT_TOKEN}"
# Record result
curl -X POST http://localhost:3000/ab-tests/ab_abc123/results \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{ "agentId": "concise-agent", "rating": 4 }'Parameter sweeps
Test an agent across a grid of temperature and max token combinations. Runs asynchronously in the background. Default grid: temperatures [0.1, 0.5, 0.7, 1.0] × maxTokens [256, 512, 1024].
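The grid expands to the cross product of the two lists, so the default grid produces 4 × 3 = 12 runs. A minimal sketch of that expansion (the server-side run loop itself is not shown):

```python
from itertools import product

def expand_grid(temperatures, max_tokens_list):
    """Expand a parameter grid into one config dict per combination."""
    return [
        {"temperature": t, "maxTokens": m}
        for t, m in product(temperatures, max_tokens_list)
    ]
```

Grids grow multiplicatively, which is why the sweep runs asynchronously: each extra temperature multiplies the run count by the size of maxTokensList.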
# Run a parameter sweep (async)
curl -X POST http://localhost:3000/param-sweeps \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"agentId": "code-agent",
"testPrompt": "Write a binary search in Rust",
"parameterGrid": {
"temperatures": [0.1, 0.5, 0.7, 1.0],
"maxTokensList": [256, 512, 1024]
}
}'
# Response (202 Accepted — runs in background)
{ "id": "sweep_abc123", "status": "running" }
# Check results
curl http://localhost:3000/param-sweeps/sweep_abc123 \
-H "Authorization: Bearer ${JWT_TOKEN}"
Tournaments
Pit 2–20 agents against each other on 1–50 test prompts. The tournament evaluates all agents and determines a winner.
# Create and run a tournament
curl -X POST http://localhost:3000/tournaments \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"name": "Best code reviewer",
"agentIds": ["review-a", "review-b", "review-c"],
"testPrompts": ["Review this auth middleware...", "Review this caching layer..."],
"metric": "response_quality"
}'
# Run the tournament
curl -X POST http://localhost:3000/tournaments/t_abc123/run \
-H "Authorization: Bearer ${JWT_TOKEN}"
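How a winner might be picked once every agent has been scored on every prompt: average each agent's per-prompt metric and take the highest mean. The server's actual evaluation logic is not documented, so this is an illustrative aggregation, with the 2–20 agent bound taken from the text above.

```python
from statistics import mean

def pick_winner(scores):
    """scores maps agentId -> list of per-prompt metric values.
    Returns (winning agentId, its mean score)."""
    if not 2 <= len(scores) <= 20:
        raise ValueError("tournaments take 2-20 agents")
    averages = {agent: mean(vals) for agent, vals in scores.items()}
    winner = max(averages, key=averages.get)
    return winner, averages[winner]
```

With per-prompt scores in hand, the same structure could support other aggregations (median, win-rate per prompt) without changing the API shape.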