Experimentation
Open Astra provides tools for testing and comparing agent performance: a prompt playground for ad-hoc testing, batch inference for parallel evaluation, A/B tests for production comparison, parameter sweeps for tuning, and tournaments for head-to-head evaluation.
Prompt playground
Stateless inference endpoint for quick testing — no session, no memory, no tools. Supports all providers. Rate limited to 10 requests per minute per user.
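The per-user rate limit can be illustrated with a sliding-window counter. This is a sketch of the documented policy (10 requests per 60 seconds per user), not the server's actual enforcement code, which is not documented here.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Illustrative sliding-window limiter: at most `limit` requests
    per `window_s` seconds, tracked independently per user."""

    def __init__(self, limit=10, window_s=60.0):
        self.limit = limit
        self.window_s = window_s
        self._hits = defaultdict(deque)  # user_id -> request timestamps

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True
        return False
```

An 11th request inside the same minute is rejected, but requests from other users are unaffected, and capacity frees up as old timestamps age out of the window.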
# Stateless inference (no session, no memory)
curl -X POST http://localhost:3000/playground/chat \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"messages": [{ "role": "user", "content": "Explain HNSW indexing in 3 sentences" }],
"model": { "provider": "claude", "modelId": "claude-sonnet-4-20250514" },
"temperature": 0.3,
"maxTokens": 500
}'
# Rate limit: 10 requests/minute per user
A WebSocket endpoint at /playground/ws/:agentId is also available for interactive agent testing with sessions.
Batch inference
Send 1–50 prompts to an agent in parallel. Concurrency is controlled by maxConcurrency (default 5, max 20).
# Batch inference (1-50 prompts, concurrent)
curl -X POST http://localhost:3000/batch-inference \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"agentId": "analysis-agent",
"prompts": [
"Summarize Q1 revenue trends",
"Summarize Q2 revenue trends",
"Summarize Q3 revenue trends"
],
"maxConcurrency": 5
}'
A/B testing
Compare two agents in production with deterministic user-to-variant assignment. Default traffic split is 50/50.
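Deterministic assignment means the same user always lands in the same variant, with the split roughly matching trafficSplit across users. The docs don't specify the hash function, so this is an illustrative sketch of the general technique, not Open Astra's actual algorithm.

```python
import hashlib

def assign_variant(test_id, user_id, traffic_split=0.5):
    """Hash (test, user) to a uniform value in [0, 1) and bucket it:
    below traffic_split -> variant A, otherwise variant B."""
    digest = hashlib.sha256(f"{test_id}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return "A" if bucket < traffic_split else "B"
```

Salting the hash with the test ID ensures a user's bucket in one A/B test is independent of their bucket in another.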
# Create an A/B test
curl -X POST http://localhost:3000/ab-tests \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"name": "Concise vs. Detailed responses",
"agentAId": "concise-agent",
"agentBId": "detailed-agent",
"trafficSplit": 0.5,
"metric": "user_rating"
}'
# Get variant for a user (deterministic hash)
curl "http://localhost:3000/ab-tests/ab_abc123/variant?uid=uid_alice" \
-H "Authorization: Bearer ${JWT_TOKEN}"
# Record result
curl -X POST http://localhost:3000/ab-tests/ab_abc123/results \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{ "agentId": "concise-agent", "rating": 4 }'Parameter sweeps
Test an agent across a grid of temperature and max token combinations. Runs asynchronously in the background. Default grid: temperatures [0.1, 0.5, 0.7, 1.0] × maxTokens [256, 512, 1024].
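The grid expands to the cross product of the two lists, so the default grid produces 4 × 3 = 12 runs. A minimal sketch of that expansion (the server-side run loop itself is not shown):

```python
from itertools import product

def expand_grid(temperatures, max_tokens_list):
    """Expand a parameter grid into one config dict per combination."""
    return [
        {"temperature": t, "maxTokens": m}
        for t, m in product(temperatures, max_tokens_list)
    ]
```

Grids grow multiplicatively, which is why the sweep runs asynchronously: each extra temperature multiplies the run count by the size of maxTokensList.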
# Run a parameter sweep (async)
curl -X POST http://localhost:3000/param-sweeps \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"agentId": "code-agent",
"testPrompt": "Write a binary search in Rust",
"parameterGrid": {
"temperatures": [0.1, 0.5, 0.7, 1.0],
"maxTokensList": [256, 512, 1024]
}
}'
# Response (202 Accepted — runs in background)
{ "id": "sweep_abc123", "status": "running" }
# Check results
curl http://localhost:3000/param-sweeps/sweep_abc123 \
-H "Authorization: Bearer ${JWT_TOKEN}"
Tournaments
Pit 2–20 agents against each other on 1–50 test prompts. The tournament evaluates all agents and determines a winner.
# Create and run a tournament
curl -X POST http://localhost:3000/tournaments \
-H "Authorization: Bearer ${JWT_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"name": "Best code reviewer",
"agentIds": ["review-a", "review-b", "review-c"],
"testPrompts": ["Review this auth middleware...", "Review this caching layer..."],
"metric": "response_quality"
}'
# Run the tournament
curl -X POST http://localhost:3000/tournaments/t_abc123/run \
-H "Authorization: Bearer ${JWT_TOKEN}"
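How a winner might be picked once every agent has been scored on every prompt: average each agent's per-prompt metric and take the highest mean. The server's actual evaluation logic is not documented, so this is an illustrative aggregation, with the 2–20 agent bound taken from the text above.

```python
from statistics import mean

def pick_winner(scores):
    """scores maps agentId -> list of per-prompt metric values.
    Returns (winning agentId, its mean score)."""
    if not 2 <= len(scores) <= 20:
        raise ValueError("tournaments take 2-20 agents")
    averages = {agent: mean(vals) for agent, vals in scores.items()}
    winner = max(averages, key=averages.get)
    return winner, averages[winner]
```

With per-prompt scores in hand, the same structure could support other aggregations (median, win-rate per prompt) without changing the API shape.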