# Async Dispatch vs. Batching
Open Astra offers two complementary mechanisms for cutting the time an agent spends blocked on tools: batching runs multiple independent tool calls in parallel within the same turn, and async dispatch offloads long-running tools to a background job so the agent can respond immediately. The two solve different problems and can be used together.
## Batching: parallelism within a turn
When the agent decides to call multiple tools that are independent of each other, batching fires them concurrently. The agent waits for all results before continuing inference. This eliminates sequential latency without any architectural complexity.
Turn timeline (no batching, no async):

```
Agent ──► file_read(a) ──── 80ms ──► file_read(b) ──── 80ms ──► inference ──► response
          [blocks]                   [blocks]
```

Total: 80 + 80 + inference ≈ 460ms

Turn timeline (batching enabled):

```
Agent ──► file_read(a) ┐
          file_read(b) ┘──── 80ms (parallel) ──► inference ──► response
          [both fire at once]
```

Total: 80 + inference ≈ 380ms (-17% vs no batching)

Batching works best when:
- Tools are fast (under 500ms each)
- Tools have no dependency on each other's output
- The agent needs all results before it can respond
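The parallel-fire step above can be sketched with `Promise.all`. This is a minimal illustration, not the Open Astra runtime: `fileRead` is a hypothetical stand-in for a `file_read` tool that simulates an 80ms read.

```typescript
// Sketch of batching: fire independent tool calls concurrently and wait
// for all results before continuing inference.
function fileRead(path: string): Promise<string> {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`contents of ${path}`), 80) // simulated 80ms read
  );
}

// All calls start at once, so the batch waits for the slowest call
// (~80ms), not the sum (~80ms × paths.length).
function runBatch(paths: string[]): Promise<string[]> {
  return Promise.all(paths.map((p) => fileRead(p)));
}

async function main(): Promise<void> {
  const start = Date.now();
  const results = await runBatch(["a", "b"]);
  console.log(results.length, "results in", Date.now() - start, "ms");
}
main();
```

Note that `Promise.all` rejects as soon as any call in the batch fails; a runtime that wants partial results on failure would use `Promise.allSettled` instead.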
## Async dispatch: background jobs across turns
Async dispatch separates starting a job from using its results. The agent dispatches the tool call, receives a job ID immediately, responds to the user, and polls for results on a future turn. This keeps turn latency low for tools that take seconds or minutes.
Turn timeline (async dispatch):

```
Turn 1: Agent ──► web_crawl.dispatch() ──► "job_id: abc" ──► inference ──► response
                  [non-blocking, 1ms]                                      "I've started the crawl,
                                                                           I'll report back."

───── background ─────────────────────────────────────────────────────────────
        web_crawl runs for 8 seconds in background

Turn 2: Agent ──► web_crawl.poll(job_id) ──► results ──► inference ──► response
                  [ready, 2ms]                           "Here's what I found…"
```

Async works best when:
- The tool takes more than ~2 seconds
- The user doesn't need to wait for the result before getting a response
- Results can be used in a follow-up turn or surfaced via webhook
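The dispatch/poll split can be sketched as a tiny in-memory job store. The `dispatch`, `poll`, and `Job` names here are illustrative assumptions, not the Open Astra SDK:

```typescript
// Sketch of async dispatch: dispatch() returns a job ID immediately,
// the work runs in the background, and poll() on a later turn returns
// either a running marker or the finished result.
type Job = { status: "running" | "done"; result?: string };
const jobs = new Map<string, Job>();
let nextId = 0;

function dispatch(work: () => Promise<string>): string {
  const id = `job_${nextId++}`;
  jobs.set(id, { status: "running" });
  // Kick off the work without awaiting it — the turn is not blocked.
  work().then((result) => jobs.set(id, { status: "done", result }));
  return id;
}

function poll(id: string): Job | undefined {
  return jobs.get(id);
}

// Turn 1: start the crawl (shortened to 50ms here) and respond at once.
const id = dispatch(
  () => new Promise((r) => setTimeout(() => r("crawl results"), 50))
);
console.log(poll(id)?.status); // "running" — respond to the user now

// Turn 2 (later): the job has finished; use the result.
setTimeout(() => console.log(poll(id)), 100);
```

A production job store would also need a `"failed"` status and an expiry (the `timeoutMs` shown in the config below), but the two-phase shape is the same.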
## Decision matrix
| Criterion | Use batching | Use async | Use neither |
|---|---|---|---|
| Execution time | Fast (<500ms) | Slow (>2s) | Any — if sequential |
| Dependencies between calls | None | None | Output feeds next call |
| User needs immediate result | Yes | No | Yes |
| Multiple calls in one turn | Yes | No | No |
| Tool is idempotent / retriable | Recommended | Required | Optional |
| Side effects (writes, sends) | Safe if idempotent | Avoid | OK |
## Performance profiles
| Configuration | Turn latency | Throughput | UX |
|---|---|---|---|
| No batching, no async | High (sum of all tools) | Low | User waits for everything |
| Batching only | Medium (max of batch) | Medium | User waits for longest tool |
| Async only | Low (dispatch is instant) | High | User gets immediate response, waits for follow-up |
| Batching + async | Low | High | Best for mixed fast/slow tool workloads |
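The latency column above reduces to a simple model: a sequential turn pays the sum of tool times, a batched turn pays the max, and an async turn pays only the near-zero dispatch cost. A sketch, with illustrative numbers:

```typescript
// Latency model behind the table above (times in ms).
const toolTimes = [80, 80, 120]; // three independent tool calls

// No batching, no async: tools run back to back.
const sequential = toolTimes.reduce((a, b) => a + b, 0); // sum = 280

// Batching: all fire at once; the turn waits for the slowest.
const batched = Math.max(...toolTimes); // max = 120

// Async: each dispatch returns in ~1ms; the work leaves the turn.
const dispatchCost = 1;
const asyncTurn = dispatchCost * toolTimes.length; // 3

console.log({ sequential, batched, asyncTurn });
```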
## Configuring batching
```yaml
tools:
  batching:
    enabled: true
    maxBatchSize: 8
    maxWaitMs: 50
```

See Batching for the full config reference and tool-level overrides.
## Configuring async dispatch
```yaml
# Mark a tool as async in astra.yml
tools:
  - id: web_crawl
    async: true
    pollIntervalMs: 2000   # agent polls every 2s on follow-up turns
    timeoutMs: 120000      # job expires after 2 minutes if not polled
```

See Async Dispatch for job lifecycle, polling, and webhook delivery.
## Choosing per tool
```js
// Pseudocode: how to decide at agent-config level
const toolConfig = {
  // Use batching: multiple independent reads in the same turn
  "file_read":   { batchable: true },
  "db_query":    { batchable: true },
  "memory_read": { batchable: true },

  // Use async dispatch: long-running, non-blocking work
  "web_crawl":  { async: true, timeoutMs: 60000 },
  "code_index": { async: true, timeoutMs: 300000 },
  "report_gen": { async: true, timeoutMs: 120000 },

  // Use neither: fast, sequential, stateful
  "shell_exec": { batchable: false, async: false },
  "file_write": { batchable: false, async: false },
};
```

## Using both together
A single turn can use both mechanisms simultaneously. For example, an agent doing research might:
- Batch three fast `memory_read` calls (parallel, results used immediately).
- Dispatch one async `web_crawl` job for deep research (non-blocking).
- Respond to the user with the memory results plus a note that the crawl is running.
- On the next turn, poll the crawl job and synthesize the full result.
This pattern — batch fast tools, async slow tools — is the highest-performance configuration for research and data-heavy agents.
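Sketched end to end, one such research turn might look like the following. All tool functions and job IDs here are illustrative stand-ins, not the Open Astra SDK:

```typescript
// Combined pattern in one turn: batch the fast memory reads while a
// slow crawl runs in the background.
const memoryRead = (key: string): Promise<string> =>
  new Promise((r) => setTimeout(() => r(`memory:${key}`), 20)); // fast tool

let crawlResult: string | undefined;
function dispatchCrawl(): string {
  // Non-blocking: kick off the slow work, return a job ID at once.
  new Promise<string>((r) => setTimeout(() => r("crawl data"), 200)).then(
    (res) => { crawlResult = res; }
  );
  return "job_abc";
}

async function turnOne(): Promise<string> {
  const jobId = dispatchCrawl(); // async: slow tool leaves the turn
  const memories = await Promise.all(
    ["topic", "history", "prefs"].map(memoryRead) // batch: fast tools in parallel
  );
  return `Found ${memories.length} memories; crawl ${jobId} is running.`;
}

turnOne().then(console.log);
// Turn latency ≈ 20ms (the batched reads), not 200ms (the crawl).

setTimeout(() => {
  // Turn 2: the background crawl has finished; synthesize the full answer.
  console.log(`Crawl result: ${crawlResult}`);
}, 250);
```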