# Async Dispatch vs. Batching
Open Astra offers two complementary mechanisms for cutting the time an agent spends blocked on tools: batching runs multiple independent tool calls in parallel within the same turn, and async dispatch offloads long-running tools to a background job so the agent can respond immediately. The two solve different problems and can be used together.
## Batching: parallelism within a turn
When the agent decides to call multiple tools that are independent of each other, batching fires them concurrently. The agent waits for all results before continuing inference. This eliminates sequential latency without any architectural complexity.
Turn timeline (no batching, no async):

```
Agent ──► file_read(a) ──── 80ms ──► file_read(b) ──── 80ms ──► inference ──► response
          [blocks]                   [blocks]
```

Total: 80 + 80 + inference ≈ 460ms

Turn timeline (batching enabled):

```
Agent ──► file_read(a) ┐
          file_read(b) ┘──── 80ms (parallel) ──► inference ──► response
          [both fire at once]
```

Total: 80 + inference ≈ 380ms (-17% vs no batching)

Batching works best when:
- Tools are fast (under 500ms each)
- Tools have no dependency on each other's output
- The agent needs all results before it can respond
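The parallel-fire step above can be sketched with `Promise.all`. This is a minimal illustration, not the Open Astra runtime: `fileRead` is a hypothetical stand-in for a `file_read` tool that simulates an 80ms read.

```typescript
// Sketch of batching: fire independent tool calls concurrently and wait
// for all results before continuing inference.
function fileRead(path: string): Promise<string> {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`contents of ${path}`), 80) // simulated 80ms read
  );
}

// All calls start at once, so the batch waits for the slowest call
// (~80ms), not the sum (~80ms × paths.length).
function runBatch(paths: string[]): Promise<string[]> {
  return Promise.all(paths.map((p) => fileRead(p)));
}

async function main(): Promise<void> {
  const start = Date.now();
  const results = await runBatch(["a", "b"]);
  console.log(results.length, "results in", Date.now() - start, "ms");
}
main();
```

Note that `Promise.all` rejects as soon as any call in the batch fails; a runtime that wants partial results on failure would use `Promise.allSettled` instead.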
## Async dispatch: background jobs across turns
Async dispatch separates starting a job from using its results. The agent dispatches the tool call, receives a job ID immediately, responds to the user, and polls for results on a future turn. This keeps turn latency low for tools that take seconds or minutes.
Turn timeline (async dispatch):

```
Turn 1: Agent ──► web_crawl.dispatch() ──► "job_id: abc" ──► inference ──► response
                  [non-blocking, 1ms]                                      "I've started the crawl,
                                                                           I'll report back."

───── background ─────────────────────────────────────────────────────────────
        web_crawl runs for 8 seconds in background

Turn 2: Agent ──► web_crawl.poll(job_id) ──► results ──► inference ──► response
                  [ready, 2ms]                           "Here's what I found…"
```

Async works best when:
- The tool takes more than ~2 seconds
- The user doesn't need to wait for the result before getting a response
- Results can be used in a follow-up turn or surfaced via webhook
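The dispatch/poll split can be sketched as a tiny in-memory job store. The `dispatch`, `poll`, and `Job` names here are illustrative assumptions, not the Open Astra SDK:

```typescript
// Sketch of async dispatch: dispatch() returns a job ID immediately,
// the work runs in the background, and poll() on a later turn returns
// either a running marker or the finished result.
type Job = { status: "running" | "done"; result?: string };
const jobs = new Map<string, Job>();
let nextId = 0;

function dispatch(work: () => Promise<string>): string {
  const id = `job_${nextId++}`;
  jobs.set(id, { status: "running" });
  // Kick off the work without awaiting it — the turn is not blocked.
  work().then((result) => jobs.set(id, { status: "done", result }));
  return id;
}

function poll(id: string): Job | undefined {
  return jobs.get(id);
}

// Turn 1: start the crawl (shortened to 50ms here) and respond at once.
const id = dispatch(
  () => new Promise((r) => setTimeout(() => r("crawl results"), 50))
);
console.log(poll(id)?.status); // "running" — respond to the user now

// Turn 2 (later): the job has finished; use the result.
setTimeout(() => console.log(poll(id)), 100);
```

A production job store would also need a `"failed"` status and an expiry (the `timeoutMs` shown in the config below), but the two-phase shape is the same.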
## Decision matrix
| Criterion | Use batching | Use async | Use neither |
|---|---|---|---|
| Execution time | Fast (<500ms) | Slow (>2s) | Any — if sequential |
| Dependencies between calls | None | None | Output feeds next call |
| User needs immediate result | Yes | No | Yes |
| Multiple calls in one turn | Yes | No | No |
| Tool is idempotent / retriable | Recommended | Required | Optional |
| Side effects (writes, sends) | Safe if idempotent | Avoid | OK |
## Performance profiles
| Configuration | Turn latency | Throughput | UX |
|---|---|---|---|
| No batching, no async | High (sum of all tools) | Low | User waits for everything |
| Batching only | Medium (max of batch) | Medium | User waits for longest tool |
| Async only | Low (dispatch is instant) | High | User gets immediate response, waits for follow-up |
| Batching + async | Low | High | Best for mixed fast/slow tool workloads |
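The latency column above reduces to a simple model: a sequential turn pays the sum of tool times, a batched turn pays the max, and an async turn pays only the near-zero dispatch cost. A sketch, with illustrative numbers:

```typescript
// Latency model behind the table above (times in ms).
const toolTimes = [80, 80, 120]; // three independent tool calls

// No batching, no async: tools run back to back.
const sequential = toolTimes.reduce((a, b) => a + b, 0); // sum = 280

// Batching: all fire at once; the turn waits for the slowest.
const batched = Math.max(...toolTimes); // max = 120

// Async: each dispatch returns in ~1ms; the work leaves the turn.
const dispatchCost = 1;
const asyncTurn = dispatchCost * toolTimes.length; // 3

console.log({ sequential, batched, asyncTurn });
```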
## Configuring batching
```yaml
tools:
  batching:
    enabled: true
    maxBatchSize: 8
    maxWaitMs: 50
```

See Batching for the full config reference and tool-level overrides.
## Configuring async dispatch
```yaml
# Mark a tool as async in astra.yml
tools:
  - id: web_crawl
    async: true
    pollIntervalMs: 2000   # agent polls every 2s on follow-up turns
    timeoutMs: 120000      # job expires after 2 minutes if not polled
```

See Async Dispatch for job lifecycle, polling, and webhook delivery.
## Choosing per tool
```js
// Pseudocode: how to decide at agent-config level
const toolConfig = {
  // Use batching: multiple independent reads in the same turn
  "file_read":   { batchable: true },
  "db_query":    { batchable: true },
  "memory_read": { batchable: true },

  // Use async dispatch: long-running, non-blocking work
  "web_crawl":  { async: true, timeoutMs: 60000 },
  "code_index": { async: true, timeoutMs: 300000 },
  "report_gen": { async: true, timeoutMs: 120000 },

  // Use neither: fast, sequential, stateful
  "shell_exec": { batchable: false, async: false },
  "file_write": { batchable: false, async: false },
};
```

## Using both together
A single turn can use both mechanisms simultaneously. For example, an agent doing research might:
- Batch three fast `memory_read` calls (parallel, results used immediately).
- Dispatch one async `web_crawl` job for deep research (non-blocking).
- Respond to the user with the memory results plus a note that the crawl is running.
- On the next turn, poll the crawl job and synthesize the full result.
This pattern — batch fast tools, async slow tools — is the highest-performance configuration for research and data-heavy agents.
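Sketched end to end, one such research turn might look like the following. All tool functions and job IDs here are illustrative stand-ins, not the Open Astra SDK:

```typescript
// Combined pattern in one turn: batch the fast memory reads while a
// slow crawl runs in the background.
const memoryRead = (key: string): Promise<string> =>
  new Promise((r) => setTimeout(() => r(`memory:${key}`), 20)); // fast tool

let crawlResult: string | undefined;
function dispatchCrawl(): string {
  // Non-blocking: kick off the slow work, return a job ID at once.
  new Promise<string>((r) => setTimeout(() => r("crawl data"), 200)).then(
    (res) => { crawlResult = res; }
  );
  return "job_abc";
}

async function turnOne(): Promise<string> {
  const jobId = dispatchCrawl(); // async: slow tool leaves the turn
  const memories = await Promise.all(
    ["topic", "history", "prefs"].map(memoryRead) // batch: fast tools in parallel
  );
  return `Found ${memories.length} memories; crawl ${jobId} is running.`;
}

turnOne().then(console.log);
// Turn latency ≈ 20ms (the batched reads), not 200ms (the crawl).

setTimeout(() => {
  // Turn 2: the background crawl has finished; synthesize the full answer.
  console.log(`Crawl result: ${crawlResult}`);
}, 250);
```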