Tools

Async Dispatch vs. Batching

Open Astra offers two complementary mechanisms for reducing the time an agent spends waiting on tools: batching runs multiple independent tool calls in parallel within the same turn, while async dispatch offloads long-running tools to a background job so the agent can respond immediately. They solve different problems and can be used together.

Batching: parallelism within a turn

When the agent decides to call multiple tools that are independent of each other, batching fires them concurrently. The agent waits for all results before continuing inference, so the batch costs the latency of the slowest call rather than the sum. This eliminates sequential latency without adding architectural complexity.

```text
Turn timeline (no batching, no async):

 Agent ──► file_read(a) ──────── 80ms ──► file_read(b) ──── 80ms ──► inference ──► response
           [blocks]                        [blocks]

 Total: 80 + 80 + inference ≈ 460ms
```

```text
Turn timeline (batching enabled):

 Agent ──► file_read(a) ┐
           file_read(b) ┘──── 80ms (parallel) ──► inference ──► response
           [both fire at once]

 Total: 80 + inference ≈ 380ms   (-17% vs no batching)
```

Batching works best when:

  • Tools are fast (under 500ms each)
  • Tools have no dependency on each other's output
  • The agent needs all results before it can respond
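The fire-everything-and-wait pattern can be sketched with `Promise.all`. Here `fileRead` and its 80ms latency are illustrative stand-ins, not Open Astra APIs:

```javascript
// A stand-in for a fast, independent tool call (~80ms of simulated I/O).
const fileRead = async (path) => {
  await new Promise((resolve) => setTimeout(resolve, 80));
  return `contents of ${path}`;
};

// Fire all calls at once; wall-clock cost is the slowest call, not the sum.
async function runBatch(calls) {
  return Promise.all(calls.map((call) => call()));
}

async function main() {
  const start = Date.now();
  const results = await runBatch([() => fileRead("a"), () => fileRead("b")]);
  console.log(results, `${Date.now() - start}ms`); // ~80ms, not ~160ms
}
main();
```

Because `Promise.all` rejects as soon as any call fails, a real batcher would likely use `Promise.allSettled` to surface partial results to the agent.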

Async dispatch: background jobs across turns

Async dispatch separates starting a job from using its results. The agent dispatches the tool call, receives a job ID immediately, responds to the user, and polls for results on a future turn. This keeps turn latency low for tools that take seconds or minutes.

```text
Turn timeline (async dispatch):

 Turn 1:  Agent ──► web_crawl.dispatch() ──► "job_id: abc" ──► inference ──► response
                     [non-blocking, 1ms]                                      "I've started the crawl,
                                                                               I'll report back."
          ───── background ─────────────────────────────────────────────────────────────
                    web_crawl runs for 8 seconds in background

 Turn 2:  Agent ──► web_crawl.poll(job_id) ──► results ──► inference ──► response
                     [ready, 2ms]                                          "Here's what I found…"
```

Async works best when:

  • The tool takes more than ~2 seconds
  • The user doesn't need to wait for the result before getting a response
  • Results can be used in a follow-up turn or surfaced via webhook
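The dispatch/poll split can be sketched with an in-memory job store. Function and field names here are illustrative, not the Open Astra API:

```javascript
// In-memory job store: dispatch() returns a job ID immediately,
// poll() reports the job's current state on a later turn.
const jobs = new Map();
let nextId = 0;

function dispatch(toolFn) {
  const id = `job_${++nextId}`;
  jobs.set(id, { status: "running", result: null });
  // Start the work but do not await it: the caller gets the id instantly.
  toolFn().then((result) => jobs.set(id, { status: "done", result }));
  return id;
}

function poll(id) {
  return jobs.get(id) ?? { status: "unknown", result: null };
}

// Turn 1: dispatch and respond right away.
const id = dispatch(async () => {
  await new Promise((resolve) => setTimeout(resolve, 50)); // stand-in for an 8s crawl
  return "crawl results";
});
console.log(poll(id).status); // "running" — dispatch returned instantly

// Turn 2 (later): the job has finished in the background.
setTimeout(() => console.log(poll(id)), 100);
```

A production job store would also persist jobs and expire them after `timeoutMs`, but the core idea is the same: starting the work and consuming its result are separate operations.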

Decision matrix

| Criterion | Use batching | Use async | Use neither |
| --- | --- | --- | --- |
| Execution time | Fast (<500ms) | Slow (>2s) | Any, if sequential |
| Dependencies between calls | None | None | Output feeds next call |
| User needs immediate result | Yes | No | Yes |
| Multiple calls in one turn | Yes | No | No |
| Tool is idempotent / retriable | Recommended | Required | Optional |
| Side effects (writes, sends) | Safe if idempotent | Avoid | OK |
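The matrix can be encoded as a small helper, useful when tool metadata is available at configuration time. This is an illustrative sketch of the decision logic, not an Open Astra API:

```javascript
// Encode the decision matrix: given a tool's properties, pick a mechanism.
function chooseMechanism({ durationMs, independent, userNeedsResultNow, idempotent }) {
  if (!independent) return "neither";            // output feeds the next call
  if (durationMs > 2000 && !userNeedsResultNow && idempotent) return "async";
  if (durationMs < 500 && userNeedsResultNow) return "batching";
  return "neither";
}

console.log(chooseMechanism({
  durationMs: 80, independent: true, userNeedsResultNow: true, idempotent: true,
})); // "batching"
console.log(chooseMechanism({
  durationMs: 8000, independent: true, userNeedsResultNow: false, idempotent: true,
})); // "async"
```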

Performance profiles

| Configuration | Turn latency | Throughput | UX |
| --- | --- | --- | --- |
| No batching, no async | High (sum of all tools) | Low | User waits for everything |
| Batching only | Medium (max of batch) | Medium | User waits for longest tool |
| Async only | Low (dispatch is instant) | High | User gets immediate response, waits for follow-up |
| Batching + async | Low | High | Best for mixed fast/slow tool workloads |

Configuring batching

```yaml
tools:
  batching:
    enabled: true
    maxBatchSize: 8
    maxWaitMs: 50
```

See Batching for the full config reference and tool-level overrides.
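The `maxBatchSize` / `maxWaitMs` pair controls when a batch fires: calls are collected until either the batch is full or the wait window closes. A minimal micro-batcher sketching these semantics (an illustration, not Open Astra's implementation):

```javascript
// Collect calls into a batch; flush when maxBatchSize is reached
// or maxWaitMs elapses, whichever comes first.
function makeBatcher(execBatch, { maxBatchSize, maxWaitMs }) {
  let pending = [];
  let timer = null;

  const flush = () => {
    clearTimeout(timer);
    timer = null;
    const batch = pending;
    pending = [];
    execBatch(batch.map((p) => p.call)).then((results) =>
      results.forEach((result, i) => batch[i].resolve(result))
    );
  };

  return (call) =>
    new Promise((resolve) => {
      pending.push({ call, resolve });
      if (pending.length >= maxBatchSize) flush();            // batch full: fire now
      else if (!timer) timer = setTimeout(flush, maxWaitMs);  // else wait up to maxWaitMs
    });
}
```

With `maxBatchSize: 8` and `maxWaitMs: 50` as in the config above, up to eight calls arriving within a 50ms window execute as one batch.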

Configuring async dispatch

```yaml
# Mark a tool as async in astra.yml
tools:
  - id: web_crawl
    async: true
    pollIntervalMs: 2000     # agent polls every 2s on follow-up turns
    timeoutMs: 120000        # job expires after 2 minutes if not polled
```

See Async Dispatch for job lifecycle, polling, and webhook delivery.
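On the consuming side, `pollIntervalMs` and `timeoutMs` suggest a loop like the following. This is a sketch; `pollJob` is a placeholder for the real poll call, not an Open Astra function:

```javascript
// Poll a job every pollIntervalMs until it completes or timeoutMs elapses.
async function waitForJob(pollJob, jobId, { pollIntervalMs, timeoutMs }) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const job = await pollJob(jobId);
    if (job.status === "done") return job.result;
    await new Promise((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  throw new Error(`job ${jobId} expired after ${timeoutMs}ms`);
}
```

In practice the agent does not sit in this loop within one turn; the same check runs once per follow-up turn, which is why webhook delivery is offered as an alternative for jobs that outlive the conversation.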

Choosing per tool

```javascript
// Pseudocode: how to decide at agent-config level
const toolConfig = {
  // Use batching: multiple independent reads in the same turn
  "file_read":    { batchable: true },
  "db_query":     { batchable: true },
  "memory_read":  { batchable: true },

  // Use async dispatch: long-running, non-blocking work
  "web_crawl":    { async: true, timeoutMs: 60000 },
  "code_index":   { async: true, timeoutMs: 300000 },
  "report_gen":   { async: true, timeoutMs: 120000 },

  // Use neither: fast, sequential, stateful
  "shell_exec":   { batchable: false, async: false },
  "file_write":   { batchable: false, async: false },
};
```

Using both together

A single turn can use both mechanisms simultaneously. For example, an agent doing research might:

  1. Batch three fast memory_read calls (parallel, results used immediately).
  2. Dispatch one async web_crawl job for deep research (non-blocking).
  3. Respond to the user with the memory results plus a note that the crawl is running.
  4. On the next turn, poll the crawl job and synthesize the full result.

This pattern — batch fast tools, async slow tools — is the highest-performance configuration for research and data-heavy agents.
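Steps 1–3 of such a turn can be sketched as follows; `memoryRead` and `dispatchCrawl` are illustrative stand-ins, not Open Astra built-ins:

```javascript
// One research turn: batch the fast reads, dispatch the slow crawl,
// respond immediately. Polling the jobId (step 4) happens on a later turn.
async function researchTurn(memoryRead, dispatchCrawl) {
  // 1. Batch three fast memory reads in parallel.
  const memories = await Promise.all([
    memoryRead("topic:a"),
    memoryRead("topic:b"),
    memoryRead("topic:c"),
  ]);
  // 2. Dispatch the slow crawl without awaiting it.
  const jobId = dispatchCrawl("https://example.com");
  // 3. Respond now with the fast results plus a note about the running job.
  return {
    response: `Found ${memories.length} notes; crawl ${jobId} running.`,
    jobId,
  };
}
```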