WebSocket

The gateway exposes a full-duplex WebSocket endpoint alongside the REST API, sharing the same http.Server. Clients connect once, authenticate via JWT, and receive streamed agent responses, presence updates, push notifications, and heartbeats over a single persistent connection. The WebSocket server is implemented in gateway/ws.ts (connection management) and gateway/ws-handler.ts (message handling and agent streaming).

Connection

Connect to ws://host/ws with a JWT access token in the query string.

typescript

// Browser
const ws = new WebSocket('ws://localhost:8080/ws?token=<jwt>');

ws.onopen = () => console.log('connected');
ws.onmessage = (e) => {
  const frame = JSON.parse(e.data);
  console.log(frame.type, frame);
};
ws.onclose = (e) => {
  if (e.code === 4001) console.error('Missing token');
  if (e.code === 4003) console.error('Invalid or expired token');
};

Authentication

The token is validated on upgrade. If authentication fails, the socket is closed immediately with a WebSocket close code:

Close code	Meaning
`4001`	Missing token — no `token` query parameter was provided
`4003`	Invalid or expired token — JWT verification failed

Welcome frame

On successful authentication the server immediately sends a welcome frame:

json

// Server sends immediately after successful auth
{
  "type": "connected",
  "timestamp": "2026-02-27T12:00:00.000Z"
}

Multi-device support

A single user (uid) can have any number of concurrent WebSocket connections — one per device, browser tab, or CLI session. The gateway stores connections in a Map<uid, Set<AuthenticatedSocket>>. All outbound events (stream chunks, presence, push notifications) are fanned out to every socket in the user's set.

typescript

// Internal connection storage
// One uid can have N concurrent sockets (multi-device)
const connections = new Map<string, Set<AuthenticatedSocket>>();

// Each AuthenticatedSocket carries:
// - socket: WebSocket
// - uid: string
// - connectedAt: Date
// - inFlight: { abortController: AbortController, id: string } | null

Heartbeat

The server sends a ping frame every 30 seconds. Clients must respond with pong. Sockets that fail to respond before the next heartbeat cycle are terminated.

json

// Server → Client every 30 seconds
{ type: "ping" }

// Client must respond with
{ type: "pong" }

// If no pong received before next cycle, socket is terminated

Lifecycle events

The event bus emits events on each connection lifecycle change, which other subsystems (logging, metrics, autonomous agents) can subscribe to:

typescript

// Emitted on the typed event bus for each connection lifecycle event
bus.emit('ws.connected',    { uid, socketCount });
bus.emit('ws.disconnected', { uid, socketCount, code, reason });

Inbound message frames

All inbound frames are validated with Zod. There are exactly two valid frame types: message and ping.

type: "message"

Send a message to an agent. The id field is a client-generated request identifier that the server echoes back on every outbound frame related to this request.

json

// type: "message" — send a message to an agent
{
  "type": "message",
  "id": "req_abc123",               // string, required — client-generated request ID
  "agentId": "chad",                // string, default "chad"
  "message": "What is the weather?", // string, 1–10 000 chars
  "surface": "chat",                // "chat" | "prop" | "group", default "chat"
  "surfaceId": "main"               // string, default "main"
}

Field	Type	Required	Default	Constraints
`type`	`"message"`	Yes	—	Literal
`id`	`string`	Yes	—	Client-generated request ID
`agentId`	`string`	No	`"chad"`	Must match a registered agent
`message`	`string`	Yes	—	1 – 10 000 characters
`surface`	`string`	No	`"chat"`	`"chat"` \| `"prop"` \| `"group"`
`surfaceId`	`string`	No	`"main"`	Identifies the specific surface instance

type: "ping"

Client-initiated keep-alive. The server responds with { type: "pong" }.

json

// type: "ping" — keep-alive from client
{
  "type": "ping"
}

Zod validation schema

The server parses every inbound frame through a discriminated union. Invalid JSON, unknown types, or constraint violations all produce an error frame.

typescript

import { z } from 'zod';

const messageFrame = z.object({
  type:      z.literal('message'),
  id:        z.string(),
  agentId:   z.string().default('chad'),
  message:   z.string().min(1).max(10000),
  surface:   z.enum(['chat', 'prop', 'group']).default('chat'),
  surfaceId: z.string().default('main'),
});

const pingFrame = z.object({
  type: z.literal('ping'),
});

const inboundFrame = z.discriminatedUnion('type', [messageFrame, pingFrame]);

Validation errors

If the payload fails validation, the server responds with an error frame. The socket is not closed — the client can retry.

json

// Invalid JSON
{ "type": "error", "message": "Invalid JSON" }

// Unknown frame type
{ "type": "error", "message": "Unknown frame type: subscribe" }

// Zod validation failure
{ "type": "error", "message": "message: String must contain at least 1 character(s)" }

Outbound event types

The server pushes frames to clients for agent streaming, presence, system notifications, and more. Every frame has a type field.

Type	Direction	Description
`connected`	Server → Client	Welcome frame on successful auth
`session_resolved`	Server → Client	Session found or created for this request
`stream_chunk`	Server → Client	Incremental text delta during streaming
`tool_call`	Server → Client	Tool call started; sent again with `complete: true` when arguments are ready
`tool_result`	Server → Client	Tool finished executing — includes duration and optional error
`stream_done`	Server → Client	Final result — full content, session ID, usage stats, tools used
`error`	Server → Client	Error tied to a specific request
`presence`	Server → Client	Agent status change (starting, streaming, tool_calling, completed, error)
`pong`	Server → Client	Response to client-initiated `ping`
`line_alert`	Server → Client	Pushed by external channel adapters
`briefing`	Server → Client	Pushed by the cron scheduler (e.g. daily briefing)
`agent_response`	Server → Client	Pushed by autonomous agents running in the background

session_resolved

json

// Sent after session lookup / creation
{
  "type": "session_resolved",
  "id": "req_abc123",
  "sessionId": "ses_xyz",
  "isNew": true
}

stream_chunk

Sent for each token batch as the agent streams its response. Clients concatenate content values to build the full response progressively.

json

// Incremental text delta — one per token batch
{
  "type": "stream_chunk",
  "id": "req_abc123",
  "content": "The weather in"
}

tool_call

Sent when the model initiates a tool call. A second frame with complete: true is sent once the full arguments have been parsed from the stream.

json

// Tool call started
{
  "type": "tool_call",
  "id": "req_abc123",
  "index": 0,
  "toolCallId": "tc_1",
  "name": "web_search"
}

// Tool call complete (includes parsed arguments)
{
  "type": "tool_call",
  "id": "req_abc123",
  "index": 0,
  "toolCallId": "tc_1",
  "name": "web_search",
  "complete": true,
  "arguments": "{\"query\":\"weather today\"}"
}

tool_result

Sent after the tool finishes executing. The error field is null on success or contains the error string on failure.

json

// After tool execution finishes
{
  "type": "tool_result",
  "id": "req_abc123",
  "name": "web_search",
  "durationMs": 320,
  "error": null
}

stream_done

Marks the end of the streaming response. Contains the full assembled content, session ID, token usage, and a list of all tools invoked during this turn.

json

// Final frame — full response + metadata
{
  "type": "stream_done",
  "id": "req_abc123",
  "content": "The weather in San Francisco is 62°F and sunny.",
  "sessionId": "ses_xyz",
  "usage": {
    "promptTokens": 1240,
    "completionTokens": 38,
    "totalTokens": 1278
  },
  "toolsUsed": ["web_search"]
}

error

json

// Error tied to a specific request
{
  "type": "error",
  "id": "req_abc123",
  "message": "Rate limit exceeded"
}

Rate limiting

WebSocket messages are rate-limited separately from the HTTP API. The gateway uses an in-memory sliding window scoped to each uid.

typescript

// In-memory sliding window per uid
const WINDOW_MS   = 60_000;   // 60 seconds
const MAX_MESSAGES = 30;       // per window

// Timestamps older than WINDOW_MS are pruned every 60 seconds
// Exceeding the limit returns:
{
  "type": "error",
  "id": "req_abc123",
  "message": "Rate limit exceeded"
}

Parameter	Value
Window	60 seconds (sliding)
Max messages per window	30
Cleanup interval	Every 60 seconds — timestamps older than the window are pruned
Scope	Per `uid` (shared across all of the user's sockets)

Per-uid, not per-socket. Rate limits are tracked by user ID. A user with three open tabs shares a single 30-message window across all of them.

Concurrency

The gateway enforces a maximum of 1 in-flight agent request per socket. This prevents a single tab from flooding the agent pipeline. Because the limit is per socket rather than per uid, a user with multiple devices can still have parallel requests — one per connection.

typescript

// Max 1 in-flight agent request PER SOCKET (not per uid)
// A user with 3 devices can have 3 parallel requests

// Second message while one is in progress:
{
  "type": "error",
  "id": "req_def456",
  "message": "A request is already in progress on this connection"
}

// Each in-flight request carries an AbortController
// Closing the socket calls abortController.abort()
// → the inference provider cancels the request upstream

Socket close aborts in-flight requests. Each in-flight request holds an AbortController. When the socket closes (browser tab closed, network drop, etc.), the controller's abort() is called immediately, canceling the inference request upstream. This prevents orphaned agent runs from consuming resources.

Back-pressure

When the agent streams tokens faster than the client can consume them, the TCP send buffer fills up. The gateway monitors socket.bufferedAmount and pauses streaming if it exceeds 64 KB, polling every 50 ms until the buffer drains below threshold.

typescript

// Before sending each chunk, the gateway checks:
if (socket.bufferedAmount > 65_536) {   // 64 KB threshold
  // Pause streaming — poll every 50ms
  while (socket.bufferedAmount > 65_536) {
    await sleep(50);
  }
}
// Then resume sending chunks

Why 64 KB? This threshold is a balance between latency and throughput. Smaller values reduce memory pressure but increase the number of pause/resume cycles. 64 KB accommodates typical bursts from fast inference providers without stalling on healthy connections.

Presence system

The gateway tracks real-time agent status per user per session. Every state transition pushes a presence frame to all of the user's WebSocket connections, allowing every device to display live agent activity.

State machine

text

starting → streaming → tool_calling → completed
                                    ↘ error

Status	Meaning
`starting`	Request received, session resolved, about to call inference
`streaming`	Inference provider is streaming tokens
`tool_calling`	Executing a tool call returned by the model
`completed`	Turn finished successfully
`error`	Turn failed with an error

Presence frame

json

// Pushed to ALL of the user's WS connections on every state change
{
  "type": "presence",
  "agentId": "chad",
  "sessionId": "ses_xyz",
  "status": "streaming",
  "currentTool": null,
  "timestamp": "2026-02-27T12:00:01.000Z"
}

Storage

Presence state is stored in an in-memory Map<uid, Map<sessionId, AgentPresenceState>>. This is intentionally not persisted — presence is ephemeral and reconstructed from live activity.

typescript

// In-memory presence storage
const presence = new Map<string, Map<string, AgentPresenceState>>();
//                       uid           sessionId

interface AgentPresenceState {
  agentId: string;
  sessionId: string;
  status: 'starting' | 'streaming' | 'tool_calling' | 'completed' | 'error';
  currentTool: string | null;
  updatedAt: Date;
}

Cleanup

A cleanup job runs every 60 seconds and removes entries whose updatedAt is older than 5 minutes. This prevents stale presence from lingering if an agent turn completes without a final state transition (e.g. process crash).

Push API

Other subsystems (channels, cron scheduler, autonomous agents) use the push API to send frames to connected clients without going through the agent loop.

typescript

// Push a frame to all of a user's connected devices
const sentCount = pushToUser(uid, {
  type: 'briefing',
  content: 'Good morning — here is your daily briefing.'
});
// sentCount = number of sockets that received the frame

// Broadcast to every connected user
const totalSent = broadcast({
  type: 'line_alert',
  message: 'Scheduled maintenance in 15 minutes'
});
// totalSent = total frames sent across all users

// Check how many unique users are online
const online = getConnectedUserCount();

Function	Signature	Returns
`pushToUser`	`(uid: string, event: object)`	Number of sockets that received the frame
`broadcast`	`(event: object)`	Total frames sent across all connected users
`getConnectedUserCount`	`()`	Number of unique `uid`s with at least one live socket

Push types in the wild. The line_alert type is pushed when an external channel adapter receives a message that should be surfaced in the web UI. The briefing type is pushed by the cron scheduler for daily briefings. The agent_response type is pushed when an autonomous agent completes a background task.

Topic	What it covers
Gateway Overview	Full gateway architecture — middleware stack, route groups, SSE transport, health endpoints
Event System	All typed events emitted on the event bus, including `ws.connected` and `ws.disconnected`
Agent Loop	The 7-step execution cycle that runs when a WebSocket message reaches the agent
API Reference	Full REST API documentation — the HTTP counterpart to the WebSocket interface

WebSocket

Connection

Authentication

Welcome frame

Multi-device support

Heartbeat

Lifecycle events

Inbound message frames

type: "message"

type: "ping"

Zod validation schema

Validation errors

Outbound event types

session_resolved

stream_chunk

tool_call

tool_result

stream_done

error

Rate limiting

Concurrency

Back-pressure

Presence system

State machine

Presence frame

Storage

Cleanup

Push API

Related pages