WebSocket
The gateway exposes a full-duplex WebSocket endpoint alongside the REST API, sharing the same http.Server. Clients connect once, authenticate via JWT, and receive streamed agent responses, presence updates, push notifications, and heartbeats over a single persistent connection. The WebSocket server is implemented in gateway/ws.ts (connection management) and gateway/ws-handler.ts (message handling and agent streaming).
Connection
Connect to ws://host/ws with a JWT access token in the query string.
// Browser
const ws = new WebSocket('ws://localhost:8080/ws?token=<jwt>');
ws.onopen = () => console.log('connected');
ws.onmessage = (e) => {
const frame = JSON.parse(e.data);
console.log(frame.type, frame);
};
ws.onclose = (e) => {
if (e.code === 4001) console.error('Missing token');
if (e.code === 4003) console.error('Invalid or expired token');
};
Authentication
The token is validated on upgrade. If authentication fails, the socket is closed immediately with a WebSocket close code:
| Close code | Meaning |
|---|---|
| 4001 | Missing token: no token query parameter was provided |
| 4003 | Invalid or expired token: JWT verification failed |
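A client will usually want to branch on these codes. The following is a minimal sketch of how that might look; `buildWsUrl` and `describeClose` are hypothetical helper names, and treating every other close code as a non-auth disconnect is an assumption, not documented gateway behavior.

```typescript
// Hypothetical client-side helpers; the names are illustrative only.

/** Builds the connection URL with the JWT in the `token` query parameter. */
function buildWsUrl(host: string, token: string): string {
  return `ws://${host}/ws?token=${encodeURIComponent(token)}`;
}

interface CloseInfo {
  reason: string;
  authFailure: boolean;
}

/** Maps the two documented auth close codes to a readable reason. */
function describeClose(code: number): CloseInfo {
  switch (code) {
    case 4001:
      return { reason: 'Missing token', authFailure: true };
    case 4003:
      return { reason: 'Invalid or expired token', authFailure: true };
    default:
      // Assumption: anything else is not an authentication problem.
      return { reason: `Closed with code ${code}`, authFailure: false };
  }
}
```

A caller could use `authFailure` to decide whether to obtain a new token before reconnecting instead of retrying with the same one.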
Welcome frame
On successful authentication the server immediately sends a welcome frame:
// Server sends immediately after successful auth
{
"type": "connected",
"timestamp": "2026-02-27T12:00:00.000Z"
}
Multi-device support
A single user (uid) can have any number of concurrent WebSocket connections — one per device, browser tab, or CLI session. The gateway stores connections in a Map<uid, Set<AuthenticatedSocket>>. All outbound events (stream chunks, presence, push notifications) are fanned out to every socket in the user's set.
// Internal connection storage
// One uid can have N concurrent sockets (multi-device)
const connections = new Map<string, Set<AuthenticatedSocket>>();
// Each AuthenticatedSocket carries:
// - socket: WebSocket
// - uid: string
// - connectedAt: Date
// - inFlight: { abortController: AbortController, id: string } | null
Heartbeat
The server sends a ping frame every 30 seconds. Clients must respond with pong. Sockets that fail to respond before the next heartbeat cycle are terminated.
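One way to implement this rule is a periodic sweep that terminates sockets still waiting on the previous ping and pings the rest. The sketch below assumes a hypothetical `awaitingPong` flag and a minimal socket shape; the gateway's actual bookkeeping may differ.

```typescript
// Minimal socket shape for this sketch; the real AuthenticatedSocket is richer.
interface HeartbeatSocket {
  awaitingPong: boolean; // hypothetical flag, not taken from the gateway source
  send(data: string): void;
  terminate(): void;
}

const HEARTBEAT_MS = 30_000;

/** One heartbeat cycle. Returns how many sockets were terminated. */
function heartbeatSweep(sockets: Set<HeartbeatSocket>): number {
  let terminated = 0;
  // Iterate over a copy so that deleting from the set is obviously safe.
  for (const socket of Array.from(sockets)) {
    if (socket.awaitingPong) {
      // No pong arrived since the previous ping.
      socket.terminate();
      sockets.delete(socket);
      terminated++;
      continue;
    }
    socket.awaitingPong = true;
    socket.send(JSON.stringify({ type: 'ping' }));
  }
  return terminated;
}

/** Call when a { type: "pong" } frame arrives from the client. */
function onPong(socket: HeartbeatSocket): void {
  socket.awaitingPong = false;
}

// Usage (not executed here):
// setInterval(() => heartbeatSweep(allSockets), HEARTBEAT_MS);
```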
// Server → Client every 30 seconds
{ type: "ping" }
// Client must respond with
{ type: "pong" }
// If no pong received before next cycle, socket is terminated
Lifecycle events
The event bus emits events on each connection lifecycle change, which other subsystems (logging, metrics, autonomous agents) can subscribe to:
// Emitted on the typed event bus for each connection lifecycle event
bus.emit('ws.connected', { uid, socketCount });
bus.emit('ws.disconnected', { uid, socketCount, code, reason });
Inbound message frames
All inbound frames are validated with Zod. There are exactly two valid frame types: message and ping.
type: "message"
Send a message to an agent. The id field is a client-generated request identifier that the server echoes back on every outbound frame related to this request.
// type: "message" — send a message to an agent
{
"type": "message",
"id": "req_abc123", // string, required — client-generated request ID
"agentId": "chad", // string, default "chad"
"message": "What is the weather?", // string, 1–10 000 chars
"surface": "chat", // "chat" | "prop" | "group", default "chat"
"surfaceId": "main" // string, default "main"
}
| Field | Type | Required | Default | Constraints |
|---|---|---|---|---|
| type | "message" | Yes | - | Literal |
| id | string | Yes | - | Client-generated request ID |
| agentId | string | No | "chad" | Must match a registered agent |
| message | string | Yes | - | 1 to 10 000 characters |
| surface | string | No | "chat" | One of "chat", "prop", "group" |
| surfaceId | string | No | "main" | Identifies the specific surface instance |
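On the client, the fields and defaults listed above can be captured in a small builder. This is a sketch only: `buildMessageFrame` is a hypothetical helper, and checking the length locally is a convenience because the server validates every frame regardless.

```typescript
type Surface = 'chat' | 'prop' | 'group';

interface MessageFrame {
  type: 'message';
  id: string;
  agentId: string;
  message: string;
  surface: Surface;
  surfaceId: string;
}

// Hypothetical client helper: makes the documented defaults explicit and
// rejects messages that would fail the 1 to 10 000 character constraint.
function buildMessageFrame(
  id: string,
  message: string,
  options: { agentId?: string; surface?: Surface; surfaceId?: string } = {},
): MessageFrame {
  if (message.length < 1 || message.length > 10_000) {
    throw new RangeError('message must be 1 to 10 000 characters long');
  }
  return {
    type: 'message',
    id,
    agentId: options.agentId ?? 'chad',
    message,
    surface: options.surface ?? 'chat',
    surfaceId: options.surfaceId ?? 'main',
  };
}

// Usage (not executed here):
// ws.send(JSON.stringify(buildMessageFrame('req_abc123', 'What is the weather?')));
```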
type: "ping"
Client-initiated keep-alive. The server responds with { type: "pong" }.
// type: "ping" — keep-alive from client
{
"type": "ping"
}
Zod validation schema
The server parses every inbound frame through a discriminated union. Invalid JSON, unknown types, or constraint violations all produce an error frame.
import { z } from 'zod';
const messageFrame = z.object({
type: z.literal('message'),
id: z.string(),
agentId: z.string().default('chad'),
message: z.string().min(1).max(10000),
surface: z.enum(['chat', 'prop', 'group']).default('chat'),
surfaceId: z.string().default('main'),
});
const pingFrame = z.object({
type: z.literal('ping'),
});
const inboundFrame = z.discriminatedUnion('type', [messageFrame, pingFrame]);
Validation errors
If the payload fails validation, the server responds with an error frame. The socket is not closed — the client can retry.
// Invalid JSON
{ "type": "error", "message": "Invalid JSON" }
// Unknown frame type
{ "type": "error", "message": "Unknown frame type: subscribe" }
// Zod validation failure
{ "type": "error", "message": "message: String must contain at least 1 character(s)" }
Outbound event types
The server pushes frames to clients for agent streaming, presence, system notifications, and more. Every frame has a type field.
| Type | Direction | Description |
|---|---|---|
| connected | Server → Client | Welcome frame on successful auth |
| session_resolved | Server → Client | Session found or created for this request |
| stream_chunk | Server → Client | Incremental text delta during streaming |
| tool_call | Server → Client | Tool call started; sent again with complete: true when arguments are ready |
| tool_result | Server → Client | Tool finished executing; includes duration and optional error |
| stream_done | Server → Client | Final result: full content, session ID, usage stats, tools used |
| error | Server → Client | Error tied to a specific request |
| presence | Server → Client | Agent status change (starting, streaming, tool_calling, completed, error) |
| pong | Server → Client | Response to client-initiated ping |
| line_alert | Server → Client | Pushed by external channel adapters |
| briefing | Server → Client | Pushed by the cron scheduler (e.g. daily briefing) |
| agent_response | Server → Client | Pushed by autonomous agents running in the background |
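Because every frame carries a `type` field, a client can model the frames as a discriminated union and switch on it. The sketch below covers only a subset of the frame types and shows one way to assemble streamed text per request `id`; `StreamAccumulator` is a hypothetical client-side class, not part of the gateway.

```typescript
// Subset of the outbound frames, typed as a discriminated union on `type`.
type OutboundFrame =
  | { type: 'stream_chunk'; id: string; content: string }
  | { type: 'stream_done'; id: string; content: string; sessionId: string }
  | { type: 'error'; id?: string; message: string } // validation errors carry no id
  | { type: 'pong' };

// Hypothetical accumulator that builds the visible text for each request.
class StreamAccumulator {
  private partial = new Map<string, string>();

  /** Returns the text to display after this frame, or null if there is none. */
  handle(frame: OutboundFrame): string | null {
    switch (frame.type) {
      case 'stream_chunk': {
        const text = (this.partial.get(frame.id) ?? '') + frame.content;
        this.partial.set(frame.id, text);
        return text;
      }
      case 'stream_done':
        // The final frame carries the full content, so prefer it over the deltas.
        this.partial.delete(frame.id);
        return frame.content;
      case 'error':
        if (frame.id !== undefined) this.partial.delete(frame.id);
        return null;
      default:
        return null;
    }
  }
}
```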
session_resolved
// Sent after session lookup / creation
{
"type": "session_resolved",
"id": "req_abc123",
"sessionId": "ses_xyz",
"isNew": true
}
stream_chunk
Sent for each token batch as the agent streams its response. Clients concatenate content values to build the full response progressively.
// Incremental text delta — one per token batch
{
"type": "stream_chunk",
"id": "req_abc123",
"content": "The weather in"
}
tool_call
Sent when the model initiates a tool call. A second frame with complete: true is sent once the full arguments have been parsed from the stream.
// Tool call started
{
"type": "tool_call",
"id": "req_abc123",
"index": 0,
"toolCallId": "tc_1",
"name": "web_search"
}
// Tool call complete (includes parsed arguments)
{
"type": "tool_call",
"id": "req_abc123",
"index": 0,
"toolCallId": "tc_1",
"name": "web_search",
"complete": true,
"arguments": "{\"query\":\"weather today\"}"
}
tool_result
Sent after the tool finishes executing. The error field is null on success or contains the error string on failure.
// After tool execution finishes
{
"type": "tool_result",
"id": "req_abc123",
"name": "web_search",
"durationMs": 320,
"error": null
}
stream_done
Marks the end of the streaming response. Contains the full assembled content, session ID, token usage, and a list of all tools invoked during this turn.
// Final frame — full response + metadata
{
"type": "stream_done",
"id": "req_abc123",
"content": "The weather in San Francisco is 62°F and sunny.",
"sessionId": "ses_xyz",
"usage": {
"promptTokens": 1240,
"completionTokens": 38,
"totalTokens": 1278
},
"toolsUsed": ["web_search"]
}
error
// Error tied to a specific request
{
"type": "error",
"id": "req_abc123",
"message": "Rate limit exceeded"
}
Rate limiting
WebSocket messages are rate-limited separately from the HTTP API. The gateway uses an in-memory sliding window scoped to each uid.
// In-memory sliding window per uid
const WINDOW_MS = 60_000; // 60 seconds
const MAX_MESSAGES = 30; // per window
// Timestamps older than WINDOW_MS are pruned every 60 seconds
// Exceeding the limit returns:
{
"type": "error",
"id": "req_abc123",
"message": "Rate limit exceeded"
}
| Parameter | Value |
|---|---|
| Window | 60 seconds (sliding) |
| Max messages per window | 30 |
| Cleanup interval | Every 60 seconds — timestamps older than the window are pruned |
| Scope | Per uid (shared across all of the user's sockets) |
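The parameters above could be implemented roughly as follows. This is a sketch under stated assumptions: `allowMessage` and `pruneRateLimits` are hypothetical names, expired timestamps are also filtered at check time, and rejected messages are not counted toward the window. The gateway's actual limiter may differ on each of these points.

```typescript
const WINDOW_MS = 60_000; // 60-second sliding window
const MAX_MESSAGES = 30;  // accepted messages per window, per uid

// uid -> timestamps (ms) of accepted messages inside the current window
const hits = new Map<string, number[]>();

/** Returns true if the message is allowed. `now` is injectable for testing. */
function allowMessage(uid: string, now: number = Date.now()): boolean {
  const recent = (hits.get(uid) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_MESSAGES) {
    hits.set(uid, recent);
    return false; // caller replies with the "Rate limit exceeded" error frame
  }
  recent.push(now);
  hits.set(uid, recent);
  return true;
}

/** Periodic cleanup: drop expired timestamps and remove empty entries. */
function pruneRateLimits(now: number = Date.now()): void {
  for (const [uid, stamps] of hits) {
    const recent = stamps.filter((t) => now - t < WINDOW_MS);
    if (recent.length === 0) hits.delete(uid);
    else hits.set(uid, recent);
  }
}

// Usage (not executed here): setInterval(() => pruneRateLimits(), 60_000);
```

Keying the map by uid rather than by socket is what makes the limit shared across all of a user's devices.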
Concurrency
The gateway enforces a maximum of 1 in-flight agent request per socket. This prevents a single tab from flooding the agent pipeline. Because the limit is per socket rather than per uid, a user with multiple devices can still have parallel requests — one per connection.
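A per-socket guard for this rule might look like the sketch below. The function names and the `StartResult` shape are assumptions made for illustration; only the `inFlight` field, the error message, and the abort-on-close behavior come from this page.

```typescript
interface InFlight {
  id: string;
  abortController: AbortController;
}

interface SocketState {
  inFlight: InFlight | null;
}

type StartResult =
  | { status: 'started'; signal: AbortSignal }
  | { status: 'rejected'; error: { type: 'error'; id: string; message: string } };

/** Claims the socket's single in-flight slot, or returns an error frame. */
function tryStartRequest(state: SocketState, id: string): StartResult {
  if (state.inFlight !== null) {
    return {
      status: 'rejected',
      error: { type: 'error', id, message: 'A request is already in progress on this connection' },
    };
  }
  const abortController = new AbortController();
  state.inFlight = { id, abortController };
  // The signal would be handed to the inference call so it can be cancelled.
  return { status: 'started', signal: abortController.signal };
}

/** Releases the slot once the turn has finished (successfully or not). */
function finishRequest(state: SocketState, id: string): void {
  if (state.inFlight !== null && state.inFlight.id === id) state.inFlight = null;
}

/** On socket close: abort whatever is still running on this connection. */
function onSocketClose(state: SocketState): void {
  if (state.inFlight !== null) state.inFlight.abortController.abort();
  state.inFlight = null;
}
```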
// Max 1 in-flight agent request PER SOCKET (not per uid)
// A user with 3 devices can have 3 parallel requests
// Second message while one is in progress:
{
"type": "error",
"id": "req_def456",
"message": "A request is already in progress on this connection"
}
// Each in-flight request carries an AbortController
// Closing the socket calls abortController.abort()
// → the inference provider cancels the request upstream
Each in-flight request carries an AbortController. When the socket closes (browser tab closed, network drop, etc.), the controller's abort() is called immediately, canceling the inference request upstream. This prevents orphaned agent runs from consuming resources.
Back-pressure
When the agent streams tokens faster than the client can consume them, the TCP send buffer fills up. The gateway monitors socket.bufferedAmount and pauses streaming if it exceeds 64 KB, polling every 50 ms until the buffer drains below threshold.
// Before sending each chunk, the gateway checks:
if (socket.bufferedAmount > 65_536) { // 64 KB threshold
// Pause streaming — poll every 50ms
while (socket.bufferedAmount > 65_536) {
await sleep(50);
}
}
// Then resume sending chunks
Presence system
The gateway tracks real-time agent status per user per session. Every state transition pushes a presence frame to all of the user's WebSocket connections, allowing every device to display live agent activity.
State machine
starting → streaming → tool_calling → completed
↘ error
| Status | Meaning |
|---|---|
| starting | Request received, session resolved, about to call inference |
| streaming | Inference provider is streaming tokens |
| tool_calling | Executing a tool call returned by the model |
| completed | Turn finished successfully |
| error | Turn failed with an error |
Presence frame
// Pushed to ALL of the user's WS connections on every state change
{
"type": "presence",
"agentId": "chad",
"sessionId": "ses_xyz",
"status": "streaming",
"currentTool": null,
"timestamp": "2026-02-27T12:00:01.000Z"
}
Storage
Presence state is stored in an in-memory Map<uid, Map<sessionId, AgentPresenceState>>. This is intentionally not persisted — presence is ephemeral and reconstructed from live activity.
// In-memory presence storage
const presence = new Map<string, Map<string, AgentPresenceState>>();
// uid sessionId
interface AgentPresenceState {
agentId: string;
sessionId: string;
status: 'starting' | 'streaming' | 'tool_calling' | 'completed' | 'error';
currentTool: string | null;
updatedAt: Date;
}
Cleanup
A cleanup job runs every 60 seconds and removes entries whose updatedAt is older than 5 minutes. This prevents stale presence from lingering if an agent turn completes without a final state transition (e.g. process crash).
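The cleanup pass can be sketched as a single sweep over the nested map. `pruneStalePresence` is a hypothetical name, and removing a uid's entry once its inner map is empty is an assumption rather than documented behavior.

```typescript
type PresenceStatus = 'starting' | 'streaming' | 'tool_calling' | 'completed' | 'error';

interface AgentPresenceState {
  agentId: string;
  sessionId: string;
  status: PresenceStatus;
  currentTool: string | null;
  updatedAt: Date;
}

const STALE_AFTER_MS = 5 * 60_000;  // entries older than 5 minutes are stale
const CLEANUP_INTERVAL_MS = 60_000; // the job runs every 60 seconds

/** Removes stale entries and returns how many were dropped. */
function pruneStalePresence(
  presence: Map<string, Map<string, AgentPresenceState>>,
  now: Date = new Date(),
): number {
  let removed = 0;
  for (const [uid, sessions] of presence) {
    for (const [sessionId, state] of sessions) {
      if (now.getTime() - state.updatedAt.getTime() > STALE_AFTER_MS) {
        sessions.delete(sessionId);
        removed++;
      }
    }
    // Assumption: drop the user's entry once no sessions remain.
    if (sessions.size === 0) presence.delete(uid);
  }
  return removed;
}

// Usage (not executed here):
// setInterval(() => pruneStalePresence(presence), CLEANUP_INTERVAL_MS);
```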
Push API
Other subsystems (channels, cron scheduler, autonomous agents) use the push API to send frames to connected clients without going through the agent loop.
// Push a frame to all of a user's connected devices
const sentCount = pushToUser(uid, {
type: 'briefing',
content: 'Good morning — here is your daily briefing.'
});
// sentCount = number of sockets that received the frame
// Broadcast to every connected user
const totalSent = broadcast({
type: 'line_alert',
message: 'Scheduled maintenance in 15 minutes'
});
// totalSent = total frames sent across all users
// Check how many unique users are online
const online = getConnectedUserCount();
| Function | Signature | Returns |
|---|---|---|
| pushToUser | (uid: string, event: object) | Number of sockets that received the frame |
| broadcast | (event: object) | Total frames sent across all connected users |
| getConnectedUserCount | () | Number of unique uids with at least one live socket |
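To illustrate how these three functions relate to the per-user socket map, here is a simplified sketch. It assumes frames are serialized as JSON and that every stored socket is open; the real implementation presumably also checks socket state before sending. `PushSocket` is a stand-in type for this example.

```typescript
// Stand-in for the real socket type; only send() is needed here.
interface PushSocket {
  send(data: string): void;
}

// uid -> all of that user's live sockets (one per device, tab, or CLI session)
const connections = new Map<string, Set<PushSocket>>();

/** Sends one frame to every socket of a single user. */
function pushToUser(uid: string, event: object): number {
  const sockets = connections.get(uid);
  if (sockets === undefined) return 0;
  const payload = JSON.stringify(event); // serialize once, send many times
  let sent = 0;
  for (const socket of sockets) {
    socket.send(payload);
    sent++;
  }
  return sent;
}

/** Sends one frame to every socket of every connected user. */
function broadcast(event: object): number {
  let total = 0;
  for (const uid of connections.keys()) total += pushToUser(uid, event);
  return total;
}

/** Counts uids that have at least one socket. */
function getConnectedUserCount(): number {
  let count = 0;
  for (const sockets of connections.values()) {
    if (sockets.size > 0) count++;
  }
  return count;
}
```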
The line_alert type is pushed when an external channel adapter receives a message that should be surfaced in the web UI. The briefing type is pushed by the cron scheduler for daily briefings. The agent_response type is pushed when an autonomous agent completes a background task.
Related pages
| Topic | What it covers |
|---|---|
| Gateway Overview | Full gateway architecture — middleware stack, route groups, SSE transport, health endpoints |
| Event System | All typed events emitted on the event bus, including ws.connected and ws.disconnected |
| Agent Loop | The 7-step execution cycle that runs when a WebSocket message reaches the agent |
| API Reference | Full REST API documentation — the HTTP counterpart to the WebSocket interface |