Gateway

WebSocket

The gateway exposes a full-duplex WebSocket endpoint alongside the REST API, sharing the same http.Server. Clients connect once, authenticate via JWT, and receive streamed agent responses, presence updates, push notifications, and heartbeats over a single persistent connection. The WebSocket server is implemented in gateway/ws.ts (connection management) and gateway/ws-handler.ts (message handling and agent streaming).

Connection

Connect to ws://host/ws with a JWT access token in the query string.

typescript
// Browser
const ws = new WebSocket('ws://localhost:8080/ws?token=<jwt>');

ws.onopen = () => console.log('connected');
ws.onmessage = (e) => {
  const frame = JSON.parse(e.data);
  console.log(frame.type, frame);
};
ws.onclose = (e) => {
  if (e.code === 4001) console.error('Missing token');
  if (e.code === 4003) console.error('Invalid or expired token');
};

Authentication

The token is validated on upgrade. If authentication fails, the socket is closed immediately with a WebSocket close code:

Close codeMeaning
4001Missing token — no token query parameter was provided
4003Invalid or expired token — JWT verification failed

Welcome frame

On successful authentication the server immediately sends a welcome frame:

json
// Server sends immediately after successful auth
{
  "type": "connected",
  "timestamp": "2026-02-27T12:00:00.000Z"
}

Multi-device support

A single user (uid) can have any number of concurrent WebSocket connections — one per device, browser tab, or CLI session. The gateway stores connections in a Map<uid, Set<AuthenticatedSocket>>. All outbound events (stream chunks, presence, push notifications) are fanned out to every socket in the user's set.

typescript
// Internal connection storage
// One uid can have N concurrent sockets (multi-device)
const connections = new Map<string, Set<AuthenticatedSocket>>();

// Each AuthenticatedSocket carries:
// - socket: WebSocket
// - uid: string
// - connectedAt: Date
// - inFlight: { abortController: AbortController, id: string } | null

Heartbeat

The server sends a ping frame every 30 seconds. Clients must respond with pong. Sockets that fail to respond before the next heartbeat cycle are terminated.

json
// Server → Client every 30 seconds
{ type: "ping" }

// Client must respond with
{ type: "pong" }

// If no pong received before next cycle, socket is terminated

Lifecycle events

The event bus emits events on each connection lifecycle change, which other subsystems (logging, metrics, autonomous agents) can subscribe to:

typescript
// Emitted on the typed event bus for each connection lifecycle event
bus.emit('ws.connected',    { uid, socketCount });
bus.emit('ws.disconnected', { uid, socketCount, code, reason });

Inbound message frames

All inbound frames are validated with Zod. There are exactly two valid frame types: message and ping.

type: "message"

Send a message to an agent. The id field is a client-generated request identifier that the server echoes back on every outbound frame related to this request.

json
// type: "message" — send a message to an agent
{
  "type": "message",
  "id": "req_abc123",               // string, required — client-generated request ID
  "agentId": "chad",                // string, default "chad"
  "message": "What is the weather?", // string, 1–10 000 chars
  "surface": "chat",                // "chat" | "prop" | "group", default "chat"
  "surfaceId": "main"               // string, default "main"
}
FieldTypeRequiredDefaultConstraints
type"message"YesLiteral
idstringYesClient-generated request ID
agentIdstringNo"chad"Must match a registered agent
messagestringYes1 – 10 000 characters
surfacestringNo"chat""chat" | "prop" | "group"
surfaceIdstringNo"main"Identifies the specific surface instance

type: "ping"

Client-initiated keep-alive. The server responds with { type: "pong" }.

json
// type: "ping" — keep-alive from client
{
  "type": "ping"
}

Zod validation schema

The server parses every inbound frame through a discriminated union. Invalid JSON, unknown types, or constraint violations all produce an error frame.

typescript
import { z } from 'zod';

const messageFrame = z.object({
  type:      z.literal('message'),
  id:        z.string(),
  agentId:   z.string().default('chad'),
  message:   z.string().min(1).max(10000),
  surface:   z.enum(['chat', 'prop', 'group']).default('chat'),
  surfaceId: z.string().default('main'),
});

const pingFrame = z.object({
  type: z.literal('ping'),
});

const inboundFrame = z.discriminatedUnion('type', [messageFrame, pingFrame]);

Validation errors

If the payload fails validation, the server responds with an error frame. The socket is not closed — the client can retry.

json
// Invalid JSON
{ "type": "error", "message": "Invalid JSON" }

// Unknown frame type
{ "type": "error", "message": "Unknown frame type: subscribe" }

// Zod validation failure
{ "type": "error", "message": "message: String must contain at least 1 character(s)" }

Outbound event types

The server pushes frames to clients for agent streaming, presence, system notifications, and more. Every frame has a type field.

TypeDirectionDescription
connectedServer → ClientWelcome frame on successful auth
session_resolvedServer → ClientSession found or created for this request
stream_chunkServer → ClientIncremental text delta during streaming
tool_callServer → ClientTool call started; sent again with complete: true when arguments are ready
tool_resultServer → ClientTool finished executing — includes duration and optional error
stream_doneServer → ClientFinal result — full content, session ID, usage stats, tools used
errorServer → ClientError tied to a specific request
presenceServer → ClientAgent status change (starting, streaming, tool_calling, completed, error)
pongServer → ClientResponse to client-initiated ping
line_alertServer → ClientPushed by external channel adapters
briefingServer → ClientPushed by the cron scheduler (e.g. daily briefing)
agent_responseServer → ClientPushed by autonomous agents running in the background

session_resolved

json
// Sent after session lookup / creation
{
  "type": "session_resolved",
  "id": "req_abc123",
  "sessionId": "ses_xyz",
  "isNew": true
}

stream_chunk

Sent for each token batch as the agent streams its response. Clients concatenate content values to build the full response progressively.

json
// Incremental text delta — one per token batch
{
  "type": "stream_chunk",
  "id": "req_abc123",
  "content": "The weather in"
}

tool_call

Sent when the model initiates a tool call. A second frame with complete: true is sent once the full arguments have been parsed from the stream.

json
// Tool call started
{
  "type": "tool_call",
  "id": "req_abc123",
  "index": 0,
  "toolCallId": "tc_1",
  "name": "web_search"
}

// Tool call complete (includes parsed arguments)
{
  "type": "tool_call",
  "id": "req_abc123",
  "index": 0,
  "toolCallId": "tc_1",
  "name": "web_search",
  "complete": true,
  "arguments": "{\"query\":\"weather today\"}"
}

tool_result

Sent after the tool finishes executing. The error field is null on success or contains the error string on failure.

json
// After tool execution finishes
{
  "type": "tool_result",
  "id": "req_abc123",
  "name": "web_search",
  "durationMs": 320,
  "error": null
}

stream_done

Marks the end of the streaming response. Contains the full assembled content, session ID, token usage, and a list of all tools invoked during this turn.

json
// Final frame — full response + metadata
{
  "type": "stream_done",
  "id": "req_abc123",
  "content": "The weather in San Francisco is 62°F and sunny.",
  "sessionId": "ses_xyz",
  "usage": {
    "promptTokens": 1240,
    "completionTokens": 38,
    "totalTokens": 1278
  },
  "toolsUsed": ["web_search"]
}

error

json
// Error tied to a specific request
{
  "type": "error",
  "id": "req_abc123",
  "message": "Rate limit exceeded"
}

Rate limiting

WebSocket messages are rate-limited separately from the HTTP API. The gateway uses an in-memory sliding window scoped to each uid.

typescript
// In-memory sliding window per uid
const WINDOW_MS   = 60_000;   // 60 seconds
const MAX_MESSAGES = 30;       // per window

// Timestamps older than WINDOW_MS are pruned every 60 seconds
// Exceeding the limit returns:
{
  "type": "error",
  "id": "req_abc123",
  "message": "Rate limit exceeded"
}
ParameterValue
Window60 seconds (sliding)
Max messages per window30
Cleanup intervalEvery 60 seconds — timestamps older than the window are pruned
ScopePer uid (shared across all of the user's sockets)
Per-uid, not per-socket. Rate limits are tracked by user ID. A user with three open tabs shares a single 30-message window across all of them.

Concurrency

The gateway enforces a maximum of 1 in-flight agent request per socket. This prevents a single tab from flooding the agent pipeline. Because the limit is per socket rather than per uid, a user with multiple devices can still have parallel requests — one per connection.

typescript
// Max 1 in-flight agent request PER SOCKET (not per uid)
// A user with 3 devices can have 3 parallel requests

// Second message while one is in progress:
{
  "type": "error",
  "id": "req_def456",
  "message": "A request is already in progress on this connection"
}

// Each in-flight request carries an AbortController
// Closing the socket calls abortController.abort()
// → the inference provider cancels the request upstream
Socket close aborts in-flight requests. Each in-flight request holds an AbortController. When the socket closes (browser tab closed, network drop, etc.), the controller's abort() is called immediately, canceling the inference request upstream. This prevents orphaned agent runs from consuming resources.

Back-pressure

When the agent streams tokens faster than the client can consume them, the TCP send buffer fills up. The gateway monitors socket.bufferedAmount and pauses streaming if it exceeds 64 KB, polling every 50 ms until the buffer drains below threshold.

typescript
// Before sending each chunk, the gateway checks:
if (socket.bufferedAmount > 65_536) {   // 64 KB threshold
  // Pause streaming — poll every 50ms
  while (socket.bufferedAmount > 65_536) {
    await sleep(50);
  }
}
// Then resume sending chunks
Why 64 KB? This threshold is a balance between latency and throughput. Smaller values reduce memory pressure but increase the number of pause/resume cycles. 64 KB accommodates typical bursts from fast inference providers without stalling on healthy connections.

Presence system

The gateway tracks real-time agent status per user per session. Every state transition pushes a presence frame to all of the user's WebSocket connections, allowing every device to display live agent activity.

State machine

text
starting → streaming → tool_calling → completed
                                    ↘ error
StatusMeaning
startingRequest received, session resolved, about to call inference
streamingInference provider is streaming tokens
tool_callingExecuting a tool call returned by the model
completedTurn finished successfully
errorTurn failed with an error

Presence frame

json
// Pushed to ALL of the user's WS connections on every state change
{
  "type": "presence",
  "agentId": "chad",
  "sessionId": "ses_xyz",
  "status": "streaming",
  "currentTool": null,
  "timestamp": "2026-02-27T12:00:01.000Z"
}

Storage

Presence state is stored in an in-memory Map<uid, Map<sessionId, AgentPresenceState>>. This is intentionally not persisted — presence is ephemeral and reconstructed from live activity.

typescript
// In-memory presence storage
const presence = new Map<string, Map<string, AgentPresenceState>>();
//                       uid           sessionId

interface AgentPresenceState {
  agentId: string;
  sessionId: string;
  status: 'starting' | 'streaming' | 'tool_calling' | 'completed' | 'error';
  currentTool: string | null;
  updatedAt: Date;
}

Cleanup

A cleanup job runs every 60 seconds and removes entries whose updatedAt is older than 5 minutes. This prevents stale presence from lingering if an agent turn completes without a final state transition (e.g. process crash).

Push API

Other subsystems (channels, cron scheduler, autonomous agents) use the push API to send frames to connected clients without going through the agent loop.

typescript
// Push a frame to all of a user's connected devices
const sentCount = pushToUser(uid, {
  type: 'briefing',
  content: 'Good morning — here is your daily briefing.'
});
// sentCount = number of sockets that received the frame

// Broadcast to every connected user
const totalSent = broadcast({
  type: 'line_alert',
  message: 'Scheduled maintenance in 15 minutes'
});
// totalSent = total frames sent across all users

// Check how many unique users are online
const online = getConnectedUserCount();
FunctionSignatureReturns
pushToUser(uid: string, event: object)Number of sockets that received the frame
broadcast(event: object)Total frames sent across all connected users
getConnectedUserCount()Number of unique uids with at least one live socket
Push types in the wild. The line_alert type is pushed when an external channel adapter receives a message that should be surfaced in the web UI. The briefing type is pushed by the cron scheduler for daily briefings. The agent_response type is pushed when an autonomous agent completes a background task.
TopicWhat it covers
Gateway OverviewFull gateway architecture — middleware stack, route groups, SSE transport, health endpoints
Event SystemAll typed events emitted on the event bus, including ws.connected and ws.disconnected
Agent LoopThe 7-step execution cycle that runs when a WebSocket message reaches the agent
API ReferenceFull REST API documentation — the HTTP counterpart to the WebSocket interface