Middleware

The gateway runs 12 middleware modules in a fixed order. Six run globally on every route; six more are added on agent-scoped routes for authentication, budget enforcement, workspace resolution, rate limiting, and request timeouts. An error handler sits at the end of the chain, catching any AppError subclass and serializing it to a structured JSON response.

Execution order

Order matters. Each middleware depends on work done by the ones before it. For example, jwtAuth() must run before tokenBudget() because the budget check needs req.uid.

text
# Global middleware — applied to every request in order
1. securityHeaders()
2. correlationId()
3. cors()
4. express.json({ limit: "1mb" })
5. csrfProtection()
6. requestLogger()

# Agent-scoped middleware — added on agent routes after global middleware
7. jwtAuth()
8. tokenBudget()
9. workspaceResolver()
10. rateLimiter({ window: 60s, max: 30 })
11. userRateLimit()
12. requestTimeout(120_000)

# Error handler — always last
errorHandler()

No external middleware dependencies. The security headers middleware is hand-rolled (helmet-equivalent), the CSRF middleware uses the double-submit cookie pattern, and the rate limiter talks to Postgres directly. The only packaged middleware is express.json(), which ships with Express itself.
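
To illustrate why the execution order matters, here is a minimal, framework-free sketch. The names fakeJwtAuth, fakeTokenBudget, and runChain are hypothetical stand-ins, not the gateway's real middleware; they only model the dependency of the budget check on req.uid.

```typescript
// Hypothetical stand-ins for jwtAuth() and tokenBudget(), for illustration only.
type Req = { token?: string; uid?: string };
// A step returns an error code to stop the chain, or null to continue.
type Step = (req: Req) => string | null;

const fakeJwtAuth: Step = (req) => {
  if (!req.token) return "AUTH_REQUIRED";
  req.uid = `user-${req.token}`; // the real middleware reads uid from the JWT payload
  return null;
};

const fakeTokenBudget: Step = (req) =>
  req.uid ? null : "BUDGET_CHECK_WITHOUT_UID"; // cannot look up a budget without a uid

// Run the steps in order and stop at the first error code.
function runChain(steps: Step[], req: Req): string {
  for (const step of steps) {
    const error = step(req);
    if (error) return error;
  }
  return "OK";
}

console.log(runChain([fakeJwtAuth, fakeTokenBudget], { token: "abc" })); // OK
console.log(runChain([fakeTokenBudget, fakeJwtAuth], { token: "abc" })); // BUDGET_CHECK_WITHOUT_UID
```

Swapping the two steps makes the budget check run before any uid exists, which is the failure the fixed order prevents.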

Global middleware

These six modules run on every inbound HTTP request, in the order listed.

1. securityHeaders()

Sets a strict baseline of HTTP security headers and removes the X-Powered-By header that Express adds by default. This is a hand-rolled helmet equivalent with no external dependencies.

| Header | Value |
| --- | --- |
| X-Content-Type-Options | nosniff |
| X-Frame-Options | DENY |
| Strict-Transport-Security | max-age=63072000; includeSubDomains; preload |
| Content-Security-Policy | default-src 'self'; script-src 'self'; object-src 'none'; frame-ancestors 'none' |
| Referrer-Policy | strict-origin-when-cross-origin |
| Permissions-Policy | camera=(), microphone=(), geolocation=() |
| X-Powered-By | removed |

typescript
// gateway/middleware/security-headers.ts
export function securityHeaders() {
  return (req, res, next) => {
    res.removeHeader("X-Powered-By");
    res.setHeader("X-Content-Type-Options", "nosniff");
    res.setHeader("X-Frame-Options", "DENY");
    res.setHeader("Strict-Transport-Security",
      "max-age=63072000; includeSubDomains; preload");
    res.setHeader("Content-Security-Policy",
      "default-src 'self'; script-src 'self'; object-src 'none'; frame-ancestors 'none'");
    res.setHeader("Referrer-Policy", "strict-origin-when-cross-origin");
    res.setHeader("Permissions-Policy",
      "camera=(), microphone=(), geolocation=()");
    next();
  };
}

2. correlationId()

Reads the X-Request-ID header from the incoming request. If present, it is preserved; if absent, a new UUID v4 is generated. The ID is attached to req.requestId and echoed back in the response headers. Downstream middleware, the request logger, and structured error responses all reference this ID for end-to-end tracing.

typescript
// gateway/middleware/correlation-id.ts
import { randomUUID } from "crypto";

export function correlationId() {
  return (req, res, next) => {
    const id = req.headers["x-request-id"] || randomUUID();
    req.requestId = id;
    res.setHeader("X-Request-ID", id);
    next();
  };
}

3. cors()

In production, only origins listed in the ALLOWED_ORIGINS environment variable (comma-separated) are permitted. In development (NODE_ENV !== "production"), all origins are allowed. Preflight OPTIONS requests are answered immediately with 204 No Content.

typescript
// gateway/middleware/cors.ts
export function cors() {
  const allowed = process.env.ALLOWED_ORIGINS?.split(",").map(s => s.trim()) || [];
  const isDev = process.env.NODE_ENV !== "production";

  return (req, res, next) => {
    const origin = req.headers.origin;
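    // Only echo the origin when the request carries an Origin header;
    // calling setHeader with an undefined value would throw
    if (origin && (isDev || allowed.includes(origin))) {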
      res.setHeader("Access-Control-Allow-Origin", origin);
      res.setHeader("Access-Control-Allow-Credentials", "true");
      res.setHeader("Access-Control-Allow-Methods",
        "GET, POST, PUT, PATCH, DELETE, OPTIONS");
      res.setHeader("Access-Control-Allow-Headers",
        "Content-Type, Authorization, X-Request-ID, X-Workspace-Id, X-CSRF-Token");
    }
    if (req.method === "OPTIONS") return res.sendStatus(204);
    next();
  };
}

Do not leave ALLOWED_ORIGINS unset in production. With the variable missing, the allow-list is empty: no CORS headers are sent, and browsers block every cross-origin request without the gateway reporting an error.
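
The allow-list logic can be isolated into two small pure functions, which makes the production and development behavior easy to check. This is a sketch only; parseAllowedOrigins and isOriginAllowed are hypothetical helper names that do not appear in the gateway source.

```typescript
// Hypothetical helpers mirroring the cors() logic, for illustration only.

// Split a comma-separated ALLOWED_ORIGINS value, dropping blanks and whitespace.
function parseAllowedOrigins(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Decide whether CORS headers should be sent for this request.
function isOriginAllowed(
  origin: string | undefined,
  allowed: string[],
  isDev: boolean
): boolean {
  if (!origin) return false; // no Origin header: nothing to echo back
  return isDev || allowed.includes(origin);
}

const allowList = parseAllowedOrigins("https://app.example.com, https://admin.example.com");
console.log(isOriginAllowed("https://app.example.com", allowList, false)); // true
console.log(isOriginAllowed("https://evil.example.net", allowList, false)); // false
console.log(isOriginAllowed("https://evil.example.net", allowList, true)); // true (development)
```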

4. express.json({ limit: "1mb" })

Standard Express JSON body parser with a 1 MB size limit. The verify callback captures the raw Buffer as req.rawBody so that channel webhook endpoints can perform HMAC signature verification against the exact bytes received on the wire.

typescript
// Inline in gateway/index.ts
app.use(express.json({
  limit: "1mb",
  verify: (req, _res, buf) => {
    // Capture raw body for HMAC verification by channel webhooks
    req.rawBody = buf;
  }
}));

5. csrfProtection()

Implements the double-submit cookie pattern. On state-changing requests (POST, PUT, PATCH, DELETE), the middleware reads the csrf_token cookie and requires a matching x-csrf-token header. If they do not match, the request is rejected with 403.

Requests that carry a Bearer token in the Authorization header are exempt. The JWT itself acts as proof of intent, and requiring a separate CSRF token on API calls would break programmatic clients without improving security.

typescript
// gateway/middleware/csrf.ts
export function csrfProtection() {
  return (req, res, next) => {
    // Safe methods are exempt
    if (["GET", "HEAD", "OPTIONS"].includes(req.method)) return next();

    // Bearer token holders are exempt — the token itself is proof of intent
    const auth = req.headers.authorization;
    if (auth?.startsWith("Bearer ")) return next();

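    // Double-submit cookie: csrf_token cookie must match x-csrf-token header.
    // Express does not populate req.cookies on its own; this assumes a cookie
    // parser has already run earlier in the chain.
    const cookie = req.cookies?.csrf_token;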
    const header = req.headers["x-csrf-token"];
    if (!cookie || !header || cookie !== header) {
      return res.status(403).json({
        error: "CSRF validation failed",
        code: "CSRF_INVALID",
        status: 403
      });
    }
    next();
  };
}
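
On the client side, the double-submit pattern means copying the csrf_token cookie value into the X-CSRF-Token header on every state-changing request. The sketch below shows one way a browser client might do that; readCookie and csrfHeaders are hypothetical helper names, and the code assumes the token contains only URL-safe characters so no decoding is needed.

```typescript
// Hypothetical client-side helpers, for illustration only.

// Extract one cookie value from a document.cookie-style string ("a=1; b=2").
function readCookie(cookieString: string, name: string): string | undefined {
  for (const part of cookieString.split(";")) {
    const [key, ...rest] = part.trim().split("=");
    if (key === name) return rest.join("="); // value may itself contain "="
  }
  return undefined;
}

// Build the extra headers for a POST/PUT/PATCH/DELETE request.
function csrfHeaders(cookieString: string): Record<string, string> {
  const token = readCookie(cookieString, "csrf_token");
  return token ? { "X-CSRF-Token": token } : {};
}

console.log(csrfHeaders("theme=dark; csrf_token=abc123")); // { 'X-CSRF-Token': 'abc123' }
```

In a browser, the argument would be document.cookie, which only works if the csrf_token cookie is not marked HttpOnly.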

6. requestLogger()

Attaches a finish listener to the response. When the response completes, it emits a single structured JSON log line containing the HTTP method, path, status code, duration in milliseconds, authenticated user ID (if available), and the correlation ID. This is the primary source for request-level observability.

typescript
// gateway/middleware/request-logger.ts
export function requestLogger() {
  return (req, res, next) => {
    const start = Date.now();
    res.on("finish", () => {
      const duration = Date.now() - start;
      console.log(JSON.stringify({
        method: req.method,
        path: req.originalUrl,
        status: res.statusCode,
        duration,
        uid: req.uid || null,
        requestId: req.requestId
      }));
    });
    next();
  };
}

Agent-scoped middleware

These six modules are added on agent routes (/agent/*) after the global middleware. They handle authentication, budget enforcement, workspace resolution, and multi-layer rate limiting.

7. jwtAuth()

Extracts the Bearer token from the Authorization header and verifies it as an HS256 JWT using JWT_SECRET. On success, req.uid is set to the user ID from the token payload.

For zero-downtime key rotation, the middleware first tries the current JWT_SECRET. If verification fails and JWT_SECRET_PREV is set, it retries with the previous key. This gives you a window to rotate keys without invalidating existing tokens.

typescript
// gateway/middleware/jwt-auth.ts
import jwt from "jsonwebtoken";

export function jwtAuth() {
  return (req, res, next) => {
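    // Only accept the "Bearer <token>" scheme; any other scheme counts as missing
    const header = req.headers.authorization;
    const token = header?.startsWith("Bearer ") ? header.slice(7) : undefined;
    if (!token) return res.status(401).json({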
      error: "Missing authentication token",
      code: "AUTH_REQUIRED", status: 401
    });

    // Try current secret first, then previous secret for key rotation
    try {
      const payload = jwt.verify(token, process.env.JWT_SECRET, {
        algorithms: ["HS256"]
      });
      req.uid = payload.uid;
      next();
    } catch {
      if (process.env.JWT_SECRET_PREV) {
        try {
          const payload = jwt.verify(token, process.env.JWT_SECRET_PREV, {
            algorithms: ["HS256"]
          });
          req.uid = payload.uid;
          return next();
        } catch { /* fall through */ }
      }
      return res.status(401).json({
        error: "Invalid or expired token",
        code: "AUTH_INVALID", status: 401
      });
    }
  };
}

Key rotation procedure: Set JWT_SECRET_PREV to the current secret, then update JWT_SECRET to the new value. Deploy. Tokens signed with either key will be accepted. After the access token TTL passes (default 15 minutes), remove JWT_SECRET_PREV.
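
The rotation window can be demonstrated without any JWT library. The sketch below signs and verifies HS256 tokens with Node's crypto module and tries a list of secrets in order, current first. It is an illustration under stated assumptions, not the gateway's implementation: signHS256 and verifyWithRotation are hypothetical names, and the code deliberately skips expiry and header validation, which jsonwebtoken handles in the real middleware.

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Base64url-encode a UTF-8 string (JWT segments use base64url without padding).
const b64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");

// Hypothetical helper: create an HS256-signed token for a payload.
function signHS256(payload: object, secret: string): string {
  const head = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const sig = createHmac("sha256", secret).update(`${head}.${body}`).digest("base64url");
  return `${head}.${body}.${sig}`;
}

// Hypothetical helper: try each secret in order and return the payload on the
// first signature match, or null if no secret matches.
function verifyWithRotation(token: string, secrets: string[]): Record<string, unknown> | null {
  const [head, body, sig] = token.split(".");
  if (!head || !body || !sig) return null;
  const given = Buffer.from(sig, "base64url");
  for (const secret of secrets) {
    const expected = createHmac("sha256", secret).update(`${head}.${body}`).digest();
    if (given.length === expected.length && timingSafeEqual(given, expected)) {
      return JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
    }
  }
  return null;
}

// A token issued before rotation still verifies while the old key is listed.
const issuedBeforeRotation = signHS256({ uid: "u1" }, "old-secret");
console.log(verifyWithRotation(issuedBeforeRotation, ["new-secret", "old-secret"])); // { uid: 'u1' }
console.log(verifyWithRotation(issuedBeforeRotation, ["new-secret"])); // null
```

The second call corresponds to the state after JWT_SECRET_PREV has been removed: tokens signed with the old key are no longer accepted.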

8. tokenBudget()

Enforces per-user token budget limits. There are two thresholds:

  • Hard cap -- When a user's cumulative token consumption reaches the hard cap, the request is rejected with 429 and a BUDGET_HARD_CAP error code.
  • Soft cap -- When consumption has reached the soft cap but is still below the hard cap, the request proceeds, but the middleware sets req.budgetDegraded = true and sets req.overrideModel to a cheaper fallback model. The agent loop reads these flags and switches models transparently.
typescript
// gateway/middleware/token-budget.ts
export function tokenBudget() {
  return async (req, res, next) => {
    const usage = await getUserBudgetUsage(req.uid);
    const limits = await getUserBudgetLimits(req.uid);

    // Hard cap — block the request entirely
    if (usage.tokens >= limits.hardCap) {
      return res.status(429).json({
        error: "Token budget exhausted",
        code: "BUDGET_HARD_CAP",
        status: 429,
        details: { used: usage.tokens, limit: limits.hardCap }
      });
    }

    // Soft cap — allow but downgrade model
    if (usage.tokens >= limits.softCap) {
      req.budgetDegraded = true;
      req.overrideModel = limits.fallbackModel; // e.g. "gpt-4o-mini"
    }

    next();
  };
}
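
The two thresholds reduce to a three-way decision, and the agent loop only needs to look at the flags the middleware sets. The following sketch separates those two pieces; decideBudget and pickModel are hypothetical names, and the model strings are placeholders.

```typescript
// Hypothetical helpers, for illustration only.
type BudgetDecision =
  | { action: "block" }
  | { action: "degrade"; model: string }
  | { action: "allow" };

// Same comparisons as tokenBudget(): the hard cap wins over the soft cap.
function decideBudget(
  used: number,
  softCap: number,
  hardCap: number,
  fallbackModel: string
): BudgetDecision {
  if (used >= hardCap) return { action: "block" };
  if (used >= softCap) return { action: "degrade", model: fallbackModel };
  return { action: "allow" };
}

// What the agent loop might do with the flags set on the request.
function pickModel(
  req: { budgetDegraded?: boolean; overrideModel?: string },
  configuredModel: string
): string {
  return req.budgetDegraded && req.overrideModel ? req.overrideModel : configuredModel;
}

console.log(decideBudget(80_000, 75_000, 100_000, "small-model")); // { action: 'degrade', model: 'small-model' }
console.log(pickModel({ budgetDegraded: true, overrideModel: "small-model" }, "large-model")); // small-model
```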

9. workspaceResolver()

Reads the X-Workspace-Id header and validates that the authenticated user (req.uid) is a member of the specified workspace. If the header is missing, returns 400. If the user is not a member, returns 403. On success, attaches req.workspace with the workspace ID, the user's role, and the workspace name.

typescript
// gateway/middleware/workspace-resolver.ts
export function workspaceResolver() {
  return async (req, res, next) => {
    const workspaceId = req.headers["x-workspace-id"];
    if (!workspaceId) return res.status(400).json({
      error: "Missing X-Workspace-Id header",
      code: "WORKSPACE_REQUIRED", status: 400
    });

    const membership = await getWorkspaceMembership(req.uid, workspaceId);
    if (!membership) return res.status(403).json({
      error: "Not a member of this workspace",
      code: "WORKSPACE_FORBIDDEN", status: 403
    });

    req.workspace = {
      id: workspaceId,
      role: membership.role,
      name: membership.workspaceName
    };
    next();
  };
}

10. rateLimiter({ window: 60s, max: 30 })

Postgres-backed sliding window rate limiter. Timestamps are stored in the rate_limit_entries table so limits survive process restarts and work correctly across horizontal replicas. The default configuration allows 30 requests per 60-second window per user.

On every request the middleware atomically deletes expired entries, counts remaining entries, and conditionally inserts a new entry. It sets X-RateLimit-Limit and X-RateLimit-Remaining response headers.

If Postgres is unreachable (connection timeout, pool exhausted), the middleware falls back to an in-memory sliding window transparently. This ensures rate limiting never blocks requests due to a database issue.

typescript
// gateway/middleware/rate-limiter.ts
// Postgres-backed sliding window — survives restarts, works across replicas
export function rateLimiter({ window = 60, max = 30 } = {}) {
  return async (req, res, next) => {
    const key = req.uid || req.ip;
    const now = Date.now();
    const windowStart = now - window * 1000;

    try {
      // Atomic: delete expired + count remaining + insert new entry
      const count = await db.query(`
        WITH cleaned AS (
          DELETE FROM rate_limit_entries
          WHERE key = $1 AND timestamp < $2
        ),
        current AS (
          SELECT COUNT(*) AS cnt FROM rate_limit_entries
          WHERE key = $1 AND timestamp >= $2
        ),
        inserted AS (
          INSERT INTO rate_limit_entries (key, timestamp)
          SELECT $1, $3 WHERE (SELECT cnt FROM current) < $4
          RETURNING 1
        )
        SELECT (SELECT cnt FROM current) AS count
      `, [key, windowStart, now, max]);

      const used = parseInt(count.rows[0].count);
      res.setHeader("X-RateLimit-Limit", max);
      res.setHeader("X-RateLimit-Remaining", Math.max(0, max - used - 1));

      if (used >= max) {
        return res.status(429).json({
          error: "Rate limit exceeded",
          code: "RATE_LIMIT", status: 429,
          details: { retryAfter: window, limit: max, window: window + "s" }
        });
      }
      next();
    } catch {
      // Postgres unreachable — fall back to in-memory sliding window
      inMemoryRateLimiter(key, window, max, req, res, next);
    }
  };
}

Why Postgres instead of Redis? Open Astra already requires Postgres. Using it for rate limiting avoids introducing Redis as an additional dependency. The atomic CTE query is fast enough for the expected request volume, and the in-memory fallback handles transient database issues.
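
The in-memory fallback is referenced above but not shown. A minimal sketch of a sliding-window limiter with the same semantics as the SQL (expired entries dropped, a new entry recorded only when the request is allowed) could look like the following. SlidingWindowLimiter is a hypothetical class, not the actual inMemoryRateLimiter, and unlike the Postgres path its state is per process and lost on restart.

```typescript
// Hypothetical in-memory sliding window, for illustration only.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private windowMs: number;
  private max: number;

  constructor(windowMs: number, max: number) {
    this.windowMs = windowMs;
    this.max = max;
  }

  // Record a request for `key` at time `now` (ms) if the window has room.
  hit(key: string, now: number = Date.now()): { allowed: boolean; remaining: number } {
    const cutoff = now - this.windowMs;
    // Keep only entries inside the window (same boundary as "timestamp >= $2")
    const recent = (this.hits.get(key) ?? []).filter((t) => t >= cutoff);
    const allowed = recent.length < this.max;
    if (allowed) recent.push(now); // rejected requests are not recorded
    if (recent.length > 0) this.hits.set(key, recent);
    else this.hits.delete(key);
    return { allowed, remaining: Math.max(0, this.max - recent.length) };
  }
}

const limiter = new SlidingWindowLimiter(60_000, 30);
console.log(limiter.hit("user-1")); // { allowed: true, remaining: 29 }
```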

11. userRateLimit()

In-memory per-uid sliding window that provides a fast second layer of rate limiting. It enforces USER_RATE_LIMIT_RPM (default 60) requests per minute plus a USER_RATE_LIMIT_BURST (default 10) burst allowance. This catches abuse faster than the Postgres-backed limiter and does not require a database round-trip.

typescript
// gateway/middleware/user-rate-limit.ts
// In-memory per-uid sliding window — fast path for per-user throttling
const windows = new Map<string, number[]>();

export function userRateLimit() {
  const rpm = parseInt(process.env.USER_RATE_LIMIT_RPM || "60");
  const burst = parseInt(process.env.USER_RATE_LIMIT_BURST || "10");
  const maxPerMinute = rpm + burst;

  return (req, res, next) => {
    const uid = req.uid;
    const now = Date.now();
    const cutoff = now - 60_000;

    let timestamps = windows.get(uid) || [];
    timestamps = timestamps.filter(t => t > cutoff);
    timestamps.push(now);
    windows.set(uid, timestamps);

    if (timestamps.length > maxPerMinute) {
      return res.status(429).json({
        error: "Per-user rate limit exceeded",
        code: "USER_RATE_LIMIT", status: 429,
        details: { rpm, burst, used: timestamps.length }
      });
    }
    next();
  };
}

12. requestTimeout(120_000)

Sets a hard 120-second timeout on every agent request. If the timeout fires and response headers have not been sent yet, it returns a 504 Gateway Timeout error. If headers have already been sent (typical for SSE streams), it calls res.end() to close the stream gracefully rather than destroying the socket.

typescript
// gateway/middleware/request-timeout.ts
export function requestTimeout(ms = 120_000) {
  return (req, res, next) => {
    const timer = setTimeout(() => {
      if (res.headersSent) {
        // SSE stream already started — close gracefully
        res.end();
      } else {
        res.status(504).json({
          error: "Request timeout",
          code: "TIMEOUT", status: 504,
          details: { timeoutMs: ms }
        });
      }
    }, ms);

    res.on("finish", () => clearTimeout(timer));
    res.on("close", () => clearTimeout(timer));
    next();
  };
}

Additional middleware

These modules are available but not part of the default request stack. They are applied selectively on specific routes.

granularRateLimiter

A multi-window rate limiter that checks per-second (burst), per-minute (sustained), and per-hour (quota) windows simultaneously. A request must pass all three windows to proceed. Used on endpoints that need tighter burst protection than the default rateLimiter provides.

typescript
// gateway/middleware/granular-rate-limiter.ts
// Checks per-second, per-minute, and per-hour windows simultaneously
granularRateLimiter({
  perSecond: 5,   // burst protection
  perMinute: 60,  // sustained rate
  perHour: 1000   // hourly cap
})
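
The core of a multi-window check is counting the same list of request timestamps against each window and rejecting on the first one that is full. The sketch below shows that idea in isolation; firstExceededWindow and the Limits type are hypothetical and say nothing about how granularRateLimiter stores its data.

```typescript
// Hypothetical multi-window check, for illustration only.
type Limits = { perSecond: number; perMinute: number; perHour: number };

const WINDOW_MS: Record<keyof Limits, number> = {
  perSecond: 1_000,
  perMinute: 60_000,
  perHour: 3_600_000,
};

// Returns the name of the first exhausted window, or null if the request
// passes all three. `timestamps` holds earlier request times in ms.
function firstExceededWindow(
  timestamps: number[],
  now: number,
  limits: Limits
): keyof Limits | null {
  for (const name of Object.keys(WINDOW_MS) as (keyof Limits)[]) {
    const cutoff = now - WINDOW_MS[name];
    const used = timestamps.filter((t) => t > cutoff).length;
    if (used >= limits[name]) return name;
  }
  return null;
}

const limits: Limits = { perSecond: 5, perMinute: 60, perHour: 1000 };
const now = 10_000_000;
// Five requests in the last 500 ms exhaust the per-second window.
const burst = [now - 500, now - 400, now - 300, now - 200, now - 100];
console.log(firstExceededWindow(burst, now, limits)); // perSecond
```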

requireRole(minimum)

RBAC enforcement middleware. Open Astra uses four capability roles that form a strict hierarchy:

| Role | Rank | Typical permissions |
| --- | --- | --- |
| owner | 4 | Full control: manage members, billing, delete workspace |
| editor | 3 | Create/edit agents, skills, tools, memory |
| tool_runner | 2 | Chat with agents, run tools, view data |
| viewer | 1 | Read-only access to conversations and data |

Roles are mapped from the workspace_members.role column. The middleware compares the user's role rank against the required minimum. If the user's rank is lower, the request is rejected with 403 Forbidden.

typescript
// gateway/middleware/require-role.ts
// RBAC hierarchy: owner > editor > tool_runner > viewer
const ROLE_RANK = { owner: 4, editor: 3, tool_runner: 2, viewer: 1 };

export function requireRole(minimum: keyof typeof ROLE_RANK) {
  return (req, res, next) => {
    const userRank = ROLE_RANK[req.workspace?.role];
    if (!userRank || userRank < ROLE_RANK[minimum]) {
      return res.status(403).json({
        error: "Insufficient permissions",
        code: "FORBIDDEN", status: 403,
        details: { required: minimum, current: req.workspace?.role }
      });
    }
    next();
  };
}

etagCache()

Intercepts res.json(), computes an MD5 weak ETag from the serialized response body, and sets Cache-Control: private, must-revalidate. If the request includes an If-None-Match header that matches the ETag, returns 304 Not Modified without sending the body. Currently used on GET /agents and GET /skills.

typescript
// gateway/middleware/etag-cache.ts
// Intercepts res.json(), computes weak ETag, returns 304 on match
import { createHash } from "crypto";

export function etagCache() {
  return (req, res, next) => {
    const originalJson = res.json.bind(res);
    res.json = (body) => {
      const serialized = JSON.stringify(body);
      const hash = createHash("md5").update(serialized).digest("hex");
      const etag = `W/"${hash}"`;

      res.setHeader("ETag", etag);
      res.setHeader("Cache-Control", "private, must-revalidate");

      if (req.headers["if-none-match"] === etag) {
        return res.status(304).end();
      }
      return originalJson(body);
    };
    next();
  };
}

bruteForceProtection()

Applied on the /auth/login route. Tracks failed login attempts per IP using an in-memory sliding window (15 minutes). After 5 failures, progressive delays kick in (1s, 2s, 4s, 8s... up to 60s). After 20 failures, the IP is fully blocked for 30 minutes. Successful logins clear the counter. All attempts (pass/fail) are persisted to login_attempts for audit.

| Threshold | Value | Effect |
| --- | --- | --- |
| Delay starts | 5 failures | Progressive delay: 2^(n-5) seconds, capped at 60s |
| Full block | 20 failures | IP blocked for 30 minutes, returns 429 with Retry-After |
| Window | 15 minutes | Failures older than 15 minutes are pruned |
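
The thresholds above translate into a small function of the failure count. This is a sketch of the arithmetic only, assuming n is the number of failures currently inside the 15-minute window; loginGate is a hypothetical name.

```typescript
// Hypothetical helper, for illustration only.
type LoginGate = { blocked: boolean; delaySeconds: number };

// Map the number of recent failures to a delay or a full block.
function loginGate(failures: number): LoginGate {
  if (failures >= 20) return { blocked: true, delaySeconds: 0 }; // blocked for 30 minutes
  if (failures < 5) return { blocked: false, delaySeconds: 0 };
  // 2^(n-5) seconds, capped at 60: 1, 2, 4, 8, 16, 32, 60, 60, ...
  return { blocked: false, delaySeconds: Math.min(60, 2 ** (failures - 5)) };
}

for (const n of [4, 5, 6, 8, 11, 20]) {
  console.log(n, loginGate(n));
}
```

With these numbers the cap is reached at the 11th failure, since 2^6 = 64 exceeds 60.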

WebSocket rate limiting

Rate limiters protect the WebSocket layer at both the connection and the message level. checkConnectionRateLimit limits new connections to 10 per IP per minute. checkMessageRateLimit limits messages to 100 per connection per minute, and violations close the socket with code 1008 (policy violation). Stale tracking entries are cleaned up every 60 seconds.

rlsContext()

Row-Level Security middleware applied on sessions, traces, and memory-profiles routes. Acquires a dedicated Postgres client per request, opens a transaction, and calls SET rls_context(uid, workspaceId) so Postgres RLS policies can filter rows. The client is committed and released when the response finishes. If setup fails, the middleware falls back to the standard pool (fail-open).

hmacVerify()

Generic HMAC webhook signature verifier shared by all channel webhook endpoints (Telegram, Discord, Slack, WhatsApp, etc.). Configurable algorithm (sha256 or sha1), encoding (hex or base64), signature header name, and prefix (e.g. sha256=). Uses timingSafeEqual to prevent timing attacks. Reads from req.rawBody captured by the body parser.

typescript
// gateway/middleware/hmac-verify.ts
import { timingSafeEqual, createHmac } from "crypto";

export function hmacVerify({
  secret,
  header,              // e.g. "x-hub-signature-256"
  algorithm = "sha256", // sha256 | sha1
  encoding = "hex",     // hex | base64
  prefix = ""           // e.g. "sha256=" for signed webhook signatures
}) {
  return (req, res, next) => {
    const signature = req.headers[header];
    if (!signature) return res.status(401).json({
      error: "Missing webhook signature", code: "HMAC_MISSING", status: 401
    });

    const expected = prefix + createHmac(algorithm, secret)
      .update(req.rawBody)
      .digest(encoding);

    const a = Buffer.from(signature);
    const b = Buffer.from(expected);
    if (a.length !== b.length || !timingSafeEqual(a, b)) {
      return res.status(401).json({
        error: "Invalid webhook signature", code: "HMAC_INVALID", status: 401
      });
    }
    next();
  };
}
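
The sender's side of this check, and the reason the verifier reads req.rawBody rather than the parsed JSON, can be shown in a few lines. The sketch assumes a sha256, hex-encoded signature with a "sha256=" prefix; signBody and signaturesMatch are hypothetical helpers and the secret is a placeholder.

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Hypothetical helper: what a webhook sender computes over the exact bytes it sends.
function signBody(rawBody: Buffer, secret: string, prefix = "sha256="): string {
  return prefix + createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Constant-time comparison, guarded by a length check because
// timingSafeEqual throws when the buffers differ in length.
function signaturesMatch(received: string, expected: string): boolean {
  const a = Buffer.from(received);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}

const sent = Buffer.from('{"event":"ping"}');
const signature = signBody(sent, "example-secret");

// Same bytes: the signature verifies.
console.log(signaturesMatch(signature, signBody(sent, "example-secret"))); // true

// Same JSON, different whitespace: the bytes differ, so verification fails.
const reserialized = Buffer.from('{ "event": "ping" }');
console.log(signaturesMatch(signature, signBody(reserialized, "example-secret"))); // false
```

The second case is why re-serializing req.body is not a substitute for the captured raw buffer.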

Error handler

The error handler is registered last in the middleware chain. It catches any error that extends AppError and serializes it to a consistent JSON shape: { error, code, status, details? }. Unknown errors (errors that do not extend AppError) are logged and returned as a generic 500 Internal Server Error with no internal details exposed.

typescript
// gateway/middleware/error-handler.ts
export function errorHandler() {
  return (err, req, res, _next) => {
    if (err instanceof AppError) {
      return res.status(err.status).json({
        error: err.message,
        code: err.code,
        status: err.status,
        ...(err.details && { details: err.details })
      });
    }

    // Unknown errors become generic 500s — never leak internals
    console.error("Unhandled error:", err);
    return res.status(500).json({
      error: "Internal server error",
      code: "INTERNAL_ERROR",
      status: 500
    });
  };
}

| Error class | Status | Code |
| --- | --- | --- |
| AuthError | 401 | AUTH_REQUIRED / AUTH_INVALID |
| ForbiddenError | 403 | FORBIDDEN |
| ValidationError | 400 | VALIDATION_ERROR |
| NotFoundError | 404 | NOT_FOUND |
| ConflictError | 409 | CONFLICT |
| RateLimitError | 429 | RATE_LIMIT |
| QuotaExceededError | 429 | QUOTA_EXCEEDED |
| AgentError | 500 | AGENT_ERROR |
| ToolError | 500 | TOOL_ERROR |
| InferenceError | 502 | INFERENCE_ERROR |
| ConfigError | 500 | CONFIG_ERROR |
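
The AppError base class itself is not shown on this page. Judging only from the fields the handler reads (message, code, status, details), a base class and one subclass might look like the sketch below. The constructor signatures are assumptions, and serialize is a hypothetical helper that mirrors the handler's response body.

```typescript
// Assumed shape of the error hierarchy, for illustration only.
class AppError extends Error {
  status: number;
  code: string;
  details?: Record<string, unknown>;

  constructor(message: string, status: number, code: string, details?: Record<string, unknown>) {
    super(message);
    this.name = new.target.name; // e.g. "NotFoundError" in stack traces
    this.status = status;
    this.code = code;
    this.details = details;
  }
}

class NotFoundError extends AppError {
  constructor(message = "Not found", details?: Record<string, unknown>) {
    super(message, 404, "NOT_FOUND", details);
  }
}

// Same JSON shape the error handler produces: { error, code, status, details? }
function serialize(err: AppError) {
  return {
    error: err.message,
    code: err.code,
    status: err.status,
    ...(err.details && { details: err.details }),
  };
}

console.log(serialize(new NotFoundError("Agent not found", { agentId: "a1" })));
```

A route handler would throw or pass such an error to next(), and the instanceof AppError check in the handler picks it up.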

How the stack is assembled

Global middleware is registered with app.use() at the top of gateway/index.ts. Agent-scoped middleware is passed as an array to route group registrations. The error handler is registered last.

typescript
// gateway/index.ts — how the stack is assembled
import { securityHeaders } from "./middleware/security-headers";
import { correlationId } from "./middleware/correlation-id";
import { cors } from "./middleware/cors";
import { csrfProtection } from "./middleware/csrf";
import { requestLogger } from "./middleware/request-logger";
import { jwtAuth } from "./middleware/jwt-auth";
import { tokenBudget } from "./middleware/token-budget";
import { workspaceResolver } from "./middleware/workspace-resolver";
import { rateLimiter } from "./middleware/rate-limiter";
import { userRateLimit } from "./middleware/user-rate-limit";
import { requestTimeout } from "./middleware/request-timeout";
import { errorHandler } from "./middleware/error-handler";

// ── Global middleware (every request) ────────────────────────
app.use(securityHeaders());
app.use(correlationId());
app.use(cors());
app.use(express.json({ limit: "1mb", verify: (req, _, buf) => { req.rawBody = buf; } }));
app.use(csrfProtection());
app.use(requestLogger());

// ── Route groups ─────────────────────────────────────────────
app.use("/auth", authRoutes);                 // public — no JWT
app.use("/health", healthRoutes);             // public — no JWT
app.use("/webhooks/channels", channelRoutes); // HMAC-verified per channel

// ── Agent-scoped middleware ──────────────────────────────────
const agentStack = [
  jwtAuth(),
  tokenBudget(),
  workspaceResolver(),
  rateLimiter({ window: 60, max: 30 }),
  userRateLimit(),
  requestTimeout(120_000)
];

app.use("/agent", agentStack, agentRoutes);
app.use("/agents", [jwtAuth(), workspaceResolver()], agentCrudRoutes);
app.use("/sessions", [jwtAuth()], sessionRoutes);
// ...remaining route groups

// ── Error handler (always last) ──────────────────────────────
app.use(errorHandler());

Configuration

Environment variables that control middleware behavior:

| Variable | Default | Used by | Description |
| --- | --- | --- | --- |
| ALLOWED_ORIGINS | http://localhost:3000 | cors() | Comma-separated list of allowed CORS origins. In development all origins are permitted regardless of this value. |
| JWT_SECRET | -- | jwtAuth() | HS256 (HMAC-SHA256) signing key for access and refresh tokens. Must be at least 32 characters in production. |
| JWT_ACCESS_EXPIRES | 15m | jwtAuth() | Access token time-to-live. Accepts ms-compatible duration strings (15m, 1h, 30s). |
| JWT_REFRESH_EXPIRES | 7d | jwtAuth() | Refresh token time-to-live. |
| JWT_SECRET_PREV | -- | jwtAuth() | Previous signing key for zero-downtime key rotation. Set this to the old secret before updating JWT_SECRET. |
| BIND_JWT_TO_DEVICE | false | jwtAuth() | When true, the JWT payload includes a device fingerprint and tokens are rejected if the fingerprint does not match the requesting device. |
| USER_RATE_LIMIT_RPM | 60 | userRateLimit() | Maximum requests per minute per authenticated user. |
| USER_RATE_LIMIT_BURST | 10 | userRateLimit() | Additional burst allowance above the RPM limit. Effective max per minute is RPM + burst. |

JWT_SECRET in production: If JWT_SECRET is shorter than 32 characters when NODE_ENV=production, the gateway will refuse to start and log a CONFIG_ERROR. Use a cryptographically random string, for example the output of openssl rand -base64 48.
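
A startup check with this behavior could be as small as the sketch below. validateJwtSecret is a hypothetical function name and the error message is illustrative; only the 32-character rule and the NODE_ENV condition come from the text above.

```typescript
// Hypothetical startup check, for illustration only.
function validateJwtSecret(env: Record<string, string | undefined>): void {
  const secret = env.JWT_SECRET ?? "";
  if (env.NODE_ENV === "production" && secret.length < 32) {
    throw new Error("CONFIG_ERROR: JWT_SECRET must be at least 32 characters in production");
  }
}

// Development: a short secret is tolerated.
validateJwtSecret({ NODE_ENV: "development", JWT_SECRET: "dev" });

// Production: a short secret aborts startup.
try {
  validateJwtSecret({ NODE_ENV: "production", JWT_SECRET: "too-short" });
} catch (err) {
  console.error((err as Error).message);
}
```

In the gateway this would be called with process.env before the server starts listening.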

See also

  • Gateway overview -- architecture, transports, route groups, and health endpoints
  • Auth hardening -- JWT rotation, CSRF, device binding, key management
  • Quotas -- per-agent token and cost budget configuration
  • Workspaces -- workspace membership and role assignment
  • API reference -- full REST API documentation