Rate limits

Understand how the gateway enforces request throughput and concurrent stream limits.

Requests per minute

The gateway enforces a per-key request budget. Default limits are configured by your plan and can be overridden per API key.

Each successful request consumes capacity from the key's token bucket. When the bucket is empty, the gateway rejects further requests with the RATE_LIMITED error.
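The token bucket behavior can be illustrated with a minimal sketch. It is not the gateway's implementation: the class name TokenBucket, the cost of one token per request, and the continuous refill rate are all assumptions made for illustration, since this page does not document them.

```python
import time


class TokenBucket:
    """Illustrative token bucket. Capacity, refill rate, and the
    one-token-per-request cost are assumptions, not documented values."""

    def __init__(self, capacity: int, refill_per_second: float, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._now = now  # injectable clock so the sketch can be tested
        self._tokens = float(capacity)
        self._last = now()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, up to capacity.
        current = self._now()
        elapsed = current - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._last = current

    def try_acquire(self) -> bool:
        """Take one token if available. False models a RATE_LIMITED rejection."""
        self._refill()
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

A client can keep a bucket like this locally to pace its own requests and avoid hitting the gateway limit, provided it is configured with the actual limits for the key.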

Concurrent streams

Streaming endpoints (SSE and WebSocket) share a single concurrent stream cap per API key. Once the cap is reached, new streams are rejected.

Streaming note

WebSocket upgrade requests are rate-limited at handshake time. An accepted upgrade occupies a concurrent stream slot, which is released after the socket closes.
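The slot lifecycle (take a slot when a stream opens, release it when the stream closes) can be mirrored on the client with a small guard. This is a sketch under stated assumptions: StreamSlots is a hypothetical helper, the cap value must come from your own plan or key configuration, and the gateway's actual rejection behavior is not modeled beyond raising an exception.

```python
import threading
from contextlib import contextmanager


class StreamSlots:
    """Hypothetical client-side guard mirroring a per-key concurrent stream cap."""

    def __init__(self, cap: int):
        self._cap = cap
        self._open = 0
        self._lock = threading.Lock()

    @contextmanager
    def stream(self):
        # Take a slot before opening the stream; refuse if the cap is reached.
        with self._lock:
            if self._open >= self._cap:
                raise RuntimeError("concurrent stream cap reached")
            self._open += 1
        try:
            yield
        finally:
            # Release the slot when the stream closes, even on error.
            with self._lock:
                self._open -= 1

    @property
    def open_streams(self) -> int:
        with self._lock:
            return self._open
```

Wrapping each SSE or WebSocket session in `with slots.stream():` keeps the client from opening a stream it expects the gateway to reject.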

Monthly token quotas

Responses to requests made with a key that has a monthly token quota include the X-TokenLimit-Limit and X-TokenLimit-Remaining headers. When the quota is exhausted, the gateway returns USAGE_LIMIT_REACHED.
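As a rough sketch, a client might read the quota headers like this. The function names token_quota and quota_used_fraction are hypothetical, and the code assumes the header values are plain integers, which this page does not state explicitly.

```python
from typing import Mapping, Optional, Tuple


def token_quota(headers: Mapping[str, str]) -> Optional[Tuple[int, int]]:
    """Return (limit, remaining), or None when the quota headers are absent."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    limit = lowered.get("x-tokenlimit-limit")
    remaining = lowered.get("x-tokenlimit-remaining")
    if limit is None or remaining is None:
        return None
    # Assumption: both values are integer strings.
    return int(limit), int(remaining)


def quota_used_fraction(headers: Mapping[str, str]) -> Optional[float]:
    """Return the fraction of the monthly quota already used, if known."""
    quota = token_quota(headers)
    if quota is None:
        return None
    limit, remaining = quota
    if limit <= 0:
        return None
    return (limit - remaining) / limit
```

Tracking the used fraction lets a client warn or throttle before the quota runs out rather than discovering it through a USAGE_LIMIT_REACHED error.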

Headers and errors

The X-RateLimit-Remaining header reports the request capacity remaining for the key. 429 responses also set the Retry-After header.

Rate-limited responses include an error field set to RATE_LIMITED with a descriptive message.
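One way a client could act on these signals is sketched below. It assumes a JSON response body with a top-level "error" field and a Retry-After value expressed in seconds; neither detail is confirmed on this page, and retry_delay_seconds is a hypothetical helper name.

```python
import json
from typing import Mapping, Optional


def retry_delay_seconds(status: int, headers: Mapping[str, str], body: str,
                        default: float = 1.0) -> Optional[float]:
    """Return how long to wait before retrying, or None if the
    response is not a RATE_LIMITED rejection."""
    if status != 429:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}
    # Assumption: the error code sits in a top-level "error" field.
    if not isinstance(payload, dict) or payload.get("error") != "RATE_LIMITED":
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    try:
        # Assumption: Retry-After is a number of seconds.
        return max(0.0, float(raw))
    except (TypeError, ValueError):
        # Missing or non-numeric header: fall back to a default delay.
        return default
```

The helper returns None for any other error code, including USAGE_LIMIT_REACHED, because an exhausted monthly quota is not expected to recover after a short wait.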
