Rate limits
Understand how the gateway enforces request throughput and concurrent stream limits.
Requests per minute
The gateway enforces a per-key request budget. Default limits are configured by your plan and can be overridden per API key.
Request capacity is tracked with a token bucket, and each successful request decrements it. When the bucket is empty, the gateway rejects the request with a 429 response whose error field is RATE_LIMITED.
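To make the token bucket behavior concrete, here is a minimal local sketch of the general technique. It is not the gateway's implementation: the class name, the `capacity` and `refill_per_sec` parameters, the one-token-per-request cost, and the continuous refill are all assumptions made for illustration, since this page does not state how the bucket is sized or refilled.

```python
import time
from typing import Optional


class TokenBucket:
    """Illustrative token bucket (hypothetical, not the gateway's code)."""

    def __init__(self, capacity: int, refill_per_sec: float,
                 now: Optional[float] = None) -> None:
        self.capacity = capacity              # assumed maximum burst size
        self.refill_per_sec = refill_per_sec  # assumed steady refill rate
        self.tokens = float(capacity)         # the bucket starts full
        self.updated = time.monotonic() if now is None else now

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Take one token if available; False models a RATE_LIMITED rejection."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(float(self.capacity),
                          self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A client could keep a bucket like this in front of its own calls to smooth bursts before they reach the gateway, using values that mirror the limits on its plan.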
Concurrent streams
Streaming endpoints (SSE and WebSocket) share a single concurrent stream cap per API key. Once the cap is reached, new streams are rejected until an open stream closes.
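One way a client might stay under the concurrent stream cap is to track open streams locally and refuse to open more than the cap allows. The sketch below assumes a hypothetical `StreamSlots` helper and a `max_streams` value that you would set to match your key's cap. It only counts streams opened by this process, so it cannot see streams opened elsewhere with the same key.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class StreamSlots:
    """Client-side guard for concurrent streams (hypothetical helper)."""

    def __init__(self, max_streams: int) -> None:
        # max_streams is an assumption: set it to the cap for your API key.
        self._sem = threading.BoundedSemaphore(max_streams)

    @contextmanager
    def slot(self) -> Iterator[None]:
        # Fail fast instead of blocking, mirroring how new streams are
        # rejected once the cap is reached.
        if not self._sem.acquire(blocking=False):
            raise RuntimeError("local concurrent stream cap reached")
        try:
            yield
        finally:
            # Free the slot when the stream closes, even on error.
            self._sem.release()
```

Wrapping each SSE or WebSocket session in `with slots.slot():` releases the slot as soon as the stream ends, including when it ends with an exception.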
Monthly token quotas
Keys with a monthly token quota return X-TokenLimit-Limit and X-TokenLimit-Remaining headers. When the quota is exhausted, the gateway returns USAGE_LIMIT_REACHED.
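As a rough illustration, a client could read the two quota headers from a response like this. The function name is hypothetical, and the sketch assumes the header values are plain decimal integers, which this page does not confirm. It returns `None` when the headers are absent (for example, on keys without a monthly quota) or cannot be parsed.

```python
from typing import Mapping, Optional, Tuple


def read_token_quota(headers: Mapping[str, str]) -> Optional[Tuple[int, int]]:
    """Return (limit, remaining) from the quota headers, or None."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    limit = lowered.get("x-tokenlimit-limit")
    remaining = lowered.get("x-tokenlimit-remaining")
    if limit is None or remaining is None:
        return None
    try:
        # Assumption: both values are decimal integers.
        return int(limit), int(remaining)
    except ValueError:
        return None
```

Checking the remaining value before sending large requests lets a client slow down or alert before the gateway starts returning USAGE_LIMIT_REACHED.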
Headers and errors
The X-RateLimit-Remaining header reports the remaining request capacity for the key. The Retry-After header is set on 429 responses.
Rate-limited responses include an error field set to RATE_LIMITED with a descriptive message.
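The headers and error field described above can drive a simple retry decision. The following is a sketch under several assumptions that this page does not state: that the response body is JSON with `error` as a top-level field, that Retry-After carries a number of seconds (HTTP also allows a date form, which this sketch does not parse), and that a fallback delay of one second is acceptable. The function name and `default` parameter are hypothetical.

```python
import json
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], body: str,
                default: float = 1.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if this is not RATE_LIMITED."""
    if status != 429:
        return None
    try:
        payload = json.loads(body)  # assumption: the body is JSON
    except ValueError:
        return None
    # Assumption: the error code is a top-level "error" field.
    if not isinstance(payload, dict) or payload.get("error") != "RATE_LIMITED":
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        # Assumption: Retry-After is given in seconds.
        return max(0.0, float(lowered.get("retry-after")))
    except (TypeError, ValueError):
        # Header missing or not numeric: fall back to the default delay.
        return default
```

Other error codes, such as USAGE_LIMIT_REACHED, deliberately return `None` here, because waiting a few seconds is unlikely to help once a monthly quota is exhausted.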