Streaming
Low-latency streaming with SSE and WebSocket protocols.
Chat streaming (SSE)
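Set stream: true on a chat completions request to receive the response as Server-Sent Events (SSE). The gateway emits a series of chunk events, followed by a done event that carries the conversation and message IDs and the token usage totals.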
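A minimal Python sketch of consuming this stream. It assumes you already read the HTTP response body line by line (the transport is out of scope), and that only chunk events whose chunkKind is CHUNK_KIND_TEXT_TOKEN contribute message text. Other chunk kinds may exist. The function names iter_sse_events and collect_chat_stream are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, parsed_json) pairs from raw SSE lines."""
    event_name, data_lines = "message", []  # "message" is the SSE default name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line dispatches the buffered event
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space
            if field == "event":
                event_name = value
            elif field == "data":
                data_lines.append(value)
    # Lenient: flush an event left unterminated at end of stream
    # (a strict SSE parser would discard it).
    if data_lines:
        yield event_name, json.loads("\n".join(data_lines))


def collect_chat_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join text tokens from chunk events; return them with the done payload."""
    text_parts: List[str] = []
    done: Optional[dict] = None
    for name, data in iter_sse_events(lines):
        if name == "chunk":
            content = data.get("content", {})
            if content.get("chunkKind") == "CHUNK_KIND_TEXT_TOKEN":
                text_parts.append(content.get("messageText", ""))
        elif name == "done":
            done = data
    return "".join(text_parts), done


sample = (
    'event: chunk\n'
    'data: {"type":"MESSAGE_TYPE_INTERIM_RESULT","content":{"chunkKind":"CHUNK_KIND_TEXT_TOKEN","messageText":"Hello"}}\n'
    '\n'
    'event: done\n'
    'data: {"conversationId":"conv_123","messageId":"msg_123","usage":{"inputTokens":5,"outputTokens":11,"totalTokens":16}}\n'
    '\n'
)
text, done = collect_chat_stream(sample.splitlines())
print(text)                          # Hello
print(done["usage"]["totalTokens"])  # 16
```

Splitting on the blank line rather than on each data: line keeps the parser correct if an event's JSON is ever spread over several data: lines.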
event: chunk
data: {"type":"MESSAGE_TYPE_INTERIM_RESULT","content":{"chunkKind":"CHUNK_KIND_TEXT_TOKEN","messageText":"Hello"}}

event: done
data: {"conversationId":"conv_123","messageId":"msg_123","usage":{"inputTokens":5,"outputTokens":11,"totalTokens":16}}
WebSocket streaming
WebSocket streaming powers audio transcription, speech synthesis, and avatar streaming. Each connection is authenticated during the upgrade request.
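In practice a WebSocket client library performs the upgrade handshake for you. The Python sketch below only shows where credentials travel: in the headers of the HTTP upgrade request, alongside the standard RFC 6455 handshake fields. The "Authorization: Bearer" scheme and the host gateway.example.com are assumptions, not confirmed gateway details.

```python
import base64
import hashlib
import os
from typing import Dict

# Fixed GUID from RFC 6455, used to derive Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def build_upgrade_headers(host: str, api_key: str) -> Dict[str, str]:
    """Headers for a WebSocket upgrade request with credentials attached."""
    return {
        "Host": host,
        "Upgrade": "websocket",
        "Connection": "Upgrade",
        # 16 random bytes, base64-encoded, as RFC 6455 requires.
        "Sec-WebSocket-Key": base64.b64encode(os.urandom(16)).decode("ascii"),
        "Sec-WebSocket-Version": "13",
        # Hypothetical scheme: substitute the auth header your gateway expects.
        "Authorization": f"Bearer {api_key}",
    }


def expected_accept(key: str) -> str:
    """Sec-WebSocket-Accept value a compliant server returns for `key`."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


headers = build_upgrade_headers("gateway.example.com", "YOUR_API_KEY")
print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The key and accept values in the last line are the worked example from RFC 6455, which is a quick way to check a handshake implementation.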
Audio transcription
Send binary PCM frames, receive transcription JSON.
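A Python sketch of the outgoing side of a transcription stream: binary PCM frames, then the {"type":"end"} control message described under Session control. The input format is not specified on this page, so 16 kHz mono, 16-bit little-endian samples in 20 ms frames are assumptions; check the format your gateway actually expects.

```python
import json
import struct
from typing import Iterator, List, Union

# Assumptions (not stated by the gateway docs): 16 kHz mono,
# 16-bit little-endian PCM, sent in 20 ms frames.
SAMPLE_RATE = 16_000
FRAME_MS = 20
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples


def pcm_frames(samples: List[int]) -> Iterator[bytes]:
    """Pack int16 samples into little-endian PCM frames of FRAME_MS each."""
    for start in range(0, len(samples), SAMPLES_PER_FRAME):
        chunk = samples[start:start + SAMPLES_PER_FRAME]
        yield struct.pack(f"<{len(chunk)}h", *chunk)


def outgoing_messages(samples: List[int]) -> Iterator[Union[bytes, str]]:
    """Binary audio frames first, then the JSON end message as a text frame."""
    yield from pcm_frames(samples)
    yield json.dumps({"type": "end"})


msgs = list(outgoing_messages([0] * 700))
print(len(msgs))  # 4: three audio frames plus the end message
```

Yielding bytes for audio and str for the control message fits clients such as the Python websockets package, which send bytes as binary frames and str as text frames.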
Speech + avatar
Send a JSON request once, then receive audio and optional blendshape frames.
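Audio arrives as audio events with a base64 payload in pcm_s16le format (see Event formats). The Python sketch below decodes received messages and keeps any blendshape frames in frameIndex order. It assumes mono audio and a constant sampleRate across events within one session. The helper names are hypothetical, and the request you send first is not shown because its fields are not documented here.

```python
import base64
import struct
from typing import Dict, Iterable, List, Tuple


def decode_audio_event(event: Dict) -> List[int]:
    """Decode an audio event's base64 pcm_s16le payload into int16 samples."""
    if event.get("format") != "pcm_s16le":
        raise ValueError(f"unsupported audio format: {event.get('format')!r}")
    raw = base64.b64decode(event["audio"])
    count = len(raw) // 2  # 2 bytes per 16-bit sample; drop a stray odd byte
    return list(struct.unpack(f"<{count}h", raw[: count * 2]))


def split_speech_stream(messages: Iterable[Dict]) -> Tuple[List[int], List[Dict]]:
    """Concatenate audio samples and collect blendshape frames, if any."""
    samples: List[int] = []
    frames: List[Dict] = []
    for msg in messages:
        kind = msg.get("type")
        if kind == "audio":
            samples.extend(decode_audio_event(msg))
        elif kind == "blendshape":
            frames.append(msg)
        # Any other message type is ignored in this sketch.
    frames.sort(key=lambda f: f["frameIndex"])
    return samples, frames


def duration_seconds(num_samples: int, sample_rate: int, channels: int = 1) -> float:
    """Playback length of interleaved PCM; mono is the default assumption."""
    return num_samples / (sample_rate * channels)
```

As a sanity check, if duration is in seconds and the stream is mono, the sample audio event (1.2 s at 48000 Hz) would carry 57,600 samples, or 115,200 bytes before base64 encoding.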
Browser note
WebSocket upgrade requests must carry auth headers, but the browser WebSocket API cannot set custom headers. If you need playback in the browser, route streaming requests through a server-side proxy that opens the upstream connection and attaches the credentials.
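One building block of such a proxy, sketched in Python: the server keeps the API key and builds the headers for its own upstream connection, dropping anything the browser sent that describes the browser hop or carries credentials. The environment variable GATEWAY_API_KEY, the Bearer scheme, and the helper names are hypothetical.

```python
import os
from typing import Dict, Mapping

# Browser-side headers that must not be copied onto the upstream request.
# The upstream client library generates its own handshake headers.
DO_NOT_FORWARD = {
    "host", "connection", "upgrade", "cookie", "authorization",
    "sec-websocket-key", "sec-websocket-version", "sec-websocket-extensions",
}


def load_api_key() -> str:
    """Read the gateway key from server-side config, never from the browser."""
    key = os.environ.get("GATEWAY_API_KEY")  # hypothetical variable name
    if not key:
        raise RuntimeError("GATEWAY_API_KEY is not set")
    return key


def upstream_headers(browser_headers: Mapping[str, str], api_key: str) -> Dict[str, str]:
    """Headers for the proxy's own upstream WebSocket connection."""
    forwarded = {
        k: v for k, v in browser_headers.items() if k.lower() not in DO_NOT_FORWARD
    }
    forwarded["Authorization"] = f"Bearer {api_key}"  # hypothetical scheme
    return forwarded
```

Stripping any Authorization header from the browser request ensures the only credential sent upstream is the one held by the server.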
Session control
Send the JSON message {"type":"end"} to close a transcription stream cleanly.
Event formats
Transcription events
{
  "type": "transcription",
  "sessionId": "stt_123",
  "text": "hello world",
  "confidence": 0.91,
  "alternatives": []
}
Audio events
{
  "type": "audio",
  "sessionId": "tts_123",
  "audio": "<base64>",
  "format": "pcm_s16le",
  "sampleRate": 48000,
  "duration": 1.2
}
Blendshape events
{
  "type": "blendshape",
  "sessionId": "avatar_123",
  "frameIndex": 12,
  "ptsMs": 240,
  "coeffs": {
    "jawOpen": 0.42,
    "mouthSmileLeft": 0.18
  }
}
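To drive an avatar, blendshape frames are typically looked up by playback time. The Python sketch below assumes ptsMs is measured in milliseconds on the same timeline as the audio playback position, which this page does not state explicitly. The class name BlendshapeTrack is hypothetical. When no newer frame exists, the previous pose is held.

```python
import bisect
from typing import Dict, List, Optional


class BlendshapeTrack:
    """Blendshape frames ordered by ptsMs, looked up by playback time."""

    def __init__(self) -> None:
        self._pts: List[int] = []
        self._coeffs: List[Dict[str, float]] = []

    def add(self, event: dict) -> None:
        """Insert a blendshape event, keeping frames sorted by ptsMs."""
        i = bisect.bisect_right(self._pts, event["ptsMs"])
        self._pts.insert(i, event["ptsMs"])
        self._coeffs.insert(i, event["coeffs"])

    def at(self, playback_ms: float) -> Optional[Dict[str, float]]:
        """Return the latest frame with ptsMs <= playback_ms, or None."""
        i = bisect.bisect_right(self._pts, playback_ms)
        return self._coeffs[i - 1] if i else None


track = BlendshapeTrack()
track.add({"type": "blendshape", "sessionId": "avatar_123", "frameIndex": 12,
           "ptsMs": 240, "coeffs": {"jawOpen": 0.42, "mouthSmileLeft": 0.18}})
print(track.at(250))  # {'jawOpen': 0.42, 'mouthSmileLeft': 0.18}
```

Inserting by ptsMs rather than appending keeps lookups correct even if frames arrive out of order.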