Streaming

Low-latency streaming with SSE and WebSocket protocols.

Chat streaming (SSE)

Set stream: true on a chat completion request to receive Server-Sent Events (SSE). The gateway emits a series of chunk events, followed by a done event that carries usage totals.

event: chunk
data: {"type":"MESSAGE_TYPE_INTERIM_RESULT","content":{"chunkKind":"CHUNK_KIND_TEXT_TOKEN","messageText":"Hello"}}

event: done
data: {"conversationId":"conv_123","messageId":"msg_123","usage":{"inputTokens":5,"outputTokens":11,"totalTokens":16}}
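As an illustration of consuming this stream, here is a minimal Python sketch that parses SSE lines and assembles the reply text. It assumes standard SSE framing, in which the event: and data: lines of one event are adjacent and a blank line ends the event. The helper names iter_sse_events and collect_reply are hypothetical and not part of any gateway SDK. Treating CHUNK_KIND_TEXT_TOKEN as the only chunk kind that carries text is also an assumption based on the sample above.

```python
import json
from typing import Dict, Iterable, Iterator, Optional, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, parsed_json) pairs from an iterable of SSE lines."""
    event_name = "message"  # SSE default when no "event:" field is present
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event_name, json.loads("\n".join(data_parts))
            event_name, data_parts = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Per the SSE format, a single leading space is dropped.
            data_parts.append(value[1:] if value.startswith(" ") else value)
    if data_parts:
        # Lenient choice: flush a final event that lacks a trailing blank line.
        yield event_name, json.loads("\n".join(data_parts))


def collect_reply(lines: Iterable[str]) -> Tuple[str, Optional[Dict[str, int]]]:
    """Concatenate text tokens from chunk events and return them with usage."""
    text_parts = []
    usage = None
    for name, payload in iter_sse_events(lines):
        if name == "chunk":
            content = payload.get("content", {})
            # Assumption: only text-token chunks contribute to the reply text.
            if content.get("chunkKind") == "CHUNK_KIND_TEXT_TOKEN":
                text_parts.append(content.get("messageText", ""))
        elif name == "done":
            usage = payload.get("usage")
            break  # the done event ends the stream
    return "".join(text_parts), usage
```

In practice, lines would come from a streaming HTTP response read line by line. The parser takes any iterable of strings so that it can be exercised without a network connection.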

WebSocket streaming

WebSocket streaming powers audio transcription, speech synthesis, and avatar streaming. Each connection is authenticated during the upgrade request.

Audio transcription

Send binary PCM frames and receive transcription results as JSON events.

Speech + avatar

Send a single JSON request, then receive audio events and, optionally, blendshape frames.

Browser note

WebSocket upgrades require auth headers. Browsers cannot set custom headers on WebSocket connections, so route streaming requests through a server-side proxy if you need browser playback.

Session control

Send a JSON message with {"type":"end"} to close transcription streams cleanly.
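The client side of a transcription session can be sketched as a sequence of outgoing messages: binary PCM frames first, then the end message. The Python generator below is a hypothetical helper (transcription_messages is not a gateway API). The default frame size of 3200 bytes is an assumption for illustration, since this page does not specify a frame size or sample rate. Sending the end message as a text frame is likewise an assumption.

```python
import json
from typing import Iterator, Union


def transcription_messages(
    pcm: bytes, frame_bytes: int = 3200
) -> Iterator[Union[bytes, str]]:
    """Yield binary PCM frames, then the JSON end message as a string."""
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    for offset in range(0, len(pcm), frame_bytes):
        # Each bytes item is meant to be sent as one binary WebSocket frame.
        yield pcm[offset:offset + frame_bytes]
    # Assumption: the end message goes out as a text WebSocket frame.
    yield json.dumps({"type": "end"})
```

A WebSocket client would send each yielded item in order, mapping bytes to binary frames and str to text frames, and keep reading transcription events until the server closes the stream.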

Event formats

Transcription events

{
  "type": "transcription",
  "sessionId": "stt_123",
  "text": "hello world",
  "confidence": 0.91,
  "alternatives": []
}

Audio events

{
  "type": "audio",
  "sessionId": "tts_123",
  "audio": "<base64>",
  "format": "pcm_s16le",
  "sampleRate": 48000,
  "duration": 1.2
}
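To make the audio payload concrete, the following Python sketch decodes one audio event into signed 16-bit samples. It assumes the audio field holds standard base64 of raw little-endian PCM, as the pcm_s16le format value suggests, and it does not assume a channel count, which this page does not state. The function name decode_audio_event is hypothetical.

```python
import base64
import json
import struct
from typing import List, Tuple


def decode_audio_event(raw: str) -> Tuple[List[int], int]:
    """Return (samples, sample_rate) decoded from one audio event."""
    event = json.loads(raw)
    if event.get("type") != "audio":
        raise ValueError("not an audio event")
    if event.get("format") != "pcm_s16le":
        raise ValueError("unsupported format: %r" % event.get("format"))
    pcm = base64.b64decode(event["audio"])
    if len(pcm) % 2 != 0:
        raise ValueError("PCM payload has an odd number of bytes")
    count = len(pcm) // 2
    # "<h" is a little-endian signed 16-bit integer, one per sample.
    samples = list(struct.unpack("<%dh" % count, pcm))
    return samples, event["sampleRate"]
```

For playback, the decoded samples from successive events would be appended to an output buffer in arrival order.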

Blendshape events

{
  "type": "blendshape",
  "sessionId": "avatar_123",
  "frameIndex": 12,
  "ptsMs": 240,
  "coeffs": {
    "jawOpen": 0.42,
    "mouthSmileLeft": 0.18
  }
}
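
Since every WebSocket event above carries a type field, a client can route incoming JSON messages by that value. The Python sketch below shows one possible approach. make_dispatcher and the handler mapping are hypothetical names, and silently ignoring unknown types is a design assumption rather than documented gateway behavior.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], None]


def make_dispatcher(handlers: Dict[str, Handler]) -> Callable[[str], bool]:
    """Build a function that routes a raw JSON event to a handler by type."""

    def dispatch(raw: str) -> bool:
        event = json.loads(raw)
        if not isinstance(event, dict):
            return False  # not a JSON object, so it has no type field
        handler = handlers.get(event.get("type"))
        if handler is None:
            return False  # unknown or missing type: ignore the event
        handler(event)
        return True

    return dispatch
```

For example, a speech and avatar client might register one handler for audio events and another for blendshape events, using ptsMs to align each blendshape frame with the audio timeline.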