Text to speech

Synthesize audio from text via HTTP or WebSocket streaming.

POST /v1/audio/speech

Requires avatar:interact or avatar:use. Returns base64-encoded audio and metadata.

Request body

Field      Type    Notes
text       string  Required input text.
sessionId  string  Optional session identifier.
voiceId    string  Voice selection (default if omitted).
emotion    string  Emotion preset (neutral by default).
speed      number  Playback speed multiplier.

curl https://<gateway-host>/v1/audio/speech \
  -H "Authorization: Bearer $DISRUPTIVERAIN_CLIENT_ID:$DISRUPTIVERAIN_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Disruptive Rain.",
    "voiceId": "default",
    "emotion": "neutral",
    "speed": 1.0
  }'

Speed parameter

The speed field is accepted but may not affect synthesis output yet.

Response

{
  "sessionId": "tts_123",
  "audio": "<base64>",
  "format": "pcm_s16le",
  "sampleRate": 48000,
  "duration": 1.2
}
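
The response carries raw PCM rather than a playable file, so a client usually has to decode the base64 payload and add a container header before saving or playing it. The following is a minimal sketch of that step, not an official client. It assumes Node 18+ (for the global fetch), mono audio, and a hypothetical GATEWAY_HOST environment variable; the helper names pcmToWav, decodeSpeechResponse, and synthesize are illustrative only.

```javascript
// Build a 44-byte RIFF/WAVE header for 16-bit little-endian PCM and prepend it.
// Assumption: the audio is mono unless the caller says otherwise.
function pcmToWav(pcm, sampleRate, channels = 1) {
  const bytesPerSample = 2; // pcm_s16le uses 2 bytes per sample
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4); // file size minus the first 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16); // size of the fmt chunk
  header.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * channels * bytesPerSample, 28); // byte rate
  header.writeUInt16LE(channels * bytesPerSample, 32); // block align
  header.writeUInt16LE(16, 34); // bits per sample
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40);
  return Buffer.concat([header, pcm]);
}

// Turn a parsed response body into a WAV buffer.
function decodeSpeechResponse(body) {
  if (body.format !== 'pcm_s16le') {
    throw new Error(`Unexpected audio format: ${body.format}`);
  }
  const pcm = Buffer.from(body.audio, 'base64');
  return pcmToWav(pcm, body.sampleRate);
}

// Call the endpoint and return WAV bytes. GATEWAY_HOST is a placeholder.
async function synthesize(text) {
  const id = process.env.DISRUPTIVERAIN_CLIENT_ID;
  const secret = process.env.DISRUPTIVERAIN_CLIENT_SECRET;
  const res = await fetch(`https://${process.env.GATEWAY_HOST}/v1/audio/speech`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${id}:${secret}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`Speech request failed: HTTP ${res.status}`);
  return decodeSpeechResponse(await res.json());
}
```

The buffer returned by synthesize can be written to disk with fs.writeFileSync('out.wav', wav). If the service returns multi-channel audio, the channels argument would need to change accordingly.
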

WS /v1/audio/speech/stream

Open a WebSocket, send a single JSON payload, and receive audio frames. Optional blendshape frames can be included by setting includeBlendshapes to true.

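import WebSocket from 'ws';

// Auth is sent as a header on the upgrade request.
const ws = new WebSocket('wss://<gateway-host>/v1/audio/speech/stream', {
  headers: {
    Authorization: `Bearer ${process.env.DISRUPTIVERAIN_CLIENT_ID}:${process.env.DISRUPTIVERAIN_CLIENT_SECRET}`,
  },
});

// Send the single JSON request payload once the socket is open.
ws.onopen = () => {
  ws.send(JSON.stringify({
    text: 'Welcome to Disruptive Rain.',
    voiceId: 'default',
    emotion: 'neutral',
    speed: 1.0,
    includeBlendshapes: false,
  }));
};

// Each incoming message is JSON; audio frames have type 'audio'.
ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'audio') {
    console.log('audio frame', message.audio);
  }
};

// Surface connection and handshake failures instead of failing silently.
ws.onerror = (event) => {
  console.error('stream error', event.message);
};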
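
In practice the frames are usually accumulated rather than logged. The sketch below shows one way to do that, under the assumption (not stated above) that message.audio in each streamed frame is a base64 string like the audio field of the HTTP response. The createAudioCollector helper is hypothetical, and any message whose type is not 'audio' is simply ignored.

```javascript
// Collect streamed audio frames into one contiguous PCM buffer.
// Assumption: message.audio is base64-encoded, as in the HTTP response.
function createAudioCollector() {
  const chunks = [];
  return {
    // Pass the raw event.data from ws.onmessage.
    // Returns true if the message was an audio frame, false otherwise.
    handle(data) {
      const message = JSON.parse(data);
      if (message.type !== 'audio') return false; // skip non-audio frames
      chunks.push(Buffer.from(message.audio, 'base64'));
      return true;
    },
    // Everything received so far, concatenated in arrival order.
    toBuffer() {
      return Buffer.concat(chunks);
    },
  };
}
```

A caller would create one collector per stream, call collector.handle(event.data) inside ws.onmessage, and read collector.toBuffer() when the socket closes.
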

Blendshapes

Blendshape frames are optional for speech streams but always included for avatar streams.

Browser note

WebSocket upgrades require auth headers. Browsers cannot set custom headers on WebSocket connections, so proxy streaming through your backend if you need browser speech synthesis.
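
One possible shape for such a backend proxy is a small relay that pairs each browser socket with an authenticated upstream socket. The sketch below is illustrative only: relaySpeech is a hypothetical helper, and both arguments are assumed to follow the ws library's WebSocket interface (on, send, close). Messages the browser sends before the upstream connection opens are queued, and payloads are forwarded as text.

```javascript
// Relay messages between a browser socket and an upstream gateway socket.
// Assumption: both sockets expose on(event, fn), send(data), and close().
function relaySpeech(client, upstream) {
  let upstreamOpen = false;
  const pending = []; // requests received before the upstream is ready

  upstream.on('open', () => {
    upstreamOpen = true;
    for (const msg of pending.splice(0)) upstream.send(msg);
  });

  // Browser -> gateway: forward the JSON request as text.
  client.on('message', (data) => {
    const text = data.toString();
    if (upstreamOpen) upstream.send(text);
    else pending.push(text);
  });

  // Gateway -> browser: forward every frame as text.
  upstream.on('message', (data) => client.send(data.toString()));

  // If either side closes or fails, close the other.
  upstream.on('close', () => client.close());
  client.on('close', () => upstream.close());
  upstream.on('error', () => client.close());
  client.on('error', () => upstream.close());
}

// Possible wiring with the ws package (port 8080 is a placeholder):
//
//   const { WebSocketServer, WebSocket } = require('ws');
//   const wss = new WebSocketServer({ port: 8080 });
//   wss.on('connection', (client) => {
//     const upstream = new WebSocket('wss://<gateway-host>/v1/audio/speech/stream', {
//       headers: { Authorization: `Bearer ${clientId}:${clientSecret}` },
//     });
//     relaySpeech(client, upstream);
//   });
```

Keeping the credentials on the server means the browser never sees them; the proxy should still apply its own authentication to incoming browser connections.
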