API Reference
Text to speech
Synthesize audio from text via HTTP or WebSocket streaming.
POST /v1/audio/speech

Requires avatar:interact or avatar:use. Returns base64-encoded audio and metadata.
Request body
| Field | Type | Notes |
|---|---|---|
| text | string | Required. Input text to synthesize. |
| sessionId | string | Optional session identifier. |
| voiceId | string | Optional. Voice to use; the default voice is used if omitted. |
| emotion | string | Optional. Emotion preset; defaults to neutral. |
| speed | number | Optional. Playback speed multiplier; see the note below. |
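The request body maps onto a small TypeScript shape. This is a sketch: the interface name is ours, and optionality is inferred from the table above.

```ts
// Request body for POST /v1/audio/speech (sketch inferred from the table).
interface SpeechRequest {
  text: string;        // required input text
  sessionId?: string;  // optional session identifier
  voiceId?: string;    // default voice is used if omitted
  emotion?: string;    // defaults to "neutral"
  speed?: number;      // playback speed multiplier (may not affect output yet)
}
```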
```bash
curl https://<gateway-host>/v1/audio/speech \
  -H "Authorization: Bearer $DISRUPTIVERAIN_CLIENT_ID:$DISRUPTIVERAIN_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Disruptive Rain.",
    "voiceId": "default",
    "emotion": "neutral",
    "speed": 1.0
  }'
```

Speed parameter
The speed field is accepted but may not affect synthesis output yet.

Response
```json
{
  "sessionId": "tts_123",
  "audio": "<base64>",
  "format": "pcm_s16le",
  "sampleRate": 48000,
  "duration": 1.2
}
```
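To consume the response, decode the audio field from base64 into raw PCM. A minimal Node.js sketch follows; it assumes Node 18+ for fetch and an ESM context for top-level await, and it assumes mono audio, since the response does not state a channel count.

```ts
// Sketch: call the endpoint and decode the base64 audio to raw PCM.
import { writeFileSync } from 'node:fs';

const res = await fetch('https://<gateway-host>/v1/audio/speech', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.DISRUPTIVERAIN_CLIENT_ID}:${process.env.DISRUPTIVERAIN_CLIENT_SECRET}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ text: 'Welcome to Disruptive Rain.', voiceId: 'default' }),
});
const { audio, sampleRate } = await res.json();

// format "pcm_s16le" means 16-bit signed little-endian samples, 2 bytes each.
// Channel count is not stated in the response; mono is assumed here.
const pcm = Buffer.from(audio, 'base64');
console.log(`~${(pcm.length / 2 / sampleRate).toFixed(2)}s of assumed-mono audio`);
writeFileSync('speech.raw', pcm);
```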
WS /v1/audio/speech/stream

Open a WebSocket, send a single JSON payload, and receive audio frames. Optional blendshape frames can be included by setting includeBlendshapes to true.
```ts
import WebSocket from 'ws';

const ws = new WebSocket('wss://<gateway-host>/v1/audio/speech/stream', {
  headers: {
    Authorization: `Bearer ${process.env.DISRUPTIVERAIN_CLIENT_ID}:${process.env.DISRUPTIVERAIN_CLIENT_SECRET}`,
  },
});

// Send the synthesis request as a single JSON payload once connected.
ws.onopen = () => {
  ws.send(JSON.stringify({
    text: 'Welcome to Disruptive Rain.',
    voiceId: 'default',
    emotion: 'neutral',
    speed: 1.0,
    includeBlendshapes: false,
  }));
};

// Audio arrives as JSON frames with a base64 `audio` field.
ws.onmessage = (event) => {
  const message = JSON.parse(event.data.toString());
  if (message.type === 'audio') {
    console.log('audio frame', message.audio);
  }
};
```

Blendshapes
Blendshape frames are optional for speech streams but always included for avatar streams.
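If you set includeBlendshapes to true in the payload above, blendshape frames arrive interleaved with audio frames. Below is a minimal handler sketch extending the onmessage callback from the example; the 'blendshapes' type string and the frame contents are assumptions, since this page does not document the frame schema.

```ts
// Sketch: branch on frame type. The 'blendshapes' type string and the
// frame's contents are assumptions; the schema is not specified here.
ws.onmessage = (event) => {
  const message = JSON.parse(event.data.toString());
  if (message.type === 'audio') {
    console.log('audio frame', message.audio);
  } else if (message.type === 'blendshapes') {
    console.log('blendshape frame', message);
  }
};
```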
Browser note
The WebSocket upgrade request must carry the Authorization header, but browsers cannot set custom headers on WebSocket connections. If you need speech synthesis in the browser, proxy the stream through your backend.
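A minimal proxy sketch, assuming Node.js with the ws package; the /speech path and port are ours. The browser connects to this server without credentials, and the server attaches the Authorization header when dialing the gateway.

```ts
import http from 'node:http';
import WebSocket, { WebSocketServer } from 'ws';

const UPSTREAM = 'wss://<gateway-host>/v1/audio/speech/stream';

const server = http.createServer();
const wss = new WebSocketServer({ server, path: '/speech' });

wss.on('connection', (client) => {
  // Dial the gateway with credentials kept server-side.
  const upstream = new WebSocket(UPSTREAM, {
    headers: {
      Authorization: `Bearer ${process.env.DISRUPTIVERAIN_CLIENT_ID}:${process.env.DISRUPTIVERAIN_CLIENT_SECRET}`,
    },
  });

  // Relay frames both ways once the upstream socket is open. A production
  // proxy should buffer client messages that arrive before the open event.
  upstream.on('open', () => {
    client.on('message', (data, isBinary) => upstream.send(data, { binary: isBinary }));
    upstream.on('message', (data, isBinary) => client.send(data, { binary: isBinary }));
  });

  // Tear down both sides together.
  upstream.on('close', () => client.close());
  upstream.on('error', () => client.close());
  client.on('close', () => upstream.close());
});

server.listen(8080);
```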