Streaming

predictor.sh supports streaming responses for real-time output.

Chat Completions Streaming

Enable streaming to receive tokens as they're generated:

curl https://abc123.predictor.sh/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

Response Format (SSE)

data: {"id":"chatcmpl-123","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Once"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" upon"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" a"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" time"}}]}
data: {"id":"chatcmpl-123","choices":[{"finish_reason":"stop"}]}
data: [DONE]

Python Example
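
A minimal sketch of consuming the stream from Python, assuming the SSE format shown above. The helper names `iter_content` and `stream_chat` are illustrative and not part of predictor.sh, and the request half assumes the third-party `requests` package is installed. The parser keeps only `data:` lines, stops at the `[DONE]` sentinel, and yields each `delta.content` fragment.

```python
import json
from typing import Iterable, Iterator, Union


def iter_content(lines: Iterable[Union[str, bytes]]) -> Iterator[str]:
    """Yield content fragments from the SSE lines of a streamed chat completion."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def stream_chat(url: str, token: str, prompt: str) -> Iterator[str]:
    """Send a streaming request and yield content as it arrives.

    Hypothetical helper: mirrors the curl call above using `requests`.
    """
    import requests  # third-party; imported lazily so the parser works without it

    response = requests.post(
        url,
        headers={"Authorization": f"Bearer {token}"},
        json={"messages": [{"role": "user", "content": prompt}], "stream": True},
        stream=True,  # do not buffer the whole body before returning
    )
    response.raise_for_status()
    with response:
        yield from iter_content(response.iter_lines())
```

Usage would look like `for text in stream_chat("https://abc123.predictor.sh/v1/chat/completions", token, "Tell me a story"): print(text, end="", flush=True)`. Keeping the parser separate from the HTTP call makes it possible to test it against recorded lines without a network connection.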

JavaScript Example


TTS Streaming

Stream audio chunks as they're synthesized (Kokoro engine only):

Response Format

Decoding Audio Chunks

Each chunk contains base64-encoded MP3 audio. Decode and concatenate for playback:

ElevenLabs Streaming Endpoint


Whisper Streaming

For long audio files, transcription streams segment-by-segment:


Headers

Streaming responses use these headers:

Timeouts

  • Default request timeout: 60 seconds

  • Streaming connections stay open until completion or error

  • Configure timeout in predictor.yaml:
