Streaming
predictor.sh supports streaming responses for real-time output.
Chat Completions Streaming
Enable streaming to receive tokens as they're generated:
curl https://abc123.predictor.sh/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'
Response Format (SSE)
data: {"id":"chatcmpl-123","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Once"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" upon"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" a"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" time"}}]}
data: {"id":"chatcmpl-123","choices":[{"finish_reason":"stop"}]}
data: [DONE]
Python Example
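A minimal sketch of consuming the stream from Python, using only the standard library. It assumes the response has exactly the shape shown in the sample above: `data:` lines carrying JSON chunks, ending with `data: [DONE]`. The helper names `iter_content` and `stream_chat` are hypothetical, and the 60-second `timeout` is a client-side socket timeout chosen for illustration, not a documented setting.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines shaped like the sample response."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The role and finish_reason chunks carry no "content" key.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(base_url: str, token: str, prompt: str,
                timeout: float = 60.0) -> Iterator[str]:
    """POST a streaming chat request and yield text as it arrives."""
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The response object iterates line by line, yielding bytes.
        yield from iter_content(line.decode("utf-8") for line in response)
```

Under these assumptions, calling `stream_chat("https://abc123.predictor.sh", token, "Tell me a story")` and printing each fragment with `print(text, end="", flush=True)` would show the reply as it is generated. Keeping the parsing in `iter_content` separate from the HTTP call makes it possible to check the parser against recorded lines without a network connection.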
JavaScript Example
TTS Streaming
Stream audio chunks as they're synthesized (Kokoro engine only):
Response Format
Decoding Audio Chunks
Each chunk contains base64-encoded MP3 audio. Decode and concatenate for playback:
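As a rough sketch of the decode-and-concatenate step in Python: the function below takes the base64 strings in arrival order and returns the combined MP3 bytes. The name `join_audio_chunks` is hypothetical, and extracting the base64 string from each streamed chunk is left to the caller, since the chunk's field names are not given here.

```python
import base64
from typing import Iterable


def join_audio_chunks(encoded_chunks: Iterable[str]) -> bytes:
    """Decode base64 MP3 chunks in arrival order and concatenate the bytes."""
    audio = bytearray()
    for encoded in encoded_chunks:
        # Decode each chunk on its own: joining the base64 strings first
        # would break wherever a chunk ends with '=' padding.
        audio.extend(base64.b64decode(encoded))
    return bytes(audio)
```

The result can be written to disk with `pathlib.Path("speech.mp3").write_bytes(...)` or handed to an audio player that accepts MP3 data.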
ElevenLabs Streaming Endpoint
Whisper Streaming
For long audio files, transcription streams segment-by-segment:
Headers
Streaming responses use these headers:
Timeouts
Default request timeout: 60 seconds
Streaming connections stay open until completion or error
Configure timeout in predictor.yaml: