Streaming

predictor.sh can stream responses so that clients receive output in real time: chat completion tokens, synthesized audio chunks, and transcription segments.

Chat Completions Streaming

Set "stream": true in the request body to receive tokens as they're generated:

curl https://abc123.predictor.sh/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

Response Format (SSE)

data: {"id":"chatcmpl-123","choices":[{"delta":{"role":"assistant"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Once"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" upon"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" a"}}]}
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" time"}}]}
data: {"id":"chatcmpl-123","choices":[{"finish_reason":"stop"}]}
data: [DONE]
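
Clients that do not use an SDK can consume this stream by hand. The following is a minimal sketch, assuming each event arrives as a single `data: ` line shaped like the samples above. `collect_text` is a hypothetical helper name, not part of predictor.sh.

```python
import json
from typing import Iterable


def collect_text(lines: Iterable[str]) -> str:
    """Join the content deltas from chat-completion SSE lines into one string."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        for choice in event.get("choices", []):
            # The final chunk in the sample carries finish_reason and no delta.
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Once"}}]}',
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" upon"}}]}',
    'data: {"id":"chatcmpl-123","choices":[{"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print(collect_text(sample))  # prints: Once upon
```

The role-only first chunk and the `finish_reason` chunk contribute no text, so only the `content` deltas end up in the result.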

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://abc123.predictor.sh/v1",
    api_key="pred_your_token"
)

stream = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True
)

for chunk in stream:
    # Skip any chunk that carries no choices
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

JavaScript Example

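import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://abc123.predictor.sh/v1',
  apiKey: 'pred_your_token',
});

const stream = await client.chat.completions.create({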
  model: 'default',
  messages: [{ role: 'user', content: 'Write a poem' }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

TTS Streaming

Stream audio chunks as they're synthesized (Kokoro engine only):

curl https://abc123.predictor.sh/v1/audio/speech \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "This is a long text that will be streamed.",
    "voice": "alloy",
    "stream_format": "sse"
  }'

Response Format

data: {"audio": "<base64-chunk>", "chunk_index": 0}
data: {"audio": "<base64-chunk>", "chunk_index": 1}
data: {"audio": "<base64-chunk>", "chunk_index": 2}
data: [DONE]

Decoding Audio Chunks

Each chunk contains base64-encoded MP3 audio. Decode and concatenate for playback:

import base64
import json

import requests

response = requests.post(
    "https://abc123.predictor.sh/v1/audio/speech",
    headers={
        "Authorization": "Bearer pred_your_token",
        "Content-Type": "application/json"
    },
    json={
        "input": "Hello, this is streaming audio.",
        "voice": "alloy",
        "stream_format": "sse"
    },
    stream=True
)
response.raise_for_status()  # fail early on HTTP errors

audio_chunks = []
for line in response.iter_lines():
    # Only SSE data lines carry a payload
    if not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break  # end-of-stream sentinel
    data = json.loads(payload)
    if "audio" in data:
        audio_chunks.append(base64.b64decode(data["audio"]))

# Combine chunks
full_audio = b"".join(audio_chunks)
with open("output.mp3", "wb") as f:
    f.write(full_audio)
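
The `chunk_index` field makes it possible to check that no chunk was lost before writing the file. The following is a minimal sketch, assuming indices start at 0 and increase by one as in the sample response. `assemble_audio` is a hypothetical helper, and these docs do not say whether chunks can ever arrive out of order, so the reordering step is purely defensive.

```python
import base64
import json
from typing import Iterable


def assemble_audio(lines: Iterable[bytes]) -> bytes:
    """Decode TTS SSE lines, order them by chunk_index, and join the audio bytes."""
    chunks = {}
    for line in lines:
        if not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):].strip()
        if payload == b"[DONE]":
            break
        event = json.loads(payload)
        if "audio" in event:
            chunks[event["chunk_index"]] = base64.b64decode(event["audio"])

    # Assumption: indices run 0..n-1 with no gaps; anything else means data was lost.
    missing = [i for i in range(len(chunks)) if i not in chunks]
    if missing:
        raise ValueError(f"missing audio chunks: {missing}")
    return b"".join(chunks[i] for i in sorted(chunks))


lines = [
    b'data: {"audio": "REVG", "chunk_index": 1}',  # base64 of b"DEF"
    b'data: {"audio": "QUJD", "chunk_index": 0}',  # base64 of b"ABC"
    b"data: [DONE]",
]
print(assemble_audio(lines))  # prints: b'ABCDEF'
```

With `requests`, the same function can be fed `response.iter_lines()` directly.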

ElevenLabs Streaming Endpoint

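The ElevenLabs-style endpoint takes the voice ID in the URL path (af_bella here) and the text in the request body:

curl https://abc123.predictor.sh/v1/text-to-speech/af_bella/stream \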
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}'

Whisper Streaming

For long audio files, transcription streams segment-by-segment:

data: {"text": "First segment of speech.", "start": 0.0, "end": 5.2}
data: {"text": "Second segment continues here.", "start": 5.2, "end": 11.4}
data: [DONE]
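
Because each event carries `start` and `end` values, a client can build a timestamped transcript as segments arrive. The following is a minimal sketch, assuming the event shape shown above and that the times are in seconds. `format_segments` is a hypothetical helper name.

```python
import json
from typing import Iterable, List


def format_segments(lines: Iterable[str]) -> List[str]:
    """Turn transcription SSE lines into '[start - end] text' strings."""
    out = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        seg = json.loads(payload)
        # Assumption: start/end are numeric offsets in seconds.
        out.append(f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text']}")
    return out


events = [
    'data: {"text": "First segment of speech.", "start": 0.0, "end": 5.2}',
    'data: {"text": "Second segment continues here.", "start": 5.2, "end": 11.4}',
    "data: [DONE]",
]
for row in format_segments(events):
    print(row)
```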

Headers

Streaming responses use these headers:

Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
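
A client can check the `Content-Type` header to tell a stream apart from a non-streaming response before it starts parsing. The following is a minimal sketch. `is_event_stream` is a hypothetical helper, and it tolerates a `charset` parameter in case a server appends one.

```python
from typing import Mapping


def is_event_stream(headers: Mapping[str, str]) -> bool:
    """Return True if the Content-Type header denotes a Server-Sent Events stream."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8" before comparing.
            media_type = value.split(";", 1)[0].strip().lower()
            return media_type == "text/event-stream"
    return False


print(is_event_stream({"Content-Type": "text/event-stream"}))  # prints: True
print(is_event_stream({"Content-Type": "application/json"}))   # prints: False
```

With `requests`, the call would be `is_event_stream(response.headers)`.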

Timeouts

The default request timeout is 60 seconds.
Streaming connections stay open until the response completes or an error occurs.
To change the timeout, set it in predictor.yaml:

timeout: 120  # seconds
