Text-to-Speech

predictor.sh supports two TTS engines: Kokoro (ONNX) and Parler-TTS (Candle).

Engine Comparison

| Feature | Kokoro | Parler-TTS |
| --- | --- | --- |
| Quality | Best | Good |
| Speed | Faster | Slower |
| Streaming | ✅ Full SSE | ❌ No |
| Voices | 10 built-in | 6 presets + custom |
| Voice Control | Fixed voices | Prompt-based |

The engine is auto-detected from the model path:

Path contains "kokoro" → Kokoro engine
Path contains "parler" → Parler-TTS engine
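The detection rule above can be sketched in a few lines of Python. This is only an illustration, not predictor.sh's actual implementation: the function name `detect_engine` is hypothetical, and the case-insensitive match and the precedence of "kokoro" over "parler" are assumptions the text does not confirm.

```python
from typing import Optional


def detect_engine(model_path: str) -> Optional[str]:
    """Guess the TTS engine from substrings of the model path.

    Hypothetical helper: mirrors the documented rule, nothing more.
    """
    lowered = model_path.lower()  # assumption: matching ignores case
    if "kokoro" in lowered:       # assumption: "kokoro" wins if both appear
        return "kokoro"
    if "parler" in lowered:
        return "parler"
    return None                   # neither substring present


if __name__ == "__main__":
    for path in ("models/kokoro-v1.onnx", "models/parler-tts-mini", "models/other"):
        print(path, "->", detect_engine(path))
```

If a model directory has been renamed so that neither substring appears, a check like this would report no engine, so keeping "kokoro" or "parler" in the path is the safe choice.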

Kokoro TTS (Recommended)

High-quality, fast synthesis with streaming support.

Available Voices

| Voice ID | Description | Gender | Accent |
| --- | --- | --- | --- |
| af_bella | Warm, neutral | Female | American |
| af_sarah | Clear, professional | Female | American |
| af_nicole | Energetic, bright | Female | American |
| af_sky | Soft, gentle | Female | American |
| am_adam | Neutral, clear | Male | American |
| am_michael | Deep, authoritative | Male | American |
| bf_emma | Warm, approachable | Female | British |
| bf_isabella | Elegant, refined | Female | British |
| bm_george | Classic, distinguished | Male | British |
| bm_lewis | Friendly, conversational | Male | British |

OpenAI Voice Mapping

OpenAI voice names are accepted and map to Kokoro voices as follows:

| OpenAI Voice | Kokoro Voice |
| --- | --- |
| alloy | af_bella |
| echo | am_adam |
| fable | bm_george |
| onyx | am_michael |
| nova | af_nicole |
| shimmer | af_sky |
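For client code that wants to know which Kokoro voice a request will end up using, the mapping can be mirrored as a lookup table. This is a sketch: `resolve_kokoro_voice` is a hypothetical helper, and raising `ValueError` for an unknown name is an assumption, since the server's behavior for unrecognized voices is not described here.

```python
# OpenAI voice name -> Kokoro voice ID, copied from the mapping table.
OPENAI_TO_KOKORO = {
    "alloy": "af_bella",
    "echo": "am_adam",
    "fable": "bm_george",
    "onyx": "am_michael",
    "nova": "af_nicole",
    "shimmer": "af_sky",
}

# The ten built-in Kokoro voice IDs listed under Available Voices.
KOKORO_VOICES = {
    "af_bella", "af_sarah", "af_nicole", "af_sky", "am_adam",
    "am_michael", "bf_emma", "bf_isabella", "bm_george", "bm_lewis",
}


def resolve_kokoro_voice(voice: str) -> str:
    """Return the Kokoro voice ID for a Kokoro ID or an OpenAI voice name."""
    if voice in KOKORO_VOICES:
        return voice
    if voice in OPENAI_TO_KOKORO:
        return OPENAI_TO_KOKORO[voice]
    # Assumption: reject unknown names locally rather than guess a fallback.
    raise ValueError(f"unknown voice: {voice!r}")


if __name__ == "__main__":
    print(resolve_kokoro_voice("fable"))
    print(resolve_kokoro_voice("bf_emma"))
```

Note that four Kokoro voices (af_sarah, bf_emma, bf_isabella, bm_lewis) have no OpenAI alias, so they can only be selected by their Kokoro ID.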

Parler-TTS

Prompt-based voice synthesis using natural language descriptions.

Voice Presets

| Preset | Description |
| --- | --- |
| alloy | A neutral, clear female voice |
| echo | A neutral, clear male voice |
| fable | A warm, storytelling male voice |
| onyx | A deep, authoritative male voice |
| nova | An energetic, bright female voice |
| shimmer | A soft, gentle female voice |

Custom Voice Descriptions

In place of a preset name, Parler-TTS accepts a free-form voice description in the voice field:

{
  "input": "Hello world",
  "voice": "A cheerful British female with a warm, friendly tone"
}
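One way to picture how presets and custom descriptions relate is a lookup that falls back to the raw string. The sketch below assumes that a `voice` value matching a preset name is expanded to the preset's description and that anything else is passed through as a custom description; that resolution order is a guess at the behavior, and `parler_description` is a hypothetical name.

```python
# Preset name -> description, copied from the Voice Presets table.
PARLER_PRESETS = {
    "alloy": "A neutral, clear female voice",
    "echo": "A neutral, clear male voice",
    "fable": "A warm, storytelling male voice",
    "onyx": "A deep, authoritative male voice",
    "nova": "An energetic, bright female voice",
    "shimmer": "A soft, gentle female voice",
}


def parler_description(voice: str) -> str:
    """Return the description a given voice value would stand for.

    Assumption: preset names are expanded, any other string is used as-is.
    """
    return PARLER_PRESETS.get(voice, voice)


if __name__ == "__main__":
    print(parler_description("onyx"))
    print(parler_description("A cheerful British female with a warm, friendly tone"))
```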

API Usage

OpenAI-Compatible API

curl https://your-endpoint.predictor.sh/v1/audio/speech \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, this is a test.",
    "voice": "alloy",
    "response_format": "mp3"
  }' \
  --output speech.mp3

With Streaming (Kokoro only)

curl https://your-endpoint.predictor.sh/v1/audio/speech \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, this is a streaming test.",
    "voice": "alloy",
    "stream_format": "sse"
  }'

ElevenLabs-Compatible API

# Non-streaming
curl https://your-endpoint.predictor.sh/v1/text-to-speech/af_bella \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world"
  }' \
  --output speech.mp3

# Streaming
curl https://your-endpoint.predictor.sh/v1/text-to-speech/af_bella/stream \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello world"
  }'

List Available Voices

curl https://your-endpoint.predictor.sh/v1/voices \
  -H "Authorization: Bearer $TOKEN"

Python SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.predictor.sh/v1",
    api_key="pred_your_token"
)

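# Generate speech and write the audio to a file as it arrives.
# with_streaming_response replaces the deprecated stream_to_file call
# on the plain (non-streaming) response object.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    input="Hello, this is a test."
) as response:
    response.stream_to_file("speech.mp3")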

Output Formats

| Format | Extension | Notes |
| --- | --- | --- |
| MP3 | .mp3 | Default, smaller files |
| WAV | .wav | Uncompressed, larger |

Speed Control

Adjust speech speed from 0.5x to 2.0x:

{
  "input": "Hello world",
  "voice": "alloy",
  "speed": 1.5
}
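A client can check the range before sending the request. The following is a minimal sketch: `build_speech_payload` is a hypothetical helper, the default of 1.0 is an assumption (the default speed is not stated here), and how the server responds to out-of-range values is likewise not documented.

```python
MIN_SPEED = 0.5  # lower bound from the documented range
MAX_SPEED = 2.0  # upper bound from the documented range


def build_speech_payload(text: str, voice: str = "alloy", speed: float = 1.0) -> dict:
    """Build a request body, rejecting speeds outside 0.5x to 2.0x."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
    return {"input": text, "voice": voice, "speed": speed}


if __name__ == "__main__":
    print(build_speech_payload("Hello world", speed=1.5))
```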

Streaming Format

When streaming is enabled, audio is delivered as Server-Sent Events:

data: {"audio": "<base64-encoded-chunk>", "chunk_index": 0}
data: {"audio": "<base64-encoded-chunk>", "chunk_index": 1}
data: [DONE]

Each chunk contains base64-encoded audio that can be decoded and played progressively.
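As an illustration of consuming this stream, the sketch below parses `data:` lines, stops at `[DONE]`, and decodes each chunk. It assumes chunks arrive in `chunk_index` order and that every event carries an `audio` field as shown above; `iter_audio_chunks` is a hypothetical helper name.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from SSE lines until the [DONE] marker."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        # Assumption: events arrive in chunk_index order, so no reordering here.
        yield base64.b64decode(event["audio"])


if __name__ == "__main__":
    sample = [
        'data: {"audio": "%s", "chunk_index": 0}' % base64.b64encode(b"abc").decode(),
        "",
        'data: {"audio": "%s", "chunk_index": 1}' % base64.b64encode(b"def").decode(),
        "data: [DONE]",
    ]
    audio = b"".join(iter_audio_chunks(sample))
    print(len(audio), "bytes decoded")
```

The function takes any iterable of text lines, so it can be fed from an HTTP client that exposes the response body line by line; the decoded bytes can then be appended to a file or handed to an audio player as they arrive.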
